Exposing a Kafka Cluster via a VPC Endpoint Service

Post Syndicated from Grab Tech original https://engineering.grab.com/exposing-kafka-cluster

In large organisations, it is a common practice to isolate the cloud resources of different verticals. Amazon Web Services (AWS) Virtual Private Cloud (VPC) is a convenient way of doing so. At Grab, while our core AWS services reside in a main VPC, a number of Grab Tech Families (TFs) have their own dedicated VPC. One such example is GrabKios. Previously known as “Kudo”, GrabKios was acquired by Grab in 2017 and has always resided in its own AWS account and dedicated VPC.

In this article, we explore how we exposed an Apache Kafka cluster across multiple Availability Zones (AZs) in Grab’s main VPC, to producers and consumers residing in the GrabKios VPC, via a VPC Endpoint Service. This design is part of Coban, Grab’s unified stream processing platform.

There are several ways of enabling communication between applications across distinct VPCs; VPC peering is the most straightforward and affordable option. However, it potentially exposes the entire VPC networks to each other, needlessly increasing the attack surface.

Security has always been one of Grab’s top concerns and with Grab’s increasing growth, there is a need to deprecate VPC peering and shift to a method of only exposing services that require remote access. The AWS VPC Endpoint Service allows us to do exactly that for TCP/IPv4 communications within a single AWS region.

Setting up a VPC Endpoint Service is already relatively complex compared to VPC peering. On top of that, we need to expose an Apache Kafka cluster via such an endpoint, which comes with an extra challenge: Apache Kafka requires clients, called producers and consumers, to be able to deterministically establish a TCP connection to all brokers forming the cluster, not just any one of them.

Last but not least, we need a design that optimises performance and cost by limiting data transfer across AZs.

Note: All variable names, port numbers and other details used in this article are only used as examples.

Architecture overview

As shown in this diagram, the Kafka cluster resides in the service provider VPC (Grab’s main VPC) while local Kafka producers and consumers reside in the service consumer VPC (GrabKios VPC).

In Grab’s main VPC, we created a Network Load Balancer (NLB) and set it up across all three AZs, enabling cross-zone load balancing. We then created a VPC Endpoint Service associated with that NLB.

Next, we created a VPC Endpoint Network Interface in the GrabKios VPC, also set up across all three AZs, and attached it to the remote VPC endpoint service in Grab’s main VPC. Apart from this, we also created a Route 53 Private Hosted Zone .grab and a CNAME record kafka.grab that points to the VPC Endpoint Network Interface hostname.

Lastly, we configured producers and consumers to use kafka.grab:10000 as their Kafka bootstrap server endpoint, 10000/tcp being an arbitrary port of our choosing. We will explain the significance of these in later sections.
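
For illustration, a client in the GrabKios VPC only needs to point its bootstrap configuration at that endpoint. The lines below are a hedged sketch: the topic name and the console consumer invocation are examples, not our actual configuration.

# Client configuration (excerpt)
bootstrap.servers=kafka.grab:10000

# e.g. consuming a topic with the standard console consumer
$ kafka-console-consumer.sh --bootstrap-server kafka.grab:10000 --topic orders --from-beginning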


Network Load Balancer setup

On the NLB in Grab’s main VPC, we set up the corresponding bootstrap listener on port 10000/tcp, associated with a target group containing all of the Kafka brokers forming the cluster. But this listener alone is not enough.

As mentioned earlier, Apache Kafka requires producers and consumers to be able to deterministically establish a TCP connection to all brokers. That’s why we created one listener for every broker in the cluster, incrementing the TCP port number for each new listener, so each broker endpoint would have the same name but with different port numbers, e.g. kafka.grab:10001 and kafka.grab:10002.

We then associated each listener with a dedicated target group containing only the targeted Kafka broker, so that remote producers and consumers could differentiate between the brokers by their TCP port number.

The following listeners and associated target groups were set up on the NLB:

  • 10000/tcp (bootstrap) -> 9094/tcp @ [broker 101, broker 201, broker 301]
  • 10001/tcp -> 9094/tcp @ [broker 101]
  • 10002/tcp -> 9094/tcp @ [broker 201]
  • 10003/tcp -> 9094/tcp @ [broker 301]
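
For illustration, one such per-broker listener and its target group could be created with the AWS CLI as sketched below; the ARNs, IDs and names are placeholders, and our actual setup is deployed with Terraform (see the Future improvements section).

$ aws elbv2 create-target-group --name kafka-broker-101 --protocol TCP --port 9094 \
    --vpc-id vpc-0123456789abcdef0 --target-type instance
$ aws elbv2 register-targets --target-group-arn <broker-101-target-group-arn> \
    --targets Id=i-0123456789abcdef0
$ aws elbv2 create-listener --load-balancer-arn <nlb-arn> --protocol TCP --port 10001 \
    --default-actions Type=forward,TargetGroupArn=<broker-101-target-group-arn>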

Security Group rules

In the Kafka brokers’ Security Group (SG), we added an ingress SG rule allowing 9094/tcp traffic from each of the three private IP addresses of the NLB. As mentioned earlier, the NLB was set up across all three AZs, with each having its own private IP address.

On the GrabKios VPC (consumer side), we created a new SG and attached it to the VPC Endpoint Network Interface. We also added ingress rules to allow all producers and consumers to connect to tcp/10000-10003.
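
As an illustration, such an ingress rule could be added with the AWS CLI as follows; this is a hedged sketch and both security group IDs are placeholders.

$ aws ec2 authorize-security-group-ingress --group-id sg-0aaaabbbbcccc1111 \
    --ip-permissions 'IpProtocol=tcp,FromPort=10000,ToPort=10003,UserIdGroupPairs=[{GroupId=sg-0ddddeeeeffff2222}]'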

Kafka setup

Kafka brokers typically come with a listener on port 9092/tcp, advertising the brokers by their private IP addresses. We kept that default listener so that local producers and consumers in Grab’s main VPC could still connect directly.

$ kcat -L -b 10.0.0.1:9092
 3 brokers:
 broker 101 at 10.0.0.1:9092 (controller)  
 broker 201 at 10.0.0.2:9092
 broker 301 at 10.0.0.3:9092
... truncated output ...

We also configured all brokers with an additional listener on port 9094/tcp that advertises the brokers by:

  • Their shared private name kafka.grab.
  • Their distinct TCP ports previously set up on the NLB’s dedicated listeners.
$ kcat -L -b 10.0.0.1:9094
 3 brokers:
 broker 101 at kafka.grab:10001 (controller)  
 broker 201 at kafka.grab:10002
 broker 301 at kafka.grab:10003
... truncated output ...

Note that there is a difference in how the brokers’ endpoints are advertised in the two outputs above. The latter enables connection to any particular broker from the GrabKios VPC via the VPC Endpoint Service.

It would definitely be possible to advertise the brokers directly with the remote VPC Endpoint Interface hostname instead of kafka.grab, but relying on such a private name presents at least two advantages.

First, it decouples the Kafka deployment in the service provider VPC from the infrastructure deployment in the service consumer VPC. Second, it makes the Kafka cluster easier to expose to other remote VPCs, should we need it in the future.
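
For illustration, the 9094/tcp listener described above could be expressed in a broker’s server.properties roughly as follows. This is a hedged sketch for broker 101: the listener names and the security protocol are examples, not our actual configuration.

# Default listener for local clients, plus the NLB-facing listener
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9094
# Advertise the shared private name and this broker's dedicated NLB port
advertised.listeners=INTERNAL://10.0.0.1:9092,EXTERNAL://kafka.grab:10001
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL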

Limiting data transfer across Availability Zones

At this stage of the setup, our Kafka cluster is fully reachable from producers and consumers in the GrabKios VPC. Yet, the design is not optimal.

When a producer or a consumer in the GrabKios VPC needs to connect to a particular broker, it uses its individual endpoint made up of the shared name kafka.grab and the broker’s dedicated TCP port.

The shared name arbitrarily resolves into one of the three IP addresses of the VPC Endpoint Network Interface, one for each AZ.

Hence, there is a fair chance that the obtained IP address is neither in the client’s AZ nor in that of the target Kafka broker. Assuming the three IP addresses are returned uniformly, the probability of this happening is 2/3 when the client and broker reside in the same AZ and 1/3 when they do not.

While that is of little concern for the initial bootstrap connection, it becomes a serious drawback for actual data transfer, impacting the performance and incurring unnecessary data transfer cost.

For this reason, we created three additional CNAME records in the Private Hosted Zone in the GrabKios VPC, one for each AZ, with each pointing to the VPC Endpoint Network Interface zonal hostname in the corresponding AZ:

  • kafka-az1.grab
  • kafka-az2.grab
  • kafka-az3.grab

Note that we used az1, az2, az3 instead of the typical AWS 1a, 1b, 1c suffixes, because the latter’s mapping is not consistent across AWS accounts.
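
The difference between the shared and zonal names can be illustrated with two lookups from the GrabKios VPC. The output below is hypothetical and trimmed to the resolved addresses, which are placeholders for the VPC Endpoint Network Interface IPs.

$ dig +short kafka.grab
10.128.1.10
10.128.2.10
10.128.3.10
$ dig +short kafka-az1.grab
10.128.1.10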

We also reconfigured each Kafka broker in Grab’s main VPC by setting their 9094/tcp listener to advertise brokers by their new zonal private names.

$ kcat -L -b 10.0.0.1:9094
 3 brokers:
 broker 101 at kafka-az1.grab:10001 (controller)  
 broker 201 at kafka-az2.grab:10002
 broker 301 at kafka-az3.grab:10003
... truncated output ...

Our private zonal names are shared by all brokers in the same AZ, while TCP ports remain distinct for each broker. However, this is not clearly shown in the output above because our cluster has only three brokers, one in each AZ.

The previous common name kafka.grab remains in the GrabKios VPC’s Private Hosted Zone and allows connections to any broker via an arbitrary, likely non-optimal route. GrabKios VPC producers and consumers still use that highly-available endpoint to initiate bootstrap connections to the cluster.


Future improvements

For this setup, scalability is our main challenge. If we add a new broker to this Kafka cluster, we would need to:

  • Assign a new TCP port number to it.
  • Set up a new dedicated listener on that TCP port on the NLB.
  • Configure the newly spun up Kafka broker to advertise its service with the same TCP port number and the private zonal name corresponding to its AZ.
  • Add the new broker to the target group of the bootstrap listener on the NLB.
  • Update the network SG rules on the service consumer side to allow connections to the newly allocated TCP port.

We rely on Terraform to dynamically deploy all AWS infrastructure and on Jenkins and Ansible to deploy and configure Apache Kafka. The overhead is limited, but a few manual actions remain due to a lack of integration. These include transferring newly allocated TCP ports and their corresponding EC2 instances’ IP addresses to our Ansible inventory, committing them to our codebase, and triggering a Jenkins job to deploy the new Kafka broker.

Another concern of this setup is that it is only applicable to AWS. As we are aiming to be multi-cloud, we may need to port it to Microsoft Azure and leverage the Azure Private Link service.

In both cases, running Kafka on Kubernetes with the Strimzi operator would be helpful in addressing the scalability challenge and reducing our dependence on any single cloud provider. We will explain how this solution has helped us address these challenges in a future article.


Special thanks to David Virgil Naranjo whose blog post inspired this work.


Join us

Grab is a leading superapp in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across over 400 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Introducing s2n-quic, a new open-source QUIC protocol implementation in Rust

Post Syndicated from Panos Kampanakis original https://aws.amazon.com/blogs/security/introducing-s2n-quic-open-source-protocol-rust/

At Amazon Web Services (AWS), security, high performance, and strong encryption for everyone are top priorities for all our services. With these priorities in mind, less than a year after QUIC ratification in the Internet Engineering Task Force (IETF), we are introducing support for the QUIC protocol which can boost performance for web applications that currently use Transport Layer Security (TLS) over Transmission Control Protocol (TCP). We are pleased to announce the availability of s2n-quic, an open-source Rust implementation of the QUIC protocol added to our set of AWS encryption open-source libraries.

What is QUIC?

QUIC is an encrypted transport protocol designed for performance and is the foundation of HTTP/3. It is specified in a set of IETF standards ratified in May 2021. QUIC protects its UDP datagrams by using encryption and authentication keys established in a TLS 1.3 handshake carried over QUIC transport. It is designed to improve upon TCP by providing better first-byte latency, better handling of multiple streams, and solutions to issues such as head-of-line blocking, mobility, and data loss detection. This enables web applications to perform faster, especially over poor networks. Other potential uses include latency-sensitive connections and UDP connections currently using DTLS, which now can run faster.

Renaming s2n

AWS has long supported open-source encryption libraries; in 2015 we introduced s2n as a TLS library. The name s2n is short for signal to noise, and is a nod to the almost magical act of encryption—disguising meaningful signals, like your critical data, as seemingly random noise.

Now that we are introducing our new QUIC open-source library, we are renaming s2n to s2n-tls. s2n-tls is an efficient TLS library built over other crypto libraries like OpenSSL libcrypto or AWS libcrypto (AWS-LC). AWS-LC is a general-purpose cryptographic library maintained by AWS which originated from the Google project BoringSSL. The s2n family of AWS encryption open-source libraries now consists of s2n-tls, s2n-quic, and s2n-bignum. s2n-bignum is a collection of bignum arithmetic routines maintained by AWS and designed for crypto applications.

s2n-quic details

Similar to s2n-tls, s2n-quic is designed to be small and fast, with simplicity as a priority. It is written in Rust, so it reaps some of that language’s benefits, such as performance and thread and memory safety. s2n-quic depends on either s2n-tls or rustls for the TLS 1.3 handshake.

The main advantages of s2n-quic are:

  • Simple API. For example, a QUIC echo server example can be built with just a few API calls.
  • Highly configurable. s2n-quic is configured with code through providers that allow an application to granularly control functionality. You can see an example of the server’s simple config in the QUIC echo server example.
  • Extensive testing. Fuzzing (libFuzzer, American Fuzzy Lop (AFL), and honggfuzz), corpus replay unit testing of derived corpus files, testing of concrete and symbolic execution engines with bolero, and extensive integration and unit testing are used to validate the correctness of our implementation.
  • Thorough interoperability testing for every code change. There are multiple public QUIC implementations; s2n-quic is continuously tested to interoperate with many of them.
  • Verified correctness, post-quantum hybrid key exchange, and maturity for the TLS handshake when built with s2n-tls.
  • Thorough compliance coverage tracking of normative language in relevant standards.

Some important features in s2n-quic that can improve performance and connection management include CUBIC congestion controller support, packet pacing, Generic Segmentation Offload (GSO) support, Path MTU Discovery, and unique connection identifiers detached from the address.

AWS is continuing to invest in encryption optimization techniques, UDP performance improvement technologies, and formal code verification with the AWS Automated Reasoning Group to further enhance the library.

Like s2n-tls, which has already been introduced in various AWS services, s2n-quic will be integrated into AWS services that need to make use of the benefits of QUIC. QUIC is a standardized protocol which, when introduced in a service like web content delivery, can improve user experience or application performance. AWS still plans to continue support for existing protocols like TLS, so existing applications will remain interoperable. Amazon CloudFront is scheduled to be the first AWS service to integrate s2n-quic with its support for HTTP/3 in 2022.

Conclusion

If you are interested in using or contributing to s2n-quic source code or documentation, they are publicly available under the terms of the Apache Software License 2.0 from our s2n-quic GitHub repository.

If you package or distribute s2n-quic or s2n-tls, or use it as part of a large multi-user service, you may be eligible for pre-notification of security issues. Please contact [email protected].

If you discover a potential security issue in s2n-quic or s2n-tls, we ask that you notify AWS Security by using our vulnerability reporting page.

Stay tuned for more topics on s2n-quic like quantum-resistance, performance analyses, uses, and other technical details.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Panos Kampanakis

Panos has extensive experience on cybersecurity, applied cryptography, security automation, and vulnerability management. He has trained and presented on various security topics at technical events for numerous years, and also co-authored Cisco Press books, papers, standards, and research publications. He has participated in various security standards bodies to provide common interoperable protocols and languages for security information sharing, cryptography, and PKI. In his current role, Panos works with engineers and industry standards partners to provide cryptographically secure tools, protocols, and standards.

Log4Shell 2 Months Later: Security Strategies for the Internet’s New Normal

Post Syndicated from Jesse Mack original https://blog.rapid7.com/2022/02/17/log4shell-2-months-later-security-strategies-for-the-internets-new-normal/


CVE-2021-44228 rules everything around us — or so it seemed, at least, for those breathless days in December 2021 when the full scope of Log4Shell was starting to take hold and security teams were strapped for time and resources as they scoured their organizations’ environments for vulnerable instances of Apache Log4j. But now that the peak intensity around this vulnerability has waned and we’ve had a chance to catch our collective breath, where does the effort to patch and remediate stand? What should security teams be focusing on today in the fight against Log4Shell?

On Wednesday, February 16, Rapid7 experts Bob Rudis, Devin Krugly, and Glenn Thorpe sat down for a webinar on the current state of the Log4j vulnerability. They covered where Log4Shell stands now, what the future might hold, and what organizations should be doing proactively to ensure they’re as protected as possible against exploits.

Laying out the landscape

Glenn Thorpe, Rapid7’s Program Manager for Emergent Threat Response, kicked things off with a recap and retrospective of Log4Shell and why it seemingly set fire to the entire internet for a good portion of December. The seriousness of this vulnerability is due to the coming-together of several key factors, including:

  • The ability for vulnerable systems to grant an attacker full administrative access
  • The low level of skill required for exploitation — in many cases, attackers simply have to copy and paste
  • The attack vector’s capability to run undetected over an encrypted channel
  • The pervasiveness of the Log4j library, which means vulnerability scanners alone can’t act as complete solutions against this threat

Put all this together, and it’s no surprise that the volume of exploit attempts leveraging the Log4j vulnerability ramped up throughout December 2021 and has continued to spike periodically throughout January and February 2022. By January 10, ransomware using Log4Shell had been observed, and on January 14, Rapid7’s MDR saw mass Log4j exploits in VMware products.

But while there’s certainly been plenty of Log4j patching done, the picture on that front is far from complete. According to the latest CISA data (also here as a daily-updated spreadsheet), there are still 320 cataloged software products that are known to be affected by vulnerable Log4j as of February 16, 2022 — and 1,406 still awaiting confirmation from the vendor.




Log4j today: A new normal?

So, where does the effort to put out Log4j fires stand now? Devin Krugly, Rapid7’s Practice Advisor for Vulnerability Risk Management, thinks we’re in a better spot than we were in December — but we’re by no means out of the woods.

“We’re effectively out of fire-fighting mode,” said Devin. That means that, at this point, most security teams have identified the affected systems, implemented mitigations, and patched vulnerable versions of Log4j. But because of the complexity of today’s software supply chains, there are often heavily nested dependencies within vendor systems — some of which Log4j may still be implicated in. This means it’s essential to have a solid inventory of vendor software products that may be using Log4j and to ensure those instances of the library are updated and patched.

“Don’t lose that momentum,” Glenn chimed in. “Don’t put that on your post-mortem action list and forget about it.”

This imperative is all the more critical because of a recent uptick in Log4Shell activity. Rapid7’s Chief Data Scientist Bob Rudis laid out some activity detected by the Project Heisenberg honeypot fleet indicating a revival of Log4j activity in early and mid-February, much of it from new infrastructure and scanning hosts that hadn’t been seen before.

Amid this increase in activity, vulnerable instances of Log4j are anything but gone from the internet. In fact, data from Sonatype as of February 16, 2022 indicates 39% of Log4j downloads are still versions vulnerable to Log4Shell.

“We’re going to be seeing Log4j attempts on the internet, on the regular, at a low level, forever,” Bob said. Log4Shell is now in a family with WannaCry and Conficker (yes, that Conficker) — vulnerabilities that are around indefinitely, and which we’ll need to continually monitor for as attackers use them to try to breach our defenses.

Adopting a defense-in-depth posture in the “new normal” of life with Log4Shell is sure to come with its share of headaches. Luckily, Bob, Devin, and Glenn shared some practical strategies that security teams can adopt to keep their organizations’ defenses strong and avoid some common pitfalls.

Go beyond compensating controls

“My vendor says they’ve removed the JNDI class from the JAR file — does that mean their application is no longer vulnerable to Log4Shell?” This question came up in a few different forms from our webinar audience. The answer from our panelists was nuanced but crystal-clear: maybe for now, but not forever.

Removing the JNDI class is a compensating control — one that provides a quick fix for the vulnerability but doesn’t patch the core, underlying problem via a full update. For example, when you do a backup, you might unknowingly reintroduce the JNDI class after removing it — or, as Devin pointed out, an attacker could chain together a replacement for it.

These kinds of compensating or mitigating controls have their place in a short-term response, but there’s simply no action that can replace the work of upgrading all instances of Log4j to the most up-to-date versions that contain patches for Log4Shell.
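
For context, the JNDI class removal discussed above usually refers to the widely circulated mitigation of deleting JndiLookup from the log4j-core JAR, shown here for illustration only; it is exactly the kind of short-term control that cannot replace upgrading.

$ zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class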

“Mitigate for speed, but not in perpetuity,” Glenn recommended.

Find the nooks and crannies

Today’s cloud-centric IT environments are increasingly ephemeral and on-demand — a boost for innovation and speed, but that also means teams can deploy workloads without security teams ever knowing about it. Adopting an “Always Be Scanning” mindset, as Bob put it, is essential to ensure vulnerable instances of Log4j aren’t introduced into your environment.

Continually scanning your internet-facing components is a good and necessary start — but the work doesn’t end there. As Devin pointed out, finding the nooks and crannies where Log4j might crop up is critical. This includes scouring containers and virtual machines, as well as analyzing application and server logs for malicious JNDI strings. You should also ensure your security operations center (SOC) team can quickly and easily identify indicators that your environment is being scanned for reconnaissance into Log4Shell exploit opportunities.
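
As a starting point, unobfuscated exploit attempts can be spotted in application and server logs with a simple pattern search like the one below. This is only a hedged sketch: the log path is a placeholder, and real payloads frequently use nested or obfuscated lookups that a single regular expression will miss.

$ grep -riE '\$\{jndi:(ldap|ldaps|rmi|dns|nis|iiop|corba|nds|http)' /var/log/myapp/ 2>/dev/null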

“Involving the SOC team for alerting purposes, if you haven’t already done that, is an absolute necessity in this case,” said Devin.

Get better at vendor management

It should be clear by now that in a post-Log4j world, organizations must demand the highest possible level of visibility into their software supply chain — and that means being clear, even tough, with vendors.

“Managing stuff on the internet is hard because organizations are chaotic beings by nature, and you’re trying to control the chaos as a security professional,” said Bob. Setting yourself up for success in this context means having the highest level of visibility possible. After all, how many other vulnerabilities just as bad as Log4Shell — or even worse — might be out there lurking in the corners of your vendors’ code?

The upcoming US government requirements around Software Bill of Materials (SBOM) for vendor procurement should go a long way toward raising expectations for software vendors. Start asking vendors if they can produce an SBOM that details remediation and update of any vulnerable instances of Log4j.

These conversations don’t need to be adversarial — in fact, vendors can be a key resource in the effort to defend against Log4Shell. Especially for smaller organizations or under-resourced security teams, relying on capable third parties can be a smart way to bolster your defenses.

Only you can secure the software supply chain

OK, maybe that subhead is not literally true — a secure software supply chain is a community-wide effort, to which we must all hold each other accountable. The cloud-based digital ecosystem we all inhabit, whether we like it or not, is fundamentally interconnected. A pervasive vulnerability like Log4Shell is an unmistakable reminder of that fact.

It also serves as an opportunity to raise our expectations of ourselves, our organizations, and our partners — and those choices do start at home, with each security team as they update their applications, continually scan their environments, and demand visibility from their vendors. Those actions really do help create a more secure internet for everyone.

So while we’ll be living with Log4Shell probably forever, it’ll be living with us, too. And as scared as you are of the spider, it’s even more scared of your boot.

Want to go more in-depth? Check out the full replay of our webinar, “Log4Shell Two Months Later: Lessons and Insights for Protectors.”



Leveraging machine learning to find security vulnerabilities

Post Syndicated from Tiferet Gazit original https://github.blog/2022-02-17-leveraging-machine-learning-find-security-vulnerabilities/

GitHub code scanning now uses machine learning (ML) to alert developers to potential security vulnerabilities in their code.

If you want to set up your repositories to surface more alerts using our new ML technology, get started here. Read on for a behind-the-scenes peek into the ML framework powering this new technology!

Detecting vulnerable code

Code security vulnerabilities can allow malicious actors to manipulate software into behaving in unintended and harmful ways. The best way to prevent such attacks is to detect and fix vulnerable code before it can be exploited. GitHub’s code scanning capabilities leverage the CodeQL analysis engine to find security vulnerabilities in source code and surface alerts in pull requests – before the vulnerable code gets merged and released.

To detect vulnerabilities in a repository, the CodeQL engine first builds a database that encodes a special relational representation of the code. On that database we can then execute a series of CodeQL queries, each of which is designed to find a particular type of security problem.

Many vulnerabilities are caused by a single repeating pattern: untrusted user data is not sanitized and is subsequently accidentally used in an unsafe way. For example, SQL injection is caused by using untrusted user data in a SQL query, and cross-site scripting occurs as a result of untrusted user data being written to a web page. To detect situations in which unsafe user data ends up in a dangerous place, CodeQL queries encapsulate knowledge of a large number of potential sources of user data (for example, web frameworks), as well as potentially risky sinks (such as libraries for executing SQL queries). Members of the security community, alongside security experts at GitHub, continually expand and improve these queries to model additional common libraries and known patterns. Manual modeling, however, can be time-consuming, and there will always be a long tail of less-common libraries and private code that we won’t be able to model manually. This is where machine learning comes in.

We use examples surfaced by the manual models to train deep learning neural networks that can determine whether a code snippet comprises a potentially risky sink.

As a result, we can uncover security vulnerabilities even when they arise from the use of a library we have never seen before. For example, we can detect SQL injection vulnerabilities in the context of lesser-known or closed-source database abstraction libraries.

Screenshot of alerts with "experimental" label
ML-powered queries generate alerts that are marked with the “Experimental” label

Building a training set

We need to train ML models to recognize vulnerable code. While we have experimented some with unsupervised learning, unsurprisingly we found that supervised learning works better. But it comes at a cost! Asking code security experts to manually label millions of code snippets as safe or vulnerable is clearly untenable. So where do we get the data?

The manually written CodeQL queries already embody the expertise of the many security experts who wrote and refined them. We leverage these manual queries as ground-truth oracles, to label examples we then use to train our models. Each sink detected by such a query serves as a positive example in the training set. Since the vast majority of code snippets do not contain vulnerabilities, snippets not detected by the manual models can be regarded as negative examples. We make up for the inherent noise in this inferred labeling with volume. We extract tens of millions of snippets from over a hundred thousand public repositories, run the CodeQL queries on them, and label each as a positive or negative example for each query. This becomes the training set for a machine learning model that can classify code snippets as vulnerable or not.

Of course, we don’t want to train a model that will simply reproduce the manual modeling; we want to train a model that will predict new vulnerabilities that weren’t captured by manual modeling. In effect, we want the ML algorithm to improve on the current version of the manual query in much the same way that the current version improves on older, less-comprehensive versions. To see if we can do this, we actually construct all our training data from an older version of the query that detects fewer vulnerabilities. We then apply the trained model to new repositories it wasn’t trained on. We measure how well we recover the alerts detected by the latest manual query but missed by the older version of the query. This allows us to simulate the ability of a model trained with the current version of the query to recover alerts missed by this current manual model.

Features and modeling

Given a large training set of code snippets labeled as positive or negative examples for each query, we extract features for each snippet and train a deep learning model to classify new examples.

Rather than treating each code snippet simply as a string of words or characters and applying standard natural language processing (NLP) techniques naively to classify these strings, we leverage the power of CodeQL to access a wealth of information about the underlying source code. We use this information to produce a rich set of highly informative features for each code snippet.

One of the main advantages of deep learning models is their ability to combine information from a large set of features to create higher-level features and discover patterns that aren’t obvious to humans. In partnership with security and programming-language experts at GitHub, we use CodeQL to extract the information an expert might examine to inform a decision, such as the entire enclosing function body for a snippet that sits within a function, or the access path and API name. We don’t have to limit ourselves to features a human would find informative, however. We can include features whose usefulness is unknown, or features that can be useful in some instances but not all, such as the argument index for a code snippet that’s an argument to a function. Such features may contain patterns that aren’t apparent to humans, but that the neural network can detect. We therefore let the machine learning model decide whether or how to use all these features, and how to combine them to make the best decision for each snippet.

[Slide comparing CodeQL and ML. CodeQL: deep logical analysis, deductive reasoning, long chains of reasoning, encodes knowledge from security experts. ML: pattern detection, inductive reasoning, fuses a large volume of weak evidence, encodes knowledge from empirical data.]

Once we’ve extracted a rich set of potentially interesting features for each example, we tokenize and sub-tokenize them as is commonly done in NLP applications, with some modifications to capture characteristics specific to code syntax. We generate a vocabulary from the training data and feed the resulting lists of vocabulary indices into a fairly simple deep learning classifier, with a few layers of feature-by-feature processing followed by concatenation across features and a few layers of combined processing. The output is the probability that the current sample is a vulnerability for each query type.

Due to the scale of our offline data labeling, feature extraction, and training pipelines, we leverage cloud compute, including GPUs for model training. At inference time, however, no GPU is needed.

Inference on a repository

Once we have our trained machine learning model, we use it to classify new code snippets and detect likely vulnerabilities for each query. When ML-generated alerts are enabled by repository owners, CodeQL computes the source code features for the code snippets in that codebase and feeds them into the classifier model. The framework gets back the probability that a given code snippet represents a vulnerability, and uses this probability to surface likely new alerts.

Diagram showing how code snippets feed into CodeQL and the classifier model.

The full process runs on the same standard GitHub Action runners that are used by code scanning more generally, and it’s transparent to the user other than some increased runtime on large repositories. When the code scanning is complete, users can see the ML-generated alerts along with the alerts surfaced by the manual queries, with the “Experimental” label allowing them to filter ML-generated alerts in or out.

Does it work?

When evaluating ML-generated alerts, we consider only new alerts that were not flagged by the manual queries. True positives are the correct alerts that were missed by the manual queries; false positives are the incorrect new alerts generated by the ML model.

To measure metrics at scale, we use the experimental setup described above, in which the labels in the training set are determined using an older version of each manual query. We then test the model on repositories that were not included in the training set, and we measure its ability to recover the alerts detected by the current manual query but missed by the older one. Our metrics vary by query, but on average we measure a recall of approximately 80% with a precision of approximately 60%.

We’re currently extending ML-generated alerts to more JavaScript and TypeScript security queries, as well as working to improve both their performance and their runtime. Our future plans include expansion to more programming languages, as well as generalizations that will allow us to capture even more vulnerabilities.

Run the “Experimental” queries if you want to uncover more potential security vulnerabilities in your codebase. The more the community engages with our alerts and provides feedback, the better we can make our algorithms, so please consider giving them a try!

Control access to Amazon Elastic Container Service resources by using ABAC policies

Post Syndicated from Kriti Heda original https://aws.amazon.com/blogs/security/control-access-to-amazon-elastic-container-service-resources-by-using-abac-policies/

As an AWS customer, if you use multiple Amazon Elastic Container Service (Amazon ECS) services/tasks to achieve better isolation, you often have the challenge of how to manage access to these containers. In such cases, using tags can enable you to categorize these services in different ways, such as by owner or environment.

This blog post shows you how tags allow conditional access to Amazon ECS resources. You can use attribute-based access control (ABAC) policies to grant access rights to users through the use of policies that combine attributes together. ABAC can be helpful in rapidly-growing environments, where policy management can become cumbersome. This blog post uses ECS resource tags (owner tag and environment tag) as the attributes that are used to control access in the policies.

Amazon ECS resources have many attributes, such as tags, which can be used to control permissions. You can attach tags to AWS Identity and Access Management (IAM) principals, and create either a single ABAC policy, or a small set of policies for your IAM principals. These ABAC policies can be designed to allow operations when the principal tag (a tag that exists on the user or role making the call) matches the resource tag. They can be used to simplify permission management at scale. A single Amazon ECS policy can enforce permissions across a range of applications, without having to update the policy each time you create new Amazon ECS resources.

This post provides a step-by-step procedure for creating ABAC policies for controlling access to Amazon ECS containers. As the team adds ECS resources to its projects, permissions are automatically applied based on the owner tag and the environment tag. As a result, no policy update is required for each new resource. Using this approach can save time and help improve security, because it relies on granular permissions rules.

Condition key mappings

It’s important to note that each IAM permission in Amazon ECS supports different types of tagging condition keys. The following table maps each condition key to its ECS actions.

  • Condition key: aws:RequestTag/${TagKey}
    Description: Set this tag value to require that a specific tag be used (or not used) when making an API request to create or modify a resource that allows tags.
    ECS actions: ecs:CreateCluster, ecs:TagResource, ecs:CreateCapacityProvider

  • Condition key: aws:ResourceTag/${TagKey}
    Description: Set this tag value to allow or deny user actions on resources with specific tags.
    ECS actions: ecs:PutAttributes, ecs:StopTask, ecs:DeleteCluster, ecs:DeleteService, ecs:CreateTaskSet, ecs:DeleteAttributes, ecs:DeleteTaskSet, ecs:DeregisterContainerInstance

  • Condition key: aws:RequestTag/${TagKey} and aws:ResourceTag/${TagKey}
    Description: Supports both RequestTag and ResourceTag.
    ECS actions: ecs:CreateService, ecs:RunTask, ecs:StartTask, ecs:RegisterContainerInstance

For a detailed guide of Amazon ECS actions and the resource types and condition keys they support, see Actions, resources, and condition keys for Amazon Elastic Container Service.

Tutorial overview

The following tutorial gives you a step-by-step process to create and test an Amazon ECS policy that allows IAM roles with principal tags to access resources with matching tags. When a principal makes a request to AWS, their permissions are granted based on whether the principal and resource tags match. This strategy allows individuals to view or edit only the ECS resources required for their jobs.

Scenario

Example Corp. has multiple Amazon ECS containers created for different applications. Each of these containers are created by different owners within the company. The permissions for each of the Amazon ECS resources must be restricted based on the owner of the container, and also based on the environment where the action is performed.

Assume that you’re a lead developer at this company, and you’re an experienced IAM administrator. You’re familiar with creating and managing IAM users, roles, and policies. You want to ensure that the development engineering team members can access only the containers they own. You also need a strategy that will scale as your company grows.

For this scenario, you choose to use AWS resource tags and IAM role principal tags to implement an ABAC strategy for Amazon ECS resources. The condition key mappings table shows which tagging condition keys you can use in a policy for each Amazon ECS action and resources. You can define the tags in the role you created. For this scenario, you define two tags Owner and Environment. These tags restrict permissions in the role based on the tags you defined.

Prerequisites

To perform the steps in this tutorial, you must already have the following:

  • An IAM role or user with sufficient privileges for services like IAM and ECS. Following security best practices, the role should have a minimal set of permissions, with additional permissions granted only as necessary. You can attach the AWS managed policies IAMFullAccess and AmazonECS_FullAccess to the IAM role to provide the permissions needed to create IAM and ECS resources.
  • An AWS account that you can sign in to as an IAM role or user.
  • Experience creating and editing IAM users, roles, and policies in the AWS Management Console. For more information, see Tutorial to create IAM resources.

Create an ABAC policy for Amazon ECS resources

After you complete the prerequisites for the tutorial, you will need to define which Amazon ECS privileges and access controls you want in place for the users, and configure the tags needed for creating the ABAC policies. This tutorial focuses on providing step-by-step instructions for creating test users, defining the ABAC policies for the Amazon ECS resources, creating a role, and defining tags for the implementation.

To create the ABAC policy

You create an ABAC policy that defines permissions based on attributes. In AWS, these attributes are called tags.

The sample ABAC policy that follows provides ECS permissions to users when the principal’s tag matches the resource tag.

Sample ABAC policy for ECS resources

The sample ECS ABAC policy that follows allows the user to perform actions on ECS resources, but only when those resources are tagged with the same key-value pairs as the principal.

  1. Download the sample ECS policy. This policy allows principals to create, read, edit, and delete resources, but only when those resources are tagged with the same key-value pairs as the principal.
  2. Use the downloaded ECS policy to create the ECS ABAC policy, and name your new policy ECSABAC. For more information, see Creating IAM policies.

This sample policy provides permission for each ECS action based on the condition key that action supports. Refer to the condition key mappings table for a mapping of the ECS actions and the condition keys they support.

What does this policy do?

  • The ECSCreateCluster statement allows users to create clusters and capacity providers, and to tag resources. These ECS actions only support the RequestTag condition key. The condition block returns true if every tag passed in the request (the owner and environment tags) is included in the specified list; this is done using the StringEquals condition operator. If a tag key other than owner or environment is passed, or an incorrect value is supplied for either tag, the condition returns false. The ECS actions within this statement do not require a specific resource type.
  • The ECSDeletion, ECSUpdate, and ECSDescribe statements allow users to update, delete, or list/describe ECS resources. The ECS actions under these statements only support the ResourceTag condition key. These statements return true if the specified tag keys are present on the ECS resource and their values match the principal’s tags. They return false for mismatched tags (in this policy, the only acceptable tags are owner and environment), for an incorrect value of the owner or environment tag on the ECS resource, and for any ECS action that does not support resource tagging.
  • The ECSCreateService, ECSTaskControl, and ECSRegistration statements contain ECS actions that allow users to create a service, start or run tasks, and register container instances in ECS. The ECS actions within these statements support both the RequestTag and ResourceTag condition keys.
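
As an illustration, a ResourceTag-based statement of this kind typically follows the pattern sketched below, in which resource tags are compared with the calling principal’s tags. The statement ID and action list are examples, and the downloadable sample policy remains the authoritative version; it may express the comparison differently, for example against an explicit list of allowed values.

{
  "Sid": "ECSDeletion",
  "Effect": "Allow",
  "Action": [
    "ecs:DeleteCluster",
    "ecs:DeleteService"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/owner": "${aws:PrincipalTag/owner}",
      "aws:ResourceTag/environment": "${aws:PrincipalTag/environment}"
    }
  }
}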

Create IAM roles

Create the following IAM roles and attach the ECSABAC policy you created in the previous procedure. You can create the roles and add tags to them using the AWS console, through the role creation flow, as shown in the following steps.

To create IAM roles

  1. Sign in to the AWS Management Console and navigate to the IAM console.
  2. In the left navigation pane, select Roles, and then select Create Role.
  3. Choose the Another AWS account role type.
  4. For Account ID, enter the AWS account ID mentioned in the prerequisites to which you want to grant access to your resources.
  5. Choose Next: Permissions.
  6. IAM includes a list of the AWS managed and customer managed policies in your account. Select the ECSABAC policy you created previously from the dropdown menu to use for the permissions policy. Alternatively, you can choose Create policy to open a new browser tab and create a new policy, as shown in Figure 1.
    Figure 1. Attach the ECS ABAC policy to the role

  7. Choose Next: Tags.
  8. Add metadata to the role by attaching tags as key-value pairs. Add the following tags to the role: for Key owner, enter Value mock_user; and for Key environment, enter development, as shown in Figure 2.
    Figure 2. Define the tags in the IAM role

  9. Choose Next: Review.
  10. For Role name, enter a name for your role. Role names must be unique within your AWS account.
  11. Review the role and then choose Create role.
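
The same role setup can also be scripted. The following AWS CLI sketch mirrors the console steps above; the role name and account ID are placeholders.

$ aws iam attach-role-policy --role-name ecs-abac-demo \
    --policy-arn arn:aws:iam::<account-id>:policy/ECSABAC
$ aws iam tag-role --role-name ecs-abac-demo \
    --tags Key=owner,Value=mock_user Key=environment,Value=development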

Test the solution

The following sections present some positive and negative test cases that show how tags can provide fine-grained permission to users through ABAC policies.

Prerequisites for the negative and positive testing

Before you can perform the positive and negative tests, you must first do these steps in the AWS Management Console:

  1. Follow the procedures above for creating IAM role and the ABAC policy.
  2. Switch the role from the role assumed in the prerequisites to the role you created in To create IAM Roles above, following the steps in the documentation Switching to a role.

Perform negative testing

For the negative testing, three test cases are presented here that show how the ABAC policies prevent successful creation of the ECS resources if the owner or environment tags are missing, or if an incorrect tag is used for the creation of the ECS resource.

Negative test case 1: Create cluster without the required tags

In this test case, you check if an ECS cluster is successfully created without any tags. Create an Amazon ECS cluster without any tags (in other words, without adding the owner and environment tag).

To create a cluster without the required tags

  1. Sign in to the AWS Management Console and navigate to the Amazon ECS console.
  2. From the navigation bar, select the Region to use.
  3. In the navigation pane, choose Clusters.
  4. On the Clusters page, choose Create Cluster.
  5. For Select cluster compatibility, choose Networking only, then choose Next Step.
  6. On the Configure cluster page, enter a cluster name. For Provisioning Model, choose On-Demand Instance, as shown in Figure 3.
    Figure 3. Create a cluster

  7. In the Networking section, configure the VPC for your cluster.
  8. Don’t add any tags in the Tags section, as shown in Figure 4.
    Figure 4. No tags added to the cluster

  9. Choose Create.

Expected result of negative test case 1

Because the owner and the environment tags are absent, the ABAC policy prevents the creation of the cluster and throws an error, as shown in Figure 5.

Figure 5. Unsuccessful creation of the ECS cluster due to missing tags

Negative test case 2: Create cluster with a missing tag

In this test case, you check whether an ECS cluster is successfully created missing a single tag. You create a cluster similar to the one created in Negative test case 1. However, in this test case, in the Tags section, you enter only the owner tag. The environment tag is missing, as shown in Figure 6.

To create a cluster with a missing tag

  1. Repeat steps 1-7 from the Negative test case 1 procedure.
  2. In the Tags section, add the owner tag and enter its value as mock_user.
    Figure 6. Create a cluster with the environment tag missing

Expected result of negative test case 2

The ABAC policy prevents the creation of the cluster, due to the missing environment tag in the cluster. This results in an error, as shown in Figure 7.

Figure 7. Unsuccessful creation of the ECS cluster due to missing tag

Negative test case 3: Create cluster with incorrect tag values

In this test case, you check whether an ECS cluster is successfully created with incorrect tag-value pairs. Create a cluster similar to the one in Negative test case 1. However, in this test case, in the Tags section, enter incorrect values for the owner and the environment tag keys, as shown in Figure 8.

To create a cluster with incorrect tag values

  1. Repeat steps 1-7 from the Negative test case 1 procedure.
  2. In the Tags section, add the owner tag and enter the value as test_user; add the environment tag and enter the value as production.
    Figure 8. Create a cluster with the incorrect values for the tags

Expected result of negative test case 3

The ABAC policy prevents the creation of the cluster, due to incorrect values for the owner and environment tags in the cluster. This results in an error, as shown in Figure 9.

Figure 9. Unsuccessful creation of the ECS cluster due to incorrect value for the tags

Perform positive testing

For the positive testing, two test cases are provided here that show how the ABAC policies allow successful creation of ECS resources, such as ECS clusters and ECS tasks, if the correct tags with correct values are provided as input for the ECS resources.

Positive test case 1: Create cluster with all the correct tag-value pairs

This test case checks whether an ECS cluster is successfully created with the correct tag-value pairs when you create a cluster with both the owner and environment tag that matches the ABAC policy you created earlier.

To create a cluster with all the correct tag-value pairs

  1. Repeat steps 1-7 from the Negative test case 1 procedure.
  2. In the Tags section, add the owner tag and enter the value as mock_user; add the environment tag and enter the value as development, as shown in Figure 10.
    Figure 10. Add correct tags to the cluster

Expected result of positive test case 1

Because both the owner and the environment tags were input correctly, the ABAC policy allows the successful creation of the cluster without throwing an error, as shown in Figure 11.

Figure 11. Successful creation of the cluster
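
The same behaviour can be reproduced from the AWS CLI while the tagged role is assumed. The sketch below is illustrative: the cluster name is an example and the error output is abbreviated.

# Denied: no tags are passed, so the ABAC policy's RequestTag condition is not met
$ aws ecs create-cluster --cluster-name abac-demo
An error occurred (AccessDeniedException) when calling the CreateCluster operation: ...

# Allowed: the tags match the principal's owner and environment tags
$ aws ecs create-cluster --cluster-name abac-demo \
    --tags key=owner,value=mock_user key=environment,value=development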

Positive test case 2: Create standalone task with all the correct tag-value pairs

Deploying your application as a standalone task can be ideal in certain situations. For example, suppose you’re developing an application, but you aren’t ready to deploy it with the service scheduler. Maybe your application is a one-time or periodic batch job, and it doesn’t make sense to keep it running or to restart it when it finishes.

For this test case, you run a standalone task with the correct owner and environment tags that match the ABAC policy.

To create a standalone task with all the correct tag-value pairs

  1. To run a standalone task, see Run a standalone task in the Amazon ECS Developer Guide. Figure 12 shows the beginning of the Run Task process.
    Figure 12. Run a standalone task

  2. In the Task tagging configuration section, under Tags, add the owner tag and enter the value as mock_user; add the environment tag and enter the value as development, as shown in Figure 13.
    Figure 13. Creation of the task with the correct tag

Expected result of positive test case 2

Because you applied the correct tags in the creation phase, the task is created successfully, as shown in Figure 14.

Figure 14. Successful creation of the task

Cleanup

To avoid incurring future charges, after completing testing, delete any resources you created for this solution that are no longer needed. See the following links for step-by-step instructions for deleting the resources you created in this blog post.

  1. Deregistering an ECS Task Definition
  2. Deleting ECS Clusters
  3. Deleting IAM Policies
  4. Deleting IAM Roles and Instance Profiles

Conclusion

This post demonstrates the basics of how to use ABAC policies to provide fine-grained permissions to users based on attributes such as tags. You learned how to create ABAC policies to restrict permissions to users by associating tags with each ECS resource you create. You can use tags to manage and secure access to ECS resources, including ECS clusters, ECS tasks, ECS task definitions, and ECS services.

For more information about the ECS resources that support tagging, see the Amazon Elastic Container Service Guide.

 
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on AWS Secrets Manager re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.


Kriti Heda

Kriti is a NJ-based Security Transformation Consultant in the SRC team at AWS. She’s a technology enthusiast who enjoys helping customers find innovative solutions to complex security challenges. She spends her days building and deploying security infrastructure and automating security operations for customers. Outside of work, she enjoys adventures, sports, and dancing.

Production ready eBPF, or how we fixed the BSD socket API

Post Syndicated from Lorenz Bauer original https://blog.cloudflare.com/tubular-fixing-the-socket-api-with-ebpf/


As we develop new products, we often push our operating system – Linux – beyond what is commonly possible. A common theme has been relying on eBPF to build technology that would otherwise have required modifying the kernel. For example, we’ve built DDoS mitigation and a load balancer and use it to monitor our fleet of servers.

This software usually consists of a small-ish eBPF program written in C, executed in the context of the kernel, and a larger user space component that loads the eBPF into the kernel and manages its lifecycle. We’ve found that the ratio of eBPF code to userspace code differs by an order of magnitude or more. We want to shed some light on the issues that a developer has to tackle when dealing with eBPF and present our solutions for building rock-solid production ready applications which contain eBPF.

For this purpose we are open sourcing the production tooling we’ve built for the sk_lookup hook we contributed to the Linux kernel, called tubular. It exists because we’ve outgrown the BSD sockets API. To deliver some products we need features that are just not possible using the standard API.

  • Our services are available on millions of IPs.
  • Multiple services using the same port on different addresses have to coexist, e.g. 1.1.1.1 resolver and our authoritative DNS.
  • Our Spectrum product needs to listen on all 2^16 ports.

The source code for tubular is at https://github.com/cloudflare/tubular, and it allows you to do all the things mentioned above. Maybe the most interesting feature is that you can change the addresses of a service on the fly.

How tubular works

tubular sits at a critical point in the Cloudflare stack, since it has to inspect every connection terminated by a server and decide which application should receive it.

Failure to do so will drop or misdirect connections hundreds of times per second. So it has to be incredibly robust during day to day operations. We had the following goals for tubular:

  • Releases must be unattended and happen online
    tubular runs on thousands of machines, so we can’t babysit the process or take servers out of production.
  • Releases must fail safely
    A failure in the process must leave the previous version of tubular running, otherwise we may drop connections.
  • Reduce the impact of (userspace) crashes
    When the inevitable bug comes along we want to minimise the blast radius.

In the past we had built a proof-of-concept control plane for sk_lookup called inet-tool, which proved that we could get away without a persistent service managing the eBPF. Similarly, tubular has tubectl: short-lived invocations make the necessary changes and persisting state is handled by the kernel in the form of eBPF maps. Following this design gave us crash resiliency by default, but left us with the task of mapping the user interface we wanted to the tools available in the eBPF ecosystem.

The tubular user interface

tubular consists of a BPF program that attaches to the sk_lookup hook in the kernel and userspace Go code which manages the BPF program. The tubectl command wraps both in a way that is easy to distribute.

tubectl manages two kinds of objects: bindings and sockets. A binding encodes a rule against which an incoming packet is matched. A socket is a reference to a TCP or UDP socket that can accept new connections or packets.

Bindings and sockets are “glued” together via arbitrary strings called labels. Conceptually, a binding assigns a label to some traffic. The label is then used to find the correct socket.
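
To make the model concrete, here is a rough Go sketch of the two object kinds and the label that glues them together. The type and field names are invented for illustration; they are not tubular’s actual Go API.

// Rough sketch of the user-facing model described above. Type and field names
// are invented for illustration; they are not tubular's actual Go API.
package model

import "net/netip"

// Binding assigns a label to some traffic, matched by protocol, destination
// prefix and port (with 0 standing for "all ports").
type Binding struct {
	Label  string
	Proto  string // "tcp" or "udp"
	Prefix netip.Prefix
	Port   uint16
}

// Socket is a reference to a listening TCP or UDP socket that can accept the
// connections or packets steered to its label.
type Socket struct {
	Label string
	FD    int // duplicated file descriptor of the underlying socket
}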

Adding bindings

To create a binding that steers port 80 (aka HTTP) traffic destined for 127.0.0.1 to the label “foo” we use tubectl bind:

$ sudo tubectl bind "foo" tcp 127.0.0.1 80

Due to the power of sk_lookup we can have much more powerful constructs than the BSD API. For example, we can redirect connections to all IPs in 127.0.0.0/24 to a single socket:

$ sudo tubectl bind "bar" tcp 127.0.0.0/24 80

A side effect of this power is that it’s possible to create bindings that “overlap”:

1: tcp 127.0.0.1/32 80 -> "foo"
2: tcp 127.0.0.0/24 80 -> "bar"

The first binding says that HTTP traffic to localhost should go to “foo”, while the second asserts that HTTP traffic in the localhost subnet should go to “bar”. This creates a contradiction: which binding should we choose? tubular resolves this by defining precedence rules for bindings:

  1. A prefix with a longer mask is more specific, e.g. 127.0.0.1/32 wins over 127.0.0.0/24.
  2. A port is more specific than the port wildcard, e.g. port 80 wins over “all ports” (0).

Applying this to our example, HTTP traffic to all IPs in 127.0.0.0/24 will be directed to “bar”, except for traffic to 127.0.0.1, which goes to “foo”.

Getting ahold of sockets

sk_lookup needs a reference to a TCP or a UDP socket to redirect traffic to it. However, a socket is usually accessible only by the process which created it with the socket syscall. For example, an HTTP server creates a TCP listening socket bound to port 80. How can we gain access to the listening socket?

A fairly well known solution is to make processes cooperate by passing socket file descriptors via SCM_RIGHTS messages to a tubular daemon. That daemon can then take the necessary steps to hook up the socket with sk_lookup. This approach has several drawbacks:

  1. Requires modifying processes to send SCM_RIGHTS
  2. Requires a tubular daemon, which may crash

There is another way of getting at sockets, using systemd, provided socket activation is used. It works by creating an additional service unit with the correct Sockets setting: a oneshot service that is executed when the socket unit starts and registers the socket with tubular. For example:

[Unit]
Requisite=foo.socket

[Service]
Type=oneshot
Sockets=foo.socket
ExecStart=tubectl register "foo"

Since we can rely on systemd to execute tubectl at the correct times we don’t need a daemon of any kind. However, the reality is that a lot of popular software doesn’t use systemd socket activation. Dealing with systemd sockets is complicated and doesn’t invite experimentation. Which brings us to the final trick: pidfd_getfd:

The pidfd_getfd() system call allocates a new file descriptor in the calling process. This new file descriptor is a duplicate of an existing file descriptor, targetfd, in the process referred to by the PID file descriptor pidfd.

We can use it to iterate all file descriptors of a foreign process, and pick the socket we are interested in. To return to our example, we can use the following command to find the TCP socket bound to 127.0.0.1 port 8080 in the httpd process and register it under the “foo” label:

$ sudo tubectl register-pid "foo" $(pidof httpd) tcp 127.0.0.1 8080

It’s easy to wire this up using systemd’s ExecStartPost if the need arises.

[Service]
Type=forking # or notify
ExecStart=/path/to/some/command
ExecStartPost=tubectl register-pid "foo" $MAINPID tcp 127.0.0.1 8080
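
Under the hood, a command like register-pid presumably builds on pidfd_open and pidfd_getfd. A minimal Go sketch using golang.org/x/sys/unix is shown below; the PID and target file descriptor are placeholders, and the caller needs ptrace-level access to the target process.

package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

// duplicateForeignFD grabs file descriptor targetFD from process pid and
// returns a duplicate owned by the calling process, which could then be
// handed to sk_lookup.
func duplicateForeignFD(pid, targetFD int) (int, error) {
	pidfd, err := unix.PidfdOpen(pid, 0)
	if err != nil {
		return -1, fmt.Errorf("pidfd_open(%d): %w", pid, err)
	}
	defer unix.Close(pidfd)

	fd, err := unix.PidfdGetfd(pidfd, targetFD, 0)
	if err != nil {
		return -1, fmt.Errorf("pidfd_getfd(%d, %d): %w", pid, targetFD, err)
	}
	return fd, nil
}

func main() {
	// 1234 and 3 are placeholders for the httpd PID and the descriptor number
	// of its listening socket; a real tool would discover both dynamically.
	fd, err := duplicateForeignFD(1234, 3)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("duplicated socket fd:", fd)
}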

Storing state in eBPF maps

As mentioned previously, tubular relies on the kernel to store state, using BPF key / value data structures also known as maps. Using the BPF_OBJ_PIN syscall we can persist them in /sys/fs/bpf:

/sys/fs/bpf/4026532024_dispatcher
├── bindings
├── destination_metrics
├── destinations
├── sockets
└── ...

The way the state is structured differs from how the command line interface presents it to users. Labels like “foo” are convenient for humans, but they are of variable length. Dealing with variable length data in BPF is cumbersome and slow, so the BPF program never references labels at all. Instead, the user space code allocates numeric IDs, which are then used in the BPF. Each ID represents a (label, domain, protocol) tuple, internally called destination.

For example, adding a binding for “foo” tcp 127.0.0.1 … allocates an ID for (“foo“, AF_INET, TCP). Including domain and protocol in the destination allows simpler data structures in the BPF. Each allocation also tracks how many bindings reference a destination so that we can recycle unused IDs. This data is persisted into the destinations hash table, which is keyed by (Label, Domain, Protocol) and contains (ID, Count). Metrics for each destination are tracked in destination_metrics in the form of per-CPU counters.
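
To make the bookkeeping concrete, here is a rough Go sketch of the destination table and ID allocation described above. All names are invented; tubular persists this table in a pinned BPF hash map rather than Go memory, and ID recycling is omitted for brevity.

// Rough sketch of the destination bookkeeping described above. All names are
// invented for illustration; ID recycling via reference counts is omitted.
package state

type domain uint8
type protocol uint8

const (
	afINET   domain   = 2
	afINET6  domain   = 10
	protoTCP protocol = 6
	protoUDP protocol = 17
)

// destinationKey is the (label, domain, protocol) tuple from the post.
type destinationKey struct {
	label string
	dom   domain
	proto protocol
}

// destinationValue pairs the numeric ID used on the BPF side with the number
// of bindings that currently reference it.
type destinationValue struct {
	id    uint32
	count uint32
}

type destinations struct {
	next    uint32
	byTuple map[destinationKey]*destinationValue
}

func newDestinations() *destinations {
	return &destinations{byTuple: make(map[destinationKey]*destinationValue)}
}

// acquire returns the ID for a tuple, allocating a fresh one on first use and
// bumping the reference count otherwise.
func (d *destinations) acquire(k destinationKey) uint32 {
	if v, ok := d.byTuple[k]; ok {
		v.count++
		return v.id
	}
	v := &destinationValue{id: d.next, count: 1}
	d.next++
	d.byTuple[k] = v
	return v.id
}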

bindings is a longest prefix match (LPM) trie which stores a mapping from (protocol, port, prefix) to (ID, prefix length). The ID is used as a key to the sockets map which contains pointers to kernel socket structures. IDs are allocated in a way that makes them suitable as an array index, which allows using the simpler BPF sockmap (an array) instead of a socket hash table. The prefix length is duplicated in the value to work around shortcomings in the BPF API.

Encoding the precedence of bindings

As discussed, bindings have a precedence associated with them. To repeat the earlier example:

1: tcp 127.0.0.1/32 80 -> "foo"
2: tcp 127.0.0.0/24 80 -> "bar"

The first binding should be matched before the second one. We need to encode this in the BPF somehow. One idea is to generate some code that executes the bindings in order of specificity, a technique we’ve used to great effect in l4drop:

1: if (mask(ip, 32) == 127.0.0.1) return "foo"
2: if (mask(ip, 24) == 127.0.0.0) return "bar"
...

This has the downside that the program gets longer the more bindings are added, which slows down execution. It’s also difficult to introspect and debug such long programs. Instead, we use a specialised BPF longest prefix match (LPM) map to do the hard work. This allows inspecting the contents from user space to figure out which bindings are active, which would be very difficult if we had compiled the bindings into BPF. The LPM map uses a trie behind the scenes, so lookup has complexity proportional to the length of the key instead of the linear complexity of the “naive” solution.

However, using a map requires a trick for encoding the precedence of bindings into a key that we can look up. Here is a simplified version of this encoding, which ignores IPv6 and uses labels instead of IDs. To insert the binding tcp 127.0.0.0/24 80 into a trie we first convert the IP address into a number.

127.0.0.0    = 0x7f 00 00 00

Since we’re only interested in the first 24 bits of the address, we can write the whole prefix as

127.0.0.0/24 = 0x7f 00 00 ??

where “?” means that the value is not specified. We choose the number 0x01 to represent TCP and prepend it and the port number (80 decimal is 0x50 hex) to create the full key:

tcp 127.0.0.0/24 80 = 0x01 50 7f 00 00 ??

Converting tcp 127.0.0.1/32 80 happens in exactly the same way. Once the converted values are inserted into the trie, the LPM trie conceptually contains the following keys and values.

LPM trie:
        0x01 50 7f 00 00 ?? = "bar"
        0x01 50 7f 00 00 01 = "foo"

To find the binding for a TCP packet destined for 127.0.0.1:80, we again encode a key and perform a lookup.

input:  0x01 50 7f 00 00 01   TCP packet to 127.0.0.1:80
---------------------------
LPM trie:
        0x01 50 7f 00 00 ?? = "bar"
           y  y  y  y  y
        0x01 50 7f 00 00 01 = "foo"
           y  y  y  y  y  y
---------------------------
result: "foo"

y = byte matches

The trie returns “foo” since its key shares the longest prefix with the input. Note that we stop comparing keys once we reach unspecified “?” bytes, but conceptually “bar” is still a valid result. The distinction becomes clear when looking up the binding for a TCP packet to 127.0.0.255:80.

input:  0x01 50 7f 00 00 ff   TCP packet to 127.0.0.255:80
---------------------------
LPM trie:
        0x01 50 7f 00 00 ?? = "bar"
           y  y  y  y  y
        0x01 50 7f 00 00 01 = "foo"
           y  y  y  y  y  n
---------------------------
result: "bar"

n = byte doesn't match

In this case “foo” is discarded since the last byte doesn’t match the input. However, “bar” is returned since its last byte is unspecified and therefore considered to be a valid match.
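
The simplified encoding can be expressed in a few lines of Go. Note that this sketch spells the port out as two bytes, whereas the prose above abbreviates it to one; it is an illustration of the scheme rather than tubular’s exact key layout.

// Sketch of the simplified LPM key encoding described above (IPv4 only, with
// a two-byte port where the prose uses one). Not tubular's exact layout.
package lpmkey

import (
	"encoding/binary"
	"net/netip"
)

// encode builds a trie key from protocol, port and address. addrBits is the
// number of significant address bits: 24 for the 127.0.0.0/24 binding, 32 for
// a lookup of a concrete destination address.
func encode(proto byte, port uint16, addr netip.Addr, addrBits int) (key [7]byte, prefixLen int) {
	key[0] = proto // e.g. 0x01 for TCP in the example above
	binary.BigEndian.PutUint16(key[1:3], port)
	a := addr.As4() // panics for IPv6; this sketch is IPv4 only
	copy(key[3:], a[:])
	// Protocol and port are always fully specified (3 bytes = 24 bits); the
	// address contributes however many bits the prefix covers.
	return key, 24 + addrBits
}

// encode(0x01, 80, netip.MustParseAddr("127.0.0.0"), 24) yields the key
// 0x01 0x00 0x50 0x7f 0x00 0x00 0x00 with prefix length 48, while a lookup
// for a packet to 127.0.0.1:80 uses addrBits = 32 and prefix length 56.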

Observability with minimal privileges

Linux has the powerful ss tool (part of iproute2) available to inspect socket state:

$ ss -tl src 127.0.0.1
State      Recv-Q      Send-Q           Local Address:Port           Peer Address:Port
LISTEN     0           128                  127.0.0.1:ipp                 0.0.0.0:*

With tubular in the picture this output is not accurate anymore. tubectl bindings makes up for this shortcoming:

$ sudo tubectl bindings tcp 127.0.0.1
Bindings:
 protocol       prefix port label
      tcp 127.0.0.1/32   80   foo

Running this command requires super-user privileges, despite in theory being safe for any user to run. While this is acceptable for casual inspection by a human operator, it’s a dealbreaker for observability via pull-based monitoring systems like Prometheus. The usual approach is to expose metrics via an HTTP server, which would have to run with elevated privileges and be accessible to the Prometheus server somehow. Instead, BPF gives us the tools to enable read-only access to tubular state with minimal privileges.

The key is to carefully set file ownership and mode for state in /sys/fs/bpf. Creating and opening files in /sys/fs/bpf uses BPF_OBJ_PIN and BPF_OBJ_GET. Calling BPF_OBJ_GET with BPF_F_RDONLY is roughly equivalent to open(O_RDONLY) and allows accessing state in a read-only fashion, provided the file permissions are correct. tubular gives the owner full access but restricts read-only access to the group:

$ sudo ls -l /sys/fs/bpf/4026532024_dispatcher | head -n 3
total 0
-rw-r----- 1 root root 0 Feb  2 13:19 bindings
-rw-r----- 1 root root 0 Feb  2 13:19 destination_metrics

It’s easy to choose which user and group should own state when loading tubular:

$ sudo -u root -g tubular tubectl load
created dispatcher in /sys/fs/bpf/4026532024_dispatcher
loaded dispatcher into /proc/self/ns/net
$ sudo ls -l /sys/fs/bpf/4026532024_dispatcher | head -n 3
total 0
-rw-r----- 1 root tubular 0 Feb  2 13:42 bindings
-rw-r----- 1 root tubular 0 Feb  2 13:42 destination_metrics

There is one more obstacle: systemd mounts /sys/fs/bpf in a way that makes it inaccessible to anyone but root. Adding the executable bit to the directory fixes this.

$ sudo chmod -v o+x /sys/fs/bpf
mode of '/sys/fs/bpf' changed from 0700 (rwx------) to 0701 (rwx-----x)

Finally, we can export metrics without privileges:

$ sudo -u nobody -g tubular tubectl metrics 127.0.0.1 8080
Listening on 127.0.0.1:8080
^C

There is a caveat, unfortunately: truly unprivileged access requires unprivileged BPF to be enabled. Many distros have taken to disabling it via the unprivileged_bpf_disabled sysctl, in which case scraping metrics does require CAP_BPF.
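
In Go, read-only access to a pinned map might look like the sketch below, assuming the cilium/ebpf library; the post doesn’t name the library tubular’s userspace uses, and the key and value types here are placeholders.

package main

import (
	"fmt"
	"log"

	"github.com/cilium/ebpf"
)

func main() {
	// Open the pinned map without write access. With the 0640 permissions and
	// group ownership shown above, any member of the tubular group can do
	// this (subject to the unprivileged BPF caveat mentioned in the post).
	m, err := ebpf.LoadPinnedMap(
		"/sys/fs/bpf/4026532024_dispatcher/destination_metrics",
		&ebpf.LoadPinOptions{ReadOnly: true},
	)
	if err != nil {
		log.Fatal(err)
	}
	defer m.Close()

	// Iterate the per-CPU counters. The concrete key and value types are
	// placeholders; the real layout is defined by the BPF program.
	var (
		id     uint32
		perCPU []uint64
	)
	iter := m.Iterate()
	for iter.Next(&id, &perCPU) {
		var total uint64
		for _, v := range perCPU {
			total += v
		}
		fmt.Printf("destination %d: %d\n", id, total)
	}
	if err := iter.Err(); err != nil {
		log.Fatal(err)
	}
}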

Safe releases

tubular is distributed as a single binary, but really consists of two pieces of code with widely differing lifetimes. The BPF program is loaded into the kernel once and then may be active for weeks or months, until it is explicitly replaced. In fact, a reference to the program (and link, see below) is persisted into /sys/fs/bpf:

/sys/fs/bpf/4026532024_dispatcher
├── link
├── program
└── ...

The user space code is executed for seconds at a time and is replaced whenever the binary on disk changes. This means that user space has to be able to deal with an “old” BPF program in the kernel somehow. The simplest way to achieve this is to compare what is loaded into the kernel with the BPF shipped as part of tubectl. If the two don’t match we return an error:

$ sudo tubectl bind foo tcp 127.0.0.1 80
Error: bind: can't open dispatcher: loaded program #158 has differing tag: "938c70b5a8956ff2" doesn't match "e007bfbbf37171f0"

tag is the truncated hash of the instructions making up a BPF program, which the kernel makes available for every loaded program:

$ sudo bpftool prog list id 158
158: sk_lookup  name dispatcher  tag 938c70b5a8956ff2
...

By comparing the tag, tubular asserts that it is dealing with a supported version of the BPF program. Of course, just returning an error isn’t enough. There needs to be a way to update the kernel program so that it’s once again safe to make changes. This is where the persisted link in /sys/fs/bpf comes into play. bpf_links are used to attach programs to various BPF hooks. “Enabling” a BPF program is a two-step process: first, load the BPF program, then attach it to a hook using a bpf_link. Afterwards, the program runs the next time the hook is executed. By updating the link we can change the program on the fly, in an atomic manner.

$ sudo tubectl upgrade
Upgraded dispatcher to 2022.1.0-dev, program ID #159
$ sudo bpftool prog list id 159
159: sk_lookup  name dispatcher  tag e007bfbbf37171f0
…
$ sudo tubectl bind foo tcp 127.0.0.1 80
bound foo#tcp:[127.0.0.1/32]:80

Behind the scenes the upgrade procedure is slightly more complicated, since we have to update the pinned program reference in addition to the link. We pin the new program into /sys/fs/bpf:

/sys/fs/bpf/4026532024_dispatcher
├── link
├── program
├── program-upgrade
└── ...

Once the link is updated we atomically rename program-upgrade to replace program. In the future we may be able to use RENAME_EXCHANGE to make upgrades even safer.
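
A rough sketch of the two steps in Go, again assuming the cilium/ebpf library; the pin paths and expected tag are taken from the examples above, and the real upgrade logic is certainly more involved.

package upgrade

import (
	"fmt"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

// expectedTag would be derived from the BPF object embedded in tubectl; the
// literal here is just the value from the example output above.
const expectedTag = "e007bfbbf37171f0"

// checkOrUpgrade compares the tag of the pinned program with the one we ship
// and, on mismatch, swaps in a freshly loaded program via the pinned bpf_link.
func checkOrUpgrade(pinDir string, fresh *ebpf.Program) error {
	prog, err := ebpf.LoadPinnedProgram(pinDir+"/program", nil)
	if err != nil {
		return err
	}
	defer prog.Close()

	info, err := prog.Info()
	if err != nil {
		return err
	}
	if info.Tag == expectedTag {
		return nil // already running the version we expect
	}

	l, err := link.LoadPinnedLink(pinDir+"/link", nil)
	if err != nil {
		return err
	}
	defer l.Close()

	// Updating the link attaches the new program atomically; re-pinning the
	// program itself (the program-upgrade rename dance) is left out here.
	if err := l.Update(fresh); err != nil {
		return fmt.Errorf("swap program: %w", err)
	}
	return nil
}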

Preventing state corruption

So far we’ve completely neglected the fact that multiple invocations of tubectl could modify the state in /sys/fs/bpf at the same time. It’s very hard to reason about what would happen in this case, so in general it’s best to prevent this from ever occurring. A common solution to this is advisory file locks. Unfortunately it seems like BPF maps don’t support locking.

$ sudo flock /sys/fs/bpf/4026532024_dispatcher/bindings echo works!
flock: cannot open lock file /sys/fs/bpf/4026532024_dispatcher/bindings: Input/output error

This led to a bit of head scratching on our part. Luckily it is possible to flock the directory instead of individual maps:

$ sudo flock --exclusive /sys/fs/bpf/foo echo works!
works!

Each tubectl invocation takes the same flock() on the state directory, thereby guaranteeing that only a single process is ever making changes.
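
Taking that lock from Go is straightforward with golang.org/x/sys/unix; here is a minimal sketch, using the dispatcher state directory shown earlier.

package main

import (
	"log"

	"golang.org/x/sys/unix"
)

// lockStateDir takes an exclusive advisory lock on the state directory and
// returns the open fd. The lock is released when the fd is closed, e.g. on
// process exit, so short-lived tubectl-style invocations need no unlock step.
func lockStateDir(dir string) (int, error) {
	fd, err := unix.Open(dir, unix.O_RDONLY|unix.O_DIRECTORY, 0)
	if err != nil {
		return -1, err
	}
	if err := unix.Flock(fd, unix.LOCK_EX); err != nil {
		unix.Close(fd)
		return -1, err
	}
	return fd, nil
}

func main() {
	fd, err := lockStateDir("/sys/fs/bpf/4026532024_dispatcher")
	if err != nil {
		log.Fatal(err)
	}
	defer unix.Close(fd)
	// ... mutate bindings, sockets and destinations here ...
}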

Conclusion

tubular is in production at Cloudflare today and has simplified the deployment of Spectrum and our authoritative DNS. It allowed us to leave behind limitations of the BSD socket API. However, its most powerful feature is that the addresses a service is available on can be changed on the fly. In fact, we have built tooling that automates this process across our global network. Need to listen on another million IPs on thousands of machines? No problem, it’s just an HTTP POST away.

Interested in working on tubular and our L4 load balancer unimog? We are hiring in our European offices.

USB Flash Drive Restores Ride Off Into the Sunset

Post Syndicated from original https://www.backblaze.com/blog/usb-flash-drive-restores-ride-off-into-the-sunset/

USB Flash Key Thumb Drive Restore Deprecation
Way back in 2012 we decided it was time to sunset our DVD restore option (Yes, that was a thing.) and replace it with USB flash drive restores to accompany our USB hard drives. Today, the time has come to bid farewell to those little flash keys as well.

Demand for USB flash drives has waned considerably since the halcyon days of 2012, while internet bandwidth has made smaller restores far easier. At the same time, demand for our USB hard drive restores has steadily increased. So while we bid a fond adieu to everyone’s favorite spy movie staple, we remain ready and able to fulfill your physical restore needs with larger-capacity USB hard drives should you need to recover a lot of data all at once.

Yes, as of March 3, we will no longer offer USB flash drive restores, but rest assured that the Backblaze Computer Backup service continues to offer many options to restore your files, including:

  • Download via the web application.
  • Save files to Backblaze B2 Cloud Storage.
  • Order a USB hard drive to keep or redeem through our Restore Return Refund program.
  • Recover your Backblaze B2 Snapshots using our Snapshot Drive recovery option.
  • If you’re administering a Business Group, you can utilize any of the above options, depending on your configuration.

In the meantime, fly little USB flash drives…fly!

The post USB Flash Drive Restores Ride Off Into the Sunset appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Possible Government Surveillance of the Otter.ai Transcription App

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/02/possible-government-surveillance-of-the-otter-ai-transcription-app.html

A reporter interviews a Uyghur human-rights advocate, and uses the Otter.ai transcription app.

The next day, I received an odd note from Otter.ai, the automated transcription app that I had used to record the interview. It read: “Hey Phelim, to help us improve your Otter’s experience, what was the purpose of this particular recording with titled ‘Mustafa Aksu’ created at ‘2021-11-08 11:02:41’?”

Customer service or Chinese surveillance? Turns out it’s hard to tell.

[$] A last look at the 4.4 stable series

Post Syndicated from original https://lwn.net/Articles/884787/

Linus Torvalds released the 4.4 kernel on January 10, 2016 and promptly left the building for the greener fields of 4.5. This kernel was finished from his point of view, but it was just beginning its life in the wider world, and became the first long-term-stable release to be supported for more than two years. Indeed, the 4.4 release became one of the longest-supported and most widely used releases in the history of the kernel project (so far); it was deployed in vast numbers of Android devices, among other places. The final 4.4 stable release took place on February 3, over six years after 4.4 was “finished”; it is time to take a look at what happened to 4.4 in its stable life.

Security updates for Thursday

Post Syndicated from original https://lwn.net/Articles/885177/

Security updates have been issued by Debian (drupal7), Fedora (kernel, lua, vim, and xrdp), openSUSE (firejail, json-c, kafka, webkit2gtk3, and xorg-x11-server), Oracle (bind, firefox, ruby:2.5, ruby:2.6, and thunderbird), Red Hat (ruby:2.5 and ruby:2.6), SUSE (apache2, glibc, json-c, libvirt, webkit2gtk3, xen, and xorg-x11-server), and Ubuntu (linux-raspi, linux-raspi-5.4).

Cloud Security and Compliance: The Ultimate Frenemies of Financial Services

Post Syndicated from Ben Austin original https://blog.rapid7.com/2022/02/17/cloud-security-and-compliance-the-ultimate-frenemies-of-financial-services/

Cloud Security and Compliance: The Ultimate Frenemies of Financial Services

Meeting compliance standards as a financial services (finserv) company can be incredibly time-consuming and expensive. Finserv has some of the highest regulatory bars to clear out of any industry — with the exception, perhaps, of healthcare.

That said, these regulations exist for good reason. Even beyond being requirements to operate, meeting compliance standards helps financial services companies gain customer trust, avoid reputational damage, and protect themselves from unnecessary or unprofitable risk.

But as I’m sure just about everyone reading this will agree, meeting regulatory compliance standards does not necessarily mean your organization is fully secure. It’s often difficult for government agencies and legislators to keep up with the pace of changing technologies, so regulations tend to lag behind the state of tech. This is particularly true with emerging technologies, such as hybrid and multi-cloud environments.

On top of all of that, the cost of not having a well-rounded security portfolio is particularly massive here — the average cost of a data breach in financial services is second highest of all industries (only healthcare is more expensive).

Needless to say, financial services organizations have a complex relationship with compliance, particularly as it relates to business-driven cloud migration and innovation.

Here are four ways finserv companies can embrace the love-hate relationship with cloud security and compliance while effectively navigating the need to maintain pace with today’s rapid rate of change.

1. Implement continuous monitoring

Change seems to be the only constant when it comes to multi-cloud environments. And there’s virtually no limit to where these changes can occur — all clouds and regions are fair game. In addition, new compliance regulations are continuously taking shape as cloud security best practices continue to progress. Lawmakers around the globe are tasked with implementing these new and updated rules to protect data in every industry to effectively address the rapidly changing vulnerabilities.

A key component to remain compliant with these rules and regulations is knowing who is responsible for making changes and maintaining compliance. To do this, you’ll need visibility to distinguish between normal changes to infrastructure, applications, or access made by your development team and the changes that occur at the hands of a threat actor.

But the reality is that many data points can make distinguishing threats difficult for security professionals. After all, financial services organizations are an irresistible target for those with malicious intent because there’s a direct line to the dollar value.

Clearly, information security statutes provide necessary oversight. Since assessing data in real time is essential for success in this industry, continuous monitoring can help you stay compliant at every stage of development. This can also be useful during an audit to show your organization has taken a proactive approach to compliance.

2. Automate security processes

Many regulations require organizations to act fast in the wake of a security breach. The European Union (EU)’s General Data Protection Regulation (GDPR), which is setting the standard for privacy and security laws globally, requires supervisory authorities (and at times individuals) to be notified within 72 hours of becoming aware of a data breach. And these correspondences must provide extensive information about the breach, such as how many individuals were impacted, the consequences of the breach, and perhaps most importantly, what the next steps are for containment and mitigation.

While not every jurisdiction has laws that are as strict as the EU’s GDPR, many countries are using GDPR as a baseline for their own guidance in how they hold organizations responsible and accountable for protecting consumer information. Industry regulations and cloud governance frameworks, such as PCI DSS, SOC 2, ISO 27001, and Gramm-Leach-Bliley Act, are just a few of the many standards with which finserv organizations need to ensure compliance. Organizations that do business globally not only need to be aware of these guidelines but also how they impact the way they do business. For example, if a consumer lives in a country protected by GDPR, their data is protected by GDPR guidelines. This is necessary even if the organization doesn’t operate directly in that country.

The best-case scenario is to catch misconfigurations before they go live and cause a breach, so you can safeguard customer data and avoid going through the lengthy disclosure process and the ensuing loss of customer trust. That’s exactly what automation helps you do.

Relying on manual efforts from your team to ensure that owners of noncompliant resources are notified and remediation takes place can be a time-consuming and involved process. Plus, it’s too easy for potential threats to fall through the cracks. By automating continuous auditing of resources, misconfiguration notification, and remediation, organizations can address noncompliance before issues escalate.

3. Improve organizational culture by sharing context

Large teams that leverage multiple platforms can commonly experience information breakdowns. After all, not everyone can understand security jargon. When misunderstandings occur, it can often lead to an unintentional lapse in compliance among employees. Implementing a simplified reporting structure can help security professionals communicate more effectively with resource owners and other immediate stakeholders. Isolated data points from multiple threat alarms can make it confusing and time-consuming for cloud resource owners to understand what happened and what to do next.

Finding a platform that empowers product and engineering teams to take responsibility for their own security, while also providing thorough context about each violation and the necessary remediation steps, is essential. This can help set a standard by challenging teams to measure security compliance daily, while minimizing a lot of the friction and guesswork that comes from shifting security earlier in the development lifecycle.

Not only does this help with compliance legalities, but showing these ongoing, team-wide security compliance checks can also satisfy squeamish boards. Gartner predicts that security concerns will continue to be a top priority for board members in the wake of sensational security breaches that are occurring with startling regularity. Ransomware, for example, has gone mainstream, and boards have taken notice. Cybersecurity is seen as a potentially major vulnerability, which means the expectations placed on CISOs are mounting.

Though a lot of focus goes into updating frameworks and systems, corporate culture is the third piece of a powerful security strategy. It must not be overlooked.

4. Gain greater visibility

Robust, multi-cloud environments can make visibility challenging. Enterprises need to govern their clouds using Identity and Access Management (IAM) and adopt a least-privileged access security model across cloud and container environments. But that’s not enough. They also need a strategy that enables them to see vulnerabilities across multiple environments and devices. This is especially important as more insiders gain access to the cloud who also have the ability to make changes and add assets — and do all of this at a startling pace.

This visibility is a significant part of remaining compliant because rapid changes can have unintended results that can be missed without an overarching view of the cloud environment. What might look like harmless or anonymized data could still cause privacy and compliance concerns. For example, knowing simply gender, zip code, and birth date is enough information to identify 87% of Americans. To protect consumers, legislation such as the California Consumer Privacy Act (CCPA) stipulates that toxic data combinations, or data that can be viewed as a whole to reveal personal identities, must be avoided.

In order to remain compliant, organizations must have a system in place to spot toxic data combinations that could run afoul of regulators. This is especially important as data-sharing agreements become more commonplace.

Next steps: Embracing the complex relationship

Finserv organizations must embrace the complex relationship with cloud security and compliance. It is, realistically, the only way to survive and thrive in a world where the cloud is the go-to method of innovation.

Taking steps for improvement in each of the four areas outlined above can feel overwhelming, so we suggest getting started by focusing on these three key actions:

  • Innovate quickly. Innovation is crucial in today’s finserv landscape. Organizations in financial services are competing for attention, which requires continuous digital transformation. The cloud allows these innovations to happen fast, but CISOs must ensure a secure environment for advancements to effectively take place. Is your organization striking the right balance between innovation and safety?
  • Automate aggressively. There are too many data points in today’s multi-cloud environments for security teams to track successfully without automation. Ongoing hygiene practices and internal audits are made possible using automation best practices. Do you have the right controls in place to launch an automation strategy that supports — and enhances — your security processes?
  • Transform culture. Never forget that people are at the center of your security and compliance strategy. Improving communication, education and consistency across teams can upgrade your organization’s compliance strategy. And remember: Your compliance strategy will be under increased scrutiny from executives and boards in the coming years. Does your team understand the “why” behind the security best practices you’re asking them to support?

Let’s navigate the future of cloud security for finserv together. Learn more here.

Additional reading:

Linking AI education to meaningful projects

Post Syndicated from Sue Sentance original https://www.raspberrypi.org/blog/ai-education-meaningful-projects-tara-chklovski/

Our seminars in this series on AI and data science education, co-hosted with The Alan Turing Institute, have been covering a range of different topics and perspectives. This month was no exception. We were delighted to be able to host Tara Chklovski, CEO of Technovation, whose presentation was called ‘Teaching youth to use AI to tackle the Sustainable Development Goals’.

Tara Chklovski

The Technovation Challenge

Tara started Technovation, formerly called Iridescent, in 2007 with a family science programme in one school in Los Angeles. The nonprofit has grown hugely, and Technovation now runs computing education activities across the world. We heard from Tara that over 350,000 girls from more than 100 countries take part in their programmes, and that the nonprofit focuses particularly on empowering girls to become tech entrepreneurs. The girls, with support from industry volunteers, parents, and the Technovation curriculum, work in teams to solve real-world problems through an annual event called the Technovation Challenge. Working at scale with young people has given the Technovation team the opportunity to investigate the impact of their programmes as well as more generally learn what works in computing education. 

Tara Chklovski describes the Technovation Challenge in an online seminar.

Tara’s talk was extremely engaging (you’ll find the recording below), with videos of young people who had participated in recent years. Technovation works with volunteers and organisations to reach young people in communities where opportunities may be lacking, focussing on low- and middle-income countries. Tara spoke about the 900 million teenage girls in the world, a  substantial number of whom live in countries where there is considerable inequality. 

To illustrate the impact of the programme, Tara gave a number of examples of projects that students had developed, including:

  • An air quality sensor linked to messaging about climate change
  • A support circle for girls living in domestic violence situations
  • A project helping mothers communicate with their daughters
  • Support for water collection in Kenya

Early on, the Technovation Challenge involved the creation of mobile apps, but in recent years, the projects have focused on using AI technologies to solve problems. A key message that Tara wanted to get across was that the focus on real-world problems and teamwork was as important as, if not more important than, the technical skills the young people were developing.

Technovation has designed an online curriculum to support teams, who may have no prior computing experience, to learn how to design an AI project. Students work through units on topics such as data analysis and building datasets. As well as the technical activities, young people also work through activities on problem-solving approaches, design, and system thinking to help them tackle a real-world problem that is relevant to them. The curriculum supports teams to identify problems in their community and find a path to prototype and share an invention to tackle that problem.

Tara Chklovski describes the Technovation Challenge in an online seminar.

While working through the curriculum, teams develop AI models to address the problem that they have chosen. They then submit them to a global competition for beginners, juniors, and seniors. Many of the girls enjoy the Technovation Challenge so much that they come back year on year to further develop their team skills. 

AI Families: Children and parents using AI to solve problems

Technovation runs another programme, AI Families, that focuses on families working together to learn AI concepts and skills and use them to develop projects together. Families worked together with the help of educators to identify meaningful problems in their communities, and developed AI prototypes to address them.

A list of lessons in the AI Families programme from Technovation.

There were 20,000 participants from under-resourced communities in 17 countries through 2018 and 2019. 70% of them were women (mothers and grandmothers) who wanted their children to participate; in this way the programme encouraged parents to be role models for their daughters, as well as enabling families to understand that AI is a tool and to think about which problems in their community could be solved with AI skills and principles. Tara was keen to emphasise that, given the importance of AI in the world, the more people know about it, the more impact they can make on their local communities.

Tara shared links to the curriculum to demonstrate what families in this programme would learn week by week. The AI modules use tools such as Machine Learning for Kids.

The results of the AI Families project as investigated over 2018 and 2019 are reported in this paper.  The findings of the programme included:

  • Learning needs to focus on more than just content; interviews showed that the learners needed to see the connection to real-world applications
  • Engaging parents and other family members can aid retention and a sense of community, and foster a culture of lifelong learning
  • It takes around 3 to 5 years to iteratively develop fun, engaging, effective curriculum, training, and scalable programme delivery methods. This level of patience and commitment is needed from all community and industry partners and funders.

The research describes how the programme worked pre-pandemic. Tara highlighted that although the pandemic has prevented so much face-to-face team work, it has allowed some young people to access education online that they would not have otherwise had access to.

Many perspectives on AI education

Our goal is to listen to a variety of perspectives through this seminar series, and I felt that Tara really offered something fresh and engaging to our seminar audience, many of them (many of you!) regular attendees who we’ve got to know since we’ve been running the seminars. The seminar combined real-life stories with videos, as well as links to the curriculum used by Technovation to support learners of AI. The ‘question and answer’ session after the seminar focused on ways in which people could engage with the programme. On Twitter, one of the seminar participants declared this seminar “my favourite thus far in the series”.  It was indeed very inspirational.

As we near the end of this series, we can start to reflect on what we’ve been learning from all the various speakers, and I intend to do this more formally in a month or two as we prepare Volume 3 of our seminar proceedings. While Tara’s emphasis is on motivating children to want to learn the latest technologies because they can see what they can achieve with them, some of our other speakers have considered the actual concepts we should be teaching, whether we have to change our approach to teaching computer science if we include AI, and how we should engage young learners in the ethics of AI.

Join us for our next seminar

I’m really looking forward to our final seminar in the series, with Stefania Druga, on Tuesday 1 March at 17:00–18:30 GMT. Stefania, a PhD candidate at the University of Washington Information School, will also focus on families. In her talk ‘Democratising AI education with and for families’, she will consider the ways that children engage with the smart, AI-enabled devices that are becoming part of their everyday lives. It’s a perfect way to finish the series, and we hope you’ll join us.

Thanks to our seminar series, we are developing a list of AI education resources that seminar speakers and attendees share with us, plus the free resources we are developing at the Foundation. Please do take a look.

You can find all blog posts relating to our previous seminars on this page.

The post Linking AI education to meaningful projects appeared first on Raspberry Pi.

„Луковмарш“ as political chewing gum

Post Syndicated from Светла Енчева original https://toest.bg/lukovmarsh-kato-politicheska-duvka/

Since 2005, „Луковмарш“ has been like Schrödinger's cat: it both does and does not take place. It takes place according to the participants and eyewitnesses. It does not take place according to the Sofia Municipality and those media that treat institutional press releases as more reliable than what they see with their own eyes. This time the absurdity reached impressive proportions: the mayor of Sofia banned the neo-Nazi procession after it had already begun. As a result it split into three, so a triple „Луковмарш“ took place.

This schizoid déjà vu is an effect of the joint governance of ГЕРБ and the „Луковмарш“-supporting ВМРО, initially at the level of the Sofia Municipality; from 2017 the party of the "voivodes" was also part of Бойко Борисов's third government. Year after year, ГЕРБ and the organisers of „Луковмарш“ acted out a scene reminiscent of Charlie Chaplin's film "The Kid".

The ruling party used the procession to issue loud declarations against neo-Nazism and antisemitism, pitched so that international partners would hear them. The Sofia Municipality kept banning the march, and the ban kept falling in court: sometimes because of its wording, sometimes because the prosecution made little effort to secure sufficient evidence and the court interpreted the available facts selectively. At the last moment before the march, Йорданка Фандъкова would ban it again. In the end it would go ahead as a procession to the house of Христо Луков or, at worst, as a static rally.

This year, surprisingly, the topic of „Луковмарш“ also reached parliament.

The initiative came from „Продължаваме промяната“, whose MP Татяна Султанова read a declaration from the rostrum on behalf of the parliamentary group. It describes the procession as neo-Nazi, pro-fascist and a threat to social peace, and states:

Bulgaria's democratic present and future rule out such displays of xenophobia and hate speech. Provocations to violence and the propagation of racism, antisemitism, homophobia and sexism have no place in Bulgarian society.

The last sentence deserves special attention. Consciously or not, it breaks a taboo: homophobia is explicitly mentioned in the National Assembly, and by the political force that won the elections at that.

БСП also weighed in on the topic, in the person of Александър Симов,

whose rhetoric recalled the dusty clichés with which the socialist regime used to speak about fascism. In that spirit, Симов invoked the poem „Балада за комуниста“ ("Ballad of the Communist") by Веселин Андреев. Андреев was one of the regime's officially favoured socialist poets, a judge in the so-called People's Court and an MP until the fall of the regime, and in the final years of that period also a member of the Central Committee of БКП. (After the changes of 1989 he fell into serious conflict with the former dictator Тодор Живков and with БКП, later БСП, and took his own life in 1991.)

Симов's socialist pathos conveniently overlooks his own party's ties to neo-Nazi political actors such as „Атака“ and, in some respects, „Възраждане“. But while БСП's rhetoric is predictable,

the statement by Станислав Атанасов on behalf of the ДПС parliamentary group is puzzling:

"ДПС has been condemning „Луковмарш“ for 30 years. For 30 years, Mr Симов, we have been alone. We are also alone at the counter-protest against „Луковмарш“, unfortunately. […] Now you have the opportunity, because МВР has absolutely all the resources to stop „Луковмарш“."

„Луковмарш“ has been held since 2003. So it turns out that for 30 years ДПС has opposed an event that has existed… for 19. Just as factually "accurate" is the claim that Делян Пеевски's party stands all alone at the counter-protest against the march. Here I will allow myself to share personal experience, having taken part in several of the processions against „Луковмарш“. They bring together representatives of groups diverse to the point of incompatibility: liberals, human rights defenders, feminists, LGBTI and Roma activists, anarchists, anarcho-communists, communists, socialists, Zhivkovists, former State Security (ДС) agents and Stalinists (my apologies if I am leaving anyone out). People from ДПС, however, I have not seen at the counter-protest, at least no recognisable faces. And if there were any, they were a minority among the rest.

The ГЕРБ parliamentary group circulated a declaration to the media on the eve of the procession.

The professor of cultural studies Ивайло Дичев ironically commented that in 70 years historians will swear there was never a „Луковмарш“ in Bulgaria. But if you ask ГЕРБ, there was no „Луковмарш“ in previous years, and that exclusively thanks to their efforts. And if it takes place again, it will be because of the bad new government (the original spelling of this and the following quotations has been preserved):

True to our European democratic essence, we at ГЕРБ, with the personal efforts of Prime Minister Бойко Борисов and in partnership with all Bulgarian institutions, among them МВнР, the national coordinator for combating antisemitism, МВР, ДАНС, the prosecution and the Sofia Municipality, managed for two years in a row to prevent the shameful „Луковмарш“ procession from taking place. For the first time in its 17-year history this disgraceful event was thwarted, and for two consecutive years it did not sully the boulevards of the capital. […]

The evident lack of sensitivity in the ruling coalition on the topic of combating hate speech creates the risk of „Луковмарш“ being rehabilitated: a torchlit procession that threatens the international image and reputation of Bulgaria as a country that until recently effectively applied measures against antisemitism, xenophobia, racism and intolerance.

These words leave anyone not suffering from political amnesia with the impression that the former rulers live in a parallel universe. Given that this year, as in previous ones, the neo-Nazi procession took place exactly as much as it did not, where will ГЕРБ put the emphasis this time? If what matters most is that the mayor of Sofia is from their party and the prosecution is on their side, then „Луковмарш“ did not take place. But if they put the emphasis on the new government and the changes in the leadership of МВР and ДАНС, then it must have taken place.

Two of the three parties in „Демократична България“ also issued declarations.

"We believe that in the 21st century it is unacceptable to stage and propagate, even as a memory, the ideology of the Union of Bulgarian National Legions, or to fly neo-fascist symbols and preach hatred against people of a different ethnic or racial origin and against people with different political views", says a declaration by „Зелено движение“.

„Да, България“ proposes a formulation that condemns „Луковмарш“ while at the same time underlining the party's anti-communist values:

Restricting this event does not constitute a violation of rights; it is a defence of higher values and principles such as the dignity and freedom of the individual, the equality of all before the law, and non-discrimination on grounds such as race, ethnicity and religion. […]

Together with all honest democrats, we at „Да, България“ say: National Socialism and communism are not an opinion but a crime.

Within the ruling coalition, only „Има такъв народ“ and ДСБ did not comment on „Луковмарш“; in the opposition, only „Възраждане“ kept silent. Since, for the most part, both the governing parties and the opposition condemn the holding of the neo-Nazi procession,

the real question is what follows from that.

Because if, year after year, those in power produce bare declarations while the Sofia Municipality repeats the same unsuccessful attempts to ban the neo-Nazi procession, the whole exercise is merely a display of learned helplessness. If there is the political will to put an end to the neo-Nazi procession, that is hardly impossible even with the tools of today's imperfect legislation. And if part of the problem lies in the Criminal Code, it is the MPs who can change it.

More important than legislative changes, however, is that undemocratic ideologies and hate crimes be discussed systematically, not just two or three times a year on the occasion of particular events on which our society has no consensus. Those who hold political power can put topics before the public and propose directions for the public conversation. With gestures such as the change of tone towards North Macedonia and the mention of homophobia from the parliamentary rostrum, „Продължаваме промяната“ is making a start in that direction. But these must grow into a systematic effort, so that we do not wake up in another 19 years to yet another round of declarations against „Луковмарш“.

Cover photo: still from a promotional video for „Луковмарш“

Source
