Tag Archives: Go

Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/making-your-go-workloads-up-to-20-faster-with-go-1-18-and-aws-graviton/

This blog post was written by Syl Taylor, Professional Services Consultant.

In March 2022, the highly anticipated Go 1.18 was released. Go 1.18 brings to the language some long-awaited features and additions, such as generics. It also brings significant performance improvements for Arm’s 64-bit architecture used in AWS Graviton server processors. In this post, we show how migrating Go workloads from Go 1.17.8 to Go 1.18 can help you run your applications up to 20% faster and more cost-effectively. To achieve this goal, we selected a series of realistic and relatable workloads to showcase how they perform when compiled with Go 1.18.


Go is an open-source programming language which can be used to create a wide range of applications. It’s developer-friendly and suitable for designing production-grade workloads in areas such as web development, distributed systems, and cloud-native software.

AWS Graviton2 processors are custom-built by AWS using 64-bit Arm Neoverse cores to deliver the best price-performance for your cloud workloads running in Amazon Elastic Compute Cloud (Amazon EC2). They provide up to 40% better price/performance over comparable x86-based instances for a wide variety of workloads and they can run numerous applications, including those written in Go.

Web service throughput

For web applications, the number of HTTP requests that a server can process in a window of time is an important measurement to determine scalability needs and reduce costs.

To demonstrate the performance improvements for a Go-based web service, we selected the popular Caddy web server. To perform the load testing, we selected the hey application, which was also written in Go. We deployed these packages in a client/server scenario on m6g Graviton instances.

Relative performance comparison for requesting a static webpage

The Caddy web server compiled with Go 1.18 brings a 7-8% throughput improvement as compared with the variant compiled with Go 1.17.8.

We conducted a second test where the client downloads a dynamic page on which the request handler performs some additional processing to write the HTTP response content. The performance gains were also noticeable at 10-11%.

Relative performance comparison for requesting a dynamic webpage

Regular expression searches

Searching through large amounts of text is where regular expression patterns excel. They can be used for many use cases, such as:

  • Checking if a string has a valid format (e.g., email address, domain name, IP address),
  • Finding all of the occurrences of a string (e.g., date) in a text document,
  • Identifying a string and replacing it with another.

However, despite their efficiency in search engines, text editors, or log parsers, regular expression evaluation is an expensive operation to run. We recommend identifying optimizations to reduce search time and compute costs.

The following example uses the Go regexp package to compile a pattern and search for the presence of a standard date format in a large generated string. We observed a 13.5% increase in completed executions with a 12% reduction in execution time.

Relative performance comparison for using regular expressions to check that a pattern exists

In a second example, we used the Go regexp package to find all of the occurrences of a pattern for character sequences in a string, and then replace them with a single character. We observed a 12% increase in evaluation rate with an 11% reduction in execution time.

Relative performance comparison for using regular expressions to find and replace all of the occurrences of a pattern

As with most workloads, the improvements will vary depending on the input data, the hardware selected, and the software stack installed. Furthermore, with this use case, the regular expression usage will have an impact on the overall performance. Given the importance of regex patterns in modern applications, as well as the scale at which they’re used, we recommend upgrading to Go 1.18 for any software that relies heavily on regular expression operations.

Database storage engines

Many database storage engines use a key-value store design to benefit from simplicity of use, faster speed, and improved horizontal scalability. Two implementations commonly used are B-trees and LSM (log-structured merge) trees. In the age of cloud technology, building distributed applications that leverage a suitable database service is important to make sure that you maximize your business outcomes.

B-trees are seen in many database management systems (DBMS), and they’re used to efficiently perform queries using indexes. When we tested a sample program for inserting and deleting in a large B-tree structure, we observed a 10.5% throughput increase with a 10% reduction in execution time.

Relative performance comparison for inserting and deleting in a B-Tree structure

On the other hand, LSM trees can achieve high rates of write throughput, thus making them useful for big data or time series events, such as metrics and real-time analytics. They’re used in modern applications due to their ability to handle large write workloads in a time of rapid data growth. The following are examples of databases that use LSM trees:

  • InfluxDB is a powerful database used for high-speed read and writes on time series data. It’s written in Go and its storage engine uses a variation of LSM called the Time-Structured Merge Tree (TSM).
  • CockroachDB is a popular distributed SQL database written in Go with its own LSM tree implementation.
  • Badger is written in Go and is the engine behind Dgraph, a graph database. Its design leverages LSM trees.

When we tested an LSM tree sample program, we observed a 13.5% throughput increase with a 9.5% reduction in execution time.

We also tested InfluxDB using comparison benchmarks to analyze writes and reads to the database server. On the load stress test, we saw a 10% increase of insertion throughput and a 14.5% faster rate when querying at a large scale.

Relative performance comparison for inserting to and querying from an InfluxDB database

In summary, for databases with an engine written in Go, you’ll likely observe better performance when upgrading to a version that has been compiled with Go 1.18.

Machine learning training

A popular unsupervised machine learning (ML) algorithm is K-Means clustering. It aims to group similar data points into k clusters. We used a dataset of 2D coordinates to train K-Means and obtain the cluster distribution in a deterministic manner. The example program uses an OOP design. We noticed an 18% improvement in execution throughput and a 15% reduction in execution time.

Relative performance comparison for training a K-means model

A widely-used and supervised ML algorithm for both classification and regression is Random Forest. It’s composed of numerous individual decision trees, and it uses a voting mechanism to determine which prediction to use. It’s a powerful method for optimizing ML models.

We ran a deterministic example to train a dense Random Forest. The program uses an OOP design and we noted a 20% improvement in execution throughput and a 15% reduction in execution time.

Relative performance comparison for training a Random Forest model


An efficient, general-purpose method for sorting data is the merge sort algorithm. It works by repeatedly breaking down the data into parts until it can compare single units to each other. Then, it decides their order in the intermediary steps that will merge repeatedly until the final sorted result. To implement this divide-and-conquer approach, merge sort must use recursion. We ran the program using a large dataset of numbers and observed a 7% improvement in execution throughput and a 4.5% reduction in execution time.

Relative performance comparison for running a merge sort algorithm

Depth-first search (DFS) is a fundamental recursive algorithm for traversing tree or graph data structures. Many complex applications rely on DFS variants to solve or optimize hard problems in various areas, such as path finding, scheduling, or circuit design. We implemented a standard DFS traversal in a fully-connected graph. Then we observed a 14.5% improvement in execution throughput and a 13% reduction in execution time.

Relative performance comparison for running a DFS algorithm


In this post, we’ve shown that a variety of applications, not just those primarily compute-bound, can benefit from the 64-bit Arm CPU performance improvements released in Go 1.18. Programs with an object-oriented design, recursion, or that have many function calls in their implementation will likely benefit more from the new register ABI calling convention.

By using AWS Graviton EC2 instances, you can benefit from up to a 40% price/performance improvement over other instance types. Furthermore, you can save even more with Graviton through the additional performance improvements by simply recompiling your Go applications with Go 1.18.

To learn more about Graviton, see the Getting started with AWS Graviton guide.

Production ready eBPF, or how we fixed the BSD socket API

Post Syndicated from Lorenz Bauer original https://blog.cloudflare.com/tubular-fixing-the-socket-api-with-ebpf/

Production ready eBPF, or how we fixed the BSD socket API

Production ready eBPF, or how we fixed the BSD socket API

As we develop new products, we often push our operating system – Linux – beyond what is commonly possible. A common theme has been relying on eBPF to build technology that would otherwise have required modifying the kernel. For example, we’ve built DDoS mitigation and a load balancer and use it to monitor our fleet of servers.

This software usually consists of a small-ish eBPF program written in C, executed in the context of the kernel, and a larger user space component that loads the eBPF into the kernel and manages its lifecycle. We’ve found that the ratio of eBPF code to userspace code differs by an order of magnitude or more. We want to shed some light on the issues that a developer has to tackle when dealing with eBPF and present our solutions for building rock-solid production ready applications which contain eBPF.

For this purpose we are open sourcing the production tooling we’ve built for the sk_lookup hook we contributed to the Linux kernel, called tubular. It exists because we’ve outgrown the BSD sockets API. To deliver some products we need features that are just not possible using the standard API.

  • Our services are available on millions of IPs.
  • Multiple services using the same port on different addresses have to coexist, e.g. resolver and our authoritative DNS.
  • Our Spectrum product needs to listen on all 2^16 ports.

The source code for tubular is at https://github.com/cloudflare/tubular, and it allows you to do all the things mentioned above. Maybe the most interesting feature is that you can change the addresses of a service on the fly:

How tubular works

tubular sits at a critical point in the Cloudflare stack, since it has to inspect every connection terminated by a server and decide which application should receive it.

Production ready eBPF, or how we fixed the BSD socket API

Failure to do so will drop or misdirect connections hundreds of times per second. So it has to be incredibly robust during day to day operations. We had the following goals for tubular:

  • Releases must be unattended and happen online
    tubular runs on thousands of machines, so we can’t babysit the process or take servers out of production.
  • Releases must fail safely
    A failure in the process must leave the previous version of tubular running, otherwise we may drop connections.
  • Reduce the impact of (userspace) crashes
    When the inevitable bug comes along we want to minimise the blast radius.

In the past we had built a proof-of-concept control plane for sk_lookup called inet-tool, which proved that we could get away without a persistent service managing the eBPF. Similarly, tubular has tubectl: short-lived invocations make the necessary changes and persisting state is handled by the kernel in the form of eBPF maps. Following this design gave us crash resiliency by default, but left us with the task of mapping the user interface we wanted to the tools available in the eBPF ecosystem.

The tubular user interface

tubular consists of a BPF program that attaches to the sk_lookup hook in the kernel and userspace Go code which manages the BPF program. The tubectl command wraps both in a way that is easy to distribute.

tubectl manages two kinds of objects: bindings and sockets. A binding encodes a rule against which an incoming packet is matched. A socket is a reference to a TCP or UDP socket that can accept new connections or packets.

Bindings and sockets are “glued” together via arbitrary strings called labels. Conceptually, a binding assigns a label to some traffic. The label is then used to find the correct socket.

Production ready eBPF, or how we fixed the BSD socket API

Adding bindings

To create a binding that steers port 80 (aka HTTP) traffic destined for to the label “foo” we use tubectl bind:

$ sudo tubectl bind "foo" tcp 80

Due to the power of sk_lookup we can have much more powerful constructs than the BSD API. For example, we can redirect connections to all IPs in to a single socket:

$ sudo tubectl bind "bar" tcp 80

A side effect of this power is that it’s possible to create bindings that “overlap”:

1: tcp 80 -> "foo"
2: tcp 80 -> "bar"

The first binding says that HTTP traffic to localhost should go to “foo”, while the second asserts that HTTP traffic in the localhost subnet should go to “bar”. This creates a contradiction, which binding should we choose? tubular resolves this by defining precedence rules for bindings:

  1. A prefix with a longer mask is more specific, e.g. wins over
  2. A port is more specific than the port wildcard, e.g. port 80 wins over “all ports” (0).

Applying this to our example, HTTP traffic to all IPs in will be directed to foo, except for which goes to bar.

Getting ahold of sockets

sk_lookup needs a reference to a TCP or a UDP socket to redirect traffic to it. However, a socket is usually accessible only by the process which created it with the socket syscall. For example, an HTTP server creates a TCP listening socket bound to port 80. How can we gain access to the listening socket?

A fairly well known solution is to make processes cooperate by passing socket file descriptors via SCM_RIGHTS messages to a tubular daemon. That daemon can then take the necessary steps to hook up the socket with sk_lookup. This approach has several drawbacks:

  1. Requires modifying processes to send SCM_RIGHTS
  2. Requires a tubular daemon, which may crash

There is another way of getting at sockets by using systemd, provided socket activation is used. It works by creating an additional service unit with the correct Sockets setting. In other words: we can leverage systemd oneshot action executed on creation of a systemd socket service, registering the socket into tubular. For example:


ExecStart=tubectl register "foo"

Since we can rely on systemd to execute tubectl at the correct times we don’t need a daemon of any kind. However, the reality is that a lot of popular software doesn’t use systemd socket activation. Dealing with systemd sockets is complicated and doesn’t invite experimentation. Which brings us to the final trick: pidfd_getfd:

The pidfd_getfd() system call allocates a new file descriptor in the calling process. This new file descriptor is a duplicate of an existing file descriptor, targetfd, in the process referred to by the PID file descriptor pidfd.

We can use it to iterate all file descriptors of a foreign process, and pick the socket we are interested in. To return to our example, we can use the following command to find the TCP socket bound to port 8080 in the httpd process and register it under the “foo” label:

$ sudo tubectl register-pid "foo" $(pidof httpd) tcp 8080

It’s easy to wire this up using systemd’s ExecStartPost if the need arises.

Type=forking # or notify
ExecStartPost=tubectl register-pid $MAINPID foo tcp 8080

Storing state in eBPF maps

As mentioned previously, tubular relies on the kernel to store state, using BPF key / value data structures also known as maps. Using the BPF_OBJ_PIN syscall we can persist them in /sys/fs/bpf:

├── bindings
├── destination_metrics
├── destinations
├── sockets
└── ...

The way the state is structured differs from how the command line interface presents it to users. Labels like “foo” are convenient for humans, but they are of variable length. Dealing with variable length data in BPF is cumbersome and slow, so the BPF program never references labels at all. Instead, the user space code allocates numeric IDs, which are then used in the BPF. Each ID represents a (label, domain, protocol) tuple, internally called destination.

For example, adding a binding for “foo” tcp … allocates an ID for (“foo“, AF_INET, TCP). Including domain and protocol in the destination allows simpler data structures in the BPF. Each allocation also tracks how many bindings reference a destination so that we can recycle unused IDs. This data is persisted into the destinations hash table, which is keyed by (Label, Domain, Protocol) and contains (ID, Count). Metrics for each destination are tracked in destination_metrics in the form of per-CPU counters.

Production ready eBPF, or how we fixed the BSD socket API

bindings is a longest prefix match (LPM) trie which stores a mapping from (protocol, port, prefix) to (ID, prefix length). The ID is used as a key to the sockets map which contains pointers to kernel socket structures. IDs are allocated in a way that makes them suitable as an array index, which allows using the simpler BPF sockmap (an array) instead of a socket hash table. The prefix length is duplicated in the value to work around shortcomings in the BPF API.

Production ready eBPF, or how we fixed the BSD socket API

Encoding the precedence of bindings

As discussed, bindings have a precedence associated with them. To repeat the earlier example:

1: tcp 80 -> "foo"
2: tcp 80 -> "bar"

The first binding should be matched before the second one. We need to encode this in the BPF somehow. One idea is to generate some code that executes the bindings in order of specificity, a technique we’ve used to great effect in l4drop:

1: if (mask(ip, 32) == return "foo"
2: if (mask(ip, 24) == return "bar"

This has the downside that the program gets longer the more bindings are added, which slows down execution. It’s also difficult to introspect and debug such long programs. Instead, we use a specialised BPF longest prefix match (LPM) map to do the hard work. This allows inspecting the contents from user space to figure out which bindings are active, which is very difficult if we had compiled bindings into BPF. The LPM map uses a trie behind the scenes, so lookup has complexity proportional to the length of the key instead of linear complexity for the “naive” solution.

However, using a map requires a trick for encoding the precedence of bindings into a key that we can look up. Here is a simplified version of this encoding, which ignores IPv6 and uses labels instead of IDs. To insert the binding tcp 80 into a trie we first convert the IP address into a number.    = 0x7f 00 00 00

Since we’re only interested in the first 24 bits of the address we, can write the whole prefix as = 0x7f 00 00 ??

where “?” means that the value is not specified. We choose the number 0x01 to represent TCP and prepend it and the port number (80 decimal is 0x50 hex) to create the full key:

tcp 80 = 0x01 50 7f 00 00 ??

Converting tcp 80 happens in exactly the same way. Once the converted values are inserted into the trie, the LPM trie conceptually contains the following keys and values.

LPM trie:
        0x01 50 7f 00 00 ?? = "bar"
        0x01 50 7f 00 00 01 = "foo"

To find the binding for a TCP packet destined for, we again encode a key and perform a lookup.

input:  0x01 50 7f 00 00 01   TCP packet to
LPM trie:
        0x01 50 7f 00 00 ?? = "bar"
           y  y  y  y  y
        0x01 50 7f 00 00 01 = "foo"
           y  y  y  y  y  y
result: "foo"

y = byte matches

The trie returns “foo” since its key shares the longest prefix with the input. Note that we stop comparing keys once we reach unspecified “?” bytes, but conceptually “bar” is still a valid result. The distinction becomes clear when looking up the binding for a TCP packet to

input:  0x01 50 7f 00 00 ff   TCP packet to
LPM trie:
        0x01 50 7f 00 00 ?? = "bar"
           y  y  y  y  y
        0x01 50 7f 00 00 01 = "foo"
           y  y  y  y  y  n
result: "bar"

n = byte doesn't match

In this case “foo” is discarded since the last byte doesn’t match the input. However, “bar” is returned since its last byte is unspecified and therefore considered to be a valid match.

Observability with minimal privileges

Linux has the powerful ss tool (part of iproute2) available to inspect socket state:

$ ss -tl src
State      Recv-Q      Send-Q           Local Address:Port           Peer Address:Port
LISTEN     0           128               *

With tubular in the picture this output is not accurate anymore. tubectl bindings makes up for this shortcoming:

$ sudo tubectl bindings tcp
 protocol       prefix port label
      tcp   80   foo

Running this command requires super-user privileges, despite in theory being safe for any user to run. While this is acceptable for casual inspection by a human operator, it’s a dealbreaker for observability via pull-based monitoring systems like Prometheus. The usual approach is to expose metrics via an HTTP server, which would have to run with elevated privileges and be accessible to the Prometheus server somehow. Instead, BPF gives us the tools to enable read-only access to tubular state with minimal privileges.

The key is to carefully set file ownership and mode for state in /sys/fs/bpf. Creating and opening files in /sys/fs/bpf uses BPF_OBJ_PIN and BPF_OBJ_GET. Calling BPF_OBJ_GET with BPF_F_RDONLY is roughly equivalent to open(O_RDONLY) and allows accessing state in a read-only fashion, provided the file permissions are correct. tubular gives the owner full access but restricts read-only access to the group:

$ sudo ls -l /sys/fs/bpf/4026532024_dispatcher | head -n 3
total 0
-rw-r----- 1 root root 0 Feb  2 13:19 bindings
-rw-r----- 1 root root 0 Feb  2 13:19 destination_metrics

It’s easy to choose which user and group should own state when loading tubular:

$ sudo -u root -g tubular tubectl load
created dispatcher in /sys/fs/bpf/4026532024_dispatcher
loaded dispatcher into /proc/self/ns/net
$ sudo ls -l /sys/fs/bpf/4026532024_dispatcher | head -n 3
total 0
-rw-r----- 1 root tubular 0 Feb  2 13:42 bindings
-rw-r----- 1 root tubular 0 Feb  2 13:42 destination_metrics

There is one more obstacle, systemd mounts /sys/fs/bpf in a way that makes it inaccessible to anyone but root. Adding the executable bit to the directory fixes this.

$ sudo chmod -v o+x /sys/fs/bpf
mode of '/sys/fs/bpf' changed from 0700 (rwx------) to 0701 (rwx-----x)

Finally, we can export metrics without privileges:

$ sudo -u nobody -g tubular tubectl metrics 8080
Listening on

There is a caveat, unfortunately: truly unprivileged access requires unprivileged BPF to be enabled. Many distros have taken to disabling it via the unprivileged_bpf_disabled sysctl, in which case scraping metrics does require CAP_BPF.

Safe releases

tubular is distributed as a single binary, but really consists of two pieces of code with widely differing lifetimes. The BPF program is loaded into the kernel once and then may be active for weeks or months, until it is explicitly replaced. In fact, a reference to the program (and link, see below) is persisted into /sys/fs/bpf:

├── link
├── program
└── ...

The user space code is executed for seconds at a time and is replaced whenever the binary on disk changes. This means that user space has to be able to deal with an “old” BPF program in the kernel somehow. The simplest way to achieve this is to compare what is loaded into the kernel with the BPF shipped as part of tubectl. If the two don’t match we return an error:

$ sudo tubectl bind foo tcp 80
Error: bind: can't open dispatcher: loaded program #158 has differing tag: "938c70b5a8956ff2" doesn't match "e007bfbbf37171f0"

tag is the truncated hash of the instructions making up a BPF program, which the kernel makes available for every loaded program:

$ sudo bpftool prog list id 158
158: sk_lookup  name dispatcher  tag 938c70b5a8956ff2

By comparing the tag tubular asserts that it is dealing with a supported version of the BPF program. Of course, just returning an error isn’t enough. There needs to be a way to update the kernel program so that it’s once again safe to make changes. This is where the persisted link in /sys/fs/bpf comes into play. bpf_links are used to attach programs to various BPF hooks. “Enabling” a BPF program is a two-step process: first, load the BPF program, next attach it to a hook using a bpf_link. Afterwards the program will execute the next time the hook is executed. By updating the link we can change the program on the fly, in an atomic manner.

$ sudo tubectl upgrade
Upgraded dispatcher to 2022.1.0-dev, program ID #159
$ sudo bpftool prog list id 159
159: sk_lookup  name dispatcher  tag e007bfbbf37171f0
$ sudo tubectl bind foo tcp 80
bound foo#tcp:[]:80

Behind the scenes the upgrade procedure is slightly more complicated, since we have to update the pinned program reference in addition to the link. We pin the new program into /sys/fs/bpf:

├── link
├── program
├── program-upgrade
└── ...

Once the link is updated we atomically rename program-upgrade to replace program. In the future we may be able to use RENAME_EXCHANGE to make upgrades even safer.

Preventing state corruption

So far we’ve completely neglected the fact that multiple invocations of tubectl could modify the state in /sys/fs/bpf at the same time. It’s very hard to reason about what would happen in this case, so in general it’s best to prevent this from ever occurring. A common solution to this is advisory file locks. Unfortunately it seems like BPF maps don’t support locking.

$ sudo flock /sys/fs/bpf/4026532024_dispatcher/bindings echo works!
flock: cannot open lock file /sys/fs/bpf/4026532024_dispatcher/bindings: Input/output error

This led to a bit of head scratching on our part. Luckily it is possible to flock the directory instead of individual maps:

$ sudo flock --exclusive /sys/fs/bpf/foo echo works!

Each tubectl invocation likewise invokes flock(), thereby guaranteeing that only ever a single process is making changes.


tubular is in production at Cloudflare today and has simplified the deployment of Spectrum and our authoritative DNS. It allowed us to leave behind limitations of the BSD socket API. However, its most powerful feature is that the addresses a service is available on can be changed on the fly. In fact, we have built tooling that automates this process across our global network. Need to listen on another million IPs on thousands of machines? No problem, it’s just an HTTP POST away.

Interested in working on tubular and our L4 load balancer unimog? We are hiring in our European offices.

Pairings in CIRCL

Post Syndicated from Armando Faz-Hernández original https://blog.cloudflare.com/circl-pairings-update/

Pairings in CIRCL

Pairings in CIRCL

In 2019, we announced the release of CIRCL, an open-source cryptographic library written in Go that provides optimized implementations of several primitives for key exchange and digital signatures. We are pleased to announce a major update of our library: we have included more packages for elliptic curve-based cryptography (ECC), pairing-based cryptography, and quantum-resistant algorithms.

All of these packages are the foundation of work we’re doing on bringing the benefits of cutting edge research to Cloudflare. In the past we’ve experimented with post-quantum algorithms, used pairings to keep keys safe around the world, and implemented advanced elliptic curves. Now we’re continuing that work, and sharing the foundation with everyone.

In this blog post we’re going to focus on pairing-based cryptography and give you a brief overview of some properties that make this topic so pleasant. If you are not so familiar with elliptic curves, we recommend this primer on ECC.

Otherwise, let’s get ready, pairings have arrived!

What are pairings?

Elliptic curve cryptography enables an efficient instantiation of several cryptographic applications: public-key encryption, signatures, zero-knowledge proofs, and many other more exotic applications like oblivious transfer and OPRFs. With all of those applications you might wonder what is the additional value that pairings offer? To see that, we need first to understand the basic properties of an elliptic curve system, and from that we can highlight the big gap that pairings have.

Conventional elliptic curve systems work with a single group $\mathbb{G}$: the points of an elliptic curve $E$. In this group, usually denoted additively, we can add the points $P$ and $Q$ and get another point on the curve $R=P+Q$; also, we can multiply a point $P$ by an integer scalar $k$ and by repeatedly doing

$$ kP = \underbrace{P+P+\dots+P}_{k \text{ terms}} $$
This operation is known as scalar multiplication, which resembles exponentiation, and there are efficient algorithms for this operation. But given the point $Q=kP$, and $P$, it is very hard for an adversary that doesn’t know $k$ to find it. This is the Elliptic Curve Discrete Logarithm problem (ECDLP).

Now we show a property of scalar multiplication that can help us to understand the properties of pairings.

Scalar Multiplication is a Linear Map

Note the following equivalences:

$ (a+b)P = aP + bP $

$ b (aP) = a (bP) $.

These are very useful properties for many protocols: for example, the last identity allows Alice and Bob to arrive at the same value when following the Diffie-Hellman key-agreement protocol.

But while point addition and scalar multiplication are nice, it’s also useful to be able to multiply points: if we had a point $P$ and $aP$ and $bP$, getting $abP$ out would be very cool and let us do all sorts of things. Unfortunately Diffie-Hellman would immediately be insecure, so we can’t get what we want.

Guess what? Pairings provide an efficient, useful sort of intermediary point multiplication.

It’s intermediate multiplication because although the operation takes two points as operands, the result of a pairing is not a point, but an element of a different group; thus, in a pairing there are more groups involved and all of them must contain the same number of elements.

Pairing is denoted as  $$ e \colon\; \mathbb{G}_1 \times \mathbb{G}_2 \rightarrow \mathbb{G}_T $$
Groups $\mathbb{G}_1$ and $\mathbb{G}_2$ contain points of an elliptic curve $E$. More specifically, they are the $r$-torsion points, for a fixed prime $r$. Some pairing instances fix $\mathbb{G}_1=\mathbb{G}_2$, but it is common to use disjoint sets for efficiency reasons. The third group $\mathbb{G}_T$ has notable differences. First, it is written multiplicatively, unlike the other two groups. $\mathbb{G}_T$ is not the set of points on an elliptic curve. It’s instead a subgroup of the multiplicative group over some larger finite field. It contains the elements that satisfy $x^r=1$, better known as the $r$-roots of unity.

Pairings in CIRCL
Source: “Pairings are not dead, just resting” by Diego Aranha ECC-2017 (inspired by Avanzi’s talk at SPEED-2009).

While every elliptic curve has a pairing, very few have ones that are efficiently computable. Those that do, we call them pairing-friendly curves.

The Pairing Operation is a Bilinear Map

What makes pairings special is that \(e\) is a bilinear map. Yes, the linear property of the scalar multiplication is present twice, one per group. Let’s see the first linear map.

For points $P, Q, R$ and scalars $a$ and $b$ we have:

$ e(P+Q, R) = e(P, R) * e(Q, R) $

$ e(aP, Q) = e(P, Q)^a $

So, a scalar $a$ acting in the first operand as $aP$, finds its way out and escapes from the input of the pairing and appears in the output of the pairing as an exponent in $\mathbb{G}_T$. The same linear map is observed for the second group:

$ e(P, Q+R) = e(P, Q) * e(P, R) $

$ e(P, bQ) = e(P, Q)^b $

Hence, the pairing is bilinear. We will see below how this property becomes useful.

Can bilinear pairings help solving ECDLP?

The MOV (by Menezes, Okamoto, and Vanstone) attack reduces the discrete logarithm problem on elliptic curves to finite fields. An attacker with knowledge of $kP$ and public points $P$ and $Q$ can recover $k$ by computing:

$ g_k = e(kP, Q) = e(P, Q)^k $,

$ g = e(P, Q) $

$ k = \log_g(g_k) $

Note that the discrete logarithm to be solved was moved from $\mathbb{G}_1$ to $\mathbb{G}_T$. So an attacker must ensure that the discrete logarithm is easier to solve in $\mathbb{G}_T$, and surprisingly, for some curves this is the case.

Fortunately, pairings do not present a threat for standard curves (such as the NIST curves or Curve25519) because these curves are constructed in such a way that $\mathbb{G}_T$ gets very large, which makes the pairing operation not efficient anymore.

This attacking strategy was one of the first applications of pairings in cryptanalysis as a tool to solve the discrete logarithm. Later, more people noticed that the properties of pairings are so useful, and can be used constructively to do cryptography. One of the fascinating truisms of cryptography is that one person’s sledgehammer is another person’s brick: while pairings yield a generic attack strategy for the ECDLP problem, it can also be used as a building block in a ton of useful applications.

Applications of Pairings

In the 2000s decade, a large wave of research works were developed aimed at applying pairings to many practical problems. An iconic pairing-based system was created by Antoine Joux, who constructed a one-round Diffie-Hellman key exchange for three parties.

Let’s see first how a three-party Diffie-Hellman is done without pairings. Alice, Bob and Charlie want to agree on a shared key, so they compute, respectively, $aP$, $bP$ and $cP$ for a public point P. Then, they send to each other the points they computed. So Alice receives $cP$ from Charlie and sends $aP$ to Bob, who can then send $baP$ to Charlie and get $acP$ from Alice and so on. After all this is done, they can all compute $k=abcP$. Can this be performed in a single round trip?

Pairings in CIRCL
Two round Diffie-Hellman without pairings.
Pairings in CIRCL
One round Diffie-Hellman with pairings.

The 3-party Diffie-Hellman protocol needs two communication rounds (on the top), but with the use of pairings a one-round trip protocol is possible.

Affirmative! Antoine Joux showed how to agree on a shared secret in a single round of communication. Alice announces $aP$, gets $bP$ and $cP$ from Bob and Charlie respectively, and then computes $k= (bP, cP)^a$. Likewise Bob computes $e(aP,cP)^b$ and Charlie does $e(aP,bP)^c$. It’s not difficult to convince yourself that all these values are equivalent, just by looking at the bilinear property.

$e(bP,cP)^a  = e(aP,cP)^b  = e(aP,bP)^c = e(P,P)^{abc}$

With pairings we’ve done in one round what would otherwise take two.

Another application in cryptography addresses a problem posed by Shamir in 1984: does there exist an encryption scheme in which the public key is an arbitrary string? Imagine if your public key was your email address. It would be easy to remember and certificate authorities and certificate management would be unnecessary.

A solution to this problem came some years later, in 2001, and is the Identity-based Encryption (IBE) scheme proposed by Boneh and Franklin, which uses bilinear pairings as the main tool.

Nowadays, pairings are used for the zk-SNARKS that make Zcash an anonymous currency, and are also used in drand to generate public-verifiable randomness. Pairings and the compact, aggregatable BLS signatures are used in Ethereum. We have used pairings to build Geo Key Manager: pairings let us implement a compact broadcast and negative broadcast scheme that together make Geo Key Manager work.

In order to make these schemes, we have to implement pairings, and to do that we need to understand the mathematics behind them.

Where do pairings come from?

In order to deeply understand pairings we must understand the interplay of geometry and arithmetic, and the origins of the group’s law. The starting point is the concept of a divisor, a formal combination of points on the curve.

$D = \sum n_i P_i$

The sum of all the coefficients $n_i$ is the degree of the divisor. If we have a function on the curve that has poles and zeros, we can count them with multiplicity to get a principal divisor. Poles are counted as negative, while zeros as positive. For example if we take the projective line, the function $x$ has the divisor $(0)-(\infty)$.

The degree of a divisor is the sum of its coefficients. All principal divisors have degree $0$.  The group of degree zero divisors modulo the principal divisors is the Jacobian. This means that we take all the degree zero divisors, and freely add or subtract principle divisors, constructing an abelian variety called the Jacobian.

Until now our constructions have worked for any curve.  Elliptic curves have a special property: since a line intersects the curve in three points, it’s always possible to turn an element of the Jacobian into one of the form $(P)-(O)$ for a point $P$. This is where the addition law of elliptic curves comes from.

Pairings in CIRCL
The relation between addition on an elliptic curve and the geometry of the curve. Source file ECClines.svg

Given a function $f$ we can evaluate it on a divisor $D=\sum n_i P_i$ by taking the product $\prod f(P_i)^{n_i}$. And if two functions $f$ and $g$ have disjoint divisors, we have the Weil duality:

$ f(\text{div}(g)) = g(\text{div}(f)) $,

The existence of Weil duality is what gives us the bilinear map we seek. Given an $r$-torsion point $T$ we have a function $f$ whose divisor is $r(T)-r(O)$. We can write down an auxiliary function $g$ such that $f(rP)=g^r(P)$ for any $P$. We then get a pairing by taking.


The auxiliary point $X$ is any point that makes the numerator and denominator defined.

In practice, the pairing we have defined above, the Weil pairing, is little used. It was historically the first pairing and is extremely important in the mathematics of elliptic curves, but faster alternatives based on more complicated underlying mathematics are used today. These faster pairings have different $\mathbb{G}_1$ and $\mathbb{G}_2$, while the Weil pairing made them the same.

Shift in parameters

As we saw earlier, the discrete logarithm problem can be attacked either on the group of elliptic curve points or on the third group (an extension of a prime field) whichever is weaker. This is why the parameters that define a pairing must balance the security of the three groups.

Before 2015 a good balance between the extension degree, the size of the prime field, and the security of the scheme was achieved by the family of Barreto-Naehrig (BN) curves. For 128 bits of security, BN curves use an extension of degree 12, and have a prime of size 256 bits; as a result they are an efficient choice for implementation.

A breakthrough for pairings occurred in 2015 when Kim and Barbescu published a result that accelerated the attacks in finite fields. This resulted in increasing the size of fields to comply with standard security levels. Just as short hashes like MD5 got depreciated as they became insecure and $2^{64}$ was no longer enough, and RSA-1024 was replaced with RSA-2048, we regularly change parameters to deal with improved attacks.

For pairings this implied the use of larger primes for all the groups. Roughly speaking, the previous 192-bit security level becomes the new 128-bit level after this attack. Also, this shift in parameters brings the family of Barreto-Lynn-Scott (BLS) curves to the stage because pairings on BLS curves are faster than BN in this new setting. Hence, currently BLS curves using an extension of degree 12, and primes of around 384 bits provide the equivalent to 128 bit security.

The IETF draft (currently in preparation) draft-irtf-cfrg-pairing-friendly-curves specifies secure pairing-friendly elliptic curves. It includes parameters for BN and BLS families of curves. It also targets different security levels to provide crypto agility for some applications relying on pairing-based cryptography.

Implementing Pairings in Go

Historically, notable examples of software libraries implementing pairings include PBC by Ben Lynn, Miracl by Michael Scott, and Relic by Diego Aranha. All of them are written in C/C++ and some ports and wrappers to other languages exist.

In the Go standard library we can find the golang.org/x/crypto/bn256 package by Adam Langley, an implementation of a pairing using a BN curve with 256-bit prime. Our colleague Brendan McMillion built github.com/cloudflare/bn256 that dramatically improves the speed of pairing operations for that curve. See the RWC-2018 talk to see our use case of pairing-based cryptography. This time we want to go one step further, and we started looking for alternative pairing implementations.

Although one can find many libraries that implement pairings, our goal is to rely on one that is efficient, includes protection against side channels, and exposes a flexible API oriented towards applications that permit generality, while avoiding common security pitfalls. This motivated us to include pairings in CIRCL. We followed best practices on secure code development and we want to share with you some details about the implementation.

We started by choosing a pairing-friendly curve. Due to the attack previously mentioned, the BN256 curve does not meet the 128-bit security level. Thus there is a need for using stronger curves. Such a stronger curve is the BLS12-381 curve that is widely used in the zk-SNARK protocols and short signature schemes. Using this curve allows us to make our Go pairing implementation interoperable with other implementations available in other programming languages, so other projects can benefit from using CIRCL too.

This code snippet tests the linearity property of pairings and shows how easy it is to use our library.

import (
    e "github.com/cloudflare/circl/ecc/bls12381"

func ExamplePairing() {
    P,  Q := e.G1Generator(), e.G2Generator()
    a,  b := new(e.Scalar), new(e.Scalar)
    aP, bQ := new(e.G1), new(e.G2)
    ea, eb := new(e.Gt), new(e.Gt)


    aP.ScalarMult(a, P)
    bQ.ScalarMult(b, Q)

    g  := e.Pair( P, Q)
    ga := e.Pair(aP, Q)
    gb := e.Pair( P,bQ)

    ea.Exp(g, a)
    eb.Exp(g, b)
    linearLeft := ea.IsEqual(ga) // e(P,Q)^a == e(aP,Q)
    linearRight:= eb.IsEqual(gb) // e(P,Q)^b == e(P,bQ)

    fmt.Print(linearLeft && linearRight)
    // Output: true

We applied several optimizations that allowed us to improve on performance and security of the implementation. In fact, as the parameters of the curve are fixed, some other optimizations become easier to apply; for example, the code for prime field arithmetic and the towering construction for extension fields as we detail next.

Formally-verified arithmetic using fiat-crypto

One of the more difficult parts of a cryptography library to implement correctly is the prime field arithmetic. Typically people specialize it for speed, but there are many tedious constraints on what the inputs to operations can be to ensure correctness. Vulnerabilities have happened when people get it wrong, across many libraries. However, this code is perfect for machines to write and check. One such tool is fiat-crypto.

Using fiat-crypto to generate the prime field arithmetic means that we have a formal verification that the code does what we need. The fiat-crypto tool is invoked in this script, and produces Go code for addition, subtraction, multiplication, and squaring over the 381-bit prime field used in the BLS12-381 curve. Other operations are not covered, but those are much easier to check and analyze by hand.

Another advantage is that it avoids relying on the generic big.Int package, which is slow, frequently spills variables to the heap causing dynamic memory allocations, and most importantly, does not run in constant-time. Instead, the code produced is straight-line code, no branches at all, and relies on the math/bits package for accessing machine-level instructions. Automated code generation also means that it’s easier to apply new techniques to all primitives.

Tower Field Arithmetic

In addition to prime field arithmetic, say integers modulo a prime number p, a pairing also requires high-level arithmetic operations over extension fields.

To better understand what an extension field is, think of the analogous case of going from the reals to the complex numbers: the operations are referred as usual, there exist addition, multiplication and division $(+, -, \times, /)$, however they are computed quite differently.

The complex numbers are a quadratic extension over the reals, so imagine a two-level house. The first floor is where the real numbers live, however, they cannot access the second floor by themselves. On the other hand, the complex numbers can access the entire house through the use of a staircase. The equation $f(x)=x^2+1$ was not solvable over the reals, but is solvable over the complex numbers, since they have the number $i^2=1$. And because they have the number $i$, they also have to have numbers like $3i$ and $5+i$ that solve other equations that weren’t solvable over the reals either. This second story has given the roots of the polynomials a place to live.

Algebraically we can view the complex numbers as $\mathbb{R}[x]/(x^2+1)$, the space of polynomials where we consider $x^2=-1$. Given a polynomial like $x^3+5x+1$, we can turn it into $4x+1$, which is another way of writing $1+4i$. In this new field $x^2+1=0$ holds automatically, and we have added both $x$ as a root of the polynomial we picked. This process of writing down a field extension by adding a root of a polynomial works over any field, including finite fields.

Following this analogy, one can construct not a house but a tower, say a $k=12$ floor building where the ground floor is the prime field $\mathbb{F}_p$. The reason to build such a large tower is because we want to host VIP guests: namely a group called $\mu_r$, the $r$-roots of unity. There are exactly $r$ members and they behave as an (algebraic) group, i.e. for all $x,y \in \mu_r$, it follows $x*y \in \mu_r$ and $x^r = 1$.

One particularity is that making operations on the top-floor can be costly. Assume that whenever an operation is needed in the top-floor, members who live on the main floor are required to provide their assistance. Hence an operation on the top-floor needs to go all the way down to the ground floor. For this reason, our tower needs more than a staircase, it needs an efficient way to move between the levels, something even better than an elevator.

What if we use portals? Imagine anyone on the twelfth floor using a portal to go down immediately to the sixth floor, and then use another one connecting the sixth to the second floor, and finally another portal connecting to the ground floor. Thus, one can get faster to the first floor rather than descending through a long staircase.

The building analogy can be used to understand and construct a tower of finite fields. We use only some extensions to build a twelfth extension from the (ground) prime field $\mathbb{F}_p$.

$\mathbb{F}_{p}$ ⇒  $\mathbb{F}_{p^2}$ ⇒  $\mathbb{F}_{p^4}$ ⇒  $\mathbb{F}_{p^{12}}$

In fact, the extension of finite fields is as follows:

  • $\mathbb{F}_{p^2}$ is built as polynomials in $\mathbb{F}_p[u]$ reduced modulo $u^2+1=0$.
  • $\mathbb{F}_{p^6}$ is built as polynomials in $\mathbb{F}_{p^2}[v]$ reduced modulo $v^2+u+1=0$.
  • $\mathbb{F}_{p^{12}}$ is built as polynomials in $\mathbb{F}_{p^6}[w]$ reduced modulo $w^2+v=0$, or as polynomials in $\mathbb{F}_{p^4}[w]$ reduced modulo $w^3+v=0$.

The portals here are the polynomials used as modulus, as they allow us to move from one extension to the other.

Different constructions for higher extensions have an impact on the number of operations performed. Thus, we implemented the latter tower field for $\mathbb{F}_{p^{12}}$ as it results in a lower number of operations. The arithmetic operations are quite easy to implement and manually verify, so at this level formal verification is not as effective as in the case of prime field arithmetic. However, having an automated tool that generates code for this arithmetic would be useful for developers not familiar with the internals of field towering. The fiat-crypto tool keeps track of this idea [Issue 904, Issue 851].

Now, we describe more details about the main core operations of a bilinear pairing.

The Miller loop and the final exponentiation

The pairing function we implemented is the optimal r-ate pairing, which is defined as:

$ e(P,Q) = f_Q(P)^{\text{exp}} $

That is the construction of a function $f$ based on $Q$, then evaluated on a point $P$, and the result of that is raised to a specific power. The efficient function evaluation is performed using “the Miller loop”, which is an algorithm devised by Victor Miller and has a similar structure to a double-and-add algorithm for scalar multiplication.

After having computed $f_Q(P)$ this value is an element of $\mathbb{F}_{p^{12}}$, however it is not yet an $r$-root of unity; in order to do so, the final exponentiation accomplishes this task. Since the exponent is constant for each curve, special algorithms can be tuned for it.

One interesting acceleration opportunity presents itself: in the Miller loop the elements of $\mathbb{F}_{p^{12}}$ that we have to multiply by are special — as polynomials, they have no linear term and their constant term lives in $\mathbb{F}_{p^{2}}$. We created a specialized multiplication that avoids multiplications where the input has to be zero. This specialization accelerated the pairing computation by 12%.

So far, we have described how the internal operations of a pairing are performed. There are still some other functions floating around regarding the use of pairings in cryptographic protocols. It is also important to optimize these functions and now we will discuss some of them.

Product of Pairings

Often protocols will want to evaluate a product of pairings, rather than a single pairing. This is the case if we’re evaluating multiple signatures, or if the protocol uses cancellation between different equations to ensure security, as in the dual system encoding approach to designing protocols. If each pairing was evaluated individually, this would require multiple evaluations of the final exponentiation. However, we can evaluate the product first, and then evaluate the final exponentiation once. This requires a different interface that can take vectors of points.

Occasionally, there is a sign or an exponent in the factors of the product. It’s very easy to deal with a sign explicitly by negating one of the input points, almost for free. General exponents are more complicated, especially when considering the need for side channel protection. But since we expose the interface, later work on the library will accelerate it without changing applications.

Regarding API exposure, one of the trickiest and most error prone aspects of software engineering is input validation. So we must check that raw binary inputs decode correctly as the points used for a pairing. Part of this verification includes subgroup membership testing which is the topic we discuss next.

Subgroup Membership Testing

Checking that a point is on the curve is easy, but checking that it has the right order is not: the classical way to do this is an entire expensive scalar multiplication. But implementing pairings involves the use of many clever tricks that help to make things run faster.

One example is twisting: the $\mathbb{G}_2$ group are points with coordinates in $\mathbb{F}_{p^{12}}$, however, one can use a smaller field to reduce the number of operations. The trick here is using an associated curve $E’$, which is a twist of the original curve $E$. This allows us to work on the subfield $\mathbb{F}_{p^{2}}$ that has cheaper operations.

Additionally, twisting the curve over $\mathbb{G}_2$ carries some efficiently computable endomorphisms coming from the field structure. For the cost of two field multiplications, we can compute an additional endomorphism, dramatically decreasing the cost of scalar multiplication.

By searching for the smallest combination of scalars that could zero out the $r$-torsion points, Sean Bowe came up with a much more efficient way to do subgroup checks. We implement his trick, with a big reduction in the complexity of some applications.

As can be seen, implementing a pairing is full of subtleties. We just saw that point validation in the pairing setting is a bit more challenging than in the conventional case of elliptic curve cryptography. This kind of reformulation also applies to other operations that require special care on their implementation. One another example is how to encode binary strings as elements of the group $\mathbb{G}_1$ (or $\mathbb{G}_2$). Although this operation might sound simple, implementing it securely needs to take into consideration several aspects; thus we expand more on this topic.

Hash to Curve

An important piece on the Boneh-Franklin Identity-based Encryption scheme is a special hash function that maps an arbitrary string — the identity, e.g., an email address — to a point on an elliptic curve, and that still behaves as a conventional cryptographic hash function (such as SHA-256) that is hard to invert and collision-resistant. This operation is commonly known as hashing to curve.

Boneh and Franklin found a particular way to perform hashing to curve: apply a conventional hash function to the input bitstring, and interpret the result as the $y$-coordinate of a point, then from the curve equation $y^2=x^3+b$, find the $x$-coordinate as $x=\sqrt[3]{y^2-b}$. The cubic root always exists on fields of characteristic $p\equiv 2 \bmod{3}$. But this algorithm does not apply to other fields in general restricting the parameters to be used.

Another popular algorithm, but since now we need to remark it is an insecure way for performing hash to curve is the following. Let the hash of the input be the $x$-coordinate, and from it find the $y$-coordinate by computing a square root $y= \sqrt{x^3+b}$. Note that not all $x$-coordinates lead that the square root exists, which means the algorithm may fail; thus, it’s a probabilistic algorithm. To make sure it works always, a counter can be added to $x$ and increased everytime the square root is not found. Although this algorithm always finds a point on the curve, this also makes the algorithm run in variable time i.e., it’s a non-constant time algorithm. The lack of this property on implementations of cryptographic algorithms makes them susceptible to timing attacks. The DragonBlood attack is an example of how a non-constant time hashing algorithm resulted in a full key recovery of WPA3 Wi-Fi passwords.

Secure hash to curve algorithms must guarantee several properties. It must be ensured that any input passed to the hash produces a point on the targeted group. That is no special inputs must trigger exceptional cases, and the output point must belong to the correct group. We make emphasis on the correct group since in certain applications the target group is the entire set of points of an elliptic curve, but in other cases, such as in the pairing setting, the target group is a subgroup of the entire curve, recall that $\mathbb{G}_1$ and $\mathbb{G}_2$ are $r$-torsion points. Finally, some cryptographic protocols are proven secure provided that the hash to curve function behaves as a random oracle of points. This requirement adds another level of complexity to the hash to curve function.

Fortunately, several researchers have addressed most of these problems and some other researchers have been involved in efforts to define a concrete specification for secure algorithms for hashing to curves, by extending the sort of geometric trick that worked for the Boneh-Franklin curve. We have participated in the Crypto Forum Research Group (CFRG) at IETF on the work-in-progress Internet draft-irtf-cfrg-hash-to-curve. This document specifies secure algorithms for hashing targeting several elliptic curves including the BLS12-381 curve. At Cloudflare, we are actively collaborating in several working groups of IETF, see Jonathan Hoyland’s post to know more about it.

Our implementation complies with the recommendations given in the hash to curve draft and also includes many implementation techniques and tricks derived from a vast number of academic research articles in pairing-based cryptography. A good compilation of most of these tricks is delightfully explained by Craig Costello.

We hope this post helps you to shed some light and guidance on the development of pairing-based cryptography, as it has become much more relevant these days. We will share with you soon an interesting use case in which the application of pairing-based cryptography helps us to harden the security of our infrastructure.

What’s next?

We invite you to use our CIRCL library, now equipped with bilinear pairings. But there is more: look at other primitives already available such as HPKE, VOPRF, and Post-Quantum algorithms. On our side, we will continue improving the performance and security of our library, and let us know if any of your projects uses CIRCL, we would like to know your use case. Reach us at research.cloudflare.com.

You’ll soon hear more about how we’re using CIRCL across Cloudflare.

Increasing developer happiness with GitHub code scanning

Post Syndicated from Sam Partington original https://github.blog/2021-09-07-increasing-developer-happiness-github-code-scanning/

You probably already know about using GitHub code scanning to secure your code. But how about using it to make your day-to-day coding easier? We’ve been making internal use of CodeQL, our code analysis engine for code scanning, to keep code quality high by protecting ourselves from those annoying coding mistakes that are easy to make but hard to spot! Read on for some examples of what we’ve done so far and how you can make the most of CodeQL for yourself.

Plugging a memory leak

Go’s defer statement defers the execution of a function until the surrounding function returns. This is useful for cleaning up: For example, closing resources like file handles or completing database transactions.

When changing existing code, you can end up moving a defer statement inside a loop. If you do so, you’ll still have to wait until the end of the function for cleanup; it won’t happen at the end of the iteration. We’ve seen this mistake lead to memory leaks in production.

Wouldn’t it be great if this mistake could be pointed out to you? We wanted to live in that happy world, and all it took was four lines of CodeQL.

A nice postscript to this story is that seeing this query led another team at GitHub to add CodeQL to their repository. They’d been bitten by a defer-in-loop memory leak before and didn’t want it to happen again. Once code scanning was set up for them, CodeQL discovered another problem in their codebase, which was similar to the one we’ll discuss next.

The error you can’t ignore

We use GORM, a Go Object Relational Mapper, in some of our codebases. Error handling in GORM is different than in idiomatic Go code, because it has a chainable API. Here’s an example:

if err := db.Where("name = ?", "jinzhu").First(&user).Error; err !=
nil {
 // error handling...

As you can imagine, it’s easy to write code like db.Where("name = ?", "jinzhu").First(&user) and not check that Error field.

At least it used to be easy to do that. We’ve now created a CodeQL query which detects GORM calls that don’t check the associated Error field and flags these calls in pull requests. You’ll also find a similar query for error checking functions which return pointers in the security-and-quality query suite for CodeQL.

Loopy performance problems

In addition to protecting against missing error checking, we also want to keep our database-querying code performant. “N+1 queries” are a common performance issue. This is where some expensive operation is performed once for every member of a set, so the code will get slower as the number of items increases. Database calls in a loop are often the culprit here; typically, you’ll get better performance from a batch query outside of the loop instead.

We created a custom CodeQL query, which looks for calls to any of the GORM methods that actually result in a query being performed. We filter that list of calls down to those that happen within a loop and fail CI if any are encountered. What’s nice about CodeQL is that we’re not limited to database calls directly within the body of a loop―calls within functions called directly or indirectly from the loop are caught too.

Using these queries

These queries are experimental, so we’ve not included them in our standard suites. However, you can use them by referencing a special query suite we’ve created.

First, create a file .github/codeql/go-developer-happiness.qls in the repository you would like to analyze:

- import: codeql-suites/go-developer-happiness.qls
  from: codeql-go

Next, set up a CodeQL workflow (or edit an existing one) and amend the “Initialize CodeQL” section of the template as follows:

- name: Initialize CodeQL
  uses: github/codeql-action/[email protected]
    languages: go
    queries: ./.github/codeql/go-developer-happiness.qls

For more information and configuration examples, please refer to the documentation for running custom CodeQL queries in GitHub code scanning.

Making your own

Are there common “gotchas” in your codebase? Why not ease developer friction with some custom CodeQL queries of your own? You can learn more about writing CodeQL with our documentation and discussions―and also find out more about contributing queries back to the community―in the CodeQL repository at https://github.com/github/codeql. We look forward to seeing what you come up with!

Go is not an easy language

Post Syndicated from arp242.net original https://www.arp242.net/go-easy.html

Go is not an easy programming language. It is simple in many ways: the syntax
is simple, most of the semantics are simple. But a language is more than just
syntax; it’s about doing useful stuff. And doing useful stuff is not always
easy in Go.

Turns out that combining all those simple features in a way to do something
useful can be tricky. How do you remove an item from an array in Ruby?
list.delete_at(i). And remove entries by value? list.delete(value). Pretty
easy, yeah?

In Go it’s … less easy; to remove the index i you need to do:

list = append(list[:i], list[i+1:]...)

And to remove the value v you’ll need to use a loop:

n := 0
for _, l := range list {
    if l != v {
        list[n] = l
list = list[:n]

Is this unacceptably hard? Not really; I think most programmers can figure out
what the above does even without prior Go experience. But it’s not exactly
easy either. I’m usually lazy and copy these kind of things from the Slice
page because I want to focus on actually solving the problem at
hand, rather than plumbing like this.

It’s also easy to get it (subtly) wrong or suboptimal, especially for less
experienced programmers. For example compare the above to copying to a new array
and copying to a new pre-allocated array (make([]string, 0, len(list))):

InPlace             116 ns/op      0 B/op   0 allocs/op
NewArrayPreAlloc    525 ns/op    896 B/op   1 allocs/op
NewArray           1529 ns/op   2040 B/op   8 allocs/op

While 1529ns is still plenty fast enough for many use cases and isn’t something
to excessively worry about, there are plenty of cases where these things do
matter and having the guarantee to always use the best possible algorithm with
list.delete(value) has some value.

Goroutines are another good example. “Look how is it is to start a goroutine!
Just add go and you’re done!” Well, yes; you’re done until you have five
million of those running at the same time and then you’re left wondering where
all your memory went, and it’s not hard to “leak” goroutines by accident either.

There are a number of patterns to limit the number of goroutines, and none of
them are exactly easy. A simple example might be something like:

var (
	jobs    = 20                 // Run 20 jobs in total.
	running = make(chan bool, 3) // Limit concurrent jobs to 3.
	done    = make(chan bool)    // Signal that all jobs are done.

for i := 1; i <= jobs; i++ {
	running <- true // Fill running; this will block and wait if it's already full.

	// Start a job.
	go func(i int) {
		defer func() {
			<-running      // Drain running so new jobs can be added.
			if i == jobs { // Last job, signal that we're done.
				done <- true

		// "do work"
		time.Sleep(1 * time.Second)

<-done // Wait until all jobs are done.

There’s a reason I annotated this with some comments: for people not intimately
familiar with Go this may take some effort to understand. This also won’t ensure
that the numbers are printed in order (which may or may not be a requirement).

Go’s concurrency primitives may be simple and easy to use, but combining them to
solve common real-world scenarios is a lot less simple. The original version of
the above example was actually incorrect.

In Simple Made Easy Rich Hickey argues that we shouldn’t confuse “simple”
with “it’s easy to write”: just because you can do something useful in one or
two lines doesn’t mean the underlying concepts – and therefore the entire
program – are “simple” as in “simple to understand”.

I feel there is some wisdom in this; in most cases we shouldn’t sacrifice
“simple” for “easy”, but that doesn’t mean we can’t think at all about how to
make things easier. Just because concepts are simple doesn’t mean they’re easy
to use, can’t be misused, or can’t be used in ways that lead to (subtle) bugs.
Pushing Hickey’s argument to the extreme we’d end up with something like
Brainfuck and that would of course be silly.

Ideally a language should reduce the cognitive load required to reason about its
behaviour; there are many ways to increase this cognitive load: complex
intertwined language features is one of them, and getting “distracted” by
implementing fairly basic things from those simple concepts is another: it’s
another block of code I need to reason about. While I’m not overly concerned
about code formatting or syntax choices, I do think it can matter to reduce this
cognitive load when reading code.

The lack of generics probably plays some part here; implementing a slices
package which does these kind of things in a generic way is hard right now.
Generics makes this possible and also makes things more complex (more language
features are used), but they also make things easier and, arguably, less complex
on other fronts.[1]

Are these insurmountable problems? No. I still use (and like) Go after all. But
I also don’t think that Go is a language that you “could pick up in ~5-10
minutes”, which was the comment that prompted this post; a sentiment I’ve seen
expressed many times.

As a corollary to all of the above; learning the language isn’t just about
learning the syntax to write your ifs and fors; it’s about learning a way of
thinking. I’ve seen many people coming from Python or C♯ try to shoehorn
concepts or patterns from those languages in Go. Common ones include using
struct embedding as inheritance, panics as exceptions, “pseudo-dynamic
programming” with interface{}, and so forth. It rarely ends well, if ever.

I did this as well when I was writing my first Go program; it’s only natural.
And when I started as a Ruby programmed I tried to write Python code in Ruby
(although this works a bit better as the languages are more similar, but there
are still plenty of odd things you can do such as using for loops).

This is why I don’t like it when people get redirected to the Tour of Go to
“learn the language”, as it just teaches basic syntax and little more. It’s nice
as a little, well, tour to get a bit of a feel of the language and see how it
roughly works and what it can roughly do, but it’s ill-suited to actually learn
the language.


  1. Contrary to popular belief the Go team was never “against” generics;
    I’ve seen many comments to the effect of “the Go team doesn’t think
    generics are useful”, but this was never the case. 

Bitmasks for nicer APIs

Post Syndicated from arp242.net original https://www.arp242.net/bitmask.html

Bitmasks is one of those things where the basic idea is simple to understand:
it’s just 0s and 1s being toggled on and off. But actually “having it click”
to the point where it’s easy to work with can be a bit trickier. At least, it is
(or rather, was) for me 😅

With a bitmask you hide (or “mask”) certain bits of a number, which can be
useful for various things as we’ll see later on. There are two reasons one might
use bitmasks: for efficiency or for nicer APIs. Efficiency is rarely an issue
except for some embedded or specialized use cases, but everyone likes nice APIs,
so this is about that.

A while ago I added colouring support to my little zli library. Adding
colours to your terminal is not very hard as such, just print an escape code:

fmt.Println("\x1b[34mRed text!\x1b[0m")

But a library makes this a bit easier. There’s already a bunch of libraries out
there for Go specifically, the most popular being Fatih Arslan’s color:

color.New(color.FgRed).Add(color.Bold).Add(color.BgCyan).Println("bold red")

This is stored as:

type (
    Attribute int
    Color     struct { params  []Attribute }

I wanted a simple way to add some colouring, which looks a bit nicer than the
method chain in the color library, and eventually figured out you don’t need a
[]int to store all the different attributes but that a single uint64 will do
as well:

zli.Colorf("bold red", zli.Red | zli.Bold | zli.Cyan.Bg())

// Or alternatively, use Color.String():
fmt.Printf("%sbold red%s\n", zli.Red|zli.Bold|zli.Cyan.Bg(), zli.Reset)

Which in my eyes looks a bit nicer than Fatih’s library, and also makes it
easier to add 256 and true colour support.

All of the below can be used in any language by the way, and little of this is
specific to Go. You will need Go 1.13 or newer for the binary literals to work.

Here’s how zli stores all of this in a uint64:

                                   fg true, 256, 16 color mode ─┬──┐
                                bg true, 256, 16 color mode ─┬─┐│  │
                                                             │ ││  │┌── parsing error
 ┌───── bg color ────────────┐ ┌───── fg color ────────────┐ │ ││  ││┌─ term attr
 v                           v v                           v v vv  vvv         v
 0000_0000 0000_0000 0000_0000 0000_0000 0000_0000 0000_0000 0000_0000 0000_0000
 ^         ^         ^         ^         ^         ^         ^         ^
64        56        48        40        32        24        16         8

I’ll go over it in detail later, but in short (from right to left):

  • The first 9 bits are flags for the basic terminal attributes such as bold,
    italic, etc.

  • The next bit is to signal a parsing error for true colour codes (e.g. #123123).

  • There are 3 flags for the foreground and background colour each to signal that
    a colour should be applied, and how it should be interpreted (there are 3
    different ways to set the colour: 16-colour, 256-colour, and 24-bit “true
    colour”, which use different escape codes).

  • The colours for the foreground and background are stored separately, because
    you can apply both a foreground and background. These are 24-bit numbers.

  • A value of 0 is reset.

With this, you can make any combination of the common text attributes, the above

zli.Colorf("bold red", zli.Red | zli.Bold | zli.Cyan.Bg())

Would be the following in binary layout:

                                              fg 16 color mode ────┐
                                           bg 16 color mode ───┐   │
                                                               │   │        bold
                bg color ─┬──┐                fg color ─┬──┐   │   │           │
                          v  v                          v  v   v   v           v
 0000_0000 0000_0000 0000_0110 0000_0000 0000_0000 0000_0001 0010_0100 0000_0001
 ^         ^         ^         ^         ^         ^         ^         ^
64        56        48        40        32        24        16         8

We need to go through several steps to actually do something meaningful with
this. First, we want to get all the flag values (the first 24 bits); a “flag” is
a bit being set to true (1) or false (0).

const (
    Bold         = 0b0_0000_0001
    Faint        = 0b0_0000_0010
    Italic       = 0b0_0000_0100
    Underline    = 0b0_0000_1000
    BlinkSlow    = 0b0_0001_0000
    BlinkRapid   = 0b0_0010_0000
    ReverseVideo = 0b0_0100_0000
    Concealed    = 0b0_1000_0000
    CrossedOut   = 0b1_0000_0000

func applyColor(c uint64) {
    if c & Bold != 0 {
        // Write escape code for bold
    if c & Faint != 0 {
        // Write escape code for faint
    // etc.

& is the bitwise AND operator. It works just as the more familiar && except
that it operates on every individual bit where 0 is false and 1 is true.
The end result will be 1 if both bits are “true” (1). An example with just
four bits:

0011 & 0101 = 0001

This can be thought of as four separate operations (from left to right):

0 AND 0 = 0      both false
0 AND 1 = 0      first value is false, so the end result is false
1 AND 0 = 0      second value is false
1 AND 1 = 1      both true

So what if c & Bold != 0 does is check if the “bold bit” is set:

Only bold set:
0 0000 0001 & 0 0000 0001 = 0 0000 0001

Underline bit set:
0 0000 1000 & 0 0000 0001 = 0 0000 0000      0 since there are no cases of "1 AND 1"

Bold and underline bits set:
0 0000 1001 & 0 0000 0001 = 0 0000 0001      Only "bold AND bold" is "1 AND 1"

As you can see, c & Bold != 0 could also be written as c & Bold == Bold.

The colours themselves are stored as a regular number like any other, except
that they’re “offset” a number of bits. To get the actual number value we need
to clear all the bits we don’t care about, and shift it all to the right:

const (
    colorOffsetFg   = 16

    colorMode16Fg   = 0b0000_0100_0000_0000
    colorMode256Fg  = 0b0000_1000_0000_0000
    colorModeTrueFg = 0b0001_0000_0000_0000

    maskFg          = 0b00000000_00000000_00000000_11111111_11111111_11111111_00000000_00000000

func getColor(c uint64) {
    if c & colorMode16Fg != 0  {
        cc := (c & maskFg) >> colorOffsetFg
        // ..write escape code for this color..

First we check if the “16 colour mode” flag is set using the same method as the
terminal attributes, and then we AND it with maskFg to clear all the bits we
don’t care about:

                                   fg true, 256, 16 color mode ─┬──┐
                                bg true, 256, 16 color mode ─┬─┐│  │
                                                             │ ││  │┌── parsing error
 ┌───── bg color ────────────┐ ┌───── fg color ────────────┐ │ ││  ││┌─ term attr
 v                           v v                           v v vv  vvv         v
 0000_0000 0000_0000 0000_0110 0000_0000 0000_0000 0000_0001 0010_0100 0000_1001
AND maskFg
 0000_0000 0000_0000 0000_0000 0000_0000 0000_0000 0000_0001 0000_0000 0000_0000
 ^         ^         ^         ^         ^         ^         ^         ^
64        56        48        40        32        24        16         8

After the AND operation we’re left with just the 24 bits we care about, and
everything else is set to 0. To get a normal number from this we need to shift
the bits to the right with >>:

1010 >> 1 = 0101    All bits shifted one position to the right.
1010 >> 2 = 0010    Shift two, note that one bit gets discarded.

Instead of >> 16 you can also subtract 65535 (a 16-bit number): (c &
maskFg) - 65535
. The end result is the same, but bit shifts are much easier to
reason about in this context.

We repeat this for the background colour (except that we shift everything 40
bits to the right). The background is actually a bit easier since we don’t need
to AND anything to clear bits, as all the bits to the right will just be

cc := c >> ColorOffsetBg

For 256 and “true” 24-bit colours we do the same, except that we need to send
different escape codes for them, which is a detail that doesn’t really matter
for this explainer about bitmasks.

To set the background colour we use the Bg() function to transforms a
foreground colour to a background one. This avoids having to define BgCyan
constants like Fatih’s library, and makes working with 256 and true colour

const (
    colorMode16Fg   = 0b00000_0100_0000_0000
    colorMode16Bg   = 0b0010_0000_0000_0000

    maskFg          = 0b00000000_00000000_00000000_11111111_11111111_11111111_00000000_00000000

func Bg(c uint64) uint64 {
    if c & colorMode16Fg != 0 {
        c = c ^ colorMode16Fg | colorMode16Bg
    return (c &^ maskFg) | (c & maskFg << 24)

First we check if the foreground colour flags is set; if it is then move that
bit to the corresponding background flag.

| is the OR operator; this works like || except on individual bits like in
the above example for &. Note that unlike || it won’t stop if the first
condition is false/0: if any of the two values are 1 the end result will be

0 OR 0 = 0      both false
0 OR 1 = 1      second value is true, so end result is true
1 OR 0 = 1      first value is true
1 OR 1 = 1      both true

0011 | 0101 = 0111

^ is the “exclusive or”, or XOR, operator. It’s similar to OR except that it
only outputs 1 if exactly one value is 1, and not if both are:

0 XOR 0 = 0      both false
0 XOR 1 = 1      second value is true, so end result is true
1 XOR 0 = 1      first value is true
1 XOR 1 = 0      both true, so result is 0

0011 ^ 0101 = 0101

Putting both together, c ^ colorMode16Fg clears the foreground flag and |
sets the background flag.

The last line moves the bits from the foreground colour to the background

return (c &^ maskFg) | (c & maskFg << 24)

&^ is “AND NOT”: these are two operations: first it will inverse the right
side (“NOT”) and then ANDs the result. So in our example the maskFg value is


We then used this inversed maskFg value to clear the foreground colour,
leaving everything else intact:

 0000_0000 0000_0000 0000_0110 0000_0000 0000_0000 0000_0001 0010_0100 0000_1001
 0000_0000 0000_0000 0000_0110 0000_0000 0000_0000 0000_0000 0010_0100 0000_1001
 ^         ^         ^         ^         ^         ^         ^         ^
64        56        48        40        32        24        16         8

C and most other languages don’t have this operator and have ~ for NOT (which
Go doesn’t have), so the above would be (c & ~maskFg) in most other languages.

Finally, we set the background colour by clearing all bits that are not part of
the foreground colour, shifting them to the correct place, and ORing this to get
the final result.

I skipped a number of implementation details in the above example for clarity,
especially for people not familiar with Go. The full code is of course
. Putting all of
this together gives a fairly nice API IMHO in about 200 lines of code which
mostly avoids boilerplateism.

I only showed the 16-bit colours in the examples, in reality most of this is
duplicated for 256 and true colours as well. It’s all the same logic, just with
different values. I also skipped over the details of terminal colour codes, as
this article isn’t really about that.

In many of the above examples I used binary literals for the constants, and this
seemed the best way to communicate how it all works for this article. This isn’t
necessarily the best or easiest way to write things in actual code, especially
not for such large numbers. In the actual code it looks like:

const (
    ColorOffsetFg = 16
    ColorOffsetBg = 40

const (
    maskFg Color = (256*256*256 - 1) << ColorOffsetFg
    maskBg Color = maskFg << (ColorOffsetBg - ColorOffsetFg)

// Basic terminal attributes.
const (
    Reset Color = 0
    Bold  Color = 1 << (iota - 1)
    // ...

Figuring out how this works is left as an exercise for the reader 🙂

Another thing that might be useful is a little helper function to print a number
as binary; it helps visualise things if you’re confused:

func bin(c uint64) {
    reBin := regexp.MustCompile(`([01])([01])([01])([01])([01])([01])([01])([01])`)
    reverse := func(s string) string {
        runes := []rune(s)
        for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
            runes[i], runes[j] = runes[j], runes[i]
        return string(runes)
    fmt.Printf("%[2]s → %[1]d\n", c,
        reverse(reBin.ReplaceAllString(reverse(fmt.Sprintf("%064b", c)),
            `$1$2$3${4}_$5$6$7$8 `)))

I put a slighly more advanced version of this at

You can also write a little wrapper to make things a bit easier:

type Bitflag64 uint64 uint64

func (f Bitflag64) Has(flag Bitflag64) bool { return f&flag != 0 }
func (f *Bitflag64) Set(flag Bitflag64)     { *f = *f | flag }
func (f *Bitflag64) Clear(flag Bitflag64)   { *f = *f &^ flag }
func (f *Bitflag64) Toggle(flag Bitflag64)  { *f = *f ^ flag }

If you need more than 64 bits then not all is lost; you can use type thingy

Here’s an example where I did it wrong:

type APITokenPermissions struct {
    Count      bool 
    Export     bool 
    SiteRead   bool 
    SiteCreate bool 
    SiteUpdate bool 

This records the permissions for an API token the user creates. Looks nice, but
how do you check that only Count is set?

if p.Count && !p.Export && !p.SiteRead && !p.SiteCreate && !p.SiteUpdate { .. }

Ugh; not very nice, and neither is checking if multiple permissions are set:

if perm.Export && perm.SiteRead && perm.SiteCreate && perm.SiteUpdate { .. }

Had I stored it as a bitmask instead, it would have been easier:

if perm & Count == 0 { .. }

const permSomething = perm.Export | perm.SiteRead | perm.SiteCreate | perm.SiteUpdate
if perm & permEndpointSomething == 0 { .. }

No one likes functions with these kind of signatures either:

f(false, false, true)
f(true, false, true)

But with a bitmask things can look a lot nicer:

const (
    AddWarpdrive   = 0b0001
    AddTractorBeam = 0b0010
    AddPhasers     = 0b0100

f(AddWarpdrive | AddPhasers)

Building, bundling, and deploying applications with the AWS CDK

Post Syndicated from Cory Hall original https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

The post CDK Pipelines: Continuous delivery for AWS CDK applications showed how you can use CDK Pipelines to deploy a TypeScript-based AWS Lambda function. In that post, you learned how to add additional build commands to the pipeline to compile the TypeScript code to JavaScript, which is needed to create the Lambda deployment package.

In this post, we dive deeper into how you can perform these build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality.

If you’re working with Python, TypeScript, or JavaScript-based Lambda functions, you may already be familiar with the PythonFunction and NodejsFunction constructs, which use the bundling functionality. This post describes how to write your own bundling logic for instances where a higher-level construct either doesn’t already exist or doesn’t meet your needs. To illustrate this, I walk through two different examples: a Lambda function written in Golang and a static site created with Nuxt.js.


A typical CI/CD pipeline contains steps to build and compile your source code, bundle it into a deployable artifact, push it to artifact stores, and deploy to an environment. In this post, we focus on the building, compiling, and bundling stages of the pipeline.

The AWS CDK has the concept of bundling source code into a deployable artifact. As of this writing, this works for two main types of assets: Docker images published to Amazon Elastic Container Registry (Amazon ECR) and files published to Amazon Simple Storage Service (Amazon S3). For files published to Amazon S3, this can be as simple as pointing to a local file or directory, which the AWS CDK uploads to Amazon S3 for you.

When you build an AWS CDK application (by running cdk synth), a cloud assembly is produced. The cloud assembly consists of a set of files and directories that define your deployable AWS CDK application. In the context of the AWS CDK, it might include the following:

  • AWS CloudFormation templates and instructions on where to deploy them
  • Dockerfiles, corresponding application source code, and information about where to build and push the images to
  • File assets and information about which S3 buckets to upload the files to

Use case

For this use case, our application consists of front-end and backend components. The example code is available in the GitHub repo. In the repository, I have split the example into two separate AWS CDK applications. The repo also contains the Golang Lambda example app and the Nuxt.js static site.

Golang Lambda function

To create a Golang-based Lambda function, you must first create a Lambda function deployment package. For Go, this consists of a .zip file containing a Go executable. Because we don’t commit the Go executable to our source repository, our CI/CD pipeline must perform the necessary steps to create it.

In the context of the AWS CDK, when we create a Lambda function, we have to tell the AWS CDK where to find the deployment package. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  runtime: lambda.Runtime.GO_1_X,
  handler: 'main',
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-go-executable')),

In the preceding code, the lambda.Code.fromAsset() method tells the AWS CDK where to find the Golang executable. When we run cdk synth, it stages this Go executable in the cloud assembly, which it zips and publishes to Amazon S3 as part of the PublishAssets stage.

If we’re running the AWS CDK as part of a CI/CD pipeline, this executable doesn’t exist yet, so how do we create it? One method is CDK bundling. The lambda.Code.fromAsset() method takes a second optional argument, AssetOptions, which contains the bundling parameter. With this bundling parameter, we can tell the AWS CDK to perform steps prior to staging the files in the cloud assembly.

Breaking down the BundlingOptions parameter further, we can perform the build inside a Docker container or locally.

Building inside a Docker container

For this to work, we need to make sure that we have Docker running on our build machine. In AWS CodeBuild, this means setting privileged: true. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [
        'bash', '-c', [
          'go test -v',
          'GOOS=linux go build -o /asset-output/main',
      ].join(' && '),

We specify two parameters:

  • image (required) – The Docker image to perform the build commands in
  • command (optional) – The command to run within the container

The AWS CDK mounts the folder specified as the first argument to fromAsset at /asset-input inside the container, and mounts the asset output directory (where the cloud assembly is staged) at /asset-output inside the container.

After we perform the build commands, we need to make sure we copy the Golang executable to the /asset-output location (or specify it as the build output location like in the preceding example).

This is the equivalent of running something like the following code:

docker run \
  --rm \
  -v folder-containing-source-code:/asset-input \
  -v cdk.out/asset.1234a4b5/:/asset-output \
  lambci/lambda:build-go1.x \
  bash -c 'GOOS=linux go build -o /asset-output/main'

Building locally

To build locally (not in a Docker container), we have to provide the local parameter. See the following code:

new lambda.Function(this, 'MyGoFunction', {
  code: lambda.Code.fromAsset(path.join(__dirname, 'folder-containing-source-code'), {
    bundling: {
      image: lambda.Runtime.GO_1_X.bundlingDockerImage,
      command: [],
      local: {
        tryBundle(outputDir: string) {
          try {
            spawnSync('go version')
          } catch {
            return false

          spawnSync(`GOOS=linux go build -o ${path.join(outputDir, 'main')}`);
          return true

The local parameter must implement the ILocalBundling interface. The tryBundle method is passed the asset output directory, and expects you to return a boolean (true or false). If you return true, the AWS CDK doesn’t try to perform Docker bundling. If you return false, it falls back to Docker bundling. Just like with Docker bundling, you must make sure that you place the Go executable in the outputDir.

Typically, you should perform some validation steps to ensure that you have the required dependencies installed locally to perform the build. This could be checking to see if you have go installed, or checking a specific version of go. This can be useful if you don’t have control over what type of build environment this might run in (for example, if you’re building a construct to be consumed by others).

If we run cdk synth on this, we see a new message telling us that the AWS CDK is bundling the asset. If we include additional commands like go test, we also see the output of those commands. This is especially useful if you wanted to fail a build if tests failed. See the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

Cloud Assembly

If we look at the cloud assembly that was generated (located at cdk.out), we see something like the following code:

$ cdk synth
Bundling asset GolangLambdaStack/MyGoFunction/Code/Stage...
✓  . (9ms)
✓  clients (5ms)

DONE 8 tests in 11.476s
✓  clients (5ms) (coverage: 84.6% of statements)
✓  . (6ms) (coverage: 78.4% of statements)

DONE 8 tests in 2.464s

It contains our GolangLambdaStack CloudFormation template that defines our Lambda function, as well as our Golang executable, bundled at asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952/main.

Let’s look at how the AWS CDK uses this information. The GolangLambdaStack.assets.json file contains all the information necessary for the AWS CDK to know where and how to publish our assets (in this use case, our Golang Lambda executable). See the following code:

  "version": "5.0.0",
  "files": {
    "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952": {
      "source": {
        "path": "asset.01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952",
        "packaging": "zip"
      "destinations": {
        "current_account-current_region": {
          "bucketName": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}",
          "objectKey": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::${AWS::AccountId}:role/cdk-hnb659fds-file-publishing-role-${AWS::AccountId}-${AWS::Region}"

The file contains information about where to find the source files (source.path) and what type of packaging (source.packaging). It also tells the AWS CDK where to publish this .zip file (bucketName and objectKey) and what AWS Identity and Access Management (IAM) role to use (assumeRoleArn). In this use case, we only deploy to a single account and Region, but if you have multiple accounts or Regions, you see multiple destinations in this file.

The GolangLambdaStack.template.json file that defines our Lambda resource looks something like the following code:

  "Resources": {
    "MyGoFunction0AB33E85": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Code": {
          "S3Bucket": {
            "Fn::Sub": "cdk-hnb659fds-assets-${AWS::AccountId}-${AWS::Region}"
          "S3Key": "01cf34ff646d380829dc4f2f6fc93995b13277bde7db81c24ac8500a83a06952.zip"
        "Handler": "main",

The S3Bucket and S3Key match the bucketName and objectKey from the assets.json file. By default, the S3Key is generated by calculating a hash of the folder location that you pass to lambda.Code.fromAsset(), (for this post, folder-containing-source-code). This means that any time we update our source code, this calculated hash changes and a new Lambda function deployment is triggered.

Nuxt.js static site

In this section, I walk through building a static site using the Nuxt.js framework. You can apply the same logic to any static site framework that requires you to run a build step prior to deploying.

To deploy this static site, we use the BucketDeployment construct. This is a construct that allows you to populate an S3 bucket with the contents of .zip files from other S3 buckets or from a local disk.

Typically, we simply tell the BucketDeployment construct where to find the files that it needs to deploy to the S3 bucket. See the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-directory')),
  destinationBucket: myBucket

To deploy a static site built with a framework like Nuxt.js, we need to first run a build step to compile the site into something that can be deployed. For Nuxt.js, we run the following two commands:

  • yarn install – Installs all our dependencies
  • yarn generate – Builds the application and generates every route as an HTML file (used for static hosting)

This creates a dist directory, which you can deploy to Amazon S3.

Just like with the Golang Lambda example, we can perform these steps as part of the AWS CDK through either local or Docker bundling.

Building inside a Docker container

To build inside a Docker container, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [
          'bash', '-c', [
            'yarn install',
            'yarn generate',
            'cp -r /asset-input/dist/* /asset-output/',
          ].join(' && '),

For this post, we build inside the publicly available node:lts image hosted on DockerHub. Inside the container, we run our build commands yarn install && yarn generate, and copy the generated dist directory to our output directory (the cloud assembly).

The parameters are the same as described in the Golang example we walked through earlier.

Building locally

To build locally, use the following code:

new s3_deployment.BucketDeployment(this, 'DeployMySite', {
  sources: [
    s3_deployment.Source.asset(path.join(__dirname, 'path-to-nuxtjs-project'), {
      bundling: {
        local: {
          tryBundle(outputDir: string) {
            try {
              spawnSync('yarn --version');
            } catch {
              return false

            spawnSync('yarn install && yarn generate');

       fs.copySync(path.join(__dirname, ‘path-to-nuxtjs-project’, ‘dist’), outputDir);
            return true
        image: cdk.BundlingDockerImage.fromRegistry('node:lts'),
        command: [],

Building locally works the same as the Golang example we walked through earlier, with one exception. We have one additional command to run that copies the generated dist folder to our output directory (cloud assembly).


This post showed how you can easily compile your backend and front-end applications using the AWS CDK. You can find the example code for this post in this GitHub repo. If you have any questions or comments, please comment on the GitHub repo. If you have any additional examples you want to add, we encourage you to create a Pull Request with your example!

Our code also contains examples of deploying the applications using CDK Pipelines, so if you’re interested in deploying the example yourself, check out the example repo.


About the author

Cory Hall

Cory is a Solutions Architect at Amazon Web Services with a passion for DevOps and is based in Charlotte, NC. Cory works with enterprise AWS customers to help them design, deploy, and scale applications to achieve their business goals.

Optimally scaling Kafka consumer applications

Post Syndicated from Grab Tech original https://engineering.grab.com/optimally-scaling-kafka-consumer-applications

Earlier this year, we took you on a journey on how we built and deployed our event sourcing and stream processing framework at Grab. We’re happy to share that we’re able to reliably maintain our uptime and continue to service close to 400 billion events a week. We haven’t stopped there though. To ensure that we can scale our framework as the Grab business continuously grows, we have spent efforts optimizing our infrastructure.

In this article, we will dive deeper into our Kubernetes infrastructure setup for our stream processing framework. We will cover why and how we focus on optimal scalability and availability of our infrastructure.

Quick Architecture Recap

Coban Platform Architecture

The Coban platform provides lightweight Golang plugin architecture-based data processing pipelines running in Kubernetes. These are essentially Kafka consumer pods that consume data, process it, and then materialize the results into various sinks (RDMS, other Kafka topics).

Anatomy of a Processing Pod

Anatomy of a Processing Pod

Each stream processing pod (the smallest unit of a pipeline’s deployment) has three top level components:

  • Trigger: An interface that connects directly to the source of the data and converts it into an event channel.
  • Runtime: This is the app’s entry point and the orchestrator of the pod. It manages the worker pools, triggers, event channels, and lifecycle events.
  • Pipeline plugin: This is provided by the user, and conforms to a contract that the platform team publishes. It contains the domain logic for the pipeline and houses the pipeline orchestration defined by a user based on our Stream Processing Framework.

Optimal Scaling

We initially architected our Kubernetes setup around horizontal pod autoscaling (HPA), which scales the number of pods per deployment based on CPU and memory usage. HPA keeps CPU and memory per pod specified in the deployment manifest and scales horizontally as the load changes.

These were the areas of application wastage we observed on our platform:

  • As Grab’s traffic is uneven, we’d always have to provision for peak traffic. As users would not (or could not) always account for ramps, they would be fairly liberal with setting limit values (CPU and memory), leading to resource wastage.
  • Pods often had uneven traffic distribution despite fairly even partition load distribution in Kafka. The Stream Processing Framework(SPF) is essentially Kafka consumers consuming from Kafka topics, hence the number of pods scaling in and out resulted in unequal partition load per pod.

Vertically Scaling with Fixed Number of Pods

We initially kept the number of pods for a pipeline equal to the number of partitions in the topic the pipeline consumes from. This ensured even distribution of partitions to each pod providing balanced consumption. In order to abstract this from the end user, we automated the application deployment process to directly call the Kafka API to fetch the number of partitions during runtime.

After achieving a fixed number of pods for the pipeline, we wanted to move away from HPA. We wanted our pods to scale up and down as the load increases or decreases without any manual intervention. Vertical pod autoscaling (VPA) solves this problem as it relieves us from any manual operation for setting up resources for our deployment.

We just deploy the application and let VPA handle the resources required for its operation. It’s known to not be very susceptible to quick load changes as it trains its model to monitor the deployment’s load trend over a period of time before recommending an optimal resource. This process ensures the optimal resource allocation for our pipelines considering the historic trends on throughput.

We saw a ~45% reduction in our total resource usage vs resource requested after moving to VPA with a fixed number of pods from HPA.

Anatomy of a Processing Pod

Managing Availability

We broadly classify our workloads as latency sensitive (critical) and latency tolerant (non-critical). As a result, we could optimize scheduling and cost efficiency using priority classes and overprovisioning on heterogeneous node types on AWS.

Kubernetes Priority Classes

The main cost of running EKS in AWS is attributed to the EC2 machines that form the worker nodes for the Kubernetes cluster. Running On-Demand brings all the guarantees of instance availability but it is definitely very expensive. Hence, our first action to drive cost optimisation was to include Spot instances in our worker node group.

With the uncertainty of losing a spot instance, we started assigning priority to our various applications. We then let the user choose the priority of their pipeline depending on their use case. Different priorities would result in different node affinity to different kinds of instance groups (On-Demand/Spot). For example, Critical pipelines (latency sensitive) run on On-Demand worker node groups and Non-critical pipelines (latency tolerant) on Spot instance worker node groups.

We use priority class as a method of preemption, as well as a node affinity that chooses a certain priority pipeline for the node group to deploy to.


With spot instances running we realised a need to make our cluster quickly respond to failures. We wanted to achieve quick rescheduling of evicted pods, hence we added overprovisioning to our cluster. This means we keep some noop pods occupying free space running in our worker node groups for the quick scheduling of evicted or deploying pods.

The overprovisioned pods are the lowest priority pods, thus can be preempted by any pod waiting in the queue for scheduling. We used cluster proportional autoscaler to decide the right number of these overprovisioned pods, which scales up and down proportionally to cluster size (i.e number of nodes and CPU in worker node group). This relieves us from tuning the number of these noop pods as the cluster scales up or down over the period keeping the free space proportional to current cluster capacity.

Lastly, overprovisioning also helped improve the deployment time because there is no  dependency on the time required for Auto Scaling Groups (ASG) to add a new node to the cluster every time we want to deploy a new application.

Future Improvements

Evolution is an ongoing process. In the next few months, we plan to work on custom resources for combining VPA and fixed deployment size. Our current architecture setup works fine for now, but we would like to create a more tuneable in-house CRD(Custom Resource Definition) for VPA that incorporates rightsizing our Kubernetes deployment horizontally.

Authored By Shubham Badkur on behalf of the Coban team at Grab – Ryan Ooi, Karan Kamath, Hui Yang, Yuguang Xiao, Jump Char, Jason Cusick, Shrinand Thakkar, Dean Barlan, Shivam Dixit, Andy Nguyen, and Ravi Tandon.

Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving South East Asia forward, apply to join our team today.

Go Modules- A guide for monorepos (Part 2)

Post Syndicated from Grab Tech original https://engineering.grab.com/go-module-a-guide-for-monorepos-part-2

This is the second post on the Go module series, which highlights Grab’s experience working with Go modules in a multi-module monorepo. In this article, we’ll focus on suggested solutions for catching unexpected changes to the go.mod file and addressing dependency issues. We’ll also cover automatic upgrades and other learnings uncovered from the initial obstacles in using Go modules.

Vendoring process issues

Our previous vendoring process fell solely on the developer who wanted to add or update a dependency. However, it was often the case that the developer came across many unexpected changes, due to previous vendoring attempts, accidental imports and changes to dependencies.

The developer would then have to resolve these issues before being able to make a change, costing time and causing frustration with the process. It became clear that it wasn’t practical to expect the developer to catch all of the potential issues while vendoring, especially since Go modules itself was new and still in development.

Avoiding unexpected changes

Reluctantly, we added a check to our CI process which ran on every merge request. This helped ensure that there are no unexpected changes required to go mod. This added time to every build and often flagged a failure, but it saved a lot of post-merge hassle. We then realized that we should have done this from the beginning.

Since we hadn’t enabled Go modules for builds yet, we couldn’t rely on the \mod=readonly flag. We implemented the check by running go mod vendor and then checking the resulting difference.

If there were any changes to go.mod or the vendor directory, the merge request would get rejected. This worked well in ensuring the integrity of our go.mod.

Roadblocks and learnings

However, as this was the first time we were using Go modules on our CI system, it uncovered some more problems.

Private repository access

There was the problem of accessing private repositories. We had to ensure that the CI system was able to clone all of our private repositories as well as the main monorepo, by adding the relevant SSH deploy keys to the repository.

False positives

The check sometimes fired false positives – detecting a go mod failure when there were no changes. This was often due to network issues, especially when the modules are hosted by less reliable third-party servers. This is somewhat solved in Go 1.13 onwards with the introduction of proxy servers, but our workaround was simply to retry the command several times.

We also avoided adding dependencies hosted by a domain that we haven’t seen before, unless absolutely necessary.

Inconsistent Go versions

We found several inconsistencies between Go versions – running go mod vendor on one Go version gave different results to another. One example was a change to the checksums. These inconsistencies are less common now, but still remain between Go 1.12 and later versions. The only solution is to stick to a single version when running the vendoring process.

Automated upgrades

There are benefits to using Go modules for vendoring. It’s faster than previous solutions, better supported by the community and it’s part of the language, so it doesn’t require any extra tools or wrappers to use it.

One of the most useful benefits from using Go modules is that it enables automated upgrades of dependencies in the go.mod file – and it becomes more useful as more third-party modules adopt Go modules and semantic versioning.

Automated updates workflow
Automated updates workflow

We call our solution for automating updates at Grab the AutoVend Bot. It is built around a single Go command, go list -m -u all, which finds and lists available updates to the dependencies listed in go.mod (add \json for JSON output). We integrated the bot with our development workflow and change-request system to take the output from this command and create merge requests automatically, one per update.

Once the merge request is approved (by a human, after verifying the test results), the bot would push the change. We have hundreds of dependencies in our main monorepo module, so we’ve scheduled it to run a small number each day so we’re not overwhelmed.

By reducing the manual effort required to update dependencies to almost nothing, we have been able to apply hundreds of updates to our dependencies, and ensure our most critical dependencies are on the latest version. This not only helps keep our dependencies free from bugs and security flaws, but it makes future updates far easier and less impactful by reducing the set of changes needed.

In Summary

Using Go modules for vendoring has given us valuable and low-risk exposure to the feature. We have been able to detect and solve issues early, without affecting our regular builds, and develop tooling that’ll help us in future.

Although Go modules is part of the standard Go toolchain, it shouldn’t be viewed as a complete off the shelf solution that can be dropped into a codebase, especially a monorepo.

Like many other Go tools, the Modules feature comprises many small, focused tools that work best when combined together with other code. By embracing this concept and leveraging things like go list, go mod graph and go mod vendor, Go modules can be made to integrate into existing workflows, and deliver the benefits of structured versioning and reproducible builds.

I hope you have enjoyed this article on using Go modules and vendoring within a monorepo.

Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving South East Asia forward, apply to join our team today.


The cute Go gopher logo for this blog’s cover image was inspired by Renee French’s original work.

Go Modules- A guide for monorepos (Part 1)

Post Syndicated from Grab Tech original https://engineering.grab.com/go-module-a-guide-for-monorepos-part-1

Go modules are a new feature in Go for versioning packages and managing dependencies. It has been almost 2 years in the making, and it’s finally production-ready in the Go 1.14 release early this year. Go recommends using single-module repositories by default, and warns that multi-module repositories require great care.

At Grab, we have a large monorepo and changing from our existing monorepo structure has been an interesting and humbling adventure. We faced serious obstacles to fully adopting Go modules. This series of articles describes Grab’s experience working with Go modules in a multi-module monorepo, the challenges we faced along the way, and the solutions we came up with.

To fully appreciate Grab’s journey in using Go Modules, it’s important to learn about the beginning of our vendoring process.

Native support for vendoring using the vendor folder

With Go 1.5 came the concept of the vendor folder, a new package discovery method, providing native support for vendoring in Go for the first time.

With the vendor folder, projects influenced the lookup path simply by copying packages into a vendor folder nested at the project root. Go uses these packages before traversing the GOPATH root, which allows a monorepo structure to vendor packages within the same repo as if they were 3rd-party libraries. This enabled go build to work consistently without any need for extra scripts or env var modifications.

Initial obstacles

There was no official command for managing the vendor folder, and even copying the files in the vendor folder manually was common.

At Grab, different teams took different approaches. This meant that we had multiple version manifests and lock files for our monorepo’s vendor folder. It worked fine as long as there were no conflicts. At this time very few 3rd-party libraries were using proper tagging and semantic versioning, so it was worse because the lock files were largely a jumble of commit hashes and timestamps.

Jumbled commit hashes and timestamps
Jumbled commit hashes and timestamps

As a result of the multiple versions and lock files, the vendor directory was not reproducible, and we couldn’t be sure what versions we had in there.

Temporary relief

We eventually settled on using Glide, and standardized our vendoring process. Glide gave us a reproducible, verifiable vendor folder for our dependencies, which worked up until we switched to Go modules.

Vendoring using Go modules

I first heard about Go modules from Russ Cox’s talk at GopherCon Singapore in 2018, and soon after started working on adopting modules at Grab, which was to manage our existing vendor folder.

This allowed us to align with the official Go toolchain and familiarise ourselves with Go modules while the feature matured.

Switching to go mod

Go modules introduced a go mod vendor command for exporting all dependencies from go.mod into vendor. We didn’t plan to enable Go modules for builds at this point, so our builds continued to run exactly as before, indifferent to the fact that the vendor directory was created using go mod.

The initial task to switch to go mod vendor was relatively straightforward as listed here:

  1. Generated a go.mod file from our glide.yaml dependencies. This was scripted so it could be kept up to date without manual effort.
  2. Replaced the vendor directory.
  3. Committed the changes.
  4. Used go mod instead of glide to manage the vendor folder.

The change was extremely large (due to differences in how glide and go mod handled the pruning of unused code), but equivalent in terms of Go code. However, there were some additional changes needed besides porting the version file.

Addressing incompatible dependencies

Some of our dependencies were not yet compatible with Go modules, so we had to use Go module’s replace directive to substitute them with a working version.

A more complex issue was that parts of our codebase relied on nested vendor directories, and had dependencies that were incompatible with the top level. The go mod vendor command attempts to include all code nested under the root path, whether or not they have used a sub-vendor directory, so this led to conflicts.

Problematic paths

Rather than resolving all the incompatibilities, which would’ve been a major undertaking in the monorepo, we decided to exclude these paths from Go modules instead. This was accomplished by placing an empty go.mod file in the problematic paths.

Nested modules

The empty go.mod file worked. This brought us to an important rule of Go modules, which is central to understanding many of the issues we encountered:

A module cannot contain other modules

This means that although the modules are within the same repository, Go modules treat them as though they are completely independent. When running go mod commands in the root of the monorepo, Go doesn’t even ‘see’ the other modules nested within.

Tackling maintenance issues

After completing the initial migration of our vendor directory to go mod vendor however, it opened up a different set of problems related to maintenance.

With Glide, we could guarantee that the Glide files and vendor directory would not change unless we deliberately changed them. This was not the case after switching to Go modules; we found that the go.mod file frequently required unexpected changes to keep our vendor directory reproducible.

There are two frequent cases that cause the go.mod file to need updates: dependency inheritance and implicit updates.

Dependency inheritance

Dependency inheritance is a consequence of Go modules version selection. If one of the monorepo’s dependencies uses Go modules, then the monorepo inherits those version requirements as well.

When starting a new module, the default is to use the latest version of dependencies. This was an issue for us as some of our monorepo dependencies had not been updated for some time. As engineers wanted to import their module from the monorepo, it caused go mod vendor to pull in a huge amount of updates.

To solve this issue, we wrote a quick script to copy the dependency versions from one module to another.

One key learning here is to have other modules use the monorepo’s versions, and if any updates are needed then the monorepo should be updated first.

Implicit updates

Implicit updates are a more subtle problem. The typical Go modules workflow is to use standard Go commands: go build, go test, and so on, and they will automatically update the go.mod file as needed. However, this was sometimes surprising, and it wasn’t always clear why the go.mod file was being updated. Some of the reasons we found were:

  • A new import was added by mistake, causing the dependency to be added to the go.mod file
  • There is a local replace for some module B, and B changes its own go.mod. When there’s a local replace, it bypasses versioning, so the changes to B’s go.mod are immediately inherited.
  • The build imports a package from a dependency that can’t be satisfied with the current version, so Go attempts to update it.

This means that simply creating a tag in an external repository is sometimes enough to affect the go.mod file, if you already have a broken import in the codebase.

Resolving unexpected dependencies using graphs

To investigate the unexpected dependencies, the command go mod graph proved the most useful.

Running graph with good old grep was good enough, but its output is also compatible with the digraph tool for more sophisticated queries. For example, we could use the following command to trace the source of a dependency on cloud.google.com/go:

$ go mod graph | digraph somepath grab.com/example cloud.google.com/[email protected]

github.com/hashicorp/vault/[email protected] github.com/hashicorp/vault/[email protected]

github.com/hashicorp/vault/[email protected] google.golang.org/[email protected]

google.golang.org/[email protected] google.golang.org/[email protected]

google.golang.org/[email protected] cloud.google.com/[email protected]
Diagram generated using modgraphviz
Diagram generated using modgraphviz

Stay tuned for more

I hope you have enjoyed this article. In our next post, we’ll cover the other solutions we have for catching unexpected changes to the go.mod file and addressing dependency issues.

Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving South East Asia forward, apply to join our team today.


The cute Go gopher logo for this blog’s cover image was inspired by Renee French’s original work.

Plumbing At Scale

Post Syndicated from Grab Tech original https://engineering.grab.com/plumbing-at-scale

When you open the Grab app and hit book, a series of events are generated that define your personalised experience with us: booking state machines kick into motion, driver partners are notified, reward points are computed, your feed is generated, etc. While it is important for you to know that a request has been received, a lot happens asynchronously in our back-end services.

As custodians and builders of the streaming platform at Grab operating at massive scale (think terabytes of data ingress each hour), the Coban team’s mission is to provide a NoOps, managed platform for seamless, secure access to event streams in real-time, for every team at Grab.

Coban Sewu Waterfall In Indonesia
Coban Sewu Waterfall In Indonesia. (Streams, get it?)

Streaming systems are often at the heart of event-driven architectures, and what starts as a need for a simple message bus for asynchronous processing of events quickly evolves into one that requires a more sophisticated stream processing paradigms.
Earlier this year, we saw common patterns of event processing emerge across our Go backend ecosystem, including:

  • Filtering and mapping stream events of one type to another
  • Aggregating events into time windows and materializing them back to the event log or to various types of transactional and analytics databases

Generally, a class of problems surfaced which could be elegantly solved through an event sourcing1 platform with a stream processing framework built over it, similar to the Keystone platform at Netflix2.

This article details our journey building and deploying an event sourcing platform in Go, building a stream processing framework over it, and then scaling it (reliably and efficiently) to service over 300 billion events a week.

Event Sourcing

Event sourcing is an architectural pattern where changes to an application state are stored as a sequence of events, which can be replayed, recomputed, and queried for state at any time. An implementation of the event sourcing pattern typically has three parts to it:

  • An event log
  • Processor selection logic: The logic that selects which chunk of domain logic to run based on an incoming event
  • Processor domain logic: The domain logic that mutates an application’s state
Event Sourcing
Event Sourcing

Event sourcing is a building block on which architectural patterns such as Command Query Responsibility Segregation3, serverless systems, and stream processing pipelines are built.

The Case For Stream Processing

Below are some use cases serviced by stream processing, built on event sourcing.

Asynchronous State Management

A pub-sub system allows for change events from one service to be fanned out to multiple interested subscribers without letting any one subscriber block the progress of others. Abstracting the event log and centralising it democratises access to this log to all back-end services. It enables the back-end services to apply changes from this centralised log to their own state, independent of downstream services, and/or publish their state changes to it.

Time Windowed Aggregations

Time-windowed aggregates are a common requirement for machine learning models (as features) as well as analytics. For example, personalising the Grab app landing page requires counting your interaction with various widget elements in recent history, not any one event in particular. Similarly, an analyst may not be interested in the details of a singular booking in real-time, but in building demand heatmaps segmented by geohashes. For latency-sensitive lookups, especially for the personalisation example, pre-aggregations are preferred instead of post-aggregations.

Stream Joins, Filtering, Mapping

Event logs are typically sharded by some notion of topics to logically divide events of interest around a theme (booking events, profile updates, etc.). Building bigger topics out of smaller ones, as well as smaller ones from bigger ones are common ways to compose “substreams” of the log of interest directed towards specific services. For example, a promo service may only be interested in listening to booking events for promotional bookings.

Realtime Business Intelligence

Outputs of stream processing workloads are also plugged into realtime Business Intelligence (BI) and stream analytics solutions upstream, as raw data for visualizations on operations dashboards.


For offline analytics, as well as reconciliation and disaster recovery, having an archive in a cold store helps for certain mission critical streams.

Platform Requirements

Any processing platform for event sourcing and stream processing has certain expectations around its functionality.

Scaling and Elasticity

Stream/Event Processing pipelines need to be elastic and responsive to changes in traffic patterns, especially considering that user activity (rides, food, deliveries, payments) varies dramatically during the course of a day or week. A spike in food orders on rainy days shouldn’t cause indefinite order processing latencies.


For a platform team, it’s important that users can easily onboard and manage their pipeline lifecycles, at their preferred cadence. To scale effectively, the process of scaffolding, configuring, and deploying pipelines needs to be standardised, and infrastructure managed. Both the platform and users are able to leverage common standards of telemetry, configuration, and deployment strategies, and users benefit from a lack of infrastructure management overhead.


Our platform has quickly scaled to support hundreds of pipelines. Workload isolation, independent processing uptime guarantees, and resource allocation and cost audit are important requirements necessitating multi-tenancy, which help amortize platform overhead costs.


Whether latency sensitive or latency tolerant, all workloads have certain expectations on processing uptime. From a user’s perspective, there must be guarantees on pipeline uptimes and data completeness, upper bounds on processing delays, instrumentation for alerting, and self-healing properties of the platform for remediation.

Tunable Tradeoffs

Some pipelines are latency sensitive, and rely on processing completeness seconds after event ingress. Other pipelines are latency tolerant, and can tolerate disruption to processing lasting in tens of minutes. A one size fits all solution is likely to be either cost inefficient or unreliable. Having a way for users to make these tradeoffs consciously becomes important for ensuring efficient processing guarantees at a reasonable cost. Similarly, in the case of upstream failures or unavailability, being able to tune failure modes (like wait, continue, or retry) comes in handy.

Stream Processing Framework

While basic event sourcing covers simple use cases like archival, more complicated ones benefit from a common framework that shifts the mental model for processing from per event processing to stream pipeline orchestration.
Given that Go is a “paved road” for back-end development at Grab, and we have service code and bindings for streaming data in a mono-repository, we built a Go framework with a subset of capabilities provided by other streaming frameworks like Flink4.

Logic Blocks In A Stream Processing Pipeline
Logic Blocks In A Stream Processing Pipeline


Some capabilities built into the framework include:

  • Deduplication: Enables pipelines to idempotently reprocess data in case of rewinds/replays, and provides some processing guarantees within a time window for certain use cases including sinking to datastores.
  • Filtering and Mapping: An ability to filter a source stream data and map them onto target streams.
  • Aggregation: An ability to generate and execute aggregation logic such as sum, avg, max, and min in a window.
  • Windowing: An ability to window processing into tumbling, sliding, and session windows.
  • Join: An ability to combine two streams together with certain join keys in a window.
  • Processor Chaining: Various functionalities can be chained to build more complicated pipelines from simpler ones. For example: filter a large stream into a smaller one, aggregate it over a time window, and then map it to a new stream.
  • Rewind: The ability to rewind the processing logic by a few hours through configuration.
  • Replay: The ability to replay archived data into the same or a separate pipeline via configuration.
  • Sinks: A number of connectors to standard Grab stores are provided, with concerns of auth, telemetry, etc. managed in the runtime.
  • Error Handling: Providing an easy way to indicate whether to wait, skip, and/or retry in case of upstream failures is an important tuning parameter that users need for making sensible tradeoffs in dimensions of backpressure, latency, correctness, etc.


Coban Platform
Coban Platform

Our event log is primarily a bunch of critical Kafka clusters, which are being polled by various pipelines deployed by service teams on the platform for incoming events. Each pipeline is an isolated deployment, has an identity, and the ability to connect to various upstream sinks to materialise results into, including the event log itself.
There is also a metastore available as an intermediate store for processing pipelines, so the pipelines themselves are stateless with their lifecycle completely beholden to the whims of their owners.

Anatomy of a Processing Pipeline

Anatomy Of A Stream Processing Pod
Anatomy Of A Stream Processing Pod

Anatomy of a Stream Processing Pod
Each stream processing pod (the smallest unit of a pipeline’s deployment) has three top level components:

  • Triggers: An interface that connects directly to the source of the data and converts it into an event channel.
  • Runtime: This is the app’s entrypoint and the orchestrator of the pod. It manages the worker pools, triggers, event channels, and lifecycle events.
  • The Pipeline plugin: The plugin is provided by the user, and conforms to a contract that the platform team publishes. It contains the domain logic for the pipeline and houses the pipeline orchestration defined by a user based on our Stream Processing Framework.

Deployment Infrastructure

Our deployment infrastructure heavily leverages Kubernetes on AWS. After a (pretty high) initial cost for infrastructure set up, we’ve found scaling to hundreds of pipelines a breeze with the Kubernetes provided controls. We package our stateless pipeline workloads into Kubernetes deployments, with each pod containing a unit of a stream pipeline, with sidecars that integrate them with our monitoring systems. Other cluster wide tooling deployed (usually as DaemonSets) deal with metric collection, log ingestion, and autoscaling. We currently use the Horizontal Pod Autoscaler5 to manage traffic elasticity, and the Cluster Autoscaler6 to manage worker node scaling.

A Typical Kubernetes Set Up On AWS


Some pipelines require storage for use cases ranging from deduplication to stores for materialised results of time windowed aggregations. All our pipelines have access to clusters of ScyllaDB instances (which we use as our internal store), made available to pipeline authors via interfaces in the Stream Processing Framework. Results of these aggregations are then made available to backend services via our GrabStats service, which is a thin query layer over the latest pipeline results.

Compute Isolation

A nice property of packaging pipelines as Kubernetes deployments is a good degree of compute workload isolation for pipelines. While node resources of pipeline pods are still shared (and there are potential noisy neighbour issues on matters like logging throughput), the pipeline pods of various pods can be scheduled and rescheduled across a wide range of nodes safely and swiftly, with minimal impact to pods of other pipelines.


Stateless processing pods mean we can set up backup or redundant Kubernetes clusters in hot-hot, hot-warm, or hot-cold modes. We use this to ensure high processing availability despite limited control plane guarantees from any single cluster. (Since EKS SLAs for the Kubernetes control plane guarantee only 99.9% uptime today7.) Transparent to our users, we make the deployment systems aware of multiple available targets for scheduling.

Availability vs Cost

As alluded to in the “Platform Requirements” section, having a way of trading off availability for cost becomes important where the requirements and criticality of each processing pipeline are very different. Given that AWS spot instances are a lot cheaper8 than on-demand ones, we use user annotated Kubernetes priority classes to determine deployment targets for pipelines. For latency tolerant pipelines, we schedule them on Spot instances which are routinely between 40-90% cheaper than on demand instances on which latency sensitive pipelines run. The caveat is that Spot instances occasionally disappear, and these workloads are disrupted until a replacement node for their scheduling can be found.

What’s Next?

  • Expand the ecosystem of triggers to support custom sources of data i.e. the “event log”, as well as push based (RPC driven) versus just pull based triggers
  • Build a control plane for API integration with pipeline lifecycle management
  • Move some workloads to use the Vertical Pod Autoscaler9 in Kubernetes instead of horizontal scaling, as most of our workloads have a limit on parallelism (which is their partition count in Kafka topics)
  • Move from Go plugins for pipelines to plugins over RPC, like what HashiCorp does10, to enable processing logic in non-Go languages.
  • Use either pod gossip or a service mesh with a control plane to set up quotas for shared infrastructure usage per pipeline. This is to protect upstream dependencies and the metastore from surges in event backlogs.
  • Improve availability guarantees for pipeline pods by occasionally redistributing/rescheduling pods across nodes in our Kubernetes cluster to prevent entire workloads being served out of a few large nodes.

Authored By Karan Kamath on behalf of the Coban team at Grab-
Zezhou Yu, Ryan Ooi, Hui Yang, Yuguang Xiao, Ling Zhang, Roy Kim, Matt Hino, Jump Char, Lincoln Lee, Jason Cusick, Shrinand Thakkar, Dean Barlan, Shivam Dixit, Shubham Badkur, Fahad Pervaiz, Andy Nguyen, Ravi Tandon, Ken Fishkin, and Jim Caputo.


Coban Sewu Waterfall Photo by Dwinanda Nurhanif Mujito on Unsplash

Cover Photo by tian kuan on Unsplash

  1. https://martinfowler.com/eaaDev/EventSourcing.html 

  2. https://medium.com/netflix-techblog/keystone-real-time-stream-processing-platform-a3ee651812a 

  3. https://martinfowler.com/bliki/CQRS.html 

  4. https://flink.apache.org 

  5. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ 

  6. https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler 

  7. https://aws.amazon.com/eks/sla/ 

  8. https://aws.amazon.com/ec2/pricing/ 

  9. https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler 

  10. https://github.com/hashicorp/go-plugin 

How we implemented domain-driven development in Golang

Post Syndicated from Grab Tech original https://engineering.grab.com/domain-driven-development-in-golang

Partnerships have always been core to Grab’s super app strategy. We believe in collaborating with partners who are the best in what they do – combining their expertise with what we’re good at so that we can bring high-quality new services to our customers, at the same time create new opportunities for the merchant and driver-partners in our ecosystem.

That’s why we launched GrabPlatform last year. To make it easier for partners to either integrate Grab into their services, or integrate their services into Grab.

In view of that, part of the GrabPlatform’s team mission is to make it easy for partners to integrate with Grab services. These partners are external companies that would like to offer Grab’s services such as ride-booking through their own websites or applications. To do that, we decided to build a website that will serve as a one-stop-shop that would allow them to self-service these integrations.

The challenges we faced with the conventional approach

In the process of building this website, our team noticed that the majority of the functions and responsibilities were added to files without proper segregation. A single file would contain more than 500 lines of code. Each of these files were imported from different collections of source codes, resulting in an unstructured codebase. Any changes to the existing functions risked breaking existing functionality; we realized then that we needed to proactively plan for the future. Hence, we decided to use the principles of Domain-Driven Design (DDD) and idiomatic Go. This blog aims to demonstrate the process of how we leveraged those concepts to design a modern application.

How we implemented DDD in our codebase

Here’s how we went about solving our unstructured codebase using DDD principles.

Step 1: Gather domain (business) knowledge

We collaborated closely with our domain experts (in our case, this was our product team) to identify functionality and flow. From them, we discovered the following key points:

  • After creating a project, developers are added to the project.
  • The domain experts wanted an ability to add other products (e.g. Pricing service, ETA service, GrabPay service) to their projects.
  • They wanted the ability to create multiple authentication clients to access the above products.

Step 2: Break down domain knowledge into bounded context

Now that we had gathered the required domain knowledge (i.e. what our code needed to reflect to our partners), it was time to use the DDD strategic tool Bounded Context to break down problems into subcontexts. Here is a graphical representation of how we converted the problem into smaller units.

Bounded Context

We identified several dependencies on each of the units involved in the project. Take some of these examples:

  • The project domain overlapped with the product and developer domains.
  • Our RideBooking project can only exist if it has some products like Ridebooking APIs and not the other way around.

What this means is a product can exist independent of the project, but a project will have no significance without any product. In the same way, a project is dependent on the developers, but developers can exist whether or not they belong to a project.

Step 3: Identify value objects or entity (lowest layer)

Looking at the above bounded contexts, we figured out the building blocks (i.e. value objects or entity) to break down the above functionality and flow.

// ProjectDAO ...
type ProjectDAO struct {
  ID            int64
  UUID          string
  Status        ProjectStatus
  CreatedAt     time.Time

// DeveloperDAO ...
type DeveloperDAO struct {
  ID            int64
  UUID          string
  PhoneHash     *string
  Status        Status
  CreatedAt     time.Time

// ProductDAO ...
type ProductDAO struct {
  ID            int64
  UUID          string
  Name          string
  Description   *string
  Status        ProductStatus
  CreatedAt     time.Time

// DeveloperProjectDAO to map developer's to a project
type DeveloperProjectDAO struct {
  ID            int64
  DeveloperID   int64
  ProjectID     int64
  Status        DeveloperProjectStatus

// ProductProjectDAO to map product's to a project
type ProductProjectDAO struct {
  ID            int64
  ProjectID     int64
  ProductID     int64
  Status        ProjectProductStatus

All the objects shown above have ID as a field and can be identifiable, hence they are identified as entities and not as value objects. But if we apply domain knowledge, DeveloperProjectDAO and ProductProjectDAO are actually not independent entities. Project object is the aggregate root since it must exist before the child fields, DevProjectDAO and ProdcutProjectDAO, can exist.

Step 4: Create the repositories

As stated above, we created an interface to abstract the working logic of a particular domain (i.e. Repository). Here is an example of how we designed the repositories:

// ProductRepositoryImpl responsible for product functionality
type ProductRepositoryImpl struct {
  productDao storage.IProductDao // private field

type ProductRepository interface {
  GetProductsByIDs(ctx context.Context, ids []int64) ([]IProduct, error)

// DeveloperRepositoryImpl
type DeveloperRepositoryImpl struct {
  developerDAO storage.IDeveloperDao // private field

type DeveloperRepository interface {
  FindActiveAllowedByDeveloperIDs(ctx context.Context, developerIDs []interface{}) ([]*Developer, error)
  GetDeveloperDetailByProfile(ctx context.Context, developerProfile *appdto.DeveloperProfile) (IDeveloper, error)

Here is a look at how we designed our repository for aggregate root project:

// Unexported Struct
type productProjectRepositoryImpl struct {
  productProjectDAO storage.IProjectProductDao // private field

type ProductProjectRepository interface {
  GetAllProjectProductByProjectID(ctx context.Context, projectID int64) ([]*ProjectProduct, error)

// Unexported Struct
type developerProjectRepositoryImpl struct {
  developerProjectDAO storage.IDeveloperProjectDao // private field

type DeveloperProjectRepository interface {
  GetDevelopersByProjectIDs(ctx context.Context, projectIDs []interface{}) ([]*DeveloperProject, error)
  UpdateMappingWithRole(ctx context.Context, developer IDeveloper, project IProject, role string) (*DeveloperProject, error)

// Unexported Struct
type projectRepositoryImpl struct {
  projectDao storage.IProjectDao // private field

type ProjectRepository interface {
  GetProjectsByIDs(ctx context.Context, projectIDs []interface{}) ([]*Project, error)
  GetActiveProjectByUUID(ctx context.Context, uuid string) (IProject, error)
  GetProjectByUUID(ctx context.Context, uuid string) (*Project, error)

type ProjectAggregatorImpl struct {
  projectRepositoryImpl           // private field
  developerProjectRepositoryImpl  // private field
  productProjectRepositoryImpl    // private field

type ProjectAggregator interface {
  GetProjects(ctx context.Context) ([]*dto.Project, error)
  AddDeveloper(ctx context.Context, request *appdto.AddDeveloperRequest) (*appdto.AddDeveloperResponse, error)
  GetProjectWithProducts(ctx context.Context, uuid string) (IProject, error)

Step 5: Identify Domain Events

The functions described in Step 4 only returns the ID of the developer and product, which conveys no information to the users. In order to provide developer and product information, we use the domain-event technique to return the actual product and developer attributes.

A domain event is something that happened in a bounded context that you want another context of a domain to be aware of. For example, if there are new updates to the developer domain, it’s important to convey these updates to the project domain. This propagation technique is termed as domain event. Domain events enable independence between different classes.

One way to implement it is seen here:

// file: project\_aggregator.go
func (p *ProjectAggregatorImpl) GetProjects(ctx context.Context) ([]*dto.Project, error) {
  developers := p.EventHandler.Handle(DomainEvent.FindDeveloperByDeveloperIDs{DeveloperIDs})

// file: event\_type.go
type FindDeveloperByDeveloperIDs struct{ developerID []interface{} }

// file: event\_handler.go
func (e *EventHandler) Handle(event interface{}) interface{} {
  switch op := event.(type) {
      case FindDeveloperByDeveloperIDs:
            developers, _ := e.developerRepository.FindDeveloperByDeveloperIDs(op.developerIDs)
            return developers
      case ....
Domain Event

Some common mistakes to avoid when implementing DDD in your codebase:

  • Not engaging with domain experts. Not interacting with domain experts is a common mistake when using DDD. Talking to domain experts to get an understanding of the problem domain from their perspective is at the core of DDD. Starting with schemas or data modelling instead of talking to domain experts may create code based on a relational model instead of it built around a domain model.
  • Ignoring the language of the domain experts. Creating a ubiquitous language shared with domain experts is also a core DDD practice. This common language must be used in all discussions as well as in the code, e.g. in class and method names.
  • Not identifying bounded contexts. A common approach to solving a complex problem is breaking it down into smaller parts. Creating bounded contexts is breaking down a large domain into smaller ones, each handling one cohesive part of the domain.
  • Using an anaemic domain model. This is a common sign that a team is not doing DDD and often a symptom of a failure in the modelling process. At first, an anaemic domain model often looks like a real domain model with correct names, but the classes lack functionalities. They contain only the Get and Set methods.

How the DDD model improved our software development

Thanks to this brand new clean up, we achieved the following:

  • Core functionalities are evenly distributed to the overall codebase and not limited to just a few files.
  • The developers are aware of what each folder is responsible for by simply looking at the file naming and folder structure.
  • The risk of breaking major functionalities by merely making small changes is greatly reduced. Changing a feature is now more efficient.

The team now finds the code well structured and we require less hand-holding for onboarders, thanks to the simplicity of the structure.

Finally, the most important thing, we now have a system oriented towards our business necessities. Everyone ends up using the same language and terms. Developers communicate better with the business team. The work is more efficient when it comes to establishing solutions for the models that reflect how the business operates, instead of how the software operates.

Lessons Learnt

  • Use DDD to collaborate among all project disciplines (product, business, partner, and so on) and clearly understand the business requirements.
  • Establish a ubiquitous language to discuss domain-related concepts.
  • Use bounded contexts to break down complex domains into manageable parts.
  • Implement a layered architecture (i.e. DDD building blocks) to focus on particular aspects of the application.
  • To simplify your dependency, use domain event to communicate with sub-bounded context.

Preventing Pipeline Calls from Crashing Redis Clusters

Post Syndicated from Grab Tech original https://engineering.grab.com/preventing-pipeline-calls-from-crashing-redis-clusters


On Feb 15th, 2019, a slave node in Redis, an in-memory data structure storage, failed requiring a replacement. During this period, roughly only 1 in 21 calls to Apollo, a primary transport booking service, succeeded. This brought Grab rides down significantly for the one minute it took the Redis Cluster to self-recover. This behavior was totally unexpected and completely breached our intention of having multiple replicas.

This blog post describes Grab’s outage post-mortem findings.

Understanding the infrastructure

With Grab’s continuous growth, our services must handle large amounts of data traffic involving high processing power for reading and writing operations. To address this significant growth, reduce handler latency, and improve overall performance, many of our services use Redis – a common in-memory data structure storage – as a cache, database, or message broker. Furthermore, we use a Redis Cluster, a distributed implementation of Redis, for shorter latency and higher availability.

Apollo is our driver-side state machine. It is on almost all requests’ critical path and is a primary component for booking transport and providing great service for customer bookings. It stores individual driver availability in an AWS ElastiCache Redis Cluster, letting our booking service efficiently assign jobs to drivers. It’s critical to keep Apollo running and available 24/7.

Apollo's infrastructure

Because of Apollo’s significance, its Redis Cluster has 3 shards each with 2 slaves. It hashes all keys and, according to the hash value, divides them into three partitions. Each partition has two replications to increase reliability.

We use the Go-Redis client, a popular Redis library, to direct all written queries to the master nodes (which then write to their slaves) to ensure consistency with the database.

Master and slave nodes in the Redis Cluster

For reading related queries, engineers usually turn on the ReadOnly flag and turn off the RouteByLatency flag. These effectively turn on ReadOnlyFromSlaves in the Grab gredis3 library, so the client directs all reading queries to the slave nodes instead of the master nodes. This load distribution frees up master node CPU usage.

Client reading and writing from/to the Redis Cluster

When designing a system, we consider potential hardware outages and network issues. We also think of ways to ensure our Redis Cluster is highly efficient and available; setting the above-mentioned flags help us achieve these goals.

Ideally, this Redis Cluster configuration would not cause issues even if a master or slave node breaks. Apollo should still function smoothly. So, why did that February Apollo outage happen? Why did a single down slave node cause a 95+% call failure rate to the Redis Cluster during the dim-out time?

Let’s start by discussing how to construct a local Redis Cluster step by step, then try and replicate the outage. We’ll look at the reasons behind the outage and provide suggestions on how to use a Redis Cluster client in Go.

How to set up a local Redis Cluster

1. Download and install Redis from here.

2. Set up configuration files for each node. For example, in Apollo, we have 9 nodes, so we need to create 9 files like this with different port numbers(x).

// file_name: node_x.conf (do not include this line in file)

port 600x

cluster-enabled yes

cluster-config-file cluster-node-x.conf

cluster-node-timeout 5000

appendonly yes

appendfilename node-x.aof

dbfilename dump-x.rdb

3. Initiate each node in an individual terminal tab with:

$PATH/redis-4.0.9/src/redis-server node_1.conf

4. Use this Ruby script to create a Redis Cluster. (Each master has two slaves.)

$PATH/redis-4.0.9/src/redis-trib.rb create --replicas 2127.0.0.1:6001.....

>>> Performing Cluster Check (using node

M: 7b4a5d9a421d45714e533618e4a2b3becc5f8913

   slots:0-5460 (5461 slots) master

   2 additional replica(s)

S: 07272db642467a07d515367c677e3e3428b7b998

   slots: (0 slots) slave

   replicates 05363c0ad70a2993db893434b9f61983a6fc0bf8

S: 65a9b839cd18dcae9b5c4f310b05af7627f2185b

   slots: (0 slots) slave

   replicates 7b4a5d9a421d45714e533618e4a2b3becc5f8913

M: 05363c0ad70a2993db893434b9f61983a6fc0bf8

   slots:10923-16383 (5461 slots) master

   2 additional replica(s)

S: a78586a7343be88393fe40498609734b787d3b01

   slots: (0 slots) slave

   replicates 72306f44d3ffa773810c810cfdd53c856cfda893

S: e94c150d910997e90ea6f1100034af7e8b3e0cdf

   slots: (0 slots) slave

   replicates 05363c0ad70a2993db893434b9f61983a6fc0bf8

M: 72306f44d3ffa773810c810cfdd53c856cfda893

   slots:5461-10922 (5462 slots) master

   2 additional replica(s)

S: ac6ffbf25f48b1726fe8d5c4ac7597d07987bcd7

   slots: (0 slots) slave

   replicates 7b4a5d9a421d45714e533618e4a2b3becc5f8913

S: bc56b2960018032d0707307725766ec81e7d43d9

   slots: (0 slots) slave

   replicates 72306f44d3ffa773810c810cfdd53c856cfda893

[OK] All nodes agree about slots configuration.

5. Finally, we try to send queries to our Redis Cluster, e.g.

$PATH/redis-4.0.9/src/redis-cli -c -p 6001 hset driverID 100 state available updated_at 11111

What happens when nodes become unreachable?

Redis Cluster Server

As long as the majority of a Redis Cluster’s masters and at least one slave node for each unreachable master are reachable, the cluster is accessible. It can survive even if a few nodes fail.

Let’s say we have N masters, each with K slaves, and random T nodes become unreachable. This algorithm calculates the Redis Cluster failure rate percentage:

if T <= K:
        availability = 100%
        availability = 100% - (1/(N*K - T))

If you successfully built your own Redis Cluster locally, try to kill any node with a simple command-c. The Redis Cluster broadcasts to all nodes that the killed node is now unreachable, so other nodes no longer direct traffic to that port.

If you bring this node back up, all nodes know it’s reachable again. If you kill a master node, the Redis Cluster promotes a slave node to a temp master for writing queries.

$PATH/redis-4.0.9/src/redis-server node_x.conf

With this information, we can’t answer the big question of why a single slave node failure caused an over 95% failure rate in the Apollo outage. Per the above theory, the Redis Cluster should still be 100% available. So, the Redis Cluster server could properly handle an outage, and we concluded it wasn’t the failure rate’s cause. So we looked at the client side and Apollo’s queries.

Golang Redis Cluster Client & Apollo Queries

Apollo’s client side is based on the Go-Redis Library.

During the Apollo outage, we found some code returned many errors during certain pipeline GET calls. When Apollo tried to send a pipeline of HMGET calls to its Redis Cluster, the pipeline returned errors.

First, we looked at the pipeline implementation code in the Go-Redis library. In the function defaultProcessPipeline, the code assigns each command to a Redis node in this line err:=c.mapCmdsByNode(cmds, cmdsMap).

func (c *ClusterClient) mapCmdsByNode(cmds []Cmder, cmdsMap *cmdsMap) error {
state, err := c.state.Get()
        if err != nil {
                setCmdsErr(cmds, err)

        cmdsAreReadOnly := c.cmdsAreReadOnly(cmds)
        for_, cmd := range cmds {
                var node *clusterNode
                var err error
                if cmdsAreReadOnly {
                        _, node, err = c.cmdSlotAndNode(cmd)
                } else {
                        slot := c.cmdSlot(cmd)
                        node, err = state.slotMasterNode(slot)
                if err != nil {
                cmdsMap.m[node] = append(cmdsMap.m[node], cmd)
        return nil

Next, since the readOnly flag is on, we look at the cmdSlotAndNode function. As mentioned earlier, you can get better performance by setting readOnlyFromSlaves to true, which sets RouteByLatency to false. By doing this, RouteByLatency will not take priority and the master does not receive the read commands.

func (c *ClusterClient) cmdSlotAndNode(cmd Cmder) (int, *clusterNode, error) {
        state, err := c.state.Get()
        if err != nil {
                return 0, nil, err

        cmdInfo := c.cmdInfo(cmd.Name())
        slot := cmdSlot(cmd, cmdFirstKeyPos(cmd, cmdInfo))

        if c.opt.ReadOnly && cmdInfo != nil && cmdInfo.ReadOnly {
                if c.opt.RouteByLatency {
                        node, err:= state.slotClosestNode(slot)
                        return slot, node, err

                if c.opt.RouteRandomly {
                        node:= state.slotRandomNode(slot)
                        return slot, node, nil

                node, err:= state.slotSlaveNode(slot)
                return slot, node, err

        node, err:= state.slotMasterNode(slot)
        return slot, node, err

Now, let’s try and better understand the outage.

  1. When a slave becomes unreachable, all commands assigned to that slave node fail.
  2. We found in Grab’s Redis library code that a single error in all cmds could cause the entire pipeline to fail.
  3. In addition, engineers return a failure in their code if err != nil. This explains the high failure rate during the outage.
func (w *goRedisWrapperImpl) getResultFromCommands(cmds []goredis.Cmder) ([]gredisapi.ReplyPair, error) {
        results := make([]gredisapi.ReplyPair, len(cmds))
        var err error
        for idx, cmd := range cmds {
                results[idx].Value, results[idx].Err = cmd.(*goredis.Cmd).Result()
                if results[idx].Err == goredis.Nil {
                        results[idx].Err = nil
                if err == nil && results[idx].Err != nil {
                        err = results[idx].Err

        return results, err

Our next question was, “Why did it take almost one minute for Apollo to recover?”.  The Redis Cluster broadcasts instantly to its other nodes when one node is unreachable. So we looked at how the client assigns jobs.

When the Redis Cluster client loads the node states, it only refreshes the state once a minute. So there’s a maximum one minute delay of state changes between the client and server. Within that minute, the Redis client kept sending queries to that unreachable slave node.

func (c *clusterStateHolder) Get() (*clusterState, error) {
        v := c.state.Load()
        if v != nil {
                state := v.(*clusterState)
                if time.Since(state.createdAt) > time.Minute {
                return state, nil
        return c.Reload()

What happened to the write queries? Did we lose new data during that one min gap? That’s a very good question! The answer is no since all write queries only went to the master nodes and the Redis Cluster client with a watcher for the master nodes. So, whenever any master node becomes unreachable, the client is not oblivious to the change in state and is well aware of the current state. See the Watcher code.

How to use Go Redis safely?

Redis Cluster Client

One way to avoid a potential outage like our Apollo outage is to create another Redis Cluster client for pipelining only and with a true RouteByLatency value. The Redis Cluster determines the latency according to ping calls to its server.

In this case, all pipelining queries would read through the master nodesif the latency is less than 1ms (code), and as long as the majority side of partitions are alive, the client will get the expected results. More load would go to master with this setting, so be careful about CPU usage in the master nodes when you make the change.

Pipeline Usage

In some cases, the master nodes might not handle so much traffic. Another way to mitigate the impact of an outage is to check for  errors on individual queries when errors happen in a pipeline call.

In Grab’s Redis Cluster library, the function Pipeline(PipelineReadOnly) returns a response with an error for individual reply.

func (c *clientImpl) Pipeline(ctx context.Context, argsList [][]interface{}) ([]gredisapi.ReplyPair, error) {
        defer c.stats.Duration(statsPkgName, metricElapsed, time.Now(), c.getTags(tagFunctionPipeline)...)
        pipe := c.wrappedClient.Pipeline()
        cmds := make([]goredis.Cmder, len(argsList))
        for i, args := range argsList {
                cmd := goredis.NewCmd(args...)
                cmds[i] = cmd
                _ = pipe.Process(cmd)
        _, _ = pipe.Exec()
        return c.wrappedClient.getResultFromCommands(cmds)

func (w *goRedisWrapperImpl) getResultFromCommands(cmds []goredis.Cmder) ([]gredisapi.ReplyPair, error) {
        results := make([]gredisapi.ReplyPair, len(cmds))
        var err error
        for idx, cmd := range cmds {
                results[idx].Value, results[idx].Err = cmd.(*goredis.Cmd).Result()
                if results[idx].Err == goredis.Nil {
                        results[idx].Err = nil
                if err == nil && results[idx].Err != nil {
                        err = results[idx].Err

        return results, err

type ReplyPair struct {
        Value interface{}
        Err   error

Instead of returning nil or an error message when err != nil, we could check for errors for each result so successful queries are not affected. This might have minimized the outage’s business impact.

Go Redis Cluster Library

One way to fix the Redis Cluster library is to reload nodes’ status when an error happens.In the go-redis library, defaultProcessor has this logic, which can be applied to defaultProcessPipeline.

In Conclusion

We’ve shown how to build a local Redis Cluster server, explained how Redis Clusters work, and identified its potential risks and solutions. Redis Cluster is a great tool to optimize service performance, but there are potential risks when using it. Please carefully consider our points about how to best use it. If you have any questions, please ask them in the comments section.

[$] A filesystem “change journal” and other topics

Post Syndicated from jake original https://lwn.net/Articles/755277/rss

At the 2017 Linux Storage, Filesystem, and Memory-Management Summit
(LSFMM), Amir Goldstein presented his work
on adding a superblock watch mechanism to provide a scalable way to notify
of changes in a filesystem. At the 2018 edition of LSFMM, he was back to
discuss adding NTFS-like change
to the kernel in support of backup solutions of various
sorts. As a second topic for the session, he also wanted to discuss doing
more performance-regression testing
for filesystems.

EC2 Instance Update – M5 Instances with Local NVMe Storage (M5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-m5-instances-with-local-nvme-storage-m5d/

Earlier this month we launched the C5 Instances with Local NVMe Storage and I told you that we would be doing the same for additional instance types in the near future!

Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:

Instance Name vCPUs RAM Local Storage EBS-Optimized Bandwidth Network Bandwidth
m5d.large 2 8 GiB 1 x 75 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.xlarge 4 16 GiB 1 x 150 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.2xlarge 8 32 GiB 1 x 300 GB NVMe SSD Up to 2.120 Gbps Up to 10 Gbps
m5d.4xlarge 16 64 GiB 1 x 600 GB NVMe SSD 2.210 Gbps Up to 10 Gbps
m5d.12xlarge 48 192 GiB 2 x 900 GB NVMe SSD 5.0 Gbps 10 Gbps
m5d.24xlarge 96 384 GiB 4 x 900 GB NVMe SSD 10.0 Gbps 25 Gbps

The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.



[$] Advanced computing with IPython

Post Syndicated from jake original https://lwn.net/Articles/756192/rss

If you use Python, there’s a good chance you have heard of IPython, which provides an enhanced read-eval-print
loop (REPL) for Python. But there is more to IPython than just a more
convenient REPL. Today’s IPython comes with integrated libraries that turn
it into an assistant for several advanced computing tasks. We will look at
two of those tasks, using multiple languages and distributed computing, in
this article.

Build your own weather station with our new guide!

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/build-your-own-weather-station/

One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”

Build Your Own weather station kit assembled

Tadaaaa! The BYO weather station fully assembled.

Our Oracle Weather Station

In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.

The original Raspberry Pi Oracle Weather Station HAT – Build Your Own Raspberry Pi weather station

The original Raspberry Pi Oracle Weather Station HAT

We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with a huge differences in climate, and they’ve even recorded the effects of a solar eclipse.

Our new BYO weather station guide

We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!

Build Your Own Raspberry Pi weather station

Fun with meteorological experiments!

Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.

Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.

Build Your Own Raspberry Pi weather station on a breadboard

There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.

Who should try this build

We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straight-forward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.

Build Your Own Raspberry Pi weather station – components

The sensors and components we’re suggesting balance cost, accuracy, and easy of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.

You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.

Prototyping HAT for Raspberry Pi weather station sensors

For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.

Our plans for the guide

Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!

*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.

The post Build your own weather station with our new guide! appeared first on Raspberry Pi.

Storing Encrypted Credentials In Git

Post Syndicated from Bozho original https://techblog.bozho.net/storing-encrypted-credentials-in-git/

We all know that we should not commit any passwords or keys to the repo with our code (no matter if public or private). Yet, thousands of production passwords can be found on GitHub (and probably thousands more in internal company repositories). Some have tried to fix that by removing the passwords (once they learned it’s not a good idea to store them publicly), but passwords have remained in the git history.

Knowing what not to do is the first and very important step. But how do we store production credentials. Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn’t seem to be an agreed upon solution.

I’ve previously argued with the 12-factor app recommendation to use environment variables – if you have a few that might be okay, but when the number of variables grow (as in any real application), it becomes impractical. And you can set environment variables via a bash script, but you’d have to store it somewhere. And in fact, even separate environment variables should be stored somewhere.

This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.

Such a repo would look like this:

└─── production
|   |   application.properites
|   |   keystore.jks
└─── staging
|   |   application.properites
|   |   keystore.jks
└─── on-premise-client1
|   |   application.properites
|   |   keystore.jks
└─── on-premise-client2
|   |   application.properites
|   |   keystore.jks

Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. It is “transparent” encryption because it supports diff and encryption and decryption on the fly. Once you set it up, you continue working with the repo as if it’s not encrypted. There’s even a fork that works on Windows.

You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:

secretfile filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
*.properties filter=git-crypt diff=git-crypt
*.jks filter=git-crypt diff=git-crypt

And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.

You’re almost done. We should somehow share and backup the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to backup your secret key (that’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.

git-crypt authors claim to shine when it comes to encrypting just a few files in an otherwise public repo. And recommend looking at git-remote-gcrypt. But as often there are non-sensitive parts of environment-specific configurations, you may not want to encrypt everything. And I think it’s perfectly fine to use git-crypt even in a separate repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo. Especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.

The outstanding questions in this case is – how do you sync the properties with code changes. Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments, but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.

The whole process of having versioned environment-speific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.

The post Storing Encrypted Credentials In Git appeared first on Bozho's tech blog.

Friday Squid Blogging: Do Cephalopods Contain Alien DNA?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/06/friday_squid_bl_627.html

Maybe not DNA, but biological somethings.

Cause of Cambrian explosion — Terrestrial or Cosmic?“:

Abstract: We review the salient evidence consistent with or predicted by the Hoyle-Wickramasinghe (H-W) thesis of Cometary (Cosmic) Biology. Much of this physical and biological evidence is multifactorial. One particular focus are the recent studies which date the emergence of the complex retroviruses of vertebrate lines at or just before the Cambrian Explosion of ~500 Ma. Such viruses are known to be plausibly associated with major evolutionary genomic processes. We believe this coincidence is not fortuitous but is consistent with a key prediction of H-W theory whereby major extinction-diversification evolutionary boundaries coincide with virus-bearing cometary-bolide bombardment events. A second focus is the remarkable evolution of intelligent complexity (Cephalopods) culminating in the emergence of the Octopus. A third focus concerns the micro-organism fossil evidence contained within meteorites as well as the detection in the upper atmosphere of apparent incoming life-bearing particles from space. In our view the totality of the multifactorial data and critical analyses assembled by Fred Hoyle, Chandra Wickramasinghe and their many colleagues since the 1960s leads to a very plausible conclusion — life may have been seeded here on Earth by life-bearing comets as soon as conditions on Earth allowed it to flourish (about or just before 4.1 Billion years ago); and living organisms such as space-resistant and space-hardy bacteria, viruses, more complex eukaryotic cells, fertilised ova and seeds have been continuously delivered ever since to Earth so being one important driver of further terrestrial evolution which has resulted in considerable genetic diversity and which has led to the emergence of mankind.

Two commentaries.

This is almost certainly not true.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.