Post Syndicated from Robert Graham original http://blog.erratasec.com/2016/10/wtf-yahoofisa-search-in-kernel.html
A surprising detail in the Yahoo/FISA email search scandal is that they do it with a kernel module. I thought I’d write up some (rambling) notes.
What the government was searching for
As described in the previoius blog post, we’ll assume the government is searching for the following string, and possibly other strings like it within emails:
### Begin ASRAR El Mojahedeen v2.0 Encrypted Message ###
I point this out because it’s simple search identifying things. It’s not natural language processing. It’s not searching for phrases like “bomb president”.
Also, it’s not AV/spam/childporn processing. Those look at different things. For example, filtering message containing childporn involves calculating a SHA2 hash of email attachments and looking up the hashes in a table of known bad content (or even more in-depth analysis). This is quite different from searching.
The Kernel vs. User Space
Operating systems have two parts, the kernel and user space. The kernel is the operating system proper (e.g. the “Linux kernel”). The software we run is in user space, such as browsers, word processors, games, web servers, databases, GNU utilities [sic], and so on.
The kernel has raw access to the machine, memory, network devices, graphics cards, and so on. User space has virtual access to these things. The user space is the original “virtual machines”, before kernels got so bloated that we needed a third layer to virtualize them too.
This separation between kernel and user has two main benefits. The first is security, controlling which bit of software has access to what. It means, for example, that one user on the machine can’t access another’s files. The second benefit is stability: if one program crashes, the others continue to run unaffected.
Downside of a Kernel Module
Writing a search program as a kernel module (instead of a user space module) defeats the benefits of user space programs, making the machine less stable and less secure.
Moreover, the sort of thing this module does (parsing emails) has a history of big gapping security flaws. Parsing stuff in the kernel makes cybersecurity experts run away screaming in terror.
On the other hand, people have been doing security stuff (SSL implementations and anti-virus scanning) in the kernel in other situations, so it’s not unprecedented. I mean, it’s still wrong, but it’s been done before.
Upside of a Kernel Module
If doing this is as a kernel module (instead of in user space) is so bad, then why does Yahoo do it? It’s probably due to the widely held, but false, belief that putting stuff in the kernel makes it faster.
Everybody knows that kernels are faster, for two reasons. First is that as a program runs, making a system call switches context, from running in user space to running in kernel space. This step is expensive/slow. Kernel modules don’t incur this expense, because code just jumps from one location in the kernel to another. The second performance issue is virtual memory, where reading memory requires an extra step in user space, to translate the virtual memory address to a physical one. Kernel modules access physical memory directly, without this extra step.
But everyone is wrong. Using features like hugepages gets rid of the cost of virtual memory translation cost. There are ways to mitigate the cost of user/kernel transitions, such as moving data in bulk instead of a little bit at a time. Also, CPUs have improved in recent years, dramatically reducing the cost of a kernel/user transition.
The problem we face, though, is inertia. Everyone knows moving modules into the kernel makes things faster. It’s hard getting them to un-learn what they’ve been taught.
Also, following this logic, Yahoo may already have many email handling functions in the kernel. If they’ve already gone down the route of bad design, then they’d have to do this email search as a kernel module as well, to avoid the user/kernel transition cost.
Another possible reason for the kernel-module is that it’s what the programmers knew how to do. That’s especially true if the contractor has experience with other kernel software, such as NSA implants. They might’ve read Phrack magazine on the topic, which might have been their sole education on the subject. [http://phrack.org/issues/61/13.html]
How it was probably done
I don’t know Yahoo’s infrastructure. Presumably they have front-end systems designed to balance the load (and accelerate SSL processing), and back-end systems that do the heavy processing, such as spam and virus checking.
The typical way to do this sort of thing (search) is simply tap into the network traffic, either as a separate computer sniffing (eavesdropping on) the network, or something within the system that taps into the network traffic, such as a netfilter module. Netfilter is the Linux firewall mechanism, and has ways to easily “hook” into specific traffic, either from user space or from a kernel module. There is also a related user space mechanism of hooking network APIs like recv() with a preload shared library.
This traditional mechanism doesn’t work as well anymore. For one thing, incoming email traffic is likely encrypted using SSL (using STARTTLS, for example). For another thing, companies are increasingly encrypting intra-data-center traffic, either with SSL or with hard-coded keys.
Therefore, instead of tapping into network traffic, the code might tap directly into the mail handling software. A good example of this is Sendmail’s milter interface, that allows the easy creation of third-party mail filtering applications, specifically for spam and anti-virus.
But it would be insane to write a milter as a kernel module, since mail handling is done in user space, thus adding unnecessary user/kernel transitions. Consequently, we make the assumption that Yahoo’s intra-data-center traffic in unencrypted, and that for FISA search thing, they wrote something like a kernel-module with netfilter hooks.
How it should’ve been done
Assuming the above guess is correct, that they used kernel netfilter hooks, there are a few alternatives.
They could do user space netfilter hooks instead, but they do have a performance impact. They require a transition from the kernel to user, then a second transition back into the kernel. If the system is designed for high performance, this might be a noticeable performance impact. I doubt it, as it’s still small compared to the rest of the computations involved, but it’s the sort of thing that engineers are prejudiced against, even before they measure the performance impact.
A better way of doing it is hooking the libraries. These days, most software uses shared libraries (.so) to make system calls like recv(). You can write your own shared library, and preload it. When the library function is called, you do your own processing, then call the original function.
Hooking the libraries then lets you tap into the network traffic, but without any additional kernel/user transition.
Yet another way is simple changes in the mail handling software that allows custom hooks to be written.
Third party contractors
We’ve been thinking in terms of technical solutions. There is also the problem of politics.
Almost certainly, the solution was developed by outsiders, by defense contractors like Booz-Allen. (I point them out because of the whole Snowden/Martin thing). This restricts your technical options.
You don’t want to give contractors access to your source code. Nor do you want to the contractors to be making custom changes to your source code, such as adding hooks. Therefore, you are looking at external changes, such as hooking the network stack.
The advantage of a netfilter hook in the kernel is that it has the least additional impact on the system. It can be developed and thoroughly tested by Booz-Allen, then delivered to Yahoo!, who can then install it with little effort.
This is my #1 guess why this was a kernel module – it allowed the most separation between Yahoo! and a defense contractor who wrote it. In other words, there is no technical reason for it — but a political reason.
Let’s talk search
There two ways to search things: using an NFA and using a DFA.
An NFA is the normal way of using regex, or grep. It allows complex patterns to be written, but it requires a potentially large amount of CPU power (i.e. it’s slow). It also requires backtracking within a message, thus meaning the entire email must be reassembled before searching can begin.
The DFA alternative instead creates a large table in memory, then does a single pass over a message to search. Because it does only a single pass, without backtracking, the message can be streamed through the search module, without needing to reassemble the message. In theory, anything searched by an NFA can be searched by a DFA, though in practice some unbounded regex expressions require too much memory, so DFAs usually require simpler patterns.
The DFA approach, by the way, is about 4-gbps per 2.x-GHz Intel x86 server CPU. Because no reassembly is required, it can tap directly into anything above the TCP stack, like netfilter. Or, it can tap below the TCP stack (like libpcap), but would require some logic to re-order/de-duplicate TCP packets, to present the same ordered stream as TCP.
DFAs would therefore require little or no memory. In contrast, the NFA approach will require more CPU and memory just to reassemble email messages, and the search itself would also be slower.
The naïve approach to searching is to use NFAs. It’s what most people start out with. The smart approach is to use DFAs. You see that in the evolution of the Snort intrusion detection engine, where they started out using complex NFAs and then over the years switched to the faster DFAs.
You also see it in the network processor market. These are specialized CPUs designed for things like firewalls. They advertise fast regex acceleration, but what they really do is just convert NFAs into something that is mostly a DFA, which you can do on any processor anyway. I have a low opinion of network processors, since what they accelerate are bad decisions. Correctly designed network applications don’t need any special acceleration, except maybe SSL public-key crypto.
So, what the government’s code needs to do is a very lightweight parse of the SMTP protocol in order to extract the from/to email addresses, then a very lightweight search of the message’s content in order to detect if any of the offending strings have been found. When the pattern is found, it then reports the addresses it found.
I don’t know Yahoo’s system for processing incoming emails. I don’t know the contents of the court order forcing them to do a search, and what needs to be secret. Therefore, I’m only making guesses here.
But they are educated guesses. In 9 times out of 10 in situations similar to Yahoo, I’m guessing that a “kernel module” would be the most natural solution. It’s how engineers are trained to think, and it would likely be the best fit organizationally. Sure, it really REALLY annoys cybersecurity experts, but nobody cares what we think, so that doesn’t matter.