On-device WebAuthn and what makes it hard to do well


WebAuthn improves login security a lot by making it significantly harder for a user's credentials to be misused – a WebAuthn token will only respond to a challenge if that challenge comes from the site the corresponding secret was issued to, and in general will only do so if the user provides proof of physical presence[1]. But giving people tokens is tedious, and also I have a new laptop which only has USB-C but does have a working fingerprint reader, and I hate the aesthetics of the Yubikey 5C Nano, so I've been thinking about what WebAuthn looks like done without extra hardware.

Let's talk about the broad set of problems first. For this to work you want to be able to generate a key in hardware (so it can't just be copied elsewhere if the machine is compromised), prove to a remote site that it was generated in hardware (so the remote site isn't confused about what security assertions you're making), and tie use of that key to the user being physically present (which may range from "I touched this object" to "I presented biometric evidence of identity"). What's important here is that a compromised OS shouldn't be able to just fake a response. For that to hold, the chain from the proof of physical presence to the use of the secret needs to be outside the control of the OS.
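
To make that a bit more concrete, here's a sketch of what the relying party ends up checking when an assertion comes back – illustrative Python on my part, assuming an ECDSA P-256 credential and a made-up RP_ID, not any real server library:

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

RP_ID = "example.com"  # illustrative relying party identifier

def verify_assertion(credential_pubkey, authenticator_data, client_data_json, signature):
    # Requirement 1: the response is bound to this site. The authenticator
    # data begins with SHA-256 of the relying party ID the token was given,
    # so a challenge relayed through some other origin fails here.
    if authenticator_data[:32] != hashlib.sha256(RP_ID.encode()).digest():
        raise ValueError("assertion was generated for a different site")

    # Requirement 3: the flags byte records whether the authenticator saw
    # proof of user presence (bit 0) and user verification (bit 2).
    flags = authenticator_data[32]
    if not flags & 0x01:
        raise ValueError("no proof of user presence")

    # Requirement 2 was dealt with at enrolment time, when an attestation
    # statement proved that credential_pubkey was generated in hardware.

    # The signature covers the authenticator data plus a hash of the client
    # data (which contains the server's challenge), so an old response can't
    # be spliced onto a new challenge.
    payload = authenticator_data + hashlib.sha256(client_data_json).digest()
    try:
        credential_pubkey.verify(signature, payload, ec.ECDSA(hashes.SHA256()))
    except InvalidSignature:
        raise ValueError("signature doesn't verify against the enrolled key")
```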

For a physical security token like a Yubikey, this is pretty easy. The communication protocol involves the OS passing a challenge and the source of the challenge to the token. The token then waits for a physical touch, verifies that the source of the challenge corresponds to a secret it holds, and responds to the challenge with that secret. At the point where keys are being enrolled, the token can generate a signed attestation that it generated the key, and a remote site can then conclude that this key is legitimately sequestered away from the OS. This all takes place outside the control of the OS, meeting all the goals described above.
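
A stripped-down model of the token's side of that exchange might look something like this – a toy in Python rather than real firmware or the actual CTAP protocol, with the touch handling passed in as a callback:

```python
import hashlib

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

class ToyToken:
    """Toy security token: private keys never leave this object, standing in
    for keys that never leave the hardware."""

    def __init__(self):
        self.credentials = {}  # SHA-256(relying party ID) -> private key
        self.sign_count = 0

    def enroll(self, rp_id):
        rp_id_hash = hashlib.sha256(rp_id.encode()).digest()
        key = ec.generate_private_key(ec.SECP256R1())
        self.credentials[rp_id_hash] = key
        # A real token would also return a signed attestation statement here,
        # so the site can tell the key was generated on the token.
        return key.public_key()

    def get_assertion(self, rp_id, client_data_hash, wait_for_touch):
        # The OS hands over the site's identity and a hash of the client data
        # (which includes the server's challenge).
        rp_id_hash = hashlib.sha256(rp_id.encode()).digest()
        key = self.credentials.get(rp_id_hash)
        if key is None:
            raise ValueError("no credential enrolled for this site")

        # Block until the user physically touches the token. The button is
        # wired to the token itself, so a compromised OS can't skip this step.
        wait_for_touch()

        self.sign_count += 1
        flags = bytes([0x01])  # user present
        auth_data = rp_id_hash + flags + self.sign_count.to_bytes(4, "big")
        sig = key.sign(auth_data + client_data_hash, ec.ECDSA(hashes.SHA256()))
        return auth_data, sig
```

Passing wait_for_touch in as a callback is just to keep the toy self-contained; on a real token that's a physical button the OS never gets to see, which is the whole point.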

How about Macs? The easiest approach here is to make use of the secure enclave and TouchID. The secure enclave is a separate piece of hardware built into either a support chip (for x86-based Macs) or directly onto the SoC (for ARM-based Macs). It's capable of generating keys, and also of producing attestations that a given key was generated on an Apple secure enclave ("Apple Anonymous Attestation", which has the interesting property of attesting that the key was generated on Apple hardware, but not which piece of Apple hardware, avoiding a lot of privacy concerns). These keys can have an associated policy that says they're only usable if the user provides a legitimate touch on the fingerprint sensor, which means the enclave can not only assert physical presence of a user, it can assert physical presence of an authorised user. Communication between the fingerprint sensor and the secure enclave is a private channel that the OS can't meaningfully interfere with, which means even a compromised OS can't fake physical presence responses (eg, the OS can't record a legitimate fingerprint press and then send that to the secure enclave again in order to mimic the user being present – the secure enclave requires that each response from the fingerprint sensor be unique). This achieves our goals.
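
That replay protection is doing a lot of work, so it's worth spelling out. Here's a toy model of the idea – nothing to do with Apple's actual protocol, just a sensor and an enclave that share a pairing key, where the enclave only accepts a sensor response bound to a nonce it generated for that specific request:

```python
import hmac
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

class ToySensor:
    """Toy fingerprint sensor, paired with the enclave at manufacture."""

    def __init__(self, pairing_key):
        self.pairing_key = pairing_key

    def scan(self, nonce):
        # Pretend the user presented a matching fingerprint, then bind the
        # result to the enclave's nonce and authenticate it with the shared key.
        result = b"match:" + nonce
        return result, hmac.digest(self.pairing_key, result, "sha256")

class ToyEnclave:
    """Toy secure enclave: only signs if the sensor reports a fresh match."""

    def __init__(self, pairing_key):
        self.pairing_key = pairing_key
        self.key = ec.generate_private_key(ec.SECP256R1())

    def sign_if_present(self, sensor, challenge):
        nonce = os.urandom(16)            # fresh for every request
        result, tag = sensor.scan(nonce)  # the OS only shuttles these bytes around
        expected = hmac.digest(self.pairing_key, result, "sha256")
        if not hmac.compare_digest(tag, expected):
            raise PermissionError("sensor response isn't authentic")
        if result != b"match:" + nonce:
            raise PermissionError("stale or failed sensor response")
        return self.key.sign(challenge, ec.ECDSA(hashes.SHA256()))
```

Recording the bytes from one scan and feeding them back later doesn't get you anywhere, because the nonce the enclave expects has already changed.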

The PC space is more complicated. In the Mac case, communication between the biometric sensor (be that TouchID or FaceID) and the secure enclave happens over a controlled channel where all the hardware involved knows how to talk to the other hardware. In the PC case, the typical location where we'd store secrets is the TPM, but TPMs conform to a standardised spec that has no understanding of this sort of communication, and biometric components on PCs have no way to talk to the TPM other than via the OS. We can generate keys in the TPM, and the TPM can attest to those keys being TPM-generated, which means an attacker can't exfiltrate those secrets and mimic the user's token on another machine. But in the absence of any explicit binding between the TPM and the physical presence indicator, the association has to be made by code running on the CPU. If that code is in the OS, an attacker who compromises the OS can simply ask the TPM to respond to any challenge they want, skipping the biometric validation entirely.
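
The shape of the problem looks something like this – hypothetical helper names, not any particular project's code, just to show where the weak link sits:

```python
def sign_webauthn_challenge(tpm_key_handle, challenge):
    # The TPM can attest that the key behind tpm_key_handle was generated
    # on-chip, so the key itself can't be copied off the machine. But the
    # binding between the presence check and the signature is just this
    # if-statement, running in the OS.
    if not os_biometric_check():   # hypothetical: talks to the reader via the OS
        raise PermissionError("user not present")

    # Nothing stops a compromised OS from calling this directly, without
    # ever having run the check above.
    return tpm_sign(tpm_key_handle, challenge)   # hypothetical TPM wrapper
```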

Windows solves this problem in an interesting way. Windows Hello Enhanced Sign-in Security doesn't add new hardware, but relies on the use of virtualisation. The agent that handles WebAuthn responses isn't running in the OS, it's running in another VM that's entirely isolated from the OS. Hardware that supports this model has a mechanism for proving its identity to that code (eg, fingerprint readers that support it can sign their responses with a key whose certificate chains back to Microsoft). Additionally, the secrets associated with the TPM can be held in this VM rather than in the OS, meaning that the OS can't use them directly. This gives us a flow where a browser asks for a WebAuthn response, the request is passed to the VM, the VM asks the biometric device for proof of user presence (including some sort of random value to prevent the OS just replaying an earlier response), receives it, and then asks the TPM to generate a response to the challenge. Compromising the OS doesn't give you the ability to forge the exchange between the biometric device and the VM, and doesn't give you access to the secrets in the TPM, so again we meet all our goals.
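
Roughly, that flow looks like this – again a toy with made-up names rather than Microsoft's actual interfaces, the point being that the presence proof and the TPM secrets only ever meet inside the isolated VM:

```python
import os

class IsolatedAgent:
    """Toy stand-in for the WebAuthn agent running in the isolated VM."""

    def __init__(self, sensor, tpm):
        self.sensor = sensor  # signs its responses with a vendor-certified key
        self.tpm = tpm        # its secrets are released to this VM, not the host OS

    def handle_request(self, rp_id, client_data_hash):
        # 1. Ask the biometric device for proof of presence, bound to a fresh
        #    nonce so the host OS can't replay an earlier response.
        nonce = os.urandom(16)
        proof = self.sensor.verify_user(nonce)    # hypothetical device call
        if not proof.is_valid_for(nonce):         # hypothetical freshness/signature check
            raise PermissionError("no fresh proof of user presence")

        # 2. Only then ask the TPM for the WebAuthn assertion. The host OS
        #    never has what it needs to do this step on its own.
        return self.tpm.sign_assertion(rp_id, client_data_hash)  # hypothetical
```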

On Linux (and other free OSes), things are less good. Projects like tpm-fido generate keys on the TPM, but there’s no secure channel between that code and whatever’s providing proof of physical presence. An attacker who compromises the OS may not be able to copy the keys to their own system, but while they’re on the compromised system they can respond to as many challenges as they like. That’s not the same security assertion we have in the other cases.

Overall, Apple’s approach is the simplest – having binding between the various hardware components involved means you can just ignore the OS entirely. Windows doesn’t have the luxury of having as much control over what the hardware landscape looks like, so has to rely on virtualisation to provide a security barrier against a compromised OS. And in Linux land, we’re fucked. Who do I have to pay to write a lightweight hypervisor that runs on commodity hardware and provides an environment where we can run this sort of code?

[1] As I discussed recently, there are scenarios where these assertions are less strong, but even so
