On IDs

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/ids.html

When programming software that cooperates with software running on behalf of
other users, other sessions or other computers it is often necessary to work with
unique identifiers. These can be bound to various hardware and software objects
as well as lifetimes. Often, when people look for such an ID to use they pick
the wrong one because semantics and lifetime or the IDs are not clear. Here’s a
little incomprehensive list of IDs accessible on Linux and how you should or
should not use them.

Hardware IDs

  1. /sys/class/dmi/id/product_uuid: The main board product UUID, as
    set by the board manufacturer and encoded in the BIOS DMI information. It may
    be used to identify a mainboard and only the mainboard. It changes when the
    user replaces the main board. Also, often enough BIOS manufacturers write bogus
    serials into it. In addition, it is x86-specific. Access for unprivileged users
    is forbidden. Hence it is of little general use.
  2. CPUID/EAX=3 CPU serial number: A CPU UUID, as set by the
    CPU manufacturer and encoded on the CPU chip. It may be used to identify a CPU
    and only a CPU. It changes when the user replaces the CPU. Also, most modern
    CPUs don’t implement this feature anymore, and older computers tend to disable
    this option by default, controllable via a BIOS Setup option. In addition, it
    is x86-specific. Hence this too is of little general use.
  3. /sys/class/net/*/address: One or more network MAC addresses, as
    set by the network adapter manufacturer and encoded on some network card
    EEPROM. It changes when the user replaces the network card. Since network cards
    are optional and there may be more than one the availability if this ID is not
    guaranteed and you might have more than one to choose from. On virtual machines
    the MAC addresses tend to be random. This too is hence of little general use.
  4. /sys/bus/usb/devices/*/serial: Serial numbers of various USB
    devices, as encoded in the USB device EEPROM. Most devices don’t have a serial
    number set, and if they have it is often bogus. If the user replaces his USB
    hardware or plugs it into another machine these IDs may change or appear in
    other machines. This hence too is of little use.

There are various other hardware IDs available, many of which you may
discover via the ID_SERIAL udev property of various devices, such hard disks
and similar. They all have in common that they are bound to specific
(replacable) hardware, not universally available, often filled with bogus data
and random in virtualized environments. Or in other words: don’t use them, don’t
rely on them for identification, unless you really know what you are doing and
in general they do not guarantee what you might hope they guarantee.

Software IDs

  1. /proc/sys/kernel/random/boot_id: A random ID that is regenerated
    on each boot. As such it can be used to identify the local machine’s current
    boot. It’s universally available on any recent Linux kernel. It’s a good and
    safe choice if you need to identify a specific boot on a specific booted
    kernel.
  2. gethostname(), /proc/sys/kernel/hostname: A non-random ID
    configured by the administrator to identify a machine in the network. Often
    this is not set at all or is set to some default value such as
    localhost and not even unique in the local network. In addition it
    might change during runtime, for example because it changes based on updated
    DHCP information. As such it is almost entirely useless for anything but
    presentation to the user. It has very weak semantics and relies on correct
    configuration by the administrator. Don’t use this to identify machines in a
    distributed environment. It won’t work unless centrally administered, which
    makes it useless in a globalized, mobile world. It has no place in
    automatically generated filenames that shall be bound to specific hosts. Just
    don’t use it, please. It’s really not what many people think it is.
    gethostname() is standardized in POSIX and hence portable to other
    Unixes.
  3. IP Addresses returned by SIOCGIFCONF or the respective Netlink APIs: These
    tend to be dynamically assigned and often enough only valid on local networks
    or even only the local links (i.e. 192.168.x.x style addresses, or even
    169.254.x.x/IPv4LL). Unfortunately they hence have little use outside of
    networking.
  4. gethostid(): Returns a supposedly unique 32-bit identifier for the
    current machine. The semantics of this is not clear. On most machines this
    simply returns a value based on a local IPv4 address. On others it is
    administrator controlled via the /etc/hostid file. Since the semantics
    of this ID are not clear and most often is just a value based on the IP address it is
    almost always the wrong choice to use. On top of that 32bit are not
    particularly a lot. On the other hand this is standardized in POSIX and hence
    portable to other Unixes. It’s probably best to ignore this value and if people
    don’t want to ignore it they should probably symlink /etc/hostid to
    /var/lib/dbus/machine-id or something similar.
  5. /var/lib/dbus/machine-id: An ID identifying a specific Linux/Unix
    installation. It does not change if hardware is replaced. It is not unreliable
    in virtualized environments. This value has clear semantics and is considered
    part of the D-Bus API. It is supposedly globally unique and portable to all
    systems that have D-Bus. On Linux, it is universally available, given that
    almost all non-embedded and even a fair share of the embedded machines ship
    D-Bus now. This is the recommended way to identify a machine, possibly with a
    fallback to the host name to cover systems that still lack D-Bus. If your
    application links against libdbus, you may access this ID with
    dbus_get_local_machine_id(), if not you can read it directly from the file system.
  6. /proc/self/sessionid: An ID identifying a specific Linux login
    session. This ID is maintained by the kernel and part of the auditing logic. It
    is uniquely assigned to each login session during a specific system boot,
    shared by each process of a session, even across su/sudo and cannot be changed
    by userspace. Unfortunately some distributions have so far failed to set things
    up properly for this to work (Hey, you, Ubuntu!), and this ID is always
    (uint32_t) -1 for them. But there’s hope they get this fixed
    eventually. Nonetheless it is a good choice for a unique session identifier on
    the local machine and for the current boot. To make this ID globally unique it
    is best combined with /proc/sys/kernel/random/boot_id.
  7. getuid(): An ID identifying a specific Unix/Linux user. This ID is
    usually automatically assigned when a user is created. It is not unique across
    machines and may be reassigned to a different user if the original user was
    deleted. As such it should be used only locally and with the limited validity
    in time in mind. To make this ID globally unique it is not sufficient to
    combine it with /var/lib/dbus/machine-id, because the same ID might be
    used for a different user that is created later with the same UID. Nonetheless
    this combination is often good enough. It is available on all POSIX systems.
  8. ID_FS_UUID: an ID that identifies a specific file system in the
    udev tree. It is not always clear how these serials are generated but this
    tends to be available on almost all modern disk file systems. It is not
    available for NFS mounts or virtual file systems. Nonetheless this is often a
    good way to identify a file system, and in the case of the root directory even
    an installation. However due to the weakly defined generation semantics the
    D-Bus machine ID is generally preferrable.

Generating IDs

Linux offers a kernel interface to generate UUIDs on demand, by reading from
/proc/sys/kernel/random/uuid. This is a very simple interface to
generate UUIDs. That said, the logic behind UUIDs is unnecessarily complex and
often it is a better choice to simply read 16 bytes or so from
/dev/urandom.

Summary

And the gist of it all: Use /var/lib/dbus/machine-id! Use
/proc/self/sessionid! Use /proc/sys/kernel/random/boot_id!
Use getuid()! Use /dev/urandom!
And forget about the
rest, in particular the host name, or the hardware IDs such as DMI. And keep in
mind that you may combine the aforementioned IDs in various ways to get
different semantics and validity constraints.