Devuan considers machine IDs (distrowatch.com)
57 points by tomkat0789 on March 30, 2019 | 72 comments


If a file on your computer is being used by a program to send information to someone, the answer isn't to destroy/randomize the file and break other applications, the answer is to not use the program that is sending your information somewhere.


Sure, but how do you know what programs are misusing it?


This speaks to a more general need for user-friendly audit logs of which resources are accessed by which programs. I should be able to tell on any platform if Spotify called fopen on something in my documents folder.


SELinux or AppArmor; keep it in enforcing mode.


I think this is largely possible with eBPF, if you cared enough.


Rename it and see which programs complain :)


  auditd(8)
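auditd (or eBPF) is the right tool for catching every read. For a much cruder spot check, here is a Linux-only Python sketch that walks /proc/<pid>/fd, the way lsof does, and lists the processes that have a given file open at this instant. The function name is made up for illustration, and it will miss programs that read the file and close it immediately, which is likely the common case for /etc/machine-id.

```python
import os


def pids_with_file_open(target: str) -> list:
    """Return PIDs whose open file descriptors currently point at `target`.

    Linux-only: walks /proc/<pid>/fd, like a stripped-down lsof. It only sees
    files that are open at the moment of the scan, so a program that reads
    /etc/machine-id and closes it straight away will be missed.
    """
    target = os.path.realpath(target)
    hits = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # other users' processes need root; processes may exit mid-scan
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if link == target:
                hits.append(int(entry))
                break
    return hits


if __name__ == "__main__":
    print(pids_with_file_open("/etc/machine-id"))
```

Run it as root to see other users' processes; without root it silently skips them.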



Or sandbox the program so that it sees a "safe" version of that information.


But what faster way to discover which applications to remove than by deleting /etc/machine-id?


Chromium reads it, but are we sure it's sending it somewhere? Maybe it uses it for bookkeeping of local sessions or something like that.


I checked the Chromium source and for Linux they explicitly mention not being allowed to send it externally[0]; they hash it via SHA1, encode it as base64, and use that value.

Interestingly for Windows they pull the machine id from the registry[1] and (at first glance) it doesn't seem like they're doing any hashing. The raw value gets used.

Haven't checked if the value gets sent externally but based upon the comment on the Linux code I'd bet it's a yes.

[0]: https://github.com/chromium/chromium/blob/aae20fb7d3616de40e...

[1]: https://github.com/chromium/chromium/blob/01a03aab2d89c93c15...
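Taking the description above literally (SHA-1 of the ID, then base64), a rough Python equivalent is below. This is not Chromium's code: whether Chromium strips the trailing newline first is an assumption, and the example ID is made up. It does show the point raised in the replies: the same input always produces the same 28-character output, so the hash is just as stable an identifier as the raw ID.

```python
import base64
import hashlib


def hashed_machine_id(raw: str) -> str:
    """SHA-1 the machine ID, then base64-encode the 20-byte digest."""
    # Assumption: the trailing newline is stripped before hashing.
    digest = hashlib.sha1(raw.strip().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    example = "0123456789abcdef0123456789abcdef\n"  # made-up ID, not a real machine's
    # Same input, same output: the hash identifies the machine just as well.
    print(hashed_machine_id(example) == hashed_machine_id(example))
```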


> I checked the Chromium source and for Linux they explicitly mention not being allowed to send it externally[0]; they hash it via SHA1, encode it as base64, and use that value.

/etc/machine-id seems to be a random value. What purpose does hashing it before sending it serve? The hash should still uniquely identify a machine. Am I missing something? It kind of makes me think that it's just done so people say "it's OK, because they're hashing it first, so it's secure!", while in reality hashing doesn't do anything to alleviate any concern.


> Am I missing something?

The page the source code links to, https://www.freedesktop.org/software/systemd/man/machine-id...., mentions hashing using an "application-specific key", which would at least make the IDs uncorrelatable between different apps (so $WEBSITE can't correlate machine IDs with $WELL_BEHAVED_APP's machine IDs).

But either I'm missing something, or Chromium is - it looks like it's straight up hashing the file and not actually using any application-specific keys!
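For contrast, here is a minimal sketch of what the man page actually asks for: a keyed hash with a fixed, application-specific key, using HMAC-SHA256 from the Python standard library. The key values are hypothetical. systemd provides a C helper for this, sd_id128_get_machine_app_specific(); this sketch does not claim to reproduce its output byte for byte, only the idea that two apps hashing the same machine ID with different keys get unrelated values.

```python
import hashlib
import hmac


def app_specific_id(machine_id_hex: str, app_key: bytes) -> str:
    """Derive a per-application ID from the machine ID with a keyed hash.

    The app's fixed key is the HMAC key and the raw 128-bit machine ID is the
    message; the digest is truncated to 128 bits so the result has the same
    shape as a machine ID.
    """
    machine_id = bytes.fromhex(machine_id_hex.strip())
    mac = hmac.new(app_key, machine_id, hashlib.sha256).digest()
    return mac[:16].hex()


if __name__ == "__main__":
    mid = "0123456789abcdef0123456789abcdef"  # made-up machine ID
    print(app_specific_id(mid, b"hypothetical-app-a"))
    print(app_specific_id(mid, b"hypothetical-app-b"))  # unrelated-looking value
```

Without knowing the key, a website cannot map one app's value back to the machine ID or to another app's value, which is exactly what a plain unkeyed SHA-1 fails to provide.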


> they explicitly mention not being allowed to send it externally[0]; they hash it via SHA1, encode it as base64, and use that value.

If it's sending that value out, then there is no logical difference between that and just sending out the machine-id in the first place.


/etc/machine-id reading was implemented in service of the Chromium "Enterprise" component:

https://bugs.chromium.org/p/chromium/issues/detail?id=812641

Comment 26 implements Linux support for reading the machine's unique identifier:

https://chromium.googlesource.com/chromium/src.git/+/15dc90a...

That patch is Linux-only support in service of the greater patch:

https://chromium.googlesource.com/chromium/src.git/+/81a7040...

And having read several rounds of "Device enrollment" phrases now, while the design document is non-public, I would hazard a guess that these are the essential components of enterprise device management.

Google hashes the ID before making use of it, so the actual ID remains disguised, but this absolutely would be necessary if they were trying to implement ChromeOS enterprise device management. (You need a unique identifier per enterprise machine, etc.)

Presumably they only care about this ID file with respect to enterprise ChromeOS installations, since they make no effort at all to look for the file anywhere other than that one location.

It looks more like they simply don't care about reading the file in non-enterprise circumstances, since either the machine is enterprise-managed or it isn't, and as they only transmit hashes of the ID rather than the ID itself, they're in compliance with the FreeDesktop guidelines that require this file to be present:

https://www.freedesktop.org/software/systemd/man/machine-id....



Removing it just makes it more difficult to write legitimate programs that make use of such features, while anything nefarious will be able to find other things to use as fingerprints, including hardware serials, MACs and their own fingerprint files spread across the filesystem in non-standard locations.

Unless the OS is meant to be built for privacy and has a goal of running every app in a sandbox where nothing is fingerprintable, removing easily available fingerprints would be a disservice to all.


> Removing it just makes it more difficult to write legit programs

Yeah, that's something I simply could not care less about.

That said, I don't remove it. I set its permissions so that it isn't world-readable instead.
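In Python terms, that permission change is the equivalent of `chmod o-r`. A small sketch follows; the function name is made up. The caveat, as an assumption rather than something tested here, is that any non-root program that legitimately reads the file (D-Bus, for example) may start failing once it can't.

```python
import os
import stat


def drop_world_read(path: str) -> int:
    """Clear the "other" read bit on `path` (like `chmod o-r`); return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode & ~stat.S_IROTH
    os.chmod(path, new_mode)
    return new_mode
```

On a file with the typical 0444 mode, run as root, this leaves 0440.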


Why?


OK, from the man page:

> The /etc/machine-id file contains the unique machine ID of the local system that is set during installation. The machine ID is a single newline-terminated, hexadecimal, 32-character, lowercase ID. When decoded from hexadecimal, this corresponds to a 16-byte/128-bit value.

> The machine ID is usually generated from a random source during system installation and stays constant for all subsequent boots. Optionally, for stateless systems, it is generated during runtime at early boot if it is found to be empty.
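Putting together the format quoted here and the "may not be set to all zeros" rule quoted further down the thread, a validity check might look like this. It is a sketch based only on those man page excerpts, and it takes the ID without the file's trailing newline.

```python
import re

# machine-id(5): 32 lowercase hexadecimal characters (the file adds a newline).
_MACHINE_ID = re.compile(r"[0-9a-f]{32}")


def is_valid_machine_id(value: str) -> bool:
    """Validate a machine ID given without its trailing newline."""
    if _MACHINE_ID.fullmatch(value) is None:
        return False
    return value != "0" * 32  # the man page also forbids all zeros


if __name__ == "__main__":
    print(is_valid_machine_id("0123456789abcdef0123456789abcdef"))  # True
    print(is_valid_machine_id("b486935dbb9e9fe603328a19e2b5b4"))    # False: only 30 chars
```

Both the 30-character values generated later in the thread and the mixed-case pwgen output would fail this check.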

So if that works for "stateless systems", why can't all machines be "stateless systems"?


It's not clear that would make anyone happy.

You might really want to track machines in your fleet, in a way that persists across reboots. Let's say you can access a machine remotely but you got all your ethernet cables tangled up, so you don't know which physical machine you SSHed into.

Or if you are concerned about being tracked by a third party, you don't want this identifier to exist at all, even if it doesn't persist through reboots.

I agree there are other solutions in both cases.


So basically deleting the file as the last step before a shutdown or reboot will work, is that correct?


Incorrect.

Machine IDs are stored in several places, some of which are not even files.

* http://jdebp.uk./Softwares/nosh/guide/commands/machine-id.xm...

They can be resurrected if not all storage locations are dealt with. Moreover, some systems use things like the SMBIOS product UUID.

* http://jdebp.uk./Softwares/nosh/guide/commands/setup-machine...

The correct approach is not deleting files.

* http://jdebp.uk./Softwares/nosh/guide/commands/erase-machine...

* https://lists.debian.org/debian-user/2019/03/msg00550.html


Thanks.

So at boot, one would run "erase-machine-id", then create a random 32-character hexadecimal number, and then either set it via "the systemd.machine_id= kernel command line parameter" or pass it via "the option --machine-id= to systemd".


No.

    % system-control cat machine-id
    start:#!/bin/nosh
    start:true
    stop:#!/bin/nosh
    stop:envdir env
    stop:erase-machine-id
    run:#!/bin/nosh
    run:#Set up and tear down the machine ID
    run:envdir env
    run:setup-machine-id
    restart:#!/bin/sh
    restart:exec false      # ignore script arguments
    %


OK, thanks.

So "setup-machine-id". Does that generate a value that's unrelated to any preexisting versions, analogs, etc?

I ask because, when I deleted /etc/machine-id and ran "systemd-machine-id-setup", it generated a new machine-id by copying the D-Bus machine ID.


I'll test that.

Edit: OK, so I created a Debian VM, noted its /etc/machine-id, deleted the file, and rebooted. The file was still missing after the reboot.

Running systemd-machine-id-setup generated a new machine-id from the D-Bus machine ID. And it was the same as the initial one.

But I also see in man machine-id:

> The machine-id may also be set, for example when network booting, by setting the systemd.machine_id= kernel command line parameter or passing the option --machine-id= to systemd. A machine-id may not be set to all zeros.
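Going by that man page excerpt, a fresh value for systemd.machine_id= just needs 128 random bits written as 32 lowercase hex digits, never all zeros. A minimal Python sketch follows; the function names are made up, and how the parameter gets onto the kernel command line depends on the bootloader and is left out. Anything on the kernel command line can be read by unprivileged users via /proc/cmdline, so this does not hide the value.

```python
import secrets

ALL_ZEROS = "0" * 32


def new_machine_id() -> str:
    """Return 128 random bits as 32 lowercase hex characters (never all zeros)."""
    while True:
        candidate = secrets.token_hex(16)  # 16 random bytes -> 32 hex digits
        if candidate != ALL_ZEROS:
            return candidate


def kernel_cmdline_param(machine_id: str) -> str:
    """Format the systemd.machine_id= kernel parameter quoted from the man page."""
    return f"systemd.machine_id={machine_id}"


if __name__ == "__main__":
    print(kernel_cmdline_param(new_machine_id()))
```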


Does something like

    dd if=/dev/urandom bs=1 count=16 2>/dev/null | od -An -tx1 | tr -d ' \n'

give you something that you can use?


Well, the machine-id of that Debian VM was ...

    38d05397c25548b4f4bda7751b5062

... and ...

    $ FOO=`cat /dev/urandom | tr -dc a-z0-9 | head -c${1:-30}`
    $ echo $FOO
      6we4gvnmx00w208ffty6i11m82rw6d

But I have no clue whether systemd would be happy with that. Maybe later I can test more.

Edit: Oops. Make that ...

    $ FOO=`cat /dev/urandom | tr -dc abcdef0-9 | head -c${1:-30}`
    $ echo $FOO
      b486935dbb9e9fe603328a19e2b5b4


Maybe use pwgen?

Something like:

    $ pwgen -1s 31
    qHfKU46H2RA2WUr0EZ1zBHfIBLKZKuT


I think that it needs to be hexadecimal. But not sure.


No worries. :)


Because then Google doesn’t have a persistent identifier to track you, of course.


What is it with the fetish for inflicting Truenaming in cyberspace?

It's incredibly annoying. It needlessly bloats digital footprints, and it creates an opportunity for exploitation by nefarious actors.

Leave the Truenaming to the users who need it. It doesn't do any good being baked in by default. If they really need it, they'll figure out a way to implement it. If they definitely cannot afford it, and aren't aware it is there by default, you are doing more harm by putting it in than you would by leaving well enough alone.


Why would anyone worry that applications like chrome abuse that file? If chrome wants a unique identifier it could generate it itself.


Revisiting this, I agree that there's quite some "meh" about this. I mean, there's no way to really know how machines have and share identifiers. So one must assume that they have, and do. And deal with it.

VMs seem generally good enough. But then there's WebGL, which generates identifiers based on the host graphics system and guest virtual video driver. So all Debian VMs on a given host have the same identifier.

If it really matters, though, you gotta use different hardware.


Because the machine-id by default never changes after OS installation.


Neither does a file Chrome generates. Not even if reinstalled.


Huh?

Even if you do

    $ sudo apt-get -y purge chrome

[or whatever its package name is]?

And if necessary, find and delete everything that it created.


apt doesn't know about any file the application might have ever created in your home directory.


> And if necessary, find and delete everything that it created.


Actually, apt does a pretty good job at finding stuff. Sometimes it can't delete, but it warns you about that.


Apt only knows about the files that are listed in the package.


Sure, but then that means that honest packages should document all files that the software can be configured to create automatically. Or at least, all but user-specified ones.


In Linux, I can find and nuke everything that Chrome (or any other app) generates.


Bad title (and I know that fault lies with the source site). At first glance it reads as if the Devuan team is considering embracing/adopting machine IDs, which is counter to their philosophy, when in fact they are against unique identifiers.


Doesn't a MAC address lookup or disk UUID lookup provide similar fingerprinting capabilities? Even the contents of one's .bashrc file could be used for fingerprinting.

I mean, if you start blocking one thing, where do you stop?


Ideally you work in reverse and only need to grant access to things to which you want to allow access. Start from zero and let an application provide a manifest of what it wants to access. Ex:

* Open outbound TCP sockets

* Read from $HOME/.config/chromium

* Read from /etc/machine-id

It's not a new concept at all and there are multiple approaches for implementing things like this. Getting mass market adoption is sadly next to impossible.
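A toy Python version of that deny-by-default idea, with a made-up Manifest type and a made-up home directory standing in for $HOME, could look like the following. Real enforcement has to happen in the kernel or a sandbox (SELinux, AppArmor and similar, as mentioned upthread); this only illustrates the policy check. It normalizes ".." but does not resolve symlinks.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical per-application manifest: anything not listed is denied."""
    readable_paths: set = field(default_factory=set)
    outbound_tcp: bool = False


def may_read(manifest: Manifest, path: str) -> bool:
    """Allow a read only for a listed path or something underneath a listed directory."""
    path = os.path.normpath(path)  # collapse ".." so it can't escape an allowed directory
    for allowed in manifest.readable_paths:
        allowed = os.path.normpath(allowed)
        if path == allowed or path.startswith(allowed + os.sep):
            return True
    return False


if __name__ == "__main__":
    chromium = Manifest(
        readable_paths={"/home/user/.config/chromium", "/etc/machine-id"},
        outbound_tcp=True,
    )
    print(may_read(chromium, "/etc/machine-id"))                 # True
    print(may_read(chromium, "/home/user/Documents/taxes.pdf"))  # False
```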


That's basically how smartphones do it.

The problem is still that the granularity is not right (except for users who simply want to trust the application). For example, when uploading a photo, I don't want to give Facebook access to my entire filesystem, just the photo that I click. And I don't want to give Facebook access to my camera indefinitely, just now.

It will require a lot of design to get security right without deteriorating the UX too much.

But I agree, it's better than simply blocking everything.


> For example, when uploading a photo, I don't want to give Facebook access to my entire filesystem, just the photo that I click. And I don't want to give Facebook access to my camera indefinitely, just now.

FYI that's exactly how iOS does it. When you choose a picture to upload from the camera roll, the target app only gains access to that one photo.


> I don't want to give Facebook access to my entire filesystem,

Android's system of intents, together with the changes to storage security in Android Q, will help with this somewhat. Android Q starts expressly prohibiting access to the full filesystem, and images and other intent extras must be passed through the intent call itself rather than as a reference to a file on the filesystem.


That sounds great. But what I really want is if the app does want access to the entire filesystem, then the OS will present the app with a sandboxed filesystem, instead of just blocking the app (causing the app to refuse to work, which is what will happen in practice).


Oh, chromium may break? Then I use an alternative. This is true for any other program.


Or, Devuan could arrange to always use 0xfoad or something appropriate if the file is missing.

Even better, it looks like there has been a file that always has the same value for a while; presumably they can just keep that in place.


Privacy concerns aside, using the boot device serial number might be a better solution as it can't be deleted or modified and survives reboots and reinstalls.

This line will find it easily. It's not mine; I simply put together the work of others, adding only very small modifications. It needs smartctl (the smartmontools package on Debian), which can be run only by root.

    smartctl -i "$(df -P / | awk 'NR==2 {print $1}')" | awk '/^Serial Number:/ {print $3}'
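The same extraction in Python, as a sketch: it runs smartctl via subprocess (still needing root and smartmontools) and parses the "Serial Number:" line. The function names are made up, and /dev/sda in the docstring is only an example device. One small design difference from the awk version: `awk '{print $3}'` keeps only the first word of the serial, while splitting on the first colon keeps serials that contain spaces.

```python
import subprocess
from typing import Optional


def serial_from_smartctl(output: str) -> Optional[str]:
    """Return the value of the 'Serial Number:' line in `smartctl -i` output."""
    for line in output.splitlines():
        if line.startswith("Serial Number:"):
            # Keep everything after the first colon, so serials containing spaces survive.
            return line.split(":", 1)[1].strip()
    return None


def device_serial(device: str) -> Optional[str]:
    """Run smartctl (needs root and smartmontools) on a device such as /dev/sda."""
    result = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True, check=False
    )
    return serial_from_smartctl(result.stdout)
```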


Why is it there in the first place?


https://dbus.freedesktop.org/doc/dbus-uuidgen.1.html has some explanation, in particular:

> The important properties of the machine UUID are that 1) it remains unchanged until the next reboot and 2) it is different for any two running instances of the OS kernel. That is, if two processes see the same UUID, they should also see the same shared memory, UNIX domain sockets, local X displays, localhost.localdomain resolution, process IDs, and so forth.

Because it's possible to forward things like D-Bus, the X11 $DISPLAY, etc. over the network, two processes might be aware of each other over such a connection but not be running on the same machine and therefore be unable to share resources. The machine ID lets them check for that, so you can properly handle things like "I'm going to send a message to the screensaver in my display to not activate, I don't care if it's the same machine" vs. "I'm going to send a message to the terminal in my display to open a new tab, but only if it's actually on the same machine, otherwise I should start a new terminal". (These days I think that definition should be updated to "container" instead of "kernel": if you're running separate logical machines inside the same kernel with separate PIDs etc., they should have separate machine IDs.)
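The screensaver/terminal example boils down to a comparison of machine IDs. Below is a minimal Python sketch of that decision, with hypothetical function names and return strings, assuming the peer's ID has already been obtained over the connection (D-Bus exposes one through its org.freedesktop.DBus.Peer.GetMachineId method, but fetching it is out of scope here).

```python
def same_machine(local_id: str, peer_id: str) -> bool:
    """True if a peer reports the same machine ID, i.e. shares our PIDs, sockets, etc."""
    return local_id.strip() == peer_id.strip()


def handle_new_tab_request(local_id: str, terminal_id: str) -> str:
    # Hypothetical policy from the comment: only reuse a terminal that runs on this
    # machine; one reached through a forwarded connection can't see our files or PIDs.
    if same_machine(local_id, terminal_id):
        return "open a tab in the existing terminal"
    return "start a new local terminal"
```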

systemd and (IIRC) cloud-init use it to run once-per-machine tasks on machines that could come from images: if you want to prep a number of machines in advance, do the install, then change the machine ID. At boot time, startup scripts will say "Oh, this machine ID has not been initialized yet" and do things, and then not do them on the next boot.
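Here is a minimal sketch of that once-per-machine pattern in Python, assuming a hypothetical stamp file that records which machine ID the setup already ran for. This is not how systemd or cloud-init actually store their state; it just shows why changing the machine ID on a cloned image makes first-boot tasks run again.

```python
from pathlib import Path
from typing import Callable


def run_once_per_machine(machine_id: str, stamp: Path, task: Callable[[], None]) -> bool:
    """Run `task` unless `stamp` already records this machine ID; return True if it ran.

    A disk image cloned onto a new machine and given a fresh machine ID no longer
    matches the stamp, so the task runs again there on first boot.
    """
    if stamp.exists() and stamp.read_text().strip() == machine_id:
        return False
    task()
    stamp.write_text(machine_id + "\n")  # record only after the task succeeded
    return True
```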


> These days I think that definition should be updated to "container" instead of "kernel": if you're running separate logical machines inside the same kernel with separate PIDs etc., they should have separate machine IDs.

Indeed. If that weren't the case, and the "same kernel" were enough, then things could just use /proc/sys/kernel/random/boot_id.



How about making it a symlink to a kernel feature:

   /etc/machine-id -> /proc/some/path/machine-id

This fictitious proc entry, which I just invented, serves up bullshit content to unprivileged processes but the true ID to the superuser.


See also:

    /proc/sys/kernel/random/boot_id
    /proc/sys/kernel/random/uuid
----

Because the machine-id is intended to be something that persists between reboots, it is necessarily something that would live in the filesystem, independent of the kernel.

Your described functionality could be accomplished with a FUSE filesystem, though.

However, that functionality would be problematic. Programs (like D-Bus) expect to be able to use it to identify whether 2 communicating processes are on the same host.

If it served different bullshit to each process, it would be entirely non-functional. (Sans returning the true ID to root, this is /proc/sys/kernel/random/uuid)

Perhaps instead, use a determined-at-boot value (as the machine-id(5) docs say is acceptable for stateless systems). If this is a kernel construct that isn't associated with a specific (PID?) namespace, then this would also be problematic, as different containers would be considered to be the same "host". (Sans returning the true ID to root, this is /proc/sys/kernel/random/boot_id)


> it is necessarily something that would live in the filesystem, independent of the kernel.

Counterexample: struct utsname and the uname system call.


That's a good counterpoint.

It would still have to live somewhere in the filesystem, and have a userspace program load it in to the kernel, just as `utsname.nodename` is loaded from /etc/hostname.


Yes; that area of the filesystem can be readable only to root. It could also have other avenues of entry: it could come from the boot firmware via the kernel command line, or be in a device tree blob or whatever.


> some applications, such as the Chromium web browser, may report an error if this file is not present

Not present, or not present and readable to the application?


A random machine-id doesn't help. Use a static number instead, such as 1111111111111111111111111111111111111111


Can't you simply symlink /etc/machine-id to /dev/random?


That won't work because /etc/machine-id is supposed to contain a 32-character hex string, but I like the idea. You could do something like this:

  # rm -f /etc/machine-id
  # mkfifo /etc/machine-id
  # while true; do head -c 16 /dev/urandom | od -A n -x | tr -d ' ' > /etc/machine-id; done &


A word of warning. If you try this, make sure the last command is started on boot and runs before D-Bus! I completely forgot I had done this, and I just spent a few hours trying to figure out why my system was hanging on boot. It turns out that D-Bus reads /etc/machine-id on start-up, and naturally by design, it will wait until it receives data from the named pipe before proceeding with execution.


Perhaps a better idea (once you strip out the dashes) would be to use a random ID that is automatically (re-)generated on boot:

  /proc/sys/kernel/random/boot_id
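Stripping the dashes is easiest by parsing the value as a UUID. A short Python sketch follows (function names are made up). Keep in mind what the earlier replies point out: this value changes on every boot and is shared by all containers running on the same kernel.

```python
import uuid


def boot_id_as_machine_id(boot_id: str) -> str:
    """Turn the dashed UUID from boot_id into 32 lowercase hex digits."""
    return uuid.UUID(boot_id.strip()).hex


def current_boot_id(path: str = "/proc/sys/kernel/random/boot_id") -> str:
    """Read this boot's ID (Linux-only); it changes on every reboot."""
    with open(path) as f:
        return boot_id_as_machine_id(f.read())


if __name__ == "__main__":
    print(boot_id_as_machine_id("3f2a1c9e-5b7d-4e8f-9a0b-1c2d3e4f5a6b\n"))
    # prints 3f2a1c9e5b7d4e8f9a0b1c2d3e4f5a6b
```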


It seems that you can set a random value at boot. But maybe not whenever it's requested.



