Fedora desktop Planet

Making an Orbic Speed RC400L autoboot when USB power is attached

Posted by Matthew Garrett on December 07, 2022 06:33 AM
As I mentioned a couple of weeks ago, I've been trying to hack an Orbic Speed RC400L mobile hotspot so it'll automatically boot when power is attached. When plugged in it would flash a "Welcome" screen and then switch to a display showing the battery charging - it wouldn't show up on USB, and didn't turn on any networking. So, my initial assumption was that the bootloader was making a policy decision not to boot Linux. After getting root (as described in the previous post), I was able to cat /proc/mtd and see that partition 7 was titled "aboot". Aboot is a commonly used Android bootloader, based on Little Kernel - LK provides the hardware interface, aboot is simply an app that runs on top of it. I was able to find the source code for Quectel's aboot, which is intended to run on the same SoC that's in this hotspot, so it was relatively easy to line up a bunch of the Ghidra decompilation with actual source (top tip: find interesting strings in your decompilation and paste them into github search, and see whether you get a repo back).

Unfortunately looking through this showed various cases where bootloader policy decisions were made, but all of them seemed to result in Linux booting. Patching them and flashing the patched loader back to the hotspot didn't change the behaviour. So now I was confused: it seemed like Linux was loading, but there wasn't an obvious point in the boot scripts where it then decided not to do stuff. No boot logs were retained between boots, which made things even more annoying. But then I realised that, well, I have root - I can just do my own logging. I hacked in an additional init script to dump dmesg to /var, powered it down, and then plugged in a USB cable. It booted to the charging screen. I hit the power button and it booted fully, appearing on USB. I adb shelled in, checked the logs, and saw that it had booted twice. So, we were definitely entering Linux before showing the charging screen. But what was the difference?

Diffing the dmesg showed that there was a major distinction on the kernel command line. The kernel command line is data populated by the bootloader and then passed to the kernel - it's how you provide arguments that alter kernel behaviour without having to recompile it, but it's also exposed to userland by the running kernel, so it serves as a way for the bootloader to pass information to the running userland. The boot that resulted in the charging screen had an androidboot.poweronreason=USB argument, the one that booted fully had androidboot.poweronreason=PWRKEY. Searching the filesystem for androidboot.poweronreason showed that the script that configures USB did not enable USB if poweronreason was USB, and the same string also showed up in a bunch of other applications. The bootloader was always booting Linux, but it was telling Linux why it had booted, and if that reason was "USB" then Linux was choosing not to enable USB and not starting the networking stack.

One approach would be to modify every application that parsed this data and make it work even if the power on reason was "USB". That would have been tedious. It seemed easier to modify the bootloader. Looking for that string in Ghidra showed that it was reading a register from the power management controller and then interpreting that to determine the reason it had booted. In effect, it was doing something like:
boot_reason = read_pmic_boot_reason();
switch(boot_reason) {
case 0x10: /* PMIC reports the power key was pressed */
  bootparam = strdup("androidboot.poweronreason=PWRKEY");
  break;
case 0x20: /* PMIC reports USB power was attached */
  bootparam = strdup("androidboot.poweronreason=USB");
  break;
default:   /* anything else is treated as a hard reset */
  bootparam = strdup("androidboot.poweronreason=Hard_Reset");
}
Changing the 0x20 to 0xff meant that the USB case would never be detected, and it would fall through to the default. All the userland code was happy to accept "Hard_Reset" as a legitimate reason to boot, and now plugging in USB results in the modem booting to a functional state. Woo.

If you want to do this yourself, dump the aboot partition from your device, and search for the hex sequence "03 02 00 0a 20". Change the final 0x20 to 0xff, copy it back to the device, and write it to mtdblock7. If it doesn't work, feel free to curse me, but I'm almost certainly going to be no use to you whatsoever. Also, please, do not just attempt to mechanically apply this to other hotspots. It's not going to end well.
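
For illustration, a minimal Python sketch of that patch (the file names are mine, and the guard stops you if the sequence doesn't appear exactly once - in which case stop and look more closely before writing anything back):

import sys

# Patch a dumped aboot image so the PMIC "USB" boot reason is never matched.
data = open("aboot.img", "rb").read()          # dump of /dev/mtdblock7
needle  = bytes.fromhex("0302000a20")
patched = bytes.fromhex("0302000aff")
if data.count(needle) != 1:
    sys.exit("sequence not found exactly once - do not patch blindly")
open("aboot-patched.img", "wb").write(data.replace(needle, patched))

The resulting aboot-patched.img is what you'd copy back to the device and write to mtdblock7.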


Making unphishable 2FA phishable

Posted by Matthew Garrett on November 30, 2022 09:53 PM
One of the huge benefits of WebAuthn is that it makes traditional phishing attacks impossible. An attacker sends you a link to a site that looks legitimate but isn't, and you type in your credentials. With SMS or TOTP-based 2FA, you type in your second factor as well, and the attacker now has both your credentials and a legitimate (if time-limited) second factor token to log in with. WebAuthn prevents this by verifying that the site it's sending the secret to is the one that issued it in the first place - visit an attacker-controlled site and said attacker may get your username and password, but they won't be able to obtain a valid WebAuthn response.

But what if there was a mechanism for an attacker to direct a user to a legitimate login page, resulting in a happy WebAuthn flow, and obtain valid credentials for that user anyway? This seems like the lead-in to someone saying "The Aristocrats", but unfortunately it's (a) real, (b) RFC-defined, and (c) implemented in a whole bunch of places that handle sensitive credentials. The villain of this piece is RFC 8628, and while it exists for good reasons it can be used in a whole bunch of ways that have unfortunate security consequences.

What is the RFC 8628-defined Device Authorization Grant, and why does it exist? Imagine a device that you don't want to type a password into - either it has no input devices at all (eg, some IoT thing) or it's awkward to type a complicated password (eg, a TV with an on-screen keyboard). You want that device to be able to access resources on behalf of a user, so you want to ensure that that user authenticates the device. RFC 8628 describes an approach where the device requests the credentials, and then presents a code to the user (either on screen or over Bluetooth or something), and starts polling an endpoint for a result. The user visits a URL and types in that code (or is given a URL that has the code pre-populated) and is then guided through a standard auth process. The key distinction is that if the user authenticates correctly, the issued credentials are passed back to the device rather than the user - on successful auth, the endpoint the device is polling will return an OAuth token.
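
To make the flow concrete, here's a minimal sketch of the device side in Python. The endpoint URLs and client ID are placeholders - every ID provider has its own - but the shape of the exchange is what RFC 8628 defines (a real client would also handle the authorization_pending and slow_down error responses):

import time
import requests  # third-party HTTP library

AUTH_URL  = "https://idp.example.com/device_authorization"  # hypothetical
TOKEN_URL = "https://idp.example.com/token"                 # hypothetical
CLIENT_ID = "my-device"

# Step 1: the "device" asks for a code pair.
grant = requests.post(AUTH_URL, data={"client_id": CLIENT_ID}).json()
print("Visit", grant["verification_uri"], "and enter", grant["user_code"])

# Step 2: poll until the user completes the (entirely genuine!) login flow.
while True:
    time.sleep(grant.get("interval", 5))
    resp = requests.post(TOKEN_URL, data={
        "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
        "client_id": CLIENT_ID,
        "device_code": grant["device_code"],
    })
    if resp.status_code == 200:
        break

# Step 3: the token arrives here - at whoever started the flow, not the user.
print("access token:", resp.json()["access_token"])

Note that nothing after step 1 touches the user's machine: the user authenticates against the real ID provider, and the token is handed to whoever initiated the flow.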

But what happens if it's not a device that requests the credentials, but an attacker? What if said attacker obfuscates the URL in some way and tricks a user into clicking it? The user will be presented with their legitimate ID provider login screen, and if they're using a WebAuthn token for second factor it'll work correctly (because it's genuinely talking to the real ID provider!). The user will then typically be prompted to approve the request, but in every example I've seen the language used here is very generic and doesn't describe what's going on, or ask the user to confirm that they actually initiated the request. AWS simply says "An application or device requested authorization using your AWS sign-in" and has a big "Allow" button, giving the user no indication at all that hitting "Allow" may give a third party their credentials.

This isn't novel! Christoph Tafani-Dereeper has an excellent writeup on this topic from last year, which builds on Nestori Syynimaa's earlier work. But whenever I've talked about this, people seem surprised at the consequences. WebAuthn is supposed to protect against phishing attacks, but this approach subverts that protection by presenting the user with a legitimate login page and then handing their credentials to someone else.

RFC 8628 actually recognises this vector and presents a set of mitigations. Unfortunately nobody actually seems to implement these, and most of the mitigations are based around the idea that this flow will only be used for physical devices. Sadly, AWS uses this for initial authentication for the aws-cli tool, so there's no device in that scenario. Another mitigation is that there's a relatively short window where the code is valid, and so sending a link via email is likely to result in it expiring before the user clicks it. An attacker could avoid this by directing the user to a domain under their control that triggers the flow and then redirects the user to the login page, ensuring that the code is only generated after the user has clicked the link.

Can this be avoided? The best way to do so is to ensure that you don't support this token issuance flow anywhere, or if you do then ensure that any tokens issued that way are extremely narrowly scoped. Unfortunately if you're an AWS user, that's probably not viable - this flow is required for the CLI tool to perform SSO login, and users are going to end up with broadly scoped tokens as a result. The logs are also not terribly useful.

The infuriating thing is that this isn't necessary for CLI tooling. The reason this approach is taken is that you need a way to get the token to a local process even if the user is doing authentication in a browser. This can be avoided by having the process listen on localhost, and then have the login flow redirect to localhost (including the token) on successful completion. In this scenario the attacker can't get access to the token without having access to the user's machine, and if they have that they probably have access to the token anyway.
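
A sketch of that localhost approach in Python, assuming the ID provider permits http://localhost redirect URIs for the client (the port and query parameter name are illustrative):

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The browser lands here after a successful login, carrying the
        # authorization code/token in the query string of the redirect.
        params = parse_qs(urlparse(self.path).query)
        self.server.code = params.get("code", [None])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"Login complete - you can close this tab.")

server = HTTPServer(("127.0.0.1", 8400), CallbackHandler)
server.handle_request()  # serve exactly one request, then return
print("got code:", server.code)

An attacker who can't already run code on 127.0.0.1 never sees the response, which is exactly the property the device grant flow gives up.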

There's no real moral here other than "Security is hard". Sorry.


Poking a mobile hotspot

Posted by Matthew Garrett on November 26, 2022 07:09 AM
I've been playing with an Orbic Speed, a relatively outdated device that only speaks LTE Cat 4, but the towers I can see from here are, uh, not well provisioned so throughput really isn't a concern (and refurbs are $18, so). As usual I'm pretty terrible at just buying devices and using them for their intended purpose, and in this case it has the irritating behaviour that if there's a power cut and the battery runs out it doesn't boot again when power returns, so here's what I've learned so far.

First, it's clearly running Linux (nmap indicates that, as do the headers from the built-in webserver). The login page for the web interface has some text reading "Open Source Notice" that highlights when you move the mouse over it, but that's it - there's code to make the text light up, but it's not actually a link. There are no exposed license notices at all, although there is a copy on the filesystem that doesn't seem to be reachable from anywhere. The notice tells you to email them to receive source code, but doesn't actually provide an email address.

Still! Let's see what else we can figure out. There are no open ports other than the web server, but there is an update utility that includes some interesting components. First, there's a copy of adb, the Android Debug Bridge. That doesn't mean the device is running Android - it's common for embedded devices from various vendors to use a bunch of Android infrastructure (including the bootloader) while having a non-Android userland on top. But this is still slightly surprising, because the device isn't exposing an adb interface over USB. There are also drivers for various Qualcomm endpoints that are, again, not exposed. Running the utility under Windows while the modem is connected results in the modem rebooting and Windows talking about new hardware being detected, and watching the device manager shows a bunch of COM ports being detected and bound by Qualcomm drivers. So, what's it doing?

Sticking the utility into Ghidra and looking for strings that correspond to the output that the tool conveniently leaves in the logs subdirectory shows that after finding a device it calls vendor_device_send_cmd(). This is implemented in a copy of libusb-win32 that, again, comes with no offer of source code. But it's also easy to drop that into Ghidra and discover that vendor_device_send_cmd() is just a wrapper for usb_control_msg(dev,0xc0,0xa0,0,0,NULL,0,1000);. Sending that from Linux results in the device rebooting and suddenly exposing some more USB endpoints, including a functional adb interface. Although, annoyingly, the rndis interface that enables USB tethering via the modem is now missing.
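
Sending the same control transfer from Linux is a one-liner with pyusb (the vendor/product IDs below are placeholders - check lsusb for your device):

import usb.core  # pyusb

dev = usb.core.find(idVendor=0x05c6, idProduct=0xf601)  # hypothetical IDs
assert dev is not None, "device not found"
# Equivalent of usb_control_msg(dev, 0xc0, 0xa0, 0, 0, NULL, 0, 1000):
# bmRequestType 0xc0 = vendor-specific, device-to-host; bRequest 0xa0;
# wValue/wIndex 0; no data stage; 1000 ms timeout.
dev.ctrl_transfer(0xc0, 0xa0, 0, 0, 0, timeout=1000)

After this the device should re-enumerate with the extra endpoints.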

Unfortunately the adb user is unprivileged, but most files on the system are world-readable. /data/logs/atfwd.log is especially interesting. This modem has an application processor built into the modem chipset itself, and while the modem implements the Hayes Command Set there's also a mechanism for userland to register that certain AT commands should be pushed up to userland. These are handled by the atfwd_daemon that runs as root, and conveniently logs everything it's up to. This includes having logged all the communications executed when the update tool was run earlier, so let's dig into that.

The system sends a bunch of AT+SYSCMD= commands, each of which is in the form of echo (stuff) >>/usrdata/sec/chipid. Once that's all done, it sends AT+CHIPID, receives a response of CHIPID:PASS, and then AT+SER=3,1, at which point the modem reboots back into the normal mode - adb is gone, but rndis is back. But the logs also reveal that between the CHIPID request and the response is a security check that involves RSA. The logs on the client side show that the text being written to the chipid file is a single block of base64 encoded data. Decoding it just gives apparently random binary. Heading back to Ghidra shows that atfwd_daemon is reading the chipid file and then decrypting it with an RSA key. The key is obtained by calling a series of functions, each of which returns a long base64-encoded string. Decoding each of these gives 1028 bytes of high entropy data, which is then passed to another function that decrypts it using AES CBC using a key of 000102030405060708090a0b0c0d0e0f and an initialization vector of all 0s. This is somewhat weird, since there's 1028 bytes of data and 128 bit AES works on blocks of 16 bytes. The behaviour of OpenSSL is apparently to just pad the data out to a multiple of 16 bytes, but that implies that we're going to end up with a block of garbage at the end. It turns out not to matter - despite the fact that we decrypt 1028 bytes of input only the first 200 bytes mean anything, with the rest just being garbage. Concatenating all of that together gives us a PKCS#8 private key blob in PEM format. Which means we have not only the private key, but also the public key.
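
The decryption step is easy to reproduce. A sketch using the pycryptodome library, with the key and IV from the decompilation - key_chunks stands in for the base64 strings those functions return, and the zero-padding is my reading of what OpenSSL apparently does here:

import base64
from Crypto.Cipher import AES  # pycryptodome

KEY = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
IV  = bytes(16)  # all zeros

key_chunks = []  # fill in: the base64 strings returned by the daemon's functions

def decrypt_chunk(b64_chunk):
    data = base64.b64decode(b64_chunk)
    data += bytes(-len(data) % 16)   # 1028 isn't a multiple of 16; zero-pad
    plain = AES.new(KEY, AES.MODE_CBC, IV).decrypt(data)
    return plain[:200]               # only the first 200 bytes are meaningful

pem = b"".join(decrypt_chunk(c) for c in key_chunks)
print(pem.decode())  # a PKCS#8 private key blob in PEM format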

So, what's in the encrypted data, and where did it come from in the first place? It turns out to be a JSON blob that contains the IMEI and the serial number of the modem. This is information that can be read from the modem in the first place, so it's not secret. The modem decrypts it, compares the values in the blob to its own values, and if they match sets a flag indicating that validation has succeeded. But what encrypted it in the first place? It turns out that the JSON blob is just POSTed to http://pro.w.ifelman.com/api/encrypt and an encrypted blob returned. Of course, since the server encrypts with the public key and the modem decrypts with the private key, having access to the modem gives us the private key (and hence the public key, which can be derived from it), which means we can just encrypt our own blobs.

What does that buy us? Digging through the code shows the only case that it seems to matter is when parsing the AT+SER command. The first argument to this is the serial mode to transition to, and the second is whether this should be a temporary transition or a permanent one. Once parsed, these arguments are passed to /sbin/usb/compositions/switch_usb which just writes the mode out to /usrdata/mode.cfg (if permanent) or /usrdata/mode_tmp.cfg (if temporary). On boot, /data/usb/boot_hsusb_composition reads the number from this file and chooses which USB profile to apply. This requires no special permissions, except if the number is 3 - if so, the RSA verification has to be performed first. This is somewhat strange, since mode 9 gives the same rndis functionality as mode 3, but also still leaves the debug and diagnostic interfaces enabled.

So what's the point of all of this? I'm honestly not sure! It doesn't seem like any sort of effective enforcement mechanism (even ignoring the fact that you can just create your own blobs, if you change the IMEI on the device somehow you can just POST the new values to the server and get back a new blob), so the best explanation I've been able to come up with is that it's there to ensure there's some mapping between IMEI and serial number before the device can be transitioned into production mode during manufacturing.

But, uh, we can just ignore all of this anyway. Remember that AT+SYSCMD= stuff that was writing the data to /usrdata/sec/chipid in the first place? Anything that's passed to AT+SYSCMD is just executed as root. Which means we can just write a new value (including 3) to /usrdata/mode.cfg in the first place, without needing to jump through any of these hoops. Which also means we can just adb push a shell onto there and then use the AT interface to make it suid root, which avoids needing to figure out how to exploit any of the bugs that are just sitting there given it's running a 3.18.48 kernel.
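
Since anything passed to AT+SYSCMD runs as root, the whole thing reduces to talking to the right serial port. A sketch with pyserial (the tty path varies - it's one of the Qualcomm serial endpoints exposed in diagnostic mode - and the pushed shell path is hypothetical):

import serial  # pyserial

def at(port, cmd):
    port.write(cmd.encode() + b"\r\n")
    return port.read(256)  # good enough for short responses

port = serial.Serial("/dev/ttyUSB2", 115200, timeout=2)  # hypothetical port
# Persistently select USB composition 3 (rndis) without the RSA dance:
at(port, "AT+SYSCMD=echo 3 > /usrdata/mode.cfg")
# Make a previously adb-pushed shell setuid root (path is illustrative):
at(port, "AT+SYSCMD=chmod 4755 /usrdata/sh")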

Anyway, I've now got a modem that's got working USB tethering and also exposes a working adb interface, and I've got root on it. Which let me dump the bootloader and discover that it implements fastboot and has an oem off-mode-charge command which solves the problem I wanted to solve of having the device boot when it gets power again. Unfortunately I still need to get into fastboot mode. I haven't found a way to do it through software (adb reboot bootloader doesn't do anything), but this post suggests it's just a matter of grounding a test pad, at which point I should just be able to run fastboot oem off-mode-charge and it'll be all set. But that's a job for tomorrow.

Edit: Got into fastboot mode and ran fastboot oem off-mode-charge 0 but sadly it doesn't actually do anything, so I guess next is going to involve patching the bootloader binary. Since it's signed with a cert titled "General Use Test Key (for testing only)" it apparently doesn't have secure boot enabled, so this should be easy enough.


macOS Dark Mode

Posted by Caolán McNamara on November 24, 2022 03:39 PM


For LibreOffice 7.5 I've reworked the theming on macOS to get some support for Dark Mode, as seen above. As a side effect "accent colors" work in Light Mode too.

Crosswords 0.3.6: “The Man from G.N.O.M.E”

Posted by Jonathan Blandford on November 07, 2022 11:25 AM

Note: a half-written version of this blog post was accidentally published earlier, and was briefly picked up by planet.gnome.org. The [Publish] and [Preview] buttons are unfortunately right next to each other 🙁

Release

It’s time for another GNOME Crosswords release! This one is particularly exciting to me as it brings a lot of new content. The major highlights include new puzzles, a new distribution channel, and the start of adaptive behavior with an eye to working on mobile. It’s available for download on Flathub.

New Puzzles

First — and most important — is the addition of new puzzles. We added three more puzzle sets from different sources. I wrote to a number of crossword creators in the local FOSS community and asked if I could include their puzzles in this release. Thanks to Bob, Carl, and dew.mountain for kindly agreeing to add their puzzles to the distribution.

Next, Rosanna and I sat down and together wrote some quick mini-puzzles. These were pretty fun to create, so hopefully we can build a good collection over time. Let us know if you’re interested in helping us.

Finally, I wrote to Māyā of Auckland to ask if they minded if I included their puzzles. Māyā writes cryptic crosswords that are of a very high quality — and that are a lot of fun to solve. They very kindly agreed to let me distribute them, so I’ve included ~90 cryptics.

This adds up to over 100 new puzzles available in this version.

Adding these puzzles took a surprising amount of time. They ended up highlighting a half-dozen or so bugs in the code base, and led to substantial rewriting of a number of sections. In detail:

  • One of Bob’s puzzles came in puz format and used circled letters, so I added circle support to the puz importer. This is used fairly commonly. Now that we support it, I’m noticing circles a lot more in the wild.
  • Some of Māyā’s puzzles were made available in JPZ format. As a result, I finally wrote a converter for this format. I also refactored the converter to make it possible to add other formats in the future.
    NOTE: I did not have a lot of jpz puzzles to test while writing this. If you have some, please try this feature out and let me know if it doesn’t load correctly.
  • Māyā’s puzzles also thoroughly broke the enumeration code. The old regex-based parser wasn’t good enough to handle them, so I wrote a state-machine-based parser to handle this properly. There were really cute enumerations in those puzzles like:

    After first first exchange, 10 relates to blood vessels (7-)
    Solo star of this? Yes and no (3,3,4,1.1.1.1.1)

    Neither enumeration could be parsed with the old approach.
  • These puzzles also forced me to get much more strict about the text handling. The ipuz spec allows a subset of html in certain tags, and you certainly run into puzzles with this in the wild. Complicating matters, you also come across random ampersands (&) that weren’t escaped, as well as other random entities. I wrote a simple html_to_markup() function that parses html (as best as I can) and converts it to valid GMarkup tags to pass to labels. This is mostly a matter of handling entities and escaping unknown tags, but there were definitely some subtleties involved (a much-simplified sketch follows below). If someone has a better version of this functionality, let me know. On the other hand, maybe this implementation is helpful to someone else out there.
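
By way of illustration, a much-simplified sketch of the html_to_markup() idea in Python, using the standard library's HTMLParser (the real function lives in the Crosswords code base; this just shows the shape: keep the few tags Pango/GMarkup understands, decode entities, and re-escape everything else):

from html.parser import HTMLParser
from xml.sax.saxutils import escape

KEEP = {"b", "i", "u", "s", "sub", "sup", "tt", "big", "small"}

class ToMarkup(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive as plain text
        self.out = []
    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.out.append(f"<{tag}>")
    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")
    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, <, > for GMarkup

def html_to_markup(text):
    parser = ToMarkup()
    parser.feed(text)
    return "".join(parser.out)

print(html_to_markup("Ben & Jerry's <i>best</i> clue"))
# -> Ben &amp; Jerry's <i>best</i> clue

Stray unescaped ampersands arrive in handle_data() as literal text and get re-escaped there, which is the main trick.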

Thanks again to these puzzle authors!

Adaptive layout

The next big feature we added is making Crosswords work well at different screen sizes. I had a couple of requests to make this work on mobile devices and Sam did a great design, so we tried to make this plausible. Unfortunately, there’s not a lot of formal guidance as to what’s needed here, other than to support small screens. I’d love it if the GNOME Mobile folks would put together a checklist of things to support, as well as instructions on how best to test / simulate that environment without having dedicated hardware.

Nevertheless, we now support smaller screens! Here’s a video of it in action.

[Video: https://blogs.gnome.org/jrb/files/2022/11/Screencast-from-11-06-2022-033143-PM.webm]

To make this work proved to be pretty tricky, as that packing is impossible to do with a standard GtkBox layout. The challenge is that the scrolled window holding the grid needs to have vexpand=TRUE set, but also have a maximum size so that the clue at the bottom doesn’t extend beyond the grid. I finally gave up and wrote a custom widget to do this sizing.

Note: I got a lot closer to a working solution with constraints, and was really impressed with how well they worked. Emmanuele has done a fantastic job with them. There’s a bit of a learning curve to get started, but they’re definitely worth a second look if you haven’t tried using them before. They’re more powerful than regular box-based solutions, and ended up being easier to maintain and experiment with.

For the sizing behavior to feel right, I also wrote code to restore the window size on restart. Fortunately, there’s a great example of how to do this in the tutorial. Unfortunately, the example doesn’t work correctly. After figuring out why, I wrote an MR to fix the documentation. I don’t know Vala well enough to update that example, though - maybe someone who does can help me update it?

Availability

One more fantastic contribution arrived towards the end of the release cycle: Davide and Michel packaged Crosswords for Fedora. I had never actually tried to install the game locally, but meson magically worked and they got it built. Now, running ‘sudo dnf install crosswords’ works on f36/f37. As it’s Fedora, I may have to spend some time next cycle to straighten out all the licenses across the code base and puzzles, but that’s time worth spending.

One spooky Halloween-themed bug came with this work: word-list-tests test failure on s390x. That’s a category of bug I never expected to have to deal with ever again in my life!

In general, I’m really excited about the future of flatpak and what it means to the larger Linux ecosystem. I will continue to focus my efforts on that. But having it on Fedora is nice validation, and hopefully will lead to more people solving crosswords. I’d love it if other distros follow suit and ship this as well.

Odds and Ends

Other fixes this cycle include:

  • Massive cleanup to event code (Federico). This was huge, and moves us to a simpler event system. Really appreciated!
  • Also, Federico let me call him a couple Saturdays to get general design advice
  • New setting to avoid changing crossword direction when arrowing through the grid (Jonathan and Rosanna)
  • Mime types defined for ipuz, jpz, and puz files (Davide)
  • Cleanup of the color and contrast of shapes in cells (Jonathan)
  • New Italian translation (Davide)
  • macOS fixes and testing (Vinson)
  • Loads of bug fixes (all)

Until next time!

Stop Using QtWebKit

Posted by Michael Catanzaro on November 04, 2022 05:20 PM

Today, WebKit in Linux operating systems is much more secure than it used to be. The problems that I previously discussed in this old, formerly-popular blog post are nowadays a thing of the past. Most major Linux operating systems now update WebKitGTK and WPE WebKit on a regular basis to ensure known vulnerabilities are fixed. (Not all Linux operating systems include WPE WebKit. It’s basically WebKitGTK without the dependency on GTK, and is the best choice if you want to use WebKit on embedded devices.) All major operating systems have removed older, insecure versions of WebKitGTK (“WebKit 1”) that were previously a major security problem for Linux users. And today WebKitGTK and WPE WebKit both provide a webkit_web_context_set_sandbox_enabled() API which, if enabled, employs Linux namespaces to prevent a compromised web content process from accessing your personal data, similar to Flatpak’s sandbox. (If you are a developer and your application does not already enable the sandbox, you should fix that!)
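
For developers, enabling the sandbox is a one-line change, as long as it happens before the first web process is spawned. A sketch via GObject introspection in Python - the underlying C call is the webkit_web_context_set_sandbox_enabled() mentioned above, and the WebKit2 4.0 namespace is one common choice (your binding version may differ):

import gi
gi.require_version("WebKit2", "4.0")
from gi.repository import WebKit2

# Must run before any WebKitWebView is created, i.e. before a web
# content process exists for this context.
context = WebKit2.WebContext.get_default()
context.set_sandbox_enabled(True)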

Unfortunately, QtWebKit has not benefited from these improvements. QtWebKit was removed from the upstream WebKit codebase back in 2013. Its current status in Fedora is, unfortunately, representative of other major Linux operating systems. Fedora currently contains two versions of QtWebKit:

  • The qtwebkit package contains upstream QtWebKit 2.3.4 from 2014. I believe this is used by Qt 4 applications. For avoidance of doubt, you should not use applications that depend on a web engine that has not been updated in eight years.
  • The newer qt5-qtwebkit contains Konstantin Tokarev’s fork of QtWebKit, which is de facto the new upstream and without a doubt the best version of QtWebKit available currently. Although it has received occasional updates, most recently 5.212.0-alpha4 from March 2020, it’s still based on WebKitGTK 2.12 from 2016, and the release notes bluntly state that it’s not very safe to use. Looking at WebKitGTK security advisories beginning with WSA-2016-0006, I manually counted 507 CVEs that have been fixed in WebKitGTK 2.14.0 or newer.

These CVEs are mostly (but not exclusively) remote code execution vulnerabilities. Many of those CVEs no doubt correspond to bugs that were introduced more recently than 2.12, but the exact number is not important: what’s important is that it’s a lot, far too many for backporting security fixes to be practical. Since qt5-qtwebkit is two years newer than qtwebkit, the qtwebkit package is no doubt in even worse shape. And because QtWebKit does not have any web process sandbox, any remote code execution is game over: an attacker that exploits QtWebKit gains full access to your user account on your computer, and can steal or destroy all your files, read all your passwords out of your password manager, and do anything else that your user account can do with your computer. In contrast, with WebKitGTK or WPE WebKit’s web process sandbox enabled, attackers only get access to content that’s mounted within the sandbox, which is a much more limited environment without access to your home directory or session bus.

In short, it’s long past time for Linux operating systems to remove QtWebKit and everything that depends on it. Do not feed untrusted data into QtWebKit. Don’t give it any HTML that you didn’t write yourself, and certainly don’t give it anything that contains injected data. Uninstall it and whatever applications depend on it.

Update: I forgot to mention what to do if you are a developer and your application still uses QtWebKit. You should ensure it uses the most recent release of QtWebEngine for Qt 6. Do not use old versions of Qt 6, and do not use QtWebEngine for Qt 5.

Linux Boot Partitions

Posted by Lennart Poettering on November 02, 2022 11:00 PM

💽 Linux Boot Partitions and How to Set Them Up 🚀

Let’s have a look how traditional Linux distributions set up /boot/ and the ESP, and how this could be improved.

How Linux distributions traditionally set up their “boot” file systems has varied to some degree, but the most common choice has been to have a separate partition mounted to /boot/. Usually the partition is formatted as a Linux file system such as ext2/ext3/ext4. The partition contains the kernel images, the initrd and various boot loader resources. Some distributions, like Debian and Ubuntu, also store ancillary files associated with the kernel here, such as kconfig or System.map. Such a traditional boot partition is only defined within the context of the distribution, and typically not immediately recognizable as such when looking just at the partition table (i.e. it uses the generic Linux partition type UUID).

With the arrival of UEFI a new partition relevant for boot appeared, the EFI System Partition (ESP). This partition is defined by the firmware environment, but typically accessed by Linux to install or update boot loaders. The choice of file system is not up to Linux, but effectively mandated by the UEFI specifications: vFAT. In theory it could be formatted as other file systems too. However, this would require the firmware to support file systems other than vFAT. This is rare and firmware specific though, as vFAT is the only file system mandated by the UEFI specification. In other words, vFAT is the only file system which is guaranteed to be universally supported.

There’s a major overlap of the type of the data typically stored in the ESP and in the traditional boot partition mentioned earlier: a variety of boot loader resources as well as kernels/initrds.

Unlike the traditional boot partition, the ESP is easily recognizable in the partition table via its GPT partition type UUID. The ESP is also a shared resource: all OSes installed on the same disk will share it and put their boot resources into it (as opposed to the traditional boot partition, of which there is one per installed Linux OS, and only that one will put resources there).

To summarize, the most common setup on typical Linux distributions is something like this:

Type                     Linux Mount Point   File System Choice
Linux “Boot” Partition   /boot/              Any Linux File System, typically ext2/ext3/ext4
ESP                      /boot/efi/          vFAT

As mentioned, not all distributions or local installations agree on this. For example, it’s probably worth mentioning that some distributions decided to put kernels onto the root file system of the OS itself. For this setup to work the boot loader itself [sic!] must implement a non-trivial part of the storage stack. This may have to include RAID, storage drivers, networked storage, volume management, disk encryption, and Linux file systems. Leaving aside the conceptual argument that complex storage stacks don’t belong in boot loaders there are very practical problems with this approach. Reimplementing the Linux storage stack in all its combinations is a massive amount of work. It took decades to implement what we have on Linux now, and it will take a similar amount of work to catch up in the boot loader’s reimplementation. Moreover, there’s a political complication: some Linux file system communities made clear they have no interest in supporting a second file system implementation that is not maintained as part of the Linux kernel.

What’s interesting is that the /boot/efi/ mount point is nested below the /boot/ mount point. This effectively means that to access the ESP the Boot partition must exist and be mounted first. A system with just an ESP and without a Boot partition hence doesn’t fit well into the current model. The Boot partition will also have to carry an empty “efi” directory that can be used as the inner mount point, and serves no other purpose.

Given that the traditional boot partition and the ESP may carry similar data (i.e. boot loader resources, kernels, initrds) one may wonder why they are separate concepts. Historically, this was the easiest way to make the pre-UEFI way how Linux systems were booted compatible with UEFI: conceptually, the ESP can be seen as just a minor addition to the status quo ante that way. Today, primarily two reasons remained:

  • Some distributions see a benefit in support for complex Linux file system concepts such as hardlinks, symlinks, SELinux labels/extended attributes and so on when storing boot loader resources. – I personally believe that making use of features in the boot file systems that the firmware environment cannot really make sense of is very clearly not advisable. The UEFI file system APIs know no symlinks, and what is SELinux to UEFI anyway? Moreover, putting more than the absolute minimum of simple data files into such file systems immediately raises questions about how to authenticate them comprehensively (including all fancy metadata) cryptographically on use (see below).

  • On real-life systems that ship with non-Linux OSes the ESP often comes pre-installed with a size too small to carry multiple Linux kernels and initrds. As growing the size of an existing ESP is problematic (for example, because there’s no space available immediately after the ESP, or because some low-quality firmware reacts badly to the ESP changing size) placing the kernel in a separate, secondary partition (i.e. the boot partition) circumvents these space issues.

File System Choices

We already mentioned that the ESP effectively has to be vFAT, as that is what UEFI (more or less) guarantees. The file system choice for the boot partition is not quite as restricted, but using arbitrary Linux file systems is not really an option either. The file system must be accessible by both the boot loader and the Linux OS. Hence only file systems that are available in both can be used. Note that such secondary implementations of Linux file systems in the boot environment – limited as they may be – are not typically welcomed or supported by the maintainers of the canonical file system implementation in the upstream Linux kernel. Modern file systems are notoriously complicated and delicate and simply don’t belong in boot loaders.

In a trusted boot world, the two file systems for the ESP and the /boot/ partition should be considered untrusted: any code or essential data read from them must be authenticated cryptographically before use. And even more, the file system structures themselves are also untrusted. The file system driver reading them must be careful not to be exploitable by a rogue file system image. Effectively this means a simple file system (for which a driver can be more easily validated and reviewed) is generally a better choice than a complex file system (Linux file system communities made it pretty clear that robustness against rogue file system images is outside of their scope and not what is being tested for).

Some approaches tried to address the fact that boot partitions are untrusted territory by encrypting them via a mechanism compatible with LUKS, and adding decryption capabilities to the boot loader so it can access them. This misses the point though, as encryption does not imply authentication, and only authentication is typically desired. The boot loader and kernel code are typically Open Source anyway, and hence there’s little value in attempting to keep secret what is already public knowledge. Moreover, encryption implies the existence of an encryption key. Physically typing in the decryption key on a keyboard might still be acceptable on desktop systems with a single human user in front, but outside of that scenario unlock via TPM, PKCS#11 or network services is typically required. And even on the desktop, FIDO2 unlocking is probably the future. Implementing all the technologies these unlocking mechanisms require in the boot loader is not realistic, unless the boot loader is to become a full OS of its own, with subsystems for FIDO2, PKCS#11, USB, Bluetooth, networking, smart card access, and so on.

File System Access Patterns

Note that traditionally both mentioned partitions were read-only during most parts of the boot. Only later, once the OS is up, write access was required to implement OS or boot loader updates. In today’s world things have become a bit more complicated. A modern OS might want to require some limited write access already in the boot loader, to implement boot counting/boot assessment/automatic fallback (e.g., if the same kernel fails to boot 3 times, automatically revert to older kernel), or to maintain an early storage-based random seed. This means that even though the file system is mostly read-only, we need limited write access after all.

vFAT cannot compete with modern Linux file systems such as btrfs when it comes to data safety guarantees. It’s not a journaled file system, does not use CoW or any form of checksumming. This means when used for the system boot process we need to be particularly careful when accessing it, and in particular when making changes to it (i.e., trying to keep changes local to single sectors). It is essential to use write patterns that minimize the chance of file system corruption. Checking the file system (“fsck”) before modification (and probably also reading) is important, as is ensuring the file system is put into a “clean” state as quickly as possible after each modification.

Code quality of the firmware in typical systems is known to not always be great. When relying on the file system driver included in the firmware it’s hence a good idea to limit use to operations that have a better chance of being correctly implemented. For example, when writing from the UEFI environment it might be wise to avoid any operation that requires allocation algorithms, and instead focus on access patterns that only overwrite already-written data and do not require allocation of new space for the data.

Besides write access from the boot loader code (as described above) these file systems will require write access from the OS, to facilitate boot loader and kernel/initrd updates. These types of accesses are generally not fully random accesses (i.e., never partial file updates) but usually mean adding new files as whole, and removing old files as a whole. Existing files are typically not modified once created, though they might be replaced wholly by newer versions.

Boot Loader Updates

Note that the update cycle frequencies for boot loaders and for kernels/initrds are probably similar these days. While kernels are still vastly more complex than boot loaders, security issues are regularly found in both. In particular, boot loaders (through “shim” and similar components) carry certificate/keyring and denylist information, which typically requires frequent updates. Update cycles hence have to be expected regularly.

Boot Partition Discovery

The traditional boot partition was not recognizable by looking just at the partition table. On MBR systems it was directly referenced from the boot sector of the disk, and on EFI systems from information stored in the ESP. This is less than ideal since by losing this entrypoint information the system becomes unbootable. It’s typically a better, more robust idea to make boot partitions recognizable as such in the partition table directly. This is done for the ESP via the GPT partition type UUID. For traditional boot partitions this was not done though.

Current Situation Summary

Let’s try to summarize the above:

  • Currently, typical deployments use two distinct boot partitions, often using two distinct file system implementations

  • Firmware effectively dictates existence of the ESP, and the use of vFAT

  • In userspace view: the ESP mount is nested below the general Boot partition mount

  • Resources stored in both partitions are primarily kernel/initrd, and boot loader resources

  • The mandatory use of vFAT brings certain data safety challenges, as does quality of firmware file system driver code

  • During boot limited write access is needed, during OS runtime more comprehensive write access is needed (though still not fully random).

  • Less restricted but still limited write patterns from OS environment (only full file additions/updates/removals, during OS/boot loader updates)

  • Boot loaders should not implement complex storage stacks.

  • ESP can be auto-discovered from the partition table, traditional boot partition cannot.

  • ESP and the traditional boot partition are not protected cryptographically, in either structure or contents. It is expected that loaded files are individually authenticated after being read.

  • The ESP is a shared resource — the traditional boot partition a resource specific to each installed Linux OS on the same disk.

How to Do it Better

Now that we have discussed many of the issues with the status quo ante, let’s see how we can do things better:

  • Two partitions for essentially the same data is a bad idea. Given they carry data very similar or identical in nature, the common case should be to have only one (but see below).

  • Two file system implementations are worse than one. Given that vFAT is more or less mandated by UEFI and the only format universally understood by all players, and thus has to be used anyway, it might as well be the only file system that is used.

  • Data safety is unnecessarily bad so far: both ESP and boot partition are continuously mounted from the OS, even though access is pretty restricted: outside of update cycles access is typically not required.

  • All partitions should be auto-discoverable/self-descriptive

  • The two partitions should not be exposed as nested mounts to userspace

To be more specific, here’s how I think a better way to set this all up would look like:

  • Whenever possible, only have one boot partition, not two. On EFI systems, make it the ESP. On non-EFI systems use an XBOOTLDR partition instead (see below). Only have both in the case where a Linux OS is installed on a system that already contains an OS with an ESP that is too small to carry sufficient kernels/initrds. When a system contains a XBOOTLDR partition put kernels/initrd on that, otherwise the ESP.

  • Instead of the vaguely defined, traditional Linux “boot” partition use the XBOOTLDR partition type as defined by the Discoverable Partitions Specification. This ensures the partition is discoverable, and can be automatically mounted by things like systemd-gpt-auto-generator. Use XBOOTLDR only if you have to, i.e., when dealing with systems that lack UEFI (and where the ESP hence has no value) or to address the mentioned size issues with the ESP. Note that unlike the traditional boot partition the XBOOTLDR partition is a shared resource, i.e., shared between multiple parallel Linux OS installations on the same disk. Because of this it is typically wise to place a per-OS directory at the top of the XBOOTLDR file system to avoid conflicts.

  • Use vFAT for both partitions, it’s the only thing universally understood among relevant firmwares and Linux. It’s simple enough to be useful for untrusted storage. Or to say this differently: writing a file system driver that is not easily vulnerable to rogue disk images is much easier for vFAT than for let’s say btrfs. – But the choice of vFAT implies some care needs to be taken to address the data safety issues it brings, see below.

  • Mount the two partitions via the “automount” logic. For example, via systemd’s automount units, with a very short idle time-out (one second or so). This improves data safety immensely, as the file systems will remain mounted (and thus possibly in a “dirty” state) only for very short periods of time, when they are actually accessed – and all the while the fact that they are not continuously mounted is mostly unnoticeable to applications, as the file system paths remain continuously available. Given that the backing file system (vFAT) has poor data safety properties, it is essential to shorten the window of unclean file system state as much as possible. In fact, this is what the aforementioned systemd-gpt-auto-generator logic actually does by default. (Example unit files follow after this list.)

  • Whenever mounting one of the two partitions, do a file system check (fsck; in fact this is also what systemd-gpt-auto-generator does by default, hooked into the automount logic, to run on first access). This ensures that even if the file system is in an unclean state it is restored to be clean when needed, i.e., on first access.

  • Do not mount the two partitions nested, i.e., no more /boot/efi/. First of all, as mentioned above, it should be possible (and is desirable) to only have one of the two. Hence it is simply a bad idea to require the other as well, just to be able to mount it. More importantly though, by nesting them, automounting is complicated, as it is necessary to trigger the first automount to establish the second automount, which defeats the point of automounting them in the first place. Use the two distinct mount points /efi/ (for the ESP) and /boot/ (for XBOOTLDR) instead. You might have guessed, but that too is what systemd-gpt-auto-generator does by default.

  • When making additions or updates to ESP/XBOOTLDR from the OS make sure to create a file and write it in full, then syncfs() the whole file system, then rename to give it its final name, and syncfs() again. Similar when removing files. (A sketch of this pattern follows after this list.)

  • When writing from the boot loader environment/UEFI to ESP/XBOOTLDR, do not append to files or create new files. Instead overwrite already allocated file contents (for example to maintain a random seed file) or rename already allocated files to include information in the file name (ideally without increasing the length of the file name; for example to maintain boot counters).

  • Consider adopting UKIs, which minimize the number of files that need to be updated on the ESP/XBOOTLDR during OS/kernel updates (ideally down to 1)

  • Consider adopting systemd-boot, which minimizes the number of files that need to be updated on boot loader updates (ideally down to 1)

  • Consider removing any mention of ESP/XBOOTLDR from /etc/fstab, and just let systemd-gpt-auto-generator do its thing.

  • Stop implementing file systems, complex storage, disk encryption, … in your boot loader.
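
To illustrate the automount point above: hand-written units for the ESP might look like the following (systemd-gpt-auto-generator creates the equivalent automatically, including the fsck on first access; the device path here is just an example):

# /etc/systemd/system/efi.mount
[Unit]
Description=EFI System Partition

[Mount]
What=/dev/sda1
Where=/efi
Type=vfat
Options=umask=0077

# /etc/systemd/system/efi.automount
[Unit]
Description=EFI System Partition Automount

[Automount]
Where=/efi
TimeoutIdleSec=1s

Enable the .automount unit (not the .mount unit), and the file system is mounted on first access and unmounted again a second after the last access.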
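
And a sketch of the create → syncfs() → rename → syncfs() pattern described above, from the OS side, in Python (syncfs() isn't wrapped by the os module, hence the ctypes detour; this demonstrates the pattern rather than being a hardened implementation):

import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def syncfs(fd):
    # Flush the whole file system that fd lives on to disk.
    if libc.syncfs(fd) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

def install_file(directory, name, data):
    tmp = os.path.join(directory, name + ".tmp")
    final = os.path.join(directory, name)
    with open(tmp, "wb") as f:
        f.write(data)        # write the file in full, under a temporary name
        f.flush()
        syncfs(f.fileno())   # make sure the data is on disk...
    os.rename(tmp, final)    # ...before it appears under its final name
    dirfd = os.open(directory, os.O_RDONLY)
    try:
        syncfs(dirfd)        # and flush the rename itself
    finally:
        os.close(dirfd)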

By implementing things like that, you gain:

  • Simplicity: only one file system implementation, typically only one partition and mount point

  • Robust auto-discovery of all partitions, no need to even configure /etc/fstab

  • Data safety guarantees as good as possible, given the circumstances

To summarize this in a table:

Type       Linux Mount Point   File System Choice   Automount
ESP        /efi/               vFAT                 yes
XBOOTLDR   /boot/              vFAT                 yes

A note regarding modern boot loaders that implement the Boot Loader Specification: both partitions are explicitly listed in the specification as sources for both Type #1 and Type #2 boot menu entries. Hence, if you use such a modern boot loader (e.g. systemd-boot) these two partitions are the preferred location for boot loader resources, kernels and initrds anyway.

Addendum: You got RAID?

You might wonder, what about RAID setups and the ESP? This comes up regularly in discussions: how to set up the ESP so that (software) RAID1 (mirroring) can be done on the ESP. Long story short: I’d strongly advise against using RAID on the ESP. Firmware typically doesn’t have native RAID support, and given that firmware and boot loader can write to the file systems involved, any attempt to use software RAID on them will mean that a boot cycle might corrupt the RAID sync, and immediately requires a re-synchronization after boot. If RAID1 backing for the ESP is really necessary, the only way to implement that safely would be to implement this as a driver for UEFI – but that creates certain bootstrapping issues (i.e., where to place the driver if not the ESP, a file system the driver is supposed to be used for), and also reimplements a considerable component of the OS storage stack in firmware mode, which seems problematic.

So what to do instead? My recommendation would be to solve this via userspace tooling. If redundant disk support shall be implemented for the ESP, then create separate ESPs on all disks, and synchronize them on the file system level instead of the block level. Or in other words, the tools that install/update/manage kernels or boot loaders should be taught to maintain multiple ESPs instead of one. Copy the kernels/boot loader files to all of them, and remove them from all of them. Under the assumption that the goal of RAID is a more reliable system this should be the best way to achieve that, as it doesn’t pretend the firmware could do things it actually cannot do. Moreover it minimizes the complexity of the boot loader, shifting the syncing logic to userspace, where it’s typically easier to get right.

Addendum: Networked Boot

The discussion above focuses on booting up from a local disk. When thinking about networked boot I think two scenarios are particularly relevant:

  1. PXE-style network booting. I think in this mode of operation focus should be on directly booting a single UKI image instead of a boot loader. This sidesteps the whole issue of maintaining any boot partition at all, and simplifies the boot process greatly. In scenarios where this is not sufficient, and an interactive boot menu or other boot loader features are desired, it might be a good idea to take inspiration from the UKI concept, and build a single boot loader EFI binary (such as systemd-boot), and include the UKIs for the boot menu items and other resources inside it via PE sections. Or in other words, build a single boot loader binary that is “supercharged” and contains all auxiliary resources in its own PE sections. (Note: this does not exist, it’s an idea I intend to explore with systemd-boot). Benefit: a single file has to be downloaded via PXE/TFTP, not more. Disadvantage: unused resources are downloaded unnecessarily. Either way: in this context there is no local storage, and the ESP/XBOOTLDR discussion above is without relevance.

  2. Initrd-style network booting. In this scenario the boot loader and kernel/initrd (better: UKI) are available on a local disk. The initrd then configures the network and transitions to a network share or file system on a network block device for the root file system. In this case the discussion above applies, and in fact the ESP or XBOOTLDR partition would be the only partition available locally on disk.

And this is all I have for today.

Brave New Trusted Boot World

Posted by Lennart Poettering on October 23, 2022 10:00 PM

🔐 Brave New Trusted Boot World 🚀

This document looks at the boot process of general purpose Linux distributions. It covers the status quo and how we envision Linux boot to work in the future with a focus on robustness and simplicity.

This document will assume that the reader has comprehensive familiarity with TPM 2.0 security chips and their capabilities (e.g., PCRs, measurements, SRK), boot loaders, the shim binary, Linux, initrds, UEFI Firmware, PE binaries, and SecureBoot.

Problem Description

Status quo ante of the boot logic on typical Linux distributions:

  • Most popular Linux distributions generate initrds locally, and they are unsigned, thus not protected through SecureBoot (since that would require local SecureBoot key enrollment, which is generally not done), nor TPM PCRs.

  • Boot chain is typically Firmware → shim → grub → Linux kernel → initrd (dracut or similar) → root file system

  • Firmware’s UEFI SecureBoot protects shim, shim’s key management protects grub and kernel. No code signing protects initrd. initrd acquires the key for encrypted root fs from the user (or TPM/FIDO2/PKCS11).

  • shim/grub/kernel is measured into TPM PCR 4, among other stuff

  • EFI TPM event log reports measured data into TPM PCRs, and can be used to reconstruct and validate state of TPM PCRs from the used resources.

  • No userspace components are typically measured, except for what IMA measures

  • New kernels require locally generating new boot loader scripts and generating a new initrd each time. OS updates thus mean fragile generation of multiple resources and copying multiple files into the boot partition.

Problems with the status quo ante:

  • initrd typically unlocks root file system encryption, but is not protected whatsoever, and trivial to attack and modify offline

  • OS updates are brittle: PCR values of grub are very hard to pre-calculate, as grub measures the chosen control flow path, not just code images. PCR values vary wildly, and OS-provided resources are not measured into separate PCRs. Grub's PCR measurements might be useful up to a point to reason about the boot after the fact, for the most basic remote attestation purposes, but useless for calculating them ahead of time during the OS build process (which would be desirable to be able to bind secrets to future expected PCR state, for example to bind secrets to an OS in a way that they remain accessible even after that OS is updated).

  • Updates of a boot loader are not robust, require multi-file updates of ESP and boot partition, and regeneration of boot scripts

  • No rollback protection (no way to cryptographically invalidate access to TPM-bound secrets on OS updates)

  • Remote attestation of running software is needlessly complex since initrds are generated locally and thus basically are guaranteed to vary on each system.

  • Locking resources maintained by arbitrary user apps to TPM state (PCRs) is not realistic for general purpose systems, since PCRs will change on every OS update, and there’s no mechanism to re-enroll each such resource before every OS update, and remove the old enrollment after the update.

  • There is no concept to cryptographically invalidate/revoke secrets for an older OS version once updated to a new OS version. An attacker thus can always access the secrets generated on old OSes if they manage to exploit an old version of the OS — even if a newer version already has been deployed.

Goals of the new design:

  • Provide a fully signed execution path from firmware to userspace, no exceptions

  • Provide a fully measured execution path from firmware to userspace, no exceptions

  • Separate out TPM PCRs assignments, by “owner” of measured resources, so that resources can be bound to them in a fine-grained fashion.

  • Allow easy pre-calculation of expected PCR values based on booted kernel/initrd, configuration, local identity of the system

  • Rollback protection

  • Simple & robust updates: one updated file per concept

  • Updates without requiring re-enrollment/local preparation of the TPM-protected resources (no more “brittle” PCR hashes that must be propagated into every TPM-protected resource on each OS update)

  • System ready for easy remote attestation, to prove validity of booted OS, configuration and local identity

  • Ability to bind secrets to specific phases of the boot, e.g. the root fs encryption key should be retrievable from the TPM only in the initrd, but not after the host transitioned into the root fs.

  • Reasonably secure, automatic, unattended unlocking of disk encryption secrets should be possible.

  • “Democratize” use of PCR policies by defining PCR register meanings, and making binding to them robust against updates, so that external projects can safely and securely bind their own data to them (or use them for remote attestation) without risking breakage whenever the OS is updated.

  • Build around TPM 2.0 (with graceful fallback for TPM-less systems if desired, but TPM 1.2 support is out of scope)

Considered attack scenarios and considerations:

  • Evil Maid: physical access to a storage device, whether online or offline (i.e. “at rest”), should neither enable an attacker to read the user’s plaintext data on disk (confidentiality), nor allow undetected modification/backdooring of user data or the OS (integrity), nor permit exfiltration of secrets.

  • TPMs are assumed to be reasonably “secure”, i.e. can securely store/encrypt secrets. Communication to TPM is not “secure” though and must be protected on the wire.

  • Similarly, the CPU is assumed to be reasonably “secure”

  • SecureBoot is assumed to be reasonably “secure” to permit validated boot up to and including shim+boot loader+kernel (but see discussion below)

  • All user data must be encrypted and authenticated. All vendor and administrator data must be authenticated.

  • It is assumed all software involved regularly contains vulnerabilities and requires frequent updates to address them, plus regular revocation of old versions.

  • It is further assumed that key material used for signing code by the OS vendor can reasonably be kept secure (via use of HSM, and similar, where secret key information never leaves the signing hardware) and does not require frequent roll-over.

Proposed Construction

Central to the proposed design is the concept of a Unified Kernel Image (UKI). These UKIs are the combination of a Linux kernel image, an initrd, and a UEFI boot stub program (and further resources, see below) into one single UEFI PE file that can either be directly invoked by the UEFI firmware (which is useful in particular in some cloud/Confidential Computing environments) or through a boot loader (which is generally useful to implement support for multiple kernel versions, with interactive or automatic selection of image to boot into, potentially with automatic fallback management to increase robustness).

UKI Components

Specifically, UKIs typically consist of the following resources:

  1. A UEFI boot stub: a small piece of code that still runs in UEFI mode and that transitions into the Linux kernel included in the UKI (e.g., as implemented in sd-stub, see below)

  2. The Linux kernel to boot in the .linux PE section

  3. The initrd that the kernel shall unpack and invoke in the .initrd PE section

  4. A kernel command line string, in the .cmdline PE section

  5. Optionally, information describing the OS this kernel is intended for, in the .osrel PE section (derived from /etc/os-release of the booted OS). This is useful for presentation of the UKI in the boot loader menu, and ordering it against other entries, using the included version information.

  6. Optionally, information describing kernel release information (i.e. uname -r output) in the .uname PE section. This is also useful for presentation of the UKI in the boot loader menu, and ordering it against other entries.

  7. Optionally, a boot splash to bring to screen before transitioning into the Linux kernel in the .splash PE section

  8. Optionally, a compiled Devicetree database file, for systems which need it, in the .dtb PE section

  9. Optionally, the public key in PEM format that matches the signatures of the .pcrsig PE section (see below), in a .pcrpkey PE section.

  10. Optionally, a JSON file encoding expected PCR 11 hash values seen from userspace once the UKI has booted up, along with signatures of these expected PCR 11 hash values, matching a specific public key in the .pcrsig PE section. (Note: we use plural for “values” and “signatures” here, as this JSON file will typically carry a separate value and signature for each PCR bank for PCR 11, i.e. one pair of value and signature for the SHA1 bank, and another pair for the SHA256 bank, and so on. This ensures when enrolling or unlocking a TPM-bound secret we’ll always have a signature around matching the banks available locally (after all, which banks the local hardware supports is up to the hardware). For the sake of simplifying this already overly complex topic, we’ll pretend in the rest of the text there was only one PCR signature per UKI we have to care about, even if this is not actually the case.)

Given UKIs are regular UEFI PE files, they can thus be signed as one for SecureBoot, protecting all of the individual resources listed above at once, and their combination. Standard Linux tools such as sbsigntool and pesign can be used to sign UKI files.

UKIs wrap all of the above data in a single file, hence all of the above components can be updated in one go through single-file atomic updates, which is useful given that the primary expected storage place for these UKIs is the EFI System Partition (ESP), which is a VFAT file system, with its limited data safety guarantees.

UKIs can be generated via a single, relatively simple objcopy invocation, that glues the listed components together, generating one PE binary that then can be signed for SecureBoot. (For details on building these, see below.)
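For illustration, such an invocation might look like the following, assuming the systemd-provided stub lives at the path shown; all file names and section offsets here are illustrative only:

objcopy \
    --add-section .osrel=/etc/os-release --change-section-vma .osrel=0x20000 \
    --add-section .cmdline=cmdline.txt --change-section-vma .cmdline=0x30000 \
    --add-section .linux=vmlinuz --change-section-vma .linux=0x2000000 \
    --add-section .initrd=initrd.cpio.gz --change-section-vma .initrd=0x3000000 \
    /usr/lib/systemd/boot/efi/linuxx64.efi.stub foo.efi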

Note that the primary location to place UKIs in is the EFI System Partition (or an otherwise firmware accessible file system). This typically means a VFAT file system of some form. Hence an effective UKI size limit of 4GiB is in place, as that’s the largest file size a FAT32 file system supports.

Basic UEFI Stub Execution Flow

The mentioned UEFI stub program will execute the following operations in UEFI mode before transitioning into the Linux kernel that is included in its .linux PE section:

  1. The PE sections listed are searched for in the invoked UKI the stub is part of, and superficially validated (i.e. general file format is in order).

  2. All PE sections listed above of the invoked UKI are measured into TPM PCR 11. This TPM PCR is expected to be all zeroes before the UKI initializes. Pre-calculation is thus very straight-forward if the resources included in the PE image are known. (Note: as a single exception the .pcrsig PE section is excluded from this measurement, as it is supposed to carry the expected result of the measurement, and thus cannot also be input to it, see below for further details about this section.)

  3. If the .splash PE section is included in the UKI it is brought onto the screen

  4. If the .dtb PE section is included in the UKI it is activated using the Devicetree UEFI “fix-up” protocol

  5. If a command line was passed from the boot loader to the UKI executable it is discarded if SecureBoot is enabled and the command line from the .cmdline used. If SecureBoot is disabled and a command line was passed it is used in place of the one from .cmdline. Either way the used command line is measured into TPM PCR 12. (This of course removes any flexibility of control of the kernel command line of the local user. In many scenarios this is probably considered beneficial, but in others it is not, and some flexibility might be desired. Thus, this concept probably needs to be extended sooner or later, to allow more flexible kernel command line policies to be enforced via definitions embedded into the UKI. For example: allowing definition of multiple kernel command lines the user/boot menu can select one from; allowing additional allowlisted parameters to be specified; or even optionally allowing any verification of the kernel command line to be turned off even in SecureBoot mode. It would then be up to the builder of the UKI to decide on the policy of the kernel command line.)

  6. It will set a couple of volatile EFI variables to inform userspace about executed TPM PCR measurements (and which PCR registers were used), and other execution properties. (For example: the EFI variable StubPcrKernelImage in the 4a67b082-0a4c-41cf-b6c7-440b29bb8c4f vendor namespace indicates the PCR register used for the UKI measurement, i.e. the value “11”).

  7. An initrd cpio archive is dynamically synthesized from the .pcrsig and .pcrpkey PE section data (this is later passed to the invoked Linux kernel as an additional initrd, to be overlaid with the main initrd from the .initrd section). These files are later available in the /.extra/ directory in the initrd context (see the example after this list).

  8. The Linux kernel from the .linux PE section is invoked with a combined initrd that is composed from the blob from the .initrd PE section, the dynamically generated initrd containing the .pcrsig and .pcrpkey PE sections, and possibly some additional components like sysexts or syscfgs.
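To illustrate step 7: once the initrd is running, the stub-provided data can be inspected directly. The file names below are the ones used by current sd-stub versions, but should be treated as an implementation detail:

# inside the initrd, after the stub has handed over to the kernel
ls /.extra/
# tpm2-pcr-public-key.pem  tpm2-pcr-signature.json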

TPM PCR Assignments

In the construction above we take possession of two PCR registers previously unused on generic Linux distributions:

  • TPM PCR 11 shall contain measurements of all components of the UKI (with exception of the .pcrsig PE section, see above). This PCR will also contain measurements of the boot phase once userspace takes over (see below).

  • TPM PCR 12 shall contain measurements of the used kernel command line. (Plus potentially other forms of parameterization/configuration passed into the UKI, not discussed in this document)

On top of that we intend to define two more PCR registers like this:

  • TPM PCR 15 shall contain measurements of the volume encryption key of the root file system of the OS.

  • [TPM PCR 13 shall contain measurements of additional extension images for the initrd, to enable a modularized initrd – not covered by this document]

(See the Linux TPM PCR Registry for an overview how these four PCRs fit into the list of Linux PCR assignments.)

For all four PCRs the assumption is that they are zero before the UKI initializes, and that only the data that the UKI and the OS measure into them is included. This makes pre-calculating them straightforward: given a specific set of UKI components, it is immediately clear what PCR values can be expected in PCR 11 once the UKI has booted up. Given a kernel command line (and other parameterization/configuration) it is clear what PCR values are expected in PCR 12.
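As a purely conceptual sketch of that arithmetic (the real measurement scheme, i.e. the exact event contents and their ordering, is what the tooling discussed below implements):

# A PCR in the SHA256 bank starts as 32 zero bytes; each measurement "extends" it:
#   PCR_new = SHA256(PCR_old || SHA256(data))
pcr=0000000000000000000000000000000000000000000000000000000000000000
for section in linux initrd cmdline osrel uname; do
    digest=$(sha256sum "sections/${section}" | cut -d' ' -f1)
    pcr=$(printf '%s%s' "$pcr" "$digest" | xxd -r -p | sha256sum | cut -d' ' -f1)
done
echo "expected PCR 11: $pcr"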

Note that these four PCRs are defined by the conceptual “owner” of the resources measured into them. PCR 11 only contains resources the OS vendor controls. Thus it is straight-forward for the OS vendor to pre-calculate and then cryptographically sign the expected values for PCR 11. The PCR 11 values will be identical on all systems that run the same version of the UKI. PCR 12 only contains resources the administrator controls, thus the administrator can pre-calculate PCR values, and they will be correct on all instances of the OS that use the same parameters/configuration. PCR 15 only contains resources inherently local to the local system, i.e. the cryptographic key material that encrypts the root file system of the OS.

Separating out these three roles does not imply they actually need to be filled by separate parties. However, the assumption is that in many popular environments these three roles will be separate.

By separating out these PCRs by the owner’s role, it becomes straightforward to remotely attest, individually, to the software that runs on a node (PCR 11), the configuration it uses (PCR 12) or the identity of the system (PCR 15). Moreover, it becomes straightforward to robustly and securely encrypt data so that it can only be unlocked on a specific set of systems that share the same OS, or the same configuration, or have a specific identity – or a combination thereof.

Note that the mentioned PCRs are so far not typically used on generic Linux-based operating systems, to our knowledge. Windows uses them, but given that Windows and Linux should typically not be included in the same boot process this should be unproblematic, as Windows’ use of these PCRs should thus not conflict with ours.

To summarize:

  • PCR 11: Measurement of UKI components and boot phases. Owner: OS vendor. Expected value before UKI boot: zero. Pre-calculable: yes (at UKI build time).

  • PCR 12: Measurement of the kernel command line and additional kernel runtime configuration such as systemd credentials and systemd syscfg images. Owner: administrator. Expected value before UKI boot: zero. Pre-calculable: yes (when the system configuration is assembled).

  • PCR 13: Measurement of system extension images for the initrd (and possibly more). Owner: administrator. Expected value before UKI boot: zero. Pre-calculable: yes (when the set of extensions is assembled).

  • PCR 15: Measurement of the root file system volume key (possibly later more: measurement of root file system UUIDs and labels and of the machine ID /etc/machine-id). Owner: local system. Expected value before UKI boot: zero. Pre-calculable: yes (after first boot, once all such IDs are determined).

Signature Keys

In the model above, two private/public key pairs in particular are relevant:

  • The SecureBoot key to sign the UKI PE executable with. This controls permissible choices of OS/kernel

  • The key to sign the expected PCR 11 values with. Signatures made with this key will end up in the .pcrsig PE section. The public key part will end up in the .pcrpkey PE section.

Typically the key pair for the PCR 11 signatures should be chosen with a narrow focus, used for exactly one specific OS (e.g. “Fedora Desktop Edition”) and the series of UKIs that belong to it (all the way through all the versions of the OS). The SecureBoot signature key can be used with a broader focus, if desired. By keeping the PCR 11 signature key narrow in focus one can ensure that secrets bound to the signature key can only be unlocked on the narrow set of UKIs desired.
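Nothing TPM-specific is needed to create the PCR signing key pair; a plain RSA key as generated below will do (a sketch; the key size and whether the private half lives in an HSM are policy decisions):

# the private half stays with the OS vendor; the public half ends up in .pcrpkey
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out pcr-private-key.pem
openssl pkey -in pcr-private-key.pem -pubout -out pcr-public-key.pem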

TPM Policy Use

Depending on the intended access policy to a resource protected by the TPM, one or more of the PCRs described above should be selected to bind TPM policy to.

For example, the root file system encryption key should likely be bound to TPM PCR 11, so that it can only be unlocked if a specific set of UKIs is booted (it should then, once acquired, be measured into PCR 15, as discussed above, so that later TPM objects can be bound to it, further down the chain). With the model described above this is reasonably straight-forward to do:

  • When userspace wants to bind disk encryption to a specific series of UKIs (“enrollment”), it looks for the public key passed to the initrd in the /.extra/ directory (which, as discussed above, originates in the .pcrpkey PE section of the UKI). The relevant userspace component (e.g. systemd) is then responsible for generating a random key to be used as the symmetric encryption key for the storage volume (let’s call it the disk encryption key, DEK). The TPM is then used to encrypt (“seal”) the DEK with its internal Storage Root Key (TPM SRK). A TPM2 policy is bound to the encrypted DEK. The policy enforces that the DEK may only be decrypted if a valid signature is provided that matches the state of PCR 11 and the public key provided in the /.extra/ directory of the initrd. The plaintext DEK is passed to the kernel to implement disk encryption (e.g. LUKS/dm-crypt). (Alternatively, hardware disk encryption can be used too, i.e. Intel MKTME, AMD SME or even OPAL, all of which are outside of the scope of this document.) The TPM-encrypted version of the DEK which the TPM returned is written to the encrypted volume’s superblock.

  • When userspace wants to unlock disk encryption on a specific UKI, it looks for the signature data passed to the initrd in the /.extra/ directory (which, as discussed above, originates in the .pcrsig PE section of the UKI). It then reads the encrypted version of the DEK from the superblock of the encrypted volume. The signature and the encrypted DEK are then passed to the TPM. The TPM then checks whether the current PCR 11 state matches the supplied signature from the .pcrsig section and the public key used during enrollment. If everything checks out, it decrypts (“unseals”) the DEK and passes it back to the OS, where it is then passed to the kernel, which implements the symmetric part of disk encryption.

Note that in this scheme the encrypted volume’s DEK is not bound to specific literal PCR hash values, but to a public key which is expected to sign PCR hash values.

Also note that the state of PCR 11 only matters during unlocking. It is not used or checked when enrolling.
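With systemd’s implementation (see below) the two operations map to something like the following sketch; the option and file names match recent systemd versions but may vary, and the device path is purely illustrative:

# enrollment: bind a LUKS2 volume to whatever UKIs carry PCR 11 signatures
# matching this public key
systemd-cryptenroll --tpm2-device=auto \
    --tpm2-public-key=/.extra/tpm2-pcr-public-key.pem /dev/sda2

# unlocking: an /etc/crypttab line that passes the UKI-provided signature to the TPM
#   root  /dev/sda2  -  tpm2-device=auto,tpm2-signature=/.extra/tpm2-pcr-signature.json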

In this scenario:

  • Input to the TPM part of the enrollment process are the TPM’s internal SRK, the plaintext DEK provided by the OS, and the public key later used for signing expected PCR values, also provided by the OS. – Output is the encrypted (“sealed”) DEK.

  • Input to the TPM part of the unlocking process are the TPM’s internal SRK, the current TPM PCR 11 values, the public key used during enrollment, a signature that matches both these PCR values and the public key, and the encrypted DEK. – Output is the plaintext (“unsealed”) DEK.

Note that sealing/unsealing is done entirely on the TPM chip; the host OS just provides the inputs (well, only the inputs that the TPM chip doesn’t already know on its own), and receives the outputs. With the exception of the plaintext DEK, none of the inputs/outputs are sensitive, and they can safely be stored in the open. On the wire the plaintext DEK is protected via TPM parameter encryption (not discussed in detail here because, though important, it is not in scope for this document).

TPM PCR 11 is the most important of the mentioned PCRs, and its use is thus explained in detail here. The other mentioned PCRs can be used in similar ways, but signatures/public keys must be provided via other means.

This scheme builds on the functionality Linux’s LUKS2 provides: key management supporting multiple slots, and the ability to embed arbitrary metadata in the encrypted volume’s superblock. Note that this means the TPM2-based logic explained here doesn’t have to be the only way to unlock an encrypted volume. For example, in many setups it is wise to enroll both this TPM-based mechanism and an additional “recovery key” (i.e. a high-entropy, computer-generated passphrase the user can provide manually in case they lose access to the TPM and need to access their data), of which either can be used to unlock the volume.

Boot Phases

Secrets needed during boot-up (such as the root file system encryption key) should typically not be accessible anymore afterwards, to protect them from access if a system is attacked during runtime. To implement this the scheme above is extended in one way: at certain milestones of the boot process additional fixed “words” should be measured into PCR 11. These milestones are placed at conceptual security boundaries, i.e. whenever code transitions from a higher privileged context to a less privileged context.

Specifically:

  • When the initrd initializes (“initrd-enter”)

  • When the initrd transitions into the root file system (“initrd-leave”)

  • When the early boot phase of the OS on the root file system has completed, i.e. all storage and file systems have been set up and mounted, immediately before regular services are started (“sysinit”)

  • When the OS on the root file system completed the boot process far enough to allow unprivileged users to log in (“complete”)

  • When the OS begins shut down (“shutdown”)

  • When the service manager is mostly finished with shutting down and is about to pass control to the final phase of the shutdown logic (“final”)

By measuring these additional words into PCR 11 the distinct phases of the boot process can be distinguished in a relatively straight-forward fashion and the expected PCR values in each phase can be determined.
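Expressed with the tpm2-tools command line, a phase measurement amounts to the following (purely illustrative; the actual measurement is performed by a userspace component discussed below, and the exact phase strings are an implementation detail):

# extend PCR 11 with the hash of the phase word, here "initrd-enter"
tpm2_pcrextend "11:sha256=$(printf 'initrd-enter' | sha256sum | cut -d' ' -f1)"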

The phases are measured into PCR 11 (as opposed to some other PCR) mostly because available PCRs are scarce, and the boot phases defined are typically specific to a chosen OS, and hence fit well with the other data measured into PCR 11: the UKI which is also specific to the OS. The OS vendor generates both the UKI and defines the boot phases, and thus can safely and reliably pre-calculate/sign the expected PCR values for each phase of the boot.

Revocation/Rollback Protection

In order to secure secrets stored at rest, in particular in environments where unattended decryption shall be possible, it is essential that an attacker cannot use old, known-buggy – but properly signed – versions of software to access them.

Specifically, if disk encryption is bound to an OS vendor (via UKIs that include expected PCR values, signed by the vendor’s public key) there must be a mechanism to lock out old versions of the OS or UKI from accessing TPM based secrets once it is determined that the old version is vulnerable.

To implement this we propose making use of one of the “counters” TPM 2.0 devices provide: integer registers that are persistent in the TPM and can only be increased on request of the OS, but never be decreased. When sealing resources to the TPM, a policy may be declared to the TPM that restricts how the resources can later be unlocked: here we use one that requires that along with the expected PCR values (as discussed above) a counter integer range is provided to the TPM chip, along with a suitable signature covering both, matching the public key provided during sealing. The sealing/unsealing mechanism described above is thus extended: the signature passed to the TPM during unsealing now covers both the expected PCR values and the expected counter range. To be able to use a signature associated with an UKI provided by the vendor to unseal a resource, the counter thus must be at least increased to the lower end of the range the signature is for. By doing so the ability is lost to unseal the resource for signatures associated with older versions of the UKI, because their upper end of the range disables access once the counter has been increased far enough. By carefully choosing the upper and lower end of the counter range whenever the PCR values for an UKI shall be signed it is thus possible to ensure that updates can invalidate prior versions’ access to resources. By placing some space between the upper and lower end of the range it is possible to allow a controlled level of fallback UKI support, with clearly defined milestones where fallback to older versions of an UKI is not permitted anymore.

Example: a hypothetical distribution FooOS releases a regular stream of UKI kernels 5.1, 5.2, 5.3, … It signs the expected PCR values for these kernels with a key pair it maintains in an HSM. When signing UKI 5.1 it includes information directed at the TPM in the signed data declaring that the TPM counter must be at least 100, and below 120, in order for the signature to be used. Thus, when the UKI is booted up and used for unlocking an encrypted volume the unlocking code must first increase the counter to 100 if needed, as the TPM will otherwise refuse to unlock the volume. The next release, UKI 5.2, is a feature release, i.e. reverting back to the old kernel locally is acceptable. It thus does not increase the lower bound, but it increases the upper bound for the counter in the signature payload, thus encoding a valid range 100…121 in the signed payload. Now a major security vulnerability is discovered in UKI 5.1. A new UKI 5.3 is prepared that fixes this issue. It is now essential that UKI 5.1 can no longer be used to unlock the TPM secrets. Thus UKI 5.3 will bump the lower bound to 121, and increase the upper bound by one, thus allowing the range 121…122. Or in other words: for each new UKI release the signed data shall include a counter range declaration where the upper bound is increased by one. The lower bound is left as-is between releases, except when an old version shall be cut off, in which case it is bumped to one above the upper bound used in that release.
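The underlying counter primitive can be tried out with tpm2-tools. This sketch shows only the monotonic counter itself; the NV index handle is chosen arbitrarily, the attribute syntax may differ between tpm2-tools versions, and the signed policy construction wrapped around the counter is omitted entirely:

# define an 8-byte NV index of the "counter" type (nt=1), owner-authorized
tpm2_nvdefine 0x01500016 -C o -s 8 -a "ownerread|ownerwrite|nt=1"
tpm2_nvincrement 0x01500016 -C o     # can only ever go up
tpm2_nvread 0x01500016 -C o | xxd    # read back the current 64-bit value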

UKI Generation

As mentioned earlier, UKIs are the combination of various resources into one PE file. For most of these individual components there are pre-existing tools to generate them. For example, the included kernel image can be generated with the usual Linux kernel build system, and the initrd included in the UKI can be generated with existing tools such as dracut and similar. Once the basic components (.linux, .initrd, .cmdline, .splash, .dtb, .osrel, .uname) have been acquired the combination process works roughly like this:

  1. The expected PCR 11 hashes (and signatures for them) for the UKI are calculated. The tool for that takes all basic UKI components and a signing key as input, and generates a JSON object as output that includes both the literal expected PCR hash values and a signature for them. (For all selected TPM2 banks)

  2. The EFI stub binary is now combined with the basic components, the generated JSON PCR signature object from the first step (in the .pcrsig section) and the public key for it (in the .pcrpkey section). This is done via a simple “objcopy” invocation resulting in a single UKI PE binary.

  3. The resulting EFI PE binary is then signed for SecureBoot (via a tool such as sbsign or similar).
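Concretely, steps 1 and 3 might look like the following sketch, using the tools discussed later in this document; the option syntax matches recent systemd and sbsigntool versions but should be double-checked against your installation:

# step 1: pre-calculate and sign the expected PCR 11 values
systemd-measure sign \
    --linux=vmlinuz --initrd=initrd.cpio.gz \
    --cmdline=cmdline.txt --osrel=/etc/os-release \
    --private-key=pcr-private-key.pem --public-key=pcr-public-key.pem \
    --bank=sha256 > pcr-signature.json

# step 2: the objcopy invocation shown earlier, additionally adding
# .pcrsig=pcr-signature.json and .pcrpkey=pcr-public-key.pem sections

# step 3: sign the result for SecureBoot
sbsign --key sb-key.pem --cert sb-cert.pem --output foo.signed.efi foo.efi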

Note that the UKI model implies pre-built initrds. How to generate these (and securely extend and parameterize them) is outside of the scope of this document, but a related document will be provided highlighting these concepts.

Protection Coverage of SecureBoot Signing and PCRs

The scheme discussed here touches both SecureBoot code signing and TPM PCR measurements. These two distinct mechanisms cover separate parts of the boot process.

Specifically:

  • Firmware/Shim SecureBoot signing covers bootloader and UKI

  • TPM PCR 11 covers the UKI components and boot phase

  • TPM PCR 12 covers admin configuration

  • TPM PCR 15 covers the local identity of the host

Note that this means SecureBoot coverage ends once the system transitions from the initrd into the root file system. It is assumed that trust and integrity have been established before this transition by some means, for example LUKS/dm-crypt/dm-integrity, ideally bound to PCR 11 (i.e. UKI and boot phase).

A robust and secure update scheme for PCR 11 (i.e. UKI) has been described above, which allows binding TPM-locked resources to a UKI. For PCR 12 no such scheme is currently designed, but might be added later (use case: permit access to certain secrets only if the system runs with configuration signed by a specific set of keys). Given that resources measured into PCR 15 typically aren’t updated (or if they are updated loss of access to other resources linked to them is desired) no update scheme should be necessary for it.

This document focuses on the three PCRs discussed above. Disk encryption and other userspace software may choose to also bind to other PCRs. However, doing so reintroduces the PCR brittleness issue that this design is supposed to remove. The PCRs defined by the various firmware UEFI/TPM specifications generally have no concept of signatures of expected PCR values.

It is known that the industry-adopted SecureBoot signing keys are too broad to act as more than a denylist for known bad code. It is thus probably a good idea to enroll vendor SecureBoot keys wherever possible (e.g. in environments where the hardware is very well known, and VM environments), to raise the bar on preparing rogue UKI-like PE binaries that will result in PCR values that match expectations but actually contain bad code. Discussion about that is however outside of the scope of this document.

Whole OS embedded in the UKI

The above is written under the assumption that the UKI embeds an initrd whose job it is to set up the root file system: find it, validate it, cryptographically unlock it and similar. Once the root file system is found, the system transitions into it.

While this is the traditional design and likely what most systems will use, it is also possible to embed a regular root file system into the UKI and avoid any transition to an on-disk root file system. In this mode the whole OS would be encapsulated in the UKI, and signed/measured as one. In such a scenario the whole of the OS must be loaded into RAM and remain there, which typically restricts the general usability of such an approach. However, for specific purposes this might be the design of choice, for example to implement self-sufficient recovery or provisioning systems.

Proposed Implementations & Current Status

The toolset for most of the above is already implemented in systemd and related projects in one way or another. Specifically:

  1. The systemd-stub (or short: sd-stub) component implements the discussed UEFI stub program

  2. The systemd-measure tool can be used to pre-calculate expected PCR 11 values given the UKI components, and can sign the result, as discussed in the UKI Generation section above.

  3. The systemd-cryptenroll and systemd-cryptsetup tools can be used to bind a LUKS2 encrypted file system volume to a TPM and PCR 11 public key/signatures, according to the scheme described above. (The two components also implement a “recovery key” concept, as discussed above)

  4. The systemd-pcrphase component measures specific words into PCR 11 at the discussed phases of the boot process.

  5. The systemd-creds tool may be used to encrypt/decrypt data objects called “credentials” that can be passed into services and booted systems, and are automatically decrypted (if needed) immediately before service invocation. Encryption is typically bound to the local TPM, to ensure the data cannot be recovered elsewhere.
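As a brief illustration of the credentials concept just mentioned (option names as of recent systemd versions):

# encrypt a secret so that only the local TPM can recover it, then decrypt it again
echo -n "hunter2" | systemd-creds encrypt --with-key=tpm2 --name=mysecret - mysecret.cred
systemd-creds decrypt mysecret.cred -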

Note that systemd-stub (i.e. the UEFI code glued into the UKI) is distinct from systemd-boot (i.e. the UEFI boot loader that can manage multiple UKIs and other boot menu items and implements automatic fallback, an interactive menu and a programmatic interface for the OS, among other things). One can be used without the other – both sd-stub without sd-boot and vice versa – though they integrate nicely if used in combination.

Note that the mechanisms described are relatively generic and can be implemented in and consumed by other software too; systemd should be considered a reference implementation, though one that has found comprehensive adoption across Linux distributions.

Some concepts discussed above are currently not implemented. Specifically:

  1. The rollback protection logic is currently not implemented.

  2. The mentioned measurement of the root file system volume key to PCR 15 is implemented, but not merged into the systemd main branch yet.

The UAPI Group

We recently started a new group for discussing concepts and specifications of basic OS components, including UKIs as described above. It's called the UAPI Group. Please have a look at the various documents and specifications already available there, and expect more to come. Contributions welcome!

Glossary

TPM

Trusted Platform Module; a security chip found in many modern systems, both physical systems and increasingly also in virtualized environments. Traditionally a discrete chip on the mainboard but today often implemented in firmware, and lately directly in the CPU SoC.

PCR

Platform Configuration Register; a set of registers on a TPM that are initialized to zero at boot. The firmware and OS can “extend” these registers with hashes of data used during the boot process and afterwards. “Extension” means the supplied data is first cryptographically hashed. The resulting hash value is then combined with the previous value of the PCR and the combination hashed again. The result will become the new value of the PCR. By doing this iteratively for all parts of the boot process (always with the data that will be used next during the boot process) a concept of “Measured Boot” can be implemented: as long as every element in the boot chain measures (i.e. extends into the PCR) the next part of the boot like this, the resulting PCR values will prove cryptographically that only a certain set of boot components can have been used to boot up. A standards compliant TPM usually has 24 PCRs, but more than half of those are already assigned specific meanings by the firmware. Some of the others may be used by the OS, of which we use four in the concepts discussed in this document.

Measurement

The act of “extending” a PCR with some data object.

SRK

Storage Root Key; a special cryptographic key generated by a TPM that never leaves the TPM, and can be used to encrypt/decrypt data passed to the TPM.

UKI

Unified Kernel Image; the concept this document is about. A combination of kernel, initrd and other resources. See above.

SecureBoot

A mechanism where every software component involved in the boot process is cryptographically signed and checked against a set of public keys stored in the mainboard hardware, implemented in firmware, before it is used.

Measured Boot

A boot process where each component measures (i.e., hashes and extends into a TPM PCR, see above) the next component it will pass control to before doing so. This serves two purposes: it can be used to bind security policy for encrypted secrets to the resulting PCR values (or signatures thereof, see above), and it can be used to reason about used software after the fact, for example for the purpose of remote attestation.

initrd

Short for “initial RAM disk”, which, strictly speaking, is a misnomer today, because no RAM disk is involved anymore, but rather a tmpfs file system instance. Also known as “initramfs”, which is equally misleading, given the file system is not ramfs anymore, but tmpfs (both of which are in-memory file systems on Linux, with different semantics). The initrd is passed to the Linux kernel as a file system tree serialized in a cpio archive. The kernel unpacks the image into a tmpfs (i.e., into an in-memory file system), and then executes a binary from it. It thus contains the binaries for the first userspace code the kernel invokes. Typically, the initrd’s job is to find the actual root file system, unlock it (if encrypted), and transition into it.

UEFI

Short for “Unified Extensible Firmware Interface”, it is a widely adopted standard for PC firmware, with native support for SecureBoot and Measured Boot.

EFI

More or less synonymous with UEFI, IRL.

Shim

A boot component originating in the Linux world, which in a way extends the public key database SecureBoot maintains (which is under the control of Microsoft) with a second layer (which is under the control of the Linux distributions and of the owner of the physical device).

PE

Portable Executable; a file format for executable binaries, originally from the Windows world, but also used by UEFI firmware. PE files may contain code and data, categorized in labeled “sections”.

ESP

EFI System Partition; a special partition on a storage medium in which the firmware looks for UEFI PE binaries to execute at boot.

HSM

Hardware Security Module; a piece of hardware that can generate and store secret cryptographic keys, and execute operations with them, without the keys leaving the hardware (though this is configurable). TPMs can act as HSMs.

DEK

Disk Encryption Key; a symmetric cryptographic key used for unlocking disk encryption, i.e. passed to LUKS/dm-crypt for activating an encrypted storage volume.

LUKS2

Linux Unified Key Setup Version 2; a specification for a superblock for encrypted volumes widely used on Linux. LUKS2 is the default on-disk format for the cryptsetup suite of tools. It provides flexible key management with multiple independent key slots and allows embedding arbitrary metadata in a JSON format in the superblock.

Thanks

I’d like to thank Alain Gefflaut, Anna Trikalinou, Christian Brauner, Daan de Meyer, Luca Boccassi, Zbigniew Jędrzejewski-Szmek for reviewing this text.

Windows High Contrast Improvements

Posted by Caolán McNamara on October 08, 2022 07:52 PM


I spent a little time this week exploring our Windows accessibility High Contrast support, because by coincidence I was working on a GTK High Contrast issue when xisco mentioned the Windows one in the regular ESC.

Left is the sad original case of unreadable menubar and missing sidebar icons. Right is the newly fixed case with legible menubar and restored sidebar icons. Additionally the "default" buttons of dialogs ("insert" in this case) are now using the correct text color, preview is visible, and EditBoxes render with the correct border color for focus and mouse over. Normal themes also gain a more subtle border color in Edits to indicate focus.

Cloud desktops aren't as good as you'd think

Posted by Matthew Garrett on October 06, 2022 08:31 AM
Fast laptops are expensive, cheap laptops are slow. But even a fast laptop is slower than a decent workstation, and if your developers want a local build environment they're probably going to want a decent workstation. They'll want a fast (and expensive) laptop as well, though, because they're not going to carry their workstation home with them and obviously you expect them to be able to work from home. And in two or three years they'll probably want a new laptop and a new workstation, and that's even more money. Not to mention the risks associated with them doing development work on their laptop and then drunkenly leaving it in a bar or having it stolen or the contents being copied off it while they're passing through immigration at an airport. Surely there's a better way?

This is the thinking that leads to "Let's give developers a Chromebook and a VM running in the cloud". And it's an appealing option! You spend far less on the laptop, and the VM is probably cheaper than the workstation - you can shut it down when it's idle, you can upgrade it to have more CPUs and RAM as necessary, and you get to impose all sorts of additional neat security policies because you have full control over the network. You can run a full desktop environment on the VM, stream it to a cheap laptop, and get the fast workstation experience on something that weighs about a kilogram. Your developers get the benefit of a fast machine wherever they are, and everyone's happy.

But having worked at more than one company that's tried this approach, my experience is that very few people end up happy. I'm going to give a few reasons here, but I can't guarantee that they cover everything - and, to be clear, many (possibly most) of the reasons I'm going to describe aren't impossible to fix, they're simply not priorities. I'm also going to restrict this discussion to the case of "We run a full graphical environment on the VM, and stream that to the laptop" - an approach that only offers SSH access is much more manageable, but also significantly more restricted in certain ways. With those details mentioned, let's begin.

The first thing to note is that the overall experience is heavily tied to the protocol you use for the remote display. Chrome Remote Desktop is extremely appealing from a simplicity perspective, but is also lacking some extremely key features (eg, letting you use multiple displays on the local system), so from a developer perspective it's suboptimal. If you read the rest of this post and want to try this anyway, spend some time working with your users to find out what their requirements are and figure out which technology best suits them.

Second, let's talk about GPUs. Trying to run a modern desktop environment without any GPU acceleration is going to be a miserable experience. Sure, throwing enough CPU at the problem will get you past the worst of this, but you're still going to end up with users who need to do 3D visualisation, or who are doing VR development, or who expect WebGL to work without burning up every single one of the CPU cores you so graciously allocated to their VM. Cloud providers will happily give you GPU instances, but that's going to cost more and you're going to need to re-run your numbers to verify that this is still a financial win. "But most of my users don't need that!" you might say, and we'll get to that later on.

Next! Video quality! This seems like a trivial point, but if you're giving your users a VM as their primary interface, then they're going to do things like try to use Youtube inside it because there's a conference presentation that's relevant to their interests. The obvious solution here is "Do your video streaming in a browser on the local system, not on the VM" but from personal experience that's a super awkward pain point! If I click on a link inside the VM it's going to open a browser there, and now I have a browser in the VM and a local browser and which of them contains the tab I'm looking for WHO CAN SAY. So your users are going to watch stuff inside their VM, and re-compressing decompressed video is going to look like shit unless you're throwing a huge amount of bandwidth at the problem. And this is ignoring the additional irritation of your browser being unreadable while you're rapidly scrolling through pages, or terminal output from build processes being a muddy blur of artifacts, or the corner case of "I work for Youtube and I need to be able to examine 4K streams to determine whether changes have resulted in a degraded experience" which is a very real job and one that becomes impossible when you pass their lovingly crafted optimisations through whatever codec your remote desktop protocol has decided to pick based on some random guesses about the local network, and look everyone is going to have a bad time.

The browser experience. As mentioned before, you'll have local browsers and remote browsers. Do they have the same security policy? Who knows! Are all the third party services you depend on going to be ok with the same user being logged in from two different IPs simultaneously because they lost track of which browser they had an open session in? Who knows! Are your users going to become frustrated? Who knows oh wait no I know the answer to this one, it's "yes".

Accessibility! More of your users than you expect rely on various accessibility interfaces, be those mechanisms for increasing contrast, screen magnifiers, text-to-speech, speech-to-text, alternative input mechanisms and so on. And you probably don't know this, but most of these mechanisms involve having accessibility software be able to introspect the UI of applications in order to provide appropriate input or expose available options and the like. So, I'm running a local text-to-speech agent. How does it know what's happening in the remote VM? It doesn't because it's just getting an a/v stream, so you need to run another accessibility stack inside the remote VM and the two of them are unaware of each other's existence and this works just as badly as you'd think. Alternative input mechanism? Good fucking luck with that, you're at best going to fall back to "Send synthesized keyboard inputs" and that is nowhere near as good as "Set the contents of this text box to this unicode string" and yeah I used to work on accessibility software maybe you can tell. And how is the VM going to send data to a braille output device? Anyway, good luck with the lawsuits over arbitrarily making life harder for a bunch of members of a protected class.

One of the benefits here is supposed to be a security improvement, so let's talk about WebAuthn. I'm a big fan of WebAuthn, given that it's a multi-factor authentication mechanism that actually does a good job of protecting against phishing, but if my users are running stuff inside a VM, how do I use it? If you work at Google there's a solution, but that does mean limiting yourself to Chrome Remote Desktop (there are extremely good reasons why this isn't generally available). Microsoft have apparently just specced a mechanism for doing this over RDP, but otherwise you're left doing stuff like forwarding USB over IP, and that means that your USB WebAuthn no longer works locally. It also doesn't work for any other type of WebAuthn token, such as a bluetooth device, or an Apple TouchID sensor, or any of the Windows Hello support. If you're planning on moving to WebAuthn and also planning on moving to remote VM desktops, you're going to have a bad time.

That's the stuff that comes to mind immediately. And sure, maybe each of these issues is irrelevant to most of your users. But the actual question you need to ask is what percentage of your users will hit one or more of these, because if that's more than an insignificant percentage you'll still be staffing all the teams that dealt with hardware, handling local OS installs, worrying about lost or stolen devices, and the glorious future of just being able to stop worrying about this is going to be gone and the financial benefits you promised would appear are probably not going to work out in the same way.

A lot of this falls back to the usual story of corporate IT - understand the needs of your users and whether what you're proposing actually meets them. Almost everything I've described here is a corner case, but if your company is larger than about 20 people there's a high probability that at least one person is going to fall into at least one of these corner cases. You're going to need to spend a lot of time understanding your user population to have a real understanding of what the actual costs are here, and I haven't seen anyone do that work before trying to launch this and (inevitably) going back to just giving people actual computers.

There are alternatives! Modern IDEs tend to support SSHing out to remote hosts to perform builds there, so as long as you're ok with source code being visible on laptops you can at least shift the "I need a workstation with a bunch of CPU" problem out to the cloud. The laptops are going to need to be more expensive because they're also going to need to run more software locally, but it wouldn't surprise me if this ends up being cheaper than the full-on cloud desktop experience in most cases.

Overall, the most important thing to take into account here is that your users almost certainly have more use cases than you expect, and this sort of change is going to have direct impact on the workflow of every single one of your users. Make sure you know how much that's going to be, and take that into consideration when suggesting it'll save you money.


Handling WebAuthn over remote SSH connections

Posted by Matthew Garrett on September 20, 2022 02:17 AM
Being able to SSH into remote machines and do work there is great. Using hardware security tokens for 2FA is also great. But trying to use them both at the same time doesn't work super well, because if you hit a WebAuthn request on the remote machine it doesn't matter how much you mash your token - it's not going to work.

But could it?

The SSH agent protocol abstracts key management out of SSH itself and into a separate process. When you run "ssh-add .ssh/id_rsa", that key is being loaded into the SSH agent. When SSH wants to use that key to authenticate to a remote system, it asks the SSH agent to perform the cryptographic signatures on its behalf. SSH also supports forwarding the SSH agent protocol over SSH itself, so if you SSH into a remote system then remote clients can also access your keys - this allows you to bounce through one remote system into another without having to copy your keys to those remote systems.

More recently, SSH gained the ability to store SSH keys on hardware tokens such as Yubikeys. If configured appropriately, this means that even if you forward your agent to a remote site, that site can't do anything with your keys unless you physically touch the token. But out of the box, this is only useful for SSH keys - you can't do anything else with this support.

Well, that's what I thought, at least. And then I looked at the code and realised that SSH is communicating with the security tokens using the same library that a browser would, except it ensures that any signature request starts with the string "ssh:" (which a genuine WebAuthn request never will). This constraint can actually be disabled by passing -O no-restrict-websafe to ssh-agent, except that was broken until this weekend. But let's assume there's a glorious future where that patch gets backported everywhere, and see what we can do with it.

First we need to load the key into the security token. For this I ended up hacking up the Go SSH agent support. Annoyingly it doesn't seem to be possible to make calls to the agent without going via one of the exported methods here, so I don't think this logic can be implemented without modifying the agent module itself. But this is basically as simple as adding another key message type that looks something like:
type ecdsaSkKeyMsg struct {
       Type        string `sshtype:"17|25"`
       Curve       string
       PubKeyBytes []byte
       RpId        string
       Flags       uint8
       KeyHandle   []byte
       Reserved    []byte
       Comments    string
       Constraints []byte `ssh:"rest"`
}
Where Type is ssh.KeyAlgoSKECDSA256, Curve is "nistp256", RpId is the identity of the relying party (eg, "webauthn.io"), Flags is 0x1 if you want the user to have to touch the key, and KeyHandle is the hardware token's representation of the key (basically an opaque blob that's sufficient for the token to regenerate the keypair - this is generally stored by the remote site and handed back to you when it wants you to authenticate). The other fields can be ignored, other than PubKeyBytes, which is supposed to be the public half of the keypair.

This causes an obvious problem. We have an opaque blob that represents a keypair. We don't have the public key. And OpenSSH verifies that PubKeyBytes is a legitimate ecdsa public key before it'll load the key. Fortunately it only verifies that it's a legitimate ecdsa public key, and does nothing to verify that it's related to the private key in any way. So, just generate a new ECDSA key (ecdsa.GenerateKey(elliptic.P256(), rand.Reader)) and marshal it (elliptic.Marshal(ecKey.Curve, ecKey.X, ecKey.Y)) and we're good. Pass that struct to ssh.Marshal() and then make an agent call.

Now you can use the standard agent interfaces to trigger a signature event. You want to pass the raw challenge (not the hash of the challenge!) - the SSH code will do the hashing itself. If you're using agent forwarding this will be forwarded from the remote system to your local one, and your security token should start blinking - touch it and you'll get back an ssh.Signature blob. ssh.Unmarshal() the Blob member to a struct like
type ecSig struct {
        R *big.Int
        S *big.Int
}
and then ssh.Unmarshal the Rest member to
type authData struct {
        Flags    uint8
        SigCount uint32
}
The signature needs to be converted back to a DER-encoded ASN.1 structure, eg:
var b cryptobyte.Builder
b.AddASN1(asn1.SEQUENCE, func(b *cryptobyte.Builder) {
        b.AddASN1BigInt(ecSig.R)
        b.AddASN1BigInt(ecSig.S)
})
signatureDER, _ := b.Bytes()
Then you need to construct the Authenticator Data structure. For this, take the RpId used earlier and generate its sha256 hash. Append the one-byte Flags variable, then convert SigCount to big endian and append those 4 bytes. You should now have a 37-byte structure. This needs to be CBOR encoded (I used github.com/fxamacker/cbor and just called cbor.Marshal(data, cbor.EncOptions{})).

Now base64 encode the sha256 of the challenge data, the DER-encoded signature and the CBOR-encoded authenticator data and you've got everything you need to provide to the remote site to satisfy the challenge.

There are alternative approaches - you can use USB/IP to forward the hardware token directly to the remote system. But that means you can't use it locally, so it's less than ideal. Or you could implement a proxy that communicates with the key locally and have that tunneled through to the remote host, but at that point you're just reinventing ssh-agent.

And you should bear in mind that the default behaviour of blocking this sort of request is for a good reason! If someone is able to compromise a remote system that you're SSHed into, they can potentially trick you into hitting the key to sign a request they've made on behalf of an arbitrary site. Obviously they could do the same without any of this if they've compromised your local system, but there is some additional risk to this. It would be nice to have sensible MAC policies that default-denied access to the SSH agent socket and only allowed trustworthy binaries to do so, or maybe have some sort of reasonable flatpak-style portal to gate access. For my threat model I think it's a worthwhile security tradeoff, but you should evaluate that carefully yourself.

Anyway. Now to figure out whether there's a reasonable way to get browsers to work with this.


Bring Your Own Disaster

Posted by Matthew Garrett on September 19, 2022 07:12 AM
After my last post, someone suggested that having employers be able to restrict keys to machines they control is a bad thing. So here's why I think Bring Your Own Device (BYOD) scenarios are bad not only for employers, but also for users.

There's obvious mutual appeal to having developers use their own hardware rather than rely on employer-provided hardware. The user gets to use hardware they're familiar with, and which matches their ergonomic desires. The employer gets to save on the money required to buy new hardware for the employee. From this perspective, there's a clear win-win outcome.

But once you start thinking about security, it gets more complicated. If I, as an employer, want to ensure that any systems that can access my resources meet a certain security baseline (eg, I don't want my developers using unpatched Windows ME), I need some of my own software installed on there. And that software doesn't magically go away when the user is doing their own thing. If a user lends their machine to their partner, is the partner fully informed about what level of access I have? Are they going to feel that their privacy has been violated if they find out afterwards?

But it's not just about monitoring. If an employee's machine is compromised and the compromise is detected, what happens next? If the employer owns the system then it's easy - you pick up the device for forensic analysis and give the employee a new machine to use while that's going on. If the employee owns the system, they're probably not going to be super enthusiastic about handing over a machine that also contains a bunch of their personal data. In much of the world the law is probably on their side, and even if it isn't then telling the employee that they have a choice between handing over their laptop or getting fired probably isn't going to end well.

But obviously this is all predicated on the idea that an employer needs visibility into what's happening on systems that have access to their systems, or which are used to develop code that they'll be deploying. And I think it's fair to say that not everyone needs that! But if you hold any sort of personal data (including passwords) for any external users, I really do think you need to protect against compromised employee machines, and that does mean having some degree of insight into what's happening on those machines. If you don't want to deal with the complicated consequences of allowing employees to use their own hardware, it's rational to ensure that only employer-owned hardware can be used.

But what about the employers that don't currently need that? If there's no plausible future where you'll host user data, or where you'll sell products to others who'll host user data, then sure! But if that might happen in future (even if it doesn't right now), what's your transition plan? How are you going to deal with employees who are happily using their personal systems right now? At what point are you going to buy new laptops for everyone? BYOD might work for you now, but will it always?

And if your employer insists on employees using their own hardware, those employees should ask what happens in the event of a security breach. Whose responsibility is it to ensure that hardware is kept up to date? Is there an expectation that security can insist on the hardware being handed over for investigation? What information about the employee's use of their own hardware is going to be logged, who has access to those logs, and how long are those logs going to be kept for? If those questions can't be answered in a reasonable way, it's a huge red flag. You shouldn't have to give up your privacy and (potentially) your hardware for a job.

Using technical mechanisms to ensure that employees only use employer-provided hardware is understandably icky, but it's something that allows employers to impose appropriate security policies without violating employee privacy.

comment count unavailable comments

git signatures with SSH certificates

Posted by Matthew Garrett on September 15, 2022 01:34 AM
Last night I complained that git's SSH signature format didn't support using SSH certificates rather than raw keys, and was swiftly corrected, once again highlighting that the best way to make something happen is to complain about it on the internet in order to trigger the universe to retcon it into existence to make you look like a fool. But anyway. Let's talk about making this work!

git's SSH signing support actually just shells out to ssh-keygen with a specific set of options, so let's go through an example of this with ssh-keygen. First, here's my certificate:

$ ssh-keygen -L -f id_aurora-cert.pub
id_aurora-cert.pub:
Type: ecdsa-sha2-nistp256-cert-v01@openssh.com user certificate
Public key: ECDSA-CERT SHA256:(elided)
Signing CA: RSA SHA256:(elided)
Key ID: "mgarrett@aurora.tech"
Serial: 10505979558050566331
Valid: from 2022-09-13T17:23:53 to 2022-09-14T13:24:23
Principals:
mgarrett@aurora.tech
Critical Options: (none)
Extensions:
permit-agent-forwarding
permit-port-forwarding
permit-pty

Ok! Now let's sign something:

$ ssh-keygen -Y sign -f ~/.ssh/id_aurora-cert.pub -n git /tmp/testfile
Signing file /tmp/testfile
Write signature to /tmp/testfile.sig

To verify this we need an allowed signers file, which should look something like:

*@aurora.tech cert-authority ssh-rsa AAA(elided)

Perfect. Let's verify it:

$ cat /tmp/testfile | ssh-keygen -Y verify -f /tmp/allowed_signers -I mgarrett@aurora.tech -n git -s /tmp/testfile.sig
Good "git" signature for mgarrett@aurora.tech with ECDSA-CERT key SHA256:(elided)


Woo! So, how do we make use of this in git? Generating the signatures is as simple as

$ git config --global commit.gpgsign true
$ git config --global gpg.format ssh
$ git config --global user.signingkey /home/mjg59/.ssh/id_aurora-cert.pub


and then getting on with life. Any commits will now be signed with the provided certificate. Unfortunately, git itself won't handle verification of these - it calls ssh-keygen -Y find-principals which doesn't deal with wildcards in the allowed signers file correctly, and then falls back to verifying the signature without making any assertions about identity. Which means you're going to have to implement this in your own CI by extracting the commit and the signature, extracting the identity from the commit metadata and calling ssh-keygen on your own. But it can be made to work!
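
For illustration, here's a minimal sketch of that CI-side dance. It assumes HEAD is signed and that the allowed signers file from earlier exists; the awk incantations are mine, not anything git ships. The signed payload is simply the raw commit object with its gpgsig header removed:

# Dump the raw commit object; the SSH signature lives in its
# "gpgsig" header, with continuation lines indented by one space.
git cat-file commit HEAD > /tmp/commit.raw

# Pull out the signature block...
awk '/^gpgsig /{insig=1; sub(/^gpgsig /,""); print; next}
     insig && /^ /{sub(/^ /,""); print; next}
     insig{exit}' /tmp/commit.raw > /tmp/commit.sig

# ...and reconstruct the signed payload (the commit minus that header).
awk '/^gpgsig/{insig=1; next}
     insig && /^ /{next}
     {insig=0; print}' /tmp/commit.raw > /tmp/commit.payload

# A real CI job would extract the identity from the committer line;
# it's hardcoded here for brevity.
ssh-keygen -Y verify -f /tmp/allowed_signers -I mgarrett@aurora.tech \
    -n git -s /tmp/commit.sig < /tmp/commit.payload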

But why would you want to? The current approach of managing keys for git isn't ideal - you can kind of piggy-back off github/gitlab SSH key infrastructure, but if you're an enterprise using SSH certificates for access then your users don't necessarily have enrolled keys to start with. And using certificates gives you extra benefits, such as having your CA verify that keys are hardware-backed before issuing a cert. Want to ensure that whoever made a commit was actually on an authorised laptop? Now you can!

I'll probably spend a little while looking into whether it's plausible to make the git verification code work with certificates or whether the right thing is to fix up ssh-keygen -Y find-principals to work with wildcard identities, but either way it's probably not much effort to get this working out of the box.

Edit to add: thanks to this commenter for pointing out that current OpenSSH git actually makes this work already!

comment count unavailable comments

Crosswords 0.3.5: Border Styles

Posted by Jonathan Blandford on September 05, 2022 09:13 PM

It’s time for another GNOME Crosswords update. We’ve been busy since GUADEC and have managed to add quite a few user-visible features. We also fixed a bad bug where undo would break autosaving and added French translations. Buckle up, as this release goes to eleven!

As always, this is available on flathub.

Border Style

The first big feature in this release is recoloring the borders between cells. We previously had a single color (black) between the cells no matter what cell type it was. The recoloring is a subtle change, but I think it makes the puzzle look a lot smoother. We go with light borders between open cells, darker borders between blocks, and a really dark border around the outside. We also interpolate border colors between cells that have an explicit background color set, as well as to emphasize the cursor. Getting the details right took a good number of iterations, but I'm happy with the ultimate effect.

This also validates the decision to create a widget for each border segment, as it made this a lot easier to implement. We use CSS styles to manage all the color combinations.
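
If you're curious what that looks like, here's a purely illustrative sketch of the kind of CSS involved – the node and class names are invented for this example and aren't Crosswords' actual ones:

/* lighter separators between open cells, darker between blocks,
 * and a solid line around the outside of the grid */
border.open-open   { background-color: alpha(black, 0.15); }
border.block-block { background-color: alpha(black, 0.6); }
border.outside     { background-color: black; }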

[Figure: Border style comparison: new (upper-left) vs old (lower-right)]
[Figure: Border styles with colored backgrounds]
[Figure: Acrostic puzzles also look cleaner]

SVG Overlay

The next big feature we added was an SVG overlay over the puzzle. This lets us add graphics that extend beyond the borders of the existing widgets, which in turn finally lets us support enumerations in puzzles.

What’s an enumeration? Some crosswords let you know how many words the answer contains by putting hints in parentheses after the clue. These hints can also indicate punctuation where appropriate. As an example, (2-6,7) matches “GO-FASTER STRIPES” in 7dn in the puzzle below. In paper crosswords, solvers will often mark these hints in the grid with a pencil. We now do that on their behalf.

[Figure: Enumerations rendered in the grid]

We can also use the SVG overlay to support barred crosswords. These puzzles look really cool – though they are notoriously difficult to create and solve. I don’t know if we’ll get a good source for these until we seriously improve the editor, but I ported an existing one to .ipuz for testing purposes.

[Figure: Barred crossword]

Multi-character Support

I wasn’t fully happy with the state of editing when the IJ digraph support landed a few releases ago. The input method ended up being pretty obscure. As a result, I added generic multi-character functionality to the game. This feature will let you enter an arbitrary number of characters in a single cell. This is necessary for rebus-style crosswords, which you occasionally find in the wild. It can also be used for the Dutch IJ as a more standard entry. You can enable this mode by typing Ctrl-<period> or Escape.

[Figure: Rebus-style crossword]

This adds exciting functionality for fellow faux-heavy-metal enthusiasts! We can now enter multi-character glyphs. Behold:

[Video: entering a multi-character glyph – https://blogs.gnome.org/jrb/files/2022/09/Untitled.webm]

Not the real answer, but close!

I’ve always wanted to do a puzzle with “SPıN̈ALTAP” as an answer. The N-diaeresis doesn’t have its own Unicode code point and requires a combining character (U+0308) to work. This feature lets you type it into the puzzle.

Keyboard Preferences

I’ve been involved with GNOME long enough to have a healthy suspicion of preference dialogs, and I resisted adding one for a long time. Unfortunately, there’s one preference that I’ve found I can’t avoid. Some people really like their online crossword widget to skip over filled-in letters when entering text, and some like it to overwrite them – and there’s not a lot of compromise between the two sides. This has been the biggest behavior request I’ve received since releasing this game.

One of the goals this cycle was to refactor the internal state mechanism to make it cleaner (design doc). Along the way, that refactor made this behavior change a lot easier to write.

[Figure: This concept is so hard to explain! Anyone have suggestions for better text?]

GNOME Contributor’s Puzzle Set

One last thing worth mentioning is a new puzzle-set (repo here). Since starting this project, I’ve had a few people send me puzzles to include, and we haven’t had a good place to put them. I created this as a place to ship puzzles from the wider GNOME community. To start with, we’re including a handful of mini-puzzles and have started adding some standard ones. As soon as we get a bit more critical mass, I’ll ship it via flathub.

If you’re interested in trying your hand at writing a crossword, this is a good way to make it available. Let me know if you’re interested.

Thanks

Thanks to Federico for massive help and advice with the PlayState refactor, and for providing an SVG widget for the overlay. Hopefully GtkImage will get scalable SVG support in the future.

Thanks to Bob for finding and fixing a number of memleaks, and to the translators for language support.

Thanks to Zana and Carl for contributing puzzles.

And thanks to you for reading this far!

Download on FLATHUB

Speeding up getting firmware updates to end users

Posted by Richard Hughes on September 02, 2022 04:29 PM

At the moment, when a vendor decides to support a new device using the LVFS in Linux or ChromeOS they have to do a few things:

  1. Write a plugin for fwupd that understands how to copy the firmware into the specific device
  2. Add a quirk entry into a file that matches a specific VID/PID or VEN/DEV to tell fwupd what plugin to load for this new device
  3. Actually ship that fwupd version in the next ChromeOS release, or convince Linux distros to rebase to the new version
  4. Get an account on the LVFS
  5. Upload some firmware, test it, then push it to end-users

Then the next device comes along a few months later. This time the vendor only has to update a quirk file with a new VID/PID, convince the distributor to ship the new fwupd update, and then push the new firmware. Let’s look at the timescales for each thing:

  1. Write plugin: Depends on programmer and GLib proficiency, but typically a few weeks
  2. Add quirk entry: 2 minutes to write, usually less than 12 hours for upstream review
  3. Ensure latest fwupd is shipped (~30 days for upstream, +~10 days for Fedora, +several months for Ubuntu, and +almost infinity for Debian stable)
  4. Get LVFS account: 10 minutes for me to add, usually a few days to get legal clearance and to do vendor checks
  5. Upload firmware: Less than 5 minutes to write release notes and upload the file, and then stable remote is synced every 6 hours

So the slow part is step 3, and it’s slower than the others by several orders of magnitude – and it’s also the part that we have to do even when adding just one more VID/PID in the quirk file. We’ve ruled out shipping quirk entries in the metadata as it means devices don’t enumerate when offline (which is a good chunk of the fwupd userbase).

So what can we do? We already support two plugins that use the class code, rather than the exact VID/PID. For example, this DFU entry means “match any USB device with class 0xFE (application specific) and subclass 0x01”, which means these kinds of devices don’t need any per-device quirk updates (although they still might need a quirk if they are non-compliant, for example needing Flags = detach-for-attach) – but in most cases they just work:

[USB\CLASS_FE&SUBCLASS_01]
Plugin = dfu

The same can be done for Fastboot devices, matching class 0xFF (vendor specific), subclass 0x42 (sic) and protocol 0x03, although the same caveat applies for non-compliant devices that need things like FastbootOperationDelay = 250:

[USB\CLASS_FF&SUBCLASS_42&PROT_03]
Plugin = fastboot

I think we should move more into this kind of “device opts into fwupd plugin” direction, so the obvious answer is to somehow have a registry of class/subclass/protocol values. The USB consortium defines a few (e.g. class 0xFE subclass 0x02 is an IRDA bridge – remember those!) but the base class 0xFF is completely unspecified. It doesn’t seem right to hijack it, and you only get 255 possible values – and sometimes you really do want the class/subclass to be the correct things, e.g. base class 0x10 really is “Audio/Video Devices”.

There is something extra we can use, the Microsoft OS Descriptors which although somewhat proprietary are still mostly specified and supported in Linux. The simpler version 1 specification could be used, and although we could squeeze FWUPDPLU or FWUPDFLA as the CompatibleID, we couldn’t squeeze the plugin name (e.g. logitech-bulkcontroller) or the GUID (16 bytes) in an 8 byte Sub-compatibleID. I guess we could just number the plugins, or use half-a-GUID or something, but then it all starts to get somewhat hacky. Also, as a final nail-in-the-coffin, some non-compliant devices also don’t respond well (as in, they hang, and stop working…) when probing the string index of 0xEE – and so it’s also not without risk. If we have an “allowlist or denylist of devices that don’t support Microsoft OS Descriptors” (like Microsoft had to do) then we’re either back at updating the quirk file for each device added – which is what we wanted to avoid in the first place – or we risk regressions on end-user machines. So pass.

The version 2 specification is somewhat more helpful. It defines a new device capability that can return variable-length properties, using a UUID as a key – although we do need to use the newish “BOS” descriptor. This is only available in devices using USB 2.1 and newer, although that’s probably the majority of devices in use these days. If I understand correctly, using a USB-C connector requires the device to support USB 2 and above, so that probably covers most new-design modern devices in reality.

Let’s dig into this specification a bit: some USB 2/3 devices already export a BOS “Binary Object Store” descriptor, which includes things like Wireless USB details, USB 2.0 extensions, SuperSpeed USB connection details and a Container ID. We could certainly hijack a new bDevCapabilityType which would allow us to store a binary blob (e.g. Plugin=foobarbaz\nFlags=QuirkValueHere\n), but it doesn’t seem super awesome to just use a random out-of-specification (looking at you, fastboot…) value.

What the BOS descriptor does give us is the ability to use the platform capability descriptor, which is bDevCapabilityType=0x05 according to the Microsoft OS Descriptors 2.0 Specification. For UUID D8DD60DF-4589-4CC7-9CD2-659D9E648A9F, this is identified as a structured blob of data that Windows uses to store workarounds, such as the suspend mode of the device and that kind of thing.

The descriptor allows us to create a “descriptor set” which is really a posh way of saying “set these per-device registry keys when plugged in” which we could certainly (ab?)use for setting fwupd quirks and matching to plugins. It’s literally the REG_EXPAND_SZ, REG_DWORD things you can see in regedit.exe. Worst case you plug the device into Windows, and you get a few useless REG_SZ’s created of things like “fwupd.Plugin=logitech_hidpp” which Windows will ignore (or we hope so) and that we can use in fwupd to remove the need for most of the quirk files completely. Of course, you’ll still need a new enough fwupd that actually contains the device plugin update code, but we can’t do anything at all about that unless someone invents an OpenHardware time machine.
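
To make that concrete, here's a hedged sketch of what such a descriptor set could look like on the wire. The field layout follows my reading of the MS OS 2.0 specification, and the fwupd.Plugin registry key is of course the hypothetical part:

/* Illustrative MS OS 2.0 descriptor set asking Windows to create one
 * REG_SZ registry property, "fwupd.Plugin" = "logitech_hidpp". */
#include <stdint.h>

static const uint8_t msos20_descriptor_set[] = {
    /* set header */
    0x0A, 0x00,             /* wLength = 10 */
    0x00, 0x00,             /* wDescriptorType = MS_OS_20_SET_HEADER_DESCRIPTOR */
    0x00, 0x00, 0x03, 0x06, /* dwWindowsVersion = Windows 8.1 */
    0x4C, 0x00,             /* wTotalLength = 76 */

    /* registry property feature descriptor */
    0x42, 0x00,             /* wLength = 66 */
    0x04, 0x00,             /* wDescriptorType = MS_OS_20_FEATURE_REG_PROPERTY */
    0x01, 0x00,             /* wPropertyDataType = REG_SZ */
    0x1A, 0x00,             /* wPropertyNameLength = 26 */
    'f',0,'w',0,'u',0,'p',0,'d',0,'.',0,'P',0,'l',0,'u',0,'g',0,'i',0,'n',0,0,0,
    0x1E, 0x00,             /* wPropertyDataLength = 30 */
    'l',0,'o',0,'g',0,'i',0,'t',0,'e',0,'c',0,'h',0,'_',0,'h',0,'i',0,'d',0,'p',0,'p',0,0,0,
};

The device would return this blob in response to the vendor request advertised in its BOS platform capability descriptor; Windows would create the registry key from it, and fwupd could read the same blob and skip the quirk file entirely.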

Can anybody see a problem? If so, tell me now as I’m going to prototype this next week. Of course, it needs vendor buy-in but I think the LVFS is at a point where we can tell them what to do. :) Comments welcome.

New fwupd 1.8.4 release

Posted by Richard Hughes on August 30, 2022 11:14 AM

Today I tagged fwupd 1.8.4 which adds a few nice features and bug fixes. One specific enhancement I wanted to shout about is that we’re now supplying translated summary, description text and suggested actions for each HSI security failure. Two of the most common criticisms of the new GNOME security panel were “but what does it mean” and also “and what should I do”, which ironically were fixed long before all the hubbub erupted. If you want to see both new bits of data then make sure you’re using gnome-control-center from the main branch and then install the new fwupd version – although if you’re stuck on a distro version of fwupd, GNOME will still fall back to the single-line summary as before.

One additional new feature that might accidentally fix another criticism of the panel is that fwupd now reads your system BIOS settings, and has the ability to change them if the user desires (and has authorization to do so). This means we have to match the HSI failure (e.g. IOMMU disabled) with the BIOS setting, which isn’t standardized at all between vendors. We currently support this on modern Lenovo and Dell platforms via the firmware-attributes kernel interface; other vendors just have to add the kernel WMI bridge and it should mostly magically start to work.

As we now know what the failure is, what we need to change, and how to change it, we can actually ask the user if they want to change the setting automatically in the fwupdmgr security command line. This would allow us to add a “JFDI” action in the new GNOME device security panel rather than asking the user to manually change a firmware setting in the BIOS. We won’t do this for GNOME 43 as we need a few months of real-world testing to see what attributes are 100% safe to change on actual user systems, but for GNOME 44 the panel could be a whole lot more helpful than it is now.

Some tantalizing new features then become available when using fwupd, as we can now read and change firmware settings. One is the ability to emulate the BIOS settings of another machine, which is fairly uninteresting to end users, but allows us developers to reproduce bugs much more easily now that we’re doing cleverer things. One more interesting deployment feature is that we also support reading a file from /etc and applying those firmware settings at startup. This means you can now deploy a machine using something like Ansible, and have the firmware settings set up in the same way you set up the local machine state. There are lots of docs on how this all works and I encourage you to try this out and let us know how it goes. One caveat is that this doesn’t work if you have a password set on your BIOS settings, but we’re working on that for the next version.
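
As a sketch of that deployment flow – and note that the exact command spelling and destination path here are from memory rather than checked against this exact release, so consult the docs for your fwupd version:

# capture the firmware settings of a reference machine
fwupdmgr get-bios-setting --json > bios-settings.json

# then deploy the file (e.g. with Ansible) to /etc/fwupd/ on each
# target machine; fwupd applies the settings at startup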

Needless to say, please tell us about any problems with the new release. As always, comments welcome.

zero coverity defects for hunspell

Posted by Caolán McNamara on August 27, 2022 03:18 PM

 

After the recent 1.7.1 release of hunspell, I spent a little time massaging the source to address some long-standing Coverity complaints, and achieved the elusive state of 0 outstanding defects with 0 dismissed.

Speeding up the kernel testing loop

Posted by Bastien Nocera on August 17, 2022 09:14 AM

When I create kernel contributions, I usually rely on specific hardware, which makes using a separate system to deploy kernels on too complicated or time-consuming to be worth it. Yes, I'm an idiot that hacks the kernel directly on their main machine, though in my defense, I usually just need to compile drivers rather than full kernels.

But sometimes I work on a part of the kernel that can't be easily swapped out, like the USB sub-system. In which case I need to test out full kernels.

I usually prefer compiling full kernels as RPMs on my Fedora system, as it makes it easier to remove old test versions and to clearly tag extra information in the changelog or version numbers if I need to.

Step one, build as non-root

First, if you haven't already done so, create a ~/.rpmmacros file (I know...) and add a few lines so you don't need to be root or write stuff in /usr to create RPMs.

$ cat ~/.rpmmacros
%_topdir        /home/hadess/Projects/packages
%_tmppath        %{_topdir}/tmp

Easy enough. Now we can use fedpkg or rpmbuild to create RPMs. Don't forget to run those under “powerprofilesctl launch” to speed things up a bit.
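
For example, the local package build from above becomes (the same invocation appears in the results table at the end of this post):

$ powerprofilesctl launch fedpkg local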

Step two, build less

We're hacking the kernel, so let's try and build from upstream. Instead of the aforementioned fedpkg, we'll use “make binrpm-pkg” in the upstream kernel, which builds the kernel locally, as it normally would, and then packages just the binaries into an RPM. This means that you can't really redistribute the results of this command, but it's fine for our use.

If you choose to build a source RPM using “make rpm-pkg”, know that this one will build the kernel inside rpmbuild, which will be important later.

Now that we're building from the kernel sources, it's time to activate the cheat code. Run “make localmodconfig”. It will generate a .config file containing just the currently loaded modules. Don't forget to modify it to include your new driver, or the driver for a device you'll need for testing.

Step three, build faster

If running “make rpm-pkg” is the same as running “make ; make modules” and then packaging up the results, does that mean that the “%{?_smp_mflags}” RPM macro is ignored, I hear you ask rhetorically. The answer is yes. “make -j16 rpm-pkg”. Boom. Faster.

Step four, build fasterer

As we're building in the kernel tree locally before creating a binary package, already compiled modules and binaries are kept, and shouldn't need to be recompiled. This last trick can however be used to speed up compilation significantly if you use multiple kernel trees, or need to clean the build tree for whatever reason. In my tests, it made things slightly slower for a single tree compilation.

$ sudo dnf install -y ccache
$ make CC="ccache gcc" -j16 binrpm-pkg

Easy.

And if you want to speed up the rpm-pkg build:

$ cat ~/.rpmmacros
[...]
%__cc            ccache gcc
%__cxx            ccache g++

More information is available in Speeding Up Linux Kernel Builds With Ccache.

Step five, package faster

Now, if you've implemented all this, you'll see that the compilation still stops for a significant amount of time just before writing “Wrote kernel...rpm”. A quick look at top will show a single CPU core pegged to 100% CPU. It's rpmbuild compressing the package that you will just install and forget about.

$ cat ~/.rpmmacros
[...]
%_binary_payload    w2T16.xzdio

More information is available in Accelerating Ceph RPM Packaging: Using Multithreaded Compression.

TL;DR and further work

All those changes sped up the kernel compilation part of my development from around 20 minutes to less than 2 minutes on my desktop machine.

$ cat ~/.rpmmacros
%_topdir        /home/hadess/Projects/packages
%_tmppath        %{_topdir}/tmp
%__cc            ccache gcc
%__cxx            ccache g++
%_binary_payload    w2T16.xzdio


$ powerprofilesctl launch make CC="ccache gcc" -j16 binrpm-pkg

I believe there are still significant speed-ups to be had in the kernel, by parallelising some of the symbols manipulation, caching the BTF parsing for modules, switching away from the single-threaded vmlinux bzip2 compression, and not generating a headers RPM (note: tested this last one, it saves 4 seconds :)

 

The results of my tests. YMMV, etc.

Command                                                         | Time spent   | Notes
koji build --scratch --arch-override=x86_64 f36 kernel.src.rpm  | 129 minutes  | It's usually quicker, but that day must have been particularly busy
fedpkg local                                                    | 70 minutes   | No rpmmacros changes except setting the workdir in $HOME
powerprofilesctl launch fedpkg local                            | 25 minutes   |
localmodconfig / binrpm-pkg                                     | 19 minutes   | Defaults to "-j2"
localmodconfig -j16 / binrpm-pkg                                | 1:48 minutes |
powerprofilesctl launch localmodconfig ccache -j16 / binrpm-pkg | 7 minutes    | Cold cache
powerprofilesctl launch localmodconfig ccache -j16 / binrpm-pkg | 1:45 minutes | Hot cache
powerprofilesctl launch localmodconfig xzdio -j16 / binrpm-pkg  | 1:20 minutes |

Common GLib Programming Errors, Part Two: Weak Pointers

Posted by Michael Catanzaro on August 11, 2022 09:40 PM

This post is a sequel to Common GLib Programming Errors, where I covered four common errors: failure to disconnect a signal handler, misuse of a GSource handle ID, failure to cancel an asynchronous function, and misuse of main contexts in library or threaded code. Although there are many ways to mess up when writing programs that use GLib, I believe the first post covered the most likely and most pernicious… except I missed weak pointers. Sébastien pointed out that these should be covered too, so here we are.

Mistake #5: Failure to Disconnect Weak Pointer

In object-oriented languages, weak pointers are a safety improvement. The idea is to hold a non-owning pointer to an object that gets automatically set to NULL when that object is destroyed to prevent use-after-free vulnerabilities. However, this only works well because object-oriented languages have destructors. Without destructors, we have to deregister the weak pointer manually, and failure to do so is a disaster that will result in memory corruption that’s extremely difficult to track down. For example:

static void
a_start_watching_b (A *self,
                    B *b)
{
  // Keep a weak reference to b. When b is destroyed,
  // self->b will automatically be set to NULL.
  self->b = b;
  g_object_add_weak_pointer (G_OBJECT (b), (gpointer *)&self->b);
}

static void
a_do_something_with_b (A *self)
{
  if (self->b) {
    // Do something safely here, knowing that b
    // is assuredly still alive. This avoids a
    // use-after-free vulnerability if b is destroyed,
    // i.e. self->b cannot be dangling.
  }
}

Let’s say that the B in this example outlives the A, but A failed to call g_object_remove_weak_pointer(). Then when B is destroyed later, the memory that used to be occupied by self->b will get clobbered with NULL. Hopefully that will result in an immediate crash. If not, good luck trying to debug what’s going wrong when some innocent variable elsewhere in your program gets randomly clobbered. This often results in a frustrating wild goose chase when trying to track down what is going wrong (example).

The solution is to always disconnect your weak pointer. In most cases, your dispose function is the best place to do this:

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_weak_pointer (&a->b);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Note that g_clear_weak_pointer() is equivalent to:

if (a->b) {
  g_object_remove_weak_pointer (G_OBJECT (a->b), (gpointer *)&a->b);
  a->b = NULL;
}

but you probably guessed that, because it follows the same pattern as the other clear functions that we’ve used so far.

The new XWAYLAND extension is available

Posted by Peter Hutterer on August 11, 2022 06:50 AM

As of xorgproto 2022.2, we have a new X11 protocol extension. First, you may rightly say "whaaaat? why add new extensions to the X protocol?" in a rather unnecessarily accusing way, followed up by "that's like adding lipstick to a dodo!". And that's not completely wrong, but nevertheless, we have a new protocol extension to the ... [checks calendar] almost 40 year old X protocol. And that extension is, ever creatively, named "XWAYLAND".

If you recall, Xwayland is a different X server than Xorg. It doesn't try to render directly to the hardware; instead it's a translation layer between the X protocol and the Wayland protocol so that X clients can continue to function on a Wayland compositor. The X application is generally unaware that it isn't running on Xorg, and Xwayland (and the compositor) will do their best to accommodate all the quirks that the application expects because it only speaks X. In a way, it's like calling a restaurant and ordering a burger because the person answering speaks American English. Without realising that you just called the local fancy French joint and now the chefs will have to make a burger for you, totally without avec.

Anyway, sometimes it is necessary for a client (or a user) to know whether the X server is indeed Xwayland. Previously, this was done through heuristics: the xisxwayland tool checks for XRandR properties, the xinput tool checks for input device names, and so on. These heuristics are just that, though, so they can become unreliable as Xwayland gets closer to emulating Xorg or things just change. And properties in general are problematic since they could be set by other clients. To solve this, we now have a new extension.

The XWAYLAND extension doesn't actually do anything, it's the bare minimum required for an extension. It just needs to exist and clients only need to XQueryExtension or check for it in XListExtensions (the equivalent to xdpyinfo | grep XWAYLAND). Hence, no support for Xlib or libxcb is planned. So of all the nightmares you've had in the last 2 years, the one of misidentifying Xwayland will soon be in the past.
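
For completeness, a check from C needs nothing beyond core Xlib – a quick sketch:

/* cc xwayland-check.c -o xwayland-check -lX11 */
#include <stdio.h>
#include <X11/Xlib.h>

int main(void)
{
    int opcode, event, error;
    Display *dpy = XOpenDisplay(NULL);

    if (!dpy) {
        fprintf(stderr, "cannot open display\n");
        return 2;
    }

    if (XQueryExtension(dpy, "XWAYLAND", &opcode, &event, &error))
        printf("this X server is Xwayland\n");
    else
        printf("this X server is not Xwayland\n");

    XCloseDisplay(dpy);
    return 0;
}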

GUADEC Notes

Posted by Jonathan Blandford on August 08, 2022 05:28 AM
[Figure: GNOME Birthday Cake]

It was really lovely to get back to GUADEC. I loved being around old friends and meeting the new faces within the project. The venue was stellar and I thoroughly enjoyed a lot of the talks this year.

For me, my favorite talks were the progressive webapps talk by Phaedrus Leeds, Federico’s meme-filled talk on accessibility, and Rob’s talk about the Endless deployment to Oaxaca, Mexico.

[Note: I hope someone goes back to the youtube videos and adds timestamps / links to all the talks. It would make them easier to find and browse.]

On my part, I gave a talk on GNOME Crosswords and participated in a panel on how to get a job with Free Software. The crosswords talk in particular seemed pretty successful. It had a great article written about it on lwn.net (thanks Jake!), which led to an increase in bug reports and crossword contributions.

One observation: it felt like attendance was down this year. I don’t know if it was Covid or Mexico, but some combination led to a smaller crowd than usual. I saw that there was a mini-GUADEC in Berlin as well (which I’ll assume was a lagging indicator of the above and not a cause).

If we continue to have remote GUADECs in the future, I hope we can find a better way of connecting people across the conferences. One of the real advantages of GUADEC is that it is a place for the project to unify and decide on big things collectively, and I’d hate for us to develop schisms in direction. I didn’t feel particularly connected to the folks in Berlin, which is a shame. I’d love to see them again.

Hopefully a few more of us can get together in Riga next year! But if we can’t, I hope we can find better ways to be more inclusive for remote events and individuals.

Crosswords Bof

[Figure: Crosswords BoF at GUADEC]

There was enough interest from the Crosswords talk that we were able to hold an impromptu BoF on Saturday. Seven people showed up to work on various aspects of the game. We tested the existing puzzles, wrote some new puzzles, and added some additional features. Great progress was made towards UTF-8 support for the WordList, and some initial Printing support was coded.

More importantly, we were able to clean up a bunch of small papercuts. We tweaked the colors and controls, and also added acrostic support. The end result is a lot of visual tweaks to the game that improves the playability and appearance.

[Figure: Acrostic puzzle featuring the new colors and crossed-out clues]

I took a bunch of these improvements and packaged it up into a new release. It’s now available on Flathub.

Thanks to Rosanna, Neil, Federico, Heather, Jess, Dave, and Caroline for their contributions!
Download on FLATHUB

Mexico

Mexico was lovely as always. I’ve always wanted to visit Guadalajara, and it definitely met my (already high) expectations. It was a walkable city with lots of hidden spots and surprises in store. I’d love to go back sometime with a bit more time, and really explore it more thoroughly.

[Figure: Guadalajara Cathedral and environs]
[Figure: Arcos Vallarta, near the venue]
[Figure: Agave field in Tequila, Jalisco]
[Figure: GNOME Birthday party at Bariachi – an amazing mariachi bar!]
[Figure: A cozy little bookstore]
[Figure: Guadalajara at night]

GTK[3|4] GtkScrollbar for writer documents

Posted by Caolán McNamara on August 03, 2022 04:03 PM

 

GTK4 screenshot of Writer using true GtkScrollbars rather than themed VCL ScrollBars. Long press enters GTK's usual fine-control mode for scrolling.

Emulated host profiles in fwupd

Posted by Richard Hughes on July 29, 2022 09:25 AM

As some of you may know, there might be firmware security support in the next versions of Plymouth, GNOME Control Center and GDM. This is a great thing, as most people are running terribly insecure hardware and have no idea. The great majority of these systems can be improved with a few settings changes, and the first step in my plan is showing people what’s wrong, giving some quick information, and perhaps explaining how to change it. The next step will be a “fix the problem” button, but that’s still being worked on and will need some pretty involved testing for each OEM. For the bigger picture there’s the HSI documentation, which is a heavy and technical read, although the introduction might be interesting. For the other 99.99% of the population there are some pretty screenshots.

To facilitate development of various UIs, fwupd now supports emulating different systems. This would allow someone to show dozens of supported devices in GNOME Firmware or to showcase the firmware security panel in the GNOME release video. Hint hint. :)

To do this, ensure you have fwupd 1.8.3 installed (or enable the COPR), and then you can do:

sudo FWUPD_HOST_EMULATE=thinkpad-p1-iommu.json.gz /usr/libexec/fwupd/fwupd

Emulation data files can be created with ./contrib/generate-emulation.py file.json in the fwupd source tree and then can be manually modified if required. Hint: it’s mostly the same output as fwupdmgr get-devices --json and fwupdmgr security --json and you can run generate-emulation.py on any existing JSON output to minimize it.

To load a custom profile, you can do something like:

sudo FWUPD_HOST_EMULATE=/tmp/my-system.json /usr/libexec/fwupd/fwupd

As a precaution, the org.fwupd.hsi.HostEmulation attribute is added so we do not ask the user to upload the HSI report. The emulated devices are also not updatable for obvious reasons. Comments welcome!

UEFI rootkits and UEFI secure boot

Posted by Matthew Garrett on July 28, 2022 10:19 PM
Kaspersky describes a UEFI implant used to attack Windows systems. Based on it appearing to require patching of the system firmware image, they hypothesise that it's propagated by manually dumping the contents of the system flash, modifying it, and then reflashing it back to the board. This probably requires physical access to the board, so it's not especially terrifying - if you're in a situation where someone's sufficiently enthusiastic about targeting you that they're reflashing your computer by hand, it's likely that you're going to have a bad time regardless.

But let's think about why this is in the firmware at all. Sophos previously discussed an implant that's sufficiently similar in some technical details that Kaspersky suggest they may be related to some degree. One notable difference is that the MyKings implant described by Sophos installs itself into the boot block of legacy MBR partitioned disks. This code will only be executed on old-style BIOS systems (or UEFI systems booting in BIOS compatibility mode), and they have no support for code signatures, so there's no need to be especially clever. Run malicious code in the boot block, patch the next stage loader, follow that chain all the way up to the kernel. Simple.

One notable distinction here is that the MBR boot block approach won't be persistent - if you reinstall the OS, the MBR will be rewritten[1] and the infection is gone. UEFI doesn't really change much here - if you reinstall Windows a new copy of the bootloader will be written out and the UEFI boot variables (that tell the firmware which bootloader to execute) will be updated to point at that. The implant may still be on disk somewhere, but it won't be run.

But there's a way to avoid this. UEFI supports loading firmware-level drivers from disk. If, rather than providing a backdoored bootloader, the implant takes the form of a UEFI driver, the attacker can set a different set of variables that tell the firmware to load that driver at boot time, before running the bootloader. OS reinstalls won't modify these variables, which means the implant will survive and can reinfect the new OS install. The only way to get rid of the implant is to either reformat the drive entirely (which most OS installers won't do by default) or replace the drive before installation.

This is much easier than patching the system firmware, and achieves similar outcomes - the number of infected users who are going to wipe their drives to reinstall is fairly low, and the kernel could be patched to hide the presence of the implant on the filesystem[2]. It's possible that the goal was to make identification as hard as possible, but there's a simpler argument here - if the firmware has UEFI Secure Boot enabled, the firmware will refuse to load such a driver, and the implant won't work. You could certainly just patch the firmware to disable secure boot and lie about it, but if you're at the point of patching the firmware anyway you may as well just do the extra work of installing your implant there.

I think there's a reasonable argument that the existence of firmware-level rootkits suggests that UEFI Secure Boot is doing its job and is pushing attackers into lower levels of the stack in order to obtain the same outcomes. Technologies like Intel's Boot Guard may (in their current form) tend to block user choice, but in theory should be effective in blocking attacks of this form and making things even harder for attackers. It should already be impossible to perform attacks like the one Kaspersky describes on more modern hardware (the system should identify that the firmware has been tampered with and fail to boot), which pushes things even further - attackers will have to take advantage of vulnerabilities in the specific firmware they're targeting. This obviously means there's an incentive to find more firmware vulnerabilities, which means the ability to apply security updates for system firmware as easily as security updates for OS components is vital (hint hint if your system firmware updates aren't available via LVFS you're probably doing it wrong).

We've known that UEFI rootkits have existed for a while (Hacking Team had one in 2015), but it's interesting to see a fairly widespread one out in the wild. Protecting against this kind of attack involves securing the entire boot chain, including the firmware itself. The industry has clearly been making progress in this respect, and it'll be interesting to see whether such attacks become more common (because Secure Boot works but firmware security is bad) or not.

[1] As we all remember from Windows installs overwriting Linux bootloaders
[2] Although this does run the risk of an infected user booting another OS instead, and being able to see the implant

comment count unavailable comments

GTK4: Toolbar popups via GtkPopovers

Posted by Caolán McNamara on July 28, 2022 10:03 AM

 

Bootstrapped using GtkPopovers to implement popups from LibreOffice's main menubars for GTK4.

Common GLib Programming Errors

Posted by Michael Catanzaro on July 27, 2022 08:24 PM

Let’s examine four mistakes to avoid when writing programs that use GLib, or, alternatively, four mistakes to look for when reviewing code that uses GLib. Experienced GNOME developers will find the first three mistakes pretty simple and basic, but nevertheless they still cause too many crashes. The fourth mistake is more complicated.

These examples will use C, but the mistakes can happen in any language. In unsafe languages like C, C++, and Vala, these mistakes usually result in security issues, specifically use-after-free vulnerabilities.

Mistake #1: Failure to Disconnect Signal Handler

Every time you connect to a signal handler, you must think about when it should be disconnected to prevent the handler from running at an incorrect time. Let’s look at a contrived but very common example. Say you have an object A and wish to connect to a signal of object B. Your code might look like this:

static void
some_signal_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_signal_connect (b, "some-signal", (GCallback)some_signal_cb, self);
}

Very simple. Now, consider what happens if the object B outlives object A, and Object B emits some-signal after object A has been destroyed. Then the line a_do_something (self) is a use-after-free, a serious security vulnerability. Drat!

If you think about when the signal should be disconnected, you won’t make this mistake. In many cases, you are implementing an object and just want to disconnect the signal when your object is disposed. If so, you can use g_signal_connect_object() instead of the vanilla g_signal_connect(). For example, this code is not vulnerable:

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_signal_connect_object (b, "some-signal", (GCallback)some_signal_cb, self, 0);
}

g_signal_connect_object() will disconnect the signal handler whenever object A is destroyed, so there’s no longer any problem if object B outlives object A. This simple change is usually all it takes to avoid disaster. Use g_signal_connect_object() whenever the user data you wish to pass to the signal handler is a GObject. This will usually be true in object implementation code.

Sometimes you need to pass a data struct as your user data instead. If so, g_signal_connect_object() is not an option, and you will need to disconnect manually. If you’re implementing an object, this is normally done in the dispose function:

// Object A instance struct (or priv struct)
struct _A {
  B *b;
  gulong some_signal_id;
};

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  g_assert (self->some_signal_id == 0);
  self->b = b; /* keep a pointer so we can disconnect it in dispose */
  self->some_signal_id = g_signal_connect (b, "some-signal", (GCallback)some_signal_cb, self);
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_signal_handler (&a->some_signal_id, a->b);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Here, g_clear_signal_handler() first checks whether a->some_signal_id is 0. If not, it disconnects and sets a->some_signal_id to 0. Setting your stored signal ID to 0 and checking whether it is 0 before disconnecting is important because dispose may run multiple times to break reference cycles. Attempting to disconnect the signal multiple times is another common programmer error!

Instead of calling g_clear_signal_handler(), you could equivalently write:

if (a->some_signal_id != 0) {
  g_signal_handler_disconnect (a->b, a->some_signal_id);
  a->some_signal_id = 0;
}

But writing that manually is no fun.

Yet another way to mess up would be to use the wrong integer type to store the signal ID, like guint instead of gulong.

There are other disconnect functions you can use to avoid the need to store the signal handler ID, like g_signal_handlers_disconnect_by_data(), but I’ve shown the most general case.
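
For completeness, here's what the by-data variant looks like in a dispose function – a sketch under the same assumptions as the examples above, with the caveat that it disconnects every handler on a->b whose user data is a:

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  /* disconnects all handlers on a->b that were connected with a as
   * user data; running this twice is harmless, it just matches nothing */
  if (a->b)
    g_signal_handlers_disconnect_by_data (a->b, a);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}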

Sometimes, object implementation code will intentionally not disconnect signals if the programmer believes that the object that emits the signal will never outlive the object that is connecting to it. This assumption may usually be correct, but since GObjects are refcounted, they may be reffed in unexpected places, leading to use-after-free vulnerabilities if this assumption is ever incorrect. Your code will be safer and more robust if you disconnect always.

Mistake #2: Misuse of GSource Handler ID

Mistake #2 is basically the same as Mistake #1, but using GSource rather than signal handlers. For simplicity, my examples here will use the default main context, so I don’t have to show code to manually create, attach, and destroy the GSource. The default main context is what you’ll want to use if (a) you are writing application code, not library code, and (b) you want your callbacks to execute on the main thread. (If either (a) or (b) does not apply, then you need to carefully study GMainContext to ensure you do not mess up; see Mistake #4.)

Let’s use the example of a timeout source, although the same style of bug can happen with an idle source or any other type of source that you create:

static gboolean
my_timeout_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
  return G_SOURCE_REMOVE;
}

static void
some_method_of_a (A *self)
{
  g_timeout_add (42, (GSourceFunc)my_timeout_cb, self);
}

You’ve probably guessed the flaw already: if object A is destroyed before the timeout fires, then the call to a_do_something() is a use-after-free, just like when we were working with signals. The fix is very similar: store the source ID and remove it in dispose:

// Object A instance struct (or priv struct)
struct _A {
  guint my_timeout_id; /* source IDs are guint, unlike signal handler IDs */
};

static gboolean
my_timeout_cb (gpointer user_data)
{
  A *self = user_data;
  a_do_something (self);
  self->my_timeout_id = 0;
  return G_SOURCE_REMOVE;
}

static void
some_method_of_a (A *self)
{
  g_assert (self->my_timeout_id == 0);
  self->my_timeout_id = g_timeout_add (42, (GSourceFunc)my_timeout_cb, self);
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;
  g_clear_handle_id (&a->my_timeout_id, g_source_remove);
  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Much better: now we’re not vulnerable to the use-after-free issue.

As before, we must be careful to ensure the source is removed exactly once. If we remove the source multiple times by mistake, GLib will usually emit a critical warning, but if you’re sufficiently unlucky you could remove an innocent unrelated source by mistake, leading to unpredictable misbehavior. This is why we need to write self->my_timeout_id = 0; before returning from the timeout function, and why we need to use g_clear_handle_id() instead of g_source_remove() on its own. Do not forget that dispose may run multiple times!

We also have to be careful to return G_SOURCE_REMOVE unless we want the callback to execute again, in which case we would return G_SOURCE_CONTINUE. Do not return TRUE or FALSE, as that is harder to read and will obscure your intent.

Mistake #3: Failure to Cancel Asynchronous Function

When working with asynchronous functions, you must think about when the operation should be canceled to prevent the callback from executing too late. Because passing a GCancellable to asynchronous function calls is optional, it’s common to see code omit the cancellable. Be suspicious when you see this. The cancellable is optional because sometimes it is really not needed, and when this is true, it would be annoying to require it. But omitting it will usually lead to use-after-free vulnerabilities. Here is an example of what not to do:

static void
something_finished_cb (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
  A *self = user_data;
  B *b = (B *)source_object;
  g_autoptr (GError) error = NULL;

  if (!b_do_something_finish (b, result, &error)) {
    g_warning ("Failed to do something: %s", error->message);
    return;
  }

  a_do_something_else (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  b_do_something_async (b, NULL /* cancellable */, something_finished_cb, self);
}

This should feel familiar by now. If we did not use A inside the callback, then we would have been able to safely omit the cancellable here without harmful effects. But instead, this example calls a_do_something_else(). If object A is destroyed before the asynchronous function completes, then the call to a_do_something_else() will be a use-after-free.

We can fix this by storing a cancellable in our instance struct, and canceling it in dispose:

// Object A instance struct (or priv struct)
struct _A {
  GCancellable *cancellable;
};

static void
something_finished_cb (GObject      *source_object,
                       GAsyncResult *result,
                       gpointer      user_data)
{
  B *b = (B *)source_object;
  A *self = user_data;
  g_autoptr (GError) error = NULL;

  if (!b_do_something_finish (b, result, &error)) {
    if (!g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED))
      g_warning ("Failed to do something: %s", error->message);
    return;
  }
  a_do_something_else (self);
}

static void
some_method_of_a (A *self)
{
  B *b = get_b_from_somewhere ();
  b_do_something_async (b, self->cancellable, something_finished_cb, self);
}

static void
a_init (A *self)
{
  self->cancellable = g_cancellable_new ();
}

static void
a_dispose (GObject *object)
{
  A *a = (A *)object;

  g_cancellable_cancel (a->cancellable);
  g_clear_object (&a->cancellable);

  G_OBJECT_CLASS (a_parent_class)->dispose (object);
}

Now the code is not vulnerable. Note that, since you usually do not want to print a warning message when the operation is canceled, there’s a new check for G_IO_ERROR_CANCELLED in the callback.

Update #1: I managed to mess up this example in the first version of my blog post. The example above is now correct, but what I wrote originally was:

if (!b_do_something_finish (b, result, &error) &&
    !g_error_matches (error, G_IO_ERROR, G_IO_ERROR_CANCELLED)) {
  g_warning ("Failed to do something: %s", error->message);
  return;
}
a_do_something_else (self);

Do you see the bug in this version? Cancellation causes the asynchronous function call to complete the next time the application returns control to the main context. It does not complete immediately. So when the function is canceled, A is already destroyed, the error will be G_IO_ERROR_CANCELLED, and we’ll skip the return and execute a_do_something_else() anyway, triggering the use-after-free that the example was intended to avoid. Yes, my attempt to show you how to avoid a use-after-free itself contained a use-after-free. You might decide this means I’m incompetent, or you might decide that it means it’s too hard to safely use unsafe languages. Or perhaps both!

Update #2:  My original example had an unnecessary explicit check for NULL in the dispose function. Since g_cancellable_cancel() is NULL-safe, the dispose function will cancel only once even if dispose runs multiple times, because g_clear_object() will set a->cancellable = NULL. Thanks to Guido for suggesting this improvement in the comments.

Mistake #4: Incorrect Use of GMainContext in Library or Threaded Code

My fourth common mistake is really a catch-all mistake for the various other ways you can mess up with GMainContext. These errors can be very subtle and will cause functions to execute at unexpected times. Read this main context tutorial several times. Always think about which main context you want callbacks to be invoked on.

Library developers should pay special attention to the section “Using GMainContext in a Library.” It documents several security-relevant rules:

  • Never iterate a context created outside the library.
  • Always remove sources from a main context before dropping the library’s last reference to the context.
  • Always document which context each callback will be dispatched in.
  • Always store and explicitly use a specific GMainContext, even if it often points to some default context.
  • Always match pushes and pops of the thread-default main context.

If you fail to follow all of these rules, functions will be invoked at the wrong time, or on the wrong thread, or won’t be called at all. The tutorial covers GMainContext in much more detail than I possibly can here. Study it carefully. I like to review it every few years to refresh my knowledge. (Thanks Philip Withnall for writing it!)

Properly-designed libraries follow one of two conventions for which main context to invoke callbacks on: they may use the main context that was thread-default at the time the asynchronous operation started, or, for method calls on an object, they may use the main context that was thread-default at the time the object was created. Hopefully the library explicitly documents which convention it follows; if not, you must look at the source code to figure out how it works, which is not fun. If the library documentation does not indicate that it follows either convention, it is probably unsafe to use in threaded code.
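
As a minimal sketch of the second convention (type and method names are illustrative, and the usual G_DEFINE_TYPE boilerplate is omitted): capture the thread-default main context at object creation time, then dispatch callbacks to it explicitly:

struct _MyObject {
  GObject parent_instance;
  GMainContext *context; /* thread-default context at creation time */
};

static void
my_object_init (MyObject *self)
{
  self->context = g_main_context_ref_thread_default ();
}

static void
my_object_finalize (GObject *object)
{
  MyObject *self = (MyObject *)object;
  g_clear_pointer (&self->context, g_main_context_unref);
  G_OBJECT_CLASS (my_object_parent_class)->finalize (object);
}

/* Invoke a callback on the stored context, not on whatever context
 * happens to be thread-default at call time. */
static void
my_object_dispatch (MyObject    *self,
                    GSourceFunc  func,
                    gpointer     user_data)
{
  g_main_context_invoke (self->context, func, user_data);
}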

Conclusion

All four mistakes are variants on the same pattern: failure to prevent a function from being unexpectedly called at the wrong time. The first three mistakes commonly lead to use-after-free vulnerabilities, which attackers abuse to hack users. The fourth mistake can cause more unpredictable effects. Sadly, today’s static analyzers are probably not smart enough to catch these mistakes. You could catch them if you write tests that trigger them and run them with an address sanitizer build, but that’s rarely realistic. In short, you need to be especially careful whenever you see signals, asynchronous function calls, or main context sources.

Sequel

The adventure continues in Common GLib Programming Errors, Part Two: Weak Pointers.

Best Practices for Build Options

Posted by Michael Catanzaro on July 15, 2022 02:03 PM

Build options are sometimes tricky to get right. Here’s my take on best practices. The golden rule is to set good upstream defaults. Everything else follows from this.

Rule #1: Choose Good Upstream Defaults

Occasionally I see upstream developers complain that a downstream operating system has built their software “incorrectly,” generally because some important dependency or feature has been disabled. Sometimes downstreams really do mess up, but more often poor upstream defaults are to blame. Upstreams must set good defaults because upstream software developers know far more about their projects than downstream packagers do. Upstreams generally have a good idea of how they expect software to be built by downstreams, whereas downstreams generally do not. Accordingly, do the thinking upstream whenever possible. When you set good defaults, it becomes easier for downstreams to build your software the way you expect, because active effort is required for downstreams to mess things up.

For example, say a project has the following two build options:

Option Name                                 Default Value
--enable-thing-you-usually-want-enabled     false
--disable-thing-you-rarely-want-disabled    true

The thing you usually want enabled is not enabled by default, and the thing you rarely want disabled is disabled by default. Sad. Unfortunately, this pattern used to be extremely common with Autotools build systems, because in the real world, the names of the options are more subtle than this, and also because nobody likes squinting at configure.ac files to audit whether the options make sense. Meson build systems tend to be somewhat better because meson_options.txt is separate from the rest of the build definitions, making it easier to review all your options and check to ensure their defaults are set appropriately. However, there are still a few more subtle ways you can mess up your Meson build system, which I’ll discuss below.
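
To make the contrast concrete, here is a minimal sketch of what auditable defaults look like in meson_options.txt; the option names are invented for illustration:

# meson_options.txt: every option and its default in one reviewable place.
# The thing you usually want is enabled by default, and disabling the thing
# you rarely want disabled requires deliberate downstream action.
option('thing-you-usually-want', type: 'boolean', value: true,
       description: 'Enable the thing you usually want enabled')
option('thing-you-rarely-want-disabled', type: 'boolean', value: true,
       description: 'Enable the thing you rarely want disabled')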

Rule #2: Prefer Upstream Defaults Downstream

Conversely, downstreams should not second-guess upstream defaults unless they have a good reason to do so and really know what they’re doing.

For example, glib-networking’s Meson build system provides you with two different TLS backend options: OpenSSL or GnuTLS. The GnuTLS backend is enabled by default (well, sort of, see the next section on auto dependencies) while the OpenSSL backend is disabled by default. There’s a good reason for this: the OpenSSL backend is half-baked, partially due to bugs in glib-networking, and partially because OpenSSL just cannot do certain things that GnuTLS can. The OpenSSL backend is provided because some environments really do require it for license reasons, but it’s not the right choice for general-purpose operating systems. It may be tempting to think that you can pick whichever library you prefer, but you really should not.

Another example: WebKitGTK’s CMake build system provides a USE_WPE_RENDERER build option, which is enabled by default. This option controls which graphics rendering stack is used: if enabled, rendering uses libwpe and wpebackend-fdo, whereas if disabled, rendering uses a legacy internal Wayland compositor. The option is provided because libwpe and wpebackend-fdo are newer dependencies that are expected to be unavailable on older (pre-2020) operating systems, so older operating systems legitimately need to be able to disable it. But this configuration receives little serious testing and the upstream developers do not notice when it breaks, so you really should not be using it unless you have to. This recently caused rendering bugs that appeared to be distribution-specific; upstream developers were unwilling to investigate because they could not reproduce the issue.

Sticking with upstream defaults is generally safest. Sometimes you really need to override them. If so, go ahead. Just be careful.

Rule #3: Handle Auto Dependencies and Features with Care

The worst default ever is “build with feature enabled only if dependency xyz is installed; otherwise, disable it.” This is called an auto dependency. If using CMake or Autotools, auto dependencies are almost never permissible, and in this case “handle with care” means repent and fix it. Auto dependencies are acceptable only if you are using the Meson build system.

The theory behind auto dependencies is that it’s convenient for people casually building the software to do so with the fewest possible build errors, which is true. Problem is, this screws over serious production builds of your software by requiring your downstreams to possess magical knowledge of what dependencies are required to build your software properly. Users generally expect most features to be enabled at build time, but if upstream uses auto dependencies, the result is a build dependency lottery: your feature will be enabled or disabled by luck, based on which downstream build dependencies transitively depend on which other build dependencies. Even if it’s built properly today, that could easily change tomorrow when some other dependency changes in some other package. Just say no. Do not expect downstreams to look at your build system at all, let alone study the possible build options and make accurate judgments about which build dependencies are required to enable them. Avoiding auto dependencies is part of setting good upstream defaults.

Look at this example from WebKit’s OptionsGTK.cmake:

if (ENABLE_SPELLCHECK)
    find_package(Enchant)
    if (NOT PC_ENCHANT_FOUND)
        message(FATAL_ERROR "Enchant is needed for ENABLE_SPELLCHECK")
    endif ()
endif ()

ENABLE_SPELLCHECK is ON by default. If you don’t have Enchant installed, the build will fail unless you manually disable spellchecking by passing -DENABLE_SPELLCHECK=OFF. This makes it hard to mess up: downstreams have to make an intentional choice to build with spellchecking disabled. It cannot happen by accident.

Many projects would instead write it like this:

if (ENABLE_SPELLCHECK)
    find_package(Enchant)
    if (NOT PC_ENCHANT_FOUND)
        set(ENABLE_SPELLCHECK OFF)
    endif ()
endif ()

But this is an auto dependency, which results in downstream build dependency lottery. If you write your build system like this, you cannot complain when the feature winds up disabled by mistake in downstream builds. Don’t do this.

Exception: if you use Meson, auto dependencies are acceptable if you use the feature option type and set the default to auto. Although auto features are silently enabled or disabled by default depending on whether the required dependency is present, you can easily override this behavior for serious production builds by passing -Dauto_features=enabled, which enables all the auto features and will result in build failures if dependencies are missing. All major Linux operating systems do this when building Meson packages, so Meson’s auto features should not cause problems.
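
For illustration, here is a minimal sketch of such an auto feature; the option name, dependency name, and define are invented for the example:

# meson_options.txt (illustrative):
option('spellcheck', type: 'feature', value: 'auto',
       description: 'Enable spellchecking via Enchant')

# meson.build: passing a feature option as "required" makes the dependency
# mandatory when enabled, skipped when disabled, and probed when auto, so
# -Dauto_features=enabled turns this into a hard requirement.
enchant_dep = dependency('enchant-2', required: get_option('spellcheck'))
if enchant_dep.found()
  add_project_arguments('-DENABLE_SPELLCHECK=1', language: 'c')
endif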

Rule #4: Be Very Careful with Meson’s Build Types

Let’s categorize software builds into production builds or non-production builds. A production build is intended to be either distributed to end users or else run production workloads, whereas a non-production build is intended for testing or development and might have extra debug features enabled, like slow assertions. (These are more commonly called release builds or debug builds, but that terminology would be very confusing in the context of this discussion, as you’re about to see.)

The CMake and Meson build systems give us more than just two build types. Compare CMake build types to the corresponding Meson build types:

CMake Build Type   Meson Build Type   Meson debug Option   Production Build? (excludes Windows)
Release            release            false                Yes
Debug              debug              true                 No
RelWithDebInfo     debugoptimized     true                 Yes, be careful!
MinSizeRel         minsize            true                 Yes, be careful!
N/A                plain              false                Yes

To simplify, let’s exclude Windows from the discussion for now. (We’ll come back to Windows in a bit.) Now, notice the nomenclature difference between CMake’s RelWithDebInfo (“release with debuginfo”) build type and Meson’s debugoptimized build type. This build type functions exactly the same in both Meson and CMake, but CMake’s name is better because it clearly indicates that this is a release or production build type, whereas the Meson name suggests a debug or non-production build type (and indeed Meson’s debug option is set to true for it). In fact, it is an optimized production build with debuginfo enabled, the same style of build that almost all Linux operating systems use for their packages (although operating systems use the plain build type instead). The same problem exists for Meson’s minsize build type: it is another production build type where debug is true.

The Meson build type name accurately reflects that the debug option is enabled, but this is very confusing because for most platforms, that option only controls whether debuginfo is generated. Looking at the table above, you can see that you must never use the debug option alone to decide whether you have a production build or a non-production build. As the table indicates, the only non-production build type is the vanilla debug build type, which you can detect by checking the combination of the debug and optimization options. You have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build.  I wrote this in bold because it is important and not at all obvious. (However, before applying this rule in a cross-platform project, keep reading below to see the huge caveat regarding Windows.)
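
Expressed as a meson.build sketch, the rule looks something like this:

# A non-production (debug) build means debug is true and there is no real
# optimization. Everything else is a production build. (Remember the
# Windows caveat below: the debug option means more there.)
is_debug_build = get_option('debug') and get_option('optimization') in ['0', 'g']
is_production_build = not is_debug_build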

Here’s an example of what not to do in your meson.build:

# Use debug/optimization flags to determine whether to enable debug or disable
# cast checks
gtk_debug_cflags = []
debug = get_option('debug')
optimization = get_option('optimization')
if debug
  gtk_debug_cflags += '-DG_ENABLE_DEBUG'
  if optimization in ['0', 'g']
    gtk_debug_cflags += '-DG_ENABLE_CONSISTENCY_CHECKS'
  endif
elif optimization in ['2', '3', 's']
  gtk_debug_cflags += ['-DG_DISABLE_CAST_CHECKS', '-DG_DISABLE_ASSERT']
endif

This is from GTK’s meson.build. The code based only on the optimization option is OK, but the code that sets -DG_ENABLE_DEBUG is looking only at the debug option. What the code really wants to do is set G_ENABLE_DEBUG if this is a non-production build, but instead it is tied to debuginfo, which is not the desired result. Downstreams are forced to scratch their heads as to what they should do. Impassioned build engineers have held spirited debates about this particular meson.build snippet. Don’t do this! (I will submit a merge request to improve this.)

Here’s a much better, although still not perfect, example of how to do the same thing, this time from GLib’s meson.build:

# Use debug/optimization flags to determine whether to enable debug or disable
# cast checks
glib_debug_cflags = []
glib_debug = get_option('glib_debug')
if glib_debug.enabled() or (glib_debug.auto() and get_option('debug'))
  glib_debug_cflags += ['-DG_ENABLE_DEBUG']
  message('Enabling various debug infrastructure')
elif get_option('optimization') in ['2', '3', 's']
  glib_debug_cflags += ['-DG_DISABLE_CAST_CHECKS']
  message('Disabling cast checks')
endif

if not get_option('glib_assert')
  glib_debug_cflags += ['-DG_DISABLE_ASSERT']
  message('Disabling GLib asserts')
endif

if not get_option('glib_checks')
  glib_debug_cflags += ['-DG_DISABLE_CHECKS']
  message('Disabling GLib checks')
endif

Notice how GLib provides explicit build options that allow downstreams to decide whether debug should be enabled or not. Using explicit build options here was a good idea! The defaults for glib_assert and glib_checks are intentionally set to true to encourage their use in production builds, while G_DISABLE_CAST_CHECKS is based only on the optimization level. But sadly, if not explicitly configured, GLib sets the value of the glib_debug option automatically, based only on the value of the debug option. This is actually an OK use of an auto feature, because it is a carefully-considered attempt to provide good default behavior for downstreams, but it fails here because it assumes that debug means “non-production build,” which we have previously established cannot be determined without checking optimization as well. (I will submit a merge request to improve this.)

Here’s another helpful table that shows how the various build types correspond to CFLAGS:

CMake/Meson Build Type          CMake CFLAGS      Meson CFLAGS
Release/release                 -O3 -DNDEBUG      -O3
Debug/debug                     -g                -O0 -g
RelWithDebInfo/debugoptimized   -O2 -g -DNDEBUG   -O2 -g
MinSizeRel/minsize              -Os -DNDEBUG      -Os -g

Notice Meson’s minsize build type includes debuginfo, while CMake’s does not. Since debuginfo requires a huge amount of space, CMake’s behavior seems better here. We’ll discuss NDEBUG momentarily.

OK, so that all makes sense, right? Well I thought so too, until I ran a draft of this blog post past Jussi, who pointed out that the Meson build types function completely differently on Windows than they do on other platforms. Unfortunately, whereas on most platforms the debug option only controls debuginfo generation, on Windows it instead controls whether the C library enables extra runtime debugging checks. So while debugoptimized and minsize are production build types on Linux and have nice corresponding CMake build types, they are non-production build types on Windows. This is a Meson defect. The point to remember is that the debug option is completely different on Windows than it is on other platforms, so my otherwise-nice rule for detecting production builds does not work properly on Windows. Cross-platform projects need to be especially careful with the debug option. There are various ways this could be fixed in Meson in the future: a nice simple proposal would be to add a new debuginfo option separate from debug, then deprecate the debugoptimized build type and replace it with releasewithdebuginfo.

CMake dodges all these problems and avoids any ambiguity because its build types are named differently: “RelWithDebInfo” and “MinSizeRel” leave no doubt that you are dealing with a release (production) build.

Rule #5: Think about NDEBUG

The other behavior difference visible in the table above is that CMake defines NDEBUG for its production build types, whereas Meson has a separate option b_ndebug that controls whether to define NDEBUG. NDEBUG controls whether the C and C++ assert() macro is enabled: if it is defined, asserts are disabled. CMake is the only build system that defines NDEBUG for you automatically. You really need to think about this: if your software is performance-sensitive and contains slow assertions, the consequences of messing this up are severe; e.g. see this historical mesa bug, where Fedora’s mesa package suffered a 10x slowdown because mesa upstream accidentally enabled assertions by default. Again, please do not blame downstreams for bad upstream defaults: downstreams are (usually) not experts on upstream software, and cannot possibly be expected to pick better defaults than upstream’s.

Meson allows developers to explicitly choose whether to enable assertions in production builds. Assertions are enabled in production by default, the opposite of CMake’s behavior. Some developers prefer that all asserts be disabled in production builds to optimize speed as far as possible, but this is usually not the best choice: having assertions enabled in production provides valuable confidence that your code is actually functioning as intended, and often improves security by converting many code execution exploits into denial of service. Most assertions do not have noticeable performance impact, so I prefer to leave most assertions enabled by default in production, and disable only asserts that are slow. Hence, I like Meson’s default behavior. But many engineers disagree with me, and some projects really do need assertions disabled in production; in particular, everyone agrees that performance-sensitive assertions should not be running in production builds. If you’re using Meson and want assertions disabled in production builds, you’re in theory supposed to use b_ndebug=if-release, but it doesn’t actually work because it only disables assertions if your build type is release or plain, while leaving assertions enabled for debugoptimized and minsize builds. We’ve already established that these are both production build types, so sadly that behavior is broken. Instead, it’s better to manually define NDEBUG except in non-production builds. Again, you have a non-production (debug) build when debug is true and optimization is 0 or g; otherwise, you have a production build (except on Windows).
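
Here is a sketch of that manual approach in meson.build, reusing the production-build rule from above:

# b_ndebug=if-release misses debugoptimized and minsize, so define NDEBUG
# ourselves for every production build type instead.
if not (get_option('debug') and get_option('optimization') in ['0', 'g'])
  add_project_arguments('-DNDEBUG', language: ['c', 'cpp'])
endif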

Rule #6: plain Means “Production Build,” Not “No Flags”

The GNOME release team recently had an exciting debate about the meaning of Meson’s plain build type. It is impressive how build engineers can be so enthusiastic about build options!

I asked Jussi to explain the plain build type. His response was: “Meson does not, by itself, add compiler flags,” emphasis mine. It does not mean your project should not add its own compiler flags, and it certainly does not mean it’s OK to set bad defaults as long as they are vanilla-flavored. It is a production build type, and you should ensure that it receives defaults in line with the other production build types. You’ll be fine if you follow the same rule we already established: you have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build (except on Windows).

The plain build type exists because it makes it easier for downstreams to implement their own compiler flags. Downstreams wind up passing flags like -O2 -g via CFLAGS themselves anyway, because CMake and Meson are the only build systems that can add such flags automatically, and it’s easier to let downstreams disable that functionality than to force them to set different CFLAGS separately for each supported build system.

Rule #7: Don’t Forget Hardening Flags

Sadly, by default all build systems generate insecure, unhardened binaries that should never be used in production. This is true of Autotools, CMake, Meson, and likely also whatever other build system you are thinking of. You must manually add your own hardening flags or your builds will be insecure. Unfortunately this is a little complicated to do. Fedora and RHEL’s recommended compiler flags are documented here. The freedesktop-sdk and GNOME Flatpak runtimes use these recommendations as the basis for their compiler flags, and by default, so do Flatpak applications based on these runtimes. It’s actually not very easy to replicate the same hardening flags since libraries and executables require different flags, so naively setting CFLAGS is not possible. Fedora and RHEL use GCC spec files to achieve this, whereas freedesktop-sdk relies on building GCC itself with a non-default configuration (yes, second-guessing upstream defaults). The good news is that all major downstreams have figured this out one way or another, so you only need to worry about it if you’re doing your own production builds.
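
If you do have to roll your own, here is a rough meson.build sketch of the general shape; this is a small illustrative subset of flags, not a complete hardening profile:

cc = meson.get_compiler('c')
# Probe for support first, since not every toolchain accepts every flag.
hardening_cflags = cc.get_supported_arguments(
  '-fstack-protector-strong',
  '-D_FORTIFY_SOURCE=2',
)
add_project_arguments(hardening_cflags, language: 'c')
# Link-time hardening differs for libraries and executables (e.g. -pie),
# which is one reason naively setting CFLAGS alone is not enough.
add_project_link_arguments(
  cc.get_supported_link_arguments('-Wl,-z,relro', '-Wl,-z,now'),
  language: 'c',
)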

Conclusion

That’s all you need to know. Remember, upstreams know their software better than downstreams do, so the hard thinking should happen upstream. We can minimize mistakes and trouble if upstreams carefully set good defaults, and if downstreams deviate from those defaults only when truly necessary. Keep these rules in mind to avoid unnecessary bug reports from dissatisfied users.

History

I updated this blog post on August 3, 2022 to change the primary guidance to “You have a non-production (debug) build if debug is true and if optimization is 0 or g; otherwise, you have a production build.” Originally, I failed to consider g. -Og means “optimize debugging experience” and it is supposedly a better choice than -O0 for debugging according to gcc(1). In practice it definitely is not, but at least that’s the intent.

Jussi responded to this blog post on August 13, 2022 to discuss why Meson’s build types don’t work so well. Read his response.

Responsible stewardship of the UEFI secure boot ecosystem

Posted by Matthew Garrett on July 12, 2022 05:50 AM
After I mentioned that Lenovo are now shipping laptops that only boot Windows by default, a few people pointed to a Lenovo document that says:

Starting in 2022 for Secured-core PCs it is a Microsoft requirement for the 3rd Party Certificate to be disabled by default.

"Secured-core" is a term used to describe machines that meet a certain set of Microsoft requirements around firmware security, and by and large it's a good thing - devices that meet these requirements are resilient against a whole bunch of potential attacks in the early boot process. But unfortunately the 2022 requirements don't seem to be publicly available, so it's difficult to know what's being asked for and why. But first, some background.

Most x86 UEFI systems that support Secure Boot trust at least two certificate authorities:

1) The Microsoft Windows Production PCA - this is used to sign the bootloader in production Windows builds. Trusting this is sufficient to boot Windows.
2) The Microsoft Corporation UEFI CA - this is used by Microsoft to sign non-Windows UEFI binaries, including built-in drivers for hardware that needs to work in the UEFI environment (such as GPUs and network cards) and bootloaders for non-Windows.

The apparent secured-core requirement for 2022 is that the second of these CAs should not be trusted by default. As a result, drivers or bootloaders signed with this certificate will not run on these systems. This means that, out of the box, these systems will not boot anything other than Windows[1].

Given the association with the secured-core requirements, this is presumably a security decision of some kind. Unfortunately, we have no real idea what this security decision is intended to protect against. The most likely scenario is concerns about the (in)security of binaries signed with the third-party signing key - there are some legitimate concerns here, but I'm going to cover why I don't think they're terribly realistic.

The first point is that, from a boot security perspective, a signed bootloader that will happily boot unsigned code kind of defeats the point. Kaspersky did it anyway. The second is that even a signed bootloader that is intended to only boot signed code may run into issues in the event of security vulnerabilities - the Boothole vulnerabilities are an example of this, covering multiple issues in GRUB that could allow for arbitrary code execution and potential loading of untrusted code.

So we know that signed bootloaders that will (either through accident or design) execute unsigned code exist. The signatures for all the known vulnerable bootloaders have been revoked, but that doesn't mean there won't be other vulnerabilities discovered in future. Configuring systems so that they don't trust the third-party CA means that those signed bootloaders won't be trusted, which means any future vulnerabilities will be irrelevant. This seems like a simple choice?

There's actually a couple of reasons why I don't think it's anywhere near that simple. The first is that whenever a signed object is booted by the firmware, the trusted certificate used to verify that object is measured into PCR 7 in the TPM. If a system previously booted with something signed with the Windows Production CA, and is now suddenly booting with something signed with the third-party UEFI CA, the values in PCR 7 will be different. TPMs support "sealing" a secret - encrypting it with a policy that the TPM will only decrypt it if certain conditions are met. Microsoft make use of this for their default Bitlocker disk encryption mechanism. The disk encryption key is encrypted by the TPM, and associated with a specific PCR 7 value. If the value of PCR 7 doesn't match, the TPM will refuse to decrypt the key, and the machine won't boot. This means that attempting to attack a Windows system that has Bitlocker enabled using a non-Windows bootloader will fail - the system will be unable to obtain the disk unlock key, which is a strong indication to the owner that they're being attacked.

The second is that this is predicated on the idea that removing the third-party bootloaders and drivers removes all the vulnerabilities. In fact, there's been rather a lot of vulnerabilities in the Windows bootloader. A broad enough vulnerability in the Windows bootloader is arguably a lot worse than a vulnerability in a third-party loader, since it won't change the PCR 7 measurements and the system will boot happily. Removing trust in the third-party CA does nothing to protect against this.

The third reason doesn't apply to all systems, but it does to many. System vendors frequently want to ship diagnostic or management utilities that run in the boot environment, but would prefer not to have to go to the trouble of getting them all signed by Microsoft. The simple solution to this is to ship their own certificate and sign all their tooling directly - the secured-core Lenovo I'm looking at currently is an example of this, with a Lenovo signing certificate. While everything signed with the third-party signing certificate goes through some degree of security review, there's no requirement for any vendor tooling to be reviewed at all. Removing the third-party CA does nothing to protect the user against the code that's most likely to contain vulnerabilities.

Obviously I may be missing something here - Microsoft may well have a strong technical justification. But they haven't shared it, and so right now we're left making guesses. And right now, I just don't see a good security argument.

But let's move on from the technical side of things and discuss the broader issue. The reason UEFI Secure Boot is present on most x86 systems is that Microsoft mandated it back in 2012. Microsoft chose to be the only trusted signing authority. Microsoft made the decision to assert that third-party code could be signed and trusted.

We've certainly learned some things since then, and a bunch of things have changed. Third-party bootloaders based on the Shim infrastructure are now reviewed via a community-managed process. We've had a productive coordinated response to the Boothole incident, which also taught us that the existing revocation strategy wasn't going to scale. In response, the community worked with Microsoft to develop a specification for making it easier to handle similar events in future. And it's also worth noting that after the initial Boothole disclosure was made to the GRUB maintainers, they proactively sought out other vulnerabilities in their codebase rather than simply patching what had been reported. The free software community has gone to great lengths to ensure third-party bootloaders are compatible with the security goals of UEFI Secure Boot.

So, to have Microsoft, the self-appointed steward of the UEFI Secure Boot ecosystem, turn round and say that a bunch of binaries that have been reviewed through processes developed in negotiation with Microsoft, implementing technologies designed to make management of revocation easier for Microsoft, and incorporating fixes for vulnerabilities discovered by the developers of those binaries who notified Microsoft of these issues despite having no obligation to do so, and which have then been signed by Microsoft are now considered by Microsoft to be insecure is, uh, kind of impolite? Especially when unreviewed vendor-signed binaries are still considered trustworthy, despite no external review being carried out at all.

If Microsoft had a set of criteria used to determine whether something is considered sufficiently trustworthy, we could determine which of these we fell short on and do something about that. From a technical perspective, Microsoft could set criteria that would allow a subset of third-party binaries that met additional review to be trusted without having to trust all third-party binaries[2]. But, instead, this has been a decision made by the steward of this ecosystem without consulting major stakeholders.

If there are legitimate security concerns, let's talk about them and come up with solutions that fix them without doing a significant amount of collateral damage. Don't complain about a vendor blocking your apps and then do the same thing yourself.

[Edit to add: there seems to be some misunderstanding about where this restriction is being imposed. I bought this laptop because I'm interested in investigating the Microsoft Pluton security processor, but Pluton is not involved at all here. The restriction is being imposed by the firmware running on the main CPU, not any sort of functionality implemented on Pluton]

[1] They'll also refuse to run any drivers that are stored in flash on Thunderbolt devices, which means eGPU setups may be more complicated, as will netbooting off Thunderbolt-attached NICs
[2] Use a different leaf cert to sign the new trust tier, add the old leaf cert to dbx unless a config option is set, leave the existing intermediate in db


Crosswords 0.3.3: Double Dutch

Posted by Jonathan Blandford on July 11, 2022 01:56 PM

It’s time for another GNOME Crosswords release! This time we had a focus on I18N support. I also got patches from another new contributor – Philip – who added some nice improvements, Dutch-language support, and a downloader.

New features include:

  • A preferences dialog to let you filter puzzle sets by language. This gives us a way to only show puzzles either in your current language or in all languages. It opens the door for puzzle sets and downloaders in different languages.
  • Translation support. We have Dutch and Spanish translations now. It would be great to see more languages available for the next release.
  • Don’t grab focus when clicking on a clue. This was a small change that had a big impact on playability.
  • Copy/Paste support for the grid.
  • Undo/Redo support for the game.
  • Use the shiny new libadwaita AboutWindow.
  • Input now works on macOS, making it fully playable on that platform.
  • Dutch-language crosswords work with the ‘IJ’ cell.

Wait, what’s that last one???

Amazingly enough, it turns out that Dutch crosswords allow an ‘IJ’ digraph in the same cell. This is an interesting twist on the crossword, as you can add an ‘I’ and ‘J’ character independently, as well as together.

[Figure: Dutch puzzle showing an ‘IJ’ cell]

To support this, I added a Quirks object that can be used to modify the behavior of the game. Beyond Dutch support, I started adding support for rebus-style crosswords as a second quirk. I imagine having to add more as we add additional language support.

Fortunately, we already supported strings within answers so this wasn’t too hard to implement. However, I’m not 100% sure I got the input controls correct. If you’re a Dutch crossword solver, I’d love to hear from you. I also hooked up the quirks to the help overlay so it only shows the special input mode when you have a puzzle that needs it (or are running it in Dutch).

I also added another extension to the .ipuz implementation to specify the locale of puzzles. We use that to filter puzzles, and add the quirks.

Lessons learned:

I thought I’d write down the things that I got stuck on this cycle in the hopes it saves other people time in the future:

Recoloring SVGs

GtkImage has the really convenient ability to recolor images when the appearance changes from light to dark. As far as I can tell, that functionality isn’t exported, so to use it you have to load the SVG as a symbolic icon. Even with the documentation, it’s pretty finicky to get correct, so make sure to read it all carefully. In addition, GtkImage really wants the image to be icon sized; you can override that with gtk_image_set_pixel_size().

[Figure: Thanks to Sam for the new icons, shown here recolored.]

Translating keyfiles with meson

We use keyfiles (aka .desktop files) as a config file format to define the puzzle sets. These config files have user-visible strings added to them, and are then compiled into a gresource file. This was surprisingly hard to get translated. Some notes:

  • xgettext supports --keyword=Keyword and -kKeyword to define custom keywords to translate. Unfortunately, despite being documented otherwise, msgfmt only supports --keyword=Keyword. Don’t use -k.
  • Meson doesn’t allow you to pass the language type to xgettext for an arbitrary file. In order to force it to use the correct type, I had to name my config file puzzle.config.desktop.in, and rename it to puzzle.config post-translation (see the sketch after this list).
  • There’s no way for meson to have a compile-time dependency on the po files for i18n.merge_file(). In order to test it, I had to remove the translated desktop file in order to force rebuild it after every translation change. Meson 0.64 is supposed to fix this.
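
For reference, here is a hedged sketch of what that merge step can look like in meson.build; the file names follow the notes above, but the po directory location and install handling are assumptions:

i18n = import('i18n')
# Merge translations into the keyfile, renaming the .desktop.in input
# back to the .config name the application expects.
i18n.merge_file(
  input: 'puzzle.config.desktop.in',
  output: 'puzzle.config',
  type: 'desktop',
  po_dir: '../po',
  install: false,
)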

What’s next?

  • I’m giving a talk at GUADEC! Come hear the details of how this game was built and some interesting facts about crosswords. Sadly, I’m scheduled at the same time as the always-popular gnome shell update, but hopefully a few people will be interested in what I have to say.
  • To fully support internationalization, we need to support other languages for the wordlist. There’s an open bug for it. The work here shouldn’t be too hard, but will require a decent amount of work to make sure we’re multi-byte clean.
  • And as always, if you are interested in writing and publishing crosswords, let me know!

Download on FLATHUB

 

Lenovo shipping new laptops that only boot Windows by default

Posted by Matthew Garrett on July 08, 2022 06:49 AM
I finally managed to get hold of a Thinkpad Z13 to examine a functional implementation of Microsoft's Pluton security co-processor. Trying to boot Linux from a USB stick failed out of the box for no obvious reason, but after further examination the cause became clear - the firmware defaults to not trusting bootloaders or drivers signed with the Microsoft 3rd Party UEFI CA key. This means that given the default firmware configuration, nothing other than Windows will boot. It also means that you won't be able to boot from any third-party external peripherals that are plugged in via Thunderbolt.

There's no security benefit to this. If you want security here you're paying attention to the values measured into the TPM, and thanks to Microsoft's own specification for measurements made into PCR 7, switching from booting Windows to booting something signed with the 3rd party signing key will change the measurements and invalidate any sealed secrets. It's trivial to detect this. Distrusting the 3rd party CA by default doesn't improve security, it just makes it harder for users to boot alternative operating systems.

Lenovo, this isn't OK. The entire architecture of UEFI secure boot is that it allows for security without compromising user choice of OS. Restricting boot to Windows by default provides no security benefit but makes it harder for people to run the OS they want to. Please fix it.


Pango 1.90

Posted by Matthias Clasen on June 22, 2022 08:19 PM

I’ve finally convinced myself that I need to make a Pango 2.0 release to clean up the API, and introduce some new APIs without breaking users that expect Pango to be very stable.

So, here it is… well, not quite. What I am presenting today is not Pango 2.0 yet, but 1.90 – an unstable preview of the coming changes, to gather feedback and give a heads-up about what’s coming.

What’s changed?

Pango is now shipped as a single shared object, libpango-2.so, which contains the high-level cross-platform code as well as platform-specific fontmap implementations and the cairo support (if it is enabled). All of the APIs have been cleaned up and modernized.

PangoFontMap  has seen some significant changes. It is now possible to instantiate a PangoFontMap, and populate it manually with PangoFontFamily and PangoFontFace objects.

There are still platform-specific subclasses

  •  PangoFcFontMap
  • PangoCoreTextFontMap
  • PangoDirectWriteFontMap

which will use platform APIs to enumerate fonts and populate the fontmap.

What’s new?

PangoLineBreaker is the core of Pango’s line-breaking algorithm, broken out from PangoLayout. Having this available independently of PangoLayout will facilitate uses such as multi-column layout, text flow between frames, and shaping paragraphs around images.

Here is an example that shows changing the column width mid-paragraph:
[Figure: the column width changing mid-paragraph]

PangoLines is the ‘formatted output’ part of a PangoLayout, and can be used to collect the output of a PangoLineBreaker.

PangoHbFont is a font implementation that is a thin wrapper around HarfBuzz font and face objects. This is how Pango handles fonts on all platforms now.

PangoUserFont is a callback-based font implementation that allows for entirely application-defined font handling, including glyph drawing. This is similar to cairo user fonts, from which this example was borrowed.

There are many smaller changes, such as better control over line height with line-height attributes, control over the trimming of leading, and guaranteed font ↔ description roundtrips with face-ids.

How can I try this?

The Pango code lives on the pango2 branch, and there is a corresponding pango2 branch of GTK, which contains a port of GTK to the new APIs.

The tarballs are here.

Summary

If you have an interest in text rendering, please try this out and tell us what you think. Your feedback will make Pango 2 better.

For more details about the changes in this release, see the NEWS, and have a look at the migration guide.

If you want to learn more about the history of Pango and the background for some of these changes, come to my GUADEC talk in Guadalajara!

Firefox with VA-API for brave Fedorans

Posted by Martin Stransky on June 08, 2022 10:13 AM
[Figure: A rocket propelled butterfly? – https://bugzilla.mozilla.org/show_bug.cgi?id=1772028]

It’s been a long journey since the first VA-API implementation in Firefox. Two years ago Firefox 77.0 came to Fedora with accelerated video playback on Wayland, which was more a tech preview than a working solution.

Since then, X11 support has been added, many bugs have been fixed, and AV1 decoding has been implemented, so we can call the VA-API code mature enough to enable it for testing in Firefox Nightly 103. As we don’t want to scare peaceful Ubuntu users and grandmas watching their favorite show, VA-API is enabled in the Nightly channel only; stock Firefox 103 won’t ship with it.

But ‘Real Men’ want more of a challenge. ‘Quiche Eaters’ can use polished software, LTS distros, or even a Mac. That’s nothing for adventurers running on the edge. Thus the new Firefox updates (Fedora 35, Fedora 36) have VA-API enabled by default ahead of upstream, so you get what you asked for, brave Fedorans.

Along with a bunch of upstream VA-API backports, I also added a fix for mozbz#1735929 to make life easier for those who made the life mistake of buying NVIDIA hardware. I hope you’ve learned a lesson and your next card will be AMD.

And how do you taste the fruit of this work? Embarrassingly, it doesn’t need any complicated, cryptic, powerful, unforgiving, and dangerous edits of Firefox about:config preferences, setting env variables in the terminal, or midnight vows. If VA-API is configured properly, Firefox just picks it up and reports it on the about:support page.

[Figure: VA-API is enabled]

More details are available at https://fedoraproject.org/wiki/Firefox_Hardware_acceleration

Dark Style Preference with GTK

Posted by Caolán McNamara on May 19, 2022 10:45 AM

[Video: screencast of the dark style preference in action]

Added something to track the org.freedesktop.appearance.color-scheme property, as used by the GNOME 42 Dark Style Preference setting. The screencast was recorded with the new iteration of GNOME’s built-in screen recorder, which is quite snazzy.

Can we fix bearer tokens?

Posted by Matthew Garrett on May 16, 2022 07:48 AM
Last month I wrote about how bearer tokens are just awful, and a week later Github announced that someone had managed to exfiltrate bearer tokens from Heroku that gave them access to, well, a lot of Github repositories. This has inevitably resulted in a whole bunch of discussion about a number of things, but people seem to be largely ignoring the fundamental issue that maybe we just shouldn't have magical blobs that grant you access to basically everything even if you've copied them from a legitimate holder to Honest John's Totally Legitimate API Consumer.

To make it clearer what the problem is here, let's use an analogy. You have a safety deposit box. To gain access to it, you simply need to be able to open it with a key you were given. Anyone who turns up with the key can open the box and do whatever they want with the contents. Unfortunately, the key is extremely easy to copy - anyone who is able to get hold of your keyring for a moment is in a position to duplicate it, and then they have access to the box. Wouldn't it be better if something could be done to ensure that whoever showed up with a working key was someone who was actually authorised to have that key?

To achieve that we need some way to verify the identity of the person holding the key. In the physical world we have a range of ways to achieve this, from simply checking whether someone has a piece of ID that associates them with the safety deposit box all the way up to invasive biometric measurements that supposedly verify that they're definitely the same person. But computers don't have passports or fingerprints, so we need another way to identify them.

When you open a browser and try to connect to your bank, the bank's website provides a TLS certificate that lets your browser know that you're talking to your bank instead of someone pretending to be your bank. The spec allows this to be a bi-directional transaction - you can also prove your identity to the remote website. This is referred to as "mutual TLS", or mTLS, and a successful mTLS transaction ends up with both ends knowing who they're talking to, as long as they have a reason to trust the certificate they were presented with.

That's actually a pretty big constraint! We have a reasonable model for the server - it's something that's issued by a trusted third party and it's tied to the DNS name for the server in question. Clients don't tend to have stable DNS identity, and that makes the entire thing sort of awkward. But, thankfully, maybe we don't need to? We don't need the client to be able to prove its identity to arbitrary third party sites here - we just need the client to be able to prove it's a legitimate holder of whichever bearer token it's presenting to that site. And that's a much easier problem.

Here's the simple solution - clients generate a TLS cert. This can be self-signed, because all we want to do here is be able to verify whether the machine talking to us is the same one that had a token issued to it. The client contacts a service that's going to give it a bearer token. The service requests mTLS auth without being picky about the certificate that's presented. The service embeds a hash of that certificate in the token before handing it back to the client. Whenever the client presents that token to any other service, the service ensures that the mTLS cert the client presented matches the hash in the bearer token. Copy the token without copying the mTLS certificate and the token gets rejected. Hurrah hurrah hats for everyone.

Well except for the obvious problem that if you're in a position to exfiltrate the bearer tokens you can probably just steal the client certificates and keys as well, and now you can pretend to be the original client and this is not adding much additional security. Fortunately pretty much everything we care about has the ability to store the private half of an asymmetric key in hardware (TPMs on Linux and Windows systems, the Secure Enclave on Macs and iPhones, either a piece of magical hardware or Trustzone on Android) in a way that avoids anyone being able to just steal the key.

How do we know that the key is actually in hardware? Here's the fun bit - it doesn't matter. If you're issuing a bearer token to a system then you're already asserting that the system is trusted. If the system is lying to you about whether or not the key it's presenting is hardware-backed then you've already lost. If it lied and the system is later compromised then sure all your apes get stolen, but maybe don't run systems that lie and avoid that situation as a result?

Anyway. This is covered in RFC 8705 so why aren't we all doing this already? From the client side, the largest generic issue is that TPMs are astonishingly slow in comparison to doing a TLS handshake on the CPU. RSA signing operations on TPMs can take around half a second, which doesn't sound too bad, except your browser is probably establishing multiple TLS connections to subdomains on the site it's connecting to and performance is going to tank. Fixing this involves doing whatever's necessary to convince the browser to pipe everything over a single TLS connection, and that's just not really where the web is right at the moment. Using EC keys instead helps a lot (~0.1 seconds per signature on modern TPMs), but it's still going to be a bottleneck.

The other problem, of course, is that ecosystem support for hardware-backed certificates is just awful. Windows lets you stick them into the standard platform certificate store, but the docs for this are hidden in a random PDF in a Github repo. Macs require you to do some weird bridging between the Secure Enclave API and the keychain API. Linux? Well, the standard answer is to do PKCS#11, and I have literally never met anybody who likes PKCS#11 and I have spent a bunch of time in standards meetings with the sort of people you might expect to like PKCS#11 and even they don't like it. It turns out that loading a bunch of random C bullshit that has strong feelings about function pointers into your security critical process is not necessarily something that is going to improve your quality of life, so instead you should use something like this and just have enough C to bridge to a language that isn't secretly plotting to kill your pets the moment you turn your back.

And, uh, obviously none of this matters at all unless people actually support it. Github has no support at all for validating the identity of whoever holds a bearer token. Most issuers of bearer tokens have no support for embedding holder identity into the token. This is not good! As of last week, all three of the big cloud providers support virtualised TPMs in their VMs - we should be running CI on systems that can do that, and tying any issued tokens to the VMs that are supposed to be making use of them.

So sure this isn't trivial. But it's also not impossible, and making this stuff work would improve the security of, well, everything. We literally have the technology to prevent attacks like Github suffered. What do we have to do to get people to actually start working on implementing that?


Crosswords 0.3

Posted by Jonathan Blandford on May 14, 2022 07:33 PM

I’m pleased to announce Crosswords 0.3. This is the first version that feels ready for public consumption. Unlike the version I announced five months ago, it is much more robust and has some key new features.

New in this version:

  • Available on flathub: After working on it out of the source tree for a long while, I finally got the flatpaks working. Download it and try it out—let me know what you think! I’d really appreciate any sort of feedback, positive or negative.

  • Puzzle Downloaders: This is a big addition. We now support external downloaders and puzzle sets. This means it’s possible to have a tool that downloads puzzles from the internet. It also lets us ship sets of puzzles for the user to solve. With a good set of puzzles, we could even host them on gnome.org. These can be distributed externally, or they can be distributed via flatpak extensions (if not installing locally). I wrapped xword-dl and puzzlepull with a downloader to add some newspaper downloaders, and I’m looking to add more original puzzles shortly.

  • Dark mode: I thought that this was the neatest feature in GNOME 42. We now support both light and dark mode and honor the system setting. CSS is also heavily used to style the app, allowing for future visual modifications and customizations. I’m interested in allowing CSS customization on a per-puzzle-set basis.

  • Hint Button: This is a late addition. It can be used to suggest a random word that fits in the current row. It’s not super convenient, but I also don’t want to make the game too easy! We use Peter Broda’s wordlist as the basis for this.
  • .puz support: Internally we use the unencumbered .ipuz format. This is a pretty decent format and supports a wide variety of crossword features. But there are a lot of crosswords out there that use the .puz format, and I know people have large collections of puzzles in that format. I wrote a simple converter to load these.

Next steps

I hope to release this a bit more frequently now that we have gotten to this stage. Next on my immediate list:

  • Revisit clue formats; empirically, crossword files in the wild play a little fast-and-loose with escaping and formatting (e.g. random entities and underline-escaping).
  • Write a Puzzle Set manager that will let you decide which downloaders/puzzles to show, as well as launch gnome-software to search for more.
  • Document the external Puzzle Set format to allow others to package up games.
  • Fix any bugs that are found.

I also plan to work on the Crossword editor and get that ready for a release on flathub. The amazing-looking AdwEntryRow should make parts of the design a lot cleaner.

But, no game is good without being fun! I am looking to expand the list of puzzles available. Let me know if you write crosswords and want to include them in a puzzle set.

Thanks

I couldn’t have gotten this release out without lots of help. In particular:

  • Federico, for helping to  refactor the main code and lots of great advice
  • Nick, for explaining how to get apps into flathub
  • Alexander, for his help with getting toasts working with a new behavior
  • Parker, for patiently explaining to me how the world of Crosswords worked
  • The folks on #gtk for answering questions and producing an excellent toolkit
  • And most importantly Rosanna, for testing everything and for her consistent cheering and support

Download on FLATHUB

FMW is finished

Posted by Evzen Gasta on May 04, 2022 10:55 AM

As all good things must come to an end, my bachelor’s thesis has to end someday too. But I’m still looking forward to contributing to FMW whenever updates or fixes are needed. The official FMW 5.0.0 will be released soon. This was my first experience with an open source project and I liked it very much. I’m looking forward to working on new open source projects in the future.

This project gave me a lot of experience, such as learning how to deploy Qt applications on various operating systems and how applications are structured. I learned a new programming language, QML, and how to use CMake in an open source project. Another big experience for me was understanding how GitHub CI works; it was used to automatically create builds for various systems. I also practiced many features and advantages of Git (before this, I had been using only “git pull” and “git push”).

As a result, I was able to create a new generation of FMW.

[Figures: New generation of FMW]

This result couldn’t have been achieved without my technical mentor Jan Grulich (his blog and twitter), or without the supervisor of my bachelor’s thesis, Dominika Regéciová (her blog and twitter). I would like to thank you both for the opportunity, patience, and overall help.

Fitting Everything Together

Posted by Lennart Poettering on May 02, 2022 10:00 PM

TLDR: Hermetic /usr/ is awesome; let's popularize image-based OSes with modernized security properties built around immutability, SecureBoot, TPM2, adaptability, auto-updating, factory reset, uniformity – built from traditional distribution packages, but deployed via images.

Over the past years, systemd gained a number of components for building Linux-based operating systems. While these components individually have been adopted by many distributions and products for specific purposes, we did not publicly communicate a broader vision of how they should all fit together in the long run. In this blog story I hope to provide that from my personal perspective, i.e. explain how I personally would build an OS and where I personally think OS development with Linux should go.

I figure this is going to be a longer blog story, but I hope it will be equally enlightening. Please understand though that everything I write about OS design here is my personal opinion, and not one of my employer.

For the last 12 years or so I have been working on Linux OS development, mostly around systemd. In all those years I had a lot of time thinking about the Linux platform, and specifically traditional Linux distributions and their strengths and weaknesses. I have seen many attempts to reinvent Linux distributions in one way or another, to varying success. After all this most would probably agree that the traditional RPM or dpkg/apt-based distributions still define the Linux platform more than others (for 25+ years now), even though some Linux-based OSes (Android, ChromeOS) probably outnumber the installations overall.

And over all those 12 years I kept wondering, how would I actually build an OS for a system or for an appliance, and what are the components necessary to achieve that. And most importantly, how can we make these components generic enough so that they are useful in generic/traditional distributions too, and in other use cases than my own.

The Project

Before figuring out how I would build an OS it's probably good to figure out what type of OS I actually want to build, what purpose I intend to cover. I think a desktop OS is probably the most interesting. Why is that? Well, first of all, I use one of these for my job every single day, so I care immediately, it's my primary tool of work. But more importantly: I think building a desktop OS is one of the most complex overall OS projects you can work on, simply because desktops are so much more versatile and variable than servers or embedded devices. If one figures out the desktop case, I think there's a lot more to learn from, and reuse in the server or embedded case, than going the other way. After all, there's a reason why so much of the widely accepted Linux userspace stack comes from people with a desktop background (including systemd, BTW).

So, let's see how I would build a desktop OS. If you press me hard, and ask me why I would do that given that ChromeOS already exists and more or less is a Linux desktop OS: there's plenty I am missing in ChromeOS, but most importantly, I am a lot more interested in building something people can easily and naturally rebuild and hack on, i.e. Google-style over-the-wall open source with its skewed power dynamic is not particularly attractive to me. I much prefer building this within the framework of a proper open source community, out in the open, and basing all this strongly on the status quo ante, i.e. the existing distributions. I think it is crucial to provide a clear avenue to build a modern OS based on the existing distribution model, if there shall ever be a chance to make this interesting for a larger audience.

(Let me underline though: even though I am going to focus on a desktop here, most of this is directly relevant for servers as well, in particular container host OSes and suchlike, or embedded devices, e.g. car IVI systems and so on.)

Design Goals

  1. First and foremost, I think the focus must be on an image-based design rather than a package-based one. For robustness and security it is essential to operate with reproducible, immutable images that describe the OS or large parts of it in full, rather than always operating with fine-grained RPM/dpkg-style packages. That's not to say that packages are not relevant (I actually think they matter a lot!), but I think they should be less a tool for deploying code and more one for building the objects to deploy. A different way to see this: any OS built like this must be easy to replicate in a large number of instances, with minimal variability. Regardless of whether we talk about desktops, servers or embedded devices: the focus for my OS should be on "cattle", not "pets", i.e. that from the start it's trivial to reuse the well-tested, cryptographically signed combination of software over a large set of devices the same way, with a maximum of bit-exact reuse and a minimum of local variance.

  2. The trust chain matters, from the boot loader all the way to the apps. This means all code that is run must be cryptographically validated before it is run. All storage must be cryptographically protected: public data must be integrity checked; private data must remain confidential.

    This is in fact where big distributions currently fail pretty badly. I would go as far as saying that SecureBoot on Linux distributions is mostly security theater at this point, if you so will. That's because the initrd that unlocks your FDE (i.e. the cryptographic concept that protects the rest of your system) is not signed or protected in any way. It's trivial for an attacker with access to your hard disk to modify it in an undetectable way and collect your FDE passphrase. The involved bureaucracy around the implementation of UEFI SecureBoot in the big distributions is to a large degree pointless if you ask me, given that once the kernel is assumed to be in a good state, the system next invokes completely unsafe code with full privileges.

    This is a fault of current Linux distributions though, not of SecureBoot in general. Other OSes use this functionality in more useful ways, and we should correct that too.

  3. Pretty much the same thing: offline security matters. I want my data to be reasonably safe at rest, i.e. cryptographically inaccessible even when I leave my laptop in my hotel room, suspended.

  4. Everything should be cryptographically measured, so that remote attestation is supported for as much software shipped on the OS as possible.

  5. Everything should be self-descriptive, with single sources of truth that are closely attached to the object itself, instead of stored externally.

  6. Everything should be self-updating. Today we know that software is never bug-free, and thus requires a continuous update cycle. Not only the OS itself, but also any extensions, services and apps running on it.

  7. Everything should be robust with respect to aborted OS operations, power loss and so on. It should be robust against hosed OS updates (regardless of whether the download process failed or the image was buggy), and not require user interaction to recover from them.

  8. There must always be a way to put the system back into a well-defined, guaranteed safe state ("factory reset"). This includes that all sensitive data from earlier uses becomes cryptographically inaccessible.

  9. The OS should enforce clear separation between vendor resources, system resources and user resources: conceptually, and when it comes to cryptographic protection.

  10. Things should be adaptive: the system should come up and make the best of the hardware it runs on, adapting to the available storage and devices. Moreover, the system should support execution on bare metal just as well as in a VM environment or in a container environment (i.e. systemd-nspawn).

  11. Things should not require explicit installation, i.e. every image should be a live image. For installation it should be sufficient to dd an OS image onto disk. Thus, strong focus on "instantiate on first boot", rather than "instantiate before first boot".

  12. Things should be reasonably minimal. The image the system starts its life with should be quick to download, and not include resources that can as well be created locally later.

  13. System identity, local cryptographic keys and so on should be generated locally, not pre-provisioned, so that no leak of sensitive data during transport onto the system is possible.

  14. Things should be reasonably democratic and hackable. It should be easy to fork an OS, to modify an OS and still get reasonable cryptographic protection. Modifying your OS should not necessarily imply that your "warranty is voided" and you lose all good properties of the OS, if you so will.

  15. Things should be reasonably modular. The privileged part of the core OS must be extensible, including on the individual system. It's not sufficient to support extensibility just through high-level UI applications.

  16. Things should be reasonably uniform, i.e. ideally the same formats and cryptographic properties are used for all components of the system, regardless if for the host OS itself or the payloads it receives and runs.

  17. Even taking all these goals into consideration, it should still be close to traditional Linux distributions, and take advantage of what they are really good at: integration and security update cycles.

Now that we know our goals and requirements, let's start designing the OS along these lines.

Hermetic /usr/

First of all the OS resources (code, data files, …) should be hermetic in an immutable /usr/. This means that a /usr/ tree should carry everything needed to set up the minimal set of directories and files outside of /usr/ to make the system work. This /usr/ tree can then be mounted read-only into the writable root file system that then will eventually carry the local configuration, state and user data in /etc/, /var/ and /home/ as usual.

Thankfully, modern distributions are surprisingly close to working without issues in such a hermetic context. Specifically, Fedora works mostly just fine: it has adopted the /usr/ merge and the declarative systemd-sysusers and systemd-tmpfiles components quite comprehensively, which means the directory trees outside of /usr/ are automatically generated as needed if missing. In particular /etc/passwd and /etc/group (and related files) are appropriately populated, should they be missing entries.

In my model a hermetic OS is hence comprehensively defined within /usr/: combine the /usr/ tree with an empty, otherwise unpopulated root file system, and it will boot up successfully, automatically creating the files and resources strictly necessary to boot.
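
To make this concrete, here's a minimal sketch of the two declarative formats involved (the service name and paths are invented): a sysusers.d snippet declaring a system user, plus a tmpfiles.d snippet creating its state directory, both applied automatically at boot if the entries are missing:

    # /usr/lib/sysusers.d/foobar.conf (hypothetical)
    u foobard - "Foobar Daemon"

    # /usr/lib/tmpfiles.d/foobar.conf (hypothetical)
    d /var/lib/foobar 0750 foobard foobard -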

Monopolizing vendor OS resources and definitions in an immutable /usr/ opens multiple doors to us:

  • We can apply dm-verity to the whole /usr/ tree, i.e. guarantee structural, cryptographic integrity on the whole vendor OS resources at once, with full file system metadata.

  • We can implement updates to the OS easily: by implementing an A/B update scheme on the /usr/ tree we can update the OS resources atomically and robustly, while leaving the rest of the OS environment untouched.

  • We can implement factory reset easily: erase the root file system and reboot. The hermetic OS in /usr/ has all the information it needs to set up the root file system afresh — exactly like in a new installation.

Initial Look at the Partition Table

So let's have a look at a suitable partition table, taking a hermetic /usr/ into account. Let's conceptually start with a table of four entries:

  1. A UEFI System Partition (required by firmware to boot)

  2. Immutable, Verity-protected, signed file system with the /usr/ tree in version A

  3. Immutable, Verity-protected, signed file system with the /usr/ tree in version B

  4. A writable, encrypted root file system

(This is just for initial illustration here, as we'll see later it's going to be a bit more complex in the end.)

The Discoverable Partitions Specification provides suitable partition type UUIDs for all of the above partitions. Which is great, because it makes the image self-descriptive: simply by looking at the image's GPT table we know what to mount where. This means we do not need a manual /etc/fstab, and a multitude of tools such as systemd-nspawn and similar can operate directly on the disk image and boot it up.
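
As an illustration (the image name is made up): the very same image file can be inspected and booted directly, with all mount points derived from the partition type UUIDs alone:

    # Show what the GPT says about the image's partitions
    systemd-dissect fooos.raw

    # Boot the image as a container: no fstab, no manual mounting
    systemd-nspawn --image=fooos.raw --boot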

Booting

Now that we have a rough idea how to organize the partition table, let's look a bit at how to boot into that. Specifically, in my model "unified kernels" are the way to go, specifically those implementing Boot Loader Specification Type #2. These are basically kernel images that have an initial RAM disk attached to them, as well as a kernel command line, a boot splash image and possibly more, all wrapped into a single UEFI PE binary. By combining these into one we achieve two goals: they become extremely easy to update (i.e. drop in one file, and you update kernel+initrd) and more importantly, you can sign them as one for the purpose of UEFI SecureBoot.
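
To sketch how such a unified kernel image can be put together with today's tools (all input file names here are placeholders): the systemd EFI stub is combined with the individual pieces via plain objcopy, and the result is then signed once for SecureBoot:

    objcopy \
        --add-section .osrel=os-release    --change-section-vma .osrel=0x20000 \
        --add-section .cmdline=cmdline.txt --change-section-vma .cmdline=0x30000 \
        --add-section .linux=vmlinuz       --change-section-vma .linux=0x2000000 \
        --add-section .initrd=initrd.img   --change-section-vma .initrd=0x3000000 \
        /usr/lib/systemd/boot/efi/linuxx64.efi.stub fooos_0.7.efi

    # Sign the whole thing once, for UEFI SecureBoot
    sbsign --key db.key --cert db.crt --output fooos_0.7.efi.signed fooos_0.7.efi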

In my model, each version of such a kernel would be associated with exactly one version of the /usr/ tree: both are always updated at the same time. An update then becomes relatively simple: drop in one new /usr/ file system plus one kernel, and the update is complete.

The boot loader used for all this would be systemd-boot, of course. It's a very simple loader, and implements the aforementioned boot loader specification. This means it requires no explicit configuration or anything: it's entirely sufficient to drop in one such unified kernel file, and it will be picked up, and be made a candidate to boot into.

You might wonder how to configure the root file system to boot from with such a unified kernel that contains the kernel command line and is signed as a whole and thus immutable. The idea here is to use the usrhash= kernel command line option implemented by systemd-veritysetup-generator and systemd-fstab-generator. It does two things: it will search and set up a dm-verity volume for the /usr/ file system, and then mount it. It takes the root hash value of the dm-verity Merkle tree as the parameter. This hash is then also used to find the /usr/ partition in the GPT partition table, under the assumption that the partition UUIDs are derived from it, as per the suggestions in the discoverable partitions specification (see above).
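
Roughly, the flow is this (the hash is shortened and made up): at image build time veritysetup computes the root hash of the /usr/ file system, and exactly that hash then ends up on the kernel command line baked into the unified kernel:

    # At build time: generate the Verity hash data for the /usr/ file system
    veritysetup format usr.img usr-verity.img
    # ... prints among other things: Root hash: 4a3b…90ff

    # Kernel command line fragment embedded in the unified kernel image
    usrhash=4a3b…90ff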

systemd-boot (if not told otherwise) will do a version sort of the kernel image files it finds, and then automatically boot the newest one. Picking a specific kernel to boot will also fixate which version of the /usr/ tree to boot into, because — as mentioned — the Verity root hash of it is built into the kernel command line the unified kernel image contains.

In my model I'd place the kernels directly into the UEFI System Partition (ESP), in order to simplify things. (systemd-boot also supports reading them from a separate boot partition, but let's not complicate things needlessly, at least for now.)

So, with all this, we now already have a boot chain that goes something like this: once the boot loader is run, it will pick the newest kernel, which includes the initial RAM disk and a secure reference to the /usr/ file system to use. This is already great. But a /usr/ alone won't make us happy, we also need a root file system. In my model, that file system would be writable, and the /etc/ and /var/ hierarchies would be located directly on it. Since these trees potentially contain secrets (SSH keys, …) the root file system needs to be encrypted. We'll use LUKS2 for this, of course. In my model, I'd bind this to the TPM2 chip (for compatibility with systems lacking one, we can find a suitable fallback, which then provides weaker guarantees, see below). A TPM2 is a security chip available in most modern PCs. Among other things it contains a persistent secret key that can be used to encrypt data, in such a way that you can only decrypt it again if you have access to the chip and can prove you are running validated software. The cryptographic measuring I mentioned earlier is what allows this to work. But … let's not get lost too much in the details of TPM2 devices; that'd be material for a novel, and this blog story is going to be way too long already.

What does using a TPM2-bound key for unlocking the root file system get us? We can encrypt the root file system with it, and you can only read or make changes to the root file system if you also possess the TPM2 chip and run our validated version of the OS. This protects us against an evil maid scenario to some level: an attacker cannot just copy the hard disk of your laptop while you leave it in your hotel room, because unless the attacker also steals the TPM2 device it cannot be decrypted. Nor can the attacker simply modify the root file system: such changes would be detected on the next boot, because they weren't made with the right cryptographic key.

So, now we have a system that already can boot up somewhat completely, and run userspace services. All code that is run is verified in some way: the /usr/ file system is Verity protected, and the root hash of it is included in the kernel that is signed via UEFI SecureBoot. And the root file system is locked to the TPM2 where the secret key is only accessible if our signed OS + /usr/ tree is used.

(One brief intermission here: so far all the components I am referencing here exist already, and have been shipped in systemd and other projects, including the TPM2-based disk encryption. There's however one thing still missing at the moment that needs to be developed (happy to take PRs!): right now TPM2-based LUKS2 unlocking is bound to PCR hash values. This is hard to work with when implementing updates; what we'd need instead is unlocking by signatures of PCR hashes. TPM2 supports this, but we don't support it yet in our systemd-cryptsetup + systemd-cryptenroll stack.)
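
To illustrate the stack as it exists today (the device path is hypothetical): enrolling a TPM2-bound key into a LUKS2 volume and hooking it up for automatic unlocking is roughly this:

    # Enroll a TPM2-bound key, locked to the SecureBoot policy PCR
    systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/sda4

    # /etc/crypttab entry for automatic unlocking at boot
    root /dev/sda4 - tpm2-device=auto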

One of the goals mentioned above is that cryptographic key material should always be generated locally on first boot, rather than pre-provisioned. This of course has implications for the encryption key of the root file system: if we want to boot into this system we need the root file system to exist, and thus a key it is encrypted with must already have been generated. But where precisely would we generate it, given that we have no installer that could generate it while installing (as is done in traditional Linux distribution installers)? My proposed solution here is to use systemd-repart, which is a declarative, purely additive repartitioner. It can run from the initrd to create partitions on boot, before transitioning into the root file system. It can also format the partitions it creates and encrypt them, automatically enrolling a TPM2-bound key.
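
A systemd-repart definition for such a root file system could look roughly like this (the file name is hypothetical):

    # /usr/lib/repart.d/50-root.conf (hypothetical)
    [Partition]
    Type=root
    Format=btrfs
    # Generate a key locally, enroll it in the TPM2, encrypt with it
    Encrypt=tpm2
    # Mark as a candidate for erasure on factory reset, see below
    FactoryReset=yes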

So, let's revisit the partition table we mentioned earlier. Here's what in my model we'd actually ship in the initial image:

  1. A UEFI System Partition (ESP)

  2. An immutable, Verity-protected, signed file system with the /usr/ tree in version A

And that's already it. No root file system, no B /usr/ partition, nothing else. Only two partitions are shipped: the ESP with the systemd-boot loader and one unified kernel image, and the A version of the /usr/ partition. On first boot systemd-repart will then notice that the root file system doesn't exist yet, and will create it, format it, encrypt it, and enroll the key into the TPM2. It will also create the second /usr/ partition (B) that we'll need for later A/B updates (it stays empty for now, until the first update operation actually takes place, see below). Once done, the initrd will combine the fresh root file system with the shipped /usr/ tree, and transition into it. Because the OS is hermetic in /usr/ and contains all the systemd-tmpfiles and systemd-sysusers information it can then set up the root file system properly and create any directories and symlinks (and maybe a few files) necessary to operate.

Besides the fact that the root file system's encryption keys are generated on the system we boot from and never leave it, it is also pretty nice that the root file system will be sized dynamically, taking into account the physical size of the backing storage. This is perfect, because on first boot the image will automatically adapt to what it has been dd'ed onto.

Factory Reset

This is a good point to talk about the factory reset logic, i.e. the mechanism to place the system back into a known good state. This is important for two reasons: in our laptop use case, once you want to pass the laptop to someone else, you want to ensure your data is fully and comprehensively erased. Moreover, if you have reason to believe your device was hacked you want to revert the device to a known good state, i.e. ensure that exploits cannot persist. systemd-repart already has a mechanism for this. In the declarations of the partitions the system should have, entries may be marked as candidates for erasure on factory reset. The actual factory reset is then requested by one of two means: by specifying a specific kernel command line option (which is not too interesting here, given we lock that down via UEFI SecureBoot; but then again, one could also add a second kernel to the ESP that is identical to the first, differing only in that it lists this command line option: thus when the user selects this entry it will initiate a factory reset); or via an EFI variable that can be set and is honoured on the immediately following boot. So here's how a factory reset would then go down: once the factory reset is requested it's enough to reboot. On the subsequent boot systemd-repart runs from the initrd, where it will honour the request and erase the partitions marked for erasure. Once that is complete the system is back in the state we shipped it in: only the ESP and the /usr/ file system will exist, but the root file system is gone. And from here we can continue as on the original first boot: create a new root file system (and any other partitions), and encrypt/set it up afresh.
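
Mechanically this means (option spellings as I recall them from the systemd-repart documentation, worth double-checking): partitions to erase carry FactoryReset=yes in their definitions, as sketched earlier, and the reset request itself can be expressed for instance like this:

    # Kernel command line switch honoured by systemd-repart in the initrd
    systemd.factory_reset=yes

    # Or invoke the tool explicitly with the reset request
    systemd-repart --factory-reset=yes --dry-run=no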

So now we have a nice setup, where everything is either signed or encrypted securely. The system can adapt to the system it is booted on automatically on first boot, and can easily be brought back into a well defined state identical to the way it was shipped in.

Modularity

But of course, such a monolithic, immutable system is only useful for very specific purposes. If /usr/ can't be written to – at least not in the traditional sense – one cannot just go and install a new software package that one needs. So here two goals are superficially conflicting: on one hand one wants modularity, i.e. the ability to add components to the system, and on the other immutability, i.e. that precisely this is prohibited.

So let's see what I propose as a middle ground in my model. First, what's the precise use case for such modularity? I see a couple of different ones:

  1. For some cases it is necessary to extend the system itself at the lowest level, so that the components added in extend (or maybe even replace) the resources shipped in the base OS image, so that they live in the same namespace, and are subject to the same security restrictions and privileges. Exposure to the details of the base OS and its interface for this kind of modularity is at the maximum.

    Example: a module that adds a debugger or tracing tools into the system. Or maybe an optional hardware driver module.

  2. In other cases, more isolation is preferable: instead of extending the system resources directly, additional services shall be added in that bring their own files, can live in their own namespace (but with "windows" into the host namespaces), however still are system components, and provide services to other programs, whether local or remote. Exposure to the details of the base OS for this kind of modularity is restricted: it mostly focuses on the ability to consume and provide IPC APIs from/to the system. Components of this type can still be highly privileged, but the level of integration is substantially smaller than for the type explained above.

    Example: a module that adds a specific VPN connection service to the OS.

  3. Finally, there's the actual payload of the OS. This stuff is relatively isolated from the OS and definitely from each other. It mostly consumes OS APIs, and generally doesn't provide OS APIs. This kind of stuff runs with minimal privileges, and in its own namespace of concepts.

    Example: a desktop app, for reading your emails.

Of course, the lines between these three types of modules are blurry, but I think distinguishing them does make sense, as I think different mechanisms are appropriate for each. So here's what I'd propose in my model to use for this.

  1. For the system extension case I think the systemd-sysext images are appropriate. This tool operates on system extension images that are very similar to the host's disk image: they also contain a /usr/ partition, protected by Verity. However, they just include additions to the host image: binaries that extend the host. When such a system extension image is activated, it is merged via an immutable overlayfs mount into the host's /usr/ tree. Thus any file shipped in such a system extension will suddenly appear as if it was part of the host OS itself. For optional components that should more or less be considered part of the OS, this is a very simple and powerful way to combine an immutable OS with immutable extensions (see the sketch after this list). Note that extensions matching this tool should most likely be built at the same time, within the same update cycle scheme, as the host OS itself. After all, the files included in the extensions will have dependencies on files in the system OS image, and care must be taken that these dependencies remain in order.

  2. For adding in additional somewhat isolated system services in my model, Portable Services are the proposed tool of choice. Portable services are in most ways just like regular system services; they could be included in the system OS image or an extension image. However, portable services use RootImage= to run off separate disk images, thus within their own namespace. Images set up this way have various ways to integrate into the host OS, as they are in most ways regular system services, which just happen to bring their own directory tree. Also, unlike regular system services, for them sandboxing is opt-out rather than opt-in. In my model, here too the disk images are Verity protected and thus immutable. Just like the host OS they are GPT disk images that come with a /usr/ partition and Verity data, along with signing.

  3. Finally, the actual payload of the OS, i.e. the apps. To be useful in real life here it is important to hook into existing ecosystems, so that a large set of apps are available. Given that on Linux flatpak (or on servers OCI containers) are the established format that pretty much won they are probably the way to go. That said, I think both of these mechanisms have relatively weak properties, in particular when it comes to security, since immutability/measurements and similar are not provided. This means, unlike for system extensions and portable services a complete trust chain with attestation and per-app cryptographically protected data is much harder to implement sanely.

What I'd like to underline here is that the main system OS image, as well as the system extension images and the portable service images are put together the same way: they are GPT disk images, with one immutable file system and associated Verity data. The latter two should also contain a PKCS#7 signature for the top-level Verity hash. This uniformity has many benefits: you can use the same tools to build and process these images, but most importantly: by using a single way to validate them throughout the stack (i.e. Verity, in the latter cases with PKCS#7 signatures), validation and measurement is straightforward. In fact it's so obvious that we don't even have to implement it in systemd: the kernel has direct support for this Verity signature checking natively already (IMA).

So, by composing a system at runtime from a host image, extension images and portable service images we have a nicely modular system where every single component is cryptographically validated on every single IO operation, and every component is measured, in its entire combination, directly in the kernel's IMA subsystem.

(Of course, once you add the desktop apps or OCI containers on top, then these properties are lost further down the chain. But well, a lot is already won, if you can close the chain that far down.)

Note that system extensions are not designed to replicate the fine grained packaging logic of RPM/dpkg. Of course, systemd-sysext is a generic tool, so you can use it for whatever you want, but there's a reason it does not bring support for a dependency language: the goal here is not to replicate traditional Linux packaging (we have that already, in RPM/dpkg, and I think they are actually OK for what they do) but to provide delivery of larger, coarser sets of functionality, in lockstep with the underlying OS' life-cycle and in particular with no interdependencies, except on the underlying OS.

Also note that depending on the use case it might make sense to also use system extensions to modularize the initrd step. This is probably less relevant for a desktop OS, but for server systems it might make sense to package up support for specific complex storage in a systemd-sysext system extension, which can be applied to the initrd that is built into the unified kernel. (In fact, we have been working on bringing signed yet modular initrd support to general purpose Fedora this way.)

Note that portable services are composable from system extensions too, by the way. This makes them even more useful, as you can share a common runtime between multiple portable services, or even use the host image as common runtime for portable services. In this model a common runtime image is shared between one or more system extensions, and composed at runtime via an overlayfs instance.
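
Usage-wise this is pleasantly boring; for a hypothetical image foo_0.7.raw that carries a foo.service unit it's roughly:

    # Attach the image: its unit files become visible to the host
    portablectl attach /var/lib/portables/foo_0.7.raw

    # From then on it's managed like any other service
    systemctl enable --now foo.service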

More Modularity: Secondary OS Installs

Having an immutable, cryptographically locked down host OS is great I think, and if we have some moderate modularity on top, that's also great. But oftentimes it's useful to be able to depart or compromise from that for some specific use cases, i.e. provide a bridge, for example, to allow workloads designed around RPM/dpkg package management to coexist reasonably nicely with such an immutable host.

For this purpose in my model I'd propose using systemd-nspawn containers. systemd-nspawn is focused on OS containerization, i.e. it allows you to run a full OS with init system and everything as payload (unlike, for example, Docker containers, which focus on a single service, and where running a full OS inside is a mess).

Running systemd-nspawn containers for such secondary OS installs has various nice properties. One of course is that systemd-nspawn supports the same level of cryptographic image validation that we rely on for the host itself. Thus, to some level the whole OS trust chain is reasonably recursive if desired: the firmware validates the OS, and the OS can validate a secondary OS installed within it. In fact, we can run our trusted OS recursively on itself and get similar security guarantees! Besides these security aspects, systemd-nspawn also has really nice properties when it comes to integration with the host. For example, the --bind-user= option permits binding a host user record and their home directory into a container as a simple one-step operation. This makes it extremely easy to have a single user and $HOME but share it concurrently with the host and a zoo of secondary OSes in systemd-nspawn containers, each of which could even run a different distribution.
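
As a sketch (the image file and user name are made up):

    # Boot a full secondary OS from a disk image, binding the host
    # user "alice" and her home directory into the container
    systemd-nspawn --image=secondary.raw --boot --bind-user=alice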

Developer Mode

Superficially, an OS with an immutable /usr/ appears much less hackable than an OS where everything is writable. Moreover, an OS where everything must be signed and cryptographically validated makes it hard to insert your own code, given that you are unlikely to have access to the signing keys.

To address this issue other systems have supported a "developer" mode: when entered, the security guarantees are disabled, and the system can be freely modified, without cryptographic validation. While that's a great concept to have, I doubt it's what most developers really want: the cryptographic properties of the OS are great after all, and it sucks having to give them up once developer mode is activated.

In my model I'd thus propose two different approaches to this problem. First of all, I think there's value in allowing users to additively extend/override the OS via local developer system extensions. With this scheme the underlying cryptographic validation would remain intact, but (if this form of developer mode is explicitly enabled) the developer could add in more resources from local storage that are not tied to the OS builder's chain of trust, but to a local one (i.e. simply backed by encrypted storage of some form).

The second approach is to make it easy to extend (or in fact replace) the set of trusted validation keys, with local ones that are under the control of the user, in order to make it easy to operate with kernel, OS, extension, portable service or container images signed by the local developer without involvement of the OS builder. This is relatively easy to do for components down the trust chain; the elements further up the chain should optionally accept additional certificates to validate against.

(Note that systemd currently has no explicit support for a "developer" mode like this. I think we should add that sooner or later however.)

Democratizing Code Signing

Closely related to the question of developer mode is the question of code signing. If you ask me, the status quo of UEFI SecureBoot code signing in the major Linux distributions is pretty sad. The work to get stuff signed is massive, but in effect it delivers very little in return: because initrds are entirely unprotected and reside on partitions lacking any form of cryptographic integrity protection, any attacker can trivially modify the boot process of such a Linux system and freely collect any FDE passphrases entered. There's little value in signing the boot loader and kernel in a complex bureaucracy if they then happily load entirely unprotected code that processes the actually relevant security credentials: the FDE keys.

In my model, through use of unified kernels this important gap is closed, hence UEFI SecureBoot code signing becomes an integral part of the boot chain from firmware to the host OS. Unfortunately, code signing and having something a user can locally hack are to some level conflicting goals. However, I think we can improve the situation here, and put more emphasis on enrolling developer keys in the trust chain easily. Specifically, I see one relevant approach here: enrolling keys directly in the firmware is something that we should make less of a theoretical exercise and more something we can realistically deploy. See this work in progress making this more automatic and eventually safe. Other approaches are thinkable (including some that build on existing MokManager infrastructure), but given the politics involved, are harder to conclusively implement.

Running the OS itself in a container

What I explain above is put together with running on a bare metal system in mind. However, one of the stated goals is to make the OS adaptive enough to also run in a container environment (specifically: systemd-nspawn) nicely. Booting a disk image on bare metal or in a VM generally means that the UEFI firmware validates and invokes the boot loader, and the boot loader invokes the kernel which then transitions into the final system. This is different for containers: here the container manager immediately calls the init system, i.e. PID 1. Thus the validation logic must be different: cryptographic validation must be done by the container manager. In my model this is solved by shipping the OS image not only with a Verity data partition (as is already necessary for the UEFI SecureBoot trust chain, see above), but also with another partition, containing a PKCS#7 signature of the root hash of said Verity partition. This of course is exactly what I propose for both the system extension and portable service image. Thus, in my model the images for all three uses are put together the same way: an immutable /usr/ partition, accompanied by a Verity partition and a PKCS#7 signature partition. The OS image itself then has two ways "into" the trust chain: either through the signed unified kernel in the ESP (which is used for bare metal and VM boots) or by using the PKCS#7 signature stored in the partition (which is used for container/systemd-nspawn boots).

Parameterizing Kernels

A fully immutable and signed OS has to establish trust in the user data it makes use of before doing so. In the model I describe here, for /etc/ and /var/ we do this via disk encryption of the root file system (in combination with integrity checking). But the point where the root file system is mounted comes relatively late in the boot process, and thus cannot be used to parameterize the boot itself. In many cases it's important to be able to parameterize the boot process however.

For example, for the implementation of the developer mode indicated above it's useful to be able to pass this fact safely to the initrd, in combination with other fields (e.g. a hashed root password for allowing in-initrd logins for debug purposes). After all, if the initrd is pre-built by the vendor and signed as a whole together with the kernel, it cannot be modified to carry such data directly (which is in fact how parameterization of the initrd was traditionally done, to a large degree).

In my model this is achieved through system credentials, which allow passing parameters to systems (and services, for that matter) in an encrypted and authenticated fashion, bound to the TPM2 chip. This means that we can securely pass data into the initrd so that it can be authenticated and decrypted only on the system it is intended for, and with the unified kernel image it was intended for.
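
As an illustration with the existing tooling (the credential name, file names and the unit snippet are all hypothetical):

    # Encrypt a secret so that only this system's TPM2 can decrypt it
    systemd-creds encrypt --with-key=tpm2 --name=debug.rootpw plain.txt rootpw.cred

    # A unit can then consume it, decrypted only at activation time
    [Service]
    LoadCredentialEncrypted=debug.rootpw:/etc/rootpw.cred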

Swap

In my model the OS would also carry a swap partition, for the simple reason that only then can systemd-oomd.service provide the best results. Also see In defence of swap: common misconceptions.

Updating Images

We have a rough idea how the system shall be organized now, let's next focus on the deployment cycle: software needs regular update cycles, and software that is not updated regularly is a security problem. Thus, I am sure that any modern system must be automatically updated, without this requiring avoidable user interaction.

In my model, this is the job for systemd-sysupdate. It's a relatively simple A/B image updater: it operates either on partitions, on regular files in a directory, or on subdirectories in a directory. Each entry has a version (which is encoded in the GPT partition label for partitions, and in the filename for regular files and directories): whenever an update is initiated the oldest version is erased, and the newest version is downloaded.

With the setup described above a system update becomes a really simple operation. On each update the systemd-sysupdate tool downloads a /usr/ file system partition, an accompanying Verity partition and a PKCS#7 signature partition, and drops them into the host's partition table (where they possibly replace the oldest versions so far stored there). Then it downloads a unified kernel image and drops it into the EFI System Partition's /EFI/Linux (as per the Boot Loader Specification; possibly erasing the oldest such file there). And that's already the whole update process: four files are downloaded from the server, unpacked and put in the most straightforward of ways into the partition table or file system. Unlike in other OS designs there's no mechanism required to explicitly switch to the newer version; the aforementioned systemd-boot logic will automatically pick the newest kernel once it is dropped in.
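
To give a rough idea of what such a transfer definition could look like (the URL, naming scheme and file name are all made up; consult the systemd-sysupdate documentation for the authoritative syntax):

    # /usr/lib/sysupdate.d/50-usr.conf (hypothetical)
    [Transfer]
    ProtectVersion=%A

    [Source]
    Type=url-file
    Path=https://download.example.com/
    MatchPattern=fooOS_@v.usr

    [Target]
    Type=partition
    Path=auto
    MatchPattern=fooOS_@v
    MatchPartitionType=usr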

Above we talked a lot about modularity, and how to put systems together as a combination of a host OS image, system extension images for the initrd and the host, portable service images and systemd-nspawn container images. I already emphasized that these image files are actually always the same: GPT disk images with partition definitions that match the Discoverable Partition Specification. This comes in very handy when thinking about updating: we can use the exact same systemd-sysupdate tool for updating these other images that we use for the host image. The uniformity of the on-disk format allows us to update them uniformly, too.

Boot Counting + Assessment

Automatic OS updates do not come without risks: if they happen automatically, an update that goes wrong might automatically turn your system into a brick. This of course is less than ideal. Hence it is essential to address this reasonably automatically. In my model, there's systemd's Automatic Boot Assessment for that. The mechanism is simple: whenever a new unified kernel image is dropped into the system it is stored with a small integer counter value included in the filename. Whenever the unified kernel image is selected for booting by systemd-boot, the counter is decreased by one. Once the system has booted up successfully (which is determined by userspace) the counter is removed from the file name (which indicates "this entry is known to work"). If the counter ever hits zero, this indicates that the system tried to boot the entry a couple of times and failed each time; the entry is apparently "bad". In this case systemd-boot will not consider the kernel anymore, and revert to the next older one (that doesn't have a counter of zero).

By sticking the boot counter into the filename of the unified kernel we can directly attach this information to the kernel, and thus need not concern ourselves with cleaning up secondary information about the kernel when the kernel is removed. Updating with a tool like systemd-sysupdate remains a very simple operation hence: drop one old file, add one new file.
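
To illustrate the naming scheme (the version string is made up): a freshly dropped-in kernel starts with a tries-left counter, which grows a tries-done part on each failed attempt:

    fooOS_0.8+3.efi      # freshly installed: 3 tries left
    fooOS_0.8+2-1.efi    # one failed attempt so far: 2 left, 1 done
    fooOS_0.8.efi        # booted successfully: counter removed, entry is good
    fooOS_0.8+0-3.efi    # all attempts failed: entry is bad, no longer considered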

Picking the Newest Version

I already mentioned that systemd-boot automatically picks the newest unified kernel image to boot, by looking at the version encoded in the filename. This is done via a simple strverscmp() call (well, truth be told, it's a modified version of that call, different from the one implemented in libc, because real-life package managers use more complex rules for comparing versions these days, and hence it made sense to do that here too). The concept of having multiple entries of some resource in a directory, and picking the newest one automatically is a powerful concept, I think. It means adding/removing new versions is extremely easy (as we discussed above, in systemd-sysupdate context), and allows stateless determination of what to use.

If systemd-boot can do that, what about system extension images, portable service images, or systemd-nspawn container images that do not actually use systemd-boot as the entrypoint? All these tools actually implement the very same logic, but on the partition level: if multiple suitable /usr/ partitions exist, then the newest is determined by comparing the GPT partition label of them.

This is in a way the counterpart to the systemd-sysupdate update logic described above: we always need a way to determine which partition to actually use after an update took place, and that is easy every time: enumerate the possible entries, pick the newest as per the (modified) strverscmp() result.

Home Directory Management

In my model the device's users and their home directories are managed by systemd-homed. This means they are relatively self-contained and can be migrated easily between devices. The numeric UID assignment for each user is done at the moment of login only, and the files in the home directory are mapped as needed via a uidmap mount. It also allows us to protect the data of each user individually with a credential that belongs to the user themselves, i.e. instead of binding confidentiality of the user's data to the system-wide full-disk encryption, each user gets their own encrypted home directory, where the user's authentication token (password, FIDO2 token, PKCS#11 token, recovery key, …) is used as authentication and decryption key for the user's data. This brings a major improvement for security as it means the user's data is cryptographically inaccessible except when the user is actually logged in.
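
Creating such a user with systemd-homed's homectl client is a one-liner (the user name is made up):

    # Create a user with a LUKS2-encrypted home directory, keyed
    # by the user's own authentication token
    homectl create alice --storage=luks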

It also allows us to correct another major issue with traditional Linux systems: the way data encryption works during system suspend. Traditionally on Linux the disk encryption credentials (e.g. the LUKS passphrase) are kept in memory while the system is suspended. This is a bad choice for security, since many (most?) of us probably never turn off their laptop but suspend it instead. But if the decryption key is always present in unencrypted form while the system is suspended, it could potentially be read from there by a sufficiently equipped attacker.

By encrypting the user's home directory with the user's authentication token we can first safely "suspend" the home directory before going to the system suspend state (i.e. flush out the cryptographic keys needed to access it). This means any process currently accessing the home directory will be frozen for the time of the suspend, but that's expected anyway during a system suspend cycle. Why is this better than the status quo ante? In this model the home directory's cryptographic key material is erased during suspend, but it can be safely reacquired on resume, from system code. If the system is only encrypted as a whole however, then the system code itself couldn't reauthenticate the user, because it would be frozen too. By separating home directory encryption from the root file system encryption we can avoid this problem.

Partition Setup

So we discussed the organization of the partitions of OS images multiple times above, each time focusing on a specific aspect. Let's now summarize what this should all look like together.

In my model, the initial, shipped OS image should look roughly like this:

  • (1) A UEFI System Partition, with systemd-boot as boot loader and one unified kernel
  • (2) A /usr/ partition (version "A"), with a label fooOS_0.7 (under the assumption we called our project fooOS and the image version is 0.7).
  • (3) A Verity partition for the /usr/ partition (version "A"), with the same label
  • (4) A partition carrying the Verity root hash for the /usr/ partition (version "A"), along with a PKCS#7 signature of it, also with the same label

On first boot this is augmented by systemd-repart like this:

  • (5) A second /usr/ partition (version "B"), initially with a label _empty (which is the label systemd-sysupdate uses to mark partitions that currently carry no valid payload)
  • (6) A Verity partition for that (version "B"), similar to the above case, also labelled _empty
  • (7) And ditto a Verity root hash partition with a PKCS#7 signature (version "B"), also labelled _empty
  • (8) A root file system, encrypted and locked to the TPM2
  • (9) A home file system, integrity protected via a key also in TPM2 (encryption is unnecessary, since systemd-homed adds that on its own, and it's nice to avoid duplicate encryption)
  • (10) A swap partition, encrypted and locked to the TPM2

Then, on the first OS update the partitions 5, 6, 7 are filled with a new version of the OS (let's say 0.8) and thus get their label updated to fooOS_0.8. After a boot, this version is active.

On a subsequent update the three partitions fooOS_0.7 get wiped and replaced by fooOS_0.9 and so on.

On factory reset, the partitions 8, 9, 10 are deleted, so that systemd-repart recreates them, using a new set of cryptographic keys.

Here's a graphic that hopefully illustrates the partition table from shipped image, through first boot, multiple update cycles and eventual factory reset:

Partitions Overview

Trust Chain

So let's summarize the intended chain of trust (for bare metal/VM boots) that ensures every piece of code in this model is signed and validated, and any system secret is locked to TPM2.

  1. First, firmware (or possibly shim) authenticates systemd-boot.

  2. Once systemd-boot picks a unified kernel image to boot, it is also authenticated by firmware/shim.

  3. The unified kernel image contains an initrd, which is the first userspace component that runs. It finds any system extensions passed into the initrd, and sets them up through Verity. The kernel will validate the Verity root hash signature of these system extension images against its usual keyring.

  4. The initrd also finds credentials passed in, then securely unlocks (which means: decrypts + authenticates) them with a secret from the TPM2 chip, locked to the kernel image itself.

  5. The kernel image also contains a kernel command line which contains a usrhash= option that pins the root hash of the /usr/ partition to use.

  6. The initrd then unlocks the encrypted root file system, with a secret bound to the TPM2 chip.

  7. The system then transitions into the main system, i.e. the combination of the Verity-protected /usr/ and the encrypted root file system. It then activates two more encrypted (and/or integrity protected) volumes for /home/ and swap, also with a secret tied to the TPM2 chip.

Here's an attempt to illustrate the above graphically:

Trust Chain

This is the trust chain of the basic OS. Validation of system extension images, portable service images, systemd-nspawn container images always takes place the same way: the kernel validates these Verity images along with their PKCS#7 signatures against the kernel's keyring.

File System Choice

In the above I left the choice of file systems unspecified. For the immutable /usr/ partitions squashfs might be a good candidate, but any other that works nicely in a read-only fashion and generates reproducible results is a good choice, too. The home directories as managed by systemd-homed should certainly use btrfs, because it's the only general purpose file system supporting online grow and shrink, which systemd-homed can take advantage of to manage storage.

For the root file system btrfs is likely also the best idea. That's because we intend to use LUKS/dm-crypt underneath, which by default only provides confidentiality, not authenticity of the data (unless combined with dm-integrity). Since btrfs (unlike xfs/ext4) does full data checksumming it's probably the best choice here, since it means we don't have to use dm-integrity (which comes at a higher performance cost).

OS Installation vs. OS Instantiation

In the discussion above a lot of focus was put on setting up the OS and completing the partition layout and such on first boot. This means installing the OS becomes as simple as dd-ing (i.e. "streaming") the shipped disk image into the final HDD medium. Simple, isn't it?

Of course, such a scheme is just too simple for many setups in real life. Whenever multi-boot is required (i.e. co-installing an OS implementing this model with another unrelated one), dd-ing a disk image onto the HDD is going to overwrite user data that was supposed to be kept around.

In order to cover this case, in my model, we'd use systemd-repart (again!) to allow streaming the source disk image into the target HDD in a smarter, additive way. The tool after all is purely additive: it will add in partitions or grow them if they are missing or too small. systemd-repart already has all the necessary provisions to not only create a partition on the target disk, but also copy blocks from a raw installer disk. An install operation would then become a two-step process: one invocation of systemd-repart that adds in the /usr/, its Verity and the signature partition to the target medium, populated with a copy of the same partitions of the installer medium; and one invocation of bootctl that installs the systemd-boot boot loader in the ESP. (Well, there's one thing missing here: the unified OS kernel also needs to be dropped into the ESP. For now, this can be done with a simple cp call. In the long run, this should probably be something bootctl can do as well, if told so.)
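
A sketch of the relevant bits (file names hypothetical): the installer medium's repart.d definitions mark the /usr/ partition for block-level replication onto the target, and bootctl puts the boot loader in place:

    # repart.d definition shipped on the installer medium (hypothetical)
    [Partition]
    Type=usr
    Label=fooOS_0.7
    # Replicate this partition bit-by-bit from the booted installer medium
    CopyBlocks=auto

    # Install the systemd-boot boot loader into the target ESP
    bootctl install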

So, with this we have a simple scheme to cover all bases: we can either just dd an image to disk, or we can stream an image onto an existing HDD, adding a couple of new partitions and files to the ESP.

Of course, in reality things are more complex than that even: there's a good chance that the existing ESP is simply too small to carry multiple unified kernels. In my model, the way to address this is by shipping two slightly different systemd-repart partition definition file sets: the ideal case when the ESP is large enough, and a fallback case, where it isn't and where we then add in an additional XBOOTLDR partition (as per the Discoverable Partitions Specification). In that mode the ESP carries the boot loader, but the unified kernels are stored in the XBOOTLDR partition. This scenario is not quite as simple as the XBOOTLDR-less scenario described first, but is equally well supported in the various tools. Note that systemd-repart can be told size constraints on the partitions it shall create or augment, thus to implement this scheme it's enough to invoke the tool with the fallback partition scheme if invocation with the ideal scheme fails.

Either way: regardless how the partitions, the boot loader and the unified kernels ended up on the system's hard disk, on first boot the code paths are the same again: systemd-repart will be called to augment the partition table with the root file system, and properly encrypt it, as was already discussed earlier here. This means: all cryptographic key material used for disk encryption is generated on first boot only, the installer phase does not encrypt anything.

Live Systems vs. Installer Systems vs. Installed Systems

Traditionally on Linux three types of systems were common: "installed" systems, i.e. that are stored on the main storage of the device and are the primary place people spend their time in; "installer" systems which are used to install them and whose job is to copy and set up the packages that make up the installed system; and "live" systems, which were a middle ground: a system that behaves like an installed system in most ways, but lives on removable media.

In my model I'd like to remove the distinction between these three concepts as much as possible: each of these three images should carry the exact same /usr/ file system, and should be suitable to be replicated the same way. Once installed the resulting image can also act as an installer for another system, and so on, creating a certain "viral" effect: if you have one image or installation it's automatically something you can replicate 1:1 with a simple systemd-repart invocation.

Building Images According to this Model

The above explains what the image should look like and how its first boot and update cycle will modify it. But this leaves one question unanswered: how does one actually build the initial image for OS instances following this model?

Note that there's nothing too special about the images following this model: they are ultimately just GPT disk images with Linux file systems, following the Discoverable Partition Specification. This means you can use any set of tools of your choice that can put together compliant GPT disk images.

I personally would use mkosi for this purpose though. It's designed to generate compliant images, and has a rich toolset for SecureBoot and signed/Verity file systems already in place.

What is key here is that this model doesn't depart from RPM and dpkg; instead it builds on top of them: in this model they are excellent for putting together images on the build host, but deployment onto the runtime host does not involve individual packages.

I think one cannot overstate the value traditional distributions bring, regarding security, integration and general polishing. The concepts I describe above inherit all of this, but depart from the idea that distribution packages are a runtime concept, making them a build-time concept instead.

Note that the above is pretty much independent from the underlying distribution.

Final Words

I have no illusions, general purpose distributions are not going to adopt this model as their default any time soon, and it's not even my goal that they do that. The above is my personal vision, and I don't expect people to buy into it 100%, and that's fine. However, what I am interested in is finding the overlaps, i.e. work with people who buy 50% into this vision, and share the components.

My goals here thus are to:

  1. Get distributions to move to a model where images like this can be built from the distribution easily. Specifically this means that distributions make their OS hermetic in /usr/.

  2. Find the overlaps, share components with other projects to revisit how distributions are put together. This is already happening, see the systemd-tmpfiles and systemd-sysusers support in various distributions, but I think there's more to share.

  3. Make people interested in building actual real-world images based on general purpose distributions adhering to the model described above. I'd love a "GnomeBook" image with full trust properties, that is built from true Linux distros, such as Fedora or ArchLinux.

FAQ

  1. What about ostree? Doesn't ostree already deliver what this blog story describes?

    ostree is fine technology, but with respect to security and robustness properties it's not too interesting I think, because unlike image-based approaches it cannot really deliver integrity/robustness guarantees over the whole tree easily. To be able to trust an ostree setup you have to establish trust in the underlying file system first, and the complexity of the file system makes that challenging. To provide an effective offline-secure trust chain through the whole depth of the stack it is essential to cryptographically validate every single I/O operation. In an image-based model this is trivially easy, but in the ostree model, with current file system technology, it is not possible, and even if this is added in one way or another in the future (though I am not aware of anyone doing on-access file-based integrity that spans a whole hierarchy of files and is compatible with ostree's hardlink farm model) I think validation would still happen at too high a level, since Linux file system developers have made very clear that their implementations are not robust against rogue images. (There's this stuff planned, but doing structural authentication ahead of time instead of on access makes the idea too weak, and I'd expect too slow, in my eyes.)

    With my design I want to deliver similar security guarantees as ChromeOS does, but ostree is much weaker there, and I see no perspective of this changing. In a way ostree's integrity checks are similar to RPM's and enforced on download rather than on access. In the model I suggest above, it's always on access, and thus safe towards offline attacks (i.e. evil maid attacks). In today's world, I think offline security is absolutely necessary though.

    That said, ostree does have some benefits over the model described above: it naturally shares file system inodes if many of the modules/images involved share the same data. It's thus more space efficient on disk (and thus also in RAM/cache to some degree) by default. In my model it would be up to the image builders to minimize shipping overly redundant disk images, by making good use of suitably composable system extensions.

  2. What about configuration management?

    At first glance immutable systems and configuration management don't go that well together. However, do note that in the model I propose above the root file system, with all its contents including /etc/ and /var/, is actually writable and can be modified like on any other typical Linux distribution. The only exception is /usr/, where the immutable OS is hermetic. That means configuration management tools should work just fine in this model – up to the point where they are used to install additional RPM/dpkg packages, because that's something not allowed in the model above: packages need to be installed at image build time, and thus on the image build host, not the runtime host.

  3. What about non-UEFI and non-TPM2 systems?

    The above is designed around the feature set of contemporary PCs, and this means UEFI and TPM2 are assumed to be available (simply because the PC is pretty much defined by the Windows platform, and current versions of Windows require both).

    I think it's important to make the best of the features of today's PC hardware, and then find suitable fallbacks on more limited hardware. Specifically this means: if there's a desire to implement something like this on non-UEFI or non-TPM2 hardware, we should look for suitable fallbacks for the individual functionality, but generally try to add glue to the old systems so that conceptually they behave more like the new systems, instead of the other way round. Or in other words: most of the above is not strictly tied to UEFI or TPM2, and for many cases there are already reasonable fallbacks in place for more limited systems. Of course, without TPM2 many of the security guarantees will be weakened.

  4. How would you name an OS built that way?

    I think a desktop OS built this way if it has the GNOME desktop should of course be called GnomeBook, to mimic the ChromeBook name. ;-)

    But in general, I'd call hermetic, adaptive, immutable OSes like this "particles".

How can you help?

  1. Help making Distributions Hermetic in /usr/!

    One of the core ideas of the approach described above is to make the OS hermetic in /usr/, i.e. to make it carry a comprehensive description of what needs to be set up outside of it when instantiated. Specifically, this means that needed system users are declared in systemd-sysusers snippets, and that skeleton files and directories are created via systemd-tmpfiles. Moreover, additional partitions should be declared via systemd-repart drop-ins.

    At this point some distributions (such as Fedora) are (probably more by accident than on purpose) already mostly hermetic in /usr/, at least for the most basic parts of the OS. However, this is not complete: many daemons require specific resources to be set up in /var/ or /etc/ before they can work, and the relevant packages do not carry systemd-tmpfiles descriptions that add them if missing. So there are two ways you could help here: politically, it would be highly relevant to convince distributions that an OS that is hermetic in /usr/ is highly desirable, and that it's a worthy goal for packagers to get there. More specifically, it would be desirable if RPM/dpkg packages shipped with enough systemd-tmpfiles information so that configuration files the packages strictly need for operation are symlinked (or copied) from /usr/share/factory/ if they are missing (a sketch of such a drop-in follows at the end of this item). Even better, of course, would be if packages upstream would just work with an empty /etc/ and /var/, creating what they need themselves and defaulting to good defaults in the absence of configuration files.

    Note that distributions that adopted systemd-sysusers, systemd-tmpfiles and the /usr/ merge are already quite close to providing an OS that is hermetic in /usr/. Those were the big, major advancements; making the OS fully hermetic should be less controversial – at least that's my guess.

    Also note that making the OS hermetic in /usr/ is not just useful in scenarios like the above. It also means that stuff like this and like this can work well.
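    To make the packaging side concrete, here is a minimal sketch of what such a tmpfiles.d drop-in could look like (the package name and path are hypothetical):

    # /usr/lib/tmpfiles.d/foobard.conf (hypothetical package "foobard")
    # 'C' copies /usr/share/factory/etc/foobard.conf to /etc/foobard.conf
    # if the latter does not exist yet (the source defaults to the
    # /usr/share/factory/ tree when the argument field is omitted)
    C /etc/foobard.conf - - - -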

  2. Fill in the gaps!

    I already mentioned a couple of missing bits and pieces in the implementation of the overall vision. In the systemd project we'd be delighted to review/merge any PRs that fill in the voids.

  3. Build your own OS like this!

    Of course, while we built all these building blocks, and while they have been adopted to varying degrees and for various purposes in the various distributions, no one has so far built an OS that puts things together just like that. It would be excellent if we had communities that work on building images like what I propose above, i.e. if you want to work on making a secure GnomeBook, as I suggest above, a reality, that would be more than welcome.

    What could this look like, specifically? Pick an existing distribution, write a set of mkosi descriptions plus some additional drop-in files, and then build this on some build infrastructure. While doing so, report the gaps, and help us address them.
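    As a rough, hedged illustration only (the exact section and option names depend on the mkosi version, and the distribution and package choices are merely examples, not a complete recipe), such a description could start as small as:

    [Distribution]
    Distribution=fedora
    Release=36

    [Output]
    Format=gpt_squashfs
    Bootable=yes

    [Packages]
    Packages=systemd kernel-core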

Further Documentation of Used Components and Concepts

  1. systemd-tmpfiles
  2. systemd-sysusers
  3. systemd-boot
  4. systemd-stub
  5. systemd-sysext
  6. systemd-portabled, Portable Services Introduction
  7. systemd-repart
  8. systemd-nspawn
  9. systemd-sysupdate
  10. systemd-creds, System and Service Credentials
  11. systemd-homed
  12. Automatic Boot Assessment
  13. Boot Loader Specification
  14. Discoverable Partitions Specification
  15. Safely Building Images

Earlier Blog Stories Related to this Topic

  1. The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions
  2. The Wondrous World of Discoverable GPT Disk Images
  3. Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248
  4. Portable Services with systemd v239
  5. mkosi — A Tool for Generating OS Images

And that's all for now.

threads and libxcb, part 2

Posted by Adam Jackson on April 29, 2022 04:35 PM

I've been working on kopper recently, which is a complementary project to zink. Just as zink implements OpenGL in terms of Vulkan, kopper seeks to implement the GL window system bindings - like EGL and GLX - in terms of the Vulkan WSI extensions. There are several benefits to doing this, which I'll get into in a future post, but today's story is really about libX11 and libxcb.

Yes, again.

One important GLX feature is the ability to set the swap interval, which is how you get tear-free rendering by syncing buffer swaps to the vertical retrace. A swap interval of 1 is the typical case, where an image update happens once per frame. The Vulkan way to do this is to set the swapchain present mode to FIFO, since FIFO updates are implicitly synced to vblank. Mesa's WSI code for X11 uses a swapchain management thread for FIFO present modes. This thread is started from inside the vulkan driver, and it only uses libxcb to talk to the X server. But libGL is a libX11 client library, so in this scenario there is always an "xlib thread" as well.

libX11 uses libxcb internally these days, because otherwise there would be no way to intermix xlib and xcb calls in the same process. But it does not use libxcb's reflection of the protocol; XGetGeometry does not call xcb_get_geometry, for example. Instead, libxcb has an API to allow other code to take over the write side of the display socket, with a callback mechanism to get it back when another xcb client issues a request. The callback function libX11 uses here is straightforward: lock the Display, flush out any internally buffered requests, and return the sequence number of the last request written. Both libraries need this sequence number for various reasons internally; xcb, for example, uses it to make sure replies go back to the thread that issued the request.

But "lock the Display" here really means call into a vtable in the Display struct. That vtable is filled in during XOpenDisplay, but the individual function pointers are only non-NULL if you called XInitThreads beforehand. And if you're libGL, you have no way to enforce that, your public-facing API operates on a Display that was already created.

So now we see the race. The queue management thread calls into libxcb while the main thread is somewhere inside libX11. Since libX11 has taken the socket, the xcb thread runs the release callback. Since the Display was not made thread-safe at XOpenDisplay time, the release callback does not block, so the xlib thread's work won't be correctly accounted for. If you're lucky the two sides will at least write to the socket atomically with respect to each other, but at this point they have diverging opinions about the request sequence numbering, and it's a matter of time until you crash.

It turns out kopper makes this really easy to hit. Like "resize a glxgears window" easy. However, this isn't just a kopper issue, this race exists for every program that uses xcb on a not-necessarily-thread-safe Display. The only reasonable fix is for libX11 to just always be thread-safe.

So now, it is.


fwupd 1.8.0 and 50 million updates

Posted by Richard Hughes on April 28, 2022 03:05 PM

I’ve just tagged the 1.8.0 release of fwupd, with these release notes — there’s lots of good stuff there as always. More remarkable is that LVFS has now supplied over 50 million updates to Linux machines all around the globe. The true number is going to be unknown, as we allow vendors to distribute updates without any kind of logging, and also allow companies or agencies to mirror the entire LVFS so the archive can be used offline. The true number of updates deployed will be a lot higher than 50 million, which honestly blows my tiny mind. Just 7 years ago Christian asked me to “make firmware updates work on Linux” and now we have a thriving client project that respects both your freedom and your privacy, and a thriving ecosystem of hardware vendors who consider Linux users first class citizens. Of course, there are vendors who are not shipping updates for popular hardware, but they’re now in the minority — and every month we have two or three new vendor account requests. The logistical, security and most importantly commercial implications of not being “on the LVFS” are now too critical even for tier-1 IHVs, ODMs and OEMs to ignore.

I’m still amazed to see Reddit posts, YouTube videos and random people on Twitter talk about the thing that’s been my baby for the last few years. It’s both frightening as hell (because of the responsibility) and incredibly humbling at the same time. Red Hat can certainly take a lot of credit for the undeniable success of LVFS and fwupd, as they have been the people paying my salary and pushing me forward over the last decade and more. Obviously I’m glad everything is being used by the distros like Ubuntu and Arch, although for me it’s Fedora that’s at least technically the one pushing Linux forward these days. I’ve seen Fedora grow in market share year on year, and I’m proud to be one of the people pushing the exciting Future Features into Fedora.

So what happens next? I guess we have the next 50 million updates to look forward to. The LVFS has been growing ever so slightly exponentially since it was first conceived, so that won't take very long now. We've blasted through 1MM updates a month, and now regularly ship more than 2MM updates a month, and with the number of devices supported growing like it has (4004 different streams, with 2232 more planned), it does seem an exciting place to be. I'm glad that the number of committers for fwupd is growing at the same pace as the popularity, and I'm not planning to burn out any time soon. Google has also been an amazing partner in encouraging vendors to ship updates on the LVFS and shipping fwupd in ChromeOS — and their trust and support has been invaluable. I'm also glad the "side-projects" like "GNOME Firmware", "Host Security ID", "fwupd friendly firmware" and "uSWID as an SBoM" also seem to be flourishing into independent projects in their own right. It does seem now is the right time to push the ecosystem towards transparency, open source, and respecting the users' privacy. Redistributing closed source firmware may be an unusual route to get there, but it's certainly working. There are a few super-sekret things I'm just not allowed to share yet, but it's fair to say that I'm incredibly excited about the long term future.

From the bottom of my heart, thank you all for your encouragement and support.

Testing my System Code in /usr/ Without Modifying /usr/

Posted by Lennart Poettering on April 26, 2022 10:00 PM

I recently blogged about how to run a volatile systemd-nspawn container from your host's /usr/ tree, for quickly testing stuff in your host environment, sharing your home directory, but all that without making a single modification to your host, and on an isolated node.

The one-liner discussed in that blog story is great for testing during system software development. Let's have a look at another systemd tool that I regularly use to test things during systemd development, in a relatively safe environment, while still taking full advantage of my host's setup.

For a while now, systemd has been shipping with a simple component called systemd-sysext. Its primary use case goes something like this: on the one hand, OSes with immutable /usr/ hierarchies are fantastic for security, robustness, updating and simplicity, but on the other hand, not being able to quickly add stuff to /usr/ is just annoying.

systemd-sysext is supposed to bridge this contradiction: when invoked, it will merge a bunch of "system extension" images into /usr/ (and /opt/, as a matter of fact) through the use of read-only overlayfs, making all files shipped in the image instantly and atomically appear in /usr/ during runtime — as if they had always been there. Now, let's say you are building your locked-down OS with an immutable /usr/ tree, and it comes without the ability to log in, without debugging tools, without anything you want and need when trying to debug and fix something in the system. With systemd-sysext you could use a system extension image that contains all this, drop it into the system, and activate it with systemd-sysext so that it genuinely extends the host system.
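As a hedged sketch of that scenario (the image name is made up):

cp debug-tools.raw /var/lib/extensions/   # drop the extension image in place
systemd-sysext merge                      # overlay all discovered extensions onto /usr/
systemd-sysext status                     # show what is currently merged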

(There are many other use cases for this tool. For example, you could build systems that at their base use a generic image, but that get extended with additional, more specific functionality, or drivers, or similar, by installing one or more system extensions. The tool is generic; use it for whatever you want. But for now let's not get lost in listing all the possibilities.)

What's particularly nice about the tool is that it supports automatically discovered dm-verity images, with signatures and everything. So you can even do this in a fully authenticated, measured, safe way. But I am digressing…

Now that we (hopefully) have a rough understanding of what systemd-sysext is and does, let's discuss how specifically we can use it in the context of system software development, to safely use and test bleeding-edge development code – built freshly from your project's build tree – in your host OS, without risking that the host OS gets corrupted or becomes unbootable because of stuff that didn't quite work the way it was envisioned yet:

The images systemd-sysext merges into /usr/ can be of two kinds: disk images with a file system/verity/signature, or simple, plain directory trees. To make these images available to the tool, they can be placed or symlinked into /usr/lib/extensions/, /var/lib/extensions/, /run/extensions/ (and a bunch of others). So if we now install our freshly built development software into a subdirectory of those paths, that's entirely sufficient to make it a valid system extension image in the sense of systemd-sysext, which can thus be merged into /usr/ to try it out.

To be more specific: when I develop systemd itself, here's what I do regularly, to see how my new development version would behave on my host system. As preparation I checked out the systemd development git tree first of course, hacked around in it a bit, then built it with meson/ninja. And now I want to test what I just built:

sudo DESTDIR=/run/extensions/systemd-test meson install -C build --quiet --no-rebuild &&
        sudo systemd-sysext refresh --force

Explanation: first, we'll install my current build tree as a system extension into /run/extensions/systemd-test/. And then we apply it to the host via the systemd-sysext refresh command. This command will search for all installed system extension images in the aforementioned directories, then unmount (i.e. "unmerge") any previously merged dirs from /usr/ and then freshly mount (i.e. "merge") the new set of system extensions on top of /usr/. And just like that, I have installed my development tree of systemd into the host OS, and all that without actually modifying/replacing even a single file on the host at all. Nothing here actually hit the disk!

Note that all this works on any system, really; it is not necessary that the underlying OS is even designed with immutability in mind. Just because the tool was developed with immutable systems in mind doesn't mean you couldn't use it on traditional systems where /usr/ is mutable as well. In fact, my development box actually runs regular Fedora, i.e. it is RPM-based and thus has a mutable /usr/ tree. As long as system extensions are applied, the whole of /usr/ becomes read-only though.

Once I am done testing, when I want to revert to how things were without the image installed, it is sufficient to call:

sudo systemd-sysext unmerge

And there you go, all files my development tree generated are gone again, and the host system is as it was before (and /usr/ mutable again, in case one is on a traditional Linux distribution).

Also note that a reboot (regardless if a clean one or an abnormal shutdown) will undo the whole thing automatically, since we installed our build tree into /run/ after all, i.e. a tmpfs instance that is flushed on boot. And given that the overlayfs merge is a runtime thing, too, the whole operation was executed without any persistence. Isn't that great?

(You might wonder why I specified --force on the systemd-sysext refresh line earlier. That's because systemd-sysext actually does some minimal version compatibility checks when applying system extension images. For that it will compare the host's /etc/os-release file with the image's /usr/lib/extension-release.d/extension-release.<name>, and refuse operation if the image is not actually built for the host OS version. Here we don't want to bother with dropping that file in there; we know already that the extension image is compatible with the host, as we just built it on it. --force allows us to skip the version check.)
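(Continuing that thought: instead of using --force, one could also just give the extension a matching extension-release file, along these lines. This sketch reuses the systemd-test name from above and assumes the ID and VERSION_ID fields are what's being compared:)

mkdir -p /run/extensions/systemd-test/usr/lib/extension-release.d
grep -E '^(ID|VERSION_ID)=' /etc/os-release \
        > /run/extensions/systemd-test/usr/lib/extension-release.d/extension-release.systemd-test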

You might wonder: what about the combination of the idea from the previous blog story (regarding running containers off the host's /usr/ tree) with system extensions? Glad you asked. Right now we have no support for this, but it's high on our TODO list (patches welcome, of course!), i.e. a new switch for systemd-nspawn called --system-extension= that would allow merging one or more such extensions into the container tree being booted would be stellar. With that, with a single command I could run a container off my host OS but with a development version of systemd dropped in, all without any persistence. How awesome would that be?

(Oh, and in case you wonder, all of this only works with distributions that have completed the /usr/ merge. On legacy distributions that didn't do that, and still place parts of the OS all over the hierarchy instead of inside /usr/, the above won't work, since merging /usr/ trees via overlayfs is pretty pointless if the OS is not hermetic in /usr/.)

And that's all for now. Happy hacking!

Fedora MediaWriter (nextgen) on MacOS

Posted by Evzen Gasta on April 19, 2022 06:34 AM

Hi, for the past few months I have been working on the MacOS build, fixing USB drive restoration on Windows, and fixing a few design bugs. I fixed the overflowing background of the highlighted option in the combobox used by the Adwaita style on Linux. The next design fix was updating the animations when changing pages. Now that we can build the new FMW on all platforms, I can say I like the MacOS and Linux versions a lot more than the Windows version – I don't like the design of the buttons. What's your opinion?

[Screenshots: Linux, Windows, MacOS]

Windows

On Windows I have fixed the problem with USB drive restoration, which had been present for a long time. The problem was caused by corruption of the partition table after an ISO file was written onto a USB drive, after which the diskpart utility was not able to work with the drive anymore. This was quite tricky to figure out, but in the end the fix was quite easy. I had to recreate the partition table using the command "convert gpt", which creates a new partition table on the USB drive. For backwards compatibility with older systems and USB drives I then had to run the command "convert mbr". After that, diskpart can restore the USB drive to its original state.
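For illustration, the full sequence in a diskpart session might look roughly like this (the disk number is hypothetical; always double-check it, since diskpart will happily wipe whichever disk is selected):

select disk 2
clean
convert gpt
convert mbr
create partition primary
format fs=fat32 quick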

MacOS

Making the build for MacOS was quite tricky, because I don't have any Apple device at home, so I had to use a virtual machine (the sosumi app, available as a snap) to try the changes I made to FMW. Even with this app I could not connect a USB drive to properly try out the full functionality of this new version. Thanks to my mentor and bachelor's thesis supervisor Jan Grulich (his blog and twitter) for all the testing.

While updating the CI manifest I had to make some tweaks to the build.sh file, such as adjusting the paths of a few modules and libraries missed by macdeployqt, or libraries that have to be copied manually.

For those of you who don’t know what is macdeployqt… Its “The Mac deployment tool can be found in QTDIR/bin/macdeployqt. It is designed to automate the process of creating a deployable application bundle that contains the Qt libraries as private frameworks” definition of macdeployqt from Qt docs.

You can download all available builds from the releases on GitHub. If there are any problems with the application, let me know either here or by creating a bug in the issues on GitHub.

GTK: Calling attention to a widget in LibreOffice with CSS animation

Posted by Caolán McNamara on April 17, 2022 02:01 PM


The motivation here is the request "Use attention-attracting cue when pressing Ctrl+F while in the find bar": do "something" to call attention to the widget. I thought I'd try some of the built-in GTK CSS support. So here we animate the widget to reduce its left and right margins inwards a little and its opacity to 50% before returning to its original properties.

Nothing is ever simple, so in order to repeat the animation (the second time Ctrl+F is pressed while the widget has focus) we have to duplicate the animation and switch to whichever copy wasn't used the previous time, as sketched below.
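A minimal sketch of the technique in GTK CSS (the selector, names and values here are illustrative, not the actual LibreOffice ones):

@keyframes grab-attention {
    50% { margin-left: 6px; margin-right: 6px; opacity: 0.5; }
}
entry.attention {
    animation: grab-attention 0.3s ease;
}

Re-assigning the same animation to a widget doesn't restart it, hence the two-copy trick described above.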

The Freedom Phone is not great at privacy

Posted by Matthew Garrett on April 17, 2022 12:23 AM
The Freedom Phone advertises itself as a "Free speech and privacy first focused phone". As documented on the features page, it runs ClearOS, an Android-based OS produced by Clear United (or maybe one of the bewildering array of associated companies, we'll come back to that later). It's advertised as including Signal, but what's shipped is not the version available from the Signal website or any official app store - instead it's this fork called "ClearSignal".

The first thing to note about ClearSignal is that the privacy policy link from that page 404s, which is not a great start. The second thing is that it has a version number of 5.8.14, which is strange because upstream went from 5.8.10 to 5.9.0. The third is that, despite Signal being GPL 3, there's no source code available. So, I grabbed jadx and started looking for differences between ClearSignal and the upstream 5.8.10 release. The results were, uh, surprising.

First up is that they seem to have integrated ACRA, a crash reporting framework. This feels a little odd - in the absence of a privacy policy, it's unclear what information this gathers or how it'll be stored. Having a piece of privacy software automatically uploading information about what you were doing in the event of a crash with no notification other than a toast that appears saying "Crash Report" feels a little dubious.

Next is that Signal (for fairly obvious reasons) warns you if your version is out of date and eventually refuses to work unless you upgrade. ClearSignal has dealt with this problem by, uh, simply removing that code. The MacOS version of the desktop app they provide for download seems to be derived from a release from last September, which for an Electron-based app feels like a pretty terrible idea. Weirdly, for Windows they link to an official binary release from February 2021, and for Linux they tell you how to use the upstream repo properly. I have no idea what's going on here.

They've also added support for network backups of your Signal data. This involves the backups being pushed to an S3 bucket using credentials that are statically available in the app. It's ok, though, each upload has some sort of nominally unique identifier associated with it, so it's not trivial to just download other people's backups. But, uh, where does this identifier come from? It turns out that Clear Center, another of the Clear family of companies, employs a bunch of people to work on a ClearID[1], some sort of decentralised something or other that seems to be based on KERI. There's an overview slide deck here which didn't really answer any of my questions and as far as I can tell this is entirely lacking any sort of peer review, but hey it's only the one thing that stops anyone on the internet being able to grab your Signal backups so how important can it be.

The final thing, though? They've extended Signal's invitation support to encourage users to get others to sign up for Clear United. There's an exposed API endpoint called "get_user_email_by_mobile_number" which does exactly what you'd expect - if you give it a registered phone number, it gives you back the associated email address. This requires no authentication. But it gets better! The API to generate a referral link to send to others sends the name and phone number of everyone in your phone's contact list. There does not appear to be any indication that this is going to happen.

So, from a privacy perspective, going to go with things being some distance from ideal. But what's going on with all these Clear companies anyway? They all seem to be related to Michael Proper, who founded the Clear Foundation in 2009. They are, perhaps unsurprisingly, heavily invested in blockchain stuff, while Clear United also appears to be some sort of multi-level marketing scheme which has a membership agreement that includes the somewhat astonishing claim that:

Specifically, the initial focus of the Association will provide members with supplements and technologies for:

9a. Frequency Evaluation, Scans, Reports;

9b. Remote Frequency Health Tuning through Quantum Entanglement;

9c. General and Customized Frequency Optimizations;


- there's more discussion of this and other weirdness here. Clear Center, meanwhile, has a Chief Physics Officer? I have a lot of questions.

Anyway. We have a company that seems to be combining blockchain and MLM, has some opinions about Quantum Entanglement, bases the security of its platform on a set of novel cryptographic primitives that seem to have had no external review, has implemented an API that just hands out personal information without any authentication and an app that appears more than happy to upload all your contact details without telling you first, has failed to update this app to keep up with upstream security updates, and is violating the upstream license. If this is their idea of "privacy first", I really hate to think what their code looks like when privacy comes further down the list.

[1] Pointed out to me here

GTK4: ComboBox Cell Renderers

Posted by Caolán McNamara on April 12, 2022 07:52 PM

A custom GTK cell renderer in a GTK4 GtkComboBox in LibreOffice, now working as hoped for.

Running a Container off the Host /usr/

Posted by Lennart Poettering on April 05, 2022 10:00 PM

Apparently, in some parts of this world, the /usr/-merge transition is still ongoing. Let's take the opportunity to have a look at one specific way to benefit from the /usr/-merge (and associated work) IRL.

I develop system-level software as you might know. Oftentimes I want to run my development code on my PC but be reasonably sure it cannot destroy or otherwise negatively affect my host system. Now, I could set up a container tree for that, and boot into that. But often I am too lazy for that: I don't want to bother with a slow package manager setting up a new OS tree for me. So here's what I often do instead — and this only works because of the /usr/-merge.

I run a command like the following (without any preparatory work):

systemd-nspawn \
        --directory=/ \
        --volatile=yes \
        -U \
        --set-credential=passwd.hashed-password.root:$(mkpasswd mysecret) \
        --set-credential=firstboot.locale:C.UTF-8 \
        --bind-user=lennart \
        -b

And then I very quickly get a login prompt on a container that runs the exact same software as my host — but is also isolated from the host. I do not need to prepare any separate OS tree or anything else. It just works. And my host user lennart is just there, ready for me to log into.

So here's what these systemd-nspawn options specifically do:

  • --directory=/ tells systemd-nspawn to run off the host OS' file hierarchy. That smells like danger of course, running two OS instances off the same directory hierarchy. But don't be scared, because:

  • --volatile=yes enables volatile mode. Specifically this means what we configured with --directory=/ as root file system is slightly rearranged. Instead of mounting that tree as it is, we'll mount a tmpfs instance as actual root file system, and then mount the /usr/ subdirectory of the specified hierarchy into the /usr/ subdirectory of the container file hierarchy in read-only fashion – and only that directory. So now we have a container directory tree that is basically empty, but imports all host OS binaries and libraries into its /usr/ tree. All software installed on the host is also available in the container with no manual work. This mechanism only works because on /usr/-merged OSes vendor resources are monopolized at a single place: /usr/. It's sufficient to share that one directory with the container to get a second instance of the host OS running. Note that this means /etc/ and /var/ will be entirely empty initially when this second system boots up. Thankfully, forward looking distributions (such as Fedora) have adopted systemd-tmpfiles and systemd-sysusers quite pervasively, so that system users and files/directories required for operation are created automatically should they be missing. Thus, even though at boot the mentioned directories are initially empty, once the system is booted up they are sufficiently populated for things to just work.

  • -U means we'll enable user namespacing, in fully automatic mode. This does three things: it picks a free host UID range dynamically for the container, then sets up user namespacing for the container processes, mapping that host UID range to UIDs 0…65534 in the container. It then sets up a similar UID-mapped mount on the /usr/ tree of the container. Net effect: file ownerships as set on the host OS tree appear as if they belong to the very same users inside the container environment, except that we use user namespacing for everything, and thus the users are actually neatly isolated from the host.

  • --set-credential=passwd.hashed-password.root:$(mkpasswd mysecret) passes a credential to the container. Credentials are bits of data that you can pass to systemd services and whole systems. They are actually an awesome concept (e.g. they support TPM2 authentication/encryption that just works!) but I am not going to go into details around that, given it's off-topic in this specific scenario. Here we just take advantage of the fact that systemd-sysusers looks for a credential called passwd.hashed-password.root to initialize the root password of the system from. We set it to mysecret. This means once the system is booted up we can log in as root with the supplied password. Yay. (Remember, /etc/ is initially empty on this container, and thus also carries no /etc/passwd or /etc/shadow, and thus has no root user record, and thus no root password.)

    mkpasswd is a tool that converts a plain-text password into a UNIX hashed password, which is what this specific credential expects.

  • Similarly, --set-credential=firstboot.locale:C.UTF-8 tells the systemd-firstboot service in the container to initialize /etc/locale.conf with this locale.

  • --bind-user=lennart binds the host user lennart into the container, also as user lennart. This does two things: it mounts the host user's home directory into the container, and it copies a minimal user record of the specified user into the container, which nss-systemd then picks up and includes in the regular user database. This means that once the container is booted up, I can log in as lennart with my regular password, and once logged in I will see my regular host home directory and can make changes to it. Yippieh! (This does a couple more things, such as UID mapping, but let's not get lost in too many details.)

So, if I run this, I will very quickly get a login prompt where I can log in as my regular user. I have full access to my host home directory, but otherwise everything is nicely isolated from the host: changes outside of the home directory are either prohibited or volatile, i.e. they go to a tmpfs instance whose lifetime is bound to the container's lifetime. When I shut down the container I just started, any changes outside of my user's home directory are lost.
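If you want to convince yourself of this arrangement from inside such a container, a quick check might look like this (a sketch; the exact output will vary):

findmnt /       # the container's root: a tmpfs instance
findmnt /usr    # the host's /usr/, mounted read-only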

Note that while here I use --volatile=yes in combination with --directory=/, you can actually use it on any OS hierarchy, i.e. just about any directory that contains OS binaries.

Similarly, the --bind-user= stuff works with any OS hierarchy too (but do note that only systemd 249 and newer will pick up the user records passed to the container that way, i.e. this requires at least v249 both on the host and in the container to work).

Or in short: the possibilities are endless!

Requirements

For this all to work, you need:

  1. A recent kernel (5.15 should suffice, as it brings UID mapped mounts for the most common file systems, so that -U and --bind-user= can work well.)

  2. A recent systemd (249 should suffice, which brings --bind-user=, and a -U switch backed by UID mapped mounts).

  3. A distribution that adopted the /usr/-merge, systemd-tmpfiles and systemd-sysusers so that the directory hierarchy and user databases are automatically populated when empty at boot. (Fedora 35 should suffice.)

Limitations

While a lot of today's software works well out of the box on systems that come up with an unpopulated /etc/ and /var/ (either falling back to reasonable built-in defaults, or deploying systemd-tmpfiles to create what is missing), things aren't perfect: some software typically installed on desktop OSes will fail to start when invoked in such a container and show up as ugly failed services, but that won't stop me from logging in and using the system for what I want to use it for. It would be excellent to get that fixed, though. This can either be fixed in the relevant software upstream (i.e. if opening your configuration file fails with ENOENT, then just fall back to reasonable defaults), or in the distribution packaging (i.e. add a tmpfiles.d/ file that copies or symlinks in skeleton configuration from /usr/share/factory/etc/ via the C or L line types).

And then there's certain software dealing with hardware management and similar that simply cannot work reasonably in a container (as device APIs on Linux are generally not virtualized for containers). It would be excellent if software like that were updated to carry ConditionVirtualization=!container or ConditionPathIsReadWrite=/sys conditionalization in its unit files, so that it is automatically – cleanly – skipped when executed in such a container environment.

And that's all for now.

Bearer tokens are just awful

Posted by Matthew Garrett on April 05, 2022 06:54 AM
As I mentioned last time, bearer tokens are not super compatible with a model in which every access is verified to ensure it's coming from a trusted device. Let's talk about that in a bit more detail.

First off, what is a bearer token? In its simplest form, it's simply an opaque blob that you give to a user after an authentication or authorisation challenge, and that they then show to you to prove that they should be allowed access to a resource. In theory you could just hand someone a randomly generated blob, but then you'd need to keep track of which blobs you've issued, when they should expire, and who they correspond to. So frequently this is actually done using JWTs, which contain some base64-encoded JSON describing the user, group membership and so on, and which carry a signature, so that whenever the user presents one you can just validate the signature and then assume that the contents of the JSON are trustworthy.
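To make the shape of that concrete, here's a hedged sketch of dumping a JWT's claims from a shell ($TOKEN stands for whatever token you have at hand; assumes jq is installed):

# a JWT is three dot-separated base64url fields: header.payload.signature
payload=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
# restore padding to a multiple of four characters before decoding
while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
printf '%s' "$payload" | base64 -d | jq .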

One thing to note here is that the crypto is purely between whoever issued the token and whoever validates the token - as far as the server is concerned, any client that can show it the token is fine as long as the signature verifies. There's no way to verify the client's state, so one of the core ideas of Zero Trust (that we verify that the client is in a trustworthy state on every access) is already violated.

Can we make things not terrible? Sure! We may not be able to validate the client state on every access, but we can validate the client state when we issue the token in the first place. When the user hits a login page, we do state validation according to whatever policy we want to enforce, and if the client violates that policy we refuse to issue a token to it. If the token has a sufficiently short lifetime then an attacker is only going to have a short period of time to use that token before it expires and then (with luck) they won't be able to get a new one because the state validation will fail.

Except! This is fine for cases where we control the issuance flow. What if we have a scenario where a third party authenticates the client (by verifying that they have a valid token issued by their ID provider) and then uses that to issue their own token that's much longer lived? Well, now the client has a long-lived token sitting on it. And if anyone copies that token to another device, they can now pretend to be that client.

This is, sadly, depressingly common. A lot of services will verify the user, and then issue an oauth token that'll expire some time around the heat death of the universe. If a client system is compromised and an attacker just copies that token to another system, they can continue to pretend to be the legitimate user until someone notices (which, depending on whether or not the service in question has any sort of audit logs, and whether you're paying any attention to them, may be once screenshots of your data show up on Twitter).

This is a problem! There's no way to fit a hosted service that behaves this way into a Zero Trust model - the best you can say is that a token was issued to a device that was, around that time, apparently trustworthy, and now it's some time later and you have literally no idea whether the device is still trustworthy or if the token is still even on that device.

But wait, there's more! Even if you're nowhere near doing any sort of Zero Trust stuff, imagine the case of a user having a bunch of tokens from multiple services on their laptop, and then they leave their laptop unlocked in a cafe while they head to the toilet and whoops it's not there any more, better assume that someone has access to all the data on there. How many services has our opportunistic new laptop owner gained access to as a result? How do we revoke all of the tokens that are sitting there on the local disk? Do you even have a policy for dealing with that?

There isn't a simple answer to all of these problems. Replacing bearer tokens with some sort of asymmetric cryptographic challenge to the client would at least let us tie the tokens to a TPM or other secure enclave, and then we wouldn't have to worry about them being copied elsewhere. But that wouldn't help us if the client is compromised and the attacker simply keeps using the compromised client. The entire model of simply proving knowledge of a secret being sufficient to gain access to a resource is inherently incompatible with a desire for fine-grained trust verification on every access, but I don't see anything changing until we have a standard for third party services to be able to perform that trust verification against a customer's policy.

Still, at least this means I can just run weird Android IoT apps through mitmproxy, pull the bearer token out of the request headers and then start poking the remote API with curl. It may all be broken, but it's also got me a bunch of bug bounty credit, so it's impossible to say if it's bad or not.
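That last step really is as simple as it sounds; something along these lines, with a made-up endpoint:

# replay a captured token as if we were the app (URL is hypothetical)
curl -H "Authorization: Bearer $TOKEN" https://api.example.com/v1/device/status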

(Addendum: this suggestion that we solve the hardware binding problem by simply passing all the network traffic through some sort of local enclave that could see tokens being set and would then sequester them and reinject them into later requests is OBVIOUSLY HORRIFYING and is also probably going to be at least three startup pitches by the end of next week)

ZTA doesn't solve all problems, but partial implementations solve fewer

Posted by Matthew Garrett on March 31, 2022 11:06 PM
Traditional network access controls work by assuming that something is trustworthy based on some other factor - for example, if a computer is on your office network, it's trustworthy because only trustworthy people should be able to gain physical access to plug something in. If you restrict access to your services to requests coming from trusted networks, you can then assert that they're coming from trusted devices.

Of course, this isn't necessarily true. A machine on your office network may be compromised. An attacker may obtain valid VPN credentials. Someone could leave a hostile device plugged in under a desk in a meeting room. Trust is being placed in devices that may not be trustworthy.

A Zero Trust Architecture (ZTA) is one where a device is granted no inherent trust. Instead, each access to a service is validated against some policy - if the policy is satisfied, the access is permitted. A typical implementation involves granting each device some sort of cryptographic identity (typically a TLS client certificate) and placing the protected services behind a proxy. The proxy verifies the device identity, queries another service to obtain the current device state (we'll come back to that in a moment), compares the state against a policy, and either passes the request through to the service or rejects it. Different services can have different policies (eg, you probably want a lax policy around whatever's hosting the documentation for how to fix your system if it's being refused access to something for being in the wrong state), and if you want you can also tie this to proof of user identity in some way.
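From the device's side, an access in the TLS-client-certificate flavour of this could look something like the following sketch (paths and hostname are hypothetical):

# present the device identity certificate to the ZTA proxy
curl --cert /etc/device-id/cert.pem --key /etc/device-id/key.pem \
        https://internal-app.corp.example/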

From a user perspective, this is entirely transparent. The proxy is made available on the public internet, DNS for the services points to the proxy, and every time your users try to access the service they hit the proxy instead and (if everything's ok) gain access to it no matter which network they're on. There's no need to connect to a VPN first, and there's no worries about accidentally leaking information over the public internet instead of over a secure link.

It's also notable that traditional solutions tend to be all-or-nothing. If I have some services that are more sensitive than others, the only way I can really enforce this is by having multiple different VPNs and only granting access to sensitive services from specific VPNs. This obviously risks combinatorial explosion once I have more than a couple of policies, and it's a terrible user experience.

Overall, ZTA approaches provide more security and an improved user experience. So why are we still using VPNs? Primarily because this is all extremely difficult. Let's take a look at an extremely recent scenario. A device used by customer support technicians was compromised. The vendor in question has a solution that can tie authentication decisions to whether or not a device has a cryptographic identity. If this was in use, and if the cryptographic identity was tied to the device hardware (eg, by being generated in a TPM), the attacker would not simply be able to obtain the user credentials and log in from their own device. This is good - if the attacker wanted to maintain access to the service, they needed to stay on the device in question. This increases the probability of the monitoring tooling on the compromised device noticing them.

Unfortunately, the attacker simply disabled the monitoring tooling on the compromised device. If device state was being verified on each access then this would be noticed before too long - the last data received from the device would be flagged as too old, and the requests would no longer satisfy any reasonable access control policy. Instead, the device was assumed to be trustworthy simply because it could demonstrate its identity. There's an important point here: just because a device belongs to you doesn't mean it's a trustworthy device.

So, if ZTA approaches are so powerful and user-friendly, why aren't we all using one? There's a few problems, but the single biggest is that there's no standardised way to verify device state in any meaningful way. Remote Attestation can both prove device identity and the device boot state, but the only product on the market that does much with this is Microsoft's Device Health Attestation. DHA doesn't solve the broader problem of also reporting runtime state - it may be able to verify that endpoint monitoring was launched, but it doesn't make assertions about whether it's still running. Right now, people are left trying to scrape this information from whatever tooling they're running. The absence of any standardised approach to this problem means anyone who wants to deploy a strong ZTA has to integrate with whatever tooling they're already running, and that then increases the cost of migrating to any other tooling later.

But even device identity is hard! Knowing whether a machine should be given a certificate or not depends on knowing whether or not you own it, and inventory control is a surprisingly difficult problem in a lot of environments. It's not even just a matter of whether a machine should be given a certificate in the first place - if a machine is reported as lost or stolen, its trust should be revoked. Your inventory system needs to tie into your device state store in order to ensure that your proxies drop access.

And, worse, all of this depends on you being able to put stuff behind a proxy in the first place! If you're using third-party hosted services, that's a problem. In the absence of a proxy, trust decisions are probably made at login time. It's possible to tie user auth decisions to device identity and state (eg, a self-hosted SAML endpoint could do that before passing through to the actual ID provider), but that's still going to end up providing a bearer token of some sort that can potentially be exfiltrated, and will continue to be trusted even if the device state becomes invalid.

ZTA doesn't solve all problems, and there isn't a clear path to it doing so without significantly greater industry support. But a complete ZTA solution is significantly more powerful than a partial one. Verifying device identity is a step on the path to ZTA, but in the absence of device state verification it's only a step.

AMD's Pluton implementation seems to be controllable

Posted by Matthew Garrett on March 23, 2022 08:42 AM
I've been digging through the firmware for an AMD laptop with a Ryzen 6000 that incorporates Pluton for the past couple of weeks, and I've got some rough conclusions. Note that these are extremely preliminary and may not be accurate, but I'm going to try to encourage others to look into this in more detail. For those of you at home, I'm using an image from here, specifically version 309. The installer is happy to run under Wine, and if you tell it to "Extract" rather than "Install" it'll leave a file sitting in C:\DRIVERS\ASUS_GA402RK_309_BIOS_Update_20220322235241 which seems to have an additional 2K of header on it. Strip that and you should have something approximating a flash image.
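In concrete terms, that preparation might look like this (the output filename is mine, and the 2K figure is the observation above):

# strip the 2 KiB header to get something approximating a flash image
dd if=ASUS_GA402RK_309_BIOS_Update_20220322235241 of=flash.bin bs=1024 skip=2
# scan the image for UTF-16 little-endian strings
strings -e l flash.bin | grep -i hsp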

Looking for UTF16 strings in this reveals something interesting:

<quote>Pluton (HSP) X86 Firmware Support
Enable/Disable X86 firmware HSP related code path, including AGESA HSP module, SBIOS HSP related drivers.
Auto - Depends on PcdAmdHspCoreEnable build value
NOTE: PSP directory entry 0xB BIT36 have the highest priority.
NOTE: This option will NOT put HSP hardware in disable state, to disable HSP hardware, you need setup PSP directory entry 0xB, BIT36 to 1.
// EntryValue[36] = 0: Enable, HSP core is enabled.
// EntryValue[36] = 1: Disable, HSP core is disabled then PSP will gate the HSP clock, no further PSP to HSP commands. System will boot without HSP.
</quote>
"HSP" here means "Hardware Security Processor" - a generic term that refers to Pluton in this case. This is a configuration setting that determines whether Pluton is "enabled" or not - my interpretation of this is that it doesn't directly influence Pluton, but disables all mechanisms that would allow the OS to communicate with it. In this scenario, Pluton has its firmware loaded and could conceivably be functional if the OS knew how to speak to it directly, but the firmware will never speak to it itself. I took a quick look at the Windows drivers for Pluton and it looks like they won't do anything unless the firmware wants to expose Pluton, so this should mean that Windows will do nothing.

So what about the reference to "PSP directory entry 0xB BIT36 have the highest priority"? The PSP is the AMD Platform Security Processor - it's an ARM core on the CPU package that boots before the x86. The PSP firmware lives in the same flash image as the x86 firmware, so the PSP looks for a header that points it towards the firmware it should execute. This gives a pointer to a "directory" - a list of different object types and where they're located in flash (there's a description of this for slightly older AMDs here). Type 0xb is treated slightly specially. Where most types contain the address of where the actual object is, type 0xb contains a 64-bit value that's interpreted as enabling or disabling various features - something AMD calls "soft fusing" (Intel have something similar that involves setting bits in the Firmware Interface Table). The PSP looks at the bits that are set here and alters its behaviour. If bit 36 is set, the PSP tells Pluton to turn itself off and will no longer send any commands to it.

So, we have two mechanisms to disable Pluton - the PSP can tell it to turn itself off, or the x86 firmware can simply never speak to it or admit that it exists. Both of these imply that Pluton has started executing before it's shut down, so it's reasonable to wonder whether it can still do stuff. In the image I'm looking at, there's a blob starting at 0x0069b610 that appears to be firmware for Pluton - it contains chunks that appear to be the reference TPM2 implementation, and it broadly decompiles as valid ARM code. It should be viable to figure out whether it can do anything in the face of being "disabled" via either of the above mechanisms.

Unfortunately for me, the system I'm looking at does set bit 36 in the 0xb entry - as a result, Pluton is disabled before x86 code starts running and I can't investigate further in any straightforward way. The implication that the user-controllable mechanism for disabling Pluton merely disables x86 communication with it rather than turning it off entirely is a little concerning, although (assuming Pluton is behaving as a TPM rather than having an enhanced set of capabilities) skipping any firmware communication means the OS has no way to know what happened before it started running even if it has a mechanism to communicate with Pluton without firmware assistance. In that scenario it'd be viable to write a bootloader shim that just faked up the firmware measurements before handing control to the OS.

The bit 36 disabling mechanism seems more solid? Again, it should be possible to analyse the Pluton firmware to determine whether it actually pays attention to a disable command being sent. But even if it chooses to ignore that, if the PSP is in a position to just cut the clock to Pluton, it's not going to be able to do a lot. At that point we're trusting AMD rather than trusting Microsoft, but given that you're also trusting AMD to execute the code you're giving them to execute, it's hard to avoid placing trust in them.

Overall: I'm reasonably confident that systems that ship with Pluton disabled via setting bit 36 in the soft fuses are going to disable it sufficiently hard that the OS can't do anything about it. Systems that give the user an option to enable or disable it are a little less clear in that respect, and it's possible (but not yet demonstrated) that an OS could communicate with Pluton anyway. However, if that's true, and if the firmware never communicates with Pluton itself, the user could install a stub loader in UEFI that mimics the firmware behaviour and leaves the OS thinking everything was good when it absolutely is not.

So, assuming that Pluton in its current form on AMD has no capabilities outside those we know about, the disabling mechanisms are probably good enough. It's tough to make a firm statement on this before I have access to a system that doesn't just disable it immediately, so stay tuned for updates.

Stand With Ukraine

Posted by Martin Stransky on March 21, 2022 08:02 PM
[Image]

Updates on the new generation of Fedora MediaWriter

Posted by Evzen Gasta on March 14, 2022 03:36 PM

In the past few months I have been developing a new generation of Fedora Media Writer (FMW) with a new UI written in Qt6, which will use native QtQuick styles on Windows and MacOS. At this point I have a fully functional application with all the features of the current version.

The application can now be built for Windows and Linux. Linux builds are also available as a Flatpak for testing purposes. Bear in mind this is still not the final version and there still might be some issues.

To develop the new generation of FMW I had to learn, rework or update many things. A lot of them I saw for the first time, like working with a complex project, QML, CMake, Qt… First of all I started removing deprecated code that is no longer supported in Qt6 and made sure FMW could be built. After that I could start working on QML: I started making pages and gradually added basic functionality step by step.

Flatpak

For testing I’ve learned what is Flatpak and how Flatpak works. I’ve also learned how to create Flatpak using manifest file, this file was also needed to be updated. Thanks to GitHub CI we have available a test build made after every commit.

[Screenshots: Flatpak]

Windows

To develop the Windows build I had to update the current script to support Qt6. That means migrating from qmake to CMake and removing all deprecated things, as well as removing the Adwaita theme, which is no longer used in the FMW build on Windows. I also needed to copy all dependencies to create an installer for Windows.

Testing was pretty difficult for me, because I make the Windows builds on Linux while debugging was possible only in Qt Creator on Windows, so I had to reboot between Linux and Windows multiple times.

[Screenshots: Windows]

The Windows version still needs some adjustments and fixes, such as restoring the USB drive. For that reason only the Linux version, which has full functionality, is available for testing. I would be happy if you gave me feedback, either here or by creating an issue on GitHub mentioning that you use the nextgen version. You can get the development version here in the releases section.