Fedora desktop Planet

Can we fix bearer tokens?

Posted by Matthew Garrett on May 16, 2022 07:48 AM
Last month I wrote about how bearer tokens are just awful, and a week later Github announced that someone had managed to exfiltrate bearer tokens from Heroku that gave them access to, well, a lot of Github repositories. This has inevitably resulted in a whole bunch of discussion about a number of things, but people seem to be largely ignoring the fundamental issue that maybe we just shouldn't have magical blobs that grant you access to basically everything even if you've copied them from a legitimate holder to Honest John's Totally Legitimate API Consumer.

To make it clearer what the problem is here, let's use an analogy. You have a safety deposit box. To gain access to it, you simply need to be able to open it with a key you were given. Anyone who turns up with the key can open the box and do whatever they want with the contents. Unfortunately, the key is extremely easy to copy - anyone who is able to get hold of your keyring for a moment is in a position to duplicate it, and then they have access to the box. Wouldn't it be better if something could be done to ensure that whoever showed up with a working key was someone who was actually authorised to have that key?

To achieve that we need some way to verify the identity of the person holding the key. In the physical world we have a range of ways to achieve this, from simply checking whether someone has a piece of ID that associates them with the safety deposit box all the way up to invasive biometric measurements that supposedly verify that they're definitely the same person. But computers don't have passports or fingerprints, so we need another way to identify them.

When you open a browser and try to connect to your bank, the bank's website provides a TLS certificate that lets your browser know that you're talking to your bank instead of someone pretending to be your bank. The spec allows this to be a bi-directional transaction - you can also prove your identity to the remote website. This is referred to as "mutual TLS", or mTLS, and a successful mTLS transaction ends up with both ends knowing who they're talking to, as long as they have a reason to trust the certificate they were presented with.

That's actually a pretty big constraint! We have a reasonable model for the server - it's something that's issued by a trusted third party and it's tied to the DNS name for the server in question. Clients don't tend to have stable DNS identity, and that makes the entire thing sort of awkward. But, thankfully, maybe we don't need them to? We don't need the client to be able to prove its identity to arbitrary third party sites here - we just need the client to be able to prove it's a legitimate holder of whichever bearer token it's presenting to that site. And that's a much easier problem.

Here's the simple solution - clients generate a TLS cert. This can be self-signed, because all we want to do here is be able to verify whether the machine talking to us is the same one that had a token issued to it. The client contacts a service that's going to give it a bearer token. The service requests mTLS auth without being picky about the certificate that's presented. The service embeds a hash of that certificate in the token before handing it back to the client. Whenever the client presents that token to any other service, the service ensures that the mTLS cert the client presented matches the hash in the bearer token. Copy the token without copying the mTLS certificate and the token gets rejected. Hurrah hurrah hats for everyone.
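
In Python-flavoured pseudocode the scheme looks something like this. It's a sketch rather than a full implementation: a real token would be a signed JWT rather than a bare dict, and the "cnf"/"x5t#S256" claim naming is borrowed from the certificate-bound token spec discussed below.

    import base64
    import hashlib

    def cert_thumbprint(cert_der: bytes) -> str:
        # SHA-256 over the DER-encoded certificate, base64url without
        # padding (the "x5t#S256" encoding)
        digest = hashlib.sha256(cert_der).digest()
        return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

    def issue_token(claims: dict, client_cert_der: bytes) -> dict:
        # Issuer: embed a hash of whatever cert the client presented
        # during the (deliberately non-picky) mTLS handshake
        return {**claims, "cnf": {"x5t#S256": cert_thumbprint(client_cert_der)}}

    def check_token(token: dict, presented_cert_der: bytes) -> bool:
        # Any service receiving the token: accept it only if the mTLS
        # cert on *this* connection hashes to the value in the token
        expected = token.get("cnf", {}).get("x5t#S256")
        return expected is not None and expected == cert_thumbprint(presented_cert_der)

A stolen token fails check_token() unless the thief can also complete an mTLS handshake with the original certificate, and that requires the private key.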

Well except for the obvious problem that if you're in a position to exfiltrate the bearer tokens you can probably just steal the client certificates and keys as well, and now you can pretend to be the original client and this is not adding much additional security. Fortunately pretty much everything we care about has the ability to store the private half of an asymmetric key in hardware (TPMs on Linux and Windows systems, the Secure Enclave on Macs and iPhones, either a piece of magical hardware or Trustzone on Android) in a way that avoids anyone being able to just steal the key.

How do we know that the key is actually in hardware? Here's the fun bit - it doesn't matter. If you're issuing a bearer token to a system then you're already asserting that the system is trusted. If the system is lying to you about whether or not the key it's presenting is hardware-backed then you've already lost. If it lied and the system is later compromised then sure all your apes get stolen, but maybe don't run systems that lie and avoid that situation as a result?

Anyway. This is covered in RFC 8705 so why aren't we all doing this already? From the client side, the largest generic issue is that TPMs are astonishingly slow in comparison to doing a TLS handshake on the CPU. RSA signing operations on TPMs can take around half a second, which doesn't sound too bad, except your browser is probably establishing multiple TLS connections to subdomains on the site it's connecting to and performance is going to tank. Fixing this involves doing whatever's necessary to convince the browser to pipe everything over a single TLS connection, and that's just not really where the web is right at the moment. Using EC keys instead helps a lot (~0.1 seconds per signature on modern TPMs), but it's still going to be a bottleneck.

The other problem, of course, is that ecosystem support for hardware-backed certificates is just awful. Windows lets you stick them into the standard platform certificate store, but the docs for this are hidden in a random PDF in a Github repo. Macs require you to do some weird bridging between the Secure Enclave API and the keychain API. Linux? Well, the standard answer is to do PKCS#11, and I have literally never met anybody who likes PKCS#11 and I have spent a bunch of time in standards meetings with the sort of people you might expect to like PKCS#11 and even they don't like it. It turns out that loading a bunch of random C bullshit that has strong feelings about function pointers into your security critical process is not necessarily something that is going to improve your quality of life, so instead you should use something like this and just have enough C to bridge to a language that isn't secretly plotting to kill your pets the moment you turn your back.

And, uh, obviously none of this matters at all unless people actually support it. Github has no support at all for validating the identity of whoever holds a bearer token. Most issuers of bearer tokens have no support for embedding holder identity into the token. This is not good! As of last week, all three of the big cloud providers support virtualised TPMs in their VMs - we should be running CI on systems that can do that, and tying any issued tokens to the VMs that are supposed to be making use of them.

So sure, this isn't trivial. But it's also not impossible, and making this stuff work would improve the security of, well, everything. We literally have the technology to prevent attacks like the one Github suffered. What do we have to do to get people to actually start working on implementing that?

Crosswords 0.3

Posted by Jonathan Blandford on May 14, 2022 07:33 PM

I’m pleased to announce Crosswords 0.3. This is the first version that feels ready for public consumption. Unlike the version I announced five months ago, it is much more robust and has some key new features.

New in this version:

  • Available on flathub: After working on it out of the source tree for a long while, I finally got the flatpaks working. Download it and try it out—let me know what you think! I’d really appreciate any sort of feedback, positive or negative.

  • Puzzle Downloaders: This is a big addition. We now support external downloaders and puzzle sets. This means it’s possible to have a tool that downloads puzzles from the internet. It also lets us ship sets of puzzles for the user to solve. With a good set of puzzles, we could even host them on gnome.org. These can be distributed externally, or they can be distributed via flatpak extensions (if not installing locally). I wrapped xword-dl and puzzlepull with a downloader to add some newspaper downloaders, and I’m looking to add more original puzzles shortly.

  • Dark mode: I thought that this was the neatest feature in GNOME 42. We now support both light and dark mode and honor the system setting. CSS is also heavily used to style the app, allowing for future visual modifications and customizations. I’m interested in allowing CSS customization on a per-puzzle-set basis.

  • Hint Button: This is a late addition. It can be used to suggest a random word that fits in the current row. It’s not super convenient, but I also don’t want to make the game too easy! We use Peter Broda’s wordlist as the basis for this (a rough sketch of the matching idea follows this list).

  • .puz support: Internally we use the unencumbered .ipuz format. This is a pretty decent format and supports a wide variety of crossword features. But there are a lot of crosswords out there that use the .puz format, and I know people have large collections of puzzles in that format. I wrote a simple converter to load these.
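
For the curious, the hint matching can be thought of roughly like this. It's not the actual Crosswords code, just a sketch with a three-word stand-in for the word list: treat the partially filled slot as a pattern and filter.

    import random
    import re

    def hint_candidates(pattern: str, wordlist: list[str]) -> list[str]:
        # A partially filled slot: fixed letters stay, '?' marks an
        # empty cell, e.g. "C?T??" for a five-letter slot
        rx = re.compile(pattern.replace("?", "[A-Z]"))
        return [w for w in wordlist if rx.fullmatch(w)]

    words = ["CATER", "CUTUP", "TOTAL"]  # stand-in for the real wordlist
    print(random.choice(hint_candidates("C?T??", words)))  # CATER or CUTUP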

Next steps

I hope to release this a bit more frequently now that we have gotten to this stage. Next on my immediate list:

  • Revisit clue formats; empirically, crossword files in the wild play a little fast and loose with escaping and formatting (e.g. random entities and underline-escaping).
  • Write a Puzzle Set manager that will let you decide which downloaders/puzzles to show, as well as launch gnome-software to search for more.
  • Document the external Puzzle Set format to allow others to package up games.
  • Fix any bugs that are found.

I also plan to work on the Crossword editor and get that ready for a release on flathub. The amazing-looking AdwEntryRow should make parts of the design a lot cleaner.

But, no game is good without being fun! I am looking to expand the list of puzzles available. Let me know if you write crosswords and want to include them in a puzzle set.

Thanks

I couldn’t have gotten this release out without lots of help. In particular:

  • Federico, for helping to refactor the main code and lots of great advice
  • Nick, for explaining how to get apps into flathub
  • Alexander, for his help with getting toasts working with a new behavior
  • Parker, for patiently explaining to me how the world of Crosswords worked
  • The folks on #gtk for answering questions and producing an excellent toolkit
  • And most importantly Rosanna, for testing everything and for her consistent cheering and support

Download on FLATHUB

FMW is finished

Posted by Evzen Gasta on May 04, 2022 10:55 AM

As all good things must come to an end, my bachelor's thesis had to end someday too. But I’m still looking forward to contributing to FMW whenever updates or fixes are needed. The official FMW 5.0.0 will be released soon. This was my first experience with an open source project and I liked it very much. I’m looking forward to working on new open source projects in the future.

This project gave me a lot of experience, such as learning how to deploy Qt applications on various operating systems and how such applications are structured. I learned a new programming language, QML, and how to use CMake in an open source project. Another big gain for me was understanding how GitHub CI works, which was used to automatically create builds for various systems. I also got to practice many of Git's options and advantages (before this I had been using only “git pull” and “git push”).

As a result, I was able to create a new generation of FMW.

[Image gallery: New generation of FMW]

This result couldn’t have been achieved without my technical mentor Jan Grulich (his blog and twitter), and also the supervisor of my bachelor’s thesis, Dominika Regéciová (her blog and twitter). I would like to thank you both for the opportunity, patience, and overall help.

Fitting Everything Together

Posted by Lennart Poettering on May 02, 2022 10:00 PM

TLDR: Hermetic /usr/ is awesome; let's popularize image-based OSes with modernized security properties built around immutability, SecureBoot, TPM2, adaptability, auto-updating, factory reset, uniformity – built from traditional distribution packages, but deployed via images.

Over the past years, systemd gained a number of components for building Linux-based operating systems. While these components individually have been adopted by many distributions and products for specific purposes, we did not publicly communicate a broader vision of how they should all fit together in the long run. In this blog story I hope to provide that from my personal perspective, i.e. explain how I personally would build an OS and where I personally think OS development with Linux should go.

I figure this is going to be a longer blog story, but I hope it will be equally enlightening. Please understand though that everything I write about OS design here is my personal opinion, and not one of my employer.

For the last 12 years or so I have been working on Linux OS development, mostly around systemd. In all those years I had a lot of time to think about the Linux platform, and specifically traditional Linux distributions and their strengths and weaknesses. I have seen many attempts to reinvent Linux distributions in one way or another, with varying success. After all this, most would probably agree that the traditional RPM or dpkg/apt-based distributions still define the Linux platform more than others (for 25+ years now), even though some Linux-based OSes (Android, ChromeOS) probably outnumber the installations overall.

And over all those 12 years I kept wondering, how would I actually build an OS for a system or for an appliance, and what are the components necessary to achieve that. And most importantly, how can we make these components generic enough so that they are useful in generic/traditional distributions too, and in other use cases than my own.

The Project

Before figuring out how I would build an OS it's probably good to figure out what type of OS I actually want to build, what purpose I intend to cover. I think a desktop OS is probably the most interesting. Why is that? Well, first of all, I use one of these for my job every single day, so I care immediately, it's my primary tool of work. But more importantly: I think building a desktop OS is one of the most complex overall OS projects you can work on, simply because desktops are so much more versatile and variable than servers or embedded devices. If one figures out the desktop case, I think there's a lot more to learn from, and reuse in the server or embedded case, than going the other way. After all, there's a reason why so much of the widely accepted Linux userspace stack comes from people with a desktop background (including systemd, BTW).

So, let's see how I would build a desktop OS. If you press me hard, and ask me why I would do that given that ChromeOS already exists and more or less is a Linux desktop OS: there's plenty I am missing in ChromeOS, but most importantly, I am a lot more interested in building something people can easily and naturally rebuild and hack on, i.e. Google-style over-the-wall open source with its skewed power dynamic is not particularly attractive to me. I much prefer building this within the framework of a proper open source community, out in the open, and basing all this strongly on the status quo ante, i.e. the existing distributions. I think it is crucial to provide a clear avenue to build a modern OS based on the existing distribution model, if there shall ever be a chance to make this interesting for a larger audience.

(Let me underline though: even though I am going to focus on a desktop here, most of this is directly relevant for servers as well, in particular container host OSes and suchlike, or embedded devices, e.g. car IVI systems and so on.)

Design Goals

  1. First and foremost, I think the focus must be on an image-based design rather than a package-based one. For robustness and security it is essential to operate with reproducible, immutable images that describe the OS or large parts of it in full, rather than operating always with fine-grained RPM/dpkg style packages. That's not to say that packages are not relevant (I actually think they matter a lot!), but I think they should be less a tool for deploying code and more one for building the objects to deploy. A different way to see this: any OS built like this must be easy to replicate in a large number of instances, with minimal variability. Regardless if we talk about desktops, servers or embedded devices: focus for my OS should be on "cattle", not "pets", i.e. that from the start it's trivial to reuse the well-tested, cryptographically signed combination of software over a large set of devices the same way, with a maximum of bit-exact reuse and a minimum of local variances.

  2. The trust chain matters, from the boot loader all the way to the apps. This means all code that is run must be cryptographically validated before it is run. All storage must be cryptographically protected: public data must be integrity checked; private data must remain confidential.

    This is in fact where big distributions currently fail pretty badly. I would go as far as saying that SecureBoot on Linux distributions is mostly security theater at this point, if you so will. That's because the initrd that unlocks your FDE (i.e. the cryptographic concept that protects the rest of your system) is not signed or protected in any way. It's trivial for an attacker with access to your hard disk to modify it in an undetectable way, and collect your FDE passphrase. The involved bureaucracy around the implementation of UEFI SecureBoot of the big distributions is to a large degree pointless if you ask me, given that once the kernel is assumed to be in a good state, as the next step the system invokes completely unsafe code with full privileges.

    This is a fault of current Linux distributions though, not of SecureBoot in general. Other OSes use this functionality in more useful ways, and we should correct that too.

  3. Pretty much the same thing: offline security matters. I want my data to be reasonably safe at rest, i.e. cryptographically inaccessible even when I leave my laptop in my hotel room, suspended.

  4. Everything should be cryptographically measured, so that remote attestation is supported for as much software shipped on the OS as possible.

  5. Everything should be self-descriptive, with single sources of truth that are closely attached to the object itself, instead of being stored externally.

  6. Everything should be self-updating. Today we know that software is never bug-free, and thus requires a continuous update cycle. Not only the OS itself, but also any extensions, services and apps running on it.

  7. Everything should be robust with respect to aborted OS operations, power loss and so on. It should be robust towards hosed OS updates (regardless if the download process failed, or the image was buggy), and not require user interaction to recover from them.

  8. There must always be a way to put the system back into a well-defined, guaranteed safe state ("factory reset"). This includes that all sensitive data from earlier uses becomes cryptographically inaccessible.

  9. The OS should enforce clear separation between vendor resources, system resources and user resources: conceptually and when it comes to cryptographical protection.

  10. Things should be adaptive: the system should come up and make the best of the system it runs on, adapt to the storage and hardware. Moreover, the system should support execution on bare metal equally well as execution in a VM environment and in a container environment (i.e. systemd-nspawn).

  11. Things should not require explicit installation, i.e. every image should be a live image. For installation it should be sufficient to dd an OS image onto disk. Thus, strong focus on "instantiate on first boot", rather than "instantiate before first boot".

  12. Things should be reasonably minimal. The image the system starts its life with should be quick to download, and not include resources that can as well be created locally later.

  13. System identity, local cryptographic keys and so on should be generated locally, not be pre-provisioned, so that no leak of sensitive data during transport onto the system is possible.

  14. Things should be reasonably democratic and hackable. It should be easy to fork an OS, to modify an OS and still get reasonable cryptographic protection. Modifying your OS should not necessarily imply that your "warranty is voided" and you lose all good properties of the OS, if you so will.

  15. Things should be reasonably modular. The privileged part of the core OS must be extensible, including on the individual system. It's not sufficient to support extensibility just through high-level UI applications.

  16. Things should be reasonably uniform, i.e. ideally the same formats and cryptographic properties are used for all components of the system, regardless if for the host OS itself or the payloads it receives and runs.

  17. Even taking all these goals into consideration, it should still be close to traditional Linux distributions, and take advantage of what they are really good at: integration and security update cycles.

Now that we know our goals and requirements, let's start designing the OS along these lines.

Hermetic /usr/

First of all the OS resources (code, data files, …) should be hermetic in an immutable /usr/. This means that a /usr/ tree should carry everything needed to set up the minimal set of directories and files outside of /usr/ to make the system work. This /usr/ tree can then be mounted read-only into the writable root file system that then will eventually carry the local configuration, state and user data in /etc/, /var/ and /home/ as usual.

Thankfully, modern distributions are surprisingly close to working without issues in such a hermetic context. Specifically, Fedora works mostly just fine: it has adopted the /usr/ merge and the declarative systemd-sysusers and systemd-tmpfiles components quite comprehensively, which means the directory trees outside of /usr/ are automatically generated as needed if missing. In particular /etc/passwd and /etc/group (and related files) are appropriately populated, should they be missing entries.

In my model a hermetic OS is hence comprehensively defined within /usr/: combine the /usr/ tree with an empty, otherwise unpopulated root file system, and it will boot up successfully, automatically adding the files and resources strictly necessary to boot.
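
To make that concrete, here's a toy model of the first-boot population step. It is not the actual mechanism (that's systemd-tmpfiles and systemd-sysusers reading declarative files shipped in /usr/); the skeleton entries below are invented for illustration:

    import os

    # Invented stand-in for the declarative factory data a vendor would
    # ship in /usr/ (the real thing: tmpfiles.d(5) and sysusers.d(5) lines)
    SKELETON_DIRS = ["etc", "var/log", "var/tmp", "home", "root"]
    PASSWD_SEED = "root:x:0:0:root:/root:/bin/sh\n"

    def populate_root(root: str) -> None:
        # First boot on an empty root file system: create whatever is
        # missing outside /usr/, never overwriting local state
        for d in SKELETON_DIRS:
            os.makedirs(os.path.join(root, d), exist_ok=True)
        passwd = os.path.join(root, "etc", "passwd")
        if not os.path.exists(passwd):
            with open(passwd, "w") as f:
                f.write(PASSWD_SEED)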

Monopolizing vendor OS resources and definitions in an immutable /usr/ opens multiple doors to us:

  • We can apply dm-verity to the whole /usr/ tree, i.e. guarantee structural, cryptographic integrity on the whole vendor OS resources at once, with full file system metadata.

  • We can implement updates to the OS easily: by implementing an A/B update scheme on the /usr/ tree we can update the OS resources atomically and robustly, while leaving the rest of the OS environment untouched.

  • We can implement factory reset easily: erase the root file system and reboot. The hermetic OS in /usr/ has all the information it needs to set up the root file system afresh — exactly like in a new installation.

Initial Look at the Partition Table

So let's have a look at a suitable partition table, taking a hermetic /usr/ into account. Let's conceptually start with a table of four entries:

  1. A UEFI System Partition (required by firmware to boot)

  2. Immutable, Verity-protected, signed file system with the /usr/ tree in version A

  3. Immutable, Verity-protected, signed file system with the /usr/ tree in version B

  4. A writable, encrypted root file system

(This is just for initial illustration here, as we'll see later it's going to be a bit more complex in the end.)

The Discoverable Partitions Specification provides suitable partition types UUIDs for all of the above partitions. Which is great, because it makes the image self-descriptive: simply by looking at the image's GPT table we know what to mount where. This means we do not need a manual /etc/fstab, and a multitude of tools such as systemd-nspawn and similar can operate directly on the disk image and boot it up.
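
A sketch of what "self-descriptive" means in practice: mount policy is a static map from GPT partition type UUIDs to mount points, no fstab involved. The UUIDs below are the x86-64 assignments from the specification (other architectures have their own):

    import uuid

    # Partition type UUIDs per the Discoverable Partitions Specification
    TYPE_TO_MOUNT = {
        uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b"): "/efi",  # ESP
        uuid.UUID("8484680c-9521-48c6-9c11-b0720656f69e"): "/usr",  # /usr, x86-64
        uuid.UUID("4f68bce3-e8cd-4db1-96e7-fbcaf984b709"): "/",     # root, x86-64
    }

    def plan_mounts(gpt_entries: list) -> dict:
        # gpt_entries: (partition type UUID, device node) pairs read
        # straight from the image's GPT; the type says what goes where
        return {TYPE_TO_MOUNT[t]: dev
                for t, dev in gpt_entries if t in TYPE_TO_MOUNT}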

Booting

Now that we have a rough idea how to organize the partition table, let's look a bit at how to boot into that. Specifically, in my model "unified kernels" are the way to go, specifically those implementing Boot Loader Specification Type #2. These are basically kernel images that have an initial RAM disk attached to them, as well as a kernel command line, a boot splash image and possibly more, all wrapped into a single UEFI PE binary. By combining these into one we achieve two goals: they become extremely easy to update (i.e. drop in one file, and you update kernel+initrd) and more importantly, you can sign them as one for the purpose of UEFI SecureBoot.

In my model, each version of such a kernel would be associated with exactly one version of the /usr/ tree: both are always updated at the same time. An update then becomes relatively simple: drop in one new /usr/ file system plus one kernel, and the update is complete.

The boot loader used for all this would be systemd-boot, of course. It's a very simple loader, and implements the aforementioned boot loader specification. This means it requires no explicit configuration or anything: it's entirely sufficient to drop in one such unified kernel file, and it will be picked up, and be made a candidate to boot into.

You might wonder how to configure the root file system to boot from with such a unified kernel that contains the kernel command line and is signed as a whole and thus immutable. The idea here is to use the usrhash= kernel command line option implemented by systemd-veritysetup-generator and systemd-fstab-generator. It does two things: it will search and set up a dm-verity volume for the /usr/ file system, and then mount it. It takes the root hash value of the dm-verity Merkle tree as the parameter. This hash is then also used to find the /usr/ partition in the GPT partition table, under the assumption that the partition UUIDs are derived from it, as per the suggestions in the discoverable partitions specification (see above).
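
Here's a sketch of that derivation, following the convention just described (data partition UUID = first 128 bits of the root hash, Verity partition UUID = last 128 bits; the hash in the example is a made-up placeholder):

    import uuid

    def partition_uuids_for_usrhash(usrhash: str):
        # usrhash= carries the hex dm-verity root hash; its two halves
        # name the /usr/ data partition and the Verity partition, so the
        # hash alone is enough to find (and verify) both
        raw = bytes.fromhex(usrhash)
        assert len(raw) == 32, "expecting a SHA-256 root hash"
        return uuid.UUID(bytes=raw[:16]), uuid.UUID(bytes=raw[16:])

    usr_uuid, verity_uuid = partition_uuids_for_usrhash("8b6c" + "00" * 30)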

systemd-boot (if not told otherwise) will do a version sort of the kernel image files it finds, and then automatically boot the newest one. Picking a specific kernel to boot will also fixate which version of the /usr/ tree to boot into, because — as mentioned — the Verity root hash of it is built into the kernel command line the unified kernel image contains.

In my model I'd place the kernels directly into the UEFI System Partition (ESP), in order to simplify things. (systemd-boot also supports reading them from a separate boot partition, but let's not complicate things needlessly, at least for now.)

So, with all this, we now already have a boot chain that goes something like this: once the boot loader is run, it will pick the newest kernel, which includes the initial RAM disk and a secure reference to the /usr/ file system to use. This is already great. But a /usr/ alone won't make us happy, we also need a root file system. In my model, that file system would be writable, and the /etc/ and /var/ hierarchies would be located directly on it. Since these trees potentially contain secrets (SSH keys, …) the root file system needs to be encrypted. We'll use LUKS2 for this, of course. In my model, I'd bind this to the TPM2 chip (for compatibility with systems lacking one, we can find a suitable fallback, which then provides weaker guarantees, see below). A TPM2 is a security chip available in most modern PCs. Among other things it contains a persistent secret key that can be used to encrypt data, in a way that you can only decrypt it again if you have access to the chip and can prove you are running validated software. The cryptographic measuring I mentioned earlier is what allows this to work. But … let's not get lost too much in the details of TPM2 devices, that'd be material for a novel, and this blog story is going to be way too long already.

What does using a TPM2 bound key for unlocking the root file system get us? We can encrypt the root file system with it, and you can only read or make changes to the root file system if you also possess the TPM2 chip and run our validated version of the OS. This protects us against an evil maid scenario to some level: an attacker cannot just copy the hard disk of your laptop while you leave it in your hotel room, because unless the attacker also steals the TPM2 device it cannot be decrypted. The attacker also cannot simply modify the root file system, because such changes would be detected on the next boot, as they weren't made with the right cryptographic key.
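
To build an intuition for why the sealed key is bound to the measured software, here's a toy model. It is emphatically not the TPM2 API: a real chip keeps its sealing key internal and never stores the secret in the clear the way this blob does, but the measure/seal/unseal flow is the same idea:

    import hashlib
    import os

    class ToyTPM:
        def __init__(self) -> None:
            self.pcr = b"\x00" * 32

        def extend(self, event: bytes) -> None:
            # pcr = H(pcr || H(event)): order and content of the whole
            # boot chain end up folded into a single digest
            self.pcr = hashlib.sha256(
                self.pcr + hashlib.sha256(event).digest()).digest()

        def seal(self, secret: bytes) -> bytes:
            # Tag the secret with the current measurements (a real TPM
            # encrypts with a chip-internal key instead of tagging)
            return hashlib.sha256(self.pcr).digest() + secret

        def unseal(self, blob: bytes) -> bytes:
            if blob[:32] != hashlib.sha256(self.pcr).digest():
                raise PermissionError("measurements differ from sealing time")
            return blob[32:]

    tpm = ToyTPM()
    for stage in [b"firmware", b"systemd-boot", b"unified kernel + usrhash"]:
        tpm.extend(stage)                  # every boot stage gets measured
    luks_key = os.urandom(32)
    blob = tpm.seal(luks_key)              # enrolled on first boot
    assert tpm.unseal(blob) == luks_key    # same boot chain: key released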

So, now we have a system that already can boot up somewhat completely, and run userspace services. All code that is run is verified in some way: the /usr/ file system is Verity protected, and the root hash of it is included in the kernel that is signed via UEFI SecureBoot. And the root file system is locked to the TPM2 where the secret key is only accessible if our signed OS + /usr/ tree is used.

(One brief intermission here: so far all the components I am referencing here exist already, and have been shipped in systemd and other projects already, including the TPM2 based disk encryption. There's one thing missing here however at the moment that still needs to be developed (happy to take PRs!): right now TPM2 based LUKS2 unlocking is bound to PCR hash values. This is hard to work with when implementing updates — what we'd need instead is unlocking by signatures of PCR hashes. TPM2 supports this, but we don't support it yet in our systemd-cryptsetup + systemd-cryptenroll stack.)

One of the goals mentioned above is that cryptographic key material should always be generated locally on first boot, rather than pre-provisioned. This of course has implications for the encryption key of the root file system: if we want to boot into this system we need the root file system to exist, and thus a key already generated that it is encrypted with. But where precisely would we generate it if we have no installer that could generate it at installation time (as is done in traditional Linux distribution installers)? My proposed solution here is to use systemd-repart, which is a declarative, purely additive repartitioner. It can run from the initrd to create and format partitions on boot, before transitioning into the root file system. It can also format the partitions it creates and encrypt them, automatically enrolling a TPM2-bound key.

So, let's revisit the partition table we mentioned earlier. Here's what in my model we'd actually ship in the initial image:

  1. A UEFI System Partition (ESP)

  2. An immutable, Verity-protected, signed file system with the /usr/ tree in version A

And that's already it. No root file system, no B /usr/ partition, nothing else. Only two partitions are shipped: the ESP with the systemd-boot loader and one unified kernel image, and the A version of the /usr/ partition. Then, on first boot systemd-repart will notice that the root file system doesn't exist yet, and will create it, encrypt it, and format it, and enroll the key into the TPM2. It will also create the second /usr/ partition (B) that we'll need for later A/B updates (which will be created empty for now, until the first update operation actually takes place, see below). Once done the initrd will combine the fresh root file system with the shipped /usr/ tree, and transition into it. Because the OS is hermetic in /usr/ and contains all the systemd-tmpfiles and systemd-sysusers information it can then set up the root file system properly and create any directories and symlinks (and maybe a few files) necessary to operate.

Besides the fact that the root file system's encryption keys are generated on the system we boot from and never leave it, it is also pretty nice that the root file system will be sized dynamically, taking into account the physical size of the backing storage. This is perfect, because on first boot the image will automatically adapt to what it has been dd'ed onto.

Factory Reset

This is a good point to talk about the factory reset logic, i.e. the mechanism to place the system back into a known good state. This is important for two reasons: in our laptop use case, once you want to pass the laptop to someone else, you want to ensure your data is fully and comprehensively erased. Moreover, if you have reason to believe your device was hacked you want to revert the device to a known good state, i.e. ensure that exploits cannot persist. systemd-repart already has a mechanism for it. In the declarations of the partitions the system should have, entries may be marked to be candidates for erasing on factory reset. The actual factory reset is then requested by one of two means: by specifying a specific kernel command line option (which is not too interesting here, given we lock that down via UEFI SecureBoot; but then again, one could also add a second kernel to the ESP that is identical to the first, the only difference being that it lists this command line option: thus when the user selects this entry it will initiate a factory reset) — and via an EFI variable that can be set and is honoured on the immediately following boot. So here's how a factory reset would then go down: once the factory reset is requested it's enough to reboot. On the subsequent boot systemd-repart runs from the initrd, where it will honour the request and erase the partitions marked for erasing. Once that is complete the system is back in the state we shipped the system in: only the ESP and the /usr/ file system will exist, but the root file system is gone. And from here we can continue as on the original first boot: create a new root file system (and any other partitions), and encrypt/set it up afresh.

So now we have a nice setup, where everything is either signed or encrypted securely. The system can adapt to the system it is booted on automatically on first boot, and can easily be brought back into a well defined state identical to the way it was shipped in.

Modularity

But of course, such a monolithic, immutable system is only useful for very specific purposes. If /usr/ can't be written to – at least in the traditional sense – one cannot just go and install a new software package that one needs. So here two goals are superficially conflicting: on one hand one wants modularity, i.e. the ability to add components to the system, and on the other immutability, i.e. that precisely this is prohibited.

So let's see what I propose as a middle ground in my model. First, what's the precise use case for such modularity? I see a couple of different ones:

  1. For some cases it is necessary to extend the system itself at the lowest level, so that the components added in extend (or maybe even replace) the resources shipped in the base OS image, so that they live in the same namespace, and are subject to the same security restrictions and privileges. Exposure to the details of the base OS and its interface for this kind of modularity is at the maximum.

    Example: a module that adds a debugger or tracing tools into the system. Or maybe an optional hardware driver module.

  2. In other cases, more isolation is preferable: instead of extending the system resources directly, additional services shall be added in that bring their own files, can live in their own namespace (but with "windows" into the host namespaces), however still are system components, and provide services to other programs, whether local or remote. Exposure to the details of the base OS for this kind of modularity is restricted: it mostly focuses on the ability to consume and provide IPC APIs from/to the system. Components of this type can still be highly privileged, but the level of integration is substantially smaller than for the type explained above.

    Example: a module that adds a specific VPN connection service to the OS.

  3. Finally, there's the actual payload of the OS. This stuff is relatively isolated from the OS and definitely from each other. It mostly consumes OS APIs, and generally doesn't provide OS APIs. This kind of stuff runs with minimal privileges, and in its own namespace of concepts.

    Example: a desktop app, for reading your emails.

Of course, the lines between these three types of modules are blurry, but I think distinguishing them does make sense, as I think different mechanisms are appropriate for each. So here's what I'd propose in my model to use for this.

  1. For the system extension case I think the systemd-sysext images are appropriate. This tool operates on system extension images that are very similar to the host's disk image: they also contain a /usr/ partition, protected by Verity. However, they just include additions to the host image: binaries that extend the host. When such a system extension image is activated, it is merged via an immutable overlayfs mount into the host's /usr/ tree. Thus any file shipped in such a system extension will suddenly appear as if it was part of the host OS itself. For optional components that should more or less be considered part of the OS, this is a very simple and powerful way to combine an immutable OS with an immutable extension. Note that most likely extensions for an OS matching this tool should be built at the same time within the same update cycle scheme as the host OS itself. After all, the files included in the extensions will have dependencies on files in the system OS image, and care must be taken that these dependencies remain in order.

  2. For adding in additional somewhat isolated system services in my model, Portable Services are the proposed tool of choice. Portable services are in most ways just like regular system services; they could be included in the system OS image or an extension image. However, portable services use RootImage= to run off separate disk images, thus within their own namespace. Images set up this way have various ways to integrate into the host OS, as they are in most ways regular system services, which just happen to bring their own directory tree. Also, unlike regular system services, for them sandboxing is opt-out rather than opt-in. In my model, here too the disk images are Verity protected and thus immutable. Just like the host OS they are GPT disk images that come with a /usr/ partition and Verity data, along with signing.

  3. Finally, the actual payload of the OS, i.e. the apps. To be useful in real life here it is important to hook into existing ecosystems, so that a large set of apps are available. Given that on Linux flatpak (or, on servers, OCI containers) is the established format that pretty much won, they are probably the way to go. That said, I think both of these mechanisms have relatively weak properties, in particular when it comes to security, since immutability/measurements and similar are not provided. This means, unlike for system extensions and portable services a complete trust chain with attestation and per-app cryptographically protected data is much harder to implement sanely.

What I'd like to underline here is that the main system OS image, as well as the system extension images and the portable service images are put together the same way: they are GPT disk images, with one immutable file system and associated Verity data. The latter two should also contain a PKCS#7 signature for the top-level Verity hash. This uniformity has many benefits: you can use the same tools to build and process these images, but most importantly: by using a single way to validate them throughout the stack (i.e. Verity, in the latter cases with PKCS#7 signatures), validation and measurement is straightforward. In fact it's so obvious that we don't even have to implement it in systemd: the kernel has direct support for this Verity signature checking natively already (IMA).

So, by composing a system at runtime from a host image, extension images and portable service images we have a nicely modular system where every single component is cryptographically validated on every single IO operation, and every component is measured, in its entire combination, directly in the kernel's IMA subsystem.

(Of course, once you add the desktop apps or OCI containers on top, then these properties are lost further down the chain. But well, a lot is already won, if you can close the chain that far down.)

Note that system extensions are not designed to replicate the fine grained packaging logic of RPM/dpkg. Of course, systemd-sysext is a generic tool, so you can use it for whatever you want, but there's a reason it does not bring support for a dependency language: the goal here is not to replicate traditional Linux packaging (we have that already, in RPM/dpkg, and I think they are actually OK for what they do) but to provide delivery of larger, coarser sets of functionality, in lockstep with the underlying OS' life-cycle and in particular with no interdependencies, except on the underlying OS.

Also note that depending on the use case it might make sense to also use system extensions to modularize the initrd step. This is probably less relevant for a desktop OS, but for server systems it might make sense to package up support for specific complex storage in a systemd-sysext system extension, which can be applied to the initrd that is built into the unified kernel. (In fact, we have been working on implementing signed yet modular initrd support to general purpose Fedora this way.)

Note that portable services are composable from system extensions too, by the way. This makes them even more useful, as you can share a common runtime between multiple portable services, or even use the host image as common runtime for portable services. In this model a common runtime image is shared between one or more system extensions, and composed at runtime via an overlayfs instance.

More Modularity: Secondary OS Installs

Having an immutable, cryptographically locked down host OS is great I think, and if we have some moderate modularity on top, that's also great. But oftentimes it's useful to be able to depart/compromise from that for some specific use cases, i.e. provide a bridge for example to allow workloads designed around RPM/dpkg package management to coexist reasonably nicely with such an immutable host.

For this purpose in my model I'd propose using systemd-nspawn containers. The containers are focused on OS containerization, i.e. they allow you to run a full OS with init system and everything as payload (unlike for example Docker containers which focus on a single service, and where running a full OS in it is a mess).

Running systemd-nspawn containers for such secondary OS installs has various nice properties. One of course is that systemd-nspawn supports the same level of cryptographic image validation that we rely on for the host itself. Thus, to some level the whole OS trust chain is reasonably recursive if desired: the firmware validates the OS, and the OS can validate a secondary OS installed within it. In fact, we can run our trusted OS recursively on itself and get similar security guarantees! Besides these security aspects, systemd-nspawn also has really nice properties when it comes to integration with the host. For example, the --bind-user= switch permits binding a host user record and their directory into a container as a simple one-step operation. This makes it extremely easy to have a single user and $HOME but share it concurrently with the host and a zoo of secondary OSes in systemd-nspawn containers, each of which could even run a different distribution.

Developer Mode

Superficially, an OS with an immutable /usr/ appears much less hackable than an OS where everything is writable. Moreover, an OS where everything must be signed and cryptographically validated makes it hard to insert your own code, given you are unlikely to possess access to the signing keys.

To address this issue other systems have supported a "developer" mode: when entered the security guarantees are disabled, and the system can be freely modified, without cryptographic validation. While that's a great concept to have I doubt it's what most developers really want: the cryptographic properties of the OS are great after all, it sucks having to give them up once developer mode is activated.

In my model I'd thus propose two different approaches to this problem. First of all, I think there's value in allowing users to additively extend/override the OS via local developer system extensions. With this scheme the underlying cryptographic validation would remain intact, but — if this form of development mode is explicitly enabled — the developer could add in more resources from local storage, that are not tied to the OS builder's chain of trust, but a local one (i.e. simply backed by encrypted storage of some form).

The second approach is to make it easy to extend (or in fact replace) the set of trusted validation keys, with local ones that are under the control of the user, in order to make it easy to operate with kernel, OS, extension, portable service or container images signed by the local developer without involvement of the OS builder. This is relatively easy to do for components down the trust chain, i.e. the elements further up the chain should optionally allow additional certificates to be used for validation.

(Note that systemd currently has no explicit support for a "developer" mode like this. I think we should add that sooner or later however.)

Democratizing Code Signing

Closely related to the question of developer mode is the question of code signing. If you ask me, the status quo of UEFI SecureBoot code signing in the major Linux distributions is pretty sad. The work to get stuff signed is massive, but in effect it delivers very little in return: because initrds are entirely unprotected, and reside on partitions lacking any form of cryptographic integrity protection, any attacker can trivially modify the boot process of any such Linux system and freely collect FDE passphrases as they are entered. There's little value in signing the boot loader and kernel in a complex bureaucracy if it then happily loads entirely unprotected code that processes the actually relevant security credentials: the FDE keys.

In my model, through use of unified kernels this important gap is closed, hence UEFI SecureBoot code signing becomes an integral part of the boot chain from firmware to the host OS. Unfortunately, code signing and having something a user can locally hack are to some degree conflicting goals. However, I think we can improve the situation here, and put more emphasis on enrolling developer keys in the trust chain easily. Specifically, I see one relevant approach here: enrolling keys directly in the firmware is something that we should make less of a theoretical exercise and more something we can realistically deploy. See this work in progress making this more automatic and eventually safe. Other approaches are thinkable (including some that build on existing MokManager infrastructure), but given the politics involved, are harder to conclusively implement.

Running the OS itself in a container

What I explain above is put together with running on a bare metal system in mind. However, one of the stated goals is to make the OS adaptive enough to also run in a container environment (specifically: systemd-nspawn) nicely. Booting a disk image on bare metal or in a VM generally means that the UEFI firmware validates and invokes the boot loader, and the boot loader invokes the kernel which then transitions into the final system. This is different for containers: here the container manager immediately calls the init system, i.e. PID 1. Thus the validation logic must be different: cryptographic validation must be done by the container manager. In my model this is solved by shipping the OS image not only with a Verity data partition (as is already necessary for the UEFI SecureBoot trust chain, see above), but also with another partition, containing a PKCS#7 signature of the root hash of said Verity partition. This of course is exactly what I propose for both the system extension and portable service image. Thus, in my model the images for all three uses are put together the same way: an immutable /usr/ partition, accompanied by a Verity partition and a PKCS#7 signature partition. The OS image itself then has two ways "into" the trust chain: either through the signed unified kernel in the ESP (which is used for bare metal and VM boots) or by using the PKCS#7 signature stored in the partition (which is used for container/systemd-nspawn boots).

Parameterizing Kernels

A fully immutable and signed OS has to establish trust in the user data it makes use of before doing so. In the model I describe here, for /etc/ and /var/ we do this via disk encryption of the root file system (in combination with integrity checking). But the point where the root file system is mounted comes relatively late in the boot process, and thus cannot be used to parameterize the boot itself. In many cases it's important to be able to parameterize the boot process however.

For example, for the implementation of the developer mode indicated above it's useful to be able to pass this fact safely to the initrd, in combination with other fields (e.g. a hashed root password for allowing in-initrd logins for debug purposes). After all, if the initrd is pre-built by the vendor and signed as a whole together with the kernel, it cannot be modified to carry such data directly (which is in fact how parameterizing of the initrd was traditionally done, to a large degree).

In my model this is achieved through system credentials, which allow passing parameters to systems (and services for the matter) in an encrypted and authenticated fashion, bound to the TPM2 chip. This means that we can securely pass data into the initrd so that it can be authenticated and decrypted only on the system it is intended for and with the unified kernel image it was intended for.
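
On the consuming side this is pleasantly boring by design: systemd decrypts credentials it can authenticate and exposes them to the service as plain files in a directory announced via the $CREDENTIALS_DIRECTORY environment variable. A sketch of the service side (the credential name is made up):

    import os
    from pathlib import Path
    from typing import Optional

    def read_credential(name: str) -> Optional[bytes]:
        # systemd hands decrypted credentials to a service as files in
        # $CREDENTIALS_DIRECTORY (only set if credentials were passed)
        cred_dir = os.environ.get("CREDENTIALS_DIRECTORY")
        if cred_dir is None:
            return None
        path = Path(cred_dir) / name
        return path.read_bytes() if path.exists() else None

    devmode = read_credential("devmode")  # hypothetical credential name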

Swap

In my model the OS would also carry a swap partition, for the simple reason that only then can systemd-oomd.service provide the best results. Also see In defence of swap: common misconceptions.

Updating Images

We have a rough idea how the system shall be organized now, let's next focus on the deployment cycle: software needs regular update cycles, and software that is not updated regularly is a security problem. Thus, I am sure that any modern system must be automatically updated, without this requiring avoidable user interaction.

In my model, this is the job for systemd-sysupdate. It's a relatively simple A/B image updater: it operates either on partitions, on regular files in a directory, or on subdirectories in a directory. Each entry has a version (which is encoded in the GPT partition label for partitions, and in the filename for regular files and directories): whenever an update is initiated the oldest version is erased, and the newest version is downloaded.

With the setup described above a system update becomes a really simple operation. On each update the systemd-sysupdate tool downloads a /usr/ file system partition, an accompanying Verity partition and a PKCS#7 signature partition, and drops them into the host's partition table (where they possibly replace the oldest versions so far stored there). Then it downloads a unified kernel image and drops it into the EFI System Partition's /EFI/Linux (as per the Boot Loader Specification; possibly erasing the oldest such file there). And that's already the whole update process: four files are downloaded from the server, unpacked and put in the most straightforward of ways into the partition table or file system. Unlike in other OS designs there's no mechanism required to explicitly switch to the newer version, the aforementioned systemd-boot logic will automatically pick the newest kernel once it is dropped in.
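
The slot selection behind "replaces the oldest version" is simple enough to sketch. Slot names and the dotted-version comparison here are illustrative, not systemd-sysupdate's actual rules:

    from typing import Dict, Optional

    def oldest_slot(slots: Dict[str, Optional[str]]) -> str:
        # slot name -> version label; None models the "_empty" label.
        # Empty slots sort before everything and get filled first.
        def key(name: str):
            v = slots[name]
            return (0,) if v is None else (1, tuple(int(p) for p in v.split(".")))
        return min(slots, key=key)

    slots = {"usr-A": "0.7", "usr-B": None}  # the two /usr/ slots from above
    print(oldest_slot(slots))                # usr-B: fill the empty slot first
    slots["usr-B"] = "0.8"
    print(oldest_slot(slots))                # usr-A: 0.7 is now the oldest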

Above we talked a lot about modularity, and how to put systems together as a combination of a host OS image, system extension images for the initrd and the host, portable service images and systemd-nspawn container images. I already emphasized that these image files are actually always the same: GPT disk images with partition definitions that match the Discoverable Partition Specification. This comes very handy when thinking about updating: we can use the exact same systemd-sysupdate tool for updating these other images as we use for the host image. The uniformity of the on-disk format allows us to update them uniformly too.

Boot Counting + Assessment

Automatic OS updates do not come without risks: if they happen automatically, and an update goes wrong, this might mean your system gets automatically updated into a brick. This of course is less than ideal. Hence it is essential to address this reasonably automatically. In my model, there's systemd's Automatic Boot Assessment for that. The mechanism is simple: whenever a new unified kernel image is dropped into the system it will be stored with a small integer counter value included in the filename. Whenever the unified kernel image is selected for booting by systemd-boot, the counter is decreased by one. Once the system booted up successfully (which is determined by userspace) the counter is removed from the file name (which indicates "this entry is known to work"). If the counter ever hits zero, this indicates that the system tried to boot it a couple of times, and each time failed, thus it is apparently "bad". In this case systemd-boot will not consider the kernel anymore, and revert to the next older entry (one that doesn't have a counter of zero).
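
In filename terms the life cycle looks roughly like this. The sketch keeps the single counter described above; the real Automatic Boot Assessment scheme records both the tries left and the tries already made in the filename:

    import re

    COUNTED = re.compile(r"(?P<stem>.+)\+(?P<left>\d+)\.efi$")

    def after_boot_attempt(filename: str) -> str:
        # systemd-boot selects the entry, then decrements its counter
        m = COUNTED.match(filename)
        if not m:
            return filename                     # no counter: known-good entry
        return "{}+{}.efi".format(m["stem"], max(int(m["left"]) - 1, 0))

    def is_bad(filename: str) -> bool:
        m = COUNTED.match(filename)
        return bool(m) and int(m["left"]) == 0  # never consider this one again

    def mark_good(filename: str) -> str:
        # userspace reports a successful boot: drop the counter entirely
        return COUNTED.sub(r"\g<stem>.efi", filename)

    name = after_boot_attempt("fooOS_0.8+3.efi")  # -> fooOS_0.8+2.efi
    print(mark_good(name))                        # -> fooOS_0.8.efi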

By sticking the boot counter into the filename of the unified kernel we can directly attach this information to the kernel, and thus need not concern ourselves with cleaning up secondary information about the kernel when the kernel is removed. Updating with a tool like systemd-sysupdate remains a very simple operation hence: drop one old file, add one new file.

Picking the Newest Version

I already mentioned that systemd-boot automatically picks the newest unified kernel image to boot, by looking at the version encoded in the filename. This is done via a simple strverscmp() call (well, truth be told, it's a modified version of that call, different from the one implemented in libc, because real-life package managers use more complex rules for comparing versions these days, and hence it made sense to do that here too). The concept of having multiple entries of some resource in a directory, and picking the newest one automatically is a powerful concept, I think. It means adding/removing new versions is extremely easy (as we discussed above, in systemd-sysupdate context), and allows stateless determination of what to use.
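
A rough model of that comparison, capturing just the digit-run idea that makes "0.10" sort after "0.9" (the real function has additional rules):

    import re

    def version_key(name: str):
        # Split into digit and non-digit runs; digit runs compare
        # numerically, so "0.10" is newer than "0.9"
        return [(0, int(p)) if p.isdigit() else (1, p)
                for p in re.findall(r"\d+|\D+", name)]

    entries = ["fooOS_0.9.efi", "fooOS_0.10.efi", "fooOS_0.2.efi"]
    print(max(entries, key=version_key))  # fooOS_0.10.efi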

If systemd-boot can do that, what about system extension images, portable service images, or systemd-nspawn container images that do not actually use systemd-boot as the entrypoint? All these tools actually implement the very same logic, but on the partition level: if multiple suitable /usr/ partitions exist, then the newest is determined by comparing their GPT partition labels.

This is in a way the counterpart to the systemd-sysupdate update logic described above: we always need a way to determine which partition to actually use after the update took place, and that's easy each time: enumerate the possible entries, pick the newest as per the (modified) strverscmp() result.

Home Directory Management

In my model the device's users and their home directories are managed by systemd-homed. This means they are relatively self-contained and can be migrated easily between devices. The numeric UID assignment for each user is done at the moment of login only, and the files in the home directory are mapped as needed via a uidmap mount. It also allows us to protect the data of each user individually with a credential that belongs to the user itself, i.e. instead of binding confidentiality of the user's data to the system-wide full-disk encryption, each user gets their own encrypted home directory where the user's authentication token (password, FIDO2 token, PKCS#11 token, recovery key…) is used as authentication and decryption key for the user's data. This brings a major improvement for security as it means the user's data is cryptographically inaccessible except when the user is actually logged in.

It also allows us to correct another major issue with traditional Linux systems: the way data encryption works during system suspend. Traditionally on Linux the disk encryption credentials (e.g. the LUKS passphrase) are kept in memory even while the system is suspended. This is a bad choice for security, since many (most?) of us probably never turn off our laptops but suspend them instead. But if the decryption key is always present in unencrypted form during the suspended time, then it could potentially be read from there by a sufficiently equipped attacker.

By encrypting the user's home directory with the user's authentication token we can first safely "suspend" the home directory before going to the system suspend state (i.e. flush out the cryptographic keys needed to access it). This means any process currently accessing the home directory will be frozen for the time of the suspend, but that's expected anyway during a system suspend cycle. Why is this better than the status quo ante? In this model the home directory's cryptographic key material is erased during suspend, but it can be safely reacquired on resume, from system code. If the system is only encrypted as a whole however, then the system code itself couldn't reauthenticate the user, because it would be frozen too. By separating home directory encryption from the root file system encryption we can avoid this problem.
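homectl exposes this operation directly, so conceptually the suspend path boils down to something like the following sketch (in practice this would be hooked into the system's suspend logic rather than typed by hand):

homectl lock-all && systemctl suspend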

Partition Setup

So we discussed the organization of the partitions of OS images multiple times in the above, each time focusing on a specific aspect. Let's now summarize what this should look like all together.

In my model, the initial, shipped OS image should look roughly like this:

  • (1) A UEFI System Partition, with systemd-boot as boot loader and one unified kernel
  • (2) A /usr/ partition (version "A"), with a label fooOS_0.7 (under the assumption we called our project fooOS and the image version is 0.7).
  • (3) A Verity partition for the /usr/ partition (version "A"), with the same label
  • (4) A partition carrying the Verity root hash for the /usr/ partition (version "A"), along with a PKCS#7 signature of it, also with the same label

On first boot this is augmented by systemd-repart like this:

  • (5) A second /usr/ partition (version "B"), initially with a label _empty (which is the label systemd-sysupdate uses to mark partitions that currently carry no valid payload)
  • (6) A Verity partition for that (version "B"), similar to the above case, also labelled _empty
  • (7) And ditto a Verity root hash partition with a PKCS#7 signature (version "B"), also labelled _empty
  • (8) A root file system, encrypted and locked to the TPM2
  • (9) A home file system, integrity protected via a key also in TPM2 (encryption is unnecessary, since systemd-homed adds that on its own, and it's nice to avoid duplicate encryption)
  • (10) A swap partition, encrypted and locked to the TPM2
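systemd-repart is driven by drop-in definition files; a sketch of what the definition for partition 8 might look like (file name hypothetical, option names as per repart.d(5); FactoryReset= marks the partition for removal on factory reset, more on that below):

# /usr/lib/repart.d/60-root.conf
[Partition]
Type=root
Format=btrfs
Encrypt=tpm2
FactoryReset=yes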

Then, on the first OS update the partitions 5, 6, 7 are filled with a new version of the OS (let's say 0.8) and thus get their label updated to fooOS_0.8. After a boot, this version is active.

On a subsequent update the three partitions labelled fooOS_0.7 get wiped and replaced by fooOS_0.9, and so on.

On factory reset, the partitions 8, 9, 10 are deleted, so that systemd-repart recreates them, using a new set of cryptographic keys.

Here's a graphic that hopefully illustrates how the partition table evolves from the shipped image, through first boot, multiple update cycles and eventual factory reset:

Partitions Overview

Trust Chain

So let's summarize the intended chain of trust (for bare metal/VM boots) that ensures every piece of code in this model is signed and validated, and any system secret is locked to TPM2.

  1. First, firmware (or possibly shim) authenticates systemd-boot.

  2. Once systemd-boot picks a unified kernel image to boot, it is also authenticated by firmware/shim.

  3. The unified kernel image contains an initrd, which is the first userspace component that runs. It finds any system extensions passed into the initrd, and sets them up through Verity. The kernel will validate the Verity root hash signature of these system extension images against its usual keyring.

  4. The initrd also finds credentials passed in, then securely unlocks (which means: decrypts + authenticates) them with a secret from the TPM2 chip, locked to the kernel image itself.

  5. The kernel image also contains a kernel command line which contains a usrhash= option that pins the root hash of the /usr/ partition to use.

  6. The initrd then unlocks the encrypted root file system, with a secret bound to the TPM2 chip.

  7. The system then transitions into the main system, i.e. the combination of the Verity protected /usr/ and the encrypted root file system. It then activates two more encrypted (and/or integrity protected) volumes for /home/ and swap, also with a secret tied to the TPM2 chip.

Here's an attempt to illustrate the above graphically:

Trust Chain

This is the trust chain of the basic OS. Validation of system extension images, portable service images, systemd-nspawn container images always takes place the same way: the kernel validates these Verity images along with their PKCS#7 signatures against the kernel's keyring.

File System Choice

In the above I left the choice of file systems unspecified. For the immutable /usr/ partitions squashfs might be a good candidate, but any other that works nicely in a read-only fashion and generates reproducible results is a good choice, too. The home directories as managed by systemd-homed should certainly use btrfs, because it's the only general purpose file system supporting online grow and shrink, which systemd-homed can take advantage of to manage storage.

For the root file system btrfs is likely also the best idea. That's because we intend to use LUKS/dm-crypt underneath, which by default only provides confidentiality, not authenticity of the data (unless combined with dm-integrity). Since btrfs (unlike xfs/ext4) does full data checksumming it's probably the best choice here, since it means we don't have to use dm-integrity (which comes at a higher performance cost).

OS Installation vs. OS Instantiation

In the discussion above a lot of focus was put on setting up the OS and completing the partition layout and such on first boot. This means installing the OS becomes as simple as dd-ing (i.e. "streaming") the shipped disk image into the final HDD medium. Simple, isn't it?
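i.e. the whole "installer" could plausibly be reduced to something like this (image and device names made up, and obviously this overwrites the entire target disk):

dd if=fooOS_0.7.raw of=/dev/sda bs=4M status=progress conv=fsync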

Of course, such a scheme is just too simple for many setups in real life. Whenever multi-boot is required (i.e. co-installing an OS implementing this model with another unrelated one), dd-ing a disk image onto the HDD is going to overwrite user data that was supposed to be kept around.

In order to cover this case, in my model, we'd use systemd-repart (again!) to allow streaming the source disk image into the target HDD in a smarter, additive way. The tool after all is purely additive: it will add in partitions or grow them if they are missing or too small. systemd-repart already has all the necessary provisions to not only create a partition on the target disk, but also copy blocks from a raw installer disk. An install operation would then become a two-step process: one invocation of systemd-repart that adds in the /usr/, its Verity and the signature partitions to the target medium, populated with a copy of the same partitions of the installer medium; and one invocation of bootctl that installs the systemd-boot boot loader in the ESP. (Well, there's one thing missing here: the unified OS kernel also needs to be dropped into the ESP. For now, this can be done with a simple cp call. In the long run, this should probably be something bootctl can do as well, if told so.)
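As a sketch, the whole install could hence look roughly like this (paths and device names hypothetical; the repart.d definitions on the installer medium would use CopyBlocks=auto to stream the partition contents over):

systemd-repart --definitions=/path/to/defs --dry-run=no /dev/sda
bootctl install --esp-path=/mnt/esp
cp /run/installer/EFI/Linux/fooOS_0.7.efi /mnt/esp/EFI/Linux/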

So, this gives us a simple scheme that covers all bases: we can either just dd an image to disk, or we can stream an image onto an existing HDD, adding a couple of new partitions and files to the ESP.

Of course, in reality things are even more complex than that: there's a good chance that the existing ESP is simply too small to carry multiple unified kernels. In my model, the way to address this is by shipping two slightly different systemd-repart partition definition file sets: one for the ideal case when the ESP is large enough, and a fallback for when it isn't, where we then add in an additional XBOOTLDR partition (as per the Discoverable Partitions Specification). In that mode the ESP carries the boot loader, but the unified kernels are stored in the XBOOTLDR partition. This scenario is not quite as simple as the XBOOTLDR-less scenario described first, but is equally well supported in the various tools. Note that systemd-repart can be given size constraints on the partitions it shall create or augment, thus to implement this scheme it's enough to invoke the tool with the fallback partition scheme if invocation with the ideal scheme fails.

Either way: regardless of how the partitions, the boot loader and the unified kernels ended up on the system's hard disk, on first boot the code paths are the same again: systemd-repart will be called to augment the partition table with the root file system, and properly encrypt it, as was already discussed earlier here. This means: all cryptographic key material used for disk encryption is generated on first boot only; the installer phase does not encrypt anything.

Live Systems vs. Installer Systems vs. Installed Systems

Traditionally on Linux three types of systems were common: "installed" systems, i.e. those that are stored on the main storage of the device and are the primary place people spend their time in; "installer" systems which are used to install them and whose job is to copy and set up the packages that make up the installed system; and "live" systems, which are a middle ground: a system that behaves like an installed system in most ways, but lives on removable media.

In my model I'd like to remove the distinction between these three concepts as much as possible: each of these three images should carry the exact same /usr/ file system, and should be suitable to be replicated the same way. Once installed the resulting image can also act as an installer for another system, and so on, creating a certain "viral" effect: if you have one image or installation it's automatically something you can replicate 1:1 with a simple systemd-repart invocation.

Building Images According to this Model

The above explains what the image should look like and how its first boot and update cycle will modify it. But this leaves one question unanswered: how do we actually build the initial image for OS instances according to this model?

Note that there's nothing too special about images following this model: they are ultimately just GPT disk images with Linux file systems, following the Discoverable Partitions Specification. This means you can use whatever tools you like that can put together compliant GPT disk images.

I personally would use mkosi for this purpose though. It's designed to generate compliant images, and has a rich toolset for SecureBoot and signed/Verity file systems already in place.
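To give a rough idea of the scale of effort involved, a minimal build description might look like this (a sketch only, against the mkosi option names at the time of writing; exact names vary by version). Running mkosi in the directory containing it then produces an image.raw ready for dd-ing:

# mkosi.default
[Distribution]
Distribution=fedora
Release=35

[Output]
Format=gpt_squashfs
Bootable=yes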

What is key here is that this model doesn't depart from RPM and dpkg, instead it builds on top of them: in this model they are excellent for putting together images on the build host, but deployment onto the runtime host does not involve individual packages.

I think one cannot overestimate the value traditional distributions bring regarding security, integration and general polish. The concepts I describe above build on all of that, but depart from the idea that distribution packages are a runtime concept, making them a build-time concept instead.

Note that the above is pretty much independent from the underlying distribution.

Final Words

I have no illusions: general purpose distributions are not going to adopt this model as their default any time soon, and it's not even my goal that they do. The above is my personal vision, and I don't expect people to buy into it 100%, and that's fine. However, what I am interested in is finding the overlaps, i.e. working with people who buy 50% into this vision, and sharing the components.

My goals here thus are to:

  1. Get distributions to move to a model where images like this can be built from the distribution easily. Specifically this means that distributions make their OS hermetic in /usr/.

  2. Find the overlaps, share components with other projects to revisit how distributions are put together. This is already happening, see the systemd-tmpfiles and systemd-sysusers support in various distributions, but I think there's more to share.

  3. Make people interested in building actual real-world images based on general purpose distributions adhering to the model described above. I'd love a "GnomeBook" image with full trust properties, that is built from true Linux distros, such as Fedora or ArchLinux.

FAQ

  1. What about ostree? Doesn't ostree already deliver what this blog story describes?

    ostree is fine technology, but in respect to security and robustness properties it's not too interesting I think, because unlike image-based approaches it cannot really deliver integrity/robustness guarantees over the whole tree easily. To be able to trust an ostree setup you have to establish trust in the underlying file system first, and the complexity of the file system makes that challenging. To provide an effective offline-secure trust chain through the whole depth of the stack it is essential to cryptographically validate every single I/O operation. In an image-based model this is trivially easy, but in the ostree model it's not possible with current file system technology. And even if this is added in one way or another in the future (though I am not aware of anyone doing on-access file-based integrity that spans a whole hierarchy of files and is compatible with ostree's hardlink farm model), I think the validation would still happen at too high a level, since Linux file system developers have made very clear that their implementations are not robust against rogue images. (There's this stuff planned, but doing structural authentication ahead of time instead of on access makes the idea too weak — and I'd expect too slow — in my eyes.)

    With my design I want to deliver security guarantees similar to what ChromeOS does, but ostree is much weaker there, and I see no perspective of this changing. In a way ostree's integrity checks are similar to RPM's, enforced on download rather than on access. In the model I suggest above, validation is always on access, and thus safe against offline attacks (i.e. evil maid attacks). In today's world, I think offline security is absolutely necessary though.

    That said, ostree does have some benefits over the model described above: it naturally shares file system inodes if many of the modules/images involved share the same data. It's thus more space efficient on disk (and thus also in RAM/cache to some degree) by default. In my model it would be up to the image builders to minimize shipping overly redundant disk images, by making good use of suitably composable system extensions.

  2. What about configuration management?

    At first glance immutable systems and configuration management don't go that well together. However, do note that in the model I propose above the root file system, with all its contents including /etc/ and /var/, is actually writable and can be modified like on any other typical Linux distribution. The only exception is /usr/, where the immutable OS is hermetic. That means configuration management tools should work just fine in this model – up to the point where they are used to install additional RPM/dpkg packages, because that's something not allowed in the model above: packages need to be installed at image build time, and thus on the image build host, not the runtime host.

  3. What about non-UEFI and non-TPM2 systems?

    The above is designed around the feature set of contemporary PCs, and this means UEFI and TPM2 being available (simply because the PC is pretty much defined by the Windows platform, and current versions of Windows require both).

    I think it's important to make the best of the features of today's PC hardware, and then find suitable fallbacks on more limited hardware. Specifically this means: if there's a desire to implement something like this on non-UEFI or non-TPM2 hardware, we should look for suitable fallbacks for the individual functionality, but generally try to add glue to the old systems so that conceptually they behave more like the new systems, instead of the other way round. Or in other words: most of the above is not strictly tied to UEFI or TPM2, and for many cases there are already reasonable fallbacks in place for more limited systems. Of course, without TPM2 many of the security guarantees will be weakened.

  4. How would you name an OS built that way?

    I think a desktop OS built this way if it has the GNOME desktop should of course be called GnomeBook, to mimic the ChromeBook name. ;-)

    But in general, I'd call hermetic, adaptive, immutable OSes like this "particles".

How can you help?

  1. Help making Distributions Hermetic in /usr/!

    One of the core ideas of the approach described above is to make the OS hermetic in /usr/, i.e. make it carry a comprehensive description of what needs to be set up outside of it when instantiated. Specifically, this means that system users that are needed are declared in systemd-sysusers snippets, and skeleton files and directories are created via systemd-tmpfiles. Moreover additional partitions should be declared via systemd-repart drop-ins.

    At this point some distributions (such as Fedora) are (probably more by accident than on purpose) already mostly hermetic in /usr/, at least for the most basic parts of the OS. However, this is not complete: many daemons require specific resources to be set up in /var/ or /etc/ before they can work, and the relevant packages do not carry systemd-tmpfiles descriptions that add them if missing. So there are two ways you could help here. Politically, it would be highly relevant to convince distributions that an OS that is hermetic in /usr/ is highly desirable, and that it's a worthy goal for packagers to get there. More specifically, it would be desirable if RPM/dpkg packages would ship with enough systemd-tmpfiles information so that configuration files the packages strictly need for operation are symlinked (or copied) from /usr/share/factory/ if they are missing (even better, of course, would be if packages straight from their upstream sources would just work with an empty /etc/ and /var/, creating what they need themselves and defaulting to good defaults in the absence of configuration files).
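    Concretely, that means shipping snippets along these lines with the package (all names hypothetical; the C line copies the factory default into /etc/ only if the file is missing):

    # /usr/lib/sysusers.d/foobard.conf: declare the system user the daemon needs
    u foobard - "Foobar Daemon" /var/lib/foobard

    # /usr/lib/tmpfiles.d/foobard.conf: state directory plus factory config copy
    d /var/lib/foobard 0750 foobard foobard -
    C /etc/foobard.conf - - - - /usr/share/factory/etc/foobard.conf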

    Note that distributions that adopted systemd-sysusers, systemd-tmpfiles and the /usr/ merge are already quite close to providing an OS that is hermetic in /usr/. Those were the big, major advancements; making the image fully hermetic beyond that should be less controversial – at least that's my guess.

    Also note that making the OS hermetic in /usr/ is not just useful in scenarios like the above. It also means that stuff like this and like this can work well.

  2. Fill in the gaps!

    I already mentioned a couple of missing bits and pieces in the implementation of the overall vision. In the systemd project we'd be delighted to review/merge any PRs that fill in the voids.

  3. Build your own OS like this!

    Of course, while we built all these building blocks and they have been adopted to various degrees and for various purposes in the various distributions, no one so far has built an OS that puts things together just like that. It would be excellent if we had communities that work on building images like what I propose above, i.e. if you want to work on making a secure GnomeBook as I suggest above a reality, that would be more than welcome.

    What could this look like specifically? Pick an existing distribution, write a set of mkosi descriptions plus some additional drop-in files, and then build this on some build infrastructure. While doing so, report the gaps, and help us address them.

Further Documentation of Used Components and Concepts

  1. systemd-tmpfiles
  2. systemd-sysusers
  3. systemd-boot
  4. systemd-stub
  5. systemd-sysext
  6. systemd-portabled, Portable Services Introduction
  7. systemd-repart
  8. systemd-nspawn
  9. systemd-sysupdate
  10. systemd-creds, System and Service Credentials
  11. systemd-homed
  12. Automatic Boot Assessment
  13. Boot Loader Specification
  14. Discoverable Partitions Specification
  15. Safely Building Images

Earlier Blog Stories Related to this Topic

  1. The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions
  2. The Wondrous World of Discoverable GPT Disk Images
  3. Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248
  4. Portable Services with systemd v239
  5. mkosi — A Tool for Generating OS Images

And that's all for now.

threads and libxcb, part 2

Posted by Adam Jackson on April 29, 2022 04:35 PM

I've been working on kopper recently, which is a complementary project to zink. Just as zink implements OpenGL in terms of Vulkan, kopper seeks to implement the GL window system bindings - like EGL and GLX - in terms of the Vulkan WSI extensions. There are several benefits to doing this, which I'll get into in a future post, but today's story is really about libX11 and libxcb.

Yes, again.

One important GLX feature is the ability to set the swap interval, which is how you get tear-free rendering by syncing buffer swaps to the vertical retrace. A swap interval of 1 is the typical case, where an image update happens once per frame. The Vulkan way to do this is to set the swapchain present mode to FIFO, since FIFO updates are implicitly synced to vblank. Mesa's WSI code for X11 uses a swapchain management thread for FIFO present modes. This thread is started from inside the vulkan driver, and it only uses libxcb to talk to the X server. But libGL is a libX11 client library, so in this scenario there is always an "xlib thread" as well.

libX11 uses libxcb internally these days, because otherwise there would be no way to intermix xlib and xcb calls in the same process. But it does not use libxcb's reflection of the protocol: XGetGeometry does not call xcb_get_geometry, for example. Instead, libxcb has an API to allow other code to take over the write side of the display socket, with a callback mechanism to get it back when another xcb client issues a request. The callback function libX11 uses here is straightforward: lock the Display, flush out any internally buffered requests, and return the sequence number of the last request written. Both libraries need this sequence number for various reasons internally; xcb for example uses it to make sure replies go back to the thread that issued the request.

But "lock the Display" here really means call into a vtable in the Display struct. That vtable is filled in during XOpenDisplay, but the individual function pointers are only non-NULL if you called XInitThreads beforehand. And if you're libGL, you have no way to enforce that, your public-facing API operates on a Display that was already created.

So now we see the race. The queue management thread calls into libxcb while the main thread is somewhere inside libX11. Since libX11 has taken the socket, the xcb thread runs the release callback. Since the Display was not made thread-safe at XOpenDisplay time, the release callback does not block, so the xlib thread's work won't be correctly accounted for. If you're lucky the two sides will at least write to the socket atomically with respect to each other, but at this point they have diverging opinions about the request sequence numbering, and it's a matter of time until you crash.

It turns out kopper makes this really easy to hit. Like "resize a glxgears window" easy. However, this isn't just a kopper issue: this race exists for every program that uses xcb on a not-necessarily-thread-safe Display. The only reasonable fix is for libX11 to just always be thread-safe.

So now, it is.


fwupd 1.8.0 and 50 million updates

Posted by Richard Hughes on April 28, 2022 03:05 PM

I’ve just tagged the 1.8.0 release of fwupd, with these release notes — there’s lots of good stuff there as always. More remarkable is that LVFS has now supplied over 50 million updates to Linux machines all around the globe. The true number is going to be unknown, as we allow vendors to distribute updates without any kind of logging, and also allow companies or agencies to mirror the entire LVFS so the archive can be used offline. The true number of updates deployed will be a lot higher than 50 million, which honestly blows my tiny mind. Just 7 years ago Christian asked me to “make firmware updates work on Linux” and now we have a thriving client project that respects both your freedom and your privacy, and a thriving ecosystem of hardware vendors who consider Linux users first class citizens. Of course, there are vendors who are not shipping updates for popular hardware, but they’re now in the minority — and every month we have two or three new vendor account requests. The logistical, security and most importantly commercial implications of not being “on the LVFS” are now too critical even for tier-1 IHVs, ODMs and OEMs to ignore.

I’m still amazed to see Reddit posts, YouTube videos and random people on Twitter talk about the thing that’s been my baby for the last few years. It’s both frightening as hell (because of the responsibility) and incredibly humbling at the same time. Red Hat can certainly take a lot of credit for the undeniable success of LVFS and fwupd, as they have been the people paying my salary and pushing me forward over the last decade and more. Obviously I’m glad everything is being used by distros like Ubuntu and Arch, although for me it’s Fedora that’s at least technically the one pushing Linux forward these days. I’ve seen Fedora grow in market share year on year, and I’m proud to be one of the people pushing the exciting Future Features into Fedora.

So what happens next? I guess we have the next 50 million updates to look forward to. The LVFS has been growing ever so slightly exponentially since it was first conceived so that won’t take very long now. We’ve blasted through 1MM updates a month, and now regularly ship more than 2MM updates a month and with the number of devices supported growing like it has (4004 different streams, with 2232 more planned), it does seem an exciting place to be. I’m glad that the number of committers for fwupd is growing at the same pace as the popularity, and I’m not planning to burn out any time soon. Google has also been an amazing partner in encouraging vendors to ship updates on the LVFS and shipping fwupd in ChromeOS — and their trust and support has been invaluable. I’m also glad the “side-projects” like “GNOME Firmware“, “Host Security ID“, “fwupd friendly firmware” and “uSWID as an SBoM” also seem to be flourishing into independent projects in their own right. It does seem now is the right time to push the ecosystem towards transparency, open source and respecting the users privacy. Redistributing closed source firmware may be an unusual route to get there, but it’s certainly working. There are a few super-sekret things I’m just not allowed to share yet, but it’s fair to say that I’m incredibly excited about the long term future.

From the bottom of my heart, thank you all for your encouragement and support.

Testing my System Code in /usr/ Without Modifying /usr/

Posted by Lennart Poettering on April 26, 2022 10:00 PM

I recently blogged about how to run a volatile systemd-nspawn container from your host's /usr/ tree, for quickly testing stuff in your host environment, sharing your home directory, but all that without making a single modification to your host, and on an isolated node.

The one-liner discussed in that blog story is great for testing during system software development. Let's have a look at another systemd tool that I regularly use to test things during systemd development, in a relatively safe environment, but still taking full benefit of my host's setup.

For a while now, systemd has been shipping with a simple component called systemd-sysext. Its primary use case goes something like this: on the one hand, OS systems with immutable /usr/ hierarchies are fantastic for security, robustness, updating and simplicity, but on the other hand not being able to quickly add stuff to /usr/ is just annoying.

systemd-sysext is supposed to bridge this contradiction: when invoked it will merge a bunch of "system extension" images into /usr/ (and /opt/ as a matter of fact) through the use of read-only overlayfs, making all files shipped in the image instantly and atomically appear in /usr/ during runtime — as if they had always been there. Now, let's say you are building your locked down OS, with an immutable /usr/ tree, and it comes without the ability to log in, without debugging tools, without anything you want and need when trying to debug and fix something in the system. With systemd-sysext you could use a system extension image that contains all this, drop it into the system, and activate it with systemd-sysext so that it genuinely extends the host system.

(There are many other use cases for this tool. For example, you could build systems that at their base use a generic image, but that get extended with additional, more specific functionality, or drivers, or similar, by installing one or more system extensions. The tool is generic; use it for whatever you want. But for now let's not get lost in listing all the possibilities.)

What's particularly nice about the tool is that it supports automatically discovered dm-verity images, with signatures and everything. So you can even do this in a fully authenticated, measured, safe way. But I am digressing…

Now that we (hopefully) have a rough understanding of what systemd-sysext is and does, let's discuss how specifically we can use this in the context of system software development, to safely use and test bleeding edge development code — built freshly from your project's build tree — in your host OS, without having to risk that the host OS is corrupted or becomes unbootable by stuff that didn't quite yet work the way it was envisioned:

The images systemd-sysext merges into /usr/ can be of two kinds: disk images with a file system/verity/signature, or simple, plain directory trees. To make these images available to the tool, they can be placed or symlinked into /usr/lib/extensions/, /var/lib/extensions/, /run/extensions/ (and a bunch of others). So if we now install our freshly built development software into a subdirectory of those paths, then that's entirely sufficient to make them valid system extension images in the sense of systemd-sysext, and thus can be merged into /usr/ to try them out.

To be more specific: when I develop systemd itself, here's what I do regularly, to see how my new development version would behave on my host system. As preparation I checked out the systemd development git tree first of course, hacked around in it a bit, then built it with meson/ninja. And now I want to test what I just built:

sudo DESTDIR=/run/extensions/systemd-test meson install -C build --quiet --no-rebuild &&
        sudo systemd-sysext refresh --force

Explanation: first, we'll install my current build tree as a system extension into /run/extensions/systemd-test/. And then we apply it to the host via the systemd-sysext refresh command. This command will search for all installed system extension images in the aforementioned directories, then unmount (i.e. "unmerge") any previously merged dirs from /usr/ and then freshly mount (i.e. "merge") the new set of system extensions on top of /usr/. And just like that, I have installed my development tree of systemd into the host OS, and all that without actually modifying/replacing even a single file on the host at all. Nothing here actually hit the disk!

Note that all this works on any system really; it is not necessary that the underlying OS is even designed with immutability in mind. Just because the tool was developed with immutable systems in mind doesn't mean you couldn't use it on traditional systems where /usr/ is mutable as well. In fact, my development box actually runs regular Fedora, i.e. is RPM-based and thus has a mutable /usr/ tree. As long as system extensions are applied, the whole of /usr/ becomes read-only though.

Once I am done testing, when I want to revert to how things were without the image installed, it is sufficient to call:

sudo systemd-sysext unmerge

And there you go, all files my development tree generated are gone again, and the host system is as it was before (and /usr/ mutable again, in case one is on a traditional Linux distribution).

Also note that a reboot (regardless if a clean one or an abnormal shutdown) will undo the whole thing automatically, since we installed our build tree into /run/ after all, i.e. a tmpfs instance that is flushed on boot. And given that the overlayfs merge is a runtime thing, too, the whole operation was executed without any persistence. Isn't that great?

(You might wonder why I specified --force on the systemd-sysext refresh line earlier. That's because systemd-sysext actually does some minimal version compatibility checks when applying system extension images. For that it will compare the host's /etc/os-release file with /usr/lib/extension-release.d/extension-release.<name>, and refuse operation if the image is not actually built for the host OS version. Here we don't want to bother with dropping that file in there; we know already that the extension image is compatible with the host, as we just built it on it. --force allows us to skip the version check.)

You might wonder: what about the combination of the idea from the previous blog story (regarding running containers off the host's /usr/ tree) with system extensions? Glad you asked. Right now we have no support for this, but it's high on our TODO list (patches welcome, of course!), i.e. a new switch for systemd-nspawn called --system-extension= that would allow merging one or more such extensions into the container tree booted would be stellar. With that, with a single command I could run a container off my host OS but with a development version of systemd dropped in, all without any persistence. How awesome would that be?

(Oh, and in case you wonder, all of this only works with distributions that have completed the /usr/ merge. On legacy distributions that didn't do that and still place parts of the OS all over the hierarchy, the above won't work, since merging /usr/ trees via overlayfs is pretty pointless if the OS is not hermetic in /usr/.)

And that's all for now. Happy hacking!

Fedora MediaWriter (nextgen) on MacOS

Posted by Evzen Gasta on April 19, 2022 06:34 AM

Hi, for the past few months I have been working on the MacOS build, fixing USB drive restoration on Windows, and fixing a few design bugs. I fixed the overflowing background of the highlighted option in the combobox used by the Adwaita style on Linux. The next design fix was updating the animations when changing pages. Now that we build the new FMW on all platforms, I can say I like the MacOS and Linux versions a lot more than the Windows version. I don't like the design of the buttons there. What's your opinion?

[Gallery: Fedora Media Writer screenshots on Linux, Windows and MacOS]

Windows

On Windows I have fixed the problem with USB drive restoration, which had been present for a long time. The problem was caused by corruption of the partition table after an ISO file was written onto a USB drive, after which the diskpart utility was not able to work with the drive anymore. This was quite tricky to figure out, but in the end the fix was quite easy: I had to recreate the partition table using the "convert gpt" command, which creates a new partition table on the USB drive. For backwards compatibility with older systems and USB drives I then had to run "convert mbr". With that, diskpart can now restore the USB drive to its original state.

MacOS

Making the build for MacOS was quite tricky, because I don't have any Apple device at home. So I had to use a virtual machine (the sosumi app, available as a snap) to try out the changes I made to FMW. Even with this app I could not connect a USB drive to properly try out the full functionality of this new version. Thanks to my mentor and bachelor's thesis leader Jan Grulich (his blog and twitter) for all the testing.

While updating the CI manifest I had to make some tweaks to the build.sh file, such as adjusting the paths of a few modules and libraries missed by macdeployqt, or libraries that have to be copied manually.

For those of you who don’t know what macdeployqt is: “The Mac deployment tool can be found in QTDIR/bin/macdeployqt. It is designed to automate the process of creating a deployable application bundle that contains the Qt libraries as private frameworks” is the definition from the Qt docs.

You can download all available builds from the releases on GitHub. If there are any problems with the application, let me know either here or file a bug in the issues on GitHub.

GTK: Calling attention to a widget in LibreOffice with CSS animation

Posted by Caolán McNamara on April 17, 2022 02:01 PM


The motivation here is the request to use an attention-attracting cue when pressing Ctrl+F while in the find bar, i.e. to do "something" to call attention to the widget. I thought I'd try some of the built-in GTK CSS support. So here we animate the widget to reduce its left and right margins inwards a little, and its opacity to 50%, before returning to its original properties.

Nothing is ever simple, so in order to repeat the animation (the second time Ctrl+F is pressed while the widget has focus) we have to duplicate the animation and alternate between the two copies, using whichever one wasn't run last.

The Freedom Phone is not great at privacy

Posted by Matthew Garrett on April 17, 2022 12:23 AM
The Freedom Phone advertises itself as a "Free speech and privacy first focused phone". As documented on the features page, it runs ClearOS, an Android-based OS produced by Clear United (or maybe one of the bewildering array of associated companies, we'll come back to that later). It's advertised as including Signal, but what's shipped is not the version available from the Signal website or any official app store - instead it's this fork called "ClearSignal".

The first thing to note about ClearSignal is that the privacy policy link from that page 404s, which is not a great start. The second thing is that it has a version number of 5.8.14, which is strange because upstream went from 5.8.10 to 5.9.0. The third is that, despite Signal being GPL 3, there's no source code available. So, I grabbed jadx and started looking for differences between ClearSignal and the upstream 5.8.10 release. The results were, uh, surprising.

First up is that they seem to have integrated ACRA, a crash reporting framework. This feels a little odd - in the absence of a privacy policy, it's unclear what information this gathers or how it'll be stored. Having a piece of privacy software automatically uploading information about what you were doing in the event of a crash with no notification other than a toast that appears saying "Crash Report" feels a little dubious.

Next is that Signal (for fairly obvious reasons) warns you if your version is out of date and eventually refuses to work unless you upgrade. ClearSignal has dealt with this problem by, uh, simply removing that code. The MacOS version of the desktop app they provide for download seems to be derived from a release from last September, which for an Electron-based app feels like a pretty terrible idea. Weirdly, for Windows they link to an official binary release from February 2021, and for Linux they tell you how to use the upstream repo properly. I have no idea what's going on here.

They've also added support for network backups of your Signal data. This involves the backups being pushed to an S3 bucket using credentials that are statically available in the app. It's ok, though, each upload has some sort of nominally unique identifier associated with it, so it's not trivial to just download other people's backups. But, uh, where does this identifier come from? It turns out that Clear Center, another of the Clear family of companies, employs a bunch of people to work on a ClearID[1], some sort of decentralised something or other that seems to be based on KERI. There's an overview slide deck here which didn't really answer any of my questions and as far as I can tell this is entirely lacking any sort of peer review, but hey it's only the one thing that stops anyone on the internet being able to grab your Signal backups so how important can it be.

The final thing, though? They've extended Signal's invitation support to encourage users to get others to sign up for Clear United. There's an exposed API endpoint called "get_user_email_by_mobile_number" which does exactly what you'd expect - if you give it a registered phone number, it gives you back the associated email address. This requires no authentication. But it gets better! The API to generate a referral link to send to others sends the name and phone number of everyone in your phone's contact list. There does not appear to be any indication that this is going to happen.

So, from a privacy perspective, going to go with things being some distance from ideal. But what's going on with all these Clear companies anyway? They all seem to be related to Michael Proper, who founded the Clear Foundation in 2009. They are, perhaps unsurprisingly, heavily invested in blockchain stuff, while Clear United also appears to be some sort of multi-level marketing scheme which has a membership agreement that includes the somewhat astonishing claim that:

Specifically, the initial focus of the Association will provide members with supplements and technologies for:

9a. Frequency Evaluation, Scans, Reports;

9b. Remote Frequency Health Tuning through Quantum Entanglement;

9c. General and Customized Frequency Optimizations;


- there's more discussion of this and other weirdness here. Clear Center, meanwhile, has a Chief Physics Officer? I have a lot of questions.

Anyway. We have a company that seems to be combining blockchain and MLM, has some opinions about Quantum Entanglement, bases the security of its platform on a set of novel cryptographic primitives that seem to have had no external review, has implemented an API that just hands out personal information without any authentication and an app that appears more than happy to upload all your contact details without telling you first, has failed to update this app to keep up with upstream security updates, and is violating the upstream license. If this is their idea of "privacy first", I really hate to think what their code looks like when privacy comes further down the list.

[1] Pointed out to me here


GTK4: ComboBox Cell Renderers

Posted by Caolán McNamara on April 12, 2022 07:52 PM

 

Custom Gtk Cell Renderer in GTK4 GtkComboBox in LibreOffice now working as hoped for.

Running a Container off the Host /usr/

Posted by Lennart Poettering on April 05, 2022 10:00 PM

Apparently, in some parts of this world, the /usr/-merge transition is still ongoing. Let's take the opportunity to have a look at one specific way to benefit from the /usr/-merge (and associated work) IRL.

I develop system-level software as you might know. Oftentimes I want to run my development code on my PC but be reasonably sure it cannot destroy or otherwise negatively affect my host system. Now I could set up a container tree for that, and boot into that. But often I am too lazy for that; I don't want to bother with a slow package manager setting up a new OS tree for me. So here's what I often do instead — and this only works because of the /usr/-merge.

I run a command like the following (without any preparatory work):

systemd-nspawn \
        --directory=/ \
        --volatile=yes \
        -U \
        --set-credential=passwd.hashed-password.root:$(mkpasswd mysecret) \
        --set-credential=firstboot.locale:C.UTF-8 \
        --bind-user=lennart \
        -b

And then I very quickly get a login prompt on a container that runs the exact same software as my host — but is also isolated from the host. I do not need to prepare any separate OS tree or anything else. It just works. And my host user lennart is just there, ready for me to log into.

So here's what these systemd-nspawn options specifically do:

  • --directory=/ tells systemd-nspawn to run off the host OS' file hierarchy. That smells like danger of course, running two OS instances off the same directory hierarchy. But don't be scared, because:

  • --volatile=yes enables volatile mode. Specifically this means what we configured with --directory=/ as root file system is slightly rearranged. Instead of mounting that tree as it is, we'll mount a tmpfs instance as actual root file system, and then mount the /usr/ subdirectory of the specified hierarchy into the /usr/ subdirectory of the container file hierarchy in read-only fashion – and only that directory. So now we have a container directory tree that is basically empty, but imports all host OS binaries and libraries into its /usr/ tree. All software installed on the host is also available in the container with no manual work. This mechanism only works because on /usr/-merged OSes vendor resources are monopolized at a single place: /usr/. It's sufficient to share that one directory with the container to get a second instance of the host OS running. Note that this means /etc/ and /var/ will be entirely empty initially when this second system boots up. Thankfully, forward looking distributions (such as Fedora) have adopted systemd-tmpfiles and systemd-sysusers quite pervasively, so that system users and files/directories required for operation are created automatically should they be missing. Thus, even though at boot the mentioned directories are initially empty, once the system is booted up they are sufficiently populated for things to just work.

  • -U means we'll enable user namespacing, in fully automatic mode. This does three things: it picks a free host UID range dynamically for the container, then sets up user namespacing for the container processes, mapping that host UID range to UIDs 0…65534 in the container. It then sets up a similar UID mapped mount on the /usr/ tree of the container. Net effect: file ownerships as set on the host OS tree appear as if they belong to the very same users inside of the container environment, except that we use user namespacing for everything, and thus the users are actually neatly isolated from the host.

  • --set-credential=passwd.hashed-password.root:$(mkpasswd mysecret) passes a credential to the container. Credentials are bits of data that you can pass to systemd services and whole systems. They are actually awesome concepts (e.g. they support TPM2 authentication/encryption that just works!) but I am not going to go into details around that, given it's off-topic in this specific scenario. Here we just take benefit of the fact that systemd-sysusers looks for a credential called passwd.hashed-password.root to initialize the root password of the system from. We set it to mysecret. This means once the system is booted up we can log in as root with the supplied password. Yay. (Remember, /etc/ is initially empty on this container, and thus also carries no /etc/passwd or /etc/shadow, and thus has no root user record, and thus no root password.)

    mkpasswd is a tool that converts a plain text password into a UNIX hashed password, which is what this specific credential expects.

  • Similarly, --set-credential=firstboot.locale:C.UTF-8 tells the systemd-firstboot service in the container to initialize /etc/locale.conf with this locale.

  • --bind-user=lennart binds the host user lennart into the container, also as user lennart. This does two things: it mounts the host user's home directory into the container, and it copies a minimal user record of the specified user into the container, which nss-systemd then picks up and includes in the regular user database. This means, once the container is booted up I can log in as lennart with my regular password, and once I am logged in I will see my regular host home directory, and can make changes to it. Yippieh! (This does a couple more things, such as UID mapping, but let's not get lost in too many details.)

So, if I run this, I will very quickly get a login prompt, where I can log into as my regular user. I have full access to my host home directory, but otherwise everything is nicely isolated from the host, and changes outside of the home directory are either prohibited or are volatile, i.e. go to a tmpfs instance whose lifetime is bound to the container's lifetime: when I shut down the container I just started, then any changes outside of my user's home directory are lost.

Note that while here I use --volatile=yes in combination with --directory=/, you can actually use it on any OS hierarchy, i.e. just about any directory that contains OS binaries.

Similarly, the --bind-user= stuff works with any OS hierarchy too (but do note that only systemd 249 and newer will pick up the user records passed to the container that way, i.e. this requires at least v249 both on the host and in the container to work).

Or in short: the possibilities are endless!

Requirements

For this all to work, you need:

  1. A recent kernel (5.15 should suffice, as it brings UID mapped mounts for the most common file systems, so that -U and --bind-user= can work well.)

  2. A recent systemd (249 should suffice, which brings --bind-user=, and a -U switch backed by UID mapped mounts).

  3. A distribution that adopted the /usr/-merge, systemd-tmpfiles and systemd-sysusers so that the directory hierarchy and user databases are automatically populated when empty at boot. (Fedora 35 should suffice.)

Limitations

While a lot of today's software actually works well out of the box on systems that come up with an unpopulated /etc/ and /var/, and either falls back to reasonable built-in defaults or deploys systemd-tmpfiles to create what is missing, things aren't perfect: some software typically installed on desktop OSes will fail to start when invoked in such a container, and be visible as ugly failed services, but that won't stop me from logging in and using the system for what I want to use it for. It would be excellent to get that fixed, though. This can either be fixed in the relevant software upstream (i.e. if opening your configuration file fails with ENOENT, then just default to reasonable defaults), or in the distribution packaging (i.e. add a tmpfiles.d/ file that copies or symlinks in skeleton configuration from /usr/share/factory/etc/ via the C or L line types).

And then there's certain software dealing with hardware management and similar that simply cannot work reasonably in a container (as device APIs on Linux are generally not virtualized for containers). It would be excellent if software like that were updated to carry ConditionVirtualization=!container or ConditionPathIsReadWrite=/sys conditionalization in their unit files, so that it is automatically – cleanly – skipped when executed in such a container environment.

And that's all for now.

Bearer tokens are just awful

Posted by Matthew Garrett on April 05, 2022 06:54 AM
As I mentioned last time, bearer tokens are not super compatible with a model in which every access is verified to ensure it's coming from a trusted device. Let's talk about that in a bit more detail.

First off, what is a bearer token? In its simplest form, it's simply an opaque blob that you give to a user after an authentication or authorisation challenge, and then they show it to you to prove that they should be allowed access to a resource. In theory you could just hand someone a randomly generated blob, but then you'd need to keep track of which blobs you've issued and when they should be expired and who they correspond to, so frequently this is actually done using JWTs which contain some base64 encoded JSON that describes the user and group membership and so on and then have a signature associated with them so whenever the user presents one you can just validate the signature and then assume that the contents of the JSON are trustworthy.
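You can see that for yourself with any JWT you have lying around: the payload is simply the middle dot-separated field, in unpadded base64url. A rough sketch (the padding loop just keeps base64 -d happy):

# $TOKEN holds a JWT of the form header.payload.signature
p=$(printf '%s' "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
# pad to a multiple of 4 so base64 -d accepts it
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="$p="; done
printf '%s' "$p" | base64 -d; echo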

One thing to note here is that the crypto is purely between whoever issued the token and whoever validates the token - as far as the server is concerned, any client who can show it the token is just fine as long as the signature is verified. There's no way to verify the client's state, so one of the core ideas of Zero Trust (that we verify that the client is in a trustworthy state on every access) is already violated.

Can we make things not terrible? Sure! We may not be able to validate the client state on every access, but we can validate the client state when we issue the token in the first place. When the user hits a login page, we do state validation according to whatever policy we want to enforce, and if the client violates that policy we refuse to issue a token to it. If the token has a sufficiently short lifetime then an attacker is only going to have a short period of time to use that token before it expires and then (with luck) they won't be able to get a new one because the state validation will fail.

Except! This is fine for cases where we control the issuance flow. What if we have a scenario where a third party authenticates the client (by verifying that they have a valid token issued by their ID provider) and then uses that to issue their own token that's much longer lived? Well, now the client has a long-lived token sitting on it. And if anyone copies that token to another device, they can now pretend to be that client.

This is, sadly, depressingly common. A lot of services will verify the user, and then issue an oauth token that'll expire some time around the heat death of the universe. If a client system is compromised and an attacker just copies that token to another system, they can continue to pretend to be the legitimate user until someone notices (which, depending on whether or not the service in question has any sort of audit logs, and whether you're paying any attention to them, may be once screenshots of your data show up on Twitter).

This is a problem! There's no way to fit a hosted service that behaves this way into a Zero Trust model - the best you can say is that a token was issued to a device that was, around that time, apparently trustworthy, and now it's some time later and you have literally no idea whether the device is still trustworthy or if the token is still even on that device.

But wait, there's more! Even if you're nowhere near doing any sort of Zero Trust stuff, imagine the case of a user having a bunch of tokens from multiple services on their laptop, and then they leave their laptop unlocked in a cafe while they head to the toilet and whoops it's not there any more, better assume that someone has access to all the data on there. How many services has our opportunistic new laptop owner gained access to as a result? How do we revoke all of the tokens that are sitting there on the local disk? Do you even have a policy for dealing with that?

There isn't a simple answer to all of these problems. Replacing bearer tokens with some sort of asymmetric cryptographic challenge to the client would at least let us tie the tokens to a TPM or other secure enclave, and then we wouldn't have to worry about them being copied elsewhere. But that wouldn't help us if the client is compromised and the attacker simply keeps using the compromised client. The entire model of simply proving knowledge of a secret being sufficient to gain access to a resource is inherently incompatible with a desire for fine-grained trust verification on every access, but I don't see anything changing until we have a standard for third party services to be able to perform that trust verification against a customer's policy.

Still, at least this means I can just run weird Android IoT apps through mitmproxy, pull the bearer token out of the request headers and then start poking the remote API with curl. It may all be broken, but it's also got me a bunch of bug bounty credit, so, it's impossible to say if it's bad or not.
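Which is to say, the entire "authentication" step amounts to nothing more than (endpoint hypothetical):

curl -H "Authorization: Bearer $TOKEN" https://api.example.com/v1/device/status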

(Addendum: this suggestion that we solve the hardware binding problem by simply passing all the network traffic through some sort of local enclave that could see tokens being set and would then sequester them and reinject them into later requests is OBVIOUSLY HORRIFYING and is also probably going to be at least three startup pitches by the end of next week)


ZTA doesn't solve all problems, but partial implementations solve fewer

Posted by Matthew Garrett on March 31, 2022 11:06 PM
Traditional network access controls work by assuming that something is trustworthy based on some other factor - for example, if a computer is on your office network, it's trustworthy because only trustworthy people should be able to gain physical access to plug something in. If you restrict access to your services to requests coming from trusted networks, then you can assert that it's coming from a trusted device.

Of course, this isn't necessarily true. A machine on your office network may be compromised. An attacker may obtain valid VPN credentials. Someone could leave a hostile device plugged in under a desk in a meeting room. Trust is being placed in devices that may not be trustworthy.

A Zero Trust Architecture (ZTA) is one where a device is granted no inherent trust. Instead, each access to a service is validated against some policy - if the policy is satisfied, the access is permitted. A typical implementation involves granting each device some sort of cryptographic identity (typically a TLS client certificate) and placing the protected services behind a proxy. The proxy verifies the device identity, queries another service to obtain the current device state (we'll come back to that in a moment), compares the state against a policy and either passes the request through to the service or rejects it. Different services can have different policies (eg, you probably want a lax policy around whatever's hosting the documentation for how to fix your system if it's being refused access to something for being in the wrong state), and if you want you can also tie it to proof of user identity in some way.
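A toy sketch of the proxy's decision logic, with invented policy rules and an invented device-state store, might look like this:

import time

# Invented device-state store, fed by whatever endpoint tooling you run.
DEVICE_STATE = {
    "laptop-123": {"last_report": time.time(), "disk_encrypted": True, "monitoring": True},
}

def policy_docs(state) -> bool:
    # Lax policy: any enrolled device may read the self-help docs.
    return state is not None

def policy_sensitive(state) -> bool:
    # Strict policy: a recent state report, disk encryption and a live
    # monitoring agent. A stale report fails the check, which is exactly
    # what catches the disabled-monitoring scenario described below.
    return (state is not None
            and time.time() - state["last_report"] < 3600
            and state["disk_encrypted"]
            and state["monitoring"])

POLICIES = {"docs.internal": policy_docs, "payroll.internal": policy_sensitive}

def proxy_allows(service: str, device_id: str) -> bool:
    # Runs after the proxy has verified the device's TLS client
    # certificate, i.e. device_id comes from the mTLS layer.
    return POLICIES[service](DEVICE_STATE.get(device_id))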

From a user perspective, this is entirely transparent. The proxy is made available on the public internet, DNS for the services points to the proxy, and every time your users try to access the service they hit the proxy instead and (if everything's ok) gain access to it no matter which network they're on. There's no need to connect to a VPN first, and there are no worries about accidentally leaking information over the public internet instead of over a secure link.

It's also notable that traditional solutions tend to be all-or-nothing. If I have some services that are more sensitive than others, the only way I can really enforce this is by having multiple different VPNs and only granting access to sensitive services from specific VPNs. This obviously risks combinatorial explosion once I have more than a couple of policies, and it's a terrible user experience.

Overall, ZTA approaches provide more security and an improved user experience. So why are we still using VPNs? Primarily because this is all extremely difficult. Let's take a look at an extremely recent scenario. A device used by customer support technicians was compromised. The vendor in question has a solution that can tie authentication decisions to whether or not a device has a cryptographic identity. If this were in use, and if the cryptographic identity were tied to the device hardware (eg, by being generated in a TPM), the attacker would not have been able to simply obtain the user credentials and log in from their own device. This is good - if the attacker wants to maintain access to the service, they need to stay on the device in question, which increases the probability of the monitoring tooling on the compromised device noticing them.

Unfortunately, the attacker simply disabled the monitoring tooling on the compromised device. If device state was being verified on each access then this would be noticed before too long - the last data received from the device would be flagged as too old, and the requests would no longer satisfy any reasonable access control policy. Instead, the device was assumed to be trustworthy simply because it could demonstrate its identity. There's an important point here: just because a device belongs to you doesn't mean it's a trustworthy device.

So, if ZTA approaches are so powerful and user-friendly, why aren't we all using one? There are a few problems, but the single biggest is that there's no standardised way to verify device state in any meaningful way. Remote Attestation can prove both device identity and device boot state, but the only product on the market that does much with this is Microsoft's Device Health Attestation. DHA doesn't solve the broader problem of also reporting runtime state - it may be able to verify that endpoint monitoring was launched, but it doesn't make assertions about whether it's still running. Right now, people are left trying to scrape this information from whatever tooling they're running. The absence of any standardised approach to this problem means anyone who wants to deploy a strong ZTA has to integrate with whatever tooling they're already running, and that then increases the cost of migrating to any other tooling later.

But even device identity is hard! Knowing whether a machine should be given a certificate or not depends on knowing whether or not you own it, and inventory control is a surprisingly difficult problem in a lot of environments. It's not even just a matter of whether a machine should be given a certificate in the first place - if a machine is reported as lost or stolen, its trust should be revoked. Your inventory system needs to tie into your device state store in order to ensure that your proxies drop access.

And, worse, all of this depends on you being able to put stuff behind a proxy in the first place! If you're using third-party hosted services, that's a problem. In the absence of a proxy, trust decisions are probably made at login time. It's possible to tie user auth decisions to device identity and state (eg, a self-hosted SAML endpoint could do that before passing through to the actual ID provider), but that's still going to end up providing a bearer token of some sort that can potentially be exfiltrated, and will continue to be trusted even if the device state becomes invalid.

ZTA doesn't solve all problems, and there isn't a clear path to it doing so without significantly greater industry support. But a complete ZTA solution is significantly more powerful than a partial one. Verifying device identity is a step on the path to ZTA, but in the absence of device state verification it's only a step.


AMD's Pluton implementation seems to be controllable

Posted by Matthew Garrett on March 23, 2022 08:42 AM
I've been digging through the firmware for an AMD laptop with a Ryzen 6000 that incorporates Pluton for the past couple of weeks, and I've got some rough conclusions. Note that these are extremely preliminary and may not be accurate, but I'm going to try to encourage others to look into this in more detail. For those of you at home, I'm using an image from here, specifically version 309. The installer is happy to run under Wine, and if you tell it to "Extract" rather than "Install" it'll leave a file sitting in C:\DRIVERS\ASUS_GA402RK_309_BIOS_Update_20220322235241 which seems to have an additional 2K of header on it. Strip that and you should have something approximating a flash image.
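If you're following along, the stripping step is trivial; this sketch assumes "2K" really means 2048 bytes:

# Strip the apparent 2K header to recover something like a flash image.
# The input filename is from the extraction step above; the 2048-byte
# offset is my assumption of what "2K" means here.
with open("ASUS_GA402RK_309_BIOS_Update_20220322235241", "rb") as src:
    src.seek(2048)
    payload = src.read()

with open("flash.bin", "wb") as dst:
    dst.write(payload)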

Looking for UTF16 strings in this reveals something interesting:

<quote>Pluton (HSP) X86 Firmware Support
Enable/Disable X86 firmware HSP related code path, including AGESA HSP module, SBIOS HSP related drivers.
Auto - Depends on PcdAmdHspCoreEnable build value
NOTE: PSP directory entry 0xB BIT36 have the highest priority.
NOTE: This option will NOT put HSP hardware in disable state, to disable HSP hardware, you need setup PSP directory entry 0xB, BIT36 to 1.
// EntryValue[36] = 0: Enable, HSP core is enabled.
// EntryValue[36] = 1: Disable, HSP core is disabled then PSP will gate the HSP clock, no further PSP to HSP commands. System will boot without HSP.
</quote>
"HSP" here means "Hardware Security Processor" - a generic term that refers to Pluton in this case. This is a configuration setting that determines whether Pluton is "enabled" or not - my interpretation of this is that it doesn't directly influence Pluton, but disables all mechanisms that would allow the OS to communicate with it. In this scenario, Pluton has its firmware loaded and could conceivably be functional if the OS knew how to speak to it directly, but the firmware will never speak to it itself. I took a quick look at the Windows drivers for Pluton and it looks like they won't do anything unless the firmware wants to expose Pluton, so this should mean that Windows will do nothing.

So what about the reference to "PSP directory entry 0xB BIT36 have the highest priority"? The PSP is the AMD Platform Security Processor - it's an ARM core on the CPU package that boots before the x86. The PSP firmware lives in the same flash image as the x86 firmware, so the PSP looks for a header that points it towards the firmware it should execute. This gives a pointer to a "directory" - a list of different object types and where they're located in flash (there's a description of this for slightly older AMDs here). Type 0xb is treated slightly specially. Where most types contain the address of where the actual object is, type 0xb contains a 64-bit value that's interpreted as enabling or disabling various features - something AMD calls "soft fusing" (Intel have something similar that involves setting bits in the Firmware Interface Table). The PSP looks at the bits that are set here and alters its behaviour. If bit 36 is set, the PSP tells Pluton to turn itself off and will no longer send any commands to it.
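Assuming you've already located entry 0xb and pulled out its 8-byte value, checking the soft fuse is a one-liner; the little-endian layout here is my assumption:

import struct

def hsp_disabled(soft_fuse_bytes: bytes) -> bool:
    # Interpret the 64-bit value from PSP directory entry type 0xb.
    # Locating the PSP directory and the 0xb entry in the flash image is
    # elided; assume soft_fuse_bytes is the raw 8-byte value.
    (value,) = struct.unpack("<Q", soft_fuse_bytes)  # little-endian uint64 (assumed)
    # Per the firmware strings above: bit 36 set means the PSP gates the
    # HSP clock and sends it no further commands.
    return bool(value & (1 << 36))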

So, we have two mechanisms to disable Pluton - the PSP can tell it to turn itself off, or the x86 firmware can simply never speak to it or admit that it exists. Both of these imply that Pluton has started executing before it's shut down, so it's reasonable to wonder whether it can still do stuff. In the image I'm looking at, there's a blob starting at 0x0069b610 that appears to be firmware for Pluton - it contains chunks that appear to be the reference TPM2 implementation, and it broadly decompiles as valid ARM code. It should be viable to figure out whether it can do anything in the face of being "disabled" via either of the above mechanisms.

Unfortunately for me, the system I'm looking at does set bit 36 in the 0xb entry - as a result, Pluton is disabled before x86 code starts running and I can't investigate further in any straightforward way. The implication that the user-controllable mechanism for disabling Pluton merely disables x86 communication with it rather than turning it off entirely is a little concerning, although (assuming Pluton is behaving as a TPM rather than having an enhanced set of capabilities) skipping any firmware communication means the OS has no way to know what happened before it started running even if it has a mechanism to communicate with Pluton without firmware assistance. In that scenario it'd be viable to write a bootloader shim that just faked up the firmware measurements before handing control to the OS.

The bit 36 disabling mechanism seems more solid? Again, it should be possible to analyse the Pluton firmware to determine whether it actually pays attention to a disable command being sent. But even if it chooses to ignore that, if the PSP is in a position to just cut the clock to Pluton, it's not going to be able to do a lot. At that point we're trusting AMD rather than trusting Microsoft, but given that you're also trusting AMD to execute the code you're giving them to execute, it's hard to avoid placing trust in them.

Overall: I'm reasonably confident that systems that ship with Pluton disabled via setting bit 36 in the soft fuses are going to disable it sufficiently hard that the OS can't do anything about it. Systems that give the user an option to enable or disable it are a little less clear in that respect, and it's possible (but not yet demonstrated) that an OS could communicate with Pluton anyway. However, if that's true, and if the firmware never communicates with Pluton itself, the user could install a stub loader in UEFI that mimics the firmware behaviour and leaves the OS thinking everything was good when it absolutely is not.

So, assuming that Pluton in its current form on AMD has no capabilities outside those we know about, the disabling mechanisms are probably good enough. It's tough to make a firm statement on this before I have access to a system that doesn't just disable it immediately, so stay tuned for updates.


Stand With Ukraine

Posted by Martin Stransky on March 21, 2022 08:02 PM

Updates on the new generation of Fedora MediaWriter

Posted by Evzen Gasta on March 14, 2022 03:36 PM

In the past few months I have been developing the new generation of Fedora Media Writer (FMW), with a new UI written in Qt6 that uses native QtQuick styles on Windows and macOS. At this point I have a fully functional application with all the features of the current version.

The application can now be built for Windows and Linux. Linux builds are also available as a Flatpak for testing purposes. Bear in mind this is still not the final version and there might still be some issues.

To develop the new generation of FMW I had to learn, rework or update many things. A lot of them I encountered for the first time: a complex project, QML, CMake, Qt… First of all I started by removing deprecated code that is no longer supported in Qt6 and made sure FMW could still be built. After that I could start working on QML. I started making pages and gradually added basic functionality step by step.

Flatpak

For testing I’ve learned what is Flatpak and how Flatpak works. I’ve also learned how to create Flatpak using manifest file, this file was also needed to be updated. Thanks to GitHub CI we have available a test build made after every commit.


Windows

To produce the Windows build I had to update the existing script to support Qt6. That meant migrating from qmake to CMake and removing everything deprecated, as well as dropping the Adwaita theme, which is no longer used in the Windows build of FMW. I also needed to copy all dependencies to create an installer for Windows.

Testing was pretty difficult for me, because I make the Windows builds on Linux and debugging was only possible in QtCreator on Windows. I had to reboot from Linux to Windows and back multiple times.


The Windows version still needs some adjustments and fixes, such as restoring the USB drive. For that reason only the Linux version, which has full functionality, is available for testing. I would be happy to get any feedback either here, or on GitHub in the issues, mentioning that you use the nextgen version. You can get the development version here in the releases section.

Firmware Software Bill of Materials

Posted by Richard Hughes on March 10, 2022 10:46 AM

A Software Bill of Materials (aka SBoM) is something you've probably never heard of, but in future years they'll hopefully start to become more and more important. In May last year the US president issued an executive order titled Improving the Nation's Cybersecurity, which outlines the way that critical software used by various branches of the government should be more traceable and secure. One of the key pieces of information captured in a SBoM is "who built what from where", which in open source we're already familiar with, e.g. "Red Hat built your Linux kernel in a datacenter in the US" rather than "random person from the internet built your container on their laptop using Debian Sarge". In the former case we also always have the hash of the source archive that was used to build it, and a lot more. Where this concept breaks down is firmware, where lots of different entities build each subsection in different ways, usually due to commercial and technical constraints.

Firmware is often lumped together as one thing, both technically as in "one download" and conceptually when thinking about OS security. In reality a single firmware image might contain an FSP from Intel, several updated CPU microcode blobs for a few different CPUs, a CSME management engine, an embedded controller update, a UEFI system firmware and a lot more. The system firmware is then made up of different file volumes, each with a few dozen EFI "PEI" binaries for initial system start-up and then a couple of hundred (!) "DXE" binaries for things like pre-boot networking, fingerprint authentication, and mouse and keyboard input.

In the executive order from last May, firmware was explicitly excluded from the list of software that required a SBoM, on the logic that none of the infrastructure or specifications were in place, and it just wasn’t possible to do. The requirement for SBoM for boot-level firmware is expected in subsequent phases of the executive order. Needless to say I’ve been spending the last few months putting all the pieces together to make a firmware SBoM not just possible, but super easy for OEMs, ODMs and IBVs to generate.

The first problem to solve is how to embed the software ID (also known as SWID) metadata into each EFI binary. This is solved by putting coSWID metadata (an IETF specification) into a new COFF section called, unsurprisingly, "SBOM". This allows us to automatically capture some data at build time, for instance the tree hash and the files that were used to build the binary. This is what my friends at Eclypsium have been working on – so soon you can drop a top-level vendor.ini file in your EDK2 checkout with the correct vendor data (legal name, home page etc.) and then you can just build the tree and get everything inserted in this new PE section automatically. This gets us halfway there. The uSWID readme explains how to do this manually too, for people not using either the EDK2 build-system or a variant of it.
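From the consuming side, here's a small sketch of checking whether an EFI PE binary carries such a section, using the pefile Python module. The exact on-disk section name is my assumption (the post only says it's called "SBOM"), and the filename is illustrative:

import pefile

pe = pefile.PE("SomeDxeDriver.efi")  # hypothetical path
for section in pe.sections:
    # Section names are 8 bytes, null-padded; tolerate a leading dot.
    name = section.Name.rstrip(b"\x00").decode(errors="replace")
    if name.lstrip(".").upper() == "SBOM":
        print("found coSWID metadata:", len(section.get_data()), "bytes")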

The second problem is how to include SWID metadata for the blobs we either don't build or can't modify in any way, e.g. the FSP or uCode. For this there's an "external" version of the same coSWID metadata which has a simple header we can find in the firmware image. This can either be included in the file volume itself, or just included as a file alongside the binary deliverable. We just have to trust that the vendor includes the correct metadata there – and we're already trusting the vendor to implement things like SecureBoot correctly. The vendor can either use the [pip install] uswid command line (more examples in the uSWID readme) or, more helpfully, there's also a web-generator on the LVFS that can spit out the tiny coSWID blob with the correct header, ready to be included somewhere in the binary image.

Open source firmware like coreboot is also in the same boat of course, but here we have more flexibility in how to generate and include the SWID metadata in the image. My friends at Immune and 9elements are planning to work on this really soon, so we can have feature parity for free firmware like coreboot – even when non-free blobs are included in the image so that it can actually work on real hardware.

So, we have the metadata provision from the IBV, ODM and OEM all sprinkled around the update binary. What do we do then? When the binary is uploaded to the LVFS we decompress all the shards of the firmware, and do various checks. At this point we can look for coSWID metadata in the EFI binaries and also uSWID+coSWID metadata for the non-free blobs. From this we can save any of the detected SWID metadata to the per-component datastore, and make it available as a public SBoM HTML page and as a .zip archive containing the raw SWID XML data. It probably makes sense to have an external tool too, either a CLI utility in the lvfs-website project, or something in native golang — but that doesn't exist yet.

The vendor also gets the all-important "green tick", which means the customer buying the hardware knows that it's complying with the new requirements. Of course, we can't check whether the ODM has included the SWID metadata for all the binaries, or the SWID components for all of the non-free chunks, but it's good enough as a first pass. The next logical step would be a rule saying that the SWID green tick disappears if we detect CPU microcode but don't detect any microcode SWID metadata, etc. It would also be interesting to show a pie chart for a given firmware image, showing just where the firmware has been built from, and by whom, and how much remains unaccounted for. But, little steps first.

I think I’ve got agreement-in-principal from most of the major stakeholders, and I’ll be hopefully presenting this work alongside AMI to the UEFI forum in a few months time. This means we’re in a position to actually provide SBoM for all firmware when the next EO revision is announced, rather than the ecosystem collapsing into a ball of raw panic.

If you want to add uSWID metadata to your firmware please let me know how I can help, even if it's not available on the LVFS yet; I think this makes just as much sense for firmware that sits on a USB hub as it does for your system firmware. Comments welcome.

Deckard and LibreOffice

Posted by Caolán McNamara on March 05, 2022 09:55 PM
LibreOffice reuses the same .ui format that GTK uses, which suggests that deckard could be used to preview translations of LibreOffice dialogs.

Testing this out shows that it can be made to work. A few problems though:

1. We have various placeholder widgets which don't work in deckard because the widgets don't exist in gtk, so dialogs that use them can't display; something falls over with e.g. "Invalid object type 'SvSimpleTableContainer'". I had hoped I'd get placeholders by default on failure.
2. Our .po translation entries for the dialog strings all have autogenerated msgctxt fields which don't correspond to the blank default of the .ui files, so the msgctxt fields have to be removed, then msguniq run to remove duplicates, and the result can then be run through msgfmt to create a .mo that works with deckard to show web previews; a rough sketch of that clean-up follows.
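Here's what that clean-up could look like, assuming a file named dialog.po; msguniq and msgfmt are the standard gettext tools:

import subprocess

with open("dialog.po") as src, open("stripped.po", "w") as dst:
    for line in src:
        # Drop the autogenerated msgctxt lines so entries match the
        # blank default context of the .ui files.
        if not line.startswith("msgctxt "):
            dst.write(line)

subprocess.run(["msguniq", "stripped.po", "-o", "unique.po"], check=True)
subprocess.run(["msgfmt", "unique.po", "-o", "dialog.mo"], check=True)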

libei - adding support for passive contexts

Posted by Peter Hutterer on March 04, 2022 04:30 AM

A quick reminder: libei is the library for emulated input. It comes as a pair of C libraries, libei for the client side and libeis for the server side.

libei has been sitting mostly untouched since the last status update. There are two use-cases we need to solve for input emulation in Wayland - the ability to emulate input (think xdotool, or a Synergy/Barrier/InputLeap client) and the ability to capture input (think Synergy/Barrier/InputLeap server). The latter effectively blocked development in libei [1] - until that use-case was sorted there wasn't much point investing too much into libei, after all it might get thrown out as a bad idea. And epiphanies were as elusive as toilet paper and RATs, so nothing much got done. This changed about a week or two ago when the required lightbulb finally arrived, pre-lit from the factory.

So, the solution to the input capturing use-case is going to be a so-called "passive context" for libei. In the traditional [2] "active context" approach for libei we have the EIS implementation in the compositor and a client using libei to connect to that. The compositor sets up a seat or more, then some devices within that seat that typically represent the available screens. libei then sends events through these devices, causing input to appear in the compositor which moves the cursor around. In a typical and simple use-case you'd get a 1920x1080 absolute pointer device and a keyboard with a $layout keymap; libei then sends events to position the cursor and/or happily type away on-screen.

In the "passive context" <deja-vu> approach for libei we have the EIS implementation in the compositor and a client using libei to connect to that. The compositor sets up a seat or more, then some devices within that seat </deja-vu> that typically represent the physical devices connected to the host computer. libei then receives events from these devices, causing input to be generated in the libei client. In a typical and simple use-case you'd get a relative pointer device and a keyboard device with a $layout keymap, the compositor then sends events matching the relative input of the connected mouse or touchpad.

The two notable differences are thus: events flow from EIS to libei and the devices don't represent the screen but rather the physical [3] input devices.

This changes libei from a library for emulated input to an input event transport layer between two processes. On a much higher level than e.g. evdev or HID and with more contextual information (seats, devices are logically abstracted, etc.). And of course, the EIS implementation is always in control of the events, regardless which direction they flow. A compositor can implement an event filter or designate a key to break the connection to the libei client. In pseudocode, the compositor's input event processing function will look like this:


function handle_input_events():
    real_events = libinput.get_events()
    for e in real_events:
        if input_capture_active:
            send_event_to_passive_libei_client(e)
        else:
            process_event(e)

    emulated_events = eis.get_events_from_active_clients()
    for e in emulated_events:
        process_event(e)

Not shown here are the various appropriate filters and conversions in between (e.g. all relative events from libinput devices would likely be sent through the single relative device exposed on the EIS context). Again, the compositor is in control so it would be trivial to implement e.g. capturing of the touchpad only but not the mouse.

In the current design, a libei context can only be active or passive, not both. The EIS context is both; it's up to the implementation to disconnect active or passive clients if it doesn't support those.

Notably, the above only caters for the transport of input events; it doesn't actually make any decision on when to capture events. This is handled by the CaptureInput XDG Desktop Portal [4]. The idea here is that an application like a Synergy/Barrier/InputLeap server connects to the CaptureInput portal and requests a CaptureInput session. In that session it can define pointer barriers (left edge, right edge, etc.) and, in the future, maybe other triggers. In return it gets a libei socket that it can initialize a libei context from. When the compositor decides that the pointer barrier has been crossed, it re-routes the input events through the EIS context so they pop out in the application. Synergy/Barrier/InputLeap then converts that to the global position, passes it to the right remote Synergy/Barrier/InputLeap client and replays it there through an active libei context where it feeds into the local compositor.

Because the management of when to capture input is handled by the portal and the respective backends, it can be natively integrated into the UI. Because the actual input events are a direct flow between compositor and application, the latency should be minimal. Because it's a high-level event library, you don't need to care about hardware-specific details (unlike, say, the inputfd proposal from 2017). Because the negotiation of when to capture input is through the portal, the application itself can run inside a sandbox. And because libei only handles the transport layer, compositors that don't want to support sandboxes can set up their own negotiation protocol.

So overall, right now this seems like a workable solution.

[1] "blocked" is probably overstating it a bit but no-one else tried to push it forward, so..
[2] "traditional" is probably overstating it for a project that's barely out of alpha development
[3] "physical" is probably overstating it since it's likely to be a logical representation of the types of inputs, e.g. one relative device for all mice/touchpads/trackpoints
[4] "handled by" is probably overstating it since at the time of writing the portal is merely a draft of an XML file

GTK4: Using GtkMediaStream in LibreOffice

Posted by Caolán McNamara on February 18, 2022 05:30 PM

 

(Video: https://www.youtube.com/embed/UnjHd6CKVMw)


Here's today's GTK4 version of LibreOffice (towards 7.4), using the first cut of my efforts to use GtkMediaStream + GtkPicture for video playback. It works a lot better than this video of a video suggests ;-)

WebRTC: journey to make wayland screen sharing enabled by default

Posted by Jan Grulich on February 16, 2022 09:08 AM

While we have pretty good support for screen sharing on Wayland in WebRTC, which is included in browsers like Chromium or Firefox, it is still not enabled by default in Chromium and is kept behind a flag. Not only do you have to remember to enable it for every new configuration, but many users are not even aware it exists. This has been my main focus recently and I would like to share with you the steps that have been taken and the plans for the future.

What are the changes to expect in Chromium soon?

DMA-BUF improvements/fixes:

Last year I landed proper DMA-BUF support in WebRTC, which made things way faster. It was working, but it was not perfect and there were some corner cases where it might not work at all. Here are the changes I made recently:

  • Advertise DMA-BUF support when it is really supported. Older versions of PipeWire don't handle the new way of DMA-BUF negotiation, so it shouldn't be used in those cases. Using DMA-BUF modifiers also requires recent versions of PipeWire on both sides.
  • Implemented stream renegotiation. In situations where we fail to import a DMA-BUF with a given modifier, we drop that modifier and try to renegotiate the stream parameters, going with a different modifier or falling back to shared memory buffers if we fail completely.
  • Make sure to import DMA-BUFs with the correct render node. In multi-GPU setups we always picked the first render node to import DMA-BUFs, but it can happen that they were actually produced by a different render node, and for that reason we might fail to import them. We now try to get the default EGLDisplay, which should be the same one used by the wayland compositor, so we should be using the same render node.

Better mouse cursor support:

Until now the mouse cursor was part of the screen content. This means that every time you moved your mouse cursor, we had to update the whole image, which is very inefficient. The API in WebRTC allows you to implement a MouseCursorMonitor which can be used to track mouse changes only, and each platform can have both MouseCursorMonitor and DesktopCapturer implementations combined in a DesktopAndCursorComposer to get the complete image, and this all works automatically like magic. Unlike the X11 implementation, our only option is to get everything from the one PipeWire stream we connect to, and there was no way to share it from the DesktopCapturer implementation so it could be used by the MouseCursorMonitor implementation. I had to split DesktopCapturer into separate xdg-desktop-portal and PipeWire implementations. The code for PipeWire is now a SharedScreenCastStream class which is shared through DesktopCaptureOptions. This is a set of parameters associated with each capturer instance, and luckily it is also passed to MouseCursorMonitor, so we can access the already initialized PipeWire stream and get the cursor data from there. Implementing MouseCursorMonitor with SharedScreenCastStream was then a piece of cake.


This should again significantly improve performance of screen sharing, because moving with a mouse over a static screen content doesn’t need full screen content update.

Misc:

Last but not least, I'm now in touch with Google developers who help me review all my changes and discuss the current state, issues I have, etc. in the monthly meetings we have. The plan is to finally make this enabled by default, hopefully in the first half of this year. There are still some things that need to be solved before it can be enabled and there is a lot of work ahead, but things look promising.

Plans for the future:

  • Implement stream restoration
    • this will allow us to skip the second portal dialog, and I already have a plan in my head for how to do this in WebRTC. This is currently only supported by xdg-desktop-portal-gnome; xdg-desktop-portal-kde lacks this functionality.
  • Improve UX of the Chromium screen sharing dialog
  • Write tests for all PipeWire/portal code in WebRTC

Even though WebRTC is used in Firefox, I mostly talk about Chromium, because Firefox doesn't use the most recent WebRTC and will need to pick up all the changes I made, or rebase to a newer WebRTC, in order to have them. Firefox also has PipeWire/Wayland screen sharing enabled by default and doesn't have the UX issues, as there is no internal screen sharing dialog like in Chromium.

I hope all these changes will make your experience better, and that next time you read a new blog post I will be informing you about the end of this journey.

The xf86-input-wacom driver hits 1.0

Posted by Peter Hutterer on February 15, 2022 05:24 AM

After roughly 20 years and counting up to 0.40 in release numbers, I've decided to call the next version of the xf86-input-wacom driver the 1.0 release. [1] This cycle has seen a bulk of development (>180 patches), which is roughly as much as the last 12 releases together. None of these patches actually added user-visible features, so let's talk about technical debt and what turned out to be an interesting way of reducing it.

The wacom driver's git history goes back to 2002 and the current batch of maintainers (Ping, Jason and I) have all been working on it for one to two decades. It used to be a Wacom-only driver but with the improvements made to the kernel over the years the driver should work with most tablets that have a kernel driver, albeit some of the more quirky niche features will be more limited (but your non-Wacom devices probably don't have those features anyway).

The one constant was always: the driver was extremely difficult to test, something common to all X input drivers. Development is a cycle of restarting the X server a billion times, and testing is mostly plugging hardware in and moving things around in the hope that you can spot the bugs. On a driver that doesn't move much, this isn't necessarily a problem. Until a bug comes along that requires some core rework of the event handling - in the kernel, in libinput and, yes, in the wacom driver.

After years of libinput development, I wasn't really in the mood for the whole "plug every tablet in and test it, for every commit". In a rather caffeine-driven development cycle [2], the driver was separated into two logical entities: the core driver and the "frontend". The default frontend is the X11 one which is now a relatively thin layer around the core driver parts, primarily to translate events into the X Server's API. So, not unlike libinput + xf86-input-libinput in terms of architecture. In ascii-art:


                                           |
                   +--------------------+  | big giant
/dev/input/event0->| core driver | x11  |->| X server
                   +--------------------+  | process
                                           |

Now, that logical separation means we can have another frontend, which I implemented as a relatively light GObject wrapper; it is now a library creatively called libgwacom:



                   +-----------------------+  |
/dev/input/event0->| core driver | gwacom  |--| tools or test suites
                   +-----------------------+  |

This isn't a public library or API and it's very much focused on the needs of the X driver, so there are some peculiarities in there. What it allows us to do, though, is ship a new wacom-record tool that can hook onto event nodes and print the events as they come out of the driver. So instead of having to restart X and move and click things, you get this:

$ ./builddir/wacom-record
wacom-record:
  version: 0.99.2
  git: xf86-input-wacom-0.99.2-17-g404dfd5a
  device:
    path: /dev/input/event6
    name: "Wacom Intuos Pro M Pen"
  events:
  - source: 0
    event: new-device
    name: "Wacom Intuos Pro M Pen"
    type: stylus
    capabilities:
      keys: true
      is-absolute: true
      is-direct-touch: false
      ntouches: 0
      naxes: 6
      axes:
        - {type: x        , range: [    0, 44800], resolution: 200000}
        - {type: y        , range: [    0, 29600], resolution: 200000}
        - {type: pressure , range: [    0, 65536], resolution: 0}
        - {type: tilt_x   , range: [  -64,    63], resolution: 57}
        - {type: tilt_y   , range: [  -64,    63], resolution: 57}
        - {type: wheel    , range: [ -900,   899], resolution: 0}
  ...
  - source: 0
    mode: absolute
    event: motion
    mask: [ "x", "y", "pressure", "tilt-x", "tilt-y", "wheel" ]
    axes: { x: 28066, y: 17643, pressure: 0, tilt: [ -4, 56], rotation: 0, throttle: 0, wheel: -108, rings: [ 0, 0] }

This is YAML, which means we can process the output for comparison or just to search for things.

A tool to quickly analyse data makes for faster development iterations but it's still a far cry from reliable regression testing (and writing a test suite is a daunting task at best). But one nice thing about GObject is that it's accessible from other languages, including Python. So our test suite can be in Python, using pytest and all its capabilities, plus all the advantages Python has over C. Most of driver testing comes down to: create a uinput device, set up the driver with some options, push events through that device and verify they come out of the driver in the right sequence and format. I don't need C for that. So there's a pull request sitting out there doing exactly that - adding a pytest test suite for a 20-year-old X driver written in C. That this is a) possible and b) a lot less work than expected got me quite unreasonably excited. If you do have to maintain an old C library, maybe consider whether it's possible to do the same, because there's nothing like the warm fuzzy feeling a green tick on a CI pipeline gives you.
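To make the shape of such a test concrete, here's a hedged sketch using python-evdev for the uinput side; the gwacom module, attach() and drain() are invented stand-ins for whatever the real bindings expose:

import pytest
from evdev import UInput, AbsInfo, ecodes as e

import gwacom  # hypothetical Python bindings for libgwacom via GObject introspection

@pytest.fixture
def fake_tablet():
    # A minimal fake tablet: one absolute axis plus a pen tool button.
    caps = {
        e.EV_ABS: [(e.ABS_X, AbsInfo(value=0, min=0, max=44800,
                                     fuzz=0, flat=0, resolution=200))],
        e.EV_KEY: [e.BTN_TOOL_PEN],
    }
    with UInput(caps, name="Fake Tablet Pen") as ui:
        yield ui

def test_motion_is_reported_as_motion(fake_tablet):
    driver = gwacom.attach(fake_tablet.device.path)  # hypothetical API
    fake_tablet.write(e.EV_ABS, e.ABS_X, 12345)
    fake_tablet.syn()
    events = driver.drain()                          # hypothetical API
    assert events[-1].event == "motion"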

[1] As scholars of version numbers know, they make as much sense as your stereotypical uncle's facebook opinion, so why not.
[2] The Colombian GDP probably went up a bit

How to use libportal/libportal-qt

Posted by Jan Grulich on February 10, 2022 11:28 AM

There was a blog post from Peter Hutterer about Flatpak portals posted a few months back. Peter explained what portals are and how they work. Portals are used mostly because of security and sandbox/Wayland restrictions. Many times your only way to get access outside (opening a file, sending a notification, sharing a screen, etc.) is to use a portal. For most use-cases applications or developers don't need to care about them, as their support is usually implemented in the libraries they use. For example Qt and GTK use portals internally, so apps can still use the same APIs as before and don't need to worry about not working in sandboxed environments. BUT there are still scenarios where libraries have insufficient or no portal support, or where different options are desired, so what are the options if you still need to use portals?

  1. Do everything yourself, which means you will implement all the DBus calls and handling yourself.
  2. Use a library. The most logical choice would be libportal, but there is also a project called ASHPD for Rust users.

What is libportal and libportal-qt?

The libportal library provides GIO-style async APIs for Flatpak portals. It hides all the DBus complexity users would face when using portals directly and provides a user-friendly library instead. You might think that libportal-qt is the same thing, just with Qt-style APIs, but the idea behind it is that each toolkit (Gtk3, Gtk4, Qt5, Qt6) has a different way to get a window handle, which is needed to associate portal dialogs with the app that invoked them. So libportal-qt just provides a way to get an XdpParent object from a QWindow. As a C++/Qt developer I don't mind using C/Glib APIs and have used them many times, but there is still one speciality that defeats me every time: my friend GVariant. Some of the portal APIs in libportal expect a GVariant for all the complex structures; for example, to specify a filter option for the OpenFile() call from the filechooser portal, you have to build a very complex GVariant based on the DBus specification.

Remember I told you libportal-qt doesn't offer Qt-style APIs? This is not necessarily true, because I implemented all the complex structures you will have to pass in most of the portals, and implemented functions that will return them as GVariants, so you don't need to touch GVariants at all.

How to use libportal-qt?

First of all, all libportal flavours have a pkgconfig file installed so it's easy to use them from any build system; you just need to search for libportal-qt5 (we don't have a -qt6 version yet).

And what does the code look like? For example, let's say you want to open an image:

// Creates a filter rule, this can be a Mimetype or Pattern.
XdpQt::FileChooserFilterRule rule;
rule.type = XdpQt::FileChooserFilterRuleType::Mimetype;
rule.rule = QStringLiteral("image/jpeg");

// Create a filter with our rules, we will then pass it to OpenFile() call as GVariant.
XdpQt::FileChooserFilter filter;
filter.label = QStringLiteral("Images");
filter.rules << rule;

// Create a GVariant from our filter. This will result into variant in form of:
// "[('Images', [(1, 'image/jpeg')])]"
g_autoptr(GVariant) filterVariant = XdpQt::filechooserFiltersToGVariant({filter});

// Get XdpParent to associate this call (portal dialog) with our window.
XdpParent *parent = xdp_parent_new_qt(m_mainWindow->windowHandle());

// Finally open a file. XdpQt::globalPortalObject() is another convenient function 
// that creates a global instance of XdpPortal object so you don't need to take care
// of creating it yourself. For some of the arguments we just pass nullptr
// when we don't want to specify them.
xdp_portal_open_file(XdpQt::globalPortalObject() /*XdpPortal object*/,
                                  parent /*XdpParent object*/, "Title", filterVariant /*filters*/,
                                  nullptr /*current_filter*/, nullptr /*choices*/, 
                                  XDP_OPEN_FILE_FLAG_NONE /*flags*/, nullptr /*cancellable*/, 
                                  openedFile /*callback*/, this /*data*/);
xdp_parent_free(parent);

// Then the callback would look like this, eg.
static void openedFile(GObject *object, GAsyncResult *result, gpointer data) {
    g_autoptr(GError) error = nullptr;
    g_autoptr(GVariant) ret = 
        xdp_portal_open_file_finish(XdpQt::globalPortalObject(), result, &error);

    if (ret) {
        // Another convenient function that will get you uris and choices from 
        // GVariant returned by xdp_portal_open_file() call.
        XdpQt::FileChooserResult result = XdpQt::filechooserResultFromGVariant(ret);
        
        // Do whatever you want to do with the result. Here we just print opened selected files.
        qDebug() << result.uris;
    }
}

As you can see, no GVariant got hurt and you can easily open a file without any GVariant knowledge. Besides the FileChooser portal helpers, we also have Notification portal helpers, because serializing icons and buttons is also non-trivial. For the rest of the portals you either don't need to use complex GVariants, so you can use them easily without helper functions the same way as shown above, or, like ScreenCast or RemoteDesktop, they are not used that often and we don't have helper functions for those just yet.

I hope you find this helpful in case you want to join this world. The libportal project is hosted on GitHub in case you want to try it right now (this is still not part of any stable release; it will be in libportal 0.6), report a bug, or just look at my GVariant helpers to see what I spare you from.

“Videos” de-clutter-ification

Posted by Bastien Nocera on February 05, 2022 05:29 PM

(I nearly went with clutterectomy, but that would be doing our old servant project a disservice.)

Yesterday, I finally merged the work-in-progress branch porting totem to GStreamer's GTK GL sink widget, undoing a lot of the work done in 2011 and 2014 to port the video widget and then to finally make use of its features.

But GTK has been modernised (in GTK3, and in GTK4 even more so), GStreamer grew a collection of GL plugins, Wayland and VA-API matured, and clutter (and its siblings clutter-gtk and clutter-gst) didn't get the resources they needed to follow.

A screenshot with practically no changes, as expected

The list of bug fixes and enhancements is substantial:

  • Makes some files that threw shaders warnings playable
  • Fixes resize lag for the widgets embedded in the video widget
  • Fixes interactions with widgets on some HDR capable systems, or even widgets disappearing sometimes (!)
  • Gets rid of the floating blank windows under Wayland
  • Should help with tearing, although that's highly dependent on the system
  • Hi-DPI support
  • Hardware acceleration (through libva)

Until the port to GTK4, we expect an overall drop in performance on systems without VA-API support; the GTK4 port should bring it on par with the fastest players available for GNOME.

You can install a Preview version right now by running:

$ flatpak install --user https://flathub.org/beta-repo/appstream/org.gnome.Totem.Devel.flatpakref

and filing bugs in the GNOME GitLab.

Next stop, a GTK4 port!

Boot Guard and PSB have user-hostile defaults

Posted by Matthew Garrett on January 17, 2022 04:37 AM
Compromising an OS without it being detectable is hard. Modern operating systems support the imposition of a security policy or the launch of some sort of monitoring agent sufficiently early in boot that even if you compromise the OS, you're probably going to have left some sort of detectable trace[1]. You can avoid this by attacking the lower layers - if you compromise the bootloader then it can just hotpatch a backdoor into the kernel before executing it, for instance.

This is avoided via one of two mechanisms. Measured boot (such as TPM-based Trusted Boot) makes a tamper-proof cryptographic record of what the system booted, with each component in turn creating a measurement of the next component in the boot chain. If a component is tampered with, its measurement will be different. This can be used to either prevent the release of a cryptographic secret if the boot chain is modified (for instance, using the TPM to encrypt the disk encryption key), or can be used to attest the boot state to another device which can tell you whether you're safe or not. The other approach is verified boot (such as UEFI Secure Boot), where each component in the boot chain verifies the next component before executing it. If the verification fails, execution halts.
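The measured-boot half is easy to illustrate; this toy snippet mirrors how a TPM PCR extend works (new value = hash of the old value concatenated with the measurement digest):

import hashlib

def extend(pcr: bytes, component: bytes) -> bytes:
    # TPM PCR extend: new = H(old || H(component)). You can only ever
    # append to the record, never rewrite it.
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()

pcr = bytes(32)  # PCRs start out zeroed
for blob in (b"bootloader", b"kernel", b"initrd"):
    pcr = extend(pcr, blob)

# Change any component and the final PCR value differs, so a secret sealed
# against the original value is no longer released.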

In both cases, each component in the boot chain measures and/or verifies the next. But something needs to be the first link in this chain, and traditionally this was the system firmware. Which means you could tamper with the system firmware and subvert the entire process - either have the firmware patch the bootloader in RAM after measuring or verifying it, or just load a modified bootloader and lie about the measurements or ignore the verification. Attackers had already been targeting the firmware (Hacking Team had something along these lines, although this was pre-secure boot so just dropped a rootkit into the OS), and given a well-implemented measured and verified boot chain, the firmware becomes an even more attractive target.

Intel's Boot Guard and AMD's Platform Secure Boot attempt to solve this problem by moving the validation of the core system firmware to an (approximately) immutable environment. Intel's solution involves the Management Engine, a separate x86 core integrated into the motherboard chipset. The ME's boot ROM verifies a signature on its firmware before executing it, and once the ME is up it verifies that the system firmware's bootblock is signed using a public key that corresponds to a hash blown into one-time programmable fuses in the chipset. What happens next depends on policy - it can either prevent the system from booting, allow the system to boot to recover the firmware but automatically shut it down after a while, or flag the failure but allow the system to boot anyway. Most policies will also involve a measurement of the bootblock being pushed into the TPM.

AMD's Platform Secure Boot is slightly different. Rather than the root of trust living in the motherboard chipset, it's in AMD's Platform Security Processor which is incorporated directly onto the CPU die. Similar to Boot Guard, the PSP has ROM that verifies the PSP's own firmware, and then that firmware verifies the system firmware signature against a set of blown fuses in the CPU. If that fails, system boot is halted. I'm having trouble finding decent technical documentation about PSB, and what I have found doesn't mention measuring anything into the TPM - if this is the case, PSB only implements verified boot, not measured boot.

What's the practical upshot of this? The first is that you can't replace the system firmware with anything that doesn't have a valid signature, which effectively means you're locked into firmware the vendor chooses to sign. This prevents replacing the system firmware with either a replacement implementation (such as Coreboot) or a modified version of the original implementation (such as firmware that disables locking of CPU functionality or removes hardware allowlists). In this respect, enforcing system firmware verification works against the user rather than benefiting them.
Of course, it also prevents an attacker from doing the same thing, but while this is a real threat to some users, I think it's hard to say that it's a realistic threat for most users.

The problem is that vendors are shipping with Boot Guard and (increasingly) PSB enabled by default. In the AMD case this causes another problem - because the fuses are in the CPU itself, a CPU that's had PSB enabled is no longer compatible with any motherboards running firmware that wasn't signed with the same key. If a user wants to upgrade their system's CPU, they're effectively unable to sell the old one. But in both scenarios, the user's ability to control what their system is running is reduced.

As I said, the threat that these technologies seek to protect against is real. If you're a large company that handles a lot of sensitive data, you should probably worry about it. If you're a journalist or an activist dealing with governments that have a track record of targeting people like you, it should probably be part of your threat model. But otherwise, the probability of you being hit by a purely userland attack is so ludicrously high compared to you being targeted this way that it's just not a big deal.

I think there's a more reasonable tradeoff than where we've ended up. Tying things like disk encryption secrets to TPM state means that if the system firmware is measured into the TPM prior to being executed, we can at least detect that the firmware has been tampered with. In this case nothing prevents the firmware being modified, there's just a record in your TPM that it's no longer the same as it was when you encrypted the secret. So, here's what I'd suggest:

1) The default behaviour of technologies like Boot Guard or PSB should be to measure the firmware signing key and whether the firmware has a valid signature into PCR 7 (the TPM register that is also used to record which UEFI Secure Boot signing key is used to verify the bootloader).
2) If the PCR 7 value changes, the disk encryption key release will be blocked, and the user will be redirected to a key recovery process. This should include remote attestation, allowing the user to be informed that their firmware signing situation has changed.
3) Tooling should be provided to switch the policy from merely measuring to verifying, and users at meaningful risk of firmware-based attacks should be encouraged to make use of this tooling

This would allow users to replace their system firmware at will, at the cost of having to re-seal their disk encryption keys against the new TPM measurements. It would provide enough information that, in the (unlikely for most users) scenario that their firmware has actually been modified without their knowledge, they can identify that. And it would allow users who are at high risk to switch to a higher security state, and for hardware that is explicitly intended to be resilient against attacks to have different defaults.
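As a sketch of the unlock path implied by steps 1 and 2 - the tpm object, its methods and recovery_flow are placeholders, not a real API:

def unlock_disk(tpm, sealed_key_blob: bytes, sealed_pcr7: bytes) -> bytes:
    # Release the disk key only while PCR 7 matches the value it was
    # sealed against; otherwise route the user into attested recovery.
    current = tpm.read_pcr(7)  # hypothetical TPM wrapper
    if current != sealed_pcr7:
        # The firmware signing key or signature validity changed: don't
        # release the key, send the user through the recovery process.
        return recovery_flow(tpm)  # hypothetical: remote attestation etc.
    return tpm.unseal(sealed_key_blob)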

This is frustratingly close to possible with Boot Guard, but I don't think it's quite there. Before you've blown the Boot Guard fuses, the Boot Guard policy can be read out of flash. This means that you can drop a Boot Guard configuration into flash telling the ME to measure the firmware but not prevent it from running. But there are two problems remaining:

1) The measurement is made into PCR 0, and PCR 0 changes every time your firmware is updated. That makes it a bad default for sealing encryption keys.
2) It doesn't look like the policy is measured before being enforced. This means that an attacker can simply reflash modified firmware with a policy that disables measurement and then make a fake measurement that makes it look like the firmware is ok.

Fixing this seems simple enough - the Boot Guard policy should always be measured, and measurements of the policy and the signing key should be made into a PCR other than PCR 0. If an attacker modified the policy, the PCR value would change. If an attacker modified the firmware without modifying the policy, the PCR value would also change. People who are at high risk would run an app that would blow the Boot Guard policy into fuses rather than just relying on the copy in flash, and enable verification as well as measurement. Now if an attacker tampers with the firmware, the system simply refuses to boot and the attacker doesn't get anything.

Things are harder on the AMD side. I can't find any indication that PSB supports measuring the firmware at all, which obviously makes this approach impossible. I'm somewhat surprised by that, and so wouldn't be surprised if it does do a measurement somewhere. If it doesn't, there's a rather more significant problem - if a system has a socketed CPU, and someone has sufficient physical access to replace the firmware, they can just swap out the CPU as well with one that doesn't have PSB enabled. Under normal circumstances the system firmware can detect this and prompt the user, but given that the attacker has just replaced the firmware we can assume that they'd do so with firmware that doesn't decide to tell the user what just happened. In the absence of better documentation, it's extremely hard to say that PSB actually provides meaningful security benefits.

So, overall: I think Boot Guard protects against a real-world attack that matters to a small but important set of targets. I think most of its benefits could be provided in a way that still gave users control over their system firmware, while also permitting high-risk targets to opt-in to stronger guarantees. Based on what's publicly documented about PSB, it's hard to say that it provides real-world security benefits for anyone at present. In both cases, what's actually shipping reduces the control people have over their systems, and should be considered user-hostile.

[1] Assuming that someone's both turning this on and actually looking at the data produced


Pluton is not (currently) a threat to software freedom

Posted by Matthew Garrett on January 09, 2022 12:59 AM
At CES this week, Lenovo announced that their new Z-series laptops would ship with AMD processors that incorporate Microsoft's Pluton security chip. There's a fair degree of cynicism around whether Microsoft have the interests of the industry as a whole at heart or not, so unsurprisingly people have voiced concerns about Pluton allowing for platform lock-in and future devices no longer booting non-Windows operating systems. Based on what we currently know, I think those concerns are understandable but misplaced.

But first it's helpful to know what Pluton actually is, and that's hard because Microsoft haven't actually provided much in the way of technical detail. The best I've found is a discussion of Pluton in the context of Azure Sphere, Microsoft's IoT security platform. This, in association with the block diagrams on page 12 and 13 of this slidedeck, suggest that Pluton is a general purpose security processor in a similar vein to Google's Titan chip. It has a relatively low powered CPU core, an RNG, and various hardware cryptography engines - there's nothing terribly surprising here, and it's pretty much the same set of components that you'd find in a standard Trusted Platform Module of the sort shipped in pretty much every modern x86 PC. But unlike Titan, Pluton seems to have been designed with the explicit goal of being incorporated into other chips, rather than being a standalone component. In the Azure Sphere case, we see it directly incorporated into a Mediatek chip. In the Xbox Series devices, it's incorporated into the SoC. And now, we're seeing it arrive on general purpose AMD CPUs.

Microsoft's announcement says that Pluton can be shipped in three configurations: as the Trusted Platform Module; as a security processor used for non-TPM scenarios like platform resiliency; or OEMs can choose to ship with Pluton turned off. What we're likely to see to begin with is the first of these - Pluton will run firmware that exposes a Trusted Computing Group compatible TPM interface. This is almost identical to the status quo. Microsoft have required that all Windows certified hardware ship with a TPM for years now, but for cost reasons this is often not in the form of a separate hardware component. Instead, both Intel and AMD provide support for running the TPM stack on a component separate from the main execution cores on the system - for Intel, this TPM code runs on the Management Engine integrated into the chipset, and for AMD on the Platform Security Processor that's integrated into the CPU package itself.

So in this respect, Pluton changes very little; the only difference is that the TPM code is running on hardware dedicated to that purpose, rather than alongside other code. Importantly, in this mode Pluton will not do anything unless the system firmware or OS ask it to. Pluton cannot independently block the execution of any other code - it knows nothing about the code the CPU is executing unless explicitly told about it. What the OS can certainly do is ask Pluton to verify a signature before executing code, but the OS could also just verify that signature itself. Windows can already be configured to reject software that doesn't have a valid signature. If Microsoft wanted to enforce that they could just change the default today, there's no need to wait until everyone has hardware with Pluton built-in.

The two things that seem to cause people concerns are remote attestation and the fact that Microsoft will be able to ship firmware updates to Pluton via Windows Update. I've written about remote attestation before, so won't go into too many details here, but the short summary is that it's a mechanism that allows your system to prove to a remote site that it booted a specific set of code. What's important to note here is that the TPM (Pluton, in the scenario we're talking about) can't do this on its own - remote attestation can only be triggered with the aid of the operating system. Microsoft's Device Health Attestation is an example of remote attestation in action, and the technology definitely allows remote sites to refuse to grant you access unless you booted a specific set of software. But there are two important things to note here: first, remote attestation cannot prevent you from booting whatever software you want, and second, as evidenced by Microsoft already having a remote attestation product, you don't need Pluton to do this! Remote attestation has been possible since TPMs started shipping over two decades ago.
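As a rough illustration of the flow (this is not the actual TPM protocol – real quotes use an asymmetric attestation key whose public half is certified, whereas this sketch substitutes an HMAC for the signature): the verifier supplies a fresh nonce, the OS asks the TPM to sign the current PCR values together with that nonce, and the verifier checks the signature and compares the PCRs against its policy. Note that every client-side step runs at the OS's pleasure; nothing here stops the machine booting whatever it likes.

import hmac, hashlib, os

ATTESTATION_KEY = os.urandom(32)  # stand-in for a key that never leaves the TPM

def tpm_quote(pcr_values, nonce):
    # The OS asks the TPM to sign the PCR values plus a verifier-chosen nonce
    blob = b"".join(pcr_values) + nonce
    return hmac.new(ATTESTATION_KEY, blob, hashlib.sha256).digest()

def verifier_check(pcr_values, nonce, signature, expected_pcrs):
    # Accept only a fresh, correctly signed quote over the expected PCR values
    expected_sig = hmac.new(ATTESTATION_KEY, b"".join(pcr_values) + nonce,
                            hashlib.sha256).digest()
    return hmac.compare_digest(signature, expected_sig) and pcr_values == expected_pcrs

nonce = os.urandom(16)
pcrs = [hashlib.sha256(b"known-good boot chain").digest()]
assert verifier_check(pcrs, nonce, tpm_quote(pcrs, nonce), expected_pcrs=pcrs)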

The other concern is Microsoft having control over the firmware updates. The context here is that TPMs are not magically free of bugs, and sometimes these can have security consequences. One example is Infineon TPMs producing weak RSA keys, a vulnerability that could be rectified by a firmware update to the TPM. Unfortunately these updates had to be issued by the device manufacturer rather than Infineon being able to do so directly. This meant users had to wait for their vendor to get around to shipping an update, something that might not happen at all if the machine was sufficiently old. From a security perspective, being able to ship firmware updates for the TPM without them having to go through the device manufacturer is a huge win.

Microsoft's obviously in a position to ship a firmware update that modifies the TPM's behaviour - there would be no technical barrier to them shipping code that resulted in the TPM just handing out your disk encryption secret on demand. But Microsoft already control the operating system, so they already have your disk encryption secret. There's no need for them to backdoor the TPM to give them something that the TPM's happy to give them anyway. If you don't trust Microsoft then you probably shouldn't be running Windows, and if you're not running Windows Microsoft can't update the firmware on your TPM.

So, as of now, Pluton running firmware that makes it look like a TPM just isn't a terribly interesting change to where we are already. It can't block you running software (either apps or operating systems). It doesn't enable any new privacy concerns. There's no mechanism for Microsoft to forcibly push updates to it if you're not running Windows.

Could this change in future? Potentially. Microsoft mention another use-case for Pluton "as a security processor used for non-TPM scenarios like platform resiliency", but don't go into any more detail. At this point, we don't know the full set of capabilities that Pluton has. Can it DMA? Could it play a role in firmware authentication? There are scenarios where, in theory, a component such as Pluton could be used in ways that would make it more difficult to run arbitrary code. It would be reassuring to hear more about what the non-TPM scenarios are expected to look like and what capabilities Pluton actually has.

But let's not lose sight of something more fundamental here. If Microsoft wanted to block free operating systems from new hardware, they could simply mandate that vendors remove the ability to disable secure boot or modify the key databases. If Microsoft wanted to prevent users from being able to run arbitrary applications, they could just ship an update to Windows that enforced signing requirements. If they want to be hostile to free software, they don't need Pluton to do it.

(Edit: it's been pointed out that I kind of gloss over the fact that remote attestation is a potential threat to free software, as it theoretically allows sites to block access based on which OS you're running. There's various reasons I don't think this is realistic - one is that there's just way too much variability in measurements for it to be practical to write a policy that's strict enough to offer useful guarantees without also blocking a number of legitimate users, and the other is that you can just pass the request through to a machine that is running the appropriate software and have it attest for you. The fact that nobody has actually bothered to use remote attestation for this purpose even though most consumer systems already ship with TPMs suggests that people generally agree with me on that)


Update on Linux hibernation support when lockdown is enabled

Posted by Matthew Garrett on December 31, 2021 03:36 AM
Some time back I wrote up a description of my proposed (and implemented) solution for making hibernation work under Linux even within the bounds of the integrity model. It's been a while, so here's an update.

The first is that localities just aren't an option. It turns out that they're optional in the spec, and TPMs are entirely permitted to say they don't support them. The only time they're likely to work is on platforms that support DRTM implementations like TXT. Most consumer hardware doesn't fall into that category, so we don't get to use that solution. Unfortunate, but, well.

The second is that I'd ignored an attack vector. If the kernel is configured to restrict access to PCR 23, then yes, an attacker is never able to modify PCR 23 to be in the same state it would be if hibernation were occurring and the key certification data will fail to validate. Unfortunately, an attacker could simply boot into an older kernel that didn't implement the PCR 23 restriction, and could fake things up there (yes, this is getting a bit convoluted, but the entire point here is to make this impossible rather than just awkward). Once PCR 23 was in the correct state, they would then be able to write out a new swap image, boot into a new kernel that supported the secure hibernation solution, and have that resume successfully in the (incorrect) belief that the image was written out in a secure environment.

This felt like an awkward problem to fix. We need to be able to distinguish between the kernel having modified the PCRs and userland having modified the PCRs, and we need to be able to do this without modifying any kernels that have already been released[1]. The normal approach to determining whether an event occurred in a specific phase of the boot process is to "cap" the PCR - extend it with a known value that indicates a transition between stages of the boot process. Any events that occur before the cap event must have occurred in the previous stage of boot, and since the final PCR value depends on the order of measurements and not just the contents of those measurements, if a PCR is capped before userland runs, userland can't fake the same PCR value afterwards. If Linux capped a PCR before userland started running, we'd be able to place a measurement there before the cap occurred and then prove that that extension occurred before userland had the opportunity to interfere. We could simply place a statement that the kernel supported the PCR 23 restrictions there, and we'd be fine.

Unfortunately Linux doesn't currently do this, and adding support for doing so doesn't fix the problem - if an attacker boots a kernel that doesn't cap a PCR, they can just cap it themselves from userland. So, we're faced with the same problem: booting an older kernel allows the system to be placed in an identical state to the current kernel, and a fake hibernation image can be written out. Solving this required a PCR that was being modified after kernel code was running, but before userland was started, even with existing kernels.

Thankfully, there is one! PCR 5 is defined as containing measurements related to boot management configuration and data. One of the measurements it contains is the result of the UEFI ExitBootServices() call. ExitBootServices() is called at the transition from the UEFI boot environment to the running OS, and the kernel contains code that executes before it. So, if we measure an assertion regarding whether or not we support restricted access to PCR 23 into PCR 5 before we call ExitBootServices(), this will prevent userspace from spoofing us (because userspace will only be able to extend PCR 5 after the firmware extended PCR 5 in response to ExitBootServices() being called). Obviously this depends on the firmware actually performing the PCR 5 extension when ExitBootServices() is called, but if firmware's out of spec then I don't think there's any real expectation of it being secure enough for any of this to buy you anything anyway.
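The reason this works is the order-dependence of PCR extends, which a few lines of Python can demonstrate (illustrative values only): the attacker's userspace can only extend PCR 5 after the firmware's ExitBootServices() event has already been recorded, so it can never reproduce a state in which the kernel's assertion came first.

import hashlib

def extend(pcr, data):
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()

ZERO = b"\x00" * 32
ASSERTION = b"kernel restricts PCR 23"
EBS_EVENT = b"ExitBootServices()"

# Well-behaved kernel: assertion measured before ExitBootServices() fires
legit = extend(extend(ZERO, ASSERTION), EBS_EVENT)

# Attacker booting an old kernel: the firmware's EBS event lands first,
# and userspace can only append to the chain afterwards
forged = extend(extend(ZERO, EBS_EVENT), ASSERTION)

assert legit != forged  # order matters, so the spoof is detectable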

My current tree is here, but there's a couple of things I want to do before submitting it, including ensuring that the key material is wiped from RAM after use (otherwise it could potentially be scraped out and used to generate another image afterwards) and, uh, actually making sure this works (I no longer have the machine I was previously using for testing, and switching my other dev machine over to TPM 2 firmware is proving troublesome, so I need to pull another machine out of the stack and reimage it).

[1] The linear nature of time makes feature development much more frustrating


Can you help with bulk storage firmware updates?

Posted by Richard Hughes on December 12, 2021 02:23 PM

Does anyone have any examples of peripheral devices that can have their firmware upgraded by dropping a new firmware file onto a mounted volume? e.g. insert device, new disk appears, firmware file is copied over, then the firmware update completes?

Could anyone with a device that supports firmware upgrade using bulk storage please fill in my 2 minute questionnaire? I’m trying to create a UF2-compatible plugin to fwupd and need data to make sure it’s suitable for all vendors and devices. The current pull request is here, but I have no idea if it is suitable yet. Thanks!

gtk3: calc autofilter as GtkPopover

Posted by Caolán McNamara on December 10, 2021 08:35 PM

How calc's autofilter looks today under gtk3 + wayland in trunk towards LibreOffice 7.4

New generation of Fedora MediaWriter

Posted by Evzen Gasta on December 10, 2021 01:43 PM

Welcome to my blog! My name is Evžen Gasta, I live in the Czech Republic, and I'm studying at FIT VUT in Brno.

I would like to share news and updates about my work on the new generation of Fedora MediaWriter (FMW for short), which I'm developing as my Bachelor's thesis.

The current version of FMW runs on Qt5, and because that version of QML doesn't support a native look on Mac and Windows, FMW looks the same on all platforms. At the beginning of 2021 a new major version of Qt was released. It comes with many useful improvements, such as CMake as the default build system, support for C++17, and support for native styles in QML.

[Screenshots: the current FMW main page; the current selection of architecture]

As a result, the new generation of FMW will run on Qt6 and will use the Adwaita theme only on Linux; Mac and Windows will use the native styles of Controls. I'll probably also fix some current issues after restoring full functionality later.

[Mockups of the new FMW]

I have already started development, and at the time of writing FMW has a prototype look and is able to restore a drive with a live Fedora image. It can also write a selected .iso file to a drive and download files; on the other hand, you can't see the progress of writing or downloading for now. A preview version for Linux should be available after the new year.

[Screenshots: the main page with a disk inserted to restore; the version page; the write page with .iso file selection; the download/write page]

In college we usually don't develop applications this complex, and if we do, it's from scratch. So when I started this project it was quite confusing to find my way around. But now I'm quite enjoying QML development: it's fast and easy to learn and doesn't need to be compiled, unlike C or Java. On the other hand, integrating with C++, and especially understanding which function is useful, can sometimes be more time-consuming than I would like.

The final version, with full functionality, should be finished by May. If development goes well, I'll also try to update drive restoring on Windows, since FMW is currently used mostly on Windows.

Stay tuned for more updates coming soon.

Pango updates

Posted by Matthias Clasen on December 04, 2021 01:19 AM

I was hoping to wrap up my Pango work after the previous update, but unexpected trouble came in from the side – Benjamin made GtkLabel more serious about height-for-width, and that uncovered some inaccuracies in Pango’s line wrapping implementation. Sometimes, we would make our lines shorter than necessary, and sometimes, we would let a hyphen leak out of the allotted width, creating an overlong line.

Fixing all this up took some serious effort, but I think it was time well spent. One of the outcomes is that Pango now has APIs to serialize PangoLayout objects, and these are used in the testsuite.

A Layout Editor

To get some (visual) insight into what was going wrong with line breaking, I wrote a quick utility called layout-editor. This is how it looks:

It lets you tweak all the parameters of a PangoLayout object and shows you the results of your changes. It can also show details about pango’s analysis of the text. And it can overlay extra information, such as extents of lines, runs, glyphs, caret positions, and more.

Since the layout editor also uses the serialization APIs to load and save your layouts, you can directly use it to inspect the test cases in Pango’s testsuite and create new ones. This should help improve test coverage, going forward.

If you want to gain more insight into what is happening inside Pango,  this tool might be for you.

Better Tabs

With this new tool in hand, I felt the urge to see if it could help with feature development too. One long-standing feature gap in Pango is the lack of support for tab stops with alignments other than left.

Amazingly, an almost 15 year old patch found in this bug still mostly applied, and worked, after some small adaptations. The new tool was indeed very helpful in working out some of the finer points.

If you always felt like you should be able to line up numbers properly at their decimal point, instead of picking a monospace font and hoping for the best, voila! now you can.

Wrapping up

All of this is available in Pango 1.50. Enjoy!

Firmware “Best Known Configuration” in fwupd

Posted by Richard Hughes on November 29, 2021 11:45 AM

I’ve just deployed some new functionality to the LVFS adding support for component <tag>s. These are used by server vendors to identify a known-working (or commercially supported) set of firmware on the machine. This is currently opt-in for each vendor to avoid the UI clutter on the components view, and so if you’re a vendor reading this post and realize you want this feature, let me know and it’s two clicks on the admin panel.

The idea is that when provisioning the machine, we can set HostBkc=vendor-2021q1 in /etc/fwupd/daemon.conf and then any invocation of fwupdmgr sync-bkc will install or downgrade firmware on all compatible devices (UEFI, RAID, network adapter, SAS HBA, etc.) to make the system match a compatible set. This allows two things:

  • Factory recovery where a system in the field has been upgraded
  • Ensuring a consistent set of vendor-tested firmware for a specific workload

The tags are either assigned in the archive firmware.metainfo.xml file or added post-upload on the LVFS and are then included in the public AppStream metadata. A single firmware can be marked with multiple tags, and tags can be duplicated for different firmwares. This would allow a server vendor to say “this set of firmware has been tested as a set for workload A, and this other set of firmware has been tested for workload B” which is somewhat odd for us consumer-types, but seems to be pretty normal for enterprise deployments.

As a bonus feature, updating or downgrading firmware away from the “Best Known Configuration” is allowed, but we’ll show a semi-scary warning. Using fwupdmgr sync-bkc will undo any manual changes and bring the machine back to the BKC. Needless to say fwupd will not ship with a configured BKC.

We’ll include this somewhat-niche-but-required feature with fwupd 1.7.3 which will hopefully be released before Christmas. Questions and comments welcome.

New LVFS redirect behavior

Posted by Richard Hughes on November 25, 2021 04:20 PM

tl;dr: if you’re using libfwupd to download firmware, nothing changes and everything continues as before. If you’re using something like wget that doesn’t follow redirects by default you might need to add a command line argument to download firmware from the LVFS.

Just a quick note to explain something that some people might have noticed; if you’re using fwupd >= 1.6.1 or >= 1.5.10 when you connect to the LVFS to download a firmware file you actually get redirected to the same file on the CDN. e.g. downloading https://fwupd/download/foo.cab gets a redirect to https://cdn.fwupd/download/foo.cab which is then streamed to the user. Why this insanity?

As some of you know, egress charges from AWS are insanely high. The Linux Foundation are the kind people who pay the LVFS bill every month; four years ago that was just a few hundred dollars, which was a rounding error to them. Last year we again grew at more than 100%, and the projection for next year will surpass even that; the average size of firmware files has gone from ~30MB to ~50MB, with much, much larger server firmware in the pipeline. We certainly can't watch the egress bill scale linearly with the LVFS's popularity, else some accountant at the Linux Foundation is going to start asking questions – especially when Fastly provides the LF with a geo-replicated CDN that we weren't using.

So why don’t we put the CDN URL in the XML metadata directly, and then avoid all this redirect complexity altogether? This time the lawyers get us, as we’re required by US law to restrict distribution of some firmware to some countries on an embargo list. It’s very complicated, and it varies by vendor, but it’s not something we can avoid. So for this reason, the LVFS does a GeoIP lookup on the client IP, and if it’s all okay we then redirect the client to the CDN-cached version. It also lets us tell the vendor how many times the firmware has been downloaded without importing the CDN logs every 24 hours – which would be even harder as we only keep them for a short time for privacy reasons.

Introducing GNOME Crosswords

Posted by Jonathan Blandford on November 18, 2021 08:00 AM

[Image: GNOME Crosswords]

Howdy folks! I want to announce a game for GNOME that I’ve been working on for a few months.

I’ve always enjoyed solving Crossword puzzles. It’s something I grew up doing as a kid, and we continue to do them as a family at the dinner table at night. I’ve wanted to try my hand at writing crosswords for a while, but there isn’t really a good tool available for doing so, and certainly no free software ones that work well with a recent GNOME release. I recently bought myself a lovely new Fedora-loaded Lenovo, and after it arrived, I thought I’d take a shot at writing such a tool.

Over the past four months or so I managed to get something worth releasing. The code is available here. It should build on relatively recent Linux distributions, though it does need libadwaita from git (toasts!). I also put together a flatpak file for testing here (no repo yet, as getting that set up defeated me). Once I’m more confident that the puzzles are solvable and fun I plan to publish it to flathub.

[Screenshots: "A dog's day", a non-traditional grid; Guardian cryptic No 28,605, the Guardian Daily Cryptic with reveal answers enabled]

Features:

It’s still early, but it already has some fun features:

  • Puzzle Sets. The heart of the game is the Puzzle Set. It’s a collection of crossword puzzles that are tied together by a theme. Solving a puzzle unlocks more puzzles. I currently have one puzzle set (“Cats and Dogs”) with nine puzzles in it, but I have a few more puzzle sets planned. It contains mostly traditional puzzles, but I threw in a cryptic to keep people on their toes.
  • Nontraditional shapes and styles: I wanted to make something a little more whimsical and fun, as well as the more traditional puzzle grids. So I added support for colors and shapes as well. My son had fun doing pixel art to create some of the grids.
  • Reveal mistakes: For when you get stuck! It also supports checksums for puzzles that don’t include the solution.
  • Scalable grid: Currently the UI only exposes four sizes, but we have all the pieces to scale crosswords to different sizes.
  • Support for the .ipuz spec: This spec supports a ton of things, and I don’t support it fully yet, but most of the crossword part of the spec is included. There aren’t a ton of .ipuz files floating around, but you can use puzzlepull to download the Guardian Daily puzzle if you want to try some other examples.
[Screenshot: the first Puzzle Set]

Crossword Editor

[Screenshot: the GNOME Crosswords editor]

As part of building this app, I realized that creating grids was as big a part of the app as writing the actual game itself. To facilitate that, I started writing a crossword editor as well. It’s in the early stages, but it already has one of the most important features: a tool to create the initial grid. Making puzzles that fit well together is surprisingly hard. To make it easier, I wrote a crossword solver that quickly suggests words to fill in the grid. I’m proud of the design – it’s able to efficiently suggest options out of a list of 500K words really quickly (<1 μs on my machine). I was able to use it to build an autofill dialog that can recursively fill in a section of the puzzle when making a grid.
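I don't know how the actual solver is implemented, but one classic way to get lookups that fast is to precompute an index keyed on (word length, position, letter), so a partially-filled slot becomes a set intersection (real implementations often use bitsets to make the intersection nearly free). A minimal sketch:

from collections import defaultdict

def build_index(words):
    by_len = defaultdict(set)
    by_slot = defaultdict(set)  # (length, position, letter) -> candidate words
    for w in words:
        by_len[len(w)].add(w)
        for i, ch in enumerate(w):
            by_slot[(len(w), i, ch)].add(w)
    return by_len, by_slot

def suggest(pattern, by_len, by_slot):
    # pattern like "c?t", where "?" marks an empty cell
    fixed = [(i, ch) for i, ch in enumerate(pattern) if ch != "?"]
    if not fixed:
        return by_len[len(pattern)]
    candidates = [by_slot[(len(pattern), i, ch)] for i, ch in fixed]
    return set.intersection(*candidates)

by_len, by_slot = build_index(["cat", "cot", "cut", "car", "dog"])
assert suggest("c?t", by_len, by_slot) == {"cat", "cot", "cut"}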

I still have more work to do on the editor and it's clear that the autofill dialog isn't a panacea, but it helped me figure out some tricky corners. Here's a video of the autofill dialog in action:

[Video: the autofill dialog in action]

Thanks

I especially want to thank Rosanna and my kids for play-testing this and suggesting clues, as well as their patience while I was writing it. Thanks also to Federico for giving great advice, great code, and for being a star. Matthias for helping me relearn GTK and explaining GtkIMContext. Also, the example code in GNOME Builder was immensely helpful for getting this started.

What’s next?

There are a ton of features I'd like to add to this game. It really needs printing support, which should be relatively easy. I'd also love to see it get internationalized (and not just translated) – are crosswords in non-Latin languages a thing? And I've seen enough of Benjamin's GUADEC presentations over the years to know GTK can do something cooler than pop up a dialog when you finish a crossword.

But the most important thing is that the game needs to be fun! For that, we need more puzzles and the existing puzzles need to be better. If you’re interested in joining me in creating a good set of puzzles for Linux, try the game out and let me know.

Pango Updates

Posted by Matthias Clasen on November 10, 2021 04:07 AM

Here is another update on what will appear in Pango 1.50 soon. Quite a few things, as it turns out. (This is a continuation of my last Pango update)

Bidi Improvements

Pango has long claimed to have good support for bidirectional text rendering and editing. But until recently, there were some annoying bugs: you could end up in loops when moving the cursor through mixed-direction text.

The relevant code has been rewritten, closing a very old bug (#157).

Useful Information

Modern fonts can contain a lot of different information – there can be colormaps, and glyphs can be represented not just as splines, but also as pngs or svgs.

For GTK, an important piece of information for each glyph is whether it is colored or monochrome. With the new PangoGlyphVisAttr.is_color field, GTK no longer needs to poke directly at the font data to find out.

Another case where GTK needed data that wasn’t available through Pango APIs has been closed with pango_font_get_languages(). This data is used in the font chooser filter popup:

Superscripts and subscripts

Superscripts and subscripts now use font metrics information for size changes and baseline shift. They can also be properly nested, so markup like

2<sup>2<sup>2</sup></sup>

yields the expected rendering:

Customizing segmentation

Pango determines word and sentence boundaries according to the Unicode Text Segmentation Spec (TR29). The specification can’t deal with all the complexities of formatted text, so sometimes its results need tailoring, which can now be done with the new word and sentence attributes:

$ pango-segmentation --kind=word --text=\
    "<span segment='word'>1-based</span> index"

|1-based| |index|

Better Markup

Besides the new attributes that have been mentioned, Pango markup now lets you specify many attributes in more natural units.

For example,  you can now say

<span font_size='12.5pt'>

instead of

<span font_size='12800'>

Small Caps

Pango has had the PangoVariant enumeration with its PANGO_VARIANT_SMALL_CAPS value since forever, but it has never done anything. Since we’ve added support for OpenType features (in Pango 1.37), it has been possible to produce Small Caps by using the smcp=1 feature, if the font supports it. Sadly, most fonts don’t.

With the text transformation and font scaling infrastructure now in place, it was easy to emulate Small Caps for fonts that don't support this natively:

While at it, we’ve expanded the PangoVariant enumeration to cover all the CSS casing variants, and made the GTK CSS engine use them instead of OpenType features.

Better debugging

Pango ships with a versatile test utility called pango-view, which has options to test many of Pangos layout features. It has recently learned to show more auxiliary information in visual form, like glyph extents, caret positions and slopes:

$ pango-view --text Boxes --annotate=glyph,caret,slope

Enjoy!

Firefox 94 comes with EGL on X11

Posted by Martin Stransky on October 30, 2021 12:05 PM
[Screenshot: the (in)famous WebGL Aquarium demo. EGL brings you more fish 🙂]

Firefox 94 is coming out next week and brings awesome news: the OpenGL EGL backend is enabled by default on X11 for users with Intel/AMD hardware and recent Mesa.

This project has been driven by Robert Mader (most of the EGL work), Andrew Osmond (glxtest fixes and config), Jamie Nicol (EGL/Android and partial damage), Greg V (partial damage support) and Jan Ikenmeyer (Darkspirit) (help with issues, testing).

Historically, Linux has come with GLX (the OpenGL X11 extension), but that era is finally ending and we're moving forward to EGL, which promises all the goodness you'd expect from a modern graphics subsystem… or at least lets you create a texture backed by graphics memory 🙂

I'll set aside all the EGL/GLX differences and focus on the changes from a user perspective. GLX is old, well debugged, and tied closely to X11, which means a seamless experience and wide support by graphics drivers (like the proprietary NVIDIA ones). It's used by most X11 applications and 'just works'.

EGL is 'new' from the Linux desktop perspective and is used mainly by Wayland, Android, and various small devices. It's not fully supported by all desktop drivers and has glitches (broken rendering of transparent windows, for instance). But as Wayland gains momentum, EGL is also getting more attention and fixes on the Linux desktop.

And why do we actually want EGL? Because it gives us a cool toy – EGLImage (and EGLFence). An EGLImage is an object created over a piece of GPU memory (which can be a DMA-Buf), shared with a different process, and used as a frame buffer (the target of GL rendering) or as a texture (the source of GL rendering).

EGLImage allows GPU memory to be used in very creative ways. VA-API decoded video frames or WebGL scenes can be mapped as EGLImages, moved from the decoding process to the rendering process, and used as textures. EGLFence allows EGLImages to be locked across processes, so we don't repaint a WebGL scene while it's being used in a different process, or recycle VA-API video frames too early.

And what can you expect from EGL in Firefox? Faster WebGL rendering (used on Google Maps, for instance), more efficient rendering thanks to partial damage support, and potentially VA-API video decoding (that's blocked by Bug 1698778 on both Wayland and X11). It also unifies the rendering path for Wayland and X11, which means X11 will gain features developed for Wayland (suspended rendering for invisible windows, better VSync support, and more).

NVIDIA is also working on EGL & DMA-Buf support in their proprietary drivers, so there's hope for owners of such FOSS-unfriendly hardware.

So give Firefox 94 a try. If anything goes wrong, please file a bug. You can also disable EGL and switch back to GLX: go to the about:config page, flip the gfx.x11-egl.force-disabled preference, and restart the browser.

GTK4: Toolbars in Sidebar

Posted by Caolán McNamara on October 28, 2021 04:28 PM

GTK4 port of Libreoffice now supports the "widebutton" Toolbar MenuButtons that show a preview of the selected color.

PSA: gnome-settings-daemon's MediaKeys API is going away

Posted by Bastien Nocera on October 20, 2021 12:12 PM

 In 2007, Jan Arne Petersen added a D-Bus API to what was still pretty much an import into gnome-control-center of the "acme" utility I wrote to have all the keys on my iBook working.

It switched the code away from remapping keyboard keys to "XF86Audio*", to expecting players to contact the D-Bus daemon and ask to be forwarded key events.

Multimedia keys circa 2003

In 2013, we added support for controlling media players using MPRIS, as another interface. Fast-forward to 2021, and MPRIS support is ubiquitous, whether in free software, proprietary applications or even browsers. So we'll be parting with the "org.gnome.SettingsDaemon.MediaKeys" D-Bus API. If your application still wants to work with older versions of GNOME, it's recommended to at least handle the MediaKeys API's unavailability quietly.

Multimedia keys in 2021

TL;DR: Remove code that relies on gnome-settings-daemon's MediaKeys API, make sure to add MPRIS support to your app.

Firefox Wayland development in 2021

Posted by Martin Stransky on October 01, 2021 08:45 AM
[Image: I swear, no more crashes on Wayland!]

It's been a long time since my last update about Firefox on Linux, and I've finally got some time to sum up what we've been working on for the last year and what's coming. No exciting new features (from a Linux perspective) have been introduced over the past year, but rather hidden yet important changes.

From a Linux desktop developer's perspective, 2021 is the year of Wayland. KDE has been shipping a decent Wayland compositor, which became the default for Fedora 34. It's actually pretty fast and gives you the smooth feeling of the "good old times" of X11/Gtk2/name-your-favourite environment, where any graphics change was instant, without lags or slow transitions. I must mention Robert Mader, who created a new Firefox Wayland software backend for KDE.

As a major desktop distro (Ubuntu) slowly moves to Wayland, we're getting more and more Wayland Firefox users. Even the notorious troublemaker (NVIDIA) has decided to step in and support it.

What’s done for next releases?

It's good that Wayland's market share is rising, but we also need to make sure that Firefox is ready to run there without any major issues and matches its X11 variant. There are two major areas where Firefox is behind its X11 counterpart – clipboard and popup handling. This is down to certain Wayland protocol features where we can't simply duplicate the X11 code.

The clipboard on Wayland is similar to the X11 one, but we need to translate the asynchronous Wayland clipboard to the synchronous one Firefox and the Web expect. I tried various approaches, but the best one seems to be to just use the asynchronous Wayland clipboard as-is and implement some kind of abstraction over it. That was implemented in Firefox 93 and is going to be shipped by default in Firefox 94.

On the other hand, popups are the most annoying thing we have to implement on Wayland. Firefox expects that any popup can be created at any time without a parent (or with the main window as its parent), but Wayland requires a strict popup hierarchy. That means every window can have only one child popup. When more than one popup is opened, it has to be attached to the previously opened popup, which becomes its parent. And when any popup in the chain is closed, the popups must be rearranged to keep the chain connected. This involves all kinds of popups: context menus, tooltips, extension popups, permission popups, and so on. Plus there are some interesting bugs in the Wayland protocol and in Gtk, so excitement/frustration is guaranteed, and a basic popup implementation becomes an extraordinary challenge where small changes can introduce various regressions. Despite the "fun with popups", the popup tracker is almost clear and we'll ship the work in Firefox 94.
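A toy model of the bookkeeping involved, with a hypothetical attach() standing in for actually creating the Wayland popup surface with a given parent – the real Firefox code obviously looks nothing like this, but the chain-repair logic is the awkward part:

def attach(popup, parent):
    pass  # stand-in for creating the popup surface parented to `parent`

class PopupChain:
    """Wayland allows each surface at most one child popup, so all open
    popups form a single chain hanging off the toplevel window."""

    def __init__(self, toplevel):
        self.chain = [toplevel]

    def open(self, popup):
        attach(popup, self.chain[-1])  # new popups parent to the newest popup
        self.chain.append(popup)

    def close(self, popup):
        i = self.chain.index(popup)
        orphans = self.chain[i + 1:]  # popups that were stacked above it
        self.chain = self.chain[:i]
        for p in orphans:             # re-attach them to keep the chain connected
            self.open(p)

chain = PopupChain("main window")
chain.open("menu")
chain.open("submenu")
chain.open("tooltip")
chain.close("submenu")                # "tooltip" must be re-parented to "menu"
assert chain.chain == ["main window", "menu", "tooltip"]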

One of the main Wayland features is support for monitors with different DPI/scale factors at the same time. Fedora's default compositor, Mutter, shows a creative approach here and reports screen sizes differently than other compositors. As we really want to know the screen sizes, Firefox tracks monitor changes from Wayland directly and finds the correct screen by matching the screen's top-left corner point – which fortunately stays the same across all compositors. We also stop painting the Firefox window when the screen scale changes, so you should enjoy a seamless experience on systems with mixed screen sizes with Firefox 93.

Future plans for Firefox 95

The Firefox 95 development cycle begins next week, and I'm going to look at drag and drop, which has been partially broken for a long, long time. Some Wayland-specific fixes are already in Firefox 94, but we need to rework some parts to correctly copy files from remote destinations (like an inbox) to local filesystems, fix the names of dropped files, and do tab previews of moved tabs. There are also new and interesting compositor bugs, as usual 🙂

Future plans for Firefox 96

The Firefox Wayland port is generally done, and there isn't any big difference between the X11 and Wayland variants, at least on GNOME, which Fedora uses as its default environment. We're fixing minor bugs and keeping an eye on user reports.

For the next quarter I'd like to look at the GPU process for Wayland. The GPU process runs tasks related to the graphics hardware and shields the browser from hardware driver crashes. It's also where VA-API video decoding should run, properly sandboxed (right now VA-API runs in the content process alongside general Firefox code; it's restricted by the content sandbox, which leads to various VA-API crashes and failures).

Reducing the effectiveness of a safety feature

Posted by Richard Hughes on September 23, 2021 03:37 PM

We just purchased a 2021 KIA eNIRO to use as our family car. As is typical with EVs, it has to produce a fake engine noise to avoid squashing the hard of hearing (or animals) who aren't expecting two tons of metal to be silently moving. When we test drove a 2020 car last year, there was a physical button on the dash to turn this off, on the premise that the noise sometimes isn't required or appropriate, but it always defaulted to "on" at every start. As the car gets faster the noise also increases in volume, until fading to nothing after about 30km/h. In reverse the same thing happens, with some additional beeps. Getting our 2021 car this year, the button was no longer present and the less-than-affectionately known VESS module cannot be muted. I can guess why: someone probably turned it off and squashed something or someone, and someone in the UK/US/EU government understandably freaked out. KIA also removed the wire from the wiring loom, and won't sell the 2020 button cluster, so it's not even as if you can retrofit a new car to act like the old one.

To be super clear: I don't have a problem with the VESS noise, but because the "speaker" is in the front bumper the solution for going backwards is "turn up the volume". Living in London means that houses are pretty close together, and me reversing into the drive at 2mph shouldn't subject the house opposite to a noise several times louder than a huge garbage truck. The solution in the various KIA owner forums seems to be "just unplug the VESS module", but this seems at best unethical and probably borderline illegal, given it's a device with the express purpose of trying to avoid hurting someone with your 2 ton lump of metal.

VESS is, as you might expect, just another device on the CAN bus, and people have reverse engineered the command stream, so you can actually just plug the VESS module into a USB device (with a CAN converter) and play with it yourself. My idea would be to make a modchip-like device that plugs into the VESS module using the existing plug and basically MITMs the CAN messages. All messages going from the VESS back to the ECU get allow-listed (even though the ECU doesn't seem to care if the VESS goes AWOL…) and any speed measurements going forward also get passed straight through. The clever part would be to MITM the speed when the "reverse gear" command has been issued, so that the car thinks it's going about 20km/h backwards. This makes the VESS still produce the engine and beeping noises, but only about as loud, from outside the car, as the VESS is when going forwards.
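For flavour, here's roughly what the MITM loop could look like using the python-can library on a prototype with two CAN interfaces. The arbitration IDs, byte layouts, and speed encoding are entirely made up – the real ones would have to come from the reverse-engineering work mentioned above:

import can

GEAR_MSG_ID  = 0x372   # hypothetical: gear selector state
SPEED_MSG_ID = 0x386   # hypothetical: speed broadcast the VESS listens to

def encode_speed(kmh):
    # hypothetical layout: speed in 0.1 km/h units, little-endian, bytes 0-1
    raw = int(kmh * 10)
    return raw.to_bytes(2, "little") + bytes(6)

def main():
    ecu_side  = can.interface.Bus(channel="can0", interface="socketcan")
    vess_side = can.interface.Bus(channel="can1", interface="socketcan")
    in_reverse = False
    # (a real device would also relay the allow-listed VESS->ECU direction,
    #  e.g. in a second thread; omitted here for brevity)
    while True:
        msg = ecu_side.recv(timeout=1.0)
        if msg is None:
            continue
        if msg.arbitration_id == GEAR_MSG_ID:
            in_reverse = msg.data[0] == 0x07  # hypothetical "reverse" value
        elif msg.arbitration_id == SPEED_MSG_ID and in_reverse:
            # lie to the VESS: pretend we're doing ~20 km/h while reversing
            msg = can.Message(arbitration_id=SPEED_MSG_ID,
                              data=encode_speed(20.0),
                              is_extended_id=False)
        vess_side.send(msg)

if __name__ == "__main__":
    main()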

Technically this is quite easy – VESS->txrxCAN->MCU->txrxCAN->ECU – and you can probably use an inexpensive Microchip reference board for a prototype. My question is more whether this would:

  1. Be ethical
  2. Be legal
  3. Invalidate my insurance
  4. Invalidate the warranty of my nice new shiny car

Feedback very welcome!

What's new in XI 2.4 - touchpad gestures

Posted by Peter Hutterer on September 23, 2021 05:26 AM

After a nine year hiatus, a new version of the X Input Protocol is out. Credit for the work goes to Povilas Kanapickas, who also implemented support for XI 2.4 in the various pieces of the stack [0]. So let's have a look.

X has had touch events since XI 2.2 (2012) but those were only really useful for direct touch devices (read: touchscreens). There were accommodations for indirect touch devices like touchpads but they were never used. The synaptics driver set the required bits for a while but it was dropped in 2015 because ... it was complicated to make use of and no-one seemed to actually use it anyway. Meanwhile, the rest of the world moved on and touchpad gestures are now prevalent. They've been standard in MacOS for ages, in Windows for almost ages and - with recent GNOME releases - now feature prominently on the Linux desktop as well. They have been part of libinput and the Wayland protocol for years (and even recently gained a new set of "hold" gestures). Meanwhile, X was left behind in the dust or mud, depending on your local climate.

XI 2.4 fixes this: it adds pinch and swipe gestures to the XI2 protocol and makes those available to supporting clients [2]. Notable here is that the interpretation of gestures is left to the driver [1]. The server takes the gestures and does the required state handling but otherwise has no say in what constitutes a gesture. This is of course no different to e.g. 2-finger scrolling on a touchpad where the server just receives scroll events and passes them on accordingly.

XI 2.4 gesture events are quite similar to touch events in that they are processed as a sequence of begin/update/end with both types having their own event types. So the events you will receive are e.g. XIGesturePinchBegin or XIGestureSwipeUpdate. As with touch events, a client must select for all three (begin/update/end) on a window. Only one gesture can exist at any time, so if you are a multi-tasking octopus prepare to be disappointed.

Because gestures are tied to an indirect-touch device, the location they apply at is wherever the cursor is currently positioned. In that, they work similar to button presses, and passive grabs apply as expected too. So long-term the window manager will likely want a passive grab on the root window for swipe gestures while applications will implement pinch-to-zoom as you'd expect.

In terms of API there are no surprises. libXi 1.8 is the version to implement the new features and there we have a new XIGestureClassInfo returned by XIQueryDevice and of course the two events: XIGesturePinchEvent and XIGestureSwipeEvent. Grabbing is done via e.g. XIGrabSwipeGestureBegin, so for those of you with XI2 experience this will all look familiar. For those of you without - it's probably no longer worth investing time into becoming an XI2 expert.

Overall, it's a nice addition to the protocol and it will help getting the X server slightly closer to Wayland for a widely-used feature. Once GTK, mutter and all the other pieces in the stack are in place, it will just work for any (GTK) application that supports gestures under Wayland already. The same will be true for Qt I expect.

X server 21.1 will be out in a few weeks, xf86-input-libinput 1.2.0 is already out and so are xorgproto 2021.5 and libXi 1.8.

[0] In addition to taking on the Xorg release, so clearly there are no limits here
[1] More specifically: it's done by libinput since neither xf86-input-evdev nor xf86-input-synaptics will ever see gestures being implemented
[2] Hold gestures missed out on the various deadlines

An Xorg release without Xwayland

Posted by Peter Hutterer on September 22, 2021 11:00 PM

Xorg is about to be released.

And it's a release without Xwayland.

And... wait, what?

Let's unwind this a bit, and ideally you should come away with a better understanding of Xorg vs Xwayland, and possibly even Wayland itself.

Heads up: if you are familiar with X, the below is simplified to the point it hurts. Sorry about that, but as an X developer you're probably good at coping with pain.

Let's go back to the 1980s, when fashion was weird and there were still reasons to be optimistic about the future. Because this is a thought exercise, we go back with full hindsight 20/20 vision and, ideally, the winning Lotto numbers in case we have some time for some self-indulgence.

If we were to implement an X server from scratch, we'd come away with a set of components: libxprotocol, which handles the actual protocol wire format parsing and provides a C API to access it (quite like libxcb, actually). That one is just the protocol-to-code conversion layer.

We'd have a libxserver component which handles all the state management required for an X server to actually behave like an X server (nothing in the X protocol require an X server to display anything). That library has a few entry points for abstract input events (pointer and keyboard, because this is the 80s after all) and a few exit points for rendered output.

libxserver uses libxprotocol but that's an implementation detail, we can ignore the protocol for the rest of the post.

Let's create a github organisation and host those two libraries. We now have: http://github.com/x/libxserver and http://github.com/x/libxprotocol [1].

Now, to actually implement a working functional X server, our new project would link against libxserver and hook into this library's API points. For input, you'd use libinput and pass those events through; for output you'd use the modesetting driver that knows how to scream at the hardware until something finally shows up. This is somewhere between outrageously simplified and unacceptably wrong but it'll do for this post.

Your X server has to handle a lot of the hardware-specifics but other than that it's a wrapper around libxserver which does the work of ... well, being an X server.

Our stack looks like this:


+---------------------------+
| xserver     [libxserver]  |--------[ X client ]
|                           |
| [libinput] [modesetting]  |
+---------------------------+
|          kernel           |
+---------------------------+
Hooray, we have re-implemented Xorg. Or rather, XFree86, because we're 20 years from all the pent-up frustration that caused the Xorg fork. Let's host this project on http://github.com/x/xorg

Now, let's say instead of physical display devices, we want to render into an framebuffer, and we have no input devices.


+---------------------------+
| xserver     [libxserver]  |--------[ X client ]
|                           |
|         [write()]         |
+---------------------------+
|        some buffer        |
+---------------------------+
This is basically Xvfb or, if you are writing out PostScript, Xprint. Let's host those on github too, we're accumulating quite a set of projects here.

Now, let's say those buffers are allocated elsewhere and we're just rendering to them. And those buffers are passed to us via an IPC protocol, like... Wayland!


+---------------------------+
| xserver     [libxserver]  |--------[ X client ]
|                           |
| input events     [render] |
+---------------------------+
|                           |
+---------------------------+
|    Wayland compositor     |
+---------------------------+
And voila, we have Xwayland. If you swap out the protocol you can have Xquartz (X on Macos) or Xwin (X on Windows) or Xnext/Xephyr (X on X) or Xvnc (X over VNC). The principle is always the same.

Fun fact: the Wayland compositor doesn't need to run on the hardware, you can play display server matryoshka until you run out of turtles.

In our glorious revisionist past, all these are distinct projects, re-using libxserver and some external libraries where needed. Depending on the project, things may be very simple or get very complex – it depends on how we render things.

But in the end, we have several independent projects all providing us with an X server process - the specific X bits are done in libxserver though. We can release Xwayland without having to release Xorg or Xvfb.

libxserver won't need a lot of releases, the behaviour is largely specified by the protocol requirements and once you're done implementing it, it'll be quite a slow-moving project.

Ok, now, fast forward to 2021, lose some hindsight, hope, and attitude and - oh, we have exactly the above structure. Except that it's not spread across multiple independent repos on github, it's all sitting in the same git directory: our Xorg, Xwayland, Xvfb, etc. are all sitting in hw/$name, and libxserver is basically the rest of the repo.

A traditional X server release was a tag in that git directory. An XWayland-only release is basically an rm -rf hw/*-but-not-xwayland followed by a tag, an Xorg-only release is basically an rm -rf hw/*-but-not-xfree86 [2].

In theory, we could've moved all these out into separate projects a while ago but the benefits are small and no-one has the time for that anyway.

So there you have it - you can have Xorg-only or XWayland-only releases without the world coming to an end.

Now, for the "Xorg is dead" claims - it's very likely that the current release will be the last Xorg release. [3] There is little interest in an X server that runs on hardware, or rather: there's little interest in the effort required to push out releases. Povilas did a great job in getting this one out but again, it's likely this is the last release. [4]

Xwayland - very different, it'll hang around for a long time because it's "just" a protocol translation layer. And of course the interest is there, so we have volunteers to do the releases.

So basically: expecting Xwayland releases, be surprised (but not confused) by Xorg releases.

[1] Github of course doesn't exist yet because we're in the 80s. Time-travelling is complicated.
[2] Historical directory name, just accept it.
[3] Just like the previous release...
[4] At least until the next volunteer steps up. Turns out the problem "no-one wants to work on this" is easily fixed by "me! me! I want to work on this". A concept that is apparently quite hard to understand in the peanut gallery.

Authenticated Boot and Disk Encryption on Linux

Posted by Lennart Poettering on September 22, 2021 10:00 PM

The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions

TL;DR: Linux has been supporting Full Disk Encryption (FDE) and technologies such as UEFI SecureBoot and TPMs for a long time. However, the way they are set up by most distributions is not as secure as they should be, and in some ways quite frankly weird. In fact, right now, your data is probably more secure if stored on current ChromeOS, Android, Windows or MacOS devices, than it is on typical Linux distributions.

Generic Linux distributions (i.e. Debian, Fedora, Ubuntu, …) adopted Full Disk Encryption (FDE) more than 15 years ago, with the LUKS/cryptsetup infrastructure. It was a big step forward to a more secure environment. Almost ten years ago the big distributions started adding UEFI SecureBoot to their boot process. Support for Trusted Platform Modules (TPMs) has been added to the distributions a long time ago as well — but even though many PCs/laptops these days have TPM chips on-board it's generally not used in the default setup of generic Linux distributions.

How these technologies currently fit together on generic Linux distributions doesn't really make too much sense to me — and falls short of what they could actually deliver. In this story I'd like to have a closer look at why I think that, and what I propose to do about it.

The Basic Technologies

Let's have a closer look at what these technologies actually deliver:

  1. LUKS/dm-crypt/cryptsetup provide disk encryption, and optionally data authentication. Disk encryption means that reading the data in clear-text form is only possible if you possess a secret of some form, usually a password/passphrase. Data authentication means that no one can make changes to the data on disk unless they possess a secret of some form. Most distributions only enable the former though — the latter is a more recent addition to LUKS/cryptsetup, and is not used by default on most distributions (though it probably should be). Closely related to LUKS/dm-crypt is dm-verity (which can authenticate immutable volumes) and dm-integrity (which can authenticate writable volumes, among other things).

  2. UEFI SecureBoot provides mechanisms for authenticating boot loaders and other pre-OS binaries before they are invoked. If those boot loaders then authenticate the next step of booting in a similar fashion there's a chain of trust which can ensure that only code that has some level of trust associated with it will run on the system. Authentication of boot loaders is done via cryptographic signatures: the OS/boot loader vendors cryptographically sign their boot loader binaries. The cryptographic certificates that may be used to validate these signatures are then signed by Microsoft, and since Microsoft's certificates are basically built into all of today's PCs and laptops this will provide some basic trust chain: if you want to modify the boot loader of a system you must have access to the private key used to sign the code (or to the private keys further up the certificate chain).

  3. TPMs do many things. For this text we'll focus on one facet: they can be used to protect secrets (for example for use in disk encryption, see above) that are released only if the code that booted the host can be authenticated in some form. This works roughly like this: every component that is used during the boot process (i.e. code, certificates, configuration, …) is hashed with a cryptographic hash function before it is used. The resulting hash is written to some small volatile memory the TPM maintains that is write-only (the so-called Platform Configuration Registers, "PCRs"): each step of the boot process will write hashes of the resources needed by the next part of the boot process into these PCRs. The PCRs cannot be written freely: the hashes written are combined with what is already stored in the PCRs (again through hashing), and the result of that then replaces the previous value. Effectively this means: only if every component involved in the boot matches expectations will the hash values exposed in the TPM PCRs match the expected values too. And if you then use those values to unlock the secrets you want to protect you can guarantee that the key is only released to the OS if the expected OS and configuration is booted. (See the sketch right after this list for a concrete illustration of the extend operation.) The process of hashing the components of the boot process and writing that to the TPM PCRs is called "measuring". What's also important to mention is that the secrets are not only protected by these PCR values but encrypted with a "seed key" that is generated on the TPM chip itself, and cannot leave the TPM (at least so goes the theory). The idea is that you cannot read out a TPM's seed key, and thus you cannot duplicate the chip: unless you possess the original, physical chip you cannot retrieve the secret it might be able to unlock for you. Finally, TPMs can enforce a limit on unlock attempts per time ("anti-hammering"): this makes it hard to brute force things: if you can only execute a certain number of unlock attempts within some specific time then brute forcing will be prohibitively slow.
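
To make the extend-only semantics tangible, here's a minimal sketch using the tpm2-tools CLI (these tools aren't mentioned above, so treat this as purely illustrative; option syntax can vary between tpm2-tools versions):

# tpm2_pcrread sha256:7                        # show the current value of PCR 7
# M=$(echo -n "example boot component" | sha256sum | cut -d' ' -f1)
# tpm2_pcrextend 7:sha256=$M                   # new value = SHA256(old value || measurement)
# tpm2_pcrread sha256:7                        # changed, but it was never directly writable

There is no operation to set a PCR to an arbitrary value, only to extend it, which is why the final PCR values are trustworthy digests of everything measured along the way.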

How Linux Distributions use these Technologies

As mentioned already, Linux distributions adopted the first two of these technologies widely, the third one not so much.

So typically, here's how the boot process of Linux distributions works these days:

  1. The UEFI firmware invokes a piece of code called "shim" (which is stored in the EFI System Partition — the "ESP" — of your system), that more or less is just a list of certificates compiled into code form. The shim is signed with the aforementioned Microsoft key, that is built into all PCs/laptops. This list of certificates then can be used to validate the next step of the boot process. The shim is measured by the firmware into the TPM. (Well, the shim can do a bit more than what I describe here, but this is outside of the focus of this article.)

  2. The shim then invokes a boot loader (often Grub) that is signed by a private key owned by the distribution vendor. The boot loader is stored in the ESP as well, plus some other places (i.e. possibly a separate boot partition). The corresponding certificate is included in the list of certificates built into the shim. The boot loader components are also measured into the TPM.

  3. The boot loader then invokes the kernel and passes it an initial RAM disk image (initrd), which contains initial userspace code. The kernel itself is signed by the distribution vendor too. It's also validated via the shim. The initrd is not validated, though (!). The kernel is measured into the TPM, the initrd sometimes too.

  4. The kernel unpacks the initrd image, and invokes what is contained in it. Typically, the initrd then asks the user for a password for the encrypted root file system. The initrd then uses that to set up the encrypted volume. No code authentication or TPM measurements take place.

  5. The initrd then transitions into the root file system. No code authentication or TPM measurements take place.

  6. When the OS itself is up the user is prompted for their user name, and their password. If correct, this will unlock the user account: the system is now ready to use. At this point no code authentication, no TPM measurements take place. Moreover, the user's password is not used to unlock any data, it's used only to allow or deny the login attempt — the user's data has already been decrypted a long time ago, by the initrd, as mentioned above.

What you'll notice here of course is that code validation happens for the shim, the boot loader and the kernel, but not for the initrd or the main OS code anymore. TPM measurements might go one step further: the initrd is measured sometimes too, if you are lucky. Moreover, you might notice that the disk encryption password and the user password are requested by code that is not validated, and is thus not safe from external manipulation. You might also notice that even though TPM measurements of boot loader/OS components are done, nothing actually ever makes use of the resulting PCRs in the typical setup.

Attack Scenarios

Of course, before determining whether the setup described above makes sense or not, one should have an idea what one actually intends to protect against.

The most basic attack scenario to focus on is probably that you want to be reasonably sure that if someone steals your laptop that contains all your data then this data remains confidential. The model described above probably delivers that to some degree: the full disk encryption, when used with a reasonably strong password, should make it hard for the laptop thief to access the data. The data is only as secure as the password is strong: the attacker might attempt to brute force the password, and if it is not chosen carefully they might be successful.

Two more interesting attack scenarios go something like this:

  1. Instead of stealing your laptop the attacker takes the harddisk from your laptop while you aren't watching (e.g. while you went for a walk and left it at home or in your hotel room), makes a copy of it, and then puts it back. You'll never notice they did that. The attacker then analyzes the data in their lab, maybe trying to brute force the password. In this scenario you won't even know that your data is at risk, because for you nothing changed — unlike in the basic scenario above. If the attacker manages to break your password they have full access to the data included on it, i.e. everything you stored on it so far, but not necessarily what you are going to store on it later. This scenario is worse than the basic one mentioned above, for the simple fact that you won't know that you might be attacked. (This scenario could be extended further: maybe the attacker has a chance to watch you type in your password or so, effectively lowering the password strength.)

  2. Instead of stealing your laptop the attacker takes the harddisk from your laptop while you aren't watching, inserts backdoor code on it, and puts it back. In this scenario you won't know your data is at risk, because physically everything is as before. What's really bad though is that the attacker gets access to anything you do on your laptop, both the data already on it, and whatever you will do in the future.

I think in particular this backdoor attack scenario is something we should be concerned about. We know for a fact that attacks like that happen all the time (Pegasus, industry espionage, …), hence we should make them hard.

Are we Safe?

So, does the scheme so far implemented by generic Linux distributions protect us against the latter two scenarios? Unfortunately not at all. Because distributions set up disk encryption the way they do, and only bind it to a user password, an attacker can easily duplicate the disk, and then attempt to brute force your password. What's worse: since code authentication ends at the kernel — and the initrd is not authenticated anymore —, backdooring is trivially easy: an attacker can change the initrd any way they want, without having to fight any kind of protections. And given that FDE unlocking is implemented in the initrd, and it's the initrd that asks for the encryption password things are just too easy: an attacker could trivially insert some code that picks up the FDE password as you type it in and sends it wherever they want. And not just that: since once they are in they are in, they can do anything they like for the rest of the system's lifecycle, with full privileges — including installing backdoors for versions of the OS or kernel that are installed on the device in the future, so that their backdoor remains open for as long as they like.

That is sad of course. It's particularly sad given that the other popular OSes all address this much better. ChromeOS, Android, Windows and MacOS all have way better built-in protections against attacks like this. And it's why one can certainly claim that your data is probably better protected right now if you store it on those OSes than it is on generic Linux distributions.

(Yeah, I know that there are some niche distros which do this better, and some hackers hack their own. But I care about general purpose distros here, i.e. the big ones, that most people base their work on.)

Note that there are more problems with the current setup. For example, it's really weird that during boot the user is queried for an FDE password which actually protects their data, and then once the system is up they are queried again – now asking for a username, and another password. And the weird thing is that this second authentication that appears to be user-focused doesn't really protect the user's data anymore — at that moment the data is already unlocked and accessible. The username/password query is supposed to be useful in multi-user scenarios of course, but how does that make any sense, given that these multiple users would all have to know a disk encryption password that unlocks the whole thing during the FDE step, and thus they have access to every user's data anyway if they make an offline copy of the harddisk?

Can we do better?

Of course we can, and that is what this story is actually supposed to be about.

Let's first figure out what the minimal issues we should fix are (at least in my humble opinion):

  1. The initrd must be authenticated before being booted into. (And measured unconditionally.)

  2. The OS binary resources (i.e. /usr/) must be authenticated before being booted into. (But they don't need to be encrypted: since everyone has the same bits anyway, there's nothing to hide here.)

  3. The OS configuration and state (i.e. /etc/ and /var/) must be encrypted, and authenticated before they are used. The encryption key should be bound to the TPM device; i.e. system data should be locked to a security concept belonging to the system, not the user.

  4. The user's home directory (i.e. /home/lennart/ and similar) must be encrypted and authenticated. The unlocking key should be bound to a user password or user security token (FIDO2 or PKCS#11 token); i.e. user data should be locked to a security concept belonging to the user, not the system.

Or to summarize this differently:

  1. Every single component of the boot process and OS needs to be authenticated, i.e. all of shim (done), boot loader (done), kernel (done), initrd (missing so far), OS binary resources (missing so far), OS configuration and state (missing so far), the user's home (missing so far).

  2. Encryption is necessary for the OS configuration and state (bound to TPM), and for the user's home directory (bound to a user password or user security token).

In Detail

Let's see how we can achieve the above in more detail.

How to Authenticate the initrd

At the moment initrds are generated on the installed host via scripts (dracut and similar) that try to figure out a minimal set of binaries and configuration data to build an initrd that contains just enough to be able to find and set up the root file system. What is included in the initrd hence depends highly on the individual installation and its configuration. Pretty likely no two initrds generated that way will be fully identical due to this. This model clearly has benefits: the initrds generated this way are very small and minimal, and support exactly what is necessary for the system to boot, no less and no more. It comes with serious drawbacks too though: the generation process is fragile and sometimes more akin to black magic than to following clear rules: the generator script has to natively understand a myriad of storage stacks to determine what needs to be included and what not. It also means that authenticating the image is hard: given that each individual host gets a different, specialized initrd, we cannot just sign the initrd with the vendor key like we sign the kernel. If we want to keep this design we'd have to figure out some other mechanism (e.g. a per-host signature key that is generated locally; or authenticating the initrd with a message authentication code bound to the TPM). While these approaches are certainly conceivable, I am not convinced they are actually a good idea: locally and dynamically generated per-host initrds are something we should probably move away from.

If we move away from locally generated initrds, things become a lot simpler. If the distribution vendor generates the initrds on their build systems then it can be attached to the kernel image itself, and thus be signed and measured along with the kernel image, without any further work. This simplicity is simply lovely. Besides robustness and reproducibility this gives us an easy route to authenticated initrds.

But of course, nothing is really that simple: working with vendor-generated initrds means that we can't adjust them anymore to the specifics of the individual host: if we pre-build the initrds and include them in the kernel image in immutable fashion then it becomes harder to support complex, more exotic storage or to parameterize it with local network server information, credentials, passwords, and so on. Now, for my simple laptop use-case these things don't matter, there's no need to extend/parameterize things, laptops and their setups are not that wildly different. But what to do about the cases where we want both: extensibility to cover for less common storage subsystems (iscsi, LVM, multipath, drivers for exotic hardware…) and parameterization?

Here's a proposal for how to achieve that: let's build a basic initrd into the kernel as suggested, but then do two things to make this scheme both extensible and parameterizable, without compromising security.

  1. Let's define a way how the basic initrd can be extended with additional files, which are stored in separate "extension images". The basic initrd should be able to discover these extension images, authenticate them and then activate them, thus extending the initrd with additional resources on-the-fly.

  2. Let's define a way how we can safely pass additional parameters to the kernel/initrd (and actually the rest of the OS, too) in an authenticated (and possibly encrypted) fashion. Parameters in this context can be anything specific to the local installation, i.e. server information, security credentials, certificates, SSH server keys, or even just the root password that shall be able to unlock the root account in the initrd …

In such a scheme we should be able to deliver everything we are looking for:

  1. We'll have a full trust chain for the code: the boot loader will authenticate and measure the kernel and basic initrd. The initrd extension images will then be authenticated by the basic initrd image.

  2. We'll have authentication for all the parameters passed to the initrd.

Does this all sound very unspecific so far? Let's make it more concrete by looking closer at the components I'd suggest using for this logic:

  1. The systemd suite has, for a few months now, contained a subsystem implementing system extensions (v248). System extensions are ultimately just disk images (for example a squashfs file system in a GPT envelope) that can extend an underlying OS tree. Extending in this regard means they simply add additional files and directories into the OS tree, i.e. below /usr/. For a longer explanation see systemd-sysext(8). When a system extension is activated it is simply mounted and then merged into the main /usr/ tree via a read-only overlayfs mount. What's particularly nice about them in this context is that the extension images may carry dm-verity authentication data and PKCS#7 signatures (once this is merged, that is, i.e. v250).

  2. The systemd suite also contains a concept called service "credentials". These are small pieces of information passed to services in a secure way. One key feature of these credentials is that they can be encrypted and authenticated in a very simple way with a key bound to the TPM (v250). See LoadCredentialEncrypted= and systemd-creds(1) for details. They are great for safely storing SSL private keys and similar on your system, but they also come in handy for parameterizing initrds: an encrypted credential is just a file that can only be decoded if the right TPM is around with the right PCR values set.

  3. The systemd suite contains a component called systemd-stub(7). It's an EFI stub, i.e. a small piece of code that is attached to a kernel image, and turns the kernel image into a regular EFI binary that can be directly executed by the firmware (or a boot loader). This stub has a number of nice features (for example, it can show a boot splash before invoking the Linux kernel itself and such). Once this work is merged (v250) the stub will support one more feature: it will automatically search for system extension image files and credential files next to the kernel image file, measure them and pass them on to the main initrd of the host.

Putting this together we have a nice way to provide fully authenticated kernel images, initrd images and initrd extension images, as well as encrypted and authenticated parameters via the credentials logic.

How would a distribution actually make use of this? A distribution vendor would pre-build the basic initrd, and glue it into the kernel image, and sign that as a whole. Then, for each supported extension of the basic initrd (e.g. one for iscsi support, one for LVM, one for multipath, …), the vendor would use a tool such as mkosi to build an extension image, i.e. a GPT disk image containing the files in squashfs format, a Verity partition that authenticates it, plus a PKCS#7 signature partition that validates the root hash for the dm-verity partition, and that can be checked against a key provided by the boot loader or main initrd. Then, any parameters for the initrd will be encrypted using systemd-creds encrypt -T. The resulting encrypted credentials and the initrd extension images are then simply placed next to the kernel image in the ESP (or boot partition). Done.
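
Here's a rough sketch of what the credential part of that flow might look like (the credential name and paths are made up, and the .extra.d naming follows the in-progress systemd-stub work referenced above, so details may change before release):

# echo -n "correct horse battery staple" | \
      systemd-creds encrypt --name=root.password -T - root.password.cred
# cp root.password.cred /efi/EFI/Linux/mykernel.efi.extra.d/   # picked up by systemd-stub at boot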

This checks all boxes: everything is authenticated and measured, the credentials also encrypted. Things remain extensible and modular, can be pre-built by the vendor, and installation is as simple as dropping in one file for each extension and/or credential.

How to Authenticate the Binary OS Resources

Let's now have a look at how to authenticate the Binary OS resources, i.e. the stuff you find in /usr/ — the stuff traditionally shipped to the user's system via RPMs or DEBs.

I think there are three relevant ways to authenticate this:

  1. Make /usr/ a dm-verity volume. dm-verity is a concept implemented in the Linux kernel that provides authenticity to read-only block devices: every read access is cryptographically verified against a top-level hash value. This top-level hash is typically a 256-bit value that you can either encode in the kernel image you are using, or cryptographically sign (which is particularly nice once this is merged). I think this is actually the best approach since it makes the /usr/ tree entirely immutable in a very simple way. However, this also means that the whole of /usr/ needs to be updated at once, i.e. the traditional rpm/apt based update logic cannot work in this mode.

  2. Make /usr/ a dm-integrity volume. dm-integrity is a concept provided by the Linux kernel that offers integrity guarantees to writable block devices, i.e. in some ways it can be considered to be a bit like dm-verity while permitting write access. It can be used in three ways, one of which I think is particularly relevant here. The first way is with a simple hash function in "stand-alone" mode: this is not too interesting here, it just provides greater data safety for file systems that don't hash check their files' data on their own. The second way is in combination with dm-crypt, i.e. with disk encryption. In this case it adds authenticity to confidentiality: only if you know the right secret you can read and make changes to the data, and any attempt to make changes without knowing this secret key will be detected as IO error on next read by those in possession of the secret (more about this below). The third way is the one I think is most interesting here: in "stand-alone" mode, but with a keyed hash function (e.g. HMAC). What's this good for? This provides authenticity without encryption: if you make changes to the disk without knowing the secret this will be noticed on the next read attempt of the data and result in IO errors. This mode provides what we want (authenticity) and doesn't do what we don't need (encryption). Of course, the secret key for the HMAC must be provided somehow, I think ideally by the TPM.

  3. Make /usr/ a dm-crypt (LUKS) + dm-integrity volume. This provides both authenticity and encryption. The latter isn't typically needed for /usr/ given that it generally contains no secret data: anyone can download the binaries off the Internet anyway, and the sources too. By encrypting this you'll waste CPU cycles, but beyond that it doesn't hurt much. (Admittedly, some people might want to hide the precise set of packages they have installed, since it of course does reveal a bit of information about you: i.e. what you are working on, maybe what your job is – think: if you are a hacker you have hacking tools installed – and similar). Going this way might simplify things in some cases, as it means you don't have to distinguish "OS binary resources" (i.e /usr/) and "OS configuration and state" (i.e. /etc/ + /var/, see below), and just make it the same volume. Here too, the secret key must be provided somehow, I think ideally by the TPM.

All three approaches are valid. The first has my primary sympathies (see the sketch below), but for distributions not willing to abandon client-side updates via RPM/dpkg it is not an option, in which case I would propose one of the other two approaches instead.
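
To make the first approach concrete, here's a hedged sketch with cryptsetup's veritysetup tool (the device names are hypothetical):

# veritysetup format /dev/vg/usr /dev/vg/usr-hashes    # computes the hash tree, prints the root hash
# veritysetup open /dev/vg/usr usr /dev/vg/usr-hashes <root-hash-printed-by-format>
# mount -o ro /dev/mapper/usr /usr                     # reads of tampered blocks now fail

The root hash printed in the first step is the single value that must then be delivered in a trusted way, e.g. linked into the kernel image or validated via a signature, as described above.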

The LUKS encryption key (and in case of dm-integrity stand-alone mode the key for the keyed hash function) should be bound to the TPM. Why the TPM for this? You could also use a user password or a FIDO2 or PKCS#11 security token — but I think the TPM is the right choice because it reduces the need for repeated authentication: you shouldn't first have to provide the disk encryption password and then log in with yet another password. It should be possible for the system to boot up unattended, with only one authentication prompt needed to unlock the user's data properly. The TPM provides a way to do this in a reasonably safe and fully unattended way. Also, when we stop considering just the laptop use-case for a moment: on servers interactive disk encryption prompts don't make much sense — the fact that TPMs can provide secrets without user interaction, and can thus work in entirely unattended environments, is quite desirable. Note that crypttab(5) as implemented by systemd (v248) provides native support for authentication via password, via TPM2, via PKCS#11 or via FIDO2, so the choice is ultimately all yours.
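
With systemd v248 you can already try the TPM binding part of this on a LUKS2 volume; a sketch (the device path is hypothetical):

# systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/sda3

and the matching crypttab(5) line for unattended unlocking at boot:

root  UUID=...  none  tpm2-device=auto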

How to Encrypt/Authenticate OS Configuration and State

Let's now look at the OS configuration and state, i.e. the stuff in /etc/ and /var/. It probably makes sense to not consider these two hierarchies independently but instead just consider this to be the root file system. If the OS binary resources live in a separate file system, that file system is then mounted onto the /usr/ sub-directory of the root file system.

The OS configuration and state (or: root file system) should be both encrypted and authenticated: it might contain secret keys, user passwords, privileged logs and similar. This data matters, and plenty of it should remain confidential.

The encryption of choice here is dm-crypt (LUKS) + dm-integrity, as discussed above, again with the key bound to the TPM.

If the OS binary resources are protected the same way it is safe to merge these two volumes and have a single partition for both (see above).
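
In cryptsetup terms, creating such an authenticated, encrypted volume might look like this (a sketch; the device name is hypothetical, and in a real setup the key would be enrolled against the TPM as described above):

# cryptsetup luksFormat --type luks2 --integrity hmac-sha256 /dev/vg/root
# cryptsetup open /dev/vg/root root    # any offline tampering now shows up as IO errors on read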

How to Encrypt/Authenticate the User's Home Directory

The data in the user's home directory should be encrypted, and bound to the user's preferred token of authentication (i.e. a password or FIDO2/PKCS#11 security token). As mentioned, in the traditional mode of operation the user's home directory is not individually encrypted, but only encrypted because FDE is in use. The encryption key for that is a system-wide key though, not a per-user key. And I think that's a problem, as mentioned (and probably not even generally understood by our users). We should correct that and ensure that the user's password is what unlocks the user's data.

In the systemd suite we provide a service systemd-homed(8) (v245) that implements this in a safe way: each user gets their own LUKS volume stored in a loopback file in /home/, and this is enough to synthesize a user account. The encryption password for this volume is the user's account password, thus it's really the password provided at login time that unlocks the user's data. systemd-homed also supports other mechanisms of authentication, in particular PKCS#11/FIDO2 security tokens. It also provides support for other storage back-ends (such as fscrypt), but I'd always suggest to use the LUKS back-end since it's the only one providing the comprehensive confidentiality guarantees one wants for a UNIX-style home directory.
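
Creating such a home area is a single command with the homectl tool that accompanies systemd-homed (the user name is, of course, just an example):

# homectl create lennart --storage=luks

Other authentication mechanisms can be enrolled for the same home area later, e.g. a FIDO2 token via homectl's --fido2-device= option, without giving up the password fallback.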

Note that there's one special caveat here: if the user's home directory (e.g. /home/lennart/) is encrypted and authenticated, what about the file system this data is stored on, i.e. /home/ itself? If that dir is part of the root file system this would result in double encryption: first the data is encrypted with the TPM root file system key, and then again with the per-user key. Such double encryption is a waste of resources, and unnecessary. I'd thus suggest to make /home/ its own dm-integrity volume with a HMAC, keyed by the TPM. This means the data stored directly in /home/ will be authenticated but not encrypted. That's good not only for performance, but also has practical benefits: it allows extracting the encrypted volumes of the various users in case the TPM key is lost, as a way to recover from dead laptops or similar.
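
Here's a sketch of that /home/ setup with the integritysetup tool (paths are hypothetical, and in a real setup the HMAC key would be supplied by the TPM rather than read from a file):

# integritysetup format /dev/vg/home --integrity hmac-sha256 \
      --integrity-key-file /run/home.hmac.key --integrity-key-size 32
# integritysetup open /dev/vg/home home --integrity hmac-sha256 \
      --integrity-key-file /run/home.hmac.key --integrity-key-size 32
# mkfs.ext4 /dev/mapper/home     # authenticated but unencrypted container for per-user LUKS volumes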

Why authenticate /home/, if it only contains per-user home directories that are authenticated on their own anyway? That's a valid question: it's because the kernel file system maintainers made clear that Linux file system code is not considered safe against rogue disk images, and is not tested for that; this means before you mount anything you need to establish trust in some way because otherwise there's a risk that the act of mounting might exploit your kernel.

Summary of Resources and their Protections

So, let's now put this all together. Here's a table showing the various resources we deal with, and how I think they should be protected (in my idealized world).

Resource | Needs Authentication | Needs Encryption | Suggested Technology | Validation/Encryption Keys/Certificates acquired via | Stored where
Shim | yes | no | SecureBoot signature verification | firmware certificate database | ESP
Boot loader | yes | no | ditto | firmware certificate database/shim | ESP/boot partition
Kernel | yes | no | ditto | ditto | ditto
initrd | yes | no | ditto | ditto | ditto
initrd parameters | yes | yes | systemd TPM encrypted credentials | TPM | ditto
initrd extensions | yes | no | systemd-sysext with Verity+PKCS#7 signatures | firmware/initrd certificate database | ditto
OS binary resources | yes | no | dm-verity | root hash linked into kernel image, or firmware/initrd certificate database | top-level partition
OS configuration and state | yes | yes | dm-crypt (LUKS) + dm-integrity | TPM | top-level partition
/home/ itself | yes | no | dm-integrity with HMAC | TPM | top-level partition
User home directories | yes | yes | dm-crypt (LUKS) + dm-integrity in loopback files | User password/FIDO2/PKCS#11 security token | loopback file inside /home partition

This should provide all the desired guarantees: everything is authenticated, and the individualized per-host or per-user data is also encrypted. No double encryption takes place. The encryption keys/verification certificates are stored/bound to the most appropriate infrastructure.

Does this address the three attack scenarios mentioned earlier? I think so, yes. The basic attack scenario I described is addressed by the fact that /var/, /etc/ and /home/*/ are encrypted. Brute forcing the former two is harder than in the status quo ante model, since a high-entropy key is used instead of one derived from a user-provided password. Moreover, the "anti-hammering" logic of the TPM will make brute forcing prohibitively slow. The home directories are protected by the user's password or ideally a personal FIDO2/PKCS#11 security token in this model. Of course, a password isn't better security-wise than the status quo ante. But given the FIDO2/PKCS#11 support built into systemd-homed it should be easier to lock down the home directories securely.

Binding encryption of /var/ and /etc/ to the TPM also addresses the first of the two more advanced attack scenarios: a copy of the harddisk is useless without the physical TPM chip, since the seed key is sealed into it. (And even if the attacker had the chance to watch you type in your password, it won't help unless they possess access to the TPM chip.) For the home directory this attack is not addressed as long as a plain password is used. However, since binding home directories to FIDO2/PKCS#11 tokens is built into systemd-homed things should be safe here too — provided the user actually possesses and uses such a device.

The backdoor attack scenario is addressed by the fact that every resource in play now is authenticated: it's hard to backdoor the OS if there's no component that isn't verified by signature keys or TPM secrets the attacker hopefully doesn't know.

For general purpose distributions that focus on updating the OS per RPM/dpkg the idealized model above won't work out, since (as mentioned) this implies an immutable /usr/, and thus requires updating /usr/ via an atomic update operation. For such distros a setup like the following is probably more realistic (see the discussion of the three approaches above):

Resource | Needs Authentication | Needs Encryption | Suggested Technology | Validation/Encryption Keys/Certificates acquired via | Stored where
Shim | yes | no | SecureBoot signature verification | firmware certificate database | ESP
Boot loader | yes | no | ditto | firmware certificate database/shim | ESP/boot partition
Kernel | yes | no | ditto | ditto | ditto
initrd | yes | no | ditto | ditto | ditto
initrd parameters | yes | yes | systemd TPM encrypted credentials | TPM | ditto
initrd extensions | yes | no | systemd-sysext with Verity+PKCS#7 signatures | firmware/initrd certificate database | ditto
OS binary resources, configuration and state | yes | yes | dm-crypt (LUKS) + dm-integrity | TPM | top-level partition
/home/ itself | yes | no | dm-integrity with HMAC | TPM | top-level partition
User home directories | yes | yes | dm-crypt (LUKS) + dm-integrity in loopback files | User password/FIDO2/PKCS#11 security token | loopback file inside /home partition

This means there's only one root file system that contains all of /etc/, /var/ and /usr/.

Recovery Keys

When binding encryption to TPMs one problem that arises is what strategy to adopt if the TPM is lost, due to hardware failure: if I need the TPM to unlock my encrypted volume, what do I do if I need the data but lost the TPM?

The answer here is supporting recovery keys (this is similar to how other OSes approach this). Recovery keys are pretty much the same concept as passwords. The main difference is that they are computer-generated rather than user-chosen. Because of that they typically have much higher entropy (which makes them more annoying to type in, i.e. you want to use them only when you must, not day-to-day). By having higher entropy they are useful in combination with TPM, FIDO2 or PKCS#11 based unlocking: unlike a combination with passwords they do not compromise the higher strength of protection that TPM/FIDO2/PKCS#11 based unlocking is supposed to provide.

Current versions of systemd-cryptenroll(1) implement a recovery key concept in an attempt to address this problem. You may enroll any combination of TPM chips, PKCS#11 tokens, FIDO2 tokens, recovery keys and passwords on the same LUKS volume. When enrolling a recovery key it is generated and shown on screen both in text form and as QR code you can scan off screen if you like. The idea is to write down/store this recovery key in a safe place so that you can use it when you need it. Note that such recovery keys can be entered wherever a LUKS password is requested, i.e. after generation they behave pretty much the same as a regular password.
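
Enrolling one is a single command (the device path is hypothetical):

# systemd-cryptenroll --recovery-key /dev/sda3

The generated key is displayed exactly once and not stored anywhere in the clear, so write it down or scan the QR code immediately.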

TPM PCR Brittleness

Locking devices to TPMs and enforcing a PCR policy with this (i.e. configuring the TPM key to be unlockable only if certain PCRs match certain values, and thus requiring the OS to be in a certain state) brings a problem with it: TPM PCR brittleness. If the key you want to unlock with the TPM requires the OS to be in a specific state (i.e. that all OS components' hashes match certain expectations or similar) then doing OS updates might have the effect of making your key inaccessible: the OS updates will cause the code to change, and thus the hashes of the code, and thus certain PCRs. (Thankfully, you enrolled a recovery key, as described above, so this doesn't mean you lost your data, right?)

To address this I'd suggest three strategies:

  1. Most importantly: don't actually use the TPM PCRs that contain code hashes. There are actually multiple PCRs defined, each containing measurements of different aspects of the boot process. My recommendation is to bind keys to PCR 7 only, a PCR that contains measurements of the UEFI SecureBoot certificate databases. Thus, the keys will remain accessible as long as these databases remain the same, and updates to code will not affect it (updates to the certificate databases will, and they do happen too, though hopefully much less frequently than code updates). Does this reduce security? Not much, no, because the code that's run is after all not just measured but also validated via code signatures, and those signatures are validated with the aforementioned certificate databases. Thus binding an encrypted TPM key to PCR 7 should enforce a similar level of trust in the boot/OS code as binding it to a PCR with hashes of specific versions of that code. I.e. using PCR 7 means you say "every code signed by these vendors is allowed to unlock my key" while using a PCR that contains code hashes means "only this exact version of my code may access my key".

  2. Use LUKS key management to enroll multiple versions of the TPM keys in relevant volumes, to support multiple versions of the OS code (or multiple versions of the certificate database, as discussed above). Specifically: whenever an update is done that might result in changing the relevant PCRs, pre-calculate the new PCRs, and enroll them in an additional LUKS slot on the relevant volumes. This means that the unlocking keys tied to the TPM remain accessible in both states of the system. Eventually, once rebooted after the update, remove the old slots. (See the sketch after this list.)

  3. If these two strategies didn't work out (maybe because the OS/firmware was updated outside of OS control, or the update mechanism was aborted at the wrong time) and the TPM PCRs changed unexpectedly, and the user now needs to use their recovery key to get access to the OS back, let's handle this gracefully and automatically reenroll the current TPM PCRs at boot, after the recovery key checked out, so that for future boots everything is in order again.
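
Here's a conceptual sketch of that second strategy using plain LUKS slot management (the slot number is hypothetical; note that systemd-cryptenroll as of v248 binds to the current PCR values, so enrolling against pre-calculated future PCRs would need additional tooling):

# cryptsetup luksDump /dev/sda3         # inspect which key slots are currently occupied
# ...enroll an additional TPM-bound key valid for the post-update PCR state...
# cryptsetup luksKillSlot /dev/sda3 5   # after a successful reboot, drop the stale slot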

Other approaches can work too: for example, some OSes simply remove TPM PCR policy protection of disk encryption keys altogether immediately before OS or firmware updates, and then reenable it right after. Of course, this opens a time window where the key bound to the TPM is much less protected than people might assume. I'd try to avoid such a scheme if possible.

Anything Else?

So, given that we are talking about idealized systems: I personally think the ideal OS would be much simpler, and thus more secure, than this:

I'd try to ditch the Shim, and instead focus on enrolling the distribution vendor keys directly in the UEFI firmware certificate list. This is actually supported by all firmwares too. This has various benefits: it's no longer necessary to bind everything to Microsoft's root key, you can just enroll your own stuff and thus make sure only what you want to trust is trusted and nothing else. To make an approach like this easier, we have been working on doing automatic enrollment of these keys from the systemd-boot boot loader, see this work in progress for details. This way the Firmware will authenticate the boot loader/kernel/initrd without any further component for this in place.

I'd also not bother with a separate boot partition, and just use the ESP for everything. The ESP is required anyway by the firmware, and is good enough for storing the few files we need.

FAQ

Can I implement all of this in my distribution today?

Probably not. While the big issues have mostly been addressed there's a lot of integration work still missing. As you might have seen I linked some PRs that haven't even been merged into our tree yet, and definitely not been released yet or even entered the distributions.

Will this show up in Fedora/Debian/Ubuntu soon?

I don't know. I am making a proposal how these things might work, and am working on getting various building blocks for this into shape. What the distributions do is up to them. But even if they don't follow the recommendations I make 100%, or don't want to use the building blocks I propose I think it's important they start thinking about this, and yes, I think they should be thinking about defaulting to setups like this.

Work for measuring/signing initrds on Fedora has been started, here's a slide deck with some information about it.

But isn't a TPM evil?

Some corners of the community tried (unfortunately successfully to some degree) to paint TPMs/Trusted Computing/SecureBoot as generally evil technologies that stop us from using our systems the way we want. That idea is rubbish though, I think. We should focus on what it can deliver for us (and that's a lot I think, see above), and appreciate the fact we can actually use it to kick out perceived evil empires from our devices instead of being subjected to them. Yes, the way SecureBoot/TPMs are defined puts you in the driver's seat if you want — and you may enroll your own certificates to keep out everything you don't like.

What if my system doesn't have a TPM?

TPMs are becoming quite ubiquitous, in particular as the upcoming Windows versions will require them. In general I think we should focus on modern, fully equipped systems when designing all this, and then find fall-backs for more limited systems. Frankly it feels as if so far the design approach for all this was the other way round: try to make the new stuff work like the old rather than the old like the new (I mean, to me it appears this thinking is the main raison d'être for the Grub boot loader).

More specifically, on the systems where we have no TPM we ultimately cannot provide the same security guarantees as for those which have one. So depending on the resource to protect we should fall back to different TPM-less mechanisms. For example, if we have no TPM then the root file system should probably be encrypted with a user-provided password, typed in at boot as before. As for the boot credentials: without a TPM we should probably simply not encrypt them, and place them in the ESP unencrypted.

Effectively this means: without TPM you'll still get protection regarding the basic attack scenario, as before, but not the other two.

What if my system doesn't have UEFI?

Many of the mechanisms explained above taken individually do not require UEFI. But of course the chain of trust suggested above requires something like UEFI SecureBoot. If your system lacks UEFI it's probably best to find work-alikes to the technologies suggested above, but I doubt I'll be able to help you there.

rpm/dpkg already cryptographically validates all packages at installation time (gpg), why would I need more than that?

This type of package validation happens once: at the moment of installation (or update) of the package, but not anymore when the data installed is actually used. Thus when an attacker manages to modify the package data after installation and before use they can make any change they like without this ever being noticed. Such package download validation does address certain attack scenarios (i.e. man-in-the-middle attacks on network downloads), but it doesn't protect you from attackers with physical access, as described in the attack scenarios above.

Systems such as ostree aren't better than rpm/dpkg regarding this BTW, their data is not validated on use either, but only during download or when processing tree checkouts.

The key point here is that the scheme explained above provides offline protection for the data "at rest" — even someone with physical access to your device cannot easily make changes that aren't noticed on next use. rpm/dpkg/ostree provide online protection only: as long as the system remains up, and all OS changes are done through the intended program code-paths, and no one has physical access, everything should be good. In today's world I am sure this is not good enough though. As mentioned, most modern OSes provide offline protection for the data at rest in one way or another. Generic Linux distributions are terribly behind on this.

This is all so desktop/laptop focused, what about servers?

I am pretty sure servers should provide similar security guarantees as outlined above. In a way servers are a much simpler case: there are no users and no interactivity. Thus the discussion of /home/ and what it contains and of user passwords doesn't matter. However, the authenticated initrd and the unattended TPM-based encryption I think are very important for servers too, in a trusted data center environment. It provides security guarantees so far not given by Linux server OSes.

I'd like to help with this, or discuss/comment on this

Submit patches or reviews through GitHub. General discussion about this is best done on the systemd mailing list.

Creating Quality Backtraces for Crash Reports

Posted by Michael Catanzaro on September 19, 2021 02:41 AM

Hello Linux users! Help developers help you: include a quality backtrace taken with gdb each and every time you create an issue report for a crash. If you don’t, most developers will request that you provide a backtrace, then ignore your issue until you manage to figure out how to do so. Save us the trouble and just provide the backtrace with your initial report, so everything goes smoother. (Backtraces are often called “stack traces.” They are the same thing.)

Don’t just copy the lower-quality backtrace you see in your system journal into your issue report. That’s a lot better than nothing, but if you really want the crash to be fixed, you should provide the developers with a higher-quality backtrace from gdb. Don’t know how to get a quality backtrace with gdb? Read on.

(Note: this blog post is occasionally updated to maintain relevance and remove historical information. Last update: May 2022)

Modern Crash Reporting

Here are instructions for getting a quality backtrace for a crashing process on Fedora 35, or any other Linux-based OS that enables coredumpctl and debuginfod:

$ coredumpctl gdb
(gdb) bt full

Enter ‘c’ (continue) when required. Enter ‘y’ when prompted to enable debuginfod. When it’s done printing, press ‘q’ to quit. That’s it! That’s all you need to know. You’re done. Two points of note:

  • When a process crashes, a core dump is caught by systemd-coredump and stored for future use. The coredumpctl gdb command opens the most recent core dump in gdb. systemd-coredump has been enabled by default in Fedora since Fedora 26. (It’s also enabled by default in RHEL 8.)
  • After opening the core dump, gdb uses debuginfod to automatically download all required debuginfo packages, ensuring the generated backtrace is useful. debuginfod has been enabled by default in Fedora since Fedora 35.

Quality Linux operating systems ought to configure both debuginfod and systemd-coredump for you, so that they are running out-of-the-box. If you’re missing debuginfod or systemd-coredump, then read on to learn how to take a backtrace without these tools. It will be more complicated, of course.

systemd-coredump

If your operating system enables systemd-coredump by default, then congratulations! This makes reporting crashes much easier because you can easily retrieve a core dump for any recent crash using the coredumpctl command. For example, coredumpctl alone will list all available core dumps. coredumpctl gdb will open the core dump of the most recent crash in gdb. coredumpctl gdb 1234 will open the core dump corresponding to the most recent crash of a process with pid 1234. It doesn’t get easier than this.
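
Spelled out as commands (the pid is of course made up):

$ coredumpctl             # list all available core dumps
$ coredumpctl gdb         # open the most recent core dump in gdb
$ coredumpctl gdb 1234    # open the most recent core dump of pid 1234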

Core dumps are stored under /var/lib/systemd/coredump. systemd-coredump will automatically delete core dumps that exceed configurable size limits (2 GB by default). It also deletes core dumps if your free disk space falls below a configurable threshold (15% free by default). Additionally, systemd-tmpfiles will delete core dumps automatically after some time has passed (three days by default). This ensures your disk doesn’t fill up with old core dumps. Although most of these settings seem good to me, the default 2 GB size limit is way too low in my opinion, as it causes systemd to immediately discard crashes of any application that uses WebKit. I recommend raising this limit to 20 GB by creating an /etc/systemd/coredump.conf.d/50-coredump.conf drop-in containing the following:

[Coredump]
ProcessSizeMax=20G
ExternalSizeMax=20G

The other settings are likely sufficient to prevent your disk from filling up with core dumps.

Sadly, although systemd-coredump has been around for a good while now and many Linux operating systems have it enabled by default, many still do not. Most notably, the Debian ecosystem is still not yet on board. To check if systemd-coredump is enabled on your system:

$ cat /proc/sys/kernel/core_pattern

If you see systemd-coredump, then you’re good.
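
On a system where it's enabled, the output looks something like this (the exact pattern varies between systemd versions):

|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h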

To enable it in Debian or Ubuntu, just install it:

# apt install systemd-coredump

Ubuntu users, note this will cause apport to be uninstalled, since it is currently incompatible. Also note that I switched from $ (which indicates a normal prompt) to # (which indicates a root prompt).

In other operating systems, you may have to manually enable it:

# echo "kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" > /etc/sysctl.d/50-coredump.conf
# /usr/lib/systemd/systemd-sysctl --prefix kernel.core_pattern

Note the exact core pattern to use changes occasionally in newer versions of systemd, so these instructions may not work everywhere.

Detour: Manual Core Dump Handling

If you don’t want to enable systemd-coredump, life is harder and you should probably reconsider, but it’s still possible to debug most crashes. First, enable core dump creation by removing the default 0-byte size limit on core files:

$ ulimit -c unlimited

This change is temporary and only affects the current instance of your shell. For example, if you open a new tab in your terminal, you will need to set the ulimit again in the new tab.

Next, run your program in the terminal and try to make it crash. A core file will be generated in the current directory. Open it by starting the program that crashed in gdb and passing the filename of the core file that was created. For example:

$ gdb gnome-chess ./core

This is downright primitive, though:

  • You’re going to have a hard time getting backtraces for services that are crashing, for starters. If starting the service normally, how do you set the ulimit? I’m sure there’s a way to do it, but I don’t know how! It’s probably easier to start the service manually, but then what command line flags are needed to properly do so? It will be different for each service, and you have to figure this all out for yourself.
  • Special situations become very difficult. For example, if a service is crashing only when run early during boot, or only during an initial setup session, you are going to have an especially hard time.
  • If you don’t know how to reproduce a crash that occurs only rarely, it’s inevitably going to crash when you’re not prepared to manually catch the core dump. Sadly, not all crashes will occur on demand when you happen to be running the software from a terminal with the right ulimit configured.
  • Lastly, you have to remember to delete that core file when you're done, because otherwise it will take up space on your disk until you do. You'll probably notice if you leave core files scattered around your home directory, but you might not notice if you're working someplace else.

Seriously, just enable systemd-coredump. It solves all of these problems and guarantees you will always have easy access to a core dump when something crashes, even for crashes that occur only rarely.

Debuginfo Installation

Now that we know how to open a core dump in gdb, let’s talk about debuginfo. When you don’t have the right debuginfo packages installed, the backtrace generated by gdb will be low-quality. Almost all Linux software developers deal with low-quality backtraces on a regular basis, because most users are not very good at installing debuginfo. Again, if you’re using Fedora 35 or newer, you don’t have to worry about this anymore because debuginfod will take care of everything for you. I would be thrilled if other Linux operating systems would quickly adopt debuginfod so we can put the era of low-quality crash reports behind us. But if you’re using an operating system that does not provide a debuginfod server, you’ll need to learn how to install debuginfo manually.

As an example, I decided to force gnome-chess to crash using the command killall -SEGV gnome-chess, then I ran coredumpctl gdb to open the resulting core dump in gdb. After a bunch of spam, I saw this:

Missing separate debuginfos, use: dnf debuginfo-install gnome-chess-40.1-1.fc34.x86_64
--Type <RET> for more, q to quit, c to continue without paging--
Core was generated by `/usr/bin/gnome-chess --gapplication-service'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa23d8b55bf in __GI___poll (fds=0x5636deb06930, nfds=2, timeout=2830)
    at ../sysdeps/unix/sysv/linux/poll.c:29
29  return SYSCALL_CANCEL (poll, fds, nfds, timeout);
[Current thread is 1 (Thread 0x7fa23ca0cd00 (LWP 140177))]
(gdb)

If you are using Fedora, RHEL, or related operating systems, the line “missing separate debuginfos” is a good hint that debuginfo is missing. It even tells you exactly which dnf debuginfo-install command to run to remedy this problem! But this is a Fedora ecosystem feature, and you won’t see this on most other operating systems. Usually, you’ll need to manually locate the right debuginfo packages to install. Debian and Ubuntu users can do this by searching for and installing -dbg or -dbgsym packages until each frame in the backtrace looks good. You’ll just have to manually guess the names of which debuginfo packages you need to install based on the names of the libraries in the backtrace. Look here for instructions for popular operating systems.
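
For example, if the unhelpful frames in your backtrace point at libglib-2.0.so.0, the Debian-style fix would look roughly like this (package name illustrative; the debug symbol archive must also be enabled in your apt sources):

# apt install libglib2.0-0-dbgsym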

How do you know when the backtrace looks good? When each frame has file names, line numbers, function parameters, and local variables! Here is an example of a bad backtrace, if I continue the gnome-chess example above without properly installing the required debuginfo:

(gdb) bt full
#0 0x00007fa23d8b55bf in __GI___poll (fds=0x5636deb06930, nfds=2, timeout=2830)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
#1 0x00007fa23eee648c in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
#2 0x00007fa23ee8fc03 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3 0x00007fa23e4b599d in g_application_run () at /lib64/libgio-2.0.so.0
#4 0x00005636dd7b79a2 in chess_application_main ()
#5 0x00007fa23d7e7b75 in __libc_start_main (main=0x5636dd7aaa50 <main>, argc=2, argv=0x7fff827b6438, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff827b6428)
    at ../csu/libc-start.c:332
        self = <optimized out>
        result = <optimized out>
        unwind_buf = 
              {cancel_jmp_buf = {{jmp_buf = {94793644186304, 829313697107602221, 94793644026480, 0, 0, 0, -829413713854928083, -808912263273321683}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x2, 0x7fff827b6438}, data = {prev = 0x0, cleanup = 0x0, canceltype = 2}}}
        not_first_call = <optimized out>
#6 0x00005636dd7aaa9e in _start ()

This backtrace has six frames, which shows where the code was during program execution when the crash occurred. You can see line numbers for frame #0 (poll.c:29) and #5 (libc-start.c:332), and these frames also show the values of function parameters and variables on the stack, which are often useful for figuring out what went wrong. These frames have good debuginfo because I already had debuginfo installed for glibc. But frames #1 through #4 do not look so useful, showing only function names and the library and nothing else. This is because I’m using Fedora 34 rather than Fedora 35, so I don’t have debuginfod yet, and I did not install proper debuginfo for libgio, libglib, and gnome-chess. (The function names are actually only there because programs in Fedora include some limited debuginfo by default. In many operating systems, you will see ??? instead of function names.) A developer looking at this backtrace is not going to know what went wrong.

Now, let’s run the recommended debuginfo-install command:

# dnf debuginfo-install gnome-chess-40.1-1.fc34.x86_64

When the command finishes, we’ll start gdb again, using coredumpctl gdb just like before. This time, we see this:

Missing separate debuginfos, use: dnf debuginfo-install avahi-libs-0.8-14.fc34.x86_64 colord-libs-1.4.5-2.fc34.x86_64 cups-libs-2.3.3op2-7.fc34.x86_64 fontconfig-2.13.94-2.fc34.x86_64 glib2-2.68.4-1.fc34.x86_64 graphene-1.10.6-2.fc34.x86_64 gstreamer1-1.19.1-2.1.18.4.fc34.x86_64 gstreamer1-plugins-bad-free-1.19.1-3.1.18.4.fc34.x86_64 gstreamer1-plugins-base-1.19.1-2.1.18.4.fc34.x86_64 gtk4-4.2.1-1.fc34.x86_64 json-glib-1.6.6-1.fc34.x86_64 krb5-libs-1.19.2-2.fc34.x86_64 libX11-1.7.2-3.fc34.x86_64 libX11-xcb-1.7.2-3.fc34.x86_64 libXfixes-6.0.0-1.fc34.x86_64 libdrm-2.4.107-1.fc34.x86_64 libedit-3.1-38.20210714cvs.fc34.x86_64 libepoxy-1.5.9-1.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libidn2-2.3.2-1.fc34.x86_64 librsvg2-2.50.7-1.fc34.x86_64 libstdc++-11.2.1-1.fc34.x86_64 libxcrypt-4.4.25-1.fc34.x86_64 llvm-libs-12.0.1-1.fc34.x86_64 mesa-dri-drivers-21.1.8-1.fc34.x86_64 mesa-libEGL-21.1.8-1.fc34.x86_64 mesa-libgbm-21.1.8-1.fc34.x86_64 mesa-libglapi-21.1.8-1.fc34.x86_64 nettle-3.7.3-1.fc34.x86_64 openldap-2.4.57-5.fc34.x86_64 openssl-libs-1.1.1l-1.fc34.x86_64 pango-1.48.9-2.fc34.x86_64

Yup, Fedora ecosystem users will need to run dnf debuginfo-install twice to install everything required, because gdb doesn’t list all required packages until the second time. Next, we’ll run coredumpctl gdb one last time. There will usually be a few more debuginfo packages that are still missing because they’re not available in the Fedora repositories, but now you’ll probably have enough to get a quality backtrace:

(gdb) bt full
#0  0x00007fa23d8b55bf in __GI___poll (fds=0x5636deb06930, nfds=2, timeout=2830)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        sc_ret = -516
        sc_cancel_oldtype = 0
#1  0x00007fa23eee648c in g_main_context_poll
    (priority=<optimized out>, n_fds=2, fds=0x5636deb06930, timeout=<optimized out>, context=0x5636de7b24a0)
    at ../glib/gmain.c:4434
        ret = <optimized out>
        errsv = <optimized out>
        poll_func = 0x7fa23ee97c90 <g_poll>
        max_priority = 2147483647
        timeout = 2830
        some_ready = <optimized out>
        nfds = 2
        allocated_nfds = 2
        fds = 0x5636deb06930
        begin_time_nsec = 30619110638882
#2  g_main_context_iterate.constprop.0
    (context=context@entry=0x5636de7b24a0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at ../glib/gmain.c:4126
        max_priority = 2147483647
        timeout = 2830
        some_ready = <optimized out>
        nfds = 2
        allocated_nfds = 2
        fds = 0x5636deb06930
        begin_time_nsec = 30619110638882
#3  0x00007fa23ee8fc03 in g_main_context_iteration
    (context=context@entry=0x5636de7b24a0, may_block=may_block@entry=1) at ../glib/gmain.c:4196
        retval = <optimized out>
#4  0x00007fa23e4b599d in g_application_run
    (application=0x5636de7ae260 [ChessApplication], argc=-2105843004, argv=<optimized out>)
    at ../gio/gapplication.c:2560
        arguments = 0x5636de7b2400
        status = 0
        context = 0x5636de7b24a0
        acquired_context = <optimized out>
        __func__ = "g_application_run"
#5  0x00005636dd7b79a2 in chess_application_main (args=0x7fff827b6438, args_length1=2)
    at src/gnome-chess.p/gnome-chess.c:5623
        _tmp0_ = 0x5636de7ae260 [ChessApplication]
        _tmp1_ = 0x5636de7ae260 [ChessApplication]
        _tmp2_ = <optimized out>
        result = 0
...

I removed the last two frames because they are triggering a strange WordPress bug, but that’s enough to get the point. It looks much better! Now the developer can see exactly where the program crashed, including filenames, line numbers, and the values of function parameters and variables on the stack. This is as good as a crash report is normally going to get. In this case, it crashed when running poll() because gnome-chess was not actually doing anything at the time of the crash, since we crashed it by manually sending a SIGSEGV signal. Normally the backtrace will look more interesting.

debuginfod for Debian Users

Debian users can use debuginfod, but it has to be enabled manually:

$ DEBUGINFOD_URLS=https://debuginfod.debian.net/ gdb

See here for more information. This requires Debian 11 “bullseye” or newer. If you’re using Ubuntu or other operating systems derived from Debian, you’ll need to wait until a debuginfod server for your operating system is available.

Flatpak

If your application uses Flatpak, you can use the flatpak-coredumpctl script to open core dumps in gdb. For most runtimes, including those distributed by GNOME or Flathub, you will need to manually install (a) the debug extension for your app, (b) the SDK runtime corresponding to the platform runtime that you are using, and (c) the debug extension for the SDK runtime. For example, to install everything required to debug Epiphany 40 from Flathub, you would run:

$ flatpak install org.gnome.Epiphany.Debug//stable
$ flatpak install org.gnome.Sdk//40
$ flatpak install org.gnome.Sdk.Debug//40

(flatpak-coredumpctl will fail to start if you don’t have the correct SDK runtime installed, but it will not fail if you’re missing the debug extensions. You’ll just wind up with a bad backtrace.)

The debug extensions need to exactly match the versions of the app and runtime that crashed, so backtrace generation may be unreliable after you install them for the very first time, because you would have installed the latest versions of the extensions, but your core dump might correspond to an older app or runtime version. If the crash is reproducible, it’s a good idea to run flatpak update after installing to ensure you have the latest version of everything, then reproduce the crash again.

Once your debuginfo is installed, you can open the backtrace in gdb using flatpak-coredumpctl. You just have to tell flatpak-coredumpctl the app ID to use:

$ flatpak-coredumpctl org.gnome.Epiphany

You can pass matches to coredumpctl using -m. For example, to open the core dump corresponding to a crashed process with pid 1234:

$ flatpak-coredumpctl -m 1234 org.gnome.Epiphany

Thibault Saunier wrote flatpak-coredumpctl because I complained about how hard it used to be to debug crashed Flatpak applications. Clearly it is no longer hard. Thanks Thibault!

On newer versions of Debian and Ubuntu, flatpak-coredumpctl is included in the libflatpak-dev subpackage rather than the base flatpak package, so you’ll have to install libflatpak-dev first. But on older OS versions, including Debian 10 “buster” and Ubuntu 20.04, it is unfortunately installed as /usr/share/doc/flatpak/examples/flatpak-coredumpctl rather than /usr/bin/flatpak-coredumpctl due to a regrettable packaging choice that has been corrected in newer package versions. As a workaround, you can simply copy it to /usr/local/bin. Don’t forget to delete your copy after upgrading to a newer OS version, or it will shadow the packaged version.
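On those older versions, the workaround amounts to the following (the chmod is just in case the example file is not shipped executable):

$ sudo cp /usr/share/doc/flatpak/examples/flatpak-coredumpctl /usr/local/bin/
$ sudo chmod +x /usr/local/bin/flatpak-coredumpctl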

Fedora Flatpaks

Flatpaks distributed by Fedora are different than those distributed by GNOME or by Flathub because they do not have debug extensions. Historically, this has meant that debugging crashes was impractical. The best solution was to give up.

Good news! Fedora’s Flatpaks are compatible with debuginfod, so the missing debug extensions no longer matter. You do still need to manually install the org.fedoraproject.Sdk runtime corresponding to the version of the org.fedoraproject.Platform runtime that the application uses, because flatpak-coredumpctl requires it, but nothing else is needed. For example, to get a backtrace for Fedora’s Epiphany Flatpak using a Fedora 35 host system, I ran:

$ flatpak install org.fedoraproject.Sdk//f34
$ flatpak-coredumpctl org.gnome.Epiphany
(gdb) bt full

(The f34 is not a typo. Epiphany currently uses the Fedora 34 runtime regardless of what host system you are using.)

That’s it!

Miscellany

At this point, you should know enough to obtain a high-quality backtrace on most Linux systems. That will usually be all you really need, but it never hurts to know a little more, right?

Alternative Types of Backtraces

At the top of this blog post, I suggested using bt full to take the backtrace because this type of backtrace is the most useful to most developers. But there are other types of backtraces you might occasionally want to collect:

  • bt on its own without full prints a much shorter backtrace without stack variables or function parameters. This form of the backtrace is more useful for getting a quick feel for where the bug is occurring, because it is much shorter and easier to read than a full backtrace. But because there are no stack variables or function parameters, it might not contain enough information to solve the crash. I sometimes like to paste the first few lines of a bt backtrace directly into an issue report, then submit the bt full version of the backtrace as an attachment, since an entire bt full backtrace can be long and inconvenient if pasted directly into an issue report.
  • thread apply all bt prints a backtrace for every thread. Normally these backtraces are very long and noisy, so I don’t collect them very often, but when a threadsafety issue is suspected, this form of backtrace will sometimes be required.
  • thread apply all bt full prints a full backtrace for every thread. This is what automated bug report tools generally collect, because it provides the most information. But these backtraces are usually huge, and this level of detail is rarely needed, so I normally recommend starting with a normal bt full.
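For reference, here is what those variants look like at the gdb prompt:

(gdb) bt
(gdb) bt full
(gdb) thread apply all bt
(gdb) thread apply all bt full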

If in doubt, just use bt full like I showed at the top of this blog post. Developers will let you know if they want you to provide the backtrace in a different form.

gdb Logging

You can make gdb print your session to a file. For longer backtraces, this may be easier than copying the backtrace from a terminal:

(gdb) set logging on
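By default the log is written to gdb.txt in the current directory. A slightly fuller session might look like this (note that recent gdb releases prefer the spelling set logging enabled on):

(gdb) set logging file backtrace.txt
(gdb) set logging on
(gdb) bt full
(gdb) set logging off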

Memory Corruption

While a backtrace taken with gdb is usually enough information for developers to debug crashes, memory corruption is an exception. Memory corruption is the absolute worst. When memory corruption occurs, the code will crash in a location that may be far removed from where the actual bug occurred, rendering gdb backtraces useless for tracking down the bug. As a general rule, if you see a crash inside a memory allocation routine like malloc() or g_slice_alloc(), you probably have memory corruption. If you see magazine_chain_pop_head(), that’s called by g_slice_alloc() and is a sure sign of memory corruption. Similarly, crashes in GTK’s CSS machinery are almost always caused by memory corruption somewhere else.

Memory corruption is generally impossible to debug unless you are able to reproduce the issue under valgrind. valgrind is extremely slow, so it’s impractical to use it on a regular basis, but it will get to the root of the problem where gdb cannot. As a general rule, you want to run valgrind with --track-origins=yes so that it shows you exactly what went wrong:

$ valgrind --track-origins=yes my_app
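To make this concrete, here is a tiny, deliberately broken program (my own illustration, not from any real project):

/* corrupt.c - writes past the end of a heap allocation */
#include <stdlib.h>
#include <string.h>

int
main (void)
{
  char *buf = malloc (8);

  /* Copies 13 bytes (including the terminating nul) into an 8 byte
   * allocation. The program will not necessarily crash here - the
   * damage may only surface much later, somewhere unrelated, which
   * is exactly why gdb backtraces are useless for this bug class. */
  strcpy (buf, "hello world!");

  free (buf);
  return 0;
}

Run under valgrind --track-origins=yes, the invalid write is reported at the strcpy() line itself, rather than wherever the corrupted heap happens to blow up later.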

If you cannot reproduce the issue under valgrind, you’re usually totally out of luck. Memory corruption that only occurs rarely or under unknown conditions will lurk in your code indefinitely and cause occasional crashes that are effectively impossible to fix.

Another good tool for debugging memory corruption is address sanitizer (asan), but it is more complicated to use. Experienced users who are comfortable with rebuilding applications using special compiler flags may find asan very useful. However, because it can be very difficult to use, I recommend sticking with valgrind if you’re just trying to report a bug.
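If you do want to try asan, the core of it is just a couple of compiler flags. A minimal sketch, assuming a single-file C program built directly with gcc or clang (real projects would add the flags to their build system instead):

$ gcc -g -fsanitize=address -o my_app my_app.c
$ ./my_app

When the program performs a bad memory access, asan prints a detailed report and aborts, much like valgrind but with far less of a slowdown.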

Apport and ABRT

There are two popular downstream bug reporting tools: Ubuntu has Apport, and Fedora has ABRT. These tools are relatively easy to use — no command line knowledge required — and produce quality crash reports. Unfortunately, while the tools are promising, the crash reports go to downstream packagers, who are generally either not watching bug reports or not interested in (or capable of) fixing upstream software problems. Since downstream reports are very often ignored, it’s better to report crashes directly to upstream if you want your issue to be seen by the right developers and actually fixed. Of course, only report issues upstream if you’re using a recent software version. Fedora and Arch users can pretty much always safely report directly to upstream, as can Ubuntu users who are using the very latest version of Ubuntu. If you are an Ubuntu LTS user, you should stick with reporting issues downstream only, or at least take the time to verify that the issue still occurs with a more recent software version.

There are a couple more problems with these tools. As previously mentioned, Ubuntu’s apport is incompatible with systemd-coredump. If you’ve read this far, you know you really want systemd-coredump enabled, so I recommend disabling apport until it learns to play ball with systemd-coredump.

The technical design of Fedora’s ABRT is currently better because it actually retrieves your core dumps from systemd-coredump, so you don’t have to choose between one or the other. Unfortunately, ABRT has many serious user experience bugs and warts. I can’t recommend it for this reason, but if it works well enough for you, it does create some great downstream crash reports. Whether a downstream package maintainer will look at those reports is hit or miss, though.

What is a crash, really?

Most developers consider crashes on Unix systems to be program termination via a Unix signal that triggers creation of a core dump. The most common of these are SIGSEGV (segmentation fault, “invalid memory reference”) and SIGABRT (usually an intentional crash due to an assertion failure). Less common signals are SIGBUS (“bad memory access”) and SIGILL (“illegal instruction”). Sandboxed applications might occasionally see SIGSYS (“bad system call”). See the signal(7) manpage for a full list. These are the cases where you can get a backtrace to help track down the issue.

What is not a crash? If your application is hanging or just not behaving properly, that is not a crash. If your application is killed using SIGTERM or SIGKILL — this can happen when systemd-oomd determines you are low on memory,  or when a service is taking too long to stop — this is also not a crash in the usual sense of the word, because you’re not going to be able to get a backtrace for it. If a website is slow or unavailable, the news might say that it “crashed,” but it’s obviously not the same thing as what we’re talking about here. The techniques in this blog post are no use for these sorts of “crashes.”

Conclusion

If you have systemd-coredump enabled and debuginfod installed and working, most crash reports will be simple.  Memory corruption is a frustrating exception. Encourage your operating system to enable systemd-coredump and debuginfod if it doesn’t already.  Happy crash reporting!

Flatpak portals - how do they work?

Posted by Peter Hutterer on September 01, 2021 03:23 AM

I've been working on portals recently and one of the issues for me was that the documentation just didn't quite hit the sweet spot. At least the bits I found were either too high-level or too implementation-specific. So here's a set of notes on how a portal works, in the hope that this is actually correct.

First, portals are supposed to be a way for sandboxed applications (flatpaks) to trigger functionality they don't have direct access to. The prime example: opening a file without the application having access to $HOME. This is done by the applications talking to portals instead of doing the functionality themselves.

There is really only one portal process: /usr/libexec/xdg-desktop-portal, started as a systemd user service. That process owns a DBus bus name (org.freedesktop.portal.Desktop) and an object on that name (/org/freedesktop/portal/desktop). You can see that bus name and object with D-Feet, from DBus' POV there's nothing special about it. What makes it the portal is simply that the application running inside the sandbox can talk to that DBus name and thus call the various methods. Obviously the xdg-desktop-portal needs to run outside the sandbox to do its things.

There are multiple portal interfaces, all available on that one object. Those interfaces have names like org.freedesktop.portal.FileChooser (to open/save files). The xdg-desktop-portal implements those interfaces and thus handles any method calls on those interfaces. So where an application is sandboxed, it doesn't implement the functionality itself, it instead calls e.g. the OpenFile() method on the org.freedesktop.portal.FileChooser interface. Then it gets an fd back and can read the content of that file without needing full access to the file system.

Some interfaces are fully handled within xdg-desktop-portal. For example, the Camera portal checks a few things internally and pops up a dialog for the user to confirm access if needed [1], but otherwise there's nothing else involved in this specific method call.

Other interfaces have a backend "implementation" DBus interface. For example, the org.freedesktop.portal.FileChooser interface has an org.freedesktop.impl.portal.FileChooser (notice the "impl") counterpart. xdg-desktop-portal does not implement those impl.portals itself; it instead routes the DBus calls to the respective "impl.portal". Your sandboxed application calls OpenFile(), and xdg-desktop-portal then calls OpenFile() on org.freedesktop.impl.portal.FileChooser. That interface returns a value, xdg-desktop-portal extracts it and returns it back to the application in response to the original OpenFile() call.

What provides those impl.portals doesn't matter to xdg-desktop-portal, and this is where things are hot-swappable [2]. There are GTK- and Qt-specific portals provided by xdg-desktop-portal-gtk and xdg-desktop-portal-kde, and another one is provided by GNOME Shell directly. You can check the files in /usr/share/xdg-desktop-portal/portals/ and see which impl.portal is provided on which bus name. The reason those impl.portals exist is so they can be native to the desktop environment - regardless of what application you're running, and with a generic xdg-desktop-portal in the middle, you see the native file chooser dialog for your desktop environment.
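For illustration, a *.portal file is a small ini-style description; from memory (treat the exact values as an assumption), the GTK one looks roughly like this:

[portal]
DBusName=org.freedesktop.impl.portal.desktop.gtk
Interfaces=org.freedesktop.impl.portal.FileChooser;org.freedesktop.impl.portal.AppChooser
UseIn=gnome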

So the full call sequence is:

  • At startup, xdg-desktop-portal parses the /usr/share/xdg-desktop-portal/portals/*.portal files to know which impl.portal interface is provided on which bus name
  • The application calls OpenFile() on the org.freedesktop.portal.FileChooser interface on the object path /org/freedesktop/portal/desktop. It can do so because the bus name this object sits on is not restricted by the sandbox
  • xdg-desktop-portal receives that call. This is a portal with an impl.portal, so xdg-desktop-portal calls OpenFile() on the bus name that provides the org.freedesktop.impl.portal.FileChooser interface (as previously established by reading the *.portal files)
  • Assuming xdg-desktop-portal-gtk provides that portal at the moment, that process now pops up a GTK FileChooser dialog that runs outside the sandbox. User selects a file
  • xdg-desktop-portal-gtk sends back the fd for the file to the xdg-desktop-portal, and the impl.portal parts are done
  • xdg-desktop-portal receives that fd and sends it back as reply to the OpenFile() method in the normal portal
  • The application receives the fd and can read the file now
A few details here aren't fully correct, but it's correct enough to understand the sequence - the exact details depend on the method call anyway.

Finally: because of DBus restrictions, the various methods in the portal interfaces don't just reply with values. Instead, xdg-desktop-portal creates a new org.freedesktop.portal.Request object and returns the object path for that. Once that's done, the method call is complete from DBus' POV. When the actual return value arrives (e.g. the fd), that value is passed via a signal on that Request object, which is then destroyed. This roundabout way is done for purely technical reasons: regular DBus method calls would time out while the user picks a file path.
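As a rough sketch of what this round trip looks like from the client side (mine, untested, with error handling and the handle_token convention omitted - real clients subscribe to the Response signal before calling OpenFile() to avoid racing a fast reply):

#include <gio/gio.h>

static void
on_response (GDBusConnection *bus, const char *sender,
             const char *object_path, const char *iface,
             const char *signal_name, GVariant *params, gpointer user_data)
{
  guint32 response;   /* 0 on success, non-zero if cancelled/failed */
  GVariant *results;  /* a{sv}; for OpenFile, "uris" holds the files */

  g_variant_get (params, "(u@a{sv})", &response, &results);
  g_print ("portal response: %u\n", response);
  g_variant_unref (results);
  g_main_loop_quit (user_data);
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, NULL);
  GVariantBuilder options;
  const char *request_path;

  g_variant_builder_init (&options, G_VARIANT_TYPE_VARDICT);

  /* OpenFile(parent_window, title, options) returns the object path
   * of a Request object, not the actual result */
  GVariant *ret = g_dbus_connection_call_sync (bus,
      "org.freedesktop.portal.Desktop",
      "/org/freedesktop/portal/desktop",
      "org.freedesktop.portal.FileChooser", "OpenFile",
      g_variant_new ("(ssa{sv})", "", "Open File", &options),
      G_VARIANT_TYPE ("(o)"), G_DBUS_CALL_FLAGS_NONE, -1, NULL, NULL);
  g_variant_get (ret, "(&o)", &request_path);

  /* the actual return value arrives as a Response signal on that
   * Request object */
  g_dbus_connection_signal_subscribe (bus, "org.freedesktop.portal.Desktop",
      "org.freedesktop.portal.Request", "Response", request_path,
      NULL, G_DBUS_SIGNAL_FLAGS_NONE, on_response, loop, NULL);

  g_main_loop_run (loop);
  return 0;
}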

Anyway. Maybe this helps someone understanding how the portal bits fit together.

[1] it does so using another portal but let's ignore that
[2] not really hot-swappable though. You need to restart xdg-desktop-portal but not your host. So luke-warm-swappable only

Edit Sep 01: clarify that it's not GTK/Qt providing the portals, but xdg-desktop-portal-gtk and -kde

libinput and high-resolution wheel scrolling

Posted by Peter Hutterer on August 31, 2021 07:50 AM

Gut Ding braucht Weile - good things take time. Almost three years ago, we added high-resolution wheel scrolling to the kernel (v5.0). The desktop stack however was first lagging and eventually left behind (except for an update a year ago or so, see here). However, I'm happy to announce that thanks to José Expósito's efforts, we have now pushed it across the line. So - in a socially distanced manner and masked up to your eyebrows - gather round children, for it is storytime.

Historical History

In the beginning, there was the wheel detent. Or rather, there were 24 of them, dividing a 360 degree [1] movement of a wheel into a neat set of 15 degree steps. libinput exposed those wheel clicks as part of the "pointer axis" namespace, and you could get the click count with libinput_event_pointer_get_axis_discrete() (announced here). The degree value is exposed by libinput_event_pointer_get_axis_value(). Other scroll backends (finger-scrolling or button-based scrolling) expose the pixel-precise value via that same function.

In a "recent" Microsoft Windows version (Vista!), MS added the ability for wheels to trigger more than 24 clicks per rotation. The MS Windows API now treats one "traditional" wheel click as a value of 120, anything finer-grained will be a fraction thereof. You may have a mouse that triggers quarter-wheel clicks, each sending a value of 30. This makes for smoother scrolling and is supported(-ish) by a lot of mice introduced in the last 10 years [2]. Obviously, three small scrolls are nicer than one large scroll, so the UX is less bad than before.

Now it's time for libinput to catch up with Windows Vista! For $reasons, the existing pointer axis API couldn't be changed to accommodate the high-res values, so a new API was added for scroll events. Read on for the details, you will believe what happens next.

Out with the old, in with the new

As of libinput 1.19, libinput has three new events: LIBINPUT_EVENT_POINTER_SCROLL_WHEEL, LIBINPUT_EVENT_POINTER_SCROLL_FINGER, and LIBINPUT_EVENT_POINTER_SCROLL_CONTINUOUS. These events reflect, perhaps unsurprisingly, scroll movements from a wheel, a finger, or along a continuous axis (e.g. button scrolling). They replace the old LIBINPUT_EVENT_POINTER_AXIS event. Those familiar with libinput will notice that the new event names encode the scroll source in the event name. This makes them slightly more flexible and saves callers an extra call.

In terms of actual API, the new events come with two new functions. The first is libinput_event_pointer_get_scroll_value(): for the FINGER and CONTINUOUS events the value returned is in "pixels" [3]; for the new WHEEL events the value is in degrees. IOW, this is a drop-in replacement for the old libinput_event_pointer_get_axis_value() function. The second is libinput_event_pointer_get_scroll_value_v120(), which, for WHEEL events, returns the same 120-based logical units the kernel uses. libinput_event_pointer_has_axis() returns true if the given axis has a value, just as before. With those three calls you have all the data for the new events.
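As a rough sketch (mine, not from the announcement) of what a caller's event handling might look like with those calls:

#include <libinput.h>
#include <stdio.h>

static void
handle_scroll (struct libinput_event *event)
{
  if (libinput_event_get_type (event) != LIBINPUT_EVENT_POINTER_SCROLL_WHEEL)
    return;

  struct libinput_event_pointer *pev =
      libinput_event_get_pointer_event (event);
  enum libinput_pointer_axis axis = LIBINPUT_POINTER_AXIS_SCROLL_VERTICAL;

  if (!libinput_event_pointer_has_axis (pev, axis))
    return;

  /* degrees, the drop-in replacement for the old axis value */
  double degrees = libinput_event_pointer_get_scroll_value (pev, axis);

  /* 120-based logical units: 120 is one traditional detent, 30 would
   * be a quarter-click on a high-resolution wheel */
  double v120 = libinput_event_pointer_get_scroll_value_v120 (pev, axis);

  printf ("scroll: %.2f degrees, v120 %.0f\n", degrees, v120);
}

This would slot into the usual libinput event loop, dispatching on the event type as before.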

Backwards compatibility

To ensure backwards compatibility, libinput generates both old and new events so the rule for callers is: if you want to support the new events, just ignore the old ones completely. libinput also guarantees new events even on pre-5.0 kernels. This makes the old and new code easy to ifdef out, and once you get past the immediate event handling the code paths are virtually identical.

When, oh when?

These changes have been merged into the libinput main branch and will be part of libinput 1.19. Which is due to be released over the next month or so, so feel free to work backwards from that for your favourite distribution.

Having said that, libinput is merely the lowest block in the Jenga tower that is the desktop stack. José linked to the various MRs in the upstream libinput MR, so if you're on your seat's edge waiting for e.g. GTK to get this, well, there's an MR for that.

[1] That's degrees of an angle, not Fahrenheit
[2] As usual, on a significant number of those you'll need to know whatever proprietary protocol the vendor deemed to be important IP. Older MS mice stand out here because they use straight HID.
[3] libinput doesn't really have a concept of pixels, but it has a normalized pixel that movements are defined as. Most callers take that as real pixels except for the high-resolution displays where it's appropriately scaled.

Pango updates

Posted by Matthias Clasen on August 26, 2021 05:15 PM

I’ve spent some time on Pango, recently. Here is a little update on the feature work that I’ve done there. All of these changes will appear in Pango 1.50 and GTK 4.6.

The general directions of this work are:

  • Take advantage of the fact that we are now using harfbuzz on all platforms. Among other things, this gives us much easier access to font information.
  • Match CSS where it makes sense. If nothing else, this makes it much easier to connect new Pango features to the CSS machinery in GTK.

CSS features

Let’s start with the second point: matching CSS.

Line spacing has historically been a bit painful in GtkTextView. You can set distances before and after paragraphs, and between wrapped lines inside a paragraph. But this does not take font sizes into account—it is a fixed number of pixels.

A while ago, I added a line-spacing factor to Pango, which was meant to help with the font size dependency. You basically tell Pango: I want the baselines of this paragraph spaced 1.33 times as far apart as they would normally be. The remaining problem is that Pango handles text one paragraph at a time. So as far as it is concerned, there is no previous baseline above the first line in a paragraph, and it does not increase the spacing between paragraphs.

The CSS solution to this problem is to just make the lines themselves taller, and place them flush next to each other.  With this approach, you need to be a little careful to make sure that you still get consistent baseline-to-baseline distances. But at least it solves the paragraph spacing issue.

Pango recently gained a line-height attribute that does just that, and GTK now supports the corresponding CSS property.
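A minimal sketch of applying it (untested, assuming the new Pango 1.50 attribute API):

#include <pango/pango.h>

static void
apply_line_height (PangoLayout *layout)
{
  PangoAttrList *attrs = pango_attr_list_new ();

  /* make each line 1.33 times its natural height; the list takes
   * ownership of the attribute */
  pango_attr_list_insert (attrs, pango_attr_line_height_new (1.33));
  pango_layout_set_attributes (layout, attrs);
  pango_attr_list_unref (attrs);
}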

Another feature that has come to Pango from the CSS side is support for text transformation (also in the screenshot). This lets you change the capitalization of text. Note that this is just for presentation purposes—if you select STRASSE in the example and copy it to the clipboard,  you get the original straße.

And again, GTK supports the corresponding CSS property.
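Put together, a minimal GTK CSS sketch using both new properties (the selector and class name are just for illustration):

textview {
  line-height: 1.33;
}

label.shouty {
  text-transform: uppercase;
}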

Font features

As I said, harfbuzz makes it much easier for us to access features of the fonts we use. One of these that I have recently looked into is caret metrics. High-quality italic fonts can contain information about the slope at which the text caret is best drawn to match the text:

I’ve added a new API to get this information, and made GTK use it, with this result:

(Video: https://blogs.gnome.org/mclasen/files/2021/08/Screencast-from-08-24-2021-035115-PM.webm)

Another useful bit of font information concerns the placement of carets inside ligatures.

Historically, Pango has just divided the width of the glyph evenly among the characters forming the ligature (w and i, in this example), but high-quality fonts can provide this information. It is most relevant for scripts using many ligatures, such as Arabic. I’ve made Pango use the ligature caret data if it is available from the font, and got this result:

(Video: https://blogs.gnome.org/mclasen/files/2021/08/Screencast-from-08-25-2021-105318-PM.webm)

The wi ligature in this test is what I could come up with after struggling for a few hours with fontforge (clearly, font design is not in my future). The only other fonts I’ve found with ligature caret information are Arabic ones, and I sadly can’t read or write that script.

The last feature closed a 15 year old bug – not something you get to do every day!

libei - a status update

Posted by Peter Hutterer on August 25, 2021 05:29 AM

A year ago, I first announced libei - a library to support emulated input. After an initial spurt of development, it was left mostly untouched until a few weeks ago. Since then, another flurry of changes has landed, including some initial integration into GNOME's mutter. So, let's see what has changed.

A Recap

First, a short recap of what libei is: it's a transport layer for emulated input events to allow for any application to control the pointer, type, etc. But, unlike the XTEST extension in X, libei allows the compositor to be in control over clients, the devices they can emulate and the input events as well. So it's safer than XTEST but also a lot more flexible. libei already supports touch and smooth scrolling events, something XTest doesn't have or is struggling with.

Terminology refresher: libei is the client library (used by an application wanting to emulate input), EIS is the Emulated Input Server, i.e. the part that typically runs in the compositor.

Server-side Devices

So what has changed recently: first, the whole approach has been flipped on its head - a libei client now connects to the EIS implementation and "binds" to the seats the EIS implementation provides. The EIS implementation then provides input devices to the client. In the simplest case that's just a relative pointer, but we have capabilities for absolute pointers, keyboards and touch as well. The plan for the future is to add gestures and tablet support too. Possibly joysticks, but I haven't really thought about that in detail yet.

So basically, the initial conversation with an EIS implementation goes like this:

  • Client: Hello, I am $NAME
  • Server: Hello, I have "seat0" and "seat1"
  • Client: Bind to "seat0" for pointer, keyboard and touch
  • Server: Here is a pointer device
  • Server: Here is a keyboard device
  • Client: Send relative motion event 10/2 through the pointer device
Notice how the touch device is missing? The capabilities the client binds to are just what the client wants, the server doesn't need to actually give the client a device for that capability.

One of the design choices for libei is that devices are effectively static. If something changes on the EIS side, the device is removed and a new device is created with the new data. This applies for example to regions and keymaps (see below), so libei clients need to be able to re-create their internal states whenever the screen or the keymap changes.

Device Regions

Devices can now have regions attached to them, also provided by the EIS implementation. These regions define areas reachable by the device and are required for clients such as Barrier. On a dual-monitor setup you may have one device with two regions or two devices with one region (representing one monitor), it depends on the EIS implementation. But either way, as libei client you will know that there is an area and you will know how to reach any given pixel on that area. Since the EIS implementation decides the regions, it's possible to have areas that are unreachable by emulated input (though I'm struggling a bit for a real-world use-case).

So basically, the conversation with an EIS implementation goes like this:

  • Client: Hello, I am $NAME
  • Server: Hello, I have "seat0" and "seat1"
  • Client: Bind to "seat0" for absolute pointer
  • Server: Here is an abs pointer device with regions 1920x1080@0,0, 1080x1920@1920,0
  • Server: Here is an abs pointer device with regions 1920x1080@0,0
  • Server: Here is an abs pointer device with regions 1080x1920@1920,0
  • Client: Send abs position 100/100 through the second device
Notice how we have three absolute devices? A client emulating a tablet that is mapped to a screen could just use the third device. As with everything, the server decides what devices are created and the clients have to figure out what they want to do and how to do it.

Perhaps unsurprisingly, the use of regions makes libei clients windowing-system independent. The WIP Barrier EI support no longer has any Wayland-specific code in it. In theory, we could implement EIS in the X server and libei clients would work against that unmodified.

Keymap handling

The keymap handling has been changed so the keymap too is provided by the EIS implementation now, effectively in the same way as the Wayland compositor provides the keymap to Wayland clients. This means a client knows what keycodes to send, it can handle the state to keep track of things, etc. Using Barrier as an example again - if you want to generate an "a", you need to look up the keymap to figure out which keycode generates an A, then you can send that through libei to actually press the key.

Admittedly, this is quite messy. XKB (and specifically libxkbcommon) does not make it easy to go from a keysym to a keycode. The existing Barrier X code is already full of XKB corner-cases, and I expect those to be necessary for the EI support as well.

Scrolling

Scroll events come in four types: pixel-based scrolling, discrete scrolling, and scroll stop and scroll cancel events. The first should be obvious; discrete scrolling is for mouse wheels. It uses the same 120-based API that Windows (and the kernel) use, so it's compatible with high-resolution wheel mice. The scroll stop event notifies an EIS implementation that the scroll interaction has stopped (e.g. lifting fingers off), which in turn may start kinetic scrolling - just like the libinput/Wayland scroll stop events. The scroll cancel event notifies the EIS implementation that scrolling really has stopped and no kinetic scrolling should be triggered. There's no equivalent in libinput/Wayland for this yet, but it helps to get the hook in place.

Emulation "Transactions"

This has fairly little functional effect, but interactions with an EIS server are now sandwiched in a start/stop emulating pair. While this doesn't matter for one-shot tools like xdotool, it does matter for things like Barrier which can send the start emulating event when the pointer enters the local window. This again allows the EIS implementation to provide some visual feedback to the user. To correct the example from above, the sequence is actually:

  • ...
  • Server: Here is a pointer device
  • Client: Start emulating
  • Client: Send relative motion event 10/2 through the pointer device
  • Client: Send relative motion event 1/4 through the pointer device
  • Client: Stop emulating

Properties

Finally, there is now a generic property API, something copied from PipeWire. Properties are simple key/value string pairs and cover those things that aren't in the immediate API. One example here: the portal can set things like "ei.application.appid" to the Flatpak's appid. Properties can be locked down, and only libei itself can set properties before the initial connection. This makes them reliable enough for the EIS implementation to make decisions based on their values. Just like with PipeWire, the list of useful properties will grow over time; it's too early to tell what is really needed.

Repositories

Now, for the actual demo bits: I've added enough support to Barrier, XWayland, Mutter and GNOME Shell that I can control a GNOME on Wayland session through Barrier (note: the controlling host still needs to run X since we don't have the ability to capture input events under Wayland yet). The keymap handling in Barrier is nasty but it's enough to show that it can work.

GNOME Shell has a rudimentary UI, again just to show what works:

The status icon shows ... if libei clients are connected, it changes to !!! while the clients are emulating events. Clients are listed by name and can be disconnected at will. I am not a designer, this is just a PoC to test the hooks.

Note how xdotool is listed in this screenshot: that tool is unmodified, it's the XWayland libei implementation that allows it to work and show up correctly.

The various repositories are in the "wip/ei" branch of Barrier, XWayland, Mutter and GNOME Shell. And of course libei itself.

Where to go from here? The last weeks were driven by rapid development, so there's plenty of test cases to be written to make sure the new code actually works as intended. That's easy enough. Looking at the Flatpak integration is another big ticket item, once the portal details are sorted all the pieces are (at least theoretically) in place. That aside, improving the integrations into the various systems above is obviously what's needed to get this working OOTB on the various distributions. Right now it's all very much in alpha stage and I could use help with all of those (unless you're happy to wait another year or so...). Do ping me if you're interested to work on any of this.

HighContrast variants for Adwaita-qt

Posted by Jan Grulich on August 24, 2021 10:23 AM

In the past we used to have a completely different project to cover HighContrast variants of the GTK Adwaita theme. This was implemented as Highcontrast-qt, a project nobody has touched for 6 years. You can imagine how it looks these days compared to what we have now. I think even the GTK variant of HighContrast was a completely separate theme back then, while these days it’s just Adwaita with a different set of colors.

Since GTK made the new HighContrast theme with just a few modifications to the original Adwaita theme, I decided to use the same approach and have Adwaita-qt provide all four variants as well (Adwaita, Adwaita-dark, HighContrast and HighContrastInverse). While this looks like a simple thing to do, since you just need to add an additional color palette, it was a pain to do in Adwaita-qt. The reason is that Adwaita-qt is full of hardcoded color definitions, all of them taken more or less arbitrarily from the GTK Adwaita stylesheets. Every time something changed in GTK Adwaita, we would have to manually pick up the change and replace the changed color value on our side. This was not really sustainable, especially when I wanted to have four different variants.

To improve this situation and make my life easier, I decided to bundle the GTK stylesheets for the Adwaita theme, have them processed, and write a simple parser to automate everything. And I did exactly that. I included the stylesheets and wrote some definitions myself so they can be processed with sassc (GTK uses SASS for theming) and all my definitions get substituted into something simple to parse. I no longer have to pick all the base colors manually; they are parsed for all four variants, and the same goes for the basic styling of Buttons, CheckBoxes and Radio Buttons, each of which has all kinds of possible states (active, hovered, checked etc.) and uses not only simple colors but also gradients. You can imagine how hard it was to hardcode all the values for each state. The parser I wrote is really basic and built on regular expressions, which is fine because the code I’m trying to process is not that complex.

The code I try to process is either a simple definition:

@define-color base_color #ffffff;

Or widget definition in this form:

button:checked { color: #2e3436; border-color: #cdc7c2; background-image: image(#dad6d2); box-shadow: none; }

The result is:

(Screenshots: Adwaita-qt, Adwaita variant; Adwaita-qt, HighContrast variant.)

I think this is a big step forward for Adwaita-qt, and it will allow me to respond more quickly to changes happening in the GTK Adwaita/HighContrast theme. I can also imagine this being extended in the future to support additional variants, like the modified Adwaita theme you can find in Ubuntu (at least if the stylesheet is similar enough). As mentioned, Adwaita-qt now supports four variants, and they should be on par with GTK 4.4, at least when it comes to colors and the style of the most used widgets; there are still places in Adwaita-qt that need some extra work. Anyway, this is all now released as Adwaita-qt 1.4.0, and I will be updating the Flathub and Fedora builds to it soon.