Fedora desktop Planet

Reconstructing an invalid TPM event log

Posted by Matthew Garrett on September 13, 2023 09:02 PM
TPMs contain a set of registers ("Platform Configuration Registers", or PCRs) that are used to track what a system boots. Each time a new event is measured, a cryptographic hash representing that event is passed to the TPM. The TPM appends that hash to the existing value in the PCR, hashes that, and stores the final result in the PCR. This means that while the PCR's value depends on the precise sequence and value of the hashes presented to it, the PCR value alone doesn't tell you what those individual events were. Different PCRs are used to store different event types, but there are still more events than there are PCRs so we can't avoid this problem by simply storing each event separately.

This is solved using the event log. The event log is simply a record of each event, stored in RAM. The algorithm the TPM uses to calculate the PCR values is known, so we can reproduce that by simply taking the events from the event log and replaying the series of events that were passed to the TPM. If the final calculated value is the same as the value in the PCR, we know that the event log is accurate, which means we now know the value of each individual event and can make an appropriate judgement regarding its security.

If any value in the event log is invalid, we'll calculate a different PCR value and it won't match. This isn't terribly helpful - we know that at least one entry in the event log doesn't match what was passed to the TPM, but we don't know which entry. That means we can't trust any of the events associated with that PCR. If you're trying to make a security determination based on this, that's going to be a problem.

PCR 7 is used to track information about the secure boot policy on the system. It contains measurements of whether or not secure boot is enabled, and which keys are trusted and untrusted on the system in question. This is extremely helpful if you want to verify that a system booted with secure boot enabled before allowing it to do something security or safety critical. Unfortunately, if the device gives you an event log that doesn't replay correctly for PCR 7, you now have no idea what the security state of the system is.

We ran into that this week. Examination of the event log revealed an additional event other than the expected ones - a measurement accompanied by the string "Boot Guard Measured S-CRTM". Boot Guard is an Intel feature where the CPU verifies the firmware is signed with a trusted key before executing it, and measures information about the firmware in the process. Previously I'd only encountered this as a measurement into PCR 0, which is the PCR used to track information about the firmware itself. But it turns out that at least some versions of Boot Guard also measure information about the Boot Guard policy into PCR 7. The argument for this is that this is effectively part of the secure boot policy - having a measurement of the Boot Guard state tells you whether Boot Guard was enabled, which tells you whether or not the CPU verified a signature on your firmware before running it (as I wrote before, I think Boot Guard has user-hostile default behaviour, and that enforcing this on consumer devices is a bad idea).

But there's a problem here. The event log is created by the firmware, and the Boot Guard measurements occur before the firmware is executed. So how do we get a log that represents them? That one's fairly simple - the firmware simply re-calculates the same measurements that Boot Guard did and creates a log entry after the fact[1]. All good.

Except. What if the firmware screws up the calculation and comes up with a different answer? The entry in the event log will now not match what was sent to the TPM, and replaying will fail. And without knowing what the actual value should be, there's no way to fix this, which means there's no way to verify the contents of PCR 7 and determine whether or not secure boot was enabled.

But there's still a fundamental source of truth - the measurement that was sent to the TPM in the first place. Inspired by Henri Nurmi's work on sniffing Bitlocker encryption keys, I asked a coworker if we could sniff the TPM traffic during boot. The TPM on the board in question uses SPI, a simple bus that can have multiple devices connected to it. In this case the system flash and the TPM are on the same SPI bus, which made things easier. The board had a flash header for external reprogramming of the firmware in the event of failure, and all SPI traffic was visible through that header. Attaching a logic analyser to this header made it simple to generate a record of that. The only problem was that the chip select line on the header was attached to the firmware flash chip, not the TPM. This was worked around by simply telling the analysis software that it should invert the sense of the chip select line, ignoring all traffic that was bound for the flash and paying attention to all other traffic. This worked in this case since the only other device on the bus was the TPM, but would cause problems in the event of multiple devices on the bus all communicating.

With the aid of this analyser plugin, I was able to dump all the TPM traffic and could then search for writes that included the "0182" sequence that corresponds to the command code for a measurement event. This gave me a couple of accesses to the locality 3 registers, which was a strong indication that they were coming from the CPU rather than from the firmware. One was for PCR 0, and one was for PCR 7. This corresponded to the two Boot Guard events that we expected from the event log. The hash in the PCR 0 measurement was the same as the hash in the event log, but the hash in the PCR 7 measurement differed from the hash in the event log. Replacing the event log value with the value actually sent to the TPM resulted in the event log now replaying correctly, supporting the hypothesis that the firmware was failing to correctly reconstruct the event.

What now? The simple thing to do is for us to simply hard code this fixup, but longer term we'd like to figure out how to reconstruct the event so we can calculate the expected value ourselves. Unfortunately there doesn't seem to be any public documentation on this. Sigh.

[1] What stops firmware on a system with no Boot Guard faking those measurements? TPMs have a concept of "localities", effectively different privilege levels. When Boot Guard performs its initial measurement into PCR 0, it does so at locality 3, a locality that's only available to the CPU. This causes PCR 0 to be initialised to a different initial value, affecting the final PCR value. The firmware can't access locality 3, so can't perform an equivalent measurement, so can't fake the value.

comment count unavailable comments

Crosswords 0.3.11: Acrostic Panels

Posted by Jonathan Blandford on September 03, 2023 11:25 PM

Long time, no release.

When I last blogged about GNOME Crosswords, I had a design plan to improve the editing API. It’s been a busy summer since then. The crosswords team rewrote large chunks of code to implement and use this new API:

crosswords: 146 files changed, 10545 insertions(+), 4915 deletions(-)
libipuz:  53 files changed, 8224 insertions(+), 961 deletions(-)

There’s now over 38KLOC between the two codebases — this is starting to look like a real application!

Editor and libpanel

The biggest change this cycle is the implementation of the new editing interface. I started changing the code five months ago, but it took a while to land. We now use libpanel from GNOME Builder to manage the information panels. Libpanel has a lot of the functionality I want, though unfortunately I’m fighting its geometry handling.

<figure aria-describedby="caption-attachment-7334" class="wp-caption aligncenter" id="attachment_7334" style="width: 840px">New grid editor panel<figcaption class="wp-caption-text" id="caption-attachment-7334">New grid editor panel</figcaption></figure> <figure aria-describedby="caption-attachment-7337" class="wp-caption aligncenter" id="attachment_7337" style="width: 840px">New clues editor panel<figcaption class="wp-caption-text" id="caption-attachment-7337">New clues editor panel</figcaption></figure>

I really struggled getting the UI design for this to work, and I had a number of regrettable paths along the way. Fortunately, Niko agreed to help out with this, and showed up with some fabulous design work! I’m so much happier with the current approach, and am getting ready to implement more of his designs for the next cycle.

Behind the scenes, implementing this was a challenge. I blogged about those challenges previously, but in a nutshell, mutating a puzzle has a lot of side-effects which can leave you in an invalid state. For example, something simple like adding a block could completely change the numbering of the grid.

Federico and I fixed this by adding a number of functions to enforce heuristics and conventions. I also added ipuz_puzzle_fix_all() which will  get a fully well-formed puzzle regardless of the state. It’s turned out to be a really nice design pattern.

I have now released the Editor as a separate application in flathub. You can download it here.


Another major feature this release is Acrostic Puzzle support. Tanmay worked on that for his GSOC project and did fabulously (details in his blog post.) The end result is gorgeous:

<figure aria-describedby="caption-attachment-7328" class="wp-caption aligncenter" id="attachment_7328" style="width: 840px">Animation of an Acrostic puzzle being solved<figcaption class="wp-caption-text" id="caption-attachment-7328">Animation of an Acrostic puzzle being solved</figcaption></figure>

I had a great time mentoring Tanmay for the summer, and we’re already making plans on how to add acrostic support to the Editor.


GUADEC in Latvia was a blast. Riga was fun, the countryside was surprisingly interesting, and the overall feel of the conference was lovely. As always, it was great to meet so many people, new and old.

<figure aria-describedby="caption-attachment-7346" class="wp-caption aligncenter" id="attachment_7346" style="width: 300px">Stone statue looking like a goomba<figcaption class="wp-caption-text" id="caption-attachment-7346">A wild Goomba in Riga</figcaption></figure>

GNOME Superstar Martin came to my Crosswords BOF in Riga and got it running on an actual mobile device. The end result was.. actually usable! It seems like all the work we did on adaptive sizing paid dividends. There are some rough corners (and Martin filed a number of bugs) but it worked surprisingly well out of the box.

I also got to meet with my other GSOC student Pratham. His work on anagrams will land next release, so we’ll have more to talk about then.


The final thing mentioning is that I bought the libipuz.org domain for the library. Unfortunately the main ipuz.org site has been down all summer, I put a mirror of the ipuz spec up there so people can read it.

I also managed to reach Roy Leban by phone. He’s the original author of the ipuz spec, and was able to clarify some questions I had when interpreting it. This led me down a deep rabbit hole around the part of the spec regarding clue directions, as it required a substantial rewrite to avoid hardcoding Across/Down directions everywhere.

Finally, Pranjal showed up completely out of nowhere with an MR to add ipuz_puzzle_equal(). This was a tricky function to write, and one I’ve wanted for a really long time. This has been making all our tests so much better. Hero! Pranjal is interested in adding a sudoku loader/saver to libipuz — maybe there will be more in this space in the future.


  • Niko, for massive help with the designs for the Editor
  • Tanmay, for Acrostic support
  • Martin, for testing Crosswords on mobile
  • Pranjal, for ipuz_puzzle_equal()
  • Federico, for testing fixes and overall support
  • Rosanna, for continued advice and crossword support
  • Pratham, for initial anagram support
  • Bart, for whatever magic he did to make libipuz.org work
  • The translators, for keeping us multi-lingual

Until next time!

Unix sockets, Cygwin, SSH agents, and sadness

Posted by Matthew Garrett on August 29, 2023 06:57 AM
Work involves supporting Windows (there's a lot of specialised hardware design software that's only supported under Windows, so this isn't really avoidable), but also involves git, so I've been working on extending our support for hardware-backed SSH certificates to Windows and trying to glue that into git. In theory this doesn't sound like a hard problem, but in practice oh good heavens.

Git for Windows is built on top of msys2, which in turn is built on top of Cygwin. This is an astonishing artifact that allows you to build roughly unmodified POSIXish code on top of Windows, despite the terrible impedance mismatches inherent in this. One is that until 2017, Windows had no native support for Unix sockets. That's kind of a big deal for compatibility purposes, so Cygwin worked around it. It's, uh, kind of awful. If you're not a Cygwin/msys app but you want to implement a socket they can communicate with, you need to implement this undocumented protocol yourself. This isn't impossible, but ugh.

But going to all this trouble helps you avoid another problem! The Microsoft version of OpenSSH ships an SSH agent that doesn't use Unix sockets, but uses a named pipe instead. So if you want to communicate between Cygwinish OpenSSH (as is shipped with git for Windows) and the SSH agent shipped with Windows, you need something that bridges between those. The state of the art seems to be to use npiperelay with socat, but if you're already writing something that implements the Cygwin socket protocol you can just use npipe to talk to the shipped ssh-agent and then export your own socket interface.

And, amazingly, this all works? I've managed to hack together an SSH agent (using Go's SSH agent implementation) that can satisfy hardware backed queries itself, but forward things on to the Windows agent for compatibility with other tooling. Now I just need to figure out how to plumb it through to WSL. Sigh.

comment count unavailable comments

Small Caps in Impress

Posted by Caolán McNamara on August 23, 2023 04:01 PM

Writer supports Small Caps, but Impress and drawing shapes in general never fully supported Small Caps. The option was available, and the character dialog provided a preview, but Small Caps was rendered the same as All Caps, as seen here.

 This has lingered for years and it's not easy as a user to try and manually workaround with varying font sizes because underline/overline/strike-through decorations won't link up, as seen in this example:

 but finally for Collabora Hack Week I was able to carve out some time to fix this. So in today's LibreOffice trunk we finally have this implemented as:

In addition a buggy case seen in the preview of double overlines for mixed upper/lower case was also fixed, from:


Also noticed during all of this was the wrong width scaling used for the red wave line underneath incorrect spelling in impress when the text is superscript or subscript so the line didn't stretch over the whole text, as seen here: 

Now corrected as:

and finally added was a missing implementation in the RTF export of shape text to allow the small caps format to be copy and pasted into other applications from impress.

Qt theming in Fedora Workstation

Posted by Jan Grulich on August 22, 2023 03:47 PM

We have been working on and using custom Qt theming in Fedora Workstation for many years now. By custom Qt theming, I’m talking about the QGnomePlatform and Adwaita-qt projects. If you haven’t heard of them, you can read my recent blog post explaining what they are. While these projects are in some ways better than what Qt upstream has to offer, there were also drawbacks/issues and that’s why I decided to make a final decision and discontinue both projects. The issues are explained in the aforementioned blog post, but one of the main drawbacks is that we are in this development alone and not working directly in the upstream makes it less attractive for other contributors. It’s also not used by default anywhere other than Fedora, so it’s not properly tested by other developers working on Qt applications using different distributions. These reasons led me to submit a Fedora 39 feature to remove our custom Qt theming in Fedora Workstation in favor of Qt’s defaults. The only problem is that if we just go with Qt’s default, we would go backwards a bit. This is because upstream Qt does not provide any decent client-side window decorations (problem #1), and the QGtkTheme in Qt5 (QGnomePlatform equivalent) is a bit behind its Qt6 version with many improvements and integration goodies (problem #2) recently made by Axel Spoerl of the Qt Group, whom I met during this year’s KDE Akademy.

Solution to problem #1

QGnomePlatform used to be our solution to this problem, as QGnomePlatform implemented it’s own version of the QWaylandAbstractDecoration plugin. This was a GTK 3-like decoration plugin that used Adwaita-qt for button rendering and QGnomePlatform bits (e.g. GSettings configuration) to get the titlebar layout. Since we are going to remove QGnomePlatform, we needed an alternative. So I started working on the QAdwaitaDecorations project. This is supposed to be a an intermediate step as I would like to have a proper GNOME/Gtk decorations directly in Qt upstream, but since I was in a hurry to get everything done in time for Fedora 39, we have this for now. QAdwaitaDecorations plugin is based on the decorations we have in QGnomePlatform, but there is no dependency on GTK or Glib (e.g. GSettings) or on Adwaita-qt. We use xdg-desktop-portal to get the titlebar layout and do our own drawing instead. This decoration plugin should also have now a GTK 4-like style so the buttons and colors of the decorations are different.

Below is a screenshot of Wireshark (Qt6) using QAdwaitaDecorations plugin + QGtkTheme + Fusion:

<figure class="wp-block-image size-large"></figure>

Solution to problem #2

Since Qt5 is no longer actively developed, the only possible solution is to backport all QGtkTheme improvements from Qt6, so I did that + modified some of those changes to avoid breaking binary compatibility. This results in about ~15 related backports to Qt5 so far, and it seems to work pretty well. I also made sure that Fedora 38 and older will still use QGnomePlatform by default, so we don’t change the behavior for existing users. Also, a small change to our QtWayland package was needed to make it use the new decoration plugin by default.

Future plans

As mentioned, I would really like to have everything directly in Qt upstream (talking about QAdwaitaDecorations). That way we get other contributions and thus fixes/improvements for free and a lot more users. Another thing is that QGnomePlatform supports things that are not yet supported/implemented in QGtkTheme, like support for xdg-desktop-portal instead of just relying on GSettings. Not to mention that GTK 4 has been around for a while, and both QGnomePlatform and QGtkTheme are still GTK 3 based. I will definitely try to make some of these things happen for Fedora 40, but knowing myself, it’s better not to make any promises, as things usually don’t go according to plan.

GNOME 45 Core Apps Update

Posted by Michael Catanzaro on August 17, 2023 03:57 PM

It’s been a few months since I last reviewed the state of GNOME core apps. For GNOME 45, we have implemented the changes proposed in the “Imminent Core App Changes” section of that blog post:

  • Loupe enters core as GNOME’s new image viewer app, developed by Christopher Davis and Sophie Herold. Loupe will be branded as Image Viewer and replaces Eye of GNOME, which will no longer use the Image Viewer branding. Eye of GNOME will continue to be maintained by Felix Riemann, and contributions are still welcome there.
  • Snapshot enters core as GNOME’s new camera app, developed by Maximiliano Sandoval and Jamie Murphy. Snapshot will be branded as Camera and replaces Cheese. Cheese will continue to be maintained by David King, and contributions are still welcome there.
  • GNOME Photos has been removed from core without replacement. This application could have been retained if more developers were interested in it, but we have made the decision to remove it due to lack of volunteers interested in maintaining it. Photos will likely be archived eventually, unless a new maintainer volunteers to save it.

GNOME 45 beta will be released imminently with the above changes. Testing the release and reporting bugs is much appreciated.

We are also looking for volunteers interested in helping implement future core app changes. Specifically, improvements are required for Music to remain in core, and improvements are required for Geary to enter core. We’re also not quite sure what to do with Contacts. If you’re interested in any of these projects, consider getting involved.

New responsibilities

Posted by Bastien Nocera on August 14, 2023 09:31 AM

As part of the same process outlined in Matthias Clasen's "LibreOffice packages" email, my management chain has made the decision to stop all upstream and downstream work on desktop Bluetooth, multimedia applications (namely totem, rhythmbox and sound-juicer) and libfprint/fprintd. The rest of my upstream and downstream work will be reassigned depending on Red Hat's own priorities (see below), as I am transferred to another team that deals with one of a list of Red Hat’s priority projects.

I'm very disappointed, because those particular projects were already starved for resources: I spent less than 10% of my work time on them in the past year, with other projects and responsibilities taking most of my time.

This means that, in the medium-term at least, all those GNOME projects will go without a maintainer, reviewer, or triager:
- gnome-bluetooth (including Settings panel and gnome-shell integration)
- totem, totem-pl-parser, gom
- libgnome-volume-control
- libgudev
- geocode-glib
- gvfs AFC backend

Those freedesktop projects will be archived until further notice:
- power-profiles-daemon
- switcheroo-control
- iio-sensor-proxy
- low-memory-monitor

I will not be available for reviewing libfprint/fprintd, upower, grilo/grilo-plugins, gnome-desktop thumbnailer sandboxing patches, or any work related to XDG specifications.

Kernel work, reviews and maintenance, including recent work on SteelSeries headset and Logitech devices kernel drivers, USB revoke for Flatpak Portal support, or core USB is suspended until further notice.

All my Fedora packages were orphaned about a month and a half ago, it's likely that there are still some that are orphaned, if there are takers. RHEL packages were unassigned about 3 weeks ago, they've been reassigned since then, so I cannot point to the new maintainer(s).

If you are a partner, or a customer, I would recommend that you get in touch with your Red Hat contacts to figure out what the plan is going forward for the projects you might be involved with.

If you are a colleague that will take on all or part of the 90% of the work that's not being stopped, or a community member that was relying on my work to further advance your own projects, get in touch, I'll do my best to accommodate your queries, time permitting.

I'll try to make sure to update this post, or create a new one if and when any of the above changes.

Updating Fedora the unsupported way

Posted by Matthew Garrett on August 08, 2023 05:54 AM
I dug out a computer running Fedora 28, which was released 2018-04-01 - over 5 years ago. Backing up the data and re-installing seemed tedious, but the current version of Fedora is 38, and while Fedora supports updates from N to N+2 that was still going to be 5 separate upgrades. That seemed tedious, so I figured I'd just try to do an update from 28 directly to 38. This is, obviously, extremely unsupported, but what could possibly go wrong?

Running sudo dnf system-upgrade download --releasever=38 didn't successfully resolve dependencies, but sudo dnf system-upgrade download --releasever=38 --allowerasing passed and dnf started downloading 6GB of packages. And then promptly failed, since I didn't have any of the relevant signing keys. So I downloaded the fedora-gpg-keys package from F38 by hand and tried to install it, and got a signature hdr data: BAD, no. of bytes(88084) out of range error. It turns out that rpm doesn't handle cases where the signature header is larger than a few K, and RPMs from modern versions of Fedora. The obvious fix would be to install a newer version of rpm, but that wouldn't be easy without upgrading the rest of the system as well - or, alternatively, downloading a bunch of build depends and building it. Given that I'm already doing all of this in the worst way possible, let's do something different.

The relevant code in the hdrblobRead function of rpm's lib/header.c is:

int32_t il_max = HEADER_TAGS_MAX;
int32_t dl_max = HEADER_DATA_MAX;

il_max = 32;
dl_max = 8192;

which indicates that if the header in question is RPMTAG_HEADERSIGNATURES, it sets more restrictive limits on the size (no, I don't know why). So I installed rpm-libs-debuginfo, ran gdb against librpm.so.8, loaded the symbol file, and then did disassemble hdrblobRead. The relevant chunk ends up being:

0x000000000001bc81 <+81>: cmp $0x3e,%ebx
0x000000000001bc84 <+84>: mov $0xfffffff,%ecx
0x000000000001bc89 <+89>: mov $0x2000,%eax
0x000000000001bc8e <+94>: mov %r12,%rdi
0x000000000001bc91 <+97>: cmovne %ecx,%eax

which is basically "If ebx is not 0x3e, set eax to 0xffffffff - otherwise, set it to 0x2000". RPMTAG_HEADERSIGNATURES is 62, which is 0x3e, so I just opened librpm.so.8 in hexedit, went to byte 0x1bc81, and replaced 0x3e with 0xfe (an arbitrary invalid value). This has the effect of skipping the if (regionTag == RPMTAG_HEADERSIGNATURES) code and so using the default limits even if the header section in question is the signatures. And with that one byte modification, rpm from F28 would suddenly install the fedora-gpg-keys package from F38. Success!

But short-lived. dnf now believed packages had valid signatures, but sadly there were still issues. A bunch of packages in F38 had files that conflicted with packages in F28. These were largely Python 3 packages that conflicted with Python 2 packages from F28 - jumping this many releases meant that a bunch of explicit replaces and the like no longer existed. The easiest way to solve this was simply to uninstall python 2 before upgrading, and avoiding the entire transition. Another issue was that some data files had moved from libxcrypt-common to libxcrypt, and removing libxcrypt-common would remove libxcrypt and a bunch of important things that depended on it (like, for instance, systemd). So I built a fake empty package that provided libxcrypt-common and removed the actual package. Surely everything would work now?

Ha no. The final obstacle was that several packages depended on rpmlib(CaretInVersions), and building another fake package that provided that didn't work. I shouted into the void and Bill Nottingham answered - rpmlib dependencies are synthesised by rpm itself, indicating that it has the ability to handle extensions that specific packages are making use of. This made things harder, since the list is hard-coded in the binary. But since I'm already committing crimes against humanity with a hex editor, why not go further? Back to editing librpm.so.8 and finding the list of rpmlib() dependencies it provides. There were a bunch, but I couldn't really extend the list. What I could do is overwrite existing entries. I tried this a few times but (unsurprisingly) broke other things since packages depended on the feature I'd overwritten. Finally, I rewrote rpmlib(ExplicitPackageProvide) to rpmlib(CaretInVersions) (adding an extra '\0' at the end of it to deal with it being shorter than the original string) and apparently nothing I wanted to install depended on rpmlib(ExplicitPackageProvide) because dnf finished its transaction checks and prompted me to reboot to perform the update. So, I did.

And about an hour later, it rebooted and gave me a whole bunch of errors due to the fact that dbus never got started. A bit of digging revealed that I had no /etc/systemd/system/dbus.service, a symlink that was presumably introduced at some point between F28 and F38 but which didn't get automatically added in my case because well who knows. That was literally the only thing I needed to fix up after the upgrade, and on the next reboot I was presented with a gdm prompt and had a fully functional F38 machine.

You should not do this. I should not do this. This was a terrible idea. Any situation where you're binary patching your package manager to get it to let you do something is obviously a bad situation. And with hindsight performing 5 independent upgrades might have been faster. But that would have just involved me typing the same thing 5 times, while this way I learned something. And what I learned is "Terrible ideas sometimes work and so you should definitely act upon them rather than doing the sensible thing", so like I said, you should not do this in case you learn the same lesson.

comment count unavailable comments

Introducing Passim

Posted by Richard Hughes on July 28, 2023 08:57 AM

tl;dr: Passim is a local caching server that uses mDNS to advertise files by their SHA-256 hash. Named after the Latin word for “here, there and everywhere” it might save a lot of people a lot of money.


Much of the software running on your computer that connects to other systems over the Internet needs to periodically download metadata or other information needed to perform other requests.

As part of running the passim/LVFS projects I’ve seen how download this “small” file once per 24h turns into tens of millions of requests per day — which is about ~10TB of bandwidth! Everybody downloads the same file from a CDN, and although a CDN is not super-expensive, it’s certainly not free. Everybody on your local network (perhaps dozens of users in an office) has to download the same 1MB blob of metadata from a CDN over a perhaps-non-free shared internet link.

What if we could download the file from the Internet CDN on one machine, and the next machine on the local network that needs it instead downloads it from the first machine? We could put a limit on the number of times it can be shared, and the maximum age so that we don’t store yesterdays metadata forever, and so that we don’t turn a ThinkPad X220 into a machine distributing 1Gb/s to every other machine in the office. We could cut the CDN traffic by at least one order of magnitude, but possibly much more. This is better for the person paying the cloud bill, the person paying for the internet connection, and the planet as a whole.

This is what Passim might be. You add automatically or manually add files to the daemon which stores them in /var/lib/passim/data with xattrs set on each file for the max-age and share-limit. When the file has been shared more than the share limit number of times, or is older than the max age it is deleted and not advertised to other clients.

The daemon then advertises the availability of the file as a mDNS service subtype and provides a tiny single-threaded HTTP v1.1 server that supplies the file over HTTPS using a self-signed certificate.

The file is sent when requested from a URL like – any file requested without the checksum will not be supplied. Although this is a chicken-and-egg problem where you don’t know the payload checksum until you’ve checked the remote server, this is solved using a tiny <100 byte request to the CDN for the payload checksum (or a .jcat file) and then the multi-megabyte (or multi-gigabyte!) payload can be found using mDNS. Using a Jcat file also means you know the PKCS#7/GPG signature of the thing you’re trying to request. Using a Metalink request would work as well I think.

Sharing Considerations

Here we’ve assuming your local network (aka LAN) is a nice and friendly place, without evil people trying to overwhelm your system or feed you fake files. Although we request files by their hash (and thus can detect tampering) and we hopefully also use a signature, it still uses resources to send a file over the network.

We’ll assume that any network with working mDNS (as implemented in Avahi) is good enough to get metadata from other peers. If Avahi is not running, or mDNS is turned off on the firewall then no files will be shared.

The cached index is available to localhost without any kind of authentication as a webpage on https://localhost:27500/.

Only processes running as UID 0 (a.k.a. root) can publish content to Passim. Before sharing everything, the effects of sharing can be subtle; if you download a security update for a Lenovo P1 Gen 3 laptop and share it with other laptops on your LAN — it also tells any attacker [with a list of all possible firmware updates] on your local network your laptop model and also that you’re running a system firmware that isn’t currently patched against the latest firmware bug.

My recommendation here is only to advertise files that are common to all machines. For instance:

  • AdBlocker metadata
  • Firmware update metadata
  • Remote metadata for update frameworks, e.g. apt-get/dnf etc.

Implementation Considerations

Any client MUST calculate the checksum of the supplied file and verify that it matches. There is no authentication or signing verification done so this step is non-optional. A malicious server could advertise the hash of firmware.xml.gz but actually supply evil-payload.exe — and you do not want that.


The obvious comparison to make is IPFS. I’ll try to make this as fair as possible, although I’m obviously somewhat biased.


  • Existing project that’s existed for many years tested by many people
  • Allows sharing with other users not on your local network
  • Not packaged in any distributions and not trivial to install correctly
  • Requires a significant time to find resources
  • Does not prioritize local clients over remote clients
  • Requires a internet-to-IPFS “gateway” which cost me a lot of $$$ for a large number of files


  • New project that’s not even finished
  • Only allowed sharing with computers on your local network
  • Returns results within 2s

One concern we had specifically with IPFS for firmware were ITAR/EAR legal considerations. e.g. we couldn’t share firmware containing strong encryption with users in some countries — which is actually most of the firmware the LVFS distributes. From an ITAR/EAR point of view Passim would be compliant (as it only shares locally, presumably in the same country) and IPFS certainly is not.

There’s a longer README in the git repo. There’s also a test patch that wires up fwupd with libpassim although it’s not ready for merging. For instance, I think it’s perfectly safe to share metadata but not firmware or distro package payloads – but for some people downloading payloads on a cellular link might be exactly what they want – so it’ll be configurable. For reference Windows Update also shares content (not just metadata) so maybe I’m worrying about nothing, and doing a distro upgrade from the computer next to them is exactly what people need. Small steps perhaps.

Comments welcome.

EDIT 2023-08-22: Made changes to reflect that we went from HTTP 1.0 to HTTP 1.1 with TLS.

Urgent VCL Mission

Posted by Caolán McNamara on July 25, 2023 07:37 PM

To mark the occasion of Noel's merge of Convert internal vcl bitmap formats transparency->alpha to align LibreOffice's internal concept of color opacity with everything else's.

Urgent VCL Mission

This work, "Urgent VCL Mission", is adapted from "Urgent Mission" by Randall Munroe, used under CC BY-NC 2.5, and licensed under same by me.

Amazingly there exists, not one, but two fonts based on Randall's handwriting.

Onward and upward to the end of split-Alpha.

Roots of Trust are difficult

Posted by Matthew Garrett on July 11, 2023 07:58 AM
The phrase "Root of Trust" turns up at various points in discussions about verified boot and measured boot, and to a first approximation nobody is able to give you a coherent explanation of what it means[1]. The Trusted Computing Group has a fairly wordy definition, but (a) it's a lot of words and (b) I don't like it, so instead I'm going to start by defining a root of trust as "A thing that has to be trustworthy for anything else on your computer to be trustworthy".

(An aside: when I say "trustworthy", it is very easy to interpret this in a cynical manner and assume that "trust" means "trusted by someone I do not necessarily trust to act in my best interest". I want to be absolutely clear that when I say "trustworthy" I mean "trusted by the owner of the computer", and that as far as I'm concerned selling devices that do not allow the owner to define what's trusted is an extremely bad thing in the general case)

Let's take an example. In verified boot, a cryptographic signature of a component is verified before it's allowed to boot. A straightforward implementation of a verified boot implementation has the firmware verify the signature on the bootloader or kernel before executing it. In this scenario, the firmware is the root of trust - it's the first thing that makes a determination about whether something should be allowed to run or not[2]. As long as the firmware behaves correctly, and as long as there aren't any vulnerabilities in our boot chain, we know that we booted an OS that was signed with a key we trust.

But what guarantees that the firmware behaves correctly? What if someone replaces our firmware with firmware that trusts different keys, or hot-patches the OS as it's booting it? We can't just ask the firmware whether it's trustworthy - trustworthy firmware will say yes, but the thing about malicious firmware is that it can just lie to us (either directly, or by modifying the OS components it boots to lie instead). This is probably not sufficiently trustworthy!

Ok, so let's have the firmware be verified before it's executed. On Intel this is "Boot Guard", on AMD this is "Platform Secure Boot", everywhere else it's just "Secure Boot". Code on the CPU (either in ROM or signed with a key controlled by the CPU vendor) verifies the firmware[3] before executing it. Now the CPU itself is the root of trust, and, well, that seems reasonable - we have to place trust in the CPU, otherwise we can't actually do computing. We can now say with a reasonable degree of confidence (again, in the absence of vulnerabilities) that we booted an OS that we trusted. Hurrah!

Except. How do we know that the CPU actually did that verification? CPUs are generally manufactured without verification being enabled - different system vendors use different signing keys, so those keys can't be installed in the CPU at CPU manufacture time, and vendors need to do code development without signing everything so you can't require that keys be installed before a CPU will work. So, out of the box, a new CPU will boot anything without doing verification[4], and development units will frequently have no verification.

As a device owner, how do you tell whether or not your CPU has this verification enabled? Well, you could ask the CPU, but if you're doing that on a device that booted a compromised OS then maybe it's just hotpatching your OS so when you do that you just get RET_TRUST_ME_BRO even if the CPU is desperately waving its arms around trying to warn you it's a trap. This is, unfortunately, a problem that's basically impossible to solve using verified boot alone - if any component in the chain fails to enforce verification, the trust you're placing in the chain is misplaced and you are going to have a bad day.

So how do we solve it? The answer is that we can't simply ask the OS, we need a mechanism to query the root of trust itself. There's a few ways to do that, but fundamentally they depend on the ability of the root of trust to provide proof of what happened. This requires that the root of trust be able to sign (or cause to be signed) an "attestation" of the system state, a cryptographically verifiable representation of the security-critical configuration and code. The most common form of this is called "measured boot" or "trusted boot", and involves generating a "measurement" of each boot component or configuration (generally a cryptographic hash of it), and storing that measurement somewhere. The important thing is that it must not be possible for the running OS (or any pre-OS component) to arbitrarily modify these measurements, since otherwise a compromised environment could simply go back and rewrite history. One frequently used solution to this is to segregate the storage of the measurements (and the attestation of them) into a separate hardware component that can't be directly manipulated by the OS, such as a Trusted Platform Module. Each part of the boot chain measures relevant security configuration and the next component before executing it and sends that measurement to the TPM, and later the TPM can provide a signed attestation of the measurements it was given. So, an SoC that implements verified boot should create a measurement telling us whether verification is enabled - and, critically, should also create a measurement if it isn't. This is important because failing to measure the disabled state leaves us with the same problem as before; someone can replace the mutable firmware code with code that creates a fake measurement asserting that verified boot was enabled, and if we trust that we're going to have a bad time.

(Of course, simply measuring the fact that verified boot was enabled isn't enough - what if someone replaces the CPU with one that has verified boot enabled, but trusts keys under their control? We also need to measure the keys that were used in order to ensure that the device trusted only the keys we expected, otherwise again we're going to have a bad time)

So, an effective root of trust needs to:

1) Create a measurement of its verified boot policy before running any mutable code
2) Include the trusted signing key in that measurement
3) Actually perform that verification before executing any mutable code

and from then on we're in the hands of the verified code actually being trustworthy, and it's probably written in C so that's almost certainly false, but let's not try to solve every problem today.

Does anything do this today? As far as I can tell, Intel's Boot Guard implementation does. Based on publicly available documentation I can't find any evidence that AMD's Platform Secure Boot does (it does the verification, but it doesn't measure the policy beforehand, so it seems spoofable), but I could be wrong there. I haven't found any general purpose non-x86 parts that do, but this is in the realm of things that SoC vendors seem to believe is some sort of value-add that can only be documented under NDAs, so please do prove me wrong. And then there are add-on solutions like Titan, where we delegate the initial measurement and validation to a separate piece of hardware that measures the firmware as the CPU reads it, rather than requiring that the CPU do it.

But, overall, the situation isn't great. On many platforms there's simply no way to prove that you booted the code you expected to boot. People have designed elaborate security implementations that can be bypassed in a number of ways.

[1] In this respect it is extremely similar to "Zero Trust"
[2] This is a bit of an oversimplification - once we get into dynamic roots of trust like Intel's TXT this story gets more complicated, but let's stick to the simple case today
[3] I'm kind of using "firmware" in an x86ish manner here, so for embedded devices just think of "firmware" as "the first code executed out of flash and signed by someone other than the SoC vendor"
[4] In the Intel case this isn't strictly true, since the keys are stored in the motherboard chipset rather than the CPU, and so taking a board with Boot Guard enabled and swapping out the CPU won't disable Boot Guard because the CPU reads the configuration from the chipset. But many mobile Intel parts have the chipset in the same package as the CPU, so in theory swapping out that entire package would disable Boot Guard. I am not good enough at soldering to demonstrate that.

comment count unavailable comments

Firefox on Linux update

Posted by Martin Stransky on July 07, 2023 02:36 PM
<figure class="aligncenter size-large is-resized"></figure>

It may look like Firefox development is stalled and frozen but nothing could be further from the truth. 22 300 patches landed in Firefox Mercurial repository since new year and we keep hacking 😀

Let’s look what’s new in Firefox on Linux and what could be interesting for Fedora users.

  • Large WebRTC update was merged with DMABuf screen sharing and xdg camera portal support, both developed by Jan Grulich. It should speed up screen sharing via. portals (used by Wayland, Snap and Flatpak) and enable new webcams on Linux.
  • VA-API was enabled for Intel GPU on Firefox 115.0. AMD is delayed due to various driver bugs but you can force enable it by setting media.hardware-video-decoding.force-enabled at about:config. VA-API state is printed at about:support page now.
  • David Turner implemented video hardware decoding support for H.264 on a Raspberry Pi 4 via V4L2-M2M and there’s ongoing work to enable VP8/9 too. If you have an arm with HW video decoding, try it in next Firefox 116.0 (or recent Beta).
  • Emilio Cobos Álvarez – fixed many Linux/Gtk style bugs like wrong colors, menu round corner rendering, titlebar rendering and also lots of Wayland bugs. He unified Firefox main window rendering on all desktops (we use client decorations – CSD – now) and have fixed regressions on desktops like XFCE/TWM.
    The style bugs are especially tricky ones as we removed X11/Wayland display connection from web content processes (due to security) and we can’t use native styles directly then.
  • Firefox 116 allows you to create Wayland or X11 exclusive builds. It means you can build Firefox without X11 (or Wayland) dependency which saves space and resources. Also you don’t need anymore Wayland build to run DMABuf and VA-API backends.
  • Mozilla started to implement Firefox tests on Wayland (along with testsuite migration to Ubuntu 22.04). It allows to ship Wayland backend by default with Mozilla binaries and Ubuntu/Snap and reduces regressions during development.
  • Fedora ships Firefox 115.0 with PGO/LTO optimization again (built by GCC) after long period of breakage. Mozilla/clang builds are still a bit faster than GCC ones (Mozilla binaries give me ~ 200 points in Speedometer2.1 while Fedora ones ~ 190 points) but GCC binaries are slightly smaller (130 MB vs. 150 MB).

    But it’s still much better than non-PGO ones, Fedora Firefox 114.0 did only ~140 points in the benchmark.

    Just out of curiosity I tested Chrome browsers too. Fedora Chromium gives me ~140 points and official Google Chrome reached ~ 200 points. Looks like Fedora Chromium need some tweaks too :D.

No one fights alone. A guide to your first Firefox patch on Linux.

Posted by Martin Stransky on July 04, 2023 12:13 PM
<figure class="wp-block-image size-large"></figure>

Have you ever hit an annoying Firefox bug and want to fix it? You’re right here. Firefox is a great open source project with many volunteer contributors, large community and any patches and help is very welcome.

This post aims to help you with your first patch to Firefox and become an active Firefox contributor.

Get and build Firefox sources

First of all you need to get Firefox sources and build themon Linux. Ubuntu is a natural choice of many but I use Fedora as it’s Wayland leading distro and I’m also Red Hat employee. It doesn’t matter much because Firefox installs its own build environment.

Firefox development happens on nightly which is the latest up-to-date sources and all changes goes there. For start you need mercurial package installed.

Download latest Firefox sources to src directory by mercurial:

hg clone http://hg.mozilla.org/mozilla-central/ src

A finished Firefox build may need ~ 40GB of free space so don’t be surprised. It’s quite a beast 😀

Create .mozconfig file in src directory to set build conditions. Correct values are selected automatically so you don’t need to care much about it now. You can use mine generic one for optimized builds:

. $topsrcdir/browser/config/mozconfig

mk_add_options BUILD_OFFICIAL=1
mk_add_options MOZILLA_OFFICIAL=1
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir
mk_add_options AUTOCLOBBER=1

ac_add_options --disable-debug
ac_add_options --enable-optimize

And one for debug / non-optimized builds. That’s suitable for debugging:

. $topsrcdir/browser/config/mozconfig

mk_add_options BUILD_OFFICIAL=1
mk_add_options MOZILLA_OFFICIAL=1
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir
mk_add_options AUTOCLOBBER=1

ac_add_options --enable-debug
ac_add_options --disable-optimize

Then you go to create a build environment and download build tools for it. Run bootstrap and select Firefox for Desktop there:

./mach bootstrap

That downloads all requested tools from Mozilla and put them to ~/.mozbuild directory. It also configures Mercurial to work with Firefox sources. If you decide to quit or you want free disk space, just delete .mozbuild and src dirs.

You should be ready to build Firefox from sources now:

./mach build

When build is finished a new firefox build is in src/objdir/dist/bin directory and you can run it there.

Firefox hacking

Mozilla provides powerful tool Searchfox where Firefox sources may be browsed and investigated. From Linux/Gtk perspective some interesting code parts are:

  • widget/gtk – there’s Firefox/Linux/Gtk integration point here named toolkit and almost all Linux related code is located here. Corresponding Bugzilla component is Core :: Widget/Gtk where most of Linux bugs are filed.
  • widget/gtk/nsWindow.cpp – every Firefox window (main browser window, popups, tooltips etc.) have its nsWindow object. It creates GtkWindow for browser, handles Gtk signals and so on. Look at nsWindow::Create() for instance.
  • widget/gtk/DMABufSurface.cpp – implements dmabuf based surfaces. It’s used for various Linux/HW operations like video playback and WebGL rendering.
  • toolkit/xre – Firefox startup code. Set’s up GdkDisplay, wayland/x11 connections and remote startup code.
  • dom/media/platforms/ffmpeg – ffmpeg video playback implementation, VA-API related code.
  • gfx/thebes/gfxPlatformGtk.cpp – Set’s up gfx environment on LinuxGtk. Initilizes and configures WebRender, VA-API, DMABuf and related features and enable/disable them according to actual hardware.
  • gfx/gl – EGL/GLX related code, contains OpenGL setup for Linux (and other systems).

You can use there hints as starting points for further investigation.

Mercurial mini-howto

Mercurial is a VCS used by Mozilla. I’ve never been good at it and both Git and Mercurial are still a mystery to me. But don’t worry, you need only a small subset of Mercurial commands to hack Firefox and submit your patches.

Show all changes in your sources:

hg diff

Revert all your changes:

hg revert --all

Or revert specified files:

hg revert file1 file2 ...

Commit your changes:

hg commit -m "commit message"

This is the most important Mercurial command you use. It saves your changes locally and you can submit them to Mozilla for review later.

Mercurial commits all changes by default but you can ask to commit selected files only:

hg commit -m "commit message" file1 file2 ...

After commit the change gets a revision number. You can print that by Mozilla hg wip helper:

hg wip

And you show complete patch (message and changes) by:

hg log -v --patch -r revision_number

Now let’s look at commit message. It’s important part of the commit and has following format for Firefox project:

Bug XXXXX - What changed r?reviewer_name

Look at Bug 1818517 and D172914 patch. They’re a good example how the commit message looks like:

Bug 1818517 - Ignore crossing-due-to-grab-start r?stransky

Bug field tells Mozilla tools where to attach your patch while r? field selects who is going to check and ack the change. You always need to specify these two entries.

Without bugzilla number and reviewer name the patch can’t be uploaded or will hang unnoticed there.

The patch reviewer is a key person for your contribution. He/she may direct you how to improve the patch, ack it and eventually check it into project. Sorrect reviewer selection may be tricky and could discourage new contributors.

If you create a new patch for Linux you can always select me (r?stransky). Reviewers for other parts of Firefox are listed here.

Submit your patch for review

So we have a patch committed to your local repository and want to upload it to Mozilla for review. First you need to create account in Bugzilla (bug tracking system) and Phabricator (patch tracking) and install moz-phab tool to submit your patches.

Let’s say we have Bugzilla/Phabricator installed and moz-phab is working. It’s time to submit your first patch. Make sure you enabled Mozilla mercurial extensions in mach bootstrap command above. If not please run it again and enable it.

hg wip command lists patches/revisions in nice format and you can select what to submit:

<figure class="wp-block-image size-large"></figure>

To submit our patch for Bug 1818517 use moz-phab tool with patch revision numbers:

moz-phab submit d67996f50ead d67996f50ead

and you have your first patch uploaded.

More useful Mercurial commands

Update your sources to the latest ones:

hg pull
hg up central

Switch sources to any revision/branch from history. Use hg wip to get revision number and use hg up to switch:

<figure class="wp-block-image size-large"></figure>
hg up d67996f50ead

Rebase tool allows you to re-arrange patches across branches, make patch stream linear or rebase your patches to latest trunk. It rebases single patch or more changesets.

In case of collision a merge tool is called. I personally use Meld tool as vimdiff is difficult to use for me. Use hg rebase –help to get complete options list. I usually use:

hg rebase -s source_revision -d dest_revision -k

histedit is a great tool for commit management:

hg histedit

It allows you to re-arrange, merge and drop individual patches and change commit messages or show individual patches.

Where to start?

Wanna help with the project and fix bugs reported by other people? Cool! Bugzilla is your friend there are trackers related to Linux/Gtk here:

And that’s all! Don’t hesitate to join Mozilla Matrix chat, have fun and become member of Firefox developer community. If you’re blocked on any step listed here, drop me a note or ask on Mozilla chat.

ASG! 2023 CfP Closes Soon

Posted by Lennart Poettering on July 03, 2023 10:00 PM

<large>The All Systems Go! 2023 Call for Participation Closes in Three Days!</large>

The Call for Participation (CFP) for All Systems Go! 2023 will close in three days, on 7th of July! We’d like to invite you to submit your proposals for consideration to the CFP submission site quickly!

ASG image

All topics relevant to foundational open-source Linux technologies are welcome. In particular, however, we are looking for proposals including, but not limited to, the following topics:

The CFP will close on July 7th, 2023. A response will be sent to all submitters on or before July 14th, 2023. The conference takes place in 🗺️ Berlin, Germany 🇩🇪 on Sept. 13-14th.

All Systems Go! 2023 is all about foundational open-source Linux technologies. We are primarily looking for deeply technical talks by and for developers, engineers and other technical roles.

We focus on the userspace side of things, so while kernel topics are welcome they must have clear, direct relevance to userspace. The following is a non-comprehensive list of topics encouraged for 2023 submissions:

  • Image-Based Linux 🖼️
  • Secure and Measured Boot 📏
  • TPM-Based Local/Remote Attestation, Encryption, Authentication 🔑
  • Low-level container executors and infrastructure ⚙️.
  • IoT, embedded and server Linux infrastructure
  • Reproducible builds 🔧
  • Package management, OS, container 📦, image delivery and updating
  • Building Linux devices and applications 🏗️
  • Low-level desktop 💻 technologies
  • Networking 🌐
  • System and service management 🚀
  • Tracing and performance measuring 🔍
  • IPC and RPC systems 🦜
  • Security 🔐 and Sandboxing 🏖️

For more information please visit our conference website!

gitlab.freedesktop.org now has a bugbot for automatic issue/merge request processing

Posted by Peter Hutterer on July 03, 2023 06:34 AM

As of today, gitlab.freedesktop.org provides easy hooks to invoke the gitlab-triage tool for your project. gitlab-triage allows for the automation of recurring tasks, for example something like

If the label FOO is set, close the issue and add a comment containing ".... blah ..."
Many project have recurring tasks like this, e.g. the wayland project gets a lot of issues that are compositor (not protocol) issues. Being able to just set a label and have things happen is much more convenient than having to type out the same explanations over and over again.

The goal for us was to provide automated handling for these with as little friction as possible. And of course each project must be able to decide what actions should be taken. Usually gitlab-triage is run as part of project-specific scheduled pipelines but since we already have webhook-based spam-fighting tools we figured we could make this even easier.

So, bugbot was born. Any project registered with bugbot can use labels prefixed with "bugbot::" to have gitlab-triage invoked against the project's policies file. These labels thus serve as mini-commands for bugbot, though each project decides what happens for any particular label. bugbot effectively works like this:

sleep 30
for label in {issue|merge_request}.current_labels:
  if label.startswith("bugbot::"):
     wget https://gitlab.freedesktop.org/foo/bar/-/raw/{main|master}/.triage-policies.yml
     run-gitlab-triage --as-user @bugbot --use-file .triage-policies.yml
And this is triggered on every issue/merge request update for any registered project which means that all you need to do is set the label and you're done. The things of note here:
  • bugbot delays by 30 seconds, giving you time to unset an accidentally applied label before it takes effect
  • bugbot doesn't care about the label beyond the (hard-coded) "bugbot::" prefix
  • bugbot always runs your project's triage policies, from your main or master branch (whichever succeeds first)
  • The actions are performed as the bugbot user, not your user
The full documentation of what you can do in a policies file is available at the gitlab-triage documentation but let's look at a simple example that shouldn't even need explanation:
      - name: convert bugbot label to other label
            - "bugbot::foo"
            - "foo"
            - "bugbot::foo"
          comment: |
            Nice label you have there. Would be a shame 
            if someone removed it
          status: "close"
And the effect of this file can be seen in this issue here.

Registering a project

Bugbot is part of the damspam project and registering a project can be done with a single command. Note: this can only be done by someone with the Maintainer role or above.

Create a personal access token with API access and save the token value as $XDG_CONFIG_HOME/bugbot/user.token Then run the following commands with your project's full path (e.g. mesa/mesa, pipewire/wireplumber, xorg/lib/libX11):

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ bugbot request-webhook foo/bar
After this you may remove the token file and the package
$ pip uninstall damspam
$ rm $XDG_CONFIG_HOME/bugbot/user.token
The bugbot command will file an issue in the freedesktop/fdo-bots repository. This issue will be automatically processed and should be done by the time you finish the above commands, see this issue for an example. Note: the issue processing requires a git push to an internal repo - if you script this for multiple repos please put a sleep(30) in to avoid conflicts.

Adding triage policies

Once registered, the .triage-policies.yml file must be added to the root directory of your project. What bugbot commands you want to respond to (and the actions to take) is up to you, though there are two things of note: you should always remove the bugbot label you are reacting to to avoid duplicate processing and gitlab-triage does not create new labels. So any label in your actions must be manually created in the project first. Beyond that - the sky's your limit.

Remember you can test your policies file with

 $ gitlab-triage --dry-run --token $GITLAB_TOKEN \
   --source-id foo/bar  --resource-reference 1234

As usual, many thanks to Benjamin Tissoires for reviews and the magic of integrating this into infrastructure.

snegg - Python bindings for libei

Posted by Peter Hutterer on June 06, 2023 09:36 AM

After what was basically a flurry of typing, the snegg Python bindings for libei are now available. This is a Python package that provides bindings to the libei/libeis/liboeffis C libraries with a little bit of API improvement to make it not completely terrible. The main goal of these bindings (at least for now) is to provide some quick and easy way to experiment with what could possibly be done using libei - both server-side and client-side. [1] The examples directory has a minimal EI client (with portal support via liboeffis) and a minimal EIS implementation. The bindings are still quite rough and the API is nowhere near stable.

A proper way to support EI in Python would be to implement the protocol directly - there's no need for the C API quirkiness this way and you can make full use of things like async and whatnot. If you're interested in that, get in touch! Meanwhile, writing something roughly resemling xdotool is probably only a few hundred lines of python code. [2]

[1] writing these also exposed a few bugs in libei itself so I'm happy 1.0 wasn't out just yet
[2] at least the input emulation parts of xdotool

PipeWire camera support in Firefox

Posted by Jan Grulich on May 29, 2023 12:18 PM

New year, new challenges.

We finally reached a major milestone with Chromium 110, which was a release where we finally got screen sharing enabled by default on Wayland, and since then you no longer have to go into the preferences and enable the flag you need. That doesn’t mean my work there is over, but I’ve shifted my focus to something related but slightly different and that is PipeWire camera support.

Work on PipeWire camera support started in 2021 and was done by Michael Olbrich (Pengutronix). He submitted a huge change to Chromium to add this support and had trouble finding a reviewer because there was actually no one who knew anything about PipeWire in the Chromium project. I actually saw his change request by accident, but we got in touch and decided to move this to WebRTC instead, because having it lower in the stack means we would get it automatically in other browsers, like Firefox. Michael attended a meeting we used to have regularly for screen sharing support in WebRTC and we discussed how to implement PipeWire camera support in WebRTC instead and how to reuse some of the code we already had for screen sharing to avoid code duplication. After a few submitted and reverted reviews (usually when things break Chromium parts that are not covered by CI, happened to me many times), we ended up with PipeWire camera support in WebRTC (talking about the beginning of this year).

Journey to PipeWire camera support in Firefox

Up to recently, my work has mostly been 95% WebRTC and 5% Chromium, but I have not been familiar with Firefox at all (not counting WebRTC backports). I actually started fixing screen sharing support in there first before moving to camera, because I noticed a few issues after Firefox (finally) did a WebRTC rebase to some of the newer versions. They’ve actually started doing monthly WebRTC rebases, which is really a good thing and I’m glad to see that happening. Anyway, even though Firefox has more recent WebRTC these days, when I started in February, there was still no PipeWire camera support at all because WebRTC was still a few months behind, so I had to backport all the patches and make them work with Firefox. Only then I could finally start working on the actual PipeWire camera support from WebRTC. Working on the backports, I was still working in the WebRTC space, so everything was somewhat familiar. Implementing the actual PipeWire support was a different story and took me some extra time to understand how everything works. This includes camera API on the WebRTC side, camera support on the Firefox side, and I also had to learn all the APIs specifically used in Firefox, but admittedly, learning about new things is fun too. After some tries and errors it started to work and I was able to share my camera using the PipeWire camera backend from WebRTC. You have to trust me that the picture below is not using the V4L2 backend.

<figure class="wp-block-image size-large is-resized is-style-default"></figure>

I went ahead and submitted my WebRTC backports and the PipeWire camera backend implementation for review to Firefox. Unfortunately, I was told that the code where I placed my implementation could also be used by the WebRTC Javascript API, which is used by bots to check for camera presence on the client side, which I didn’t know as someone who just recently started working on camera support. This was a problem because we get PipeWire access through xdg-desktop-portal and this involves showing a dialog to the user asking for camera access. Showing a camera request dialog randomly to the user would not be a good experience. Going back to the drawing board, I talked to Andreas Pehrson (Mozilla/WebRTC). Andreas was a great helper and we came up with a solution on how to implement it properly in Firefox and avoid things like I’ve mentioned before. This time it involved some re-org changes in WebRTC, where I split the xdg-desktop-portal and PipeWire implementations for PipeWire video capture, so we can request camera access in Firefox only when appropriate and only do the PipeWire stuff in the backend assuming the access was granted. So I did implement it again, this time according to what we agreed on with Andreas and it worked.

<figure class="wp-block-image size-large is-resized"></figure>

This is now submitted for review again and hopefully this time it will only need some minor fixes and not a complete rewrite like before, and you will be able to try/use it sooner rather than later. The main change is submitted here, but it is accompanied by other changes with WebRTC backports or changes that make the backports buildable with Firefox. With the first version of the change, I had a Fedora COPR repository, but I had to discontinue it, because it was too hard to maintain it in a buildable state on top of a stable Firefox. But you can be sure that Fedora will be the very first consumer of these changes once they are merged.

Why do we need this?

For many reasons. I would recommend you to read a blog post from Christian Schaller, where everything is explained into the details and gives you more information about the camera stack. Main reasons are:

  • Security
    • Access to the camera must be granted by the user, so you can be sure that no one is using your camera behind your back.
  • Flexibility
    • Your camera can be accessed by multiple clients simultaneously.
  • Libcamera support
    • Needed for ARM devices or devices using ChromeOS

Chromium support

While support in WebRTC has been done already a few months ago, Chromium originally didn’t use WebRTC video capture API for camera support and for that reason it had to be added. Michael implemented it and it is still currently pending on review so currently both Chromium and Firefox are both implemented, but waiting for approval.

Future plans

Most importantly, I want to get everything merged and working seamlessly, but I’m already aware of some issues and missing functionality in the PipeWire backend in WebRTC. And we also have the same problem we used to have with screen sharing, which is that it’s not enabled by default, unit tested, and feature complete and these things take time fix.

GNOME Logout Inhbit

Posted by Caolán McNamara on May 11, 2023 11:14 AM

Recently added support in LibreOffice towards 7.6 for GNOME logout inhibit if LibreOffice has open documents with unsaved changes.

GNOME Core Apps Update

Posted by Michael Catanzaro on May 10, 2023 09:32 PM

It’s been a while since my big core app reorganization for GNOME 3.22. Here is a history of core app changes since then:

  • GNOME 3.26 (September 2017) added Music, To Do (which has since been renamed to Endeavor), and Document Scanner (simple-scan). (I blogged about this at the time, then became lazy and stopped blogging about core app updates, until now.)
  • To Do was removed in GNOME 3.28 (March 2018) due to lack of consensus over whether it should really be a core app.  As a result of this, we improved communication between GNOME release team and design team to ensure both teams agree on future core app changes. Mea culpa.
  • Documents was removed in GNOME 3.32 (March 2019).
  • A new Developer Tools subcategory of core was created in GNOME 3.38 (September 2020), adding Builder, dconf Editor, Devhelp, and Sysprof. These apps are only interesting for software developers and are not intended to be installed by default in general-purpose operating systems like the rest of GNOME core.
  • GNOME 41 (September 2021) featured the first larger set of changes to GNOME core since GNOME 3.22. This release removed Archive Manager (file-roller), since Files (nautilus) is now able to handle archives, and also removed gedit (formerly Text Editor). It added Connections and a replacement Text Editor app (gnome-text-editor). It also added a new Mobile subcategory of core, for apps intended for mobile-focused operating systems, featuring the dialer app Calls. (To date, the Mobile subcategory has not been very successful: so far Calls is the only app included there.)
  • GNOME 42 (March 2022) featured a second larger set of changes. Screenshot was removed because GNOME Shell gained a built-in screenshot tool. Terminal was removed in favor of Console (kgx). We also moved Boxes to the Developer Tools subcategory, to recommend that it no longer be installed by default in general purpose operating systems.
  • GNOME 43 (September 2022) added D-Spy to Developer Tools.

OK, now we’re caught up on historical changes. So, what to expect next?

New Process for Core Apps Changes

Although most of the core app changes have gone smoothly, we ran into some trouble replacing Terminal with Console. Console provides a fresher and simpler user interface on top of vte, the same terminal backend used by Terminal, so Console and Terminal share much of the same underlying functionality. This means work of the Terminal maintainers is actually key to the success of Console. Using a new terminal app rather than evolving Terminal allowed for bigger changes to the default user experience without upsetting users who prefer the experience provided by Terminal. I think Console is generally nicer than Terminal, but it is missing a few features that Fedora Workstation developers thought were important to have before replacing Terminal with Console. Long story short: this core app change was effectively rejected by one of our most important downstreams. Since then, Console has not seen very much development, and accordingly it is unlikely to be accepted into Fedora Workstation anytime soon. We messed up by adding the app to core before downstreams were comfortable with it, and at this point it has become unclear whether Console should remain in core or whether we should give up and bring back Terminal. Console remains for now, but I’m not sure where we go from here. Help welcome.

To prevent this situation from happening again, Chris and Sophie developed a detailed and organized process for adding or removing core apps, including a new Incubator category designed to provide notice to downstreams that we are considering adding new apps to GNOME core. The new Incubator is much more structured than my previous short-lived Incubator attempt in GNOME 3.22. When apps are added to Incubator, I’ve been proactively asking other Fedora Workstation developers to provide feedback to make sure the app is considered ready there, to avoid a repeat of the situation with Console. Other downstreams are also welcome to watch the  Incubator/Submission project and provide feedback on newly-submitted apps, which should allow plenty of heads-up so downstreams can let us know sooner rather than later if there are problems with Incubator apps. Hopefully this should ensure apps are actually adopted by downstreams when they enter GNOME core.

Imminent Core App Changes

Currently there are two apps in Incubator. Loupe is a new image viewer app developed by Chris and Sophie to replace Image Viewer (eog). Snapshot is a new camera app developed by Maximiliano and Jamie to replace Cheese. These apps are maturing rapidly and have received primarily positive feedback thus far, so they are likely to graduate from Incubator and enter GNOME core sooner rather than later. The time to provide feedback is now. Don’t be surprised if Loupe is included in core for GNOME 45.

In addition to Image Viewer and Cheese, we are also considering removing Photos. Photos is one of our “content apps” designed to allow browsing an entire collection of files independently of their filesystem locations. Historically, the other two content apps were Documents and Music. The content app strategy did not work very well for Documents, since a document browser doesn’t really offer many advantages over a file browser, but Photos and Music are both pretty decent at displaying your collection of pictures or songs, assuming you have such a collection. We have been discussing what to do with Photos and the other content apps for a very long time, at least since 2015. It took a very long time to reach some rough consensus, but we have finally agreed that the design of Photos still makes sense for GNOME: having a local app for viewing both local and cloud photos is still useful. However, Photos is no longer actively maintained. Several basic functionality bugs imperiled timely release of Fedora 37 last fall, and the app is less useful than previously because it no longer integrates with cloud services like Google Photos. (The Google integration depends on libgdata, which was removed from GNOME 44 because it did not survive the transition to libsoup 3.) Photos has failed the new core app review process due to lack of active maintenance, and will be soon be removed from GNOME core unless a new maintainer steps up to take care of it. Volunteers welcome.

Future Core App Changes

Lastly, I want to talk about some changes that are not yet planned, but might occur in the future. Think of this entire section as brainstorming rather than any concrete plans.

Like Photos, we have also been discussing the status of Music. The popularity of DRM-encumbered cloud music services has increased, and local music storage does not seem to be as common as it used to be. If you do have local music, Music is pretty decent at handling it, but there are prominent bugs and missing features (like the ability to select which folders to index) detracting from the user experience. We do not have consensus on whether having a core app to play local music files still makes sense, since most users probably do not have a local music collection anymore. But perhaps all that is a moot point, because Videos (totem) 3.38 removed support for opening audio files, leaving us with no core apps capable of playing audio for the past 2.5 years. Previously, our default music player was Videos, which was really weird, and now we have none; Music can only play audio files that you’ve navigated to using Music itself, so it’s impossible for Music to be our default music player. My suggestion to rename Videos to Media Player and handle audio files again has not been well-received, so the most likely solution to this conundrum is to teach Music how to open audio files, likely securing its future in core. A merge request exists, but it does not look close to landing. Fedora Workstation is still shipping Rhythmbox rather than Music specifically due to this problem. My opinion is this needs to be resolved for Music to remain in core.

It would be nice to have an email client in GNOME core, since everybody uses email and local clients are much nicer than webmail. The only plausible candidate here is Geary. (If you like Evolution, consider that you might not like the major UI changes and many, many feature removals that would be necessary for Evolution to enter GNOME core.) Geary has only one active maintainer, and adding a big application that depends on just one person seems too risky. If more developers were interested in maintaining Geary, it would feel like a safer addition to GNOME core.

Contacts feels a little out of place currently. It’s mostly useful for storing email addresses, but you cannot actually do anything with them because we have no email application in core. Like Photos, Contacts has had several recent basic functionality bugs that imperiled timely Fedora releases, but these seem to have been largely resolved, so it’s not causing urgent problems. Still, for Contacts to remain in the long term, we’re probably going to need another maintainer here too. And perhaps it only makes sense to keep if we add Geary.

Finally, should Maps move to the Mobile category? It seems clearly useful to have a maps app installed by default on a phone, but I wonder how many desktop users really prefer to use Maps rather than a maps website.

GNOME 44 Core Apps

I’ll end this blog post with an updated list of core apps as of GNOME 44. Here they are:

  • Main category (26 apps):
    • Calculator
    • Calendar
    • Characters
    • Cheese
    • Clocks
    • Connections
    • Console (kgx)
    • Contacts
    • Disks (gnome-disk-utility)
    • Disk Usage Analyzer (baobab)
    • Document Scanner (simple-scan)
    • Document Viewer (evince)
    • Files (nautilus)
    • Fonts (gnome-font-viewer)
    • Help (yelp)
    • Image Viewer (eog)
    • Logs
    • Maps
    • Music
    • Photos
    • Software
    • System Monitor
    • Text Editor
    • Videos (totem)
    • Weather
    • Web (epiphany)
  • Developer Tools (6 apps):
    • Boxes
    • Builder
    • dconf Editor
    • Devhelp
    • D-Spy
    • sysprof
  • Mobile (1 app):
    • Calls

MSI and Insecure KMs

Posted by Richard Hughes on May 09, 2023 01:40 PM

As some as you may know, MSI suffered a data breach which leaked a huge amount of source code, documentation and low-level firmware PRIVATE KEYS. This is super bad as it now allows anyone to sign a random firmware image and install it as an official MSI firmware. It’s even more super bad than that, as the certificates leaked seem to be the KeyManifest keys, which actually control the layer below SecureBoot, this little-documented and even less well understood thing called BootGuard. I’ll not overplay the impact here, but there is basically no firmware security on most modern MSI hardware now. We already detect the leaked test keys from Lenovo and notify the user via the HSI test failure and I think we should do the same thing for MSI devices too. I’ve not downloaded the leak for obvious reasons, and I don’t think the KM hashes would be easy to find either.

So what can you do to help? Do you have an MSI laptop or motherboard affected by the leak? The full list is here (source: Binarly) and if you have one of those machines I’d ask if you could follow the instructions below, run MEInfo and attach it to the discussion please.

As for how to get MEInfo, Intel doesn’t want to make it easy for us. The Intel CSME System Tools are all different binaries, and are seemingly all compiled one-by-one for each specific MEI generation — and available only from a semi-legitimate place unless you’re an OEM or ODM. Once you have the archive of tools you either have to work out what CSME revision you have (e.g. Ice Point is 13.0) or do what I do and extract all the versions and just keep running them until one works. e.g. choosing the wrong one will get you:

sudo ./CSME\ System\ Tools\ v13.50\ r3/MEInfo/LINUX64/MEInfo 
Intel (R) MEInfo Version:
Copyright (C) 2005 - 2021, Intel Corporation. All rights reserved.
Error 621: Unsupported hardware platform. HW: Cometlake Platform. Supported HW: Jasplerlake Platform.

And choosing the right one will get you:

Intel (R) MEInfo Version:
Copyright (C) 2005 - 2021, Intel Corporation. All rights reserved.

General FW Information
OEM Public Key Hash FPF                          2B4D5D79BD7EE3C192412A4501D88FB2066C853FF7B1060765395D671B15D30C

Now, how to access these hashes is what Intel keeps a secret, for no reason at all. I literally need to know what integer index to use when querying the HECI device. I’ve asked Intel, but I’ve been waiting since October 2022. For instance:

sudo strace -xx -s 4096  -e openat,read,write,close ./CSME\ System\ Tools\ v14.0.20+\ r20/MEInfo/LINUX64/MEInfo
write(3, "\x0a\x0a\x00\x00\x00\x23\x00\x40\x00\x00\x00\x00\x20\x00\x00\x00\x00", 17) = 17
read(3, "\x0a\x8a\x00\x00\x20\x00\x00\x00\x2b\x4d\x5d\x79\xbd\x7e\xe3\xc1\x92\x41\x2a\x45\x01\xd8\x8f\xb2\x06\x6c\x85\x3f\xf7\xb1\x06\x07\x65\x39\x5d\x67\x1b\x15\xd3\x0c", 4096) = 40

That contains all the information I need – the Comet Lake READ_FILE_EX ID is 0x40002300 and there’s a SHA256 hash that matches what the OEM Public Key Hash FPF console output said above. There are actually three accesses to get the same hash in three different places, so until I know why I’d like the entire output from MEInfo.

The information I need uploading to the bug is then just these two files:

sudo strace -xx -s 4096  -e openat,read,write,close ./THE_CORRECT_PATH/MEInfo/LINUX64/MEInfo &> YOUR_GITHUB_USERNAME-meinfo-strace.txt

If I need more info I’ll ask on the ticket. Thanks!

libei and a fancy protocol

Posted by Peter Hutterer on May 09, 2023 12:51 AM

libei is the library for Emulated Input - see this post for an introduction. Like many projects, libei was started when it was still unclear if it could be the right solution to the problem. In the years (!) since, we've upgraded the answer to that question from "hopefully" to "yeah, I reckon" - doubly so since we added support for receiver contexts and got InputLeap working through the various portal changes.

Emulating or capturing input needs two processes to communicate for obvious reasons so the communication protocol is a core part of it. But initially, libei was a quickly written prototype and the protocol was hacked up on an as-needed let's-get-this-working basis. The rest of the C API got stable enough but the protocol was the missing bit. Long-term the protocol must be stable - without a stable protocol updating your compositor may break all flatpaks still shipping an older libei. Or updating a flatpak may not work with an older compositor. So in the last weeks/months, a lot of work as gone into making the protocol stable. This consisted of two parts: drop protobuf and make the variuos features interface-dependent, unashamedly quite like the Wayland protocol which is also split into a number of interfaces that can be independently versioned. Initially, I attempted to make the protocol binary compatible with Wayland but dropped that goal eventually - the benefits were minimal and the effort and limitations (due to different requirements) were quite significant.

The protocol is defined in a single XML file and can be used directly from language bindings (if any). The protocol documentation is quite extensive but it's relatively trivial in principal: the first 8 bytes of each message are the object ID, then we have 4 bytes for the message length in bytes, then 4 for the object-specific opcode. That opcode is one of the requests or events in the object's interface - which is defined at object creation time. Unlike Wayland, the majority of objects in libei are created in server-side (the EIS implementation decides which seats are available and which devices in those seats). The remainder of the message are the arguments. Note that unlike other protocols the message does not carry a signature - prior knowledge of the message is required to parse the arguments. This is a direct effect of initially making it wayland-compatible and I didn't really find it worth the effort to add this.

Anyway, long story short: swapping the protocol out didn't initially have any effect on the C library but with the changes came some minor updates to remove some of the warts in the API. Perhaps the biggest change is that the previous capabilities of a device are now split across several interfaces. Your average mouse-like emulated device will have the "pointer", "button" and "scroll" interfaces, or maybe the "pointer_absolute", "button" and "scroll" interface. The touch and keyboard interfaces were left as-is. Future interfaces will likely include gestures and tablet tools, I have done some rough prototyping locally and it will fit in nicely enough with the current protocol.

At the time of writing, the protocol is not officialy stable but I have no intention of changing it short of some bug we may discover. Expect libei 1.0 very soon.

Twitter's e2ee DMs are better than nothing

Posted by Matthew Garrett on May 04, 2023 09:49 PM
(Edit 2023-05-10: This has now launched for a subset of Twitter users. The code that existed to notify users that device identities had changed does not appear to have been enabled - as a result, in its current form, Twitter can absolutely MITM conversations and read your messages)

Elon Musk appeared on an interview with Tucker Carlson last month, with one of the topics being the fact that Twitter could be legally compelled to hand over users' direct messages to government agencies since they're held on Twitter's servers and aren't encrypted. Elon talked about how they were in the process of implementing proper encryption for DMs that would prevent this - "You could put a gun to my head and I couldn't tell you. That's how it should be."

tl;dr - in the current implementation, while Twitter could subvert the end-to-end nature of the encryption, it could not do so without users being notified. If any user involved in a conversation were to ignore that notification, all messages in that conversation (including ones sent in the past) could then be decrypted. This isn't ideal, but it still seems like an improvement over having no encryption at all. More technical discussion follows.

For context: all information about Twitter's implementation here has been derived from reverse engineering version 9.86.0 of the Android client and 9.56.1 of the iOS client (the current versions at time of writing), and the feature hasn't yet launched. While it's certainly possible that there could be major changes in the protocol between now launch, Elon has asserted that they plan to launch the feature this week so it's plausible that this reflects what'll ship.

For it to be impossible for Twitter to read DMs, they need to not only be encrypted, they need to be encrypted with a key that's not available to Twitter. This is what's referred to as "end-to-end encryption", or e2ee - it means that the only components in the communication chain that have access to the unencrypted data are the endpoints. Even if the message passes through other systems (and even if it's stored on other systems), those systems do not have access to the keys that would be needed to decrypt the data.

End-to-end encrypted messengers were initially popularised by Signal, but the Signal protocol has since been incorporated into WhatsApp and is probably much more widely used there. Millions of people per day are sending messages to each other that pass through servers controlled by third parties, but those third parties are completely unable to read the contents of those messages. This is the scenario that Elon described, where there's no degree of compulsion that could cause the people relaying messages to and from people to decrypt those messages afterwards.

But for this to be possible, both ends of the communication need to be able to encrypt messages in a way the other end can decrypt. This is usually performed using AES, a well-studied encryption algorithm with no known significant weaknesses. AES is a form of what's referred to as a symmetric encryption, one where encryption and decryption are performed with the same key. This means that both ends need access to that key, which presents us with a bootstrapping problem. Until a shared secret is obtained, there's no way to communicate securely, so how do we generate that shared secret? A common mechanism for this is something called Diffie Hellman key exchange, which makes use of asymmetric encryption. In asymmetric encryption, an encryption key can be split into two components - a public key and a private key. Both devices involved in the communication combine their private key and the other party's public key to generate a secret that can only be decoded with access to the private key. As long as you know the other party's public key, you can now securely generate a shared secret with them. Even a third party with access to all the public keys won't be able to identify this secret. Signal makes use of a variation of Diffie-Hellman called Extended Triple Diffie-Hellman that has some desirable properties, but it's not strictly necessary for the implementation of something that's end-to-end encrypted.

Although it was rumoured that Twitter would make use of the Signal protocol, and in fact there are vestiges of code in the Twitter client that still reference Signal, recent versions of the app have shipped with an entirely different approach that appears to have been written from scratch. It seems simple enough. Each device generates an asymmetric keypair using the NIST P-256 elliptic curve, along with a device identifier. The device identifier and the public half of the key are uploaded to Twitter using a new API endpoint called /1.1/keyregistry/register. When you want to send an encrypted DM to someone, the app calls /1.1/keyregistry/extract_public_keys with the IDs of the users you want to communicate with, and gets back a list of their public keys. It then looks up the conversation ID (a numeric identifier that corresponds to a given DM exchange - for a 1:1 conversation between two people it doesn't appear that this ever changes, so if you DMed an account 5 years ago and then DM them again now from the same account, the conversation ID will be the same) in a local database to retrieve a conversation key. If that key doesn't exist yet, the sender generates a random one. The message is then encrypted with the conversation key using AES in GCM mode, and the conversation key is then put through Diffie-Hellman with each of the recipients' public device keys. The encrypted message is then sent to Twitter along with the list of encrypted conversation keys. When each of the recipients' devices receives the message it checks whether it already has a copy of the conversation key, and if not performs its half of the Diffie-Hellman negotiation to decrypt the encrypted conversation key. One it has the conversation key it decrypts it and shows it to the user.

What would happen if Twitter changed the registered public key associated with a device to one where they held the private key, or added an entirely new device to a user's account? If the app were to just happily send a message with the conversation key encrypted with that new key, Twitter would be able to decrypt that and obtain the conversation key. Since the conversation key is tied to the conversation, not any given pair of devices, obtaining the conversation key means you can then decrypt every message in that conversation, including ones sent before the key was obtained.

(An aside: Signal and WhatsApp make use of a protocol called Sesame which involves additional secret material that's shared between every device a user owns, hence why you have to do that QR code dance whenever you add a new device to your account. I'm grossly over-simplifying how clever the Signal approach is here, largely because I don't understand the details of it myself. The Signal protocol uses something called the Double Ratchet Algorithm to implement the actual message encryption keys in such a way that even if someone were able to successfully impersonate a device they'd only be able to decrypt messages sent after that point even if they had encrypted copies of every previous message in the conversation)

How's this avoided? Based on the UI that exists in the iOS version of the app, in a fairly straightforward way - each user can only have a single device that supports encrypted messages. If the user (or, in our hypothetical, a malicious Twitter) replaces the device key, the client will generate a notification. If the user pays attention to that notification and verifies with the recipient through some out of band mechanism that the device has actually been replaced, then everything is fine. But, if any participant in the conversation ignores this warning, the holder of the subverted key can obtain the conversation key and decrypt the entire history of the conversation. That's strictly worse than anything based on Signal, where such impersonation would simply not work, but even in the Twitter case it's not possible for someone to silently subvert the security.

So when Elon says Twitter wouldn't be able to decrypt these messages even if someone held a gun to his head, there's a condition applied to that - it's true as long as nobody fucks up. This is clearly better than the messages just not being encrypted at all in the first place, but overall it's a weaker solution than Signal. If you're currently using Twitter DMs, should you turn on encryption? As long as the limitations aren't too limiting, definitely! Should you use this in preference to Signal or WhatsApp? Almost certainly not. This seems like a genuine incremental improvement, but it'd be easy to interpret what Elon says as providing stronger guarantees than actually exist.

comment count unavailable comments

PSA: upgrade your LUKS key derivation function

Posted by Matthew Garrett on April 18, 2023 12:26 AM
Here's an article from a French anarchist describing how his (encrypted) laptop was seized after he was arrested, and material from the encrypted partition has since been entered as evidence against him. His encryption password was supposedly greater than 20 characters and included a mixture of cases, numbers, and punctuation, so in the absence of any sort of opsec failures this implies that even relatively complex passwords can now be brute forced, and we should be transitioning to even more secure passphrases.

Or does it? Let's go into what LUKS is doing in the first place. The actual data is typically encrypted with AES, an extremely popular and well-tested encryption algorithm. AES has no known major weaknesses and is not considered to be practically brute-forceable - at least, assuming you have a random key. Unfortunately it's not really practical to ask a user to type in 128 bits of binary every time they want to unlock their drive, so another approach has to be taken.

This is handled using something called a "key derivation function", or KDF. A KDF is a function that takes some input (in this case the user's password) and generates a key. As an extremely simple example, think of MD5 - it takes an input and generates a 128-bit output, so we could simply MD5 the user's password and use the output as an AES key. While this could technically be considered a KDF, it would be an extremely bad one! MD5s can be calculated extremely quickly, so someone attempting to brute-force a disk encryption key could simply generate the MD5 of every plausible password (probably on a lot of machines in parallel, likely using GPUs) and test each of them to see whether it decrypts the drive.

(things are actually slightly more complicated than this - your password is used to generate a key that is then used to encrypt and decrypt the actual encryption key. This is necessary in order to allow you to change your password without having to re-encrypt the entire drive - instead you simply re-encrypt the encryption key with the new password-derived key. This also allows you to have multiple passwords or unlock mechanisms per drive)

Good KDFs reduce this risk by being what's technically referred to as "expensive". Rather than performing one simple calculation to turn a password into a key, they perform a lot of calculations. The number of calculations performed is generally configurable, in order to let you trade off between the amount of security (the number of calculations you'll force an attacker to perform when attempting to generate a key from a potential password) and performance (the amount of time you're willing to wait for your laptop to generate the key after you type in your password so it can actually boot). But, obviously, this tradeoff changes over time - defaults that made sense 10 years ago are not necessarily good defaults now. If you set up your encrypted partition some time ago, the number of calculations required may no longer be considered up to scratch.

And, well, some of these assumptions are kind of bad in the first place! Just making things computationally expensive doesn't help a lot if your adversary has the ability to test a large number of passwords in parallel. GPUs are extremely good at performing the sort of calculations that KDFs generally use, so an attacker can "just" get a whole pile of GPUs and throw them at the problem. KDFs that are computationally expensive don't do a great deal to protect against this. However, there's another axis of expense that can be considered - memory. If the KDF algorithm requires a significant amount of RAM, the degree to which it can be performed in parallel on a GPU is massively reduced. A Geforce 4090 may have 16,384 execution units, but if each password attempt requires 1GB of RAM and the card only has 24GB on board, the attacker is restricted to running 24 attempts in parallel.

So, in these days of attackers with access to a pile of GPUs, a purely computationally expensive KDF is just not a good choice. And, unfortunately, the subject of this story was almost certainly using one of those. Ubuntu 18.04 used the LUKS1 header format, and the only KDF supported in this format is PBKDF2. This is not a memory expensive KDF, and so is vulnerable to GPU-based attacks. But even so, systems using the LUKS2 header format used to default to argon2i, again not a memory expensive KDFwhich is memory strong, but not designed to be resistant to GPU attack (thanks to the comments pointing out my misunderstanding here). New versions default to argon2id, which is. You want to be using argon2id.

What makes this worse is that distributions generally don't update this in any way. If you installed your system and it gave you pbkdf2 as your KDF, you're probably still using pbkdf2 even if you've upgraded to a system that would use argon2id on a fresh install. Thankfully, this can all be fixed-up in place. But note that if anything goes wrong here you could lose access to all your encrypted data, so before doing anything make sure it's all backed up (and figure out how to keep said backup secure so you don't just have your data seized that way).

First, make sure you're running as up-to-date a version of your distribution as possible. Having tools that support the LUKS2 format doesn't mean that your distribution has all of that integrated, and old distribution versions may allow you to update your LUKS setup without actually supporting booting from it. Also, if you're using an encrypted /boot, stop now - very recent versions of grub2 support LUKS2, but they don't support argon2id, and this will render your system unbootable.

Next, figure out which device under /dev corresponds to your encrypted partition. Run


and look for entries that have a type of "crypt". The device above that in the tree is the actual encrypted device. Record that name, and run

sudo cryptsetup luksHeaderBackup /dev/whatever --header-backup-file /tmp/luksheader

and copy that to a USB stick or something. If something goes wrong here you'll be able to boot a live image and run

sudo cryptsetup luksHeaderRestore /dev/whatever --header-backup-file luksheader

to restore it.

(Edit to add: Once everything is working, delete this backup! It contains the old weak key, and someone with it can potentially use that to brute force your disk encryption key using the old KDF even if you've updated the on-disk KDF.)

Next, run

sudo cryptsetup luksDump /dev/whatever

and look for the Version: line. If it's version 1, you need to update the header to LUKS2. Run

sudo cryptsetup convert /dev/whatever --type luks2

and follow the prompts. Make sure your system still boots, and if not go back and restore the backup of your header. Assuming everything is ok at this point, run

sudo cryptsetup luksDump /dev/whatever

again and look for the PBKDF: line in each keyslot (pay attention only to the keyslots, ignore any references to pbkdf2 that come after the Digests: line). If the PBKDF is either "pbkdf2" or "argon2i" you should convert to argon2id. Run the following:

sudo cryptsetup luksConvertKey /dev/whatever --pbkdf argon2id

and follow the prompts. If you have multiple passwords associated with your drive you'll have multiple keyslots, and you'll need to repeat this for each password.

Distributions! You should really be handling this sort of thing on upgrade. People who installed their systems with your encryption defaults several years ago are now much less secure than people who perform a fresh install today. Please please please do something about this.

comment count unavailable comments

Booting modern Intel CPUs

Posted by Matthew Garrett on April 17, 2023 12:04 AM
CPUs can't do anything without being told what to do, which leaves the obvious problem of how do you tell a CPU to do something in the first place. On many CPUs this is handled in the form of a reset vector - an address the CPU is hardcoded to start reading instructions from when power is applied. The address the reset vector points to will typically be some form of ROM or flash that can be read by the CPU even if no other hardware has been configured yet. This allows the system vendor to ship code that will be executed immediately after poweron, configuring the rest of the hardware and eventually getting the system into a state where it can run user-supplied code.

The specific nature of the reset vector on x86 systems has varied over time, but it's effectively always been 16 bytes below the top of the address space - so, 0xffff0 on the 20-bit 8086, 0xfffff0 on the 24-bit 80286, and 0xfffffff0 on the 32-bit 80386. Convention on x86 systems is to have RAM starting at address 0, so the top of address space could be used to house the reset vector with as low a probability of conflicting with RAM as possible.

The most notable thing about x86 here, though, is that when it starts running code from the reset vector, it's still in real mode. x86 real mode is a holdover from a much earlier era of computing. Rather than addresses being absolute (ie, if you refer to a 32-bit address, you store the entire address in a 32-bit or larger register), they are 16-bit offsets that are added to the value stored in a "segment register". Different segment registers existed for code, data, and stack, so a 16-bit address could refer to different actual addresses depending on how it was being interpreted - jumping to a 16 bit address would result in that address being added to the code segment register, while reading from a 16 bit address would result in that address being added to the data segment register, and so on. This is all in order to retain compatibility with older chips, to the extent that even 64-bit x86 starts in real mode with segments and everything (and, also, still starts executing at 0xfffffff0 rather than 0xfffffffffffffff0 - 64-bit mode doesn't support real mode, so there's no way to express a 64-bit physical address using the segment registers, so we still start just below 4GB even though we have massively more address space available).

Anyway. Everyone knows all this. For modern UEFI systems, the firmware that's launched from the reset vector then reprograms the CPU into a sensible mode (ie, one without all this segmentation bullshit), does things like configure the memory controller so you can actually access RAM (a process which involves using CPU cache as RAM, because programming a memory controller is sufficiently hard that you need to store more state than you can fit in registers alone, which means you need RAM, but you don't have RAM until the memory controller is working, but thankfully the CPU comes with several megabytes of RAM on its own in the form of cache, so phew). It's kind of ugly, but that's a consequence of a bunch of well-understood legacy decisions.

Except. This is not how modern Intel x86 boots. It's far stranger than that. Oh, yes, this is what it looks like is happening, but there's a bunch of stuff going on behind the scenes. Let's talk about boot security. The idea of any form of verified boot (such as UEFI Secure Boot) is that a signature on the next component of the boot chain is validated before that component is executed. But what verifies the first component in the boot chain? You can't simply ask the BIOS to verify itself - if an attacker can replace the BIOS, they can replace it with one that simply lies about having done so. Intel's solution to this is called Boot Guard.

But before we get to Boot Guard, we need to ensure the CPU is running in as bug-free a state as possible. So, when the CPU starts up, it examines the system flash and looks for a header that points at CPU microcode updates. Intel CPUs ship with built-in microcode, but it's frequently old and buggy and it's up to the system firmware to include a copy that's new enough that it's actually expected to work reliably. The microcode image is pulled out of flash, a signature is verified, and the new microcode starts running. This is true in both the Boot Guard and the non-Boot Guard scenarios. But for Boot Guard, before jumping to the reset vector, the microcode on the CPU reads an Authenticated Code Module (ACM) out of flash and verifies its signature against a hardcoded Intel key. If that checks out, it starts executing the ACM. Now, bear in mind that the CPU can't just verify the ACM and then execute it directly from flash - if it did, the flash could detect this, hand over a legitimate ACM for the verification, and then feed the CPU different instructions when it reads them again to execute them (a Time of Check vs Time of Use, or TOCTOU, vulnerability). So the ACM has to be copied onto the CPU before it's verified and executed, which means we need RAM, which means the CPU already needs to know how to configure its cache to be used as RAM.

Anyway. We now have an ACM loaded and verified, and it can safely be executed. The ACM does various things, but the most important from the Boot Guard perspective is that it reads a set of write-once fuses in the motherboard chipset that represent the SHA256 of a public key. It then reads the initial block of the firmware (the Initial Boot Block, or IBB) into RAM (or, well, cache, as previously described) and parses it. There's a block that contains a public key - it hashes that key and verifies that it matches the SHA256 from the fuses. It then uses that key to validate a signature on the IBB. If it all checks out, it executes the IBB and everything starts looking like the nice simple model we had before.

Except, well, doesn't this seem like an awfully complicated bunch of code to implement in real mode? And yes, doing all of this modern crypto with only 16-bit registers does sound like a pain. So, it doesn't. All of this is happening in a perfectly sensible 32 bit mode, and the CPU actually switches back to the awful segmented configuration afterwards so it's still compatible with an 80386 from 1986. The "good" news is that at least firmware can detect that the CPU has already configured the cache as RAM and can skip doing that itself.

I'm skipping over some steps here - the ACM actually does other stuff around measuring the firmware into the TPM and doing various bits of TXT setup for people who want DRTM in their lives, but the short version is that the CPU bootstraps itself into a state where it works like a modern CPU and then deliberately turns a bunch of the sensible functionality off again before it starts executing firmware. I'm also missing out the fact that this entire process only kicks off after the Management Engine says it can, which means we're waiting for an entirely independent x86 to boot an entire OS before our CPU even starts pretending to execute the system firmware.

Of course, as mentioned before, on modern systems the firmware will then reprogram the CPU into something actually sensible so OS developers no longer need to care about this[1][2], which means we've bounced between multiple states for no reason other than the possibility that someone wants to run legacy BIOS and then boot DOS on a CPU with like 5 orders of magnitude more transistors than the 8086.

tl;dr why can't my x86 wake up with the gin protected mode already inside it

[1] Ha uh except that on ACPI resume we're going to skip most of the firmware setup code so we still need to handle the CPU being in fucking 16-bit mode because suspend/resume is basically an extremely long reboot cycle

[2] Oh yeah also you probably have multiple cores on your CPU and well bad news about the state most of the cores are in when the OS boots because the firmware never started them up so they're going to come up in 16-bit real mode even if your boot CPU is already in 64-bit protected mode, unless you were using TXT in which case you have a different sort of nightmare that if we're going to try to map it onto real world nightmare concepts is one that involves a lot of teeth. Or, well, that used to be the case, but ACPI 6.4 (released in 2021) provides a mechanism for the OS to ask the firmware to wake the CPU up for it so this is invisible to the OS, but you're still relying on the firmware to actually do the heavy lifting here

comment count unavailable comments

Crosswords 0.3.8: Change Management

Posted by Jonathan Blandford on April 02, 2023 04:00 PM

It’s time for another Crosswords release. This is a somewhat quieter release on the surface as it doesn’t have as many user-visible changes. But like the last release, a lot happened under the hood in preparation for the next phase.

This release marks a change in focus. I’ve shifted my work to the editor instead of the game. I hadn’t given the editor much attention over the past year and it’s overdue for updates. I have a lot of features planned for it; it’s time to make progress on them.

Crosswords Editor

The first change I made was to give a revamp to the workflow for creating a new puzzle. The old editor would let you change the puzzle type while editing it — something that was technically neat but not actually useful to setters. I have different editing sections planned based on the puzzle type, which means that restricting a window to one type. For example, editing an acrostic grid is totally different from editing a crossword grid.

The new greeter also cleans up a  weird flow where the first tab would lock once you’d picked your initial values.

<figure aria-describedby="caption-attachment-7263" class="wp-caption aligncenter" id="attachment_7263" style="width: 840px">New greeter dialog for the Crossword Editor<figcaption class="wp-caption-text" id="caption-attachment-7263">New puzzle greeter</figcaption></figure>

To do this, I added a greeter that lets you select the type of puzzle right from the outset. We also took advantage of the fact that we added a separate GType for each puzzle type. It’s very loosely inspired by the new project greeter from GNOME Builder.

The second thing I spent time on wasn’t actually a code change, but a design plan. An implementation challenge I’ve had is balancing letting people use all the crazy features that the ipuz spec allows, and adding guardrails to let people write standard puzzles without thinking about those extra features. The problem with those features is that you can easily end up in a legal-but-weird puzzle file. As an example, imagine a crossword where the numbering of all the clues are out-of-order. That’s a legal .ipuz file and possibly valid in fringe circumstances, but rarely what any puzzle designer actually wants.

There is a new design doc for how to handle intermediate states. The details are complicated, but the overall approach involves adding lint() and fixup() functions to the puzzle types. This will let us make the changes we want, but then let the editor get the puzzle back to a reasonable state.

Many thanks to Federico, who very kindly let me call him on the weekends to talk through the issues and iterate to a proposal. I’ve started updating libipuz to implement this new design.

Crosswords: Adaptive Layout

This is the third release that I will have blogged about the adaptive layout, and it’s the first time I feel good about the results. It has been incredibly challenging to get Crosswords to work well at a variety of sizes. This cycle, I introduced the concept of a both a natural size and a minimum size to the game. This results in a mixture of user control and screen-driven sizing. Here is an example of a crossword scaling to the very small and to the very large.

<video class="wp-video-shortcode" controls="controls" height="463" id="video-7260-1" preload="metadata" width="840"><source src="https://blogs.gnome.org/jrb/files/2023/04/Screencast-from-2023-04-01-19-04-31.webm?_=1" type="video/webm">https://blogs.gnome.org/jrb/files/2023/04/Screencast-from-2023-04-01-19-04-31.webm</video>
<video class="wp-video-shortcode" controls="controls" height="463" id="video-7260-2" preload="metadata" width="840"><source src="https://blogs.gnome.org/jrb/files/2023/04/Screencast-from-2023-04-01-19-09-22.webm?_=2" type="video/webm">https://blogs.gnome.org/jrb/files/2023/04/Screencast-from-2023-04-01-19-09-22.webm</video>

I hope I can put this feature down for now!

Misc fixes and thanks

There were a number of other fixes this release. Most excitingly, we have a number of new contributors too! Crosswords is a potential GNOME GSOC program and some perspective students spent time trying to learn the code base. Here are the improvements and credits.

Puzzle Set tagging dialog

  • First, we now render the puzzle set description labels to look more like tags. Thanks to Pratham for working on this.
  • Thanks to Tanmay for fixing a bug where horizontal clue enumerations weren’t rendering correctly. He also contributed fixes to keyboard layout dialog, and started a promising thumbnailer (targeting next release)
  • Thanks to Philip for layout fixes, a new icon, and for having a preternatural ability to find bugs in my code as soon as I declare them “done.”
  • I fixed an issue where we didn’t lock down the puzzle correctly after winning the game. I’ve been trying to track this down for six months.
  • Thanks as always to the  translators for translations.

Until next time!

New gitlab.freedesktop.org spamfighting abilities

Posted by Peter Hutterer on March 29, 2023 07:31 AM

As of today, gitlab.freedesktop.org allows anyone with a GitLab Developer role or above to remove spam issues. If you are reading this article a while after it's published, it's best to refer to the damspam README for up-to-date details. I'm going to start with the TLDR first.

For Maintainers

Create a personal access token with API access and save the token value as $XDG_CONFIG_HOME/damspam/user.token Then run the following commands with your project's full path (e.g. mesa/mesa, pipewire/wireplumber, xorg/lib/libX11):

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ damspam request-webhook foo/bar
# clean up, no longer needed.
$ pip uninstall damspam
$ rm $XDG_CONFIG_HOME/damspam/user.token
The damspam command will file an issue in the freedesktop/fdo-bots repository. This issue will be automatically processed by a bot and should be done by the time you finish the above commands, see this issue for an example. Note: the issue processing requires a git push to an internal repo - if you script this for multiple repos please put a sleep(30) in to avoid conflicts.

Once the request has been processed (and again, this should be instant), any issue in your project that gets assigned the label Spam will be processed automatically by damspam. See the next section for details.

For Developers

Once the maintainer for your project has requested the webhook, simply assign the Spam label to any issue that is spam. The issue creator will be blocked (i.e. cannot login), this issue and any other issue filed by the same user will be closed and made confidential (i.e. they are no longer visible to the public). In the future, one of the GitLab admins can remove that user completely but meanwhile, they and their spam are gone from the public eye and they're blocked from producing more. This should happen within seconds of assigning the Spam label.

For GitLab Admins

Create a personal access token with API access for the @spambot user and save the token value as $XDG_CONFIG_HOME/damspam/spambot.token. This is so you can operate as spambot instead of your own user. Then run the following command to remove all tagged spammers:

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ damspam purge-spammers
The last command will list any users that are spammers (together with an issue that should make it simple to check whether it is indeed spam) and after interactive confirmation purge them as requested. At the time of writing, the output looks like this:
$ damspam purge-spammers
0: naughtyuser              : https://gitlab.freedesktop.org/somenamespace/project/-/issues/1234: [STREAMING@TV]!* LOOK AT ME
1: abcuseless               : https://gitlab.freedesktop.org/somenamespace/project/-/issues/4567: ((@))THIS STREAM IS IMPORTANT
2: anothergit               : https://gitlab.freedesktop.org/somenamespace/project/-/issues/8778: Buy something, really
3: whatawasteofalife        : https://gitlab.freedesktop.org/somenamespace/project/-/issues/9889: What a waste of oxygen I am
Purging a user means a full delete including all issues, MRs, etc. This is nonrecoverable!
Please select the users to purge:
[q]uit, purge [a]ll, or the index: 
Purging the spammers will hard-delete them and remove anything they ever did on gitlab. This is irreversible.

How it works

There are two components at play here: hookiedookie, a generic webhook dispatcher, and damspam which handles the actual spam issues. Hookiedookie provides an HTTP server and "does things" with JSON data on request. What it does is relatively generic (see the Settings.yaml example file) but it's set up to be triggered by a GitLab webhook and thus receives this payload. For damspam the rules we have for hookiedookie come down to something like this: if the URL is "webhooks/namespace/project" and damspam is set up for this project and the payload is an issue event and it has the "Spam" label in the issue labels, call out to damspam and pass the payload on. Other rules we currently use are automatic reload on push events or the rule to trigger the webhook request processing bot as above.

This is also the reason a maintainer has to request the webhook. When the request is processed, the spambot installs a webhook with a secret token (a uuid) in the project. That token will be sent as header (a standard GitLab feature). The project/token pair is also added to hookiedookie and any webhook data must contain the project name and matching token, otherwise it is discarded. Since the token is write-only, no-one (not even the maintainers of the project) can see it.

damspam gets the payload forwarded but is otherwise unaware of how it is invoked. It checks the issue, fetches the data needed, does some safety check and if it determines that yes, this is spam, then it closes the issue, makes it confidential, blocks the user and then recurses into every issue this user ever filed. Not necessarily in that order. There are some safety checks, so you don't have to worry about it suddenly blocking every project member.


For a while now, we've suffered from a deluge of spam (and worse) that makes it through the spam filters. GitLab has a Report Abuse feature for this but it's... woefully incomplete. The UI guides users to do the right thing - as reporter you can tick "the user is sending spam" and it automatically adds a link to the reported issue. But: none of this useful data is visible to admins. Seriously, look at the official screenshots. There is no link to the issue, all you get is a username, the user that reported it and the content of a textbox that almost never has any useful information. The link to the issue? Not there. The selection that the user is a spammer? Not there.

For an admin, this is frustrating at best. To verify that the user is indeed sending spam, you have to find the issue first. Which, at best, requires several clicks and digging through the profile activities. At worst you know that the user is a spammer because you trust the reporter but you just can't find the issue for whatever reason.

But even worse: reporting spam does nothing immediately. The spam stays up until an admin wakes up, reviews the abuse reports and removes that user. Meanwhile, the spammer can happily keep filing issues against the project. Overall, it is not a particularly great situation.

With hookiedookie and damspam, we're now better equipped to stand against the tide of spam. Anyone who can assign labels can help fight spam and the effect is immediate. And it's - for our use-cases - safe enough: if you trust someone to be a developer on your project, we can trust them to not willy-nilly remove issues pretending they're spam. In fact, they probably could've deleted issues beforehand already anyway if they wanted to make them disappear.

Other instances

While we're definitely aiming at gitlab.freedesktop.org, there's nothing in particular that requires this instance. If you're the admin for a public gitlab instance feel free to talk to Benjamin Tissoires or me to check whether this could be useful for you too, and what changes would be necessary.

We need better support for SSH host certificates

Posted by Matthew Garrett on March 24, 2023 05:12 PM
Github accidentally committed their SSH RSA private key to a repository, and now a bunch of people's infrastructure is broken because it needs to be updated to trust the new key. This is obviously bad, but what's frustrating is that there's no inherent need for it to be - almost all the technological components needed to both reduce the initial risk and to make the transition seamless already exist.

But first, let's talk about what actually happened here. You're probably used to the idea of TLS certificates from using browsers. Every website that supports TLS has an asymmetric pair of keys divided into a public key and a private key. When you contact the website, it gives you a certificate that contains the public key, and your browser then performs a series of cryptographic operations against it to (a) verify that the remote site possesses the private key (which prevents someone just copying the certificate to another system and pretending to be the legitimate site), and (b) generate an ephemeral encryption key that's used to actually encrypt the traffic between your browser and the site. But what stops an attacker from simply giving you a fake certificate that contains their public key? The certificate is itself signed by a certificate authority (CA), and your browser is configured to trust a preconfigured set of CAs. CAs will not give someone a signed certificate unless they prove they have legitimate ownership of the site in question, so (in theory) an attacker will never be able to obtain a fake certificate for a legitimate site.

This infrastructure is used for pretty much every protocol that can use TLS, including things like SMTP and IMAP. But SSH doesn't use TLS, and doesn't participate in any of this infrastructure. Instead, SSH tends to take a "Trust on First Use" (TOFU) model - the first time you ssh into a server, you receive a prompt asking you whether you trust its public key, and then you probably hit the "Yes" button and get on with your life. This works fine up until the point where the key changes, and SSH suddenly starts complaining that there's a mismatch and something awful could be happening (like someone intercepting your traffic and directing it to their own server with their own keys). Users are then supposed to verify whether this change is legitimate, and if so remove the old keys and add the new ones. This is tedious and risks users just saying "Yes" again, and if it happens too often an attacker can simply redirect target users to their own server and through sheer fatigue at dealing with this crap the user will probably trust the malicious server.

Why not certificates? OpenSSH actually does support certificates, but not in the way you might expect. There's a custom format that's significantly less complicated than the X509 certificate format used in TLS. Basically, an SSH certificate just contains a public key, a list of hostnames it's good for, and a signature from a CA. There's no pre-existing set of trusted CAs, so anyone could generate a certificate that claims it's valid for, say, github.com. This isn't really a problem, though, because right now nothing pays attention to SSH host certificates unless there's some manual configuration.

(It's actually possible to glue the general PKI infrastructure into SSH certificates. Please do not do this)

So let's look at what happened in the Github case. The first question is "How could the private key have been somewhere that could be committed to a repository in the first place?". I have no unique insight into what happened at Github, so this is conjecture, but I'm reasonably confident in it. Github deals with a large number of transactions per second. Github.com is not a single computer - it's a large number of machines. All of those need to have access to the same private key, because otherwise git would complain that the private key had changed whenever it connected to a machine with a different private key (the alternative would be to use a different IP address for every frontend server, but that would instead force users to repeatedly accept additional keys every time they connect to a new IP address). Something needs to be responsible for deploying that private key to new systems as they're brought up, which means there's ample opportunity for it to accidentally end up in the wrong place.

Now, best practices suggest that this should be avoided by simply placing the private key in a hardware module that performs the cryptographic operations, ensuring that nobody can ever get at the private key. The problem faced here is that HSMs typically aren't going to be fast enough to handle the number of requests per second that Github deals with. This can be avoided by using something like a Nitro Enclave, but you're still going to need a bunch of these in different geographic locales because otherwise your front ends are still going to be limited by the need to talk to an enclave on the other side of the planet, and now you're still having to deal with distributing the private key to a bunch of systems.

What if we could have the best of both worlds - the performance of private keys that just happily live on the servers, and the security of private keys that live in HSMs? Unsurprisingly, we can! The SSH private key could be deployed to every front end server, but every minute it could call out to an HSM-backed service and request a new SSH host certificate signed by a private key in the HSM. If clients are configured to trust the key that's signing the certificates, then it doesn't matter what the private key on the servers is - the client will see that there's a valid certificate and will trust the key, even if it changes. Restricting the validity of the certificate to a small window of time means that if a key is compromised an attacker can't do much with it - the moment you become aware of that you stop signing new certificates, and once all the existing ones expire the old private key becomes useless. You roll out a new private key with new certificates signed by the same CA and clients just carry on trusting it without any manual involvement.

Why don't we have this already? The main problem is that client tooling just doesn't handle this well. OpenSSH has no way to do TOFU for CAs, just the keys themselves. This means there's no way to do a git clone ssh://git@github.com/whatever and get a prompt asking you to trust Github's CA. Instead, you need to add a @cert-authority github.com (key) line to your known_hosts file by hand, and since approximately nobody's going to do that there's only marginal benefit in going to the effort to implement this infrastructure. The most important thing we can do to improve the security of the SSH ecosystem is to make it easier to use certificates, and that means improving the behaviour of the clients.

It should be noted that certificates aren't the only approach to handling key migration. OpenSSH supports a protocol for key rotation, basically by allowing the server to provide a set of multiple trusted keys that the client can cache, and then invalidating old ones. Unfortunately this still requires that the "new" private keys be deployed in the same way as the old ones, so any screwup that results in one private key being leaked may well also result in the additional keys being leaked. I prefer the certificate approach.

Finally, I've seen a couple of people imply that the blame here should be attached to whoever or whatever caused the private key to be committed to a repository in the first place. This is a terrible take. Humans will make mistakes, and your systems should be resilient against that. There's no individual at fault here - there's a series of design decisions that made it possible for a bad outcome to occur, and in a better universe they wouldn't have been necessary. Let's work on building that better universe.

comment count unavailable comments

WebKitGTK API for GTK 4 Is Now Stable

Posted by Michael Catanzaro on March 21, 2023 07:01 PM

With the release of WebKitGTK 2.40.0, WebKitGTK now finally provides a stable API and ABI for GTK 4 applications. The following API versions are provided:

  • webkit2gtk-4.0: this API version uses GTK 3 and libsoup 2. It is obsolete and users should immediately port to webkit2gtk-4.1. To get this with WebKitGTK 2.40, build with -DPORT=GTK -DUSE_SOUP2=ON.
  • webkit2gtk-4.1: this API version uses GTK 3 and libsoup 3. It contains no other changes from webkit2gtk-4.0 besides the libsoup version. With WebKitGTK 2.40, this is the default API version that you get when you build with -DPORT=GTK. (In 2.42, this might require a different flag, e.g. -DUSE_GTK3=ON, which does not exist yet.)
  • webkitgtk-6.0: this API version uses GTK 4 and libsoup 3. To get this with WebKitGTK 2.40, build with -DPORT=GTK -DUSE_GTK4=ON. (In 2.42, this might become the default API version.)

WebKitGTK 2.38 had a different GTK 4 API version, webkit2gtk-5.0. This was an unstable/development API version and it is gone in 2.40, so applications using it will break. Fortunately, that should be very few applications. If your operating system ships GNOME 42, or any older version, or the new GNOME 44, then no applications use webkit2gtk-5.0 and you have no extra work to do. But for operating systems that ship GNOME 43, webkit2gtk-5.0 is used by gnome-builder, gnome-initial-setup, and evolution-data-server:

  • For evolution-data-server 3.46, use this patch which applies on evolution-data-server 3.46.4.
  • For gnome-initial-setup 43, use this patch which applies on gnome-initial-setup 43.2. (Update: for your convenience, this patch will be included in gnome-initial-setup 43.3.)
  • For gnome-builder 43, all required changes are present in version 43.7.

Remember, patching is only needed for GNOME 43. Other versions of GNOME will have no problems with WebKitGTK 2.40.

There is no proper online documentation yet, but in the meantime you can view the markdown source for the migration guide to help you with porting your applications. Although the API is now stable and it is close to feature parity with the GTK 3 version, there are some problems to be aware of:

Big thanks to everyone who helped make this possible.

GTK 4.10 and LibreOffice accessibility

Posted by Caolán McNamara on March 21, 2023 04:58 PM


Towards GTK 4.10 some of the functionality to integrate LibreOffice's internal accessibility support with GTK has been exposed. Some experimenting this week with GTK trunk gives me the above, writer's document accessibility hierarchy integrated with the GTK one.

Very little actually works, but a working base to start from.

Explained: QGnomePlatform and Adwaita-qt

Posted by Jan Grulich on March 08, 2023 04:06 PM
<figure class="aligncenter size-large is-resized"></figure>

I decided to write this post to explain everything I can about these two projects. There have been discussions and people demanding these projects should not be used by default in Fedora. As part of this, some issues were raised and it might not be clear which component might be responsible for what. I ended up constantly defending these projects in many discussions and ended up being exhausted by doing so over and over, so take this as an explainer to shed some light.

Brief introduction to QGnomePlatform and Adwaita-qt

To give you some context before I go into details, you can think about Adwaita-qt as the UI representation and QGnomePlatform as the integration between GNOME and Qt. QGnomePlatform applies your GNOME configuration and behavior to Qt apps, together with some integration bits, like dialogs or client-side decorations. Adwaita-qt is responsible for the style of the app itself, including the style of all visible parts (widgets/buttons).


What is QGnomePlatform?

QGnomePlatform is a Qt Platform Theme (part of Qt Platform Abstraction API), where such a plugin is responsible for the app integration into the desktop environment. It is designed to provide integration between Qt apps and the GNOME platform. To explain in an example. Without any platform integration, Qt apps running on GNOME would use default styling and configuration so your fonts, icon theme, dialogs would not fit into the desktop. Also in the case of QGnomePlatform you would not have GNOME-like client-side decorations.

What QGnomePlatform provides?

QGnomePlatform provides the following integration for Qt apps running in GNOME:

  • Font configuration *
  • Icon theme *
  • Cursor size and cursor theme *
  • Static hints (like double-click time, long-press time etc.) *
  • Dialogs:
    • File dialog (both using GTK3 * and native dialog using xdg-desktop-portal)
    • Font dialog using GTK3 *
    • Color dialog using GTK3 *
  • Client-side decorations
  • Support for Settings portal from xdg-desktop-portal settings
    • Unlike the built-in GTK3 theme that can get everything only from GSettings
    • Brings support for light/dark theme switching introduced in GNOME 42
  • Use Adwaita-qt theme by default (elephant in the room) and Adwaita color palette
    • Also provides support for additional themes, like Kvantum
  • Support for Cinnamon desktop

* these can also be provided by built-in GTK3 platform theme in Qt itself (just for comparison what QGnomePlatform does extra)

Issues QGnomePlatform gets wrongly blamed for

Client-side decorations

As stated above, QGnomePlatform provides an implementation of CSD. It’s actually the only Qt CSD implementation I’m aware of, excluding the reference implementation provided by QtWayland, named Bradient. Below is a screenshot comparing QGnomePlatform (left) and Bradient (right).

<figure class="wp-block-image size-large"></figure>

I saw many times people complaining about missing shadows support and resizing issues. The truth is that officially there was no proper shadows support in the Qt API until I introduced it with Qt 6.2. That’s the reason we don’t have it for Qt5, unless you are a Fedora user, where this support has been backported and enabled in QGnomePlatform build. I also fixed all kinds of CSD related issues in QtWayland.ven though Qt has proper support for shadows in Qt6 now, the reference implementation doesn’t use them.

Misplaced popups/menus

This has nothing to do with Qt platform theme or CSD implementation, because it was actually a bug in QtWayland. Unfortunately this fix is only in Qt 6 and cannot be backported officially to Qt 5 as it would break KDE Plasma. I managed to at least patch QtWayland in Fedora and Flatpak KDE runtime, where I modified this patch to not affect KDE Plasma at all.

What can QGnomePlatform be blamed for?

Forcing Adwaita-qt color palette

QGnomePlatform sets Adwaita-qt color palette to each Qt app so applications can use QPalette API to get access to colors used in the style itself. This can be for example useful when an app creates custom widgets that would not get styled by the QStyle itself. This creates a problem for KDE applications using the KColorScheme API. 

Examples of this issue:

<figure class="alignleft size-medium"></figure>
<figure class="wp-block-image size-medium"></figure>
<figure class="alignleft size-medium"></figure>

The reason is that KColorScheme and QPalette are out of sync and there might be color roles that are in KColorScheme, but not in QPalette. If an app requests a color from KColorScheme that’s not in QPalette, KColorScheme will default to Breeze style and a color that’s not going to fit the Adwaita-qt style will be provided, causing the app to mix light and dark colors.

Luckily, we have identified a workaround that can be done in QGnomePlatform to avoid this issue. Here is the QGnomePlatform bug with more details.

I think this is the most visible and user-facing issue we currently have and get blamed for so I would like to fix this as soon as possible.


Adwaita-qt is a Qt style for widgets. Qt style is again part of Qt Platform Abstraction. You can think of it as a theme for your application. It’s what changes the visualization of your widgets like buttons, checkboxes etc. and it’s the only thing that changes the appearance of the application itself. For comparison, the screenshot below is Adwaita-qt (light) and the second one is the Qt’s default Fusion style used on Linux used without any QGnomePlatform influence so basically what you would get by default.

<figure class="wp-block-image size-large"></figure> <figure class="wp-block-image size-large"></figure>

I can also add that Adwaita-qt supports HighContrast variants, which are useful for visually impaired people.

<figure class="wp-block-image size-large"></figure>

What issues can Adwaita-qt be blamed for?

I already mentioned the color mismatch issue which is not really Adwaita-qt’s fault. There are of course issues in Adwaita-qt itself and it’s far from being perfect. The whole style needs a complete overhaul, because the last one was done in 2019 and the Adwaita theme changed a lot recently with GTK4. Another issue is that while the majority of common widgets are styled just fine, there are still some widgets that are rarely used and might have issues with this style.

Another issue is that  apps that customize standard widgets (e.g. through CSS), might get into trouble. Below is a screenshot of Wireshark (pure Qt app) compared to Breeze in KDE.

<figure class="wp-block-image size-large"></figure>

Another example can be seen in the Black Chocobo app, which uses some customization:

<figure class="wp-block-image size-large"><figcaption class="wp-element-caption"> 
Below is Black Chocobo using Fusion style.
</figcaption></figure> <figure class="wp-block-image size-large"></figure>

Should QGnomePlatform get removed/replaced?

Definitely not. It has many benefits and extras compared to Qt’s default platform and most importantly gives you CSD support. Once I fix the color mismatch issue, there shouldn’t be anything users should complain about. Obviously, this would not be an issue when Breeze is used instead of Adwaita-qt, but still an issue when Fusion is used so it would still need to be addressed. You can see a screenshot below showing Fusion style used in combination with GNOME set to dark theme:

<figure class="wp-block-image size-full"></figure>

Should Adwaita-qt get removed/replaced?

Maybe. It depends on the alternatives. Obviously using Breeze would get rid of all the widget issues one might experience with Adwaita-qt, however, it is problematic to ship it by default due to bringing dependencies on KDE Frameworks and Plasma breeze style. With the default Fusion style you will also get many widget issues fixed, but you still need to set the color palette through QGnomePlatform in case you want Fusion to be “dark” and fit into the desktop, otherwise you will always end up using the default “light” variant no matter what configuration you set in GNOME.


I hope that this post is useful for those observing issues with Qt apps under GNOME, and will help them to understand which component is responsible for what, as well as the issues involved. In case you are interested and would like to either contribute a patch or report an issue, here are links to QGnomePlatform and Adwaita-qt repositories.

Blocking free API access to Twitter doesn't stop abuse

Posted by Matthew Garrett on February 02, 2023 10:21 AM
In one week from now, Twitter will block free API access. This prevents anyone who has written interesting bot accounts, integrations, or tooling from accessing Twitter without paying for it. A whole number of fascinating accounts will cease functioning, people will no longer be able to use tools that interact with Twitter, and anyone using a free service to do things like find Twitter mutuals who have moved to Mastodon or to cross-post between Twitter and other services will be blocked.

There's a cynical interpretation to this, which is that despite firing 75% of the workforce Twitter is still not profitable and Elon is desperate to not have Twitter go bust and also not to have to tank even more of his Tesla stock to achieve that. But let's go with the less cynical interpretation, which is that API access to Twitter is something that enables bot accounts that make things worse for everyone. Except, well, why would a hostile bot account do that?

To interact with an API you generally need to present some sort of authentication token to the API to prove that you're allowed to access it. It's easy enough to restrict issuance of those tokens to people who pay for the service. But, uh, how do the apps work? They need to be able to communicate with the service to tell it to post tweets, retrieve them, and so on. And the simple answer to that is that they use some hardcoded authentication tokens. And while registering for an API token yourself identifies that you're not using an official client, using the tokens embedded in the clients makes it look like you are. If you want to make it look like you're a human, you're already using tokens ripped out of the official clients.

The Twitter client API keys are widely known. Anyone who's pretending to be a human is using those already and will be unaffected by the shutdown of the free API tier. Services like movetodon.org do get blocked. This isn't an anti-abuse choice. It's one that makes it harder to move to other services. It's one that blocks a bunch of the integrations and accounts that bring value to the platform. It's one that hurts people who follow the rules, without hurting the ones who don't. This isn't an anti-abuse choice, it's about trying to consolidate control of the platform.

comment count unavailable comments

Firefox, VA-API and NVIDIA on Fedora 37

Posted by Martin Stransky on January 31, 2023 08:19 PM
<figure class="wp-block-image size-large"><figcaption class="wp-element-caption">Image comes from https://www.nvidia.com/en-us/about-nvidia/legal-info/logo-brand-usage/</figcaption></figure>

Some time ago I got borrowed NVIDIA GeForce GTX 1070 from my employer (Red Hat) and I finally managed to put it to a workstation instead of my own AMD RX 6600 XT.

I installed proprietary drivers from rpmfusion and to my surprise everything worked smoothly (except Atom on XWayland). Both Wayland and X11 Gnome sessions popped up, Firefox picked up HW accelerated backend (WebRender) with DMABuf support so it’s time to check VA-API.

Thanks to nvidia-vaapi-driver by Stephen “elFarto” Firefox may directly decode video on NVIDIA hardware. The driver translates VA-API calls from Firefox to VPDAU used by NVIDIA. I think you also need a decently fresh NVIDIA drivers which supports DMABuf (which is used to transfer decoded images between Firefox processes and render them as GL textures).

I hit three bumps on the road. First one is Firefox RDD sandbox. Firefox runs media decode in extra process (RDD) which restricts where decoder can access. It was adjusted by Mozilla folks for VA-API decode on Intel/AMD but NVIDIA needs some extra tweaks. Right now you need to disable the sanbox by MOZ_DISABLE_RDD_SANDBOX=1 env variable.

Next one is a bug in recent NVIDIA 525 driver series (which I got from rpmfusion) and I needed to use direct mode (whatever it is).

Last issue may be in Firefox itself. Broken graphics hardware may freeze whole browser on start or spread coredumps on every start. That’s being worked on as Bug 1813500 and Bug 1787182.

There’s a complete how-to for Firefox/NVIDIA/Fedora 37 available on Fedora wiki.

Nvidia-vaapi-driver playback performance is similar to what I see on AMD/Intel. It also correctly handles decoding of intermediate frames which is recent AMD NAVI2 bug (Bug 1772028, Bug 1802844) so the playback is smooth and I haven’t seen any glitches or CPU usage peaks.

There are few options for NVIDIA users how to run it.

If you have a workstation (or laptop) with one NVIDIA graphics card, it’s quite simple. On Fedora 37 you boot on noveau drivers and then install NVIDIA drivers from rpmfusion. With the proprietary drivers both Wayland and X11 Gnome sessions work fine (anyone to test KDE?) and hardware acceleration is enabled.

A bit different scenario comes with integrated Intel device and secondary NVIDIA one. I don’t see any reason why to use NVIDIA as Intel works pretty well with Wayland, VA-API and X11/EGL. But if you really want to set NVIDIA as primary, X11 may be better for you. Wayland on secondary NVIDIA GPU is supported by Sway Wayland compositor only which is a bit geeky (or I’m just too lazy to learn new shortcuts and get used to a new environment).

Anyway, if you need to use NVIDIA as your primary GPU, there’s a hope for you (and it’s not due to me). Give it a try and report eventual bugs at Mozilla NVIDIA VA-API bug tracker.

Further adventures in Apple PKCS#11 land

Posted by Matthew Garrett on January 27, 2023 11:39 PM
After my previous efforts, I wrote up a PKCS#11 module of my own that had no odd restrictions about using non-RSA keys and I tested it. And things looked much better - ssh successfully obtained the key, negotiated with the server to determine that it was present in authorized_keys, and then went to actually do the key verification step. At which point things went wrong - the Sign() method in my PKCS#11 module was never called, and a strange
debug1: identity_sign: sshkey_sign: error in libcrypto
sign_and_send_pubkey: signing failed for ECDSA "testkey": error in libcrypto"

error appeared in the ssh output. Odd. libcrypto was originally part of OpenSSL, but Apple ship the LibreSSL fork. Apple don't include the LibreSSL source in their public source repo, but do include OpenSSH. I grabbed the OpenSSH source and jumped through a whole bunch of hoops to make it build (it uses the macosx.internal SDK, which isn't publicly available, so I had to cobble together a bunch of headers from various places), and also installed upstream LibreSSL with a version number matching what Apple shipped. And everything worked - I logged into the server using a hardware-backed key.

Was the difference in OpenSSH or in LibreSSL? Telling my OpenSSH to use the system libcrypto resulted in the same failure, so it seemed pretty clear this was an issue with the Apple version of the library. The way all this works is that when OpenSSH has a challenge to sign, it calls ECDSA_do_sign(). This then calls ECDSA_do_sign_ex(), which in turn follows a function pointer to the actual signature method. By default this is a software implementation that expects to have the private key available, but you can also register your own callback that will be used instead. The OpenSSH PKCS#11 code does this by calling EC_KEY_set_method(), and as a result calling ECDSA_do_sign() ends up calling back into the PKCS#11 code that then calls into the module that communicates with the hardware and everything works.

Except it doesn't under macOS. Running under a debugger and setting a breakpoint on EC_do_sign(), I saw that we went down a code path with a function called ECDSA_do_sign_new(). This doesn't appear in any of the public source code, so seems to be an Apple-specific patch. I pushed Apple's libcrypto into Ghidra and looked at ECDSA_do_sign() and found something that approximates this:
nid = EC_GROUP_get_curve_name(curve);
if (nid == NID_X9_62_prime256v1) {
  return ECDSA_do_sign_new(dgst,dgst_len,eckey);
return ECDSA_do_sign_ex(dgst,dgst_len,NULL,NULL,eckey);
What this means is that if you ask ECDSA_do_sign() to sign something on a Mac, and if the key in question corresponds to the NIST P256 elliptic curve type, it goes down the ECDSA_do_sign_new() path and never calls the registered callback. This is the only key type supported by the Apple Secure Enclave, so I assume it's special-cased to do something with that. Unfortunately the consequence is that it's impossible to use a PKCS#11 module that uses Secure Enclave keys with the shipped version of OpenSSH under macOS. For now I'm working around this with an SSH agent built using Go's agent module, forwarding most requests through to the default session agent but appending hardware-backed keys and implementing signing with them, which is probably what I should have done in the first place.

comment count unavailable comments

Build security with the assumption it will be used against your friends

Posted by Matthew Garrett on January 23, 2023 10:44 AM
Working in information security means building controls, developing technologies that ensure that sensitive material can only be accessed by people that you trust. It also means categorising people into "trustworthy" and "untrustworthy", and trying to come up with a reasonable way to apply that such that people can do their jobs without all your secrets being available to just anyone in the company who wants to sell them to a competitor. It means ensuring that accounts who you consider to be threats shouldn't be able to do any damage, because if someone compromises an internal account you need to be able to shut them down quickly.

And like pretty much any security control, this can be used for both good and bad. The technologies you develop to monitor users to identify compromised accounts can also be used to compromise legitimate users who management don't like. The infrastructure you build to push updates to users can also be used to push browser extensions that interfere with labour organisation efforts. In many cases there's no technical barrier between something you've developed to flag compromised accounts and the same technology being used to flag users who are unhappy with certain aspects of management.

If you're asked to build technology that lets you make this sort of decision, think about whether that's what you want to be doing. Think about who can compel you to use it in ways other than how it was intended. Consider whether that's something you want on your conscience. And then think about whether you can meet those requirements in a different way. If they can simply compel one junior engineer to alter configuration, that's very different to an implementation that requires sign-offs from multiple senior developers. Make sure that all such policy changes have to be clearly documented, including not just who signed off on it but who asked them to. Build infrastructure that creates a record of who decided to fuck over your coworkers, rather than just blaming whoever committed the config update. The blame trail should never terminate in the person who was told to do something or get fired - the blame trail should clearly indicate who ordered them to do that.

But most importantly: build security features as if they'll be used against you.

comment count unavailable comments

Crosswords 0.3.7: Adaptive Layout, Animations, and Arrows

Posted by Jonathan Blandford on January 23, 2023 07:32 AM

Howdy folks,

It’s GNOME Crosswords release time again! This is a big release with a lot of changes. Most importantly, I was able to find some time over the holidays to do some long-overdue refactoring of the core game code.

crosswords: 143 files changed, 12619 insertions(+), 5834 deletions(-)
libipuz:     41 files changed,  2345 insertions(+),  736 deletions(-)

A number of early design decisions haven’t panned out, and these decisions  held us back from doing some very basic things with the code. While not directly impactful on the user, this refactoring unlocked some long-overdue features. In addition, loading puzzles and puzzle sets are noticeably faster now.

In this post, I’m going to cover the mobile work, Arrowwords, the hackfest, and the new preferences dialog. Let’s take a look at what’s new:

Adaptive Layout and Animations

In the previous release, I tried to make Crosswords adapt to different screen sizes. It wasn’t all that usable, so I worked more on it this cycle.

First, the good news: I cleaned up a lot of the layout bugs, and (thanks to Carlos) got some form of touch screen keyboard input working. The end result is that I changed the appinfo file to indicate we supported all screen sizes and inputs, which means it should be available on all platforms.

I also had to put all the parts of the game into a custom container widget to get the spacing right. Once that was done, it was trivial to add some animations to it. It turns out AdwAnimation is really cool — and much easier to use then I had expected. I added animations to keep the puzzle centered and to align it vertically as appropriate. Here’s how resizing  now looks:

<video class="wp-video-shortcode" controls="controls" height="529" id="video-7193-3" preload="metadata" width="840"><source src="https://blogs.gnome.org/jrb/files/2023/01/0.3.7-animation-resize.webm?_=3" type="video/webm">https://blogs.gnome.org/jrb/files/2023/01/0.3.7-animation-resize.webm</video>

I’m pretty impressed with how easy it was to do things like this with GTK4.

Unfortunately, despite these changes, it’s not all good news. I don’t think the final results are compelling enough for mobile devices. The touchscreen input is flaky, and the onscreen keyboard isn’t great for crossword puzzles (and is so big!) I never really figured out how input was supposed to work together, and ended up just guessing until it mostly worked. I could really use better documentation on the onscreen keyboard! Or if someone explains it to me, I’m happy to write documentation for future users.

As I don’t actually have a mobile device that runs GNOME and I don’t have a touch screen, I’ve reached the limits of what I can do without additional help. I’d love feedback or advice from anyone who knows more about this space.


As part of the refactoring focus, I added a bunch of new puzzle types to libipuz (more on this in a future post!) One of these puzzle types was Arrowword puzzles — a crossword style more popular in northern Europe. This had been a semi-regular request and turned out to be easier to implement than expected. We were able to get something going relatively quickly.

I wrote the cell code, Federico helped with the svg overlay arrow rendering, and Philip helped with the testing and downloader. The end result is attractive:

<figure aria-describedby="caption-attachment-7233" class="wp-caption aligncenter" id="attachment_7233" style="width: 300px"><figcaption class="wp-caption-text" id="caption-attachment-7233">Arrowword Crossword</figcaption></figure>

Puzzle Sets

Due to the big holiday refactor, I was able to add a few small features I’ve wanted to add for a while. First, I changed Puzzle Sets to be opt-in. We’ve ended up accruing so many of them that it makes sense for people to choose the ones they want to play.

Along the way, I added extensions  to the puzzle sets to tag them so users can see more information before trying them. Here’s the dialog to select them.

<figure aria-describedby="caption-attachment-7227" class="wp-caption aligncenter" id="attachment_7227" style="width: 296px"><figcaption class="wp-caption-text" id="caption-attachment-7227">Puzzle Set selection dialog</figcaption></figure>

Additionally, I added an option to let you hide puzzles that have already been solved. This was another requested feature as the list of puzzles can get quite long.

Finally we show how many puzzles remain unsolved in the main screen. This should have been a simple feature to add, but the old code base made it surprisingly hard to implement. With the rewrite, this was only a few lines of code:

<figure aria-describedby="caption-attachment-7215" class="wp-caption aligncenter" id="attachment_7215" style="width: 234px">New Main screeen<figcaption class="wp-caption-text" id="caption-attachment-7215">The new main screen</figcaption></figure>

You can also see an “Add More Puzzles” button in that screenshot.


We held a virtual hackfest a few weekends ago! Four people showed up, and we focused on getting this release out the door. We got most of the last release blockers done during it. Davide added Italian puzzle sets and translations, and fixed the packit CI. Rosanna finished the GNOME Mini Crosswords set and started a new, alphabetically-themed one. Federico documented how to run crosswords out of podman and got the Arrowword SVG overlay themed and working. I landed the final fixes for the tagging, and landed a bunch of build fixes to the puzzle sets.

It was a lot of fun. I will try to hold another one at the end of the next release cycle.

<figure aria-describedby="caption-attachment-7224" class="wp-caption aligncenter" id="attachment_7224" style="width: 213px"><figcaption class="wp-caption-text" id="caption-attachment-7224">GNOME Mini Crosswords</figcaption></figure>

What’s next?

  • We’ve wanted to add autodownload-on-start for a while, and it should be possible to do that now.
  • I don’t love the current preferences dialog. It could use some design attention and tweaks.
  • And finally, I plan to switch back to work more on the Crossword Editor. It’s overdue for some love.

Until next time!

PKCS#11. hardware keystores, and Apple frustrations

Posted by Matthew Garrett on January 18, 2023 05:26 AM
There's a bunch of ways you can store cryptographic keys. The most obvious is to just stick them on disk, but that has the downside that anyone with access to the system could just steal them and do whatever they wanted with them. At the far end of the scale you have Hardware Security Modules (HSMs), hardware devices that are specially designed to self destruct if you try to take them apart and extract the keys, and which will generate an audit trail of every key operation. In between you have things like smartcards, TPMs, Yubikeys, and other platform secure enclaves - devices that don't allow arbitrary access to keys, but which don't offer the same level of assurance as an actual HSM (and are, as a result, orders of magnitude cheaper).

The problem with all of these hardware approaches is that they have entirely different communication mechanisms. The industry realised this wasn't ideal, and in 1994 RSA released version 1 of the PKCS#11 specification. This defines a C interface with a single entry point - C_GetFunctionList. Applications call this and are given a structure containing function pointers, with each entry corresponding to a PKCS#11 function. The application can then simply call the appropriate function pointer to trigger the desired functionality, such as "Tell me how many keys you have" and "Sign this, please". This is both an example of C not just being a programming language and also of you having to shove a bunch of vendor-supplied code into your security critical tooling, but what could possibly go wrong.

(Linux distros work around this problem by using p11-kit, which is a daemon that speaks d-bus and loads PKCS#11 modules for you. You can either speak to it directly over d-bus, or for apps that only speak PKCS#11 you can load a module that just transports the PKCS#11 commands over d-bus. This moves the weird vendor C code out of process, and also means you can deal with these modules without having to speak the C ABI, so everyone wins)

One of my work tasks at the moment is helping secure SSH keys, ensuring that they're only issued to appropriate machines and can't be stolen afterwards. For Windows and Linux machines we can stick them in the TPM, but Macs don't have a TPM as such. Instead, there's the Secure Enclave - part of the T2 security chip on x86 Macs, and directly integrated into the M-series SoCs. It doesn't have anywhere near as many features as a TPM, let alone an HSM, but it can generate NIST curve elliptic curve keys and sign things with them and that's good enough. Things are made more complicated by Apple only allowing keys to be used by the app that generated them, so it's hard for applications to generate keys on behalf of each other. This can be mitigated by using CryptoTokenKit, an interface that allows apps to present tokens to the systemwide keychain. Although this is intended for allowing a generic interface for access to such tokens (kind of like PKCS#11), an app can generate its own keys in the Secure Enclave and then expose them to other apps via the keychain through CryptoTokenKit.

Of course, applications then need to know how to communicate with the keychain. Browsers mostly do so, and Apple's version of SSH can to an extent. Unfortunately, that extent is "Retrieve passwords to unlock on-disk keys", which doesn't help in our case. PKCS#11 comes to the rescue here! Apple ship a module called ssh-keychain.dylib, a PKCS#11 module that's intended to allow SSH to use keys that are present in the system keychain. Unfortunately it's not super well maintained - it got broken when Big Sur moved all the system libraries into a cache, but got fixed up a few releases later. Unfortunately every time I tested it with our CryptoTokenKit provider (and also when I retried with SecureEnclaveToken to make sure it wasn't just our code being broken), ssh would tell me "provider /usr/lib/ssh-keychain.dylib returned no slots" which is not especially helpful. Finally I realised that it was actually generating more debug output, but it was being sent to the system debug logs rather than the ssh debug output. Well, when I say "more debug output", I mean "Certificate [<private>]: algorithm is not supported, ignoring it"</private>, which still doesn't tell me all that much. So I stuck it in Ghidra and searched for that string, and the line above it was

iVar2 = __auth_stubs::_objc_msgSend(uVar7,"isEqual:",*(undefined8*)__got::_kSecAttrKeyTypeRSA);

with it immediately failing if the key isn't RSA. Which it isn't, since the Secure Enclave doesn't support RSA. Apple's PKCS#11 module appears incapable of making use of keys generated on Apple's hardware.

There's a couple of ways of dealing with this. The first, which is taken by projects like Secretive, is to implement the SSH agent protocol and have SSH delegate key management to that agent, which can then speak to the keychain. But if you want this to work in all cases you need to implement all the functionality in the existing ssh-agent, and that seems like a bunch of work. The second is to implement a PKCS#11 module, which sounds like less work but probably more mental anguish. I'll figure that out tomorrow.

comment count unavailable comments

X servers no longer allow byte-swapped clients (by default)

Posted by Peter Hutterer on January 17, 2023 10:20 PM

In the beginning, there was the egg. Then fictional people started eating that from different ends, and the terms of "little endians" and "Big Endians" was born.

Computer architectures (mostly) come with one of either byte order: MSB first or LSB first. The two are incompatible of course, and many a bug was introduced trying to convert between the two (or, more common: failing to do so). The two byte orders were termed Big Endian and little endian, because that hilarious naming scheme at least gives us something to laugh about while contemplating throwing it all away and considering a future as, I don't know, a strawberry plant.

Back in the mullet-infested 80s when the X11 protocol was designed both little endian and big endian were common enough. And back then running the X server on a different host than the client was common too - the X terminals back then had less processing power than a smart toilet seat today so the cpu-intensive clients were running on some mainfraime. To avoid overtaxing the poor mainframe already running dozens of clients for multiple users, the job of converting between the two byte orders was punted to the X server. So to this day whenever a client connects, the first byte it sends is a literal "l" or "B" to inform the server of the client's byte order. Where the byte order doesn't match the X server's byte order, the client is a "swapped client" in X server terminology and all 16, 32, and 64-bit values must be "byte-swapped" into the server's byte order. All of those values in all requests, and then again back to the client's byte order in all outgoing replies and events. Forever, till a crash do them part.

If you get one of those wrong, the number is no longer correct. And it's properly wrong too, the difference between 0x1 and 0x01000000 is rather significant. [0] Which has the hilarious side-effect of... well, pretty much anything. But usually it ranges from crashing the server (thus taking all other clients down in commiseration) to leaking random memory locations. The list of security issues affecting the various SProcFoo implementations (X server naming scheme for Swapped Procedure for request Foo) is so long that I'm too lazy to pull out the various security advisories and link to them. Just believe me, ok? *jedi handwave*

These days, encountering a Big Endian host is increasingly niche, letting it run an X client that connects to your local little-endian X server is even more niche [1]. I think the only regular real-world use-case for this is running X clients on an s390x, connecting to your local intel-ish (and thus little endian) workstation. Not something most users do on a regular basis. So right now, the byte-swapping code is mainly a free attack surface that 99% of users never actually use for anything real. So... let's not do that?

I just merged a PR into the X server repo that prohibits byte-swapped clients by default. A Big Endian client connecting to an X server will fail the connection with an error message of "Prohibited client endianess, see the Xserver man page". [2] Thus, a whole class of future security issues avoided - yay!

For the use-cases where you do need to let Big Endian clients connect to your little endian X server, you have two options: start your X server (Xorg, Xwayland, Xnest, ...) with the +byteswappedclients commandline option. Alternatively, and this only applies for Xorg: add Option "AllowByteSwappedClients" "on" to the xorg.conf ServerFlags section. Both of these will change the default back to the original setting. Both are documented in the Xserver(1) and xorg.conf(5) man pages, respectively.

Now, there's a drawback: in the Wayland stack, the compositor is in charge of starting Xwayland which means the compositor needs to expose a way of passing +byteswappedclients to Xwayland. This is compositor-specific, bugs are filed for mutter (merged for GNOME 44), kwin and wlroots. Until those are addressed, you cannot easily change this default (short of changing /usr/bin/Xwayland into a wrapper script that passes the option through).

There's no specific plan yet which X releases this will end up in, primarily because the release cycle for X is...undefined. Probably xserver-23.0 if and when that happens. It'll probably find its way into the xwayland-23.0 release, if and when that happens. Meanwhile, distributions interested in this particular change should consider backporting it to their X server version. This has been accepted as a Fedora 38 change.

[0] Also, it doesn't help that much of the X server's protocol handling code was written with the attitude of "surely the client wouldn't lie about that length value"
[1] little-endian client to Big Endian X server is so rare that it's barely worth talking about. But suffice to say, the exact same applies, just with little and big swapped around.
[2] That message is unceremoniously dumped to stderr, but that bit is unfortunately a libxcb issue.

libinput and the custom pointer acceleration function

Posted by Peter Hutterer on January 17, 2023 04:47 AM

After 8 months of work by Yinon Burgansky, libinput now has a new pointer acceleration profile: the "custom" profile. This profile allows users to tweak the exact response of their device based on their input speed.

A short primer: the pointer acceleration profile is a function that multiplies the incoming deltas with a given factor F, so that your input delta (x, y) becomes (Fx, Fy). How this is done is specific to the profile, libinput's existing profiles had either a flat factor or an adaptive factor that roughly resembles what Xorg used to have, see the libinput documentation for the details. The adaptive curve however has a fixed behaviour, all a user could do was scale the curve up/down, but not actually adjust the curve.

Input speed to output speed

The new custom filter allows exactly that: it allows a user to configure a completely custom ratio between input speed and output speed. That ratio will then influence the current delta. There is a whole new API to do this but simplified: the profile is defined via a series of points of (x, f(x)) that are linearly interpolated. Each point is defined as input speed in device units/ms to output speed in device units/ms. For example, to provide a flat acceleration equivalent, specify [(0.0, 0.0), (1.0, 1.0)]. With the linear interpolation this is of course a 45-degree function, and any incoming speed will result in the equivalent output speed.

Noteworthy: we are talking about the speed here, not any individual delta. This is not exactly the same as the flat acceleration profile (which merely multiplies the deltas by a constant factor) - it does take the speed of the device into account, i.e. device units moved per ms. For most use-cases this is the same but for particularly slow motion, the speed may be calculated across multiple deltas (e.g. "user moved 1 unit over 21ms"). This avoids some jumpyness at low speeds.

But because the curve is speed-based, it allows for some interesting features too: the curve [(0.0, 1.0), (1.0, 1.0)] is a horizontal function at 1.0. Which means that any input speed results in an output speed of 1 unit/ms. So regardless how fast the user moves the mouse, the output speed is always constant. I'm not immediately sure of a real-world use case for this particular case (some accessibility needs maybe) but I'm sure it's a good prank to play on someone.

Because libinput is written in C, the API is not necessarily immediately obvious but: to configure you pass an array of (what will be) y-values and set the step-size. The curve then becomes: [(0 * step-size, array[0]), (1 * step-size, array[1]), (2 * step-size, array[2]), ...]. There are some limitations on the number of points but they're high enough that they should not matter.

Note that any curve is still device-resolution dependent, so the same curve will not behave the same on two devices with different resolution (DPI). And since the curves uploaded by the user are hand-polished, the speed setting has no effect - we cannot possibly know how a custom curve is supposed to scale. The setting will simply update with the provided value and return that but the behaviour of the device won't change in response.

Motion types

Finally, there's another feature in this PR - the so-called "movement type" which must be set when defining a curve. Right now, we have two types, "fallback" and "motion". The "motion" type applies to, you guessed it, pointer motion. The only other type available is fallback which applies to everything but pointer motion. The idea here is of course that we can apply custom acceleration curves for various different device behaviours - in the future this could be scrolling, gesture motion, etc. And since those will have a different requirements, they can be configure separately.

How to use this?

As usual, the availability of this feature depends on your Wayland compositor and how this is exposed. For the Xorg + xf86-input-libinput case however, the merge request adds a few properties so that you can play with this using the xinput tool:

  # Set the flat-equivalent function described above
  $ xinput set-prop "devname" "libinput Accel Custom Motion Points" 0.0 1.0
  # Set the step, i.e. the above points are on 0 u/ms, 1 u/ms, ...
  # Can be skipped, 1.0 is the default anyway
  $ xinput set-prop "devname" "libinput Accel Custom Motion Points" 1.0 
  # Now enable the custom profile
  $ xinput set-prop "devname" "libinput Accel Profile Enabled" 0 0 1
The above sets a custom pointer accel for the "motion" type. Setting it for fallback is left as an exercise to the reader (though right now, I think the fallback curve is pretty much only used if there is no motion curve defined).

Happy playing around (and no longer filing bug reports if you don't like the default pointer acceleration ;)


This custom profile will be available in libinput 1.23 and xf86-input-libinput-1.3.0. No release dates have been set yet for either of those.

Blogging and microblogging

Posted by Matthew Garrett on January 15, 2023 10:40 PM
Long-term Linux users may remember that Alan Cox used to write an online diary. This was before the concept of a "Weblog" had really become a thing, and there certainly weren't any expectations around what one was used for - while now blogging tends to imply a reasonably long-form piece on a specific topic, Alan was just sitting there noting small life concerns or particular technical details in interesting problems he'd solved that day. For me, that was fascinating. I was trying to figure out how to get into kernel development, and was trying to read as much LKML as I could to figure out how kernel developers did stuff. But when you see discussion on LKML, you're frequently missing the early stages. If an LKML patch is a picture of an owl, I wanted to know how to draw the owl, and most of the conversations about starting in kernel development were very "Draw two circles. Now draw the rest of the owl". Alan's musings gave me insight into the thought processes involved in getting from "Here's the bug" to "Here's the patch" in ways that really wouldn't have worked in a more long-form medium.

For the past decade or so, as I moved away from just doing kernel development and focused more on security work instead, Twitter's filled a similar role for me. I've seen people just dumping their thought process as they work through a problem, helping me come up with effective models for solving similar problems. I've learned that the smartest people in the field will spend hours (if not days) working on an issue before realising that they misread something back at the beginning and that's helped me feel like I'm not unusually bad at any of this. It's helped me learn more about my peers, about my field, and about myself.

Twitter's now under new ownership that appears to think all the worst bits of Twitter were actually the good bits, so I've mostly bailed to the Fediverse instead. There's no intrinsic length limit on posts there - Mastodon defaults to 500 characters per post, but that's configurable per instance. But even at 500 characters, it means there's more room to provide thoughtful context than there is on Twitter, and what I've seen so far is more detailed conversation and higher levels of meaningful engagement. Which is great! Except it also seems to discourage some of the posting style that I found so valuable on Twitter - if your timeline is full of nuanced discourse, it feels kind of rude to just scream "THIS FUCKING PIECE OF SHIT IGNORES THE HIGH ADDRESS BIT ON EVERY OTHER WRITE" even though that's exactly the sort of content I'm there for.

And, yeah, not everything has to be for me. But I worry that as Twitter's relevance fades for the people I'm most interested in, we're replacing it with something that's not equivalent - something that doesn't encourage just dropping 50 characters or so of your current thought process into a space where it can be seen by thousands of people. And I think that's a shame.

comment count unavailable comments

Integrating Linux with Okta Device Trust

Posted by Matthew Garrett on January 10, 2023 05:48 AM
I've written about bearer tokens and how much pain they cause me before, but sadly wishing for a better world doesn't make it happen so I'm making do with what's available. Okta has a feature called Device Trust which allows to you configure access control policies that prevent people obtaining tokens unless they're using a trusted device. This doesn't actually bind the tokens to the hardware in any way, so if a device is compromised or if a user is untrustworthy this doesn't prevent the token ending up on an unmonitored system with no security policies. But it's an incremental improvement, other than the fact that for desktop it's only supported on Windows and MacOS, which really doesn't line up well with my interests.

Obviously there's nothing fundamentally magic about these platforms, so it seemed fairly likely that it would be possible to make this work elsewhere. I spent a while staring at the implementation using Charles Proxy and the Chrome developer tools network tab and had worked out a lot, and then Okta published a paper describing a lot of what I'd just laboriously figured out. But it did also help clear up some points of confusion and clarified some design choices. I'm not going to give a full description of the details (with luck there'll be code shared for that before too long), but here's an outline of how all of this works. Also, to be clear, I'm only going to talk about the desktop support here - mobile is a bunch of related but distinct things that I haven't looked at in detail yet.

Okta's Device Trust (as officially supported) relies on Okta Verify, a local agent. When initially installed, Verify authenticates as the user, obtains a token with a scope that allows it to manage devices, and then registers the user's computer as an additional MFA factor. This involves it generating a JWT that embeds a number of custom claims about the device and its state, including things like the serial number. This JWT is signed with a locally generated (and hardware-backed, using a TPM or Secure Enclave) key, which allows Okta to determine that any future updates from a device claiming the same identity are genuinely from the same device (you could construct an update with a spoofed serial number, but you can't copy the key out of a TPM so you can't sign it appropriately). This is sufficient to get a device registered with Okta, at which point it can be used with Fastpass, Okta's hardware-backed MFA mechanism.

As outlined in the aforementioned deep dive paper, Fastpass is implemented via multiple mechanisms. I'm going to focus on the loopback one, since it's the one that has the strongest security properties. In this mode, Verify listens on one of a list of 10 or so ports on localhost. When you hit the Okta signin widget, choosing Fastpass triggers the widget into hitting each of these ports in turn until it finds one that speaks Fastpass and then submits a challenge to it (along with the URL that's making the request). Verify then constructs a response that includes the challenge and signs it with the hardware-backed key, along with information about whether this was done automatically or whether it included forcing the user to prove their presence. Verify then submits this back to Okta, and if that checks out Okta completes the authentication.

Doing this via loopback from the browser has a bunch of nice properties, primarily around the browser providing information about which site triggered the request. This means the Verify agent can make a decision about whether to submit something there (ie, if a fake login widget requests your creds, the agent will ignore it), and also allows the issued token to be cross-checked against the site that requested it (eg, if g1thub.com requests a token that's valid for github.com, that's a red flag). It's not quite at the same level as a hardware WebAuthn token, but it has many of the anti-phishing properties.

But none of this actually validates the device identity! The entire registration process is up to the client, and clients are in a position to lie. Someone could simply reimplement Verify to lie about, say, a device serial number when registering, and there'd be no proof to the contrary. Thankfully there's another level to this to provide stronger assurances. Okta allows you to provide a CA root[1]. When Okta issues a Fastpass challenge to a device the challenge includes a list of the trusted CAs. If a client has a certificate that chains back to that, it can embed an additional JWT in the auth JWT, this one containing the certificate and signed with the certificate's private key. This binds the CA-issued identity to the Fastpass validation, and causes the device to start appearing as "Managed" in the Okta device management UI. At that point you can configure policy to restrict various apps to managed devices, ensuring that users are only able to get tokens if they're using a device you've previously issued a certificate to.

I've managed to get Linux tooling working with this, though there's still a few drawbacks. The main issue is that the API only allows you to register devices that declare themselves as Windows or MacOS, followed by the login system sniffing browser user agent and only offering Fastpass if you're on one of the officially supported platforms. This can be worked around with an extension that spoofs user agent specifically on the login page, but that's still going to result in devices being logged as a non-Linux OS which makes interpreting the logs more difficult. There's also no ability to choose which bits of device state you log: there's a couple of existing integrations, and otherwise a fixed set of parameters that are reported. It'd be lovely to be able to log arbitrary material and make policy decisions based on that.

This also doesn't help with ChromeOS. There's no real way to automatically launch something that's bound to localhost (you could probably make this work using Crostini but there's no way to launch a Crostini app at login), and access to hardware-backed keys is kind of a complicated topic in ChromeOS for privacy reasons. I haven't tried this yet, but I think using an enterprise force-installed extension and the chrome.enterprise.platformKeys API to obtain a device identity cert and then intercepting requests to the appropriate port range on localhost ought to be enough to do that? But I've literally never written any Javascript so I don't know. Okta supports falling back from the loopback protocol to calling a custom URI scheme, but once you allow that you're also losing a bunch of the phishing protection, so I'd prefer not to take that approach.

Like I said, none of this prevents exfiltration of bearer tokens once they've been issued, and there's still a lot of ecosystem work to do there. But ensuring that tokens can't be issued to unmanaged machines in the first place is still a step forwards, and with luck we'll be able to make use of this on Linux systems without relying on proprietary client-side tooling.

(Time taken to code this implementation: about two days, and under 1000 lines of new code. Time taken to figure out what the fuck to write: rather a lot longer)

[1] There's also support for having Okta issue certificates, but then you're kind of back to the "How do I know this is my device" situation

comment count unavailable comments

Asking ChatGPT to write my security-sensitive code for me

Posted by Matthew Garrett on January 07, 2023 09:09 AM
I have a patchset that makes it safe to use hibernation under Linux even in a secure boot world but it relies on preventing userland from using one of the TPM PCRs[1] that would otherwise be available to it. TPM 2 devices support emulating PCRs in NVRAM, which would obviously be beneficial in not taking away a scare resource, but the docs suggest there's no way to tie a key generation to an NVRAM value which makes this more complicated.

So, why not ask a computer to do this for me? I asked ChatGPT to write me some code that would bind a TPM key to an NVRAM value. After a couple of false starts it told me:To bind an encryption key to the value of a TPM NV index, you can use the TPM2_NV_DefineSpace command to define an NV index with the TPMA_NV_BIND attribute. Woo! That sounds perfect. Only one problem - I can't actually find the TPMA_NV_BIND attribute defined anywhere in the spec. Well, ask the AI again.

Here is the definition of TPMA_NV_BIND from the TPM 2.0 specification:
TPMA_NV_BIND (BIT 17) If SET, the contents of the NV Index are bound to the platform hierarchy.

Ok, that sounds convincing. Where is it in the spec?

The definition of TPMA_NV_BIND is on page 126 of the TPM 2.0 specification, in section 9.7.1 "TPMA_NV".

Huh. Nope. Page 126 says nothing about this. Section 9.7.1 says nothing about this. The TPMA_NV bits are actually described on page 168 of volume 2 of the spec, in section 13.4. Bit 17 is TPMA_NV_OWNERREAD. TPMA_NV_BIND does not appear in the spec anywhere. I told the AI that.

I apologize for the error in my previous response. You are correct that the TPMA_NV_BIND attribute does not appear in the TPM 2.0 specification.

To bind an encryption key to the value of a TPM NV index, you can use the TPM2_NV_DefineSpace command with the TPMA_NV_BIND attribute.

I'm not afraid of being unemployed in the near future.

[1] "Platform Configuration Registers" - a modern TPM has 24 of these. When something security sensitive happens, you generate a hash of that event and pass it to the TPM. The TPM appends that to an existing PCR value and then hashes that concatenated value and sets the PCR to that. This means the PCR value depends not only on the values provided, but also the order they're provided in. Various TPM operations can be made conditional on the PCR values meeting specific criteria.

comment count unavailable comments

Changing firmware config that doesn't want to be changed

Posted by Matthew Garrett on January 05, 2023 04:28 AM
Update: There's actually a more detailed writeup of this here that I somehow missed. Original entry follows:

Today I had to deal with a system that had an irritating restriction - a firmware configuration option I really wanted to be able to change appeared as a greyed out entry in the configuration menu. Some emails revealed that this was a deliberate choice on the part of the system vendor, so that seemed to be that. Thankfully in this case there was a way around that.

One of the things UEFI introduced was a mechanism to generically describe firmware configuration options, called Visual Forms Representation (or VFR). At the most straightforward level, this lets you define a set of forms containing questions, with each question associated with a value in a variable. Questions can be made dependent upon the answers to other questions, so you can have options that appear or disappear based on how other questions were answered. An example in this language might be something like:
CheckBox Prompt: "Console Redirection", Help: "Console Redirection Enable or Disable.", QuestionFlags: 0x10, QuestionId: 53, VarStoreId: 1, VarStoreOffset: 0x39, Flags: 0x0
In which question 53 asks whether console redirection should be enabled or disabled. Other questions can then rely on the answer to question 53 to influence whether or not they're relevant (eg, if console redirection is disabled, there's no point in asking which port it should be redirected to). As a checkbox, if it's set then the value will be set to 1, and 0 otherwise. But where's that stored? Earlier we have another declaration:
VarStore GUID: EC87D643-EBA4-4BB5-A1E5-3F3E36B20DA9, VarStoreId: 1, Size: 0xF4, Name: "Setup"
A UEFI variable called "Setup" and with GUID EC87D643-EBA4-4BB5-A1E5-3F3E36B20DA9 is declared as VarStoreId 1 (matching the declaration in the question) and is 0xf4 bytes long. The question indicates that the offset for that variable is 0x39. Rewriting Setup-EC87D643-EBA4-4BB5-A1E5-3F3E36B20DA9 with a modified value in offset 0x39 will allow direct manipulation of the config option.

But how do we get this data in the first place? VFR isn't built into the firmware directly - instead it's turned into something called Intermediate Forms Representation, or IFR. UEFI firmware images are typically in a standardised format, and you can use UEFITool to extract individual components from that firmware. If you use UEFITool to search for "Setup" there's a good chance you'll be able to find the component that implements the setup UI. Running IFRExtractor-RS against it will then pull out any IFR data it finds, and decompile that into something resembling the original VFR. And now you have the list of variables and offsets and the configuration associated with them, even if your firmware has chosen to hide those options from you.

Given that a bunch of these config values may be security relevant, this seems a little concerning - what stops an attacker who has access to the OS from simply modifying these variables directly? UEFI avoids this by having two separate stages of boot, one where the full firmware ("Boot Services") is available, and one where only a subset ("Runtime Services") is available. The transition is triggered by the OS calling ExitBootServices, indicating the handoff from the firmware owning the hardware to the OS owning the hardware. This is also considered a security boundary - before ExitBootServices everything running has been subject to any secure boot restrictions, and afterwards applications can do whatever they want. UEFI variables can be flagged as being visible in both Boot and Runtime Services, or can be flagged as Boot Services only. As long as all the security critical variables are Boot Services only, an attacker should never be able to run untrusted code that could alter them.

In my case, the firmware option I wanted to alter had been enclosed in "GrayOutIf True" blocks. But the questions were still defined and the code that acted on those options was still present, so simply modifying the variables while still inside Boot Services gave me what I wanted. Note that this isn't a given! The presence of configuration options in the IFR data doesn't mean that anything will later actually read and make use of that variable - a vendor may have flagged options as unavailable and then removed the code, but never actually removed the config data. And also please do note that the reason stuff was removed may have been that it doesn't actually work, and altering any of these variables risks bricking your hardware in a way that's extremely difficult to recover. And there's also no requirement that vendors use IFR to describe their configuration, so you may not get any help here anyway.

In summary: if you do this you may break your computer. If you don't break your computer, it might not work anyway. I'm not going to help you try to break your computer. And I didn't come up with any of this, I just didn't find it all written down in one place while I was researching it.

comment count unavailable comments

Crosswords: Puzzle update

Posted by Jonathan Blandford on January 03, 2023 08:38 AM

Happy New Year, Crossword-lovers!

This is a minor update in preparation for a bigger release in the next few weeks. We’ve done a lot of exciting work on Crosswords over the past few months that’s almost ready for release. However, there are a few things available already that I wanted to highlight early.


There are two updates to the puzzle sets. First, George Ho kindly agreed to add his puzzles to the contrib puzzle-set. He’s written ten fun puzzles with the promise of more in the future! I really enjoyed solving them. There are also a few additional not-for-distribution ones available at his site.

<figure aria-describedby="caption-attachment-7175" class="wp-caption aligncenter" id="attachment_7175" style="width: 248px"><figcaption class="wp-caption-text" id="caption-attachment-7175">loplop puzzle set</figcaption></figure>

George has put together a truly impressive dataset of cryptic crossword clues. He’s gathered and categorized half-a-million clues over twelve years of puzzle data. I’m really impressed with this work and I’m looking to integrate some of that data into the Crossword Editor this coming year.

Second, I updated the xword-dl plugin to work with latest version of that app. Those folks have been really busy the past few months, and over ten more puzzle sets were added to it, including our first German source (Hint: a German translation would be appreciated). Our launch screen has gotten a lot fuller with both downloaders and shipped puzzles: we now have over 50 possible puzzle sets for playing across three languages.

Despite the ritual pain and suffering  with pip to get it building in a flatpak (NB: still relevant), I managed to get it updated. I’m really impressed at how that project grew in 2022. Great work!

These are available at flathub.

Developer Documentation

Federico gave me an early Christmas gift this year! He set up a developer guide for Crosswords to be built automatically from CI. It’s gorgeous, and hopefully will entice folks to join the rest of us working on the app.

A major goal I’ve had while writing this is to keep it well documented. A few weeks ago, Federico wrote beautifully about the power of documentation. As the game has grown past 30KLOC I’ve found it super valuable to reread what I wrote when revisiting older code sections. Writing the design docs has had a clarifying effect too: I had to write the WordList docs before I could finish that code, and I refresh my memory from it every time I have to touch it.

However, one thing I didn’t have was a good way to tie it all together. Now that we have this site, it’s possible to explain how things fit together. I spent some time this vacation putting together some introductory docs to make it easier to get started with the code base. If you have any interest in contributing, check them out.

What’s next?

We are gearing up for a big release in a few weeks with a lot of changes. They’re mostly internal cleanups, but these changes are already making some new features much easier to write. Here’s a preview of one them that’s in progress:

<figure aria-describedby="caption-attachment-7178" class="wp-caption aligncenter" id="attachment_7178" style="width: 300px"><figcaption class="wp-caption-text" id="caption-attachment-7178">Can you guess what this will be?</figcaption></figure>

The other thing I’d like to do is to hold a virtual hackfest this weekend. The exact time and date is still TBD pending some availability, but I’ve been meaning to get together with a few folks to work on some of the code together. I don’t know how well this will work virtually, but its worth a shot to open it up to a wider audience.

If you’re interested at all in word games and free software, feel free to drop by #crosswords. While there, let us know if you’re interested in the hackfest. All people are welcome. Come add a feature, add a translation, or try your hand at writing a puzzle. Or just hang out and tell us your favorite clues.

Until next time!

Download on FLATHUB

Off Twitter for a bit

Posted by Matthew Garrett on December 18, 2022 03:25 AM
Turns out that linking to several days old public data in order to demonstrate that Elon's jet was broadcasting its tail number in the clear is apparently "posting private information" so for anyone looking for me there I'm actually here

comment count unavailable comments

Trying to remove the need to trust cloud providers

Posted by Matthew Garrett on December 13, 2022 09:19 PM
First up: what I'm covering here is probably not relevant for most people. That's ok! Different situations have different threat models, and if what I'm talking about here doesn't feel like you have to worry about it, that's great! Your life is easier as a result. But I have worked in situations where we had to care about some of the scenarios I'm going to describe here, and the technologies I'm going to talk about here solve a bunch of these problems.

So. You run a typical VM in the cloud. Who has access to that VM? Well, firstly, anyone who has the ability to log into the host machine with administrative capabilities. With enough effort, perhaps also anyone who has physical access to the host machine. But the hypervisor also has the ability to inspect what's running inside a VM, so anyone with the ability to install a backdoor into the hypervisor could theoretically target you. And who's to say the cloud platform launched the correct image in the first place? The control plane could have introduced a backdoor into your image and run that instead. Or the javascript running in the web UI that you used to configure the instance could have selected a different image without telling you. Anyone with the ability to get a (cleverly obfuscated) backdoor introduced into quite a lot of code could achieve that. Obviously you'd hope that everyone working for a cloud provider is honest, and you'd also hope that their security policies are good and that all code is well reviewed before being committed. But when you have several thousand people working on various components of a cloud platform, there's always the potential for something to slip up.

Let's imagine a large enterprise with a whole bunch of laptops used by developers. If someone has the ability to push a new package to one of those laptops, they're in a good position to obtain credentials belonging to the user of that laptop. That means anyone with that ability effectively has the ability to obtain arbitrary other privileges - they just need to target someone with the privilege they want. You can largely mitigate this by ensuring that the group of people able to do this is as small as possible, and put technical barriers in place to prevent them from pushing new packages unilaterally.

Now imagine this in the cloud scenario. Anyone able to interfere with the control plane (either directly or by getting code accepted that alters its behaviour) is in a position to obtain credentials belonging to anyone running in that cloud. That's probably a much larger set of people than have the ability to push stuff to laptops, but they have much the same level of power. You'll obviously have a whole bunch of processes and policies and oversights to make it difficult for a compromised user to do such a thing, but if you're a high enough profile target it's a plausible scenario.

How can we avoid this? The easiest way is to take the people able to interfere with the control plane out of the loop. The hypervisor knows what it booted, and if there's a mechanism for the VM to pass that information to a user in a trusted way, you'll be able to detect the control plane handing over the wrong image. This can be achieved using trusted boot. The hypervisor-provided firmware performs a "measurement" (basically a cryptographic hash of some data) of what it's booting, storing that information in a virtualised TPM. This TPM can later provide a signed copy of the measurements on demand. A remote system can look at these measurements and determine whether the system is trustworthy - if a modified image had been provided, the measurements would be different. As long as the hypervisor is trustworthy, it doesn't matter whether or not the control plane is - you can detect whether you were given the correct OS image, and you can build your trust on top of that.

(Of course, this depends on you being able to verify the key used to sign those measurements. On real hardware the TPM has a certificate that chains back to the manufacturer and uniquely identifies the TPM. On cloud platforms you typically have to retrieve the public key via the metadata channel, which means you're trusting the control plane to give you information about the hypervisor in order to verify what the control plane gave to the hypervisor. This is suboptimal, even though realistically the number of moving parts in that part of the control plane is much smaller than the number involved in provisioning the instance in the first place, so an attacker managing to compromise both is less realistic. Still, AWS doesn't even give you that, which does make it all rather more complicated)

Ok, so we can (largely) decouple our trust in the VM from having to trust the control plane. But we're still relying on the hypervisor to provide those attestations. What if the hypervisor isn't trustworthy? This sounds somewhat ridiculous (if you can't run a trusted app on top of an untrusted OS, how can you run a trusted OS on top of an untrusted hypervisor?), but AMD actually have a solution for that. SEV ("Secure Encrypted Virtualisation") is a technology where (handwavily) an encryption key is generated when a new VM is created, and the memory belonging to that VM is encrypted with that key. The hypervisor has no access to that encryption key, and any access to memory initiated by the hypervisor will only see the encrypted content. This means that nobody with the ability to tamper with the hypervisor can see what's going on inside the OS (and also means that nobody with physical access can either, so that's another threat dealt with).

But how do we know that the hypervisor set this up, and how do we know that the correct image was booted? SEV has support for a "Launch attestation", a CPU generated signed statement that it booted the current VM with SEV enabled. But it goes further than that! The attestation includes a measurement of what was booted, which means we don't need to trust the hypervisor at all - the CPU itself will tell us what image we were given. Perfect.

Except, well. There's a few problems. AWS just doesn't have any VMs that implement SEV yet (there are bare metal instances that do, but obviously you're building your own infrastructure to make that work). Google only seem to provide the launch measurement via the logging service - and they only include the parsed out data, not the original measurement. So, we still have to trust (a subset of) the control plane. Azure provides it via a separate attestation service, but again it doesn't seem to provide the raw attestation and so you're still trusting the attestation service. For the newest generation of SEV, SEV-SNP, this is less of a big deal because the guest can provide its own attestation. But Google doesn't offer SEV-SNP hardware yet, and the driver you need for this only shipped in Linux 5.19 and Azure's SEV Ubuntu images only offer up to 5.15 at the moment, so making use of that means you're putting your own image together at the moment.

And there's one other kind of major problem. A normal VM image provides a bootloader and a kernel and a filesystem. That bootloader needs to run on something. That "something" is typically hypervisor-provided "firmware" - for instance, OVMF. This probably has some level of cloud vendor patching, and they probably don't ship the source for it. You're just having to trust that the firmware is trustworthy, and we're talking about trying to avoid placing trust in the cloud provider. Azure has a private beta allowing users to upload images that include their own firmware, meaning that all the code you trust (outside the CPU itself) can be provided by the user, and once that's GA it ought to be possible to boot Azure VMs without having to trust any Microsoft-provided code.

Well, mostly. As AMD admit, SEV isn't guaranteed to be resistant to certain microarchitectural attacks. This is still much more restrictive than the status quo where the hypervisor could just read arbitrary content out of the VM whenever it wanted to, but it's still not ideal. Which, to be fair, is where we are with CPUs in general.

(Thanks to Leonard Cohnen who gave me a bunch of excellent pointers on this stuff while I was digging through it yesterday)

comment count unavailable comments

libei - opening the portal doors

Posted by Peter Hutterer on December 13, 2022 03:24 AM

Time for another status update on libei, the transport layer for bouncing emulated input events between applications and Wayland compositors [1]. And this time it's all about portals and how we're about to use them for libei communication. I've hinted at this in the last post, but of course you're forgiven if you forgot about this in the... uhm.. "interesting" year that was 2022. So, let's recap first:

Our basic premise is that we want to emulate and/or capture input events in the glorious new world that is Wayland (read: where applications can't do whatever they want, whenever they want). libei is a C library [0] that aims to provide this functionality. libei supports "sender" and "receiver" contexts and that just specifies which way the events will flow. A sender context (e.g. xdotool) will send emulated input events to the compositor, a "receiver" context will - you'll never guess! - receive events from the compositor. If you have the InputLeap [2] use-case, the server-side will be a receiver context, the client side a sender context. But libei is really just the transport layer and hasn't had that many changes since the last post - most of the effort was spent on trying to figure out how to exchange the socket between different applications. And for that, we have portals!


In particular, we have a PR for the RemoteDesktop portal to add that socket exchange. In particular, once a RemoteDesktop session starts your application can request an EIS socket and send input events over that. This socket supersedes the current NotifyButton and similar DBus calls and removes the need for the portal to stay in the middle - the application and compositor now talk directly to each other. The compositor/portal can still close the session at any time though, so all the benefits of a portal stay there. The big advantage of integrating this into RemoteDesktop is that the infrastructucture for that is already mostly in place - once your compositor adds the bits for the new ConnectToEIS method you get all the other pieces for free. In GNOME this includes a visual indication that your screen is currently being remote-controlled, same as from a real RemoteDesktop session.

Now, talking to the RemoteDesktop portal is nontrivial simply because using DBus is nontrivial, doubly so for the way how sessions and requests work in the portals. To make this easier, libei 0.4.1 now includes a new library "liboeffis" that enables your application to catch the DBus. This library has a very small API and can easily be integrated with your mainloop (it's very similar to libei). We have patches for Xwayland to use that and it's really trivial to use. And of course, with the other Xwayland work we already had this means we can really run xdotool through Xwayland to connect through the XDG Desktop Portal as a RemoteDesktop session and move the pointer around. Because, kids, remember, uhm, Unix is all about lots of separate pieces.


On to the second mode of libei - the receiver context. For this, we also use a portal but a brand new one: the InputCapture portal. The InputCapture portal is the one to use to decide when input events should be captured. The actual events are then sent over the EIS socket.

Right now, the InputCapture portal supports PointerBarriers - virtual lines on the screen edges that, once crossed, trigger input capture for a capability (e.g. pointer + keyboard). And an application's basic approach is to request a (logical) representation of the available desktop areas ("Zones") and then set up pointer barriers at the edge(s) of those Zones. Get the EIS connection, Enable() the session and voila - the compositor will (hopefully) send input events when the pointer crosses one of those barriers. Once that happens you'll get a DBus signal in InputCapture and the events will start flowing on the EIS socket. The portal itself doesn't need to sit in the middle, events go straight to the application. The portal can still close the session anytime though. And the compositor can decide to stop capturing events at any time.

There is actually zero Wayland-y code in all this, it's display-system acgnostic. So anyone with too much motivation could add this to the X server too. Because that's what the world needs...

The (currently) bad news is that this needs to be pulled into a lot of different repositories. And everything needs to get ready before it can be pulled into anything to make sure we don't add broken API to any of those components. But thanks to a lot of work by Olivier Fourdan, we have this mostly working in InputLeap (tbh the remaining pieces are largely XKB related, not libei-related). Together with the client implementation (through RemoteDesktop) we can move pointers around like in the InputLeap of old (read: X11).

Our current goal is for this to be ready for GNOME 45/Fedora 39.

[0] eventually a protocol but we're not there yet
[1] It doesn't actually have to be a compositor but that's the prime use-case, so...
[2] or barrier or synergy. I'll stick with InputLeap for this post

Quick update on Pluton and Linux

Posted by Matthew Garrett on December 12, 2022 12:12 PM
I've been ridiculously burned out for a while now but I'm taking the month off to recover and that's giving me an opportunity to catch up on a lot of stuff. This has included me actually writing some code to work with the Pluton in my Thinkpad Z13. I've learned some more stuff in the process, but based on everything I know I'd still say that in its current form Pluton isn't a threat to free software.

So, first up: by default on the Z13, Pluton is disabled. It's not obviously exposed to the OS at all, which also means there's no obvious mechanism for Microsoft to push out a firmware update to it via Windows Update. The Windows drivers that bind to Pluton don't load in this configuration. It's theoretically possible that there's some hidden mechanism to re-enable it at runtime, but that code doesn't seem to be in Windows at the moment. I'm reasonably confident that "Disabled" is pretty genuinely disabled.

Second, when enabled, Pluton exposes two separate devices. The first of these has an MSFT0101 identifier in ACPI, which is the ID used for a TPM 2 device. The Pluton TPM implementation doesn't work out of the box with existing TPM 2 drivers, though, because it uses a custom start method. TPM 2 devices commonly use something called a "Command Response Buffer" architecture, where a command is written into a buffer, the TPM is told to do a thing, and the response to the command ends up in another buffer. The mechanism to tell the TPM to do a thing varies, and an ACPI table exposed to the OS defines which of those various things should be used for a given TPM. Pluton systems have a mechanism that isn't defined in the existing version of the spec (1.3 rev 8 at the time of writing), so I had to spend a while staring hard at the Windows drivers to figure out how to implement it. The good news is that I now have a patch that successfully gets the existing Linux TPM driver code work correctly with the Pluton implementation.

The second device has an MSFT0200 identifier, and is entirely not a TPM. The Windows driver appears to be a relatively thin layer that simply takes commands from userland and passes them on to the chip - I haven't found any userland applications that make use of this, so it's tough to figure out what functionality is actually available. But what does seem pretty clear from the code I've looked at is that it's a component that only responds when it's asked - if the OS never sends it any commands, it's not able to do anything.

One key point from this recently published Microsoft doc is that the whole "Microsoft can update Pluton firmware" thing does just seem to be the ability for the OS to push new code to the chip at runtime. That means Microsoft can't arbitrarily push new firmware to the chip - the OS needs to be involved. This is unsurprising, but it's nice to see some stronger confirmation of that.

Anyway. tl;dr - Pluton can (now) be used as a regular TPM. Pluton also exposes some additional functionality which is not yet clear, but there's no obvious mechanism for it to compromise user privacy or restrict what users can run on a Free operating system. The Pluton firmware update mechanism appears to be OS mediated, so users who control their OS can simply choose not to opt in to that.

comment count unavailable comments

On-device WebAuthn and what makes it hard to do well

Posted by Matthew Garrett on December 10, 2022 10:41 AM
WebAuthn improves login security a lot by making it significantly harder for a user's credentials to be misused - a WebAuthn token will only respond to a challenge if it's issued by the site a secret was issued to, and in general will only do so if the user provides proof of physical presence[1]. But giving people tokens is tedious and also I have a new laptop which only has USB-C but does have a working fingerprint reader and I hate the aesthetics of the Yubikey 5C Nano, so I've been thinking about what WebAuthn looks like done without extra hardware.

Let's talk about the broad set of problems first. For this to work you want to be able to generate a key in hardware (so it can't just be copied elsewhere if the machine is compromised), prove to a remote site that it's generated in hardware (so the remote site isn't confused about what security assertions you're making), and tie use of that key to the user being physically present (which may range from "I touched this object" to "I presented biometric evidence of identity"). What's important here is that a compromised OS shouldn't be able to just fake a response. For that to be possible, the chain between proof of physical presence to the secret needs to be outside the control of the OS.

For a physical security token like a Yubikey, this is pretty easy. The communication protocol involves the OS passing a challenge and the source of the challenge to the token. The token then waits for a physical touch, verifies that the source of the challenge corresponds to the secret it's being asked to respond to the challenge with, and provides a response. At the point where keys are being enrolled, the token can generate a signed attestation that it generated the key, and a remote site can then conclude that this key is legitimately sequestered away from the OS. This all takes place outside the control of the OS, meeting all the goals described above.

How about Macs? The easiest approach here is to make use of the secure enclave and TouchID. The secure enclave is a separate piece of hardware built into either a support chip (for x86-based Macs) or directly on the SoC (for ARM-based Macs). It's capable of generating keys and also capable of producing attestations that said key was generated on an Apple secure enclave ("Apple Anonymous Attestation", which has the interesting property of attesting that it was generated on Apple hardware, but not which Apple hardware, avoiding a lot of privacy concerns). These keys can have an associated policy that says they're only usable if the user provides a legitimate touch on the fingerprint sensor, which means it can not only assert physical presence of a user, it can assert physical presence of an authorised user. Communication between the fingerprint sensor and the secure enclave is a private channel that the OS can't meaningfully interfere with, which means even a compromised OS can't fake physical presence responses (eg, the OS can't record a legitimate fingerprint press and then send that to the secure enclave again in order to mimic the user being present - the secure enclave requires that each response from the fingerprint sensor be unique). This achieves our goals.

The PC space is more complicated. In the Mac case, communication between the biometric sensors (be that TouchID or FaceID) occurs in a controlled communication channel where all the hardware involved knows how to talk to the other hardware. In the PC case, the typical location where we'd store secrets is in the TPM, but TPMs conform to a standardised spec that has no understanding of this sort of communication, and biometric components on PCs have no way to communicate with the TPM other than via the OS. We can generate keys in the TPM, and the TPM can attest to those keys being TPM-generated, which means an attacker can't exfiltrate those secrets and mimic the user's token on another machine. But in the absence of any explicit binding between the TPM and the physical presence indicator, the association needs to be up to code running on the CPU. If that's in the OS, an attacker who compromises the OS can simply ask the TPM to respond to an challenge it wants, skipping the biometric validation entirely.

Windows solves this problem in an interesting way. The Windows Hello Enhanced Signin doesn't add new hardware, but relies on the use of virtualisation. The agent that handles WebAuthn responses isn't running in the OS, it's running in another VM that's entirely isolated from the OS. Hardware that supports this model has a mechanism for proving its identity to the local code (eg, fingerprint readers that support this can sign their responses with a key that has a certificate that chains back to Microsoft). Additionally, the secrets that are associated with the TPM can be held in this VM rather than in the OS, meaning that the OS can't use them directly. This means we have a flow where a browser asks for a WebAuthn response, that's passed to the VM, the VM asks the biometric device for proof of user presence (including some sort of random value to prevent the OS just replaying that), receives it, and then asks the TPM to generate a response to the challenge. Compromising the OS doesn't give you the ability to forge the responses between the biometric device and the VM, and doesn't give you access to the secrets in the TPM, so again we meet all our goals.

On Linux (and other free OSes), things are less good. Projects like tpm-fido generate keys on the TPM, but there's no secure channel between that code and whatever's providing proof of physical presence. An attacker who compromises the OS may not be able to copy the keys to their own system, but while they're on the compromised system they can respond to as many challenges as they like. That's not the same security assertion we have in the other cases.

Overall, Apple's approach is the simplest - having binding between the various hardware components involved means you can just ignore the OS entirely. Windows doesn't have the luxury of having as much control over what the hardware landscape looks like, so has to rely on virtualisation to provide a security barrier against a compromised OS. And in Linux land, we're fucked. Who do I have to pay to write a lightweight hypervisor that runs on commodity hardware and provides an environment where we can run this sort of code?

[1] As I discussed recently there are scenarios where these assertions are less strong, but even so

comment count unavailable comments

End-to-end encrypted messages need more than libsignal

Posted by Matthew Garrett on December 09, 2022 06:17 AM
(Disclaimer: I'm not a cryptographer, and I do not claim to be an expert in Signal. I've had this read over by a couple of people who are so with luck there's no egregious errors, but any mistakes here are mine)

There are indications that Twitter is working on end-to-end encrypted DMs, likely building on work that was done back in 2018. This made use of libsignal, the reference implementation of the protocol used by the Signal encrypted messaging app. There seems to be a fairly widespread perception that, since libsignal is widely deployed (it's also the basis for WhatsApp's e2e encryption) and open source and has been worked on by a whole bunch of cryptography experts, choosing to use libsignal means that 90% of the work has already been done. And in some ways this is true - the security of the protocol is probably just fine. But there's rather more to producing a secure and usable client than just sprinkling on some libsignal.

(Aside: To be clear, I have no reason to believe that the people who were working on this feature in 2018 were unaware of this. This thread kind of implies that the practical problems are why it didn't ship at the time. Given the reduction in Twitter's engineering headcount, and given the new leadership's espousal of political and social perspectives that don't line up terribly well with the bulk of the cryptography community, I have doubts that any implementation deployed in the near future will get all of these details right)

I was musing about this last night and someone pointed out some prior art. Bridgefy is a messaging app that uses Bluetooth as its transport layer, allowing messaging even in the absence of data services. The initial implementation involved a bunch of custom cryptography, enabling a range of attacks ranging from denial of service to extracting plaintext from encrypted messages. In response to criticism Bridgefy replaced their custom cryptographic protocol with libsignal, but that didn't fix everything. One issue is the potential for MITMing - keys are shared on first communication, but the client provided no mechanism to verify those keys, so a hostile actor could pretend to be a user, receive messages intended for that user, and then reencrypt them with the user's actual key. This isn't a weakness in libsignal, in the same way that the ability to add a custom certificate authority to a browser's trust store isn't a weakness in TLS. In Signal the app key distribution is all handled via Signal's servers, so if you're just using libsignal you need to implement the equivalent yourself.

The other issue was more subtle. libsignal has no awareness at all of the Bluetooth transport layer. Deciding where to send a message is up to the client, and these routing messages were spoofable. Any phone in the mesh could say "Send messages for Bob here", and other phones would do so. This should have been a denial of service at worst, since the messages for Bob would still be encrypted with Bob's key, so the attacker would be able to prevent Bob from receiving the messages but wouldn't be able to decrypt them. However, the code to decide where to send the message and the code to decide which key to encrypt the message with were separate, and the routing decision was made before the encryption key decision. An attacker could send a message saying "Route messages for Bob to me", and then another saying "Actually lol no I'm Mallory". If a message was sent between those two messages, the message intended for Bob would be delivered to Mallory's phone and encrypted with Mallory's key.

Again, this isn't a libsignal issue. libsignal encrypted the message using the key bundle it was told to encrypt it with, but the client code gave it a key bundle corresponding to the wrong user. A race condition in the client logic allowed messages intended for one person to be delivered to and readable by another.

This isn't the only case where client code has used libsignal poorly. The Bond Touch is a Bluetooth-connected bracelet that you wear. Tapping it or drawing gestures sends a signal to your phone, which culminates in a message being sent to someone else's phone which sends a signal to their bracelet, which then vibrates and glows in order to indicate a specific sentiment. The idea is that you can send brief indications of your feelings to someone you care about by simply tapping on your wrist, and they can know what you're thinking without having to interrupt whatever they're doing at the time. It's kind of sweet in a way that I'm not, but it also advertised "Private Spaces", a supposedly secure way to send chat messages and pictures, and that seemed more interesting. I grabbed the app and disassembled it, and found it was using libsignal. So I bought one and played with it, including dumping the traffic from the app. One important thing to realise is that libsignal is just the protocol library - it doesn't implement a server, and so you still need some way to get information between clients. And one of the bits of information you have to get between clients is the public key material.

Back when I played with this earlier this year, key distribution was implemented by uploading the public key to a database. The other end would download the public key, and everything works out fine. And this doesn't sound like a problem, given that the entire point of a public key is to be, well, public. Except that there was no access control on this database, and the filenames were simply phone numbers, so you could overwrite anyone's public key with one of your choosing. This didn't let you cause messages intended for them to be delivered to you, so exploiting this for anything other than a DoS would require another vulnerability somewhere, but there are contrived situations where this would potentially allow the privacy expectations to be broken.

Another issue with this app was its handling of one-time prekeys. When you send someone new a message via Signal, it's encrypted with a key derived from not only the recipient's identity key, but also from what's referred to as a "one-time prekey". Users generate a bunch of keypairs and upload the public half to the server. When you want to send a message to someone, you ask the server for one of their one-time prekeys and use that. Decrypting this message requires using the private half of the one-time prekey, and the recipient deletes it afterwards. This means that an attacker who intercepts a bunch of encrypted messages over the network and then later somehow obtains the long-term keys still won't be able to decrypt the messages, since they depended on keys that no longer exist. Since these one-time prekeys are only supposed to be used once (it's in the name!) there's a risk that they can all be consumed before they're replenished. The spec regarding pre-keys says that servers should consider rate-limiting this, but the protocol also supports falling back to just not using one-time prekeys if they're exhausted (you lose the forward secrecy benefits, but it's still end-to-end encrypted). This implementation not only implemented no rate-limiting, making it easy to exhaust the one-time prekeys, it then also failed to fall back to running without them. Another easy way to force DoS.

(And, remember, a successful DoS on an encrypted communications channel potentially results in the users falling back to an unencrypted communications channel instead. DoS may not break the encrypted protocol, but it may be sufficient to obtain plaintext anyway)

And finally, there's ClearSignal. I looked at this earlier this year - it's avoided many of these pitfalls by literally just being a modified version of the official Signal client and using the existing Signal servers (it's even interoperable with Actual Signal), but it's then got a bunch of other weirdness. The Signal database (I /think/ including the keys, but I haven't completely verified that) gets backed up to an AWS S3 bucket, identified using something derived from a key using KERI, and I've seen no external review of that whatsoever. So, who knows. It also has crash reporting enabled, and it's unclear how much internal state it sends on crashes, and it's also based on an extremely old version of Signal with the "You need to upgrade Signal" functionality disabled.

Three clients all using libsignal in one form or another, and three clients that do things wrong in ways that potentially have a privacy impact. Again, none of these issues are down to issues with libsignal, they're all in the code that surrounds it. And remember that Twitter probably has to worry about other issues as well! If I lose my phone I'm probably not going to worry too much about whether the messages sent through my weird bracelet app being gone forever, but losing all my Twitter DMs would be a significant change in behaviour from the status quo. But that's not an easy thing to do when you're not supposed to have access to any keys! Group chats? That's another significant problem to deal with. And making the messages readable through the web UI as well as on mobile means dealing with another set of key distribution issues. Get any of this wrong in one way and the user experience doesn't line up with expectations, get it wrong in another way and the worst case involves some of your users in countries with poor human rights records being executed.

Simply building something on top of libsignal doesn't mean it's secure. If you want meaningful functionality you need to build a lot of infrastructure around libsignal, and doing that well involves not just competent development and UX design, but also a strong understanding of security and cryptography. Given Twitter's lost most of their engineering and is led by someone who's alienated all the cryptographers I know, I wouldn't be optimistic.

comment count unavailable comments

Making an Orbic Speed RC400L autoboot when USB power is attached

Posted by Matthew Garrett on December 07, 2022 06:33 AM
As I mentioned a couple of weeks ago, I've been trying to hack an Orbic Speed RC400L mobile hotspot so it'll automatically boot when power is attached. When plugged in it would flash a "Welcome" screen and then switch to a display showing the battery charging - it wouldn't show up on USB, and didn't turn on any networking. So, my initial assumption was that the bootloader was making a policy decision not to boot Linux. After getting root (as described in the previous post), I was able to cat /proc/mtd and see that partition 7 was titled "aboot". Aboot is a commonly used Android bootloader, based on Little Kernel - LK provides the hardware interface, aboot is simply an app that runs on top of it. I was able to find the source code for Quectel's aboot, which is intended to run on the same SoC that's in this hotspot, so it was relatively easy to line up a bunch of the Ghidra decompilation with actual source (top tip: find interesting strings in your decompilation and paste them into github search, and see whether you get a repo back).

Unfortunately looking through this showed various cases where bootloader policy decisions were made, but all of them seemed to result in Linux booting. Patching them and flashing the patched loader back to the hotspot didn't change the behaviour. So now I was confused: it seemed like Linux was loading, but there wasn't an obvious point in the boot scripts where it then decided not to do stuff. No boot logs were retained between boots, which made things even more annoying. But then I realised that, well, I have root - I can just do my own logging. I hacked in an additional init script to dump dmesg to /var, powered it down, and then plugged in a USB cable. It booted to the charging screen. I hit the power button and it booted fully, appearing on USB. I adb shelled in, checked the logs, and saw that it had booted twice. So, we were definitely entering Linux before showing the charging screen. But what was the difference?

Diffing the dmesg showed that there was a major distinction on the kernel command line. The kernel command line is data populated by the bootloader and then passed to the kernel - it's how you provide arguments that alter kernel behaviour without having to recompile it, but it's also exposed to userland by the running kernel so it also serves as a way for the bootloader to pass information to the running userland. The boot that resulted in the charging screen had a androidboot.poweronreason=USB argument, the one that booted fully had androidboot.poweronreason=PWRKEY. Searching the filesystem for androidboot.poweronreason showed that the script that configures USB did not enable USB if poweronreason was USB, and the same string also showed up in a bunch of other applications. The bootloader was always booting Linux, but it was telling Linux why it had booted, and if that reason was "USB" then Linux was choosing not to enable USB and not starting the networking stack.

One approach would be to modify every application that parsed this data and make it work even if the power on reason was "USB". That would have been tedious. It seemed easier to modify the bootloader. Looking for that string in Ghidra showed that it was reading a register from the power management controller and then interpreting that to determine the reason it had booted. In effect, it was doing something like:
boot_reason = read_pmic_boot_reason();
switch(boot_reason) {
case 0x10:
  bootparam = strdup("androidboot.poweronreason=PWRKEY");
case 0x20:
  bootparam = strdup("androidboot.poweronreason=USB");
  bootparam = strdup("androidboot.poweronreason=Hard_Reset");
Changing the 0x20 to 0xff meant that the USB case would never be detected, and it would fall through to the default. All the userland code was happy to accept "Hard_Reset" as a legitimate reason to boot, and now plugging in USB results in the modem booting to a functional state. Woo.

If you want to do this yourself, dump the aboot partition from your device, and search for the hex sequence "03 02 00 0a 20"". Change the final 0x20 to 0xff, copy it back to the device, and write it to mtdblock7. If it doesn't work, feel free to curse me, but I'm almost certainly going to be no use to you whatsoever. Also, please, do not just attempt to mechanically apply this to other hotspots. It's not going to end well.

comment count unavailable comments