Fedora desktop Planet

Digital forgeries are hard

Posted by Matthew Garrett on March 14, 2024 09:11 AM
Closing arguments in the trial between various people and Craig Wright over whether he's Satoshi Nakamoto are wrapping up today, amongst a bewildering array of presented evidence. But one utterly astonishing aspect of this lawsuit is that expert witnesses for both sides agreed that much of the digital evidence provided by Craig Wright was unreliable in one way or another, generally including indications that it wasn't produced at the point in time it claimed to be. And it's fascinating reading through the subtle (and, in some cases, not so subtle) ways that that's revealed.

One of the pieces of evidence entered is screenshots of data from Mind Your Own Business, a business management product that's been around for some time. Craig Wright relied on screenshots of various entries from this product to support his claims around having controlled a meaningful number of bitcoin before he was publicly linked to being Satoshi. If these were authentic then they'd be strong evidence linking him to the mining of coins before Bitcoin's public availability. Unfortunately the screenshots themselves weren't contemporary - the metadata shows them being created in 2020. This wouldn't fundamentally be a problem (it's entirely reasonable to create new screenshots of old material), as long as it's possible to establish that the material shown in the screenshots was created at that point. Sadly, well.

One part of the disclosed information was an email that contained a zip file that contained a raw database in the format used by MYOB. Importing that into the tool allowed an audit record to be extracted - this record showed that the relevant entries had been added to the database in 2020, shortly before the screenshots were created. This was, obviously, not strong evidence that Craig had held Bitcoin in 2009. This evidence was reported, and was responded to with a couple of additional databases that had an audit trail that was consistent with the dates in the records in question. Well, partially. The audit record included session data, showing an administrator logging into the database in 2011 and then, uh, logging out in 2023, which is rather more consistent with someone changing their system clock to 2011 to create an entry, and switching it back to present day before logging out. In addition, the audit log included fields that didn't exist in versions of the product released before 2016, strongly suggesting that the entries dated 2009-2011 were created in software released after 2016. And even worse, the order of insertions into the database didn't line up with calendar time - an entry dated before another entry may appear in the database afterwards, indicating that it was created later. But even more obvious? The database schema used for these old entries corresponded to a version of the software released in 2023.

This is all consistent with the idea that these records were created after the fact and backdated to 2009-2011, and that after this evidence was made available further evidence was created and backdated to obfuscate that. In an unusual turn of events, during the trial Craig Wright introduced further evidence in the form of a chain of emails to his former lawyers that indicated he had provided them with login details to his MYOB instance in 2019 - before the metadata associated with the screenshots. The implication isn't entirely clear, but it suggests that either they had an opportunity to examine this data before the metadata suggests it was created, or that they faked the data? So, well, the obvious thing happened, and his former lawyers were asked whether they received these emails. The chain consisted of three emails, two of which they confirmed they'd received. And they received a third email in the chain, but it was different to the one entered in evidence. And, uh, weirdly, they'd received a copy of the email that was submitted - but they'd received it a few days earlier. In 2024.

And again, the forensic evidence is helpful here! It turns out that the email client used associates a timestamp with any attachments, which in this case included an image in the email footer - and the mysterious time travelling email had a timestamp in 2024, not 2019. This was created by the client, so was consistent with the email having been sent in 2024, not being sent in 2019 and somehow getting stuck somewhere before delivery. The date header indicates 2019, as do encoded timestamps in the MIME headers - consistent with the mail being sent by a computer with the clock set to 2019.

But there's a very weird difference between the copy of the email that was submitted in evidence and the copy that was located afterwards! The first included a header inserted by gmail containing a 2019 timestamp, while the second had a 2024 timestamp. Is there a way to determine which of these could be the truth? It turns out there is! The format of that header changed in 2022, and the version in the email is the new version. The version with the 2019 timestamp is anachronistic - the format simply doesn't match the header that gmail would have introduced in 2019, suggesting that an email sent in 2022 or later was modified to include a timestamp of 2019.

This is by no means the only indication that Craig Wright's evidence may be misleading (there's the whole argument that the Bitcoin white paper was written in LaTeX when general consensus is that it's written in OpenOffice, given that's what the metadata claims), but it's a lovely example of a more general issue.

Our technology chains are complicated. So many moving parts end up influencing the content of the data we generate, and those parts develop over time. It's fantastically difficult to generate an artifact now that precisely corresponds to how it would look in the past, even if we go to the effort of installing an old OS on an old PC and setting the clock appropriately (are you sure you're going to be able to mimic an entirely period appropriate patch level?). Even the version of the font you use in a document may indicate it's anachronistic. I'm pretty good at computers and I no longer have any belief I could fake an old document.

(References: this Dropbox, under "Expert reports", "Patrick Madden". Initial MYOB data is in "Appendix PM7", further analysis is in "Appendix PM42", email analysis is "Sixth Expert Report of Mr Patrick Madden")


Enforcing a touchscreen mapping in GNOME

Posted by Peter Hutterer on March 12, 2024 04:33 AM

Touchscreens are quite prevalent by now but one of the not-so-hidden secrets is that they're actually two devices: the monitor and the actual touch input device. Surprisingly, users want the touch input device to work on the underlying monitor which means your desktop environment needs to somehow figure out which of the monitors belongs to which touch input device. Often these two devices come from two different vendors, so mutter needs to use ... */me holds torch under face* .... HEURISTICS! :scary face:

Those heuristics are actually quite simple: same vendor/product ID? same dimensions? is one of the monitors a built-in one? [1] But unfortunately in some cases those heuristics don't produce the correct result. In particular external touchscreens seem to be getting more common again and plugging those into a (non-touch) laptop means you usually get that external screen mapped to the internal display.

Luckily mutter does have a configuration option for this, though it is not exposed in the GNOME Settings (yet). But you, my $age $jedirank, can access this via a commandline interface to at least work around the immediate issue. But first: we need to know the monitor details and you need to know about gsettings relocatable schemas.

Finding the right monitor information is relatively trivial: look at $HOME/.config/monitors.xml and get your monitor's vendor, product and serial from there. e.g. in my case this is:

  <monitors version="2">
   <configuration>
    <logicalmonitor>
      <x>0</x>
      <y>0</y>
      <scale>1</scale>
      <monitor>
        <monitorspec>
          <connector>DP-2</connector>
          <vendor>DEL</vendor>              <--- this one
          <product>DELL S2722QC</product>   <--- this one
          <serial>59PKLD3</serial>          <--- and this one
        </monitorspec>
        <mode>
          <width>3840</width>
          <height>2160</height>
          <rate>59.997</rate>
        </mode>
      </monitor>
    </logicalmonitor>
    <logicalmonitor>
      <x>928</x>
      <y>2160</y>
      <scale>1</scale>
      <primary>yes</primary>
      <monitor>
        <monitorspec>
          <connector>eDP-1</connector>
          <vendor>IVO</vendor>
          <product>0x057d</product>
          <serial>0x00000000</serial>
        </monitorspec>
        <mode>
          <width>1920</width>
          <height>1080</height>
          <rate>60.010</rate>
        </mode>
      </monitor>
    </logicalmonitor>
  </configuration>
</monitors>
  
Well, so we know the monitor details we want. Note there are two monitors listed here, in this case I want to map the touchscreen to the external Dell monitor. Let's move on to gsettings.

gsettings is of course the configuration storage wrapper GNOME uses (and the CLI tool with the same name). GSettings values follow a specific schema, i.e. a named description of the possible keys and the allowed values for each key. You can list all those, set them, look up the available values, etc.:


    $ gsettings list-recursively
    ... lots of output ...
    $ gsettings set org.gnome.desktop.peripherals.touchpad click-method 'areas'
    $ gsettings range org.gnome.desktop.peripherals.touchpad click-method
    enum
    'default'
    'none'
    'areas'
    'fingers'
  
Now, schemas work fine as-is as long as there is only one instance. Where the same schema is used for different devices (like touchscreens) we use a so-called "relocatable schema" and that requires also specifying a path - and this is where it gets tricky. I'm not aware of any functionality to get the specific path for a relocatable schema so often it's down to reading the source. In the case of touchscreens, the path includes the USB vendor and product ID (in lowercase), e.g. in my case the path is:
  /org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/
In your case you can get the touchscreen details from lsusb, libinput record, /proc/bus/input/devices, etc. Once you have it, gsettings takes a schema:path argument like this:
  $ gsettings list-recursively org.gnome.desktop.peripherals.touchscreen:/org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/
  org.gnome.desktop.peripherals.touchscreen output ['', '', '']
Looks like the touchscreen is bound to no monitor. Let's bind it with the data from above:
 
   $ gsettings set org.gnome.desktop.peripherals.touchscreen:/org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/ output "['DEL', 'DELL S2722QC', '59PKLD3']"
Note the quotes so your shell doesn't misinterpret things.
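
If you want to double-check that the value stuck, you can read the key back (the path here is mine, substitute your own):

  $ gsettings get org.gnome.desktop.peripherals.touchscreen:/org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/ output
  ['DEL', 'DELL S2722QC', '59PKLD3']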

And that's it. Now I have my internal touchscreen mapped to my external monitor which makes no sense at all but shows that you can map a touchscreen to any screen if you want to.

[1] Probably the one that most commonly takes effect since it's the vast vast majority of devices

Debugging an odd inability to stream video

Posted by Matthew Garrett on February 19, 2024 10:30 PM
We have a cabin out in the forest, and when I say "out in the forest" I mean "in a national forest subject to regulation by the US Forest Service" which means there's an extremely thick book describing the things we're allowed to do and (somewhat longer) not allowed to do. It's also down in the bottom of a valley surrounded by tall trees (the whole "forest" bit). There used to be AT&T copper but all that infrastructure burned down in a big fire back in 2021 and AT&T no longer supply new copper links, and Starlink isn't viable because of the whole "bottom of a valley surrounded by tall trees" thing along with regulations that prohibit us from putting up a big pole with a dish on top. Thankfully there's LTE towers nearby, so I'm simply using cellular data. Unfortunately my provider rate limits connections to video streaming services in order to push them down to roughly SD resolution. The easy workaround is just to VPN back to somewhere else, which in my case is just a Wireguard link back to San Francisco.

This worked perfectly for most things, but some streaming services simply wouldn't work at all. Attempting to load the video would just spin forever. Running tcpdump at the local end of the VPN endpoint showed a connection being established, some packets being exchanged, and then… nothing. The remote service appeared to just stop sending packets. Tcpdumping the remote end of the VPN showed the same thing. It wasn't until I looked at the traffic on the VPN endpoint's external interface that things began to become clear.

This probably needs some background. Most network infrastructure has a maximum allowable packet size, which is referred to as the Maximum Transmission Unit or MTU. For ethernet this defaults to 1500 bytes, and these days most links are able to handle packets of at least this size, so it's pretty typical to just assume that you'll be able to send a 1500 byte packet. But what's important to remember is that that doesn't mean you have 1500 bytes of packet payload - that 1500 bytes includes whatever protocol level headers are on there. For TCP/IP you're typically looking at spending around 40 bytes on the headers, leaving somewhere around 1460 bytes of usable payload. And if you're using a VPN, things get annoying. In this case the original packet becomes the payload of a new packet, which means it needs another set of TCP (or UDP) and IP headers, and probably also some VPN header. This still all needs to fit inside the MTU of the link the VPN packet is being sent over, so if the MTU of that is 1500, the effective MTU of the VPN interface has to be lower. For Wireguard, this works out to an effective MTU of 1420 bytes. That means simply sending a 1500 byte packet over a Wireguard (or any other VPN) link won't work - adding the additional headers gives you a total packet size of over 1500 bytes, and that won't fit into the underlying link's MTU of 1500.

And yet, things work. But how? Faced with a packet that's too big to fit into a link, there are two choices - break the packet up into multiple smaller packets ("fragmentation") or tell whoever's sending the packet to send smaller packets. Fragmentation seems like the obvious answer, so I'd encourage you to read Valerie Aurora's article on how fragmentation is more complicated than you think. tl;dr - if you can avoid fragmentation then you're going to have a better life. You can explicitly indicate that you don't want your packets to be fragmented by setting the Don't Fragment bit in your IP header, and then when your packet hits a link whose MTU it exceeds, the router will send back an ICMP packet telling the sender that it's too big, what the actual MTU is, and the sender will resend a smaller packet. This avoids all the hassle of handling fragments in exchange for the cost of a retransmit the first time the MTU is exceeded. It also typically works these days, which wasn't always the case - people had a nasty habit of dropping the ICMP packets telling the sender that the packet was too big, which broke everything.
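
(You can watch the Don't Fragment behaviour yourself with ping. The sizes below assume a standard 1500 byte ethernet MTU and 28 bytes of ICMP and IP headers, and 192.0.2.1 is just a placeholder address:

  # 1472 bytes of payload + 28 bytes of headers = 1500 bytes, so this fits down a standard link
  ping -M do -s 1472 192.0.2.1
  # one more byte and the packet can no longer be sent unfragmented over a 1500 byte link
  ping -M do -s 1473 192.0.2.1

The -M do flag tells ping to prohibit fragmentation rather than letting the kernel split the packet up.)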

What I saw when I tcpdumped on the remote VPN endpoint's external interface was that the connection was getting established, and then a 1500 byte packet would arrive (this is kind of the behaviour you'd expect for video - the connection handshaking involves a bunch of relatively small packets, and then once you start sending the video stream itself you start sending packets that are as large as possible in order to minimise overhead). This 1500 byte packet wouldn't fit down the Wireguard link, so the endpoint sent back an ICMP packet to the remote telling it to send smaller packets. The remote should then have sent a new, smaller packet - instead, about a second after sending the first 1500 byte packet, it sent that same 1500 byte packet. This is consistent with it ignoring the ICMP notification and just behaving as if the packet had been dropped.

All the services that were failing were failing in identical ways, and all were using Fastly as their CDN. I complained about this on social media and then somehow ended up in contact with the engineering team responsible for this sort of thing - I sent them a packet dump of the failure, they were able to reproduce it, and it got fixed. Hurray!

(Between me identifying the problem and it getting fixed I was able to work around it. The TCP header includes a Maximum Segment Size (MSS) field, which indicates the maximum size of the payload for this connection. iptables allows you to rewrite this, so on the VPN endpoint I simply rewrote the MSS to be small enough that the packets would fit inside the Wireguard MTU. This isn't a complete fix since it's done at the TCP level rather than the IP level - so any large UDP packets would still end up breaking)
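
(For the curious, a minimal sketch of that sort of rule, assuming the Wireguard interface is called wg0 and that forwarded traffic is what needs clamping - adjust for your own setup:

  # clamp the MSS advertised in TCP SYNs crossing the tunnel to the path MTU,
  # so peers never try to send segments that won't fit inside the Wireguard link
  iptables -t mangle -A FORWARD -o wg0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Using --set-mss with an explicit value works too if you'd rather pick a number yourself.)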

I've no idea what the underlying issue was, and at the client end the failure was entirely opaque: the remote simply stopped sending me packets. The only reason I was able to debug this at all was because I controlled the other end of the VPN as well, and even then I wouldn't have been able to do anything about it other than being in the fortuitous situation of someone able to do something about it seeing my post. How many people go through their lives dealing with things just being broken and having no idea why, and how do we fix that?

(Edit: thanks to this comment, it sounds like the underlying issue was a kernel bug that Fastly developed a fix for - under certain configurations, the kernel fails to associate the MTU update with the egress interface and so it continues sending overly large packets)


coverity 2022.6.0 and LibreOffice

Posted by Caolán McNamara on February 11, 2024 06:02 PM


After a long slog since November, when the previous version of coverity was EOLed and we had to start using 2022.6.0 with its new suggestions for std::move etc., LibreOffice is now finally back to a 0 warnings coverity state.

New and old apps on Flathub

Posted by Bastien Nocera on February 09, 2024 02:44 PM

3D Printing Slicers

 I recently replaced my Flashforge Adventurer 3 printer that I had been using for a few years as my first printer with a BambuLab X1 Carbon, wanting a printer that was not a “project” so I could focus on modelling and printing. It's an investment, but my partner convinced me that I was using the printer often enough to warrant it, and told me to look out for Black Friday sales, which I did.

The hardware-specific slicer, Bambu Studio, was available for Linux, but only as an AppImage, with many people reporting crashes on startup, non-working video live view, and other problems that the hardware maker tried to work around by shipping separate AppImage variants for Ubuntu and Fedora.

After close to 150 patches to the upstream software (which, in hindsight, I could probably have avoided by compiling the C++ code with LLVM), I managed to “flatpak” the application and make it available on Flathub. It's reached 3k installs in about a month, which is quite a bit for a niche piece of software.

Note that if you click the “Donate” button on the Flathub page, it will take you to a page where you can feed my transformed fossil fuel addiction, that is, buy filament for repairs and printing perfectly fitting everyday items, rather than bulk importing them from the other side of the planet.

[Screenshot: Preparing a Game Gear consoliser shell]

I will continue to maintain the FlashPrint slicer for FlashForge printers, installed by nearly 15k users, although I have now enabled automated updates and will no longer be updating the release notes, which required manual intervention.

FlashForge have unfortunately never answered my queries about making this distribution of their software official (and fixing the crash when using a VPN...).

 Rhythmbox

As I was updating the Rhythmbox Flatpak on Flathub, I realised that it just reached 250k installs, which puts the number of installations of those 3D printing slicers above into perspective.

[Screenshot: rhythmbox-main-window.png, the updated screenshot used on Flathub]

Congratulations, and many thanks, to all the developers that keep on contributing to this very mature project, especially Jonathan Matthew who's been maintaining the app since 2008.

fwupd: Auto-Quitting On Idle, Harder

Posted by Richard Hughes on February 05, 2024 01:25 PM

In fwupd 1.9.12 and earlier we had the following auto-quit behavior: Auto-quit on idle after 2 hours, unless:

  • Any thunderbolt controller, thunderbolt retimer or synaptics-mst devices exist.

These devices are both super slow to query, and querying them uses battery power, as you have to power on various hungry things and then power them down again just to read the current firmware version.

In 1.9.13, due to be released in a few days' time, we now: Auto-quit after 5 minutes, unless:

  • Any thunderbolt controller, thunderbolt retimer or synaptics-mst devices exist.
  • Any D-Bus client (that used or is using fwupd) is still alive, which includes gnome-software if it’s running in the background of the GNOME session
  • The daemon took more than 500ms to start – on the logic that it's okay to wait 0.5 seconds on the CLI to get results to a query, but we don't want to be waiting tens of seconds to check for updates on deeply nested USB hub devices.

The tl;dr is that most laptop and desktop machines have Thunderbolt or MST devices, and so they already had fwupd running all the time before, and continue to have it running all the time now. Trading 3.3MB of memory and an extra process for instant queries on a machine with GBs of memory is probably worthwhile. For embedded machines like IoT devices, and for containers (that are using fwupd to update things like the dbx) fwupd was probably starting and then quitting after 2h before, and now fwupd is only going to be alive for 5 minutes before quitting.

If any of the thresholds (500 ms) or timeouts (5 mins) are offensive to you then it’s all configurable, see man fwupd.conf for details. Comments welcome.
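
For example (the key name here is my best guess from reading the man page, so treat it as an assumption and double-check against man fwupd.conf on your system), restoring the old two-hour behavior would look something like this in /etc/fwupd/fwupd.conf:

[fwupd]
# seconds of inactivity before the daemon auto-quits (assumed key name, verify locally)
IdleTimeout=7200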

Re: New responsibilities

Posted by Bastien Nocera on January 31, 2024 11:33 AM

 A few months have passed since New Responsibilities was posted, so I thought I would provide an update.

Projects Maintenance

Of all the freedesktop projects I created and maintained, only one doesn't have a new maintainer: low-memory-monitor.

This daemon is what the GMemoryMonitor GLib API is based on, so it can't be replaced trivially. Efforts seem to be under way to replace it with systemd APIs.

As for the other daemons:

(As an aside, there's posturing towards replacing power-profiles-daemon with tuned in Fedora. I would advise stakeholders to figure out whether having a large Python script in the boot hot path is a good idea, taking a look at bootcharts, and then thinking about whether hardware manufacturers would be able to help with supporting a tool with so many moving parts. Useful for tinkering, not for shipping in a product)

Updated responsibilities

Since mid-August, I've joined the Platform Enablement Team. Right now, I'm helping out with maintenance of the Bluetooth kernel stack in RHEL (and thus CentOS).

The goal is to eventually pivot to hardware enablement, which is likely to involve backporting and testing, more so than upstream enablement. This is currently dependent on attending some formal kernel development (and debugging) training sessions which should make it easier to see where my hodge-podge kernel knowledge stands.

Blog backlog

Before being moved to a different project, and apart from the usual and very time-consuming bug triage, user support and project maintenance, I also worked on a few new features. I have a few posts planned that will lay that out.

New gitlab.freedesktop.org 🚯 emoji-based spamfighting abilities

Posted by Peter Hutterer on January 29, 2024 07:58 AM

This is a follow-up from our Spam-label approach, but this time with MOAR EMOJIS because that's what the world is turning into.

Since March 2023, projects have been able to apply the "Spam" label on any new issue and have a magic bot come in and purge the user account plus all issues they've filed, see the earlier post for details. This works quite well and gives every project member the ability to quickly purge spam. Alas, pesky spammers are using other approaches to trick google into indexing their pork [1] (because at this point I think all this crap is just SEO spam anyway). Such as commenting on issues and merge requests. We can't apply labels to comments, so we found a way to work around that: emojis!

In GitLab you can add "reactions" to issue/merge request/snippet comments and in recent GitLab versions you can register for a webhook to be notified when that happens. So what we've added to the gitlab.freedesktop.org instance is support for the :do_not_litter: (🚯) emoji [2] - if you set that on a comment the author of said comment will be blocked and the comment content will be removed. After some safety checks of course, so you can't just go around blocking everyone by shotgunning emojis into gitlab. Unlike the "Spam" label this does not currently work recursively so it's best to report the user so admins can purge them properly - ideally before setting the emoji so the abuse report contains the actual spam comment instead of the redacted one. Also note that there is a 30 second grace period to quickly undo the emoji if you happen to set it accidentally.

Note that for purging issues, the "Spam" label is still required, the emojis only work for comments.

Happy cleanup!

[1] or pork-ish
[2] Benjamin wanted to use :poop: but there's a chance that may get used for expressing disagreement with the comment in question

A re-introduction to mkosi -- A Tool for Generating OS Images

Posted by Lennart Poettering on January 09, 2024 11:00 PM

This is a guest post written by Daan De Meyer, systemd and mkosi maintainer

Almost 7 years ago, Lennart first wrote about mkosi on this blog. Some years ago, I took over development and there's been a huge amount of changes and improvements since then. So I figure this is a good time to re-introduce mkosi.

mkosi stands for Make Operating System Image. It generates OS images that can be used for a variety of purposes.

If you prefer watching a video over reading a blog post, you can also watch my presentation on mkosi at All Systems Go 2023.

What is mkosi?

mkosi was originally written as a tool to simplify hacking on systemd and for experimenting with images using many of the new concepts being introduced in systemd at the time. In the meantime, it has evolved into a general purpose image builder that can be used in a multitude of scenarios.

Instructions to install mkosi can be found in its readme. We recommend running the latest version to take advantage of all the latest features and bug fixes. You'll also need bubblewrap and the package manager of your favorite distribution to get started.

At its core, the workflow of mkosi can be divided into 3 steps:

  1. Generate an OS tree for some distribution by installing a set of packages.
  2. Package up that OS tree in a variety of output formats.
  3. (Optionally) Boot the resulting image in qemu or systemd-nspawn.

Images can be built for any of the following distributions:

  • Fedora Linux
  • Ubuntu
  • OpenSUSE
  • Debian
  • Arch Linux
  • CentOS Stream
  • RHEL
  • Rocky Linux
  • Alma Linux

And the following output formats are supported:

  • GPT disk images built with systemd-repart
  • Tar archives
  • CPIO archives (for building initramfs images)
  • USIs (Unified System Images which are full OS images packed in a UKI)
  • Sysext, confext and portable images
  • Directory trees

For example, to build an Arch Linux GPT disk image and boot it in qemu, you can run the following command:

$ mkosi -d arch -p systemd -p udev -p linux -t disk qemu

To instead boot the image in systemd-nspawn, replace qemu with boot:

$ mkosi -d arch -p systemd -p udev -p linux -t disk boot

The actual image can be found in the current working directory, named image.raw. However, using a separate output directory is recommended, which is as simple as running mkdir mkosi.output.

To rebuild the image after it's already been built once, add -f to the command line before the verb to rebuild the image. Any arguments passed after the verb are forwarded to either systemd-nspawn or qemu itself. To build the image without booting it, pass build instead of boot or qemu or don't pass a verb at all.

By default, the disk image will have an appropriately sized root partition and an ESP partition, but the partition layout and contents can be fully customized using systemd-repart by creating partition definition files in mkosi.repart/. This allows you to customize the partition as you see fit:

  • The root partition can be encrypted.
  • Partition sizes can be customized.
  • Partitions can be protected with signed dm-verity.
  • You can opt out of having a root partition and only have a /usr partition instead.
  • You can add various other partitions, e.g. an XBOOTLDR partition or a swap partition.
  • ...

As part of building the image, we'll run various tools such as systemd-sysusers, systemd-firstboot, depmod, systemd-hwdb and more to make sure the image is set up correctly.

Configuring mkosi image builds

Naturally with extended use you don't want to specify all settings on the command line every time, so mkosi supports configuration files where the same settings that can be specified on the command line can be written down.

For example, the command we used above can be written down in a configuration file mkosi.conf:

[Distribution]
Distribution=arch

[Output]
Format=disk

[Content]
Packages=
        systemd
        udev
        linux

Like systemd, mkosi uses INI configuration files. We also support dropins which can be placed in mkosi.conf.d. Configuration files can also be conditionalized using the [Match] section. For example, to only install a specific package on Arch Linux, you can write the following to mkosi.conf.d/10-arch.conf:

[Match]
Distribution=arch

[Content]
Packages=pacman

Because not everything you need will be supported in mkosi, we support running scripts at various points during the image build process where all extra image customization can be done. For example, if it is found, mkosi.postinst is called after packages have been installed. Scripts are executed on the host system by default (in a sandbox), but can be executed inside the image by suffixing the script with .chroot, so if mkosi.postinst.chroot is found it will be executed inside the image.
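
As an illustrative sketch (the enabled service is only an example and assumes the corresponding package is installed in the image), a mkosi.postinst.chroot script could look like this:

#!/bin/sh
set -e
# This runs inside the image, so systemctl operates on the image's own configuration.
systemctl enable systemd-networkd.service
# Any other in-image tweaks go here.
echo "built with mkosi" > /etc/motd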

To add extra files to the image, you can place them in mkosi.extra in the source directory and they will be automatically copied into the image after packages have been installed.

Bootable images

If the necessary packages are installed, mkosi will automatically generate a UEFI/BIOS bootable image. As mkosi is a systemd project, it will always build UKIs (Unified Kernel Images), except if the image is BIOS-only (since UKIs cannot be used on BIOS). The initramfs is built like a regular image by installing distribution packages and packaging them up in a CPIO archive instead of a disk image. Specifically, we do not use dracut, mkinitcpio or initramfs-tools to generate the initramfs from the host system. ukify is used to assemble all the individual components into a UKI.

If you don't want mkosi to generate a bootable image, you can set Bootable=no to explicitly disable this logic.

Using mkosi for development

The main requirement to use mkosi for development is that we can build our source code against the image we're building and install it into the image we're building. mkosi supports this via build scripts. If a script named mkosi.build (or mkosi.build.chroot) is found, we'll execute it as part of the build. Any files put by the build script into $DESTDIR will be installed into the image. Required build dependencies can be installed using the BuildPackages= setting. These packages are installed into an overlay which is put on top of the image when running the build script so the build packages are available when running the build script but don't end up in the final image.

An example mkosi.build.chroot script for a project using meson could look as follows:

#!/bin/sh
meson setup "$BUILDDIR" "$SRCDIR"
ninja -C "$BUILDDIR"
if [ "${WITH_TESTS:-0}" -ne 0 ]; then
    meson test -C "$BUILDDIR"
fi
meson install -C "$BUILDDIR"

Now, every time the image is built, the build script will be executed and the results will be installed into the image.

The $BUILDDIR environment variable points to a directory that can be used as the build directory for build artifacts to allow for incremental builds if the build system supports it.

Of course, downloading all packages from scratch every time and re-installing them again every time the image is built is rather slow, so mkosi supports two modes of caching to speed things up.

The first caching mode caches all downloaded packages so they don't have to be downloaded again on subsequent builds. Enabling this is as simple as running mkdir mkosi.cache.

The second mode of caching caches the image after all packages have been installed but before running the build script. On subsequent builds, mkosi will copy the cache instead of reinstalling all packages from scratch. This mode can be enabled using the Incremental= setting. While there is some rudimentary cache invalidation, the cache can also forcibly be rebuilt by specifying -ff on the command line instead of -f.

Note that when running on a btrfs filesystem, mkosi will automatically use subvolumes for the cached images which can be snapshotted on subsequent builds for even faster rebuilds. We'll also use reflinks to do copy-on-write copies where possible.

With this setup, by running mkosi -f qemu in the systemd repository, it takes about 40 seconds to go from a source code change to a root shell in a virtual machine running the latest systemd with your change applied. This makes it very easy to test changes to systemd in a safe environment without risk of breaking your host system.

Of course, while 40 seconds is not a very long time, it's still more than we'd like, especially if all we're doing is modifying the kernel command line. That's why we have the KernelCommandLineExtra= option to configure kernel command line options that are passed to the container or virtual machine at runtime instead of being embedded into the image. These extra kernel command line options are picked up when the image is booted with qemu's direct kernel boot (using -append), but also when booting a disk image in UEFI mode (using SMBIOS). The same applies to systemd credentials (using the Credentials= setting). These settings allow configuring the image without having to rebuild it, which means that you only have to run mkosi qemu or mkosi boot again afterwards to apply the new settings.

Building images without root privileges and loop devices

By using newuidmap/newgidmap and systemd-repart, mkosi is able to build images without needing root privileges. As long as proper subuid and subgid mappings are set up for your user in /etc/subuid and /etc/subgid, you can run mkosi as your regular user without having to switch to root.
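
For reference, an entry in /etc/subuid (and similarly /etc/subgid) looks like the line below; the user name and ranges are just an example, and many distributions add such a mapping automatically when a user is created:

myuser:100000:65536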

Note that as of the writing of this blog post this only applies to the build and qemu verbs. Booting the image in a systemd-nspawn container with mkosi boot still needs root privileges. We're hoping to fix this in a future systemd release.

Regardless of whether you're running mkosi with root or without root, almost every tool we execute is invoked in a sandbox to isolate as much of the build process from the host as possible. For example, /etc and /var from the host are not available in this sandbox, to avoid host configuration inadvertently affecting the build.

Because systemd-repart can build disk images without loop devices, mkosi can run from almost any environment, including containers. All that's needed is a UID range with 65536 UIDs available, either via running as the root user or via /etc/subuid and newuidmap. In a future systemd release, we're hoping to provide an alternative to newuidmap and /etc/subuid to allow running mkosi from all containers, even those with only a single UID available.

Supporting older distributions

mkosi depends on very recent versions of various systemd tools (v254 or newer). To support older distributions, we implemented so called tools trees. In short, mkosi can first build a tools image for you that contains all required tools to build the actual image. This can be enabled by adding ToolsTree=default to your mkosi configuration. Building a tools image does not require a recent version of systemd.

In the systemd mkosi configuration, we automatically use a tools tree if we detect your distribution does not have the minimum required systemd version installed.

Configuring variants of the same image using profiles

Profiles can be defined in the mkosi.profiles/ directory. The profile to use can be selected using the Profile= setting (or --profile=) on the command line. A profile allows you to bundle various settings behind a single recognizable name. Profiles can also be matched on if you want to apply some settings only to a few profiles.

For example, you could have a bootable profile that sets Bootable=yes, adds the linux and systemd-boot packages and configures Format=disk to end up with a bootable disk image when passing --profile bootable on the mkosi command line.
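
A sketch of what that could look like in mkosi.profiles/bootable.conf (section placement follows my reading of the settings used earlier in this post, and the package names are distribution dependent):

[Output]
Format=disk

[Content]
Bootable=yes
Packages=linux
         systemd-boot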

Building system extension images

System extension images may, dynamically at runtime, extend the base system with an overlay containing additional files.

To build system extensions with mkosi, we need a base image on top of which we can build our extension.

To keep things manageable, we'll make use of mkosi's support for building multiple images so that we can build our base image and system extension in one go.

We start by creating a temporary directory with a base configuration file mkosi.conf with some shared settings:

[Output]
OutputDirectory=mkosi.output
CacheDirectory=mkosi.cache

Now let's continue with the base image definition by writing the following to mkosi.images/base/mkosi.conf:

[Output]
Format=directory

[Content]
CleanPackageMetadata=no
Packages=systemd
         udev

We use the directory output format here instead of the disk output so that we can build our extension without needing root privileges.

Now that we have our base image, we can define a sysext that builds on top of it by writing the following to mkosi.images/btrfs/mkosi.conf:

[Config]
Dependencies=base

[Output]
Format=sysext
Overlay=yes

[Content]
BaseTrees=%O/base
Packages=btrfs-progs

BaseTrees= points to our base image and Overlay=yes instructs mkosi to only package the files added on top of the base tree.

We can't sign the extension image without a key. We can generate one by running mkosi genkey which will generate files that are automatically picked up when building the image.

Finally, you can build the base image and the extensions by running mkosi -f. You'll find btrfs.raw in mkosi.output which is the extension image.

Various other interesting features

  • To sign any generated UKIs for secure boot, put your secure boot key and certificate in mkosi.key and mkosi.crt and enable the SecureBoot= setting. You can also run mkosi genkey to have mkosi generate a key and certificate itself.
  • The Ephemeral= setting can be enabled to boot the image in an ephemeral copy that is thrown away when the container or virtual machine exits.
  • ShimBootloader= and BiosBootloader= settings are available to configure shim and grub installation if needed.
  • mkosi can boot directory trees in a virtual machine using virtiofsd. This is very useful for quickly rebuilding an image and booting it as the image does not have to be packed up as a disk image.
  • ...

There are many more features that we won't go over in detail in this blog post. Learn more about those by reading the documentation.

Conclusion

I'll finish with a bunch of links to more information about mkosi and related tooling:

Looking for LogoFAIL on your local system

Posted by Richard Hughes on January 09, 2024 01:45 PM

A couple of months ago, Binarly announced LogoFAIL which is a pretty serious firmware security problem. There is lots of complexity Alex explains much better than I might, but essentially a huge amount of system firmware running right now is vulnerable: The horribly-insecure parsing in the firmware allows the user to use a corrupted OEM logo (the one normally shown as the system boots) to run whatever code they want, providing a really useful primitive to do basically anything the attacker wants when running in a super-privileged boot state.

Vendors have to release new firmware versions to address this, and OEMs using the LVFS have pumped out millions of updates over the last few weeks.

So, what can we do to check that your system firmware has been patched [correctly] by the OEM? The only real way we can detect this is by dumping the BIOS in userspace, decompressing the various sections and looking at the EFI binary responsible for loading the image. In an ideal world we’d be able to look at the embedded SBoM entry for the specific DXE, but that’s not a universe we live in yet — although it is something I’m pushing the IBVs really hard to do. What we can do right now is token matching (or control flow analysis) to detect the broken and fixed image loader versions.

The four words “decompressing the various sections” hide how complicated taking an Intel Flash Descriptor image and breaking it into EFI binaries actually is. There are many levels of Matryoshka doll stacking involving hideous custom LZ77 and Huffman decompressors, and of course vendor-specific section types. It’s been several programmer-months spread over the last few years figuring it all out. Programs like UEFITool do a very good job, but we need to do something super-lightweight (and paranoid) at every system boot as part of the HSI tests. We only really want to stream a few kBs of SPI contents, not MBs as it’s actually quite slow and we only need a few hundred bytes to analyze.

In Fedora 40 all the kernel parts are in place to actually get the image from userspace in a sane way. It’s a 100% read-only interface, so don’t panic about bricking your system. This is currently Intel-only — AMD wasn’t super-keen on allowing userspace read access to the SPI, even as root — even though it’s the same data you can get with a $2 SPI programmer and 30 seconds with a Pomona clip.

Intel laptops and servers should both have an Intel PCI SPI controller — but some OEMs manually hide it for dubious reasons — and if that’s the case there’s nothing we can do I’m afraid.

You can help the fwupd project by contributing test firmware we can use to verify we parse it correctly, and to prevent regressions in the future. Please follow these steps only if:

  1. You have an Intel CPU laptop, desktop or server machine
  2. You’re running Fedora 39, (no idea on other distros, but you’ll need at least CONFIG_MTD_SPI_NOR, CONFIG_SPI_INTEL_PCI and CONFIG_SPI_MEM to be enabled in the kernel)
  3. You’re comfortable installing and removing a kernel on the command line
  4. There’s not already a test image for the same model provided by someone else
  5. You are okay with uploading your SPI contents to the internet
  6. You’re running the OEM-provided firmware, and not something like coreboot
  7. You’re aware that the firmware image we generate may have an encrypted version of your BIOS supervisor password (if set) and also all of the EFI attribute keys you’ve manually set, or that have been set by the various crash reporting programs.
  8. The machine is not a secure production system or a machine you don’t actually own.

Okay, let's get started:

sudo dnf update kernel --releasever 40

Then reboot into the new kernel, manually selecting the fc40 entry on the grub menu if required. We can check that the Intel SPI controller is visible.

$ cat /sys/class/mtd/mtd0/name 
BIOS

Assuming it’s indeed BIOS and not some other random system MTD device, lets continue.

$ sudo cat /dev/mtd0 > lenovo-p1-gen4.bin

The filename should be lowercase, have no spaces, and identify the machine you’re using — using the SKU if that’s easier.

Then we want to compress it (as it will have a lot of 0xFF padding bytes) and encrypt it (otherwise github will get most upset that you’re attaching something containing “binary code”):

zip lenovo-p1-gen4.zip lenovo-p1-gen4.bin -e
Enter password: fwupd
Verify password: fwupd

It’s easier if you use the password of “fwupd” (lowercase, no quotes) but if you’d rather send the image with a custom password just get the password to me somehow. Email, mastodon DM, carrier pigeon, whatever.

If you’re happy sharing the image, can you please create an issue and then attach the zip file and wait for me to download the file and close the issue. I also promise that I’m only using the provided images for testing fwupd IFD parsing, rather than anything more scary.

NOTE: If you’re getting a permission error (even running with sudo) you’re probably hitting a kernel MTD issue we’re trying to debug and fix. I wrote a python script that can be run as root to try to get each partition in turn.
If this script works, can you please also paste the output of that script into the submitted github issue.

Thanks!

Crosswords 0.3.12: Two-toned Editor

Posted by Jonathan Blandford on January 05, 2024 06:43 PM

Happy 2024! It’s time for the first GNOME Crosswords update of the year. A ton of work happened since the last release — primarily in the Editor. I’ll try to keep this update short and sweet. Packages will be available in flathub momentarily.

Move to libadwaita-1.4

The biggest visual change is that we moved away from libpanel in the Editor window. I found I was fighting with that library constantly due to the mode changes in the Editor, and I wasn’t able to take advantage of many of libpanel’s cooler features. Ultimately, libpanel wasn’t designed for workspaces and it looked like it would be a prohibitive amount of work to add them. Instead, I’m using an AdwOverlaySplitView, custom paned widget, and a lot of GtkStacks. This results in a cleaner appearance at the cost of a lot of flexibility. As an additional plus, it gives the Editor the snazzy two-toned look that’s currently in vogue:

The plan for the bottom of the content section is to provide some useful information for whatever operation is being shown. I put in some rudimentary statistics about the grid, as well as some basic support for authoring clues. I added three simple sections just to get something up and running: Anagrams, Odds, and Evens. The anagram calculations were based on work done by Pratham for his GSoC project. I was able to finish it and incorporate it into the UI.

Also, note that the histogram bars are rendered via GtkLabels with repeated block characters (U+2588). Thanks to Christian for that great suggestion — it was a quick way to get something visible.

Stateless Apps

For this release, we fully embraced a stateless control flow for the Editor. Federico and I have been talking about this for a while and it took some experimenting to find a model that works well. But given the complexity of the app (data and flow), it ended up simplifying the codebase substantially.

A neat example of why this is useful: there are three distinct grids in the main content section for each stage (GRID, CLUES, and STYLE). Instead of using the traditional GtkStack to swap between them, I’m able to simply swap out the state to render the correct one.

What’s next?

Word data

It’s become clear that the most important feature needed for the editor to be competitive is to improve the word list and the clue authoring section. Both the data and the filtering code need to be revisited. The word list code was originally written just for the autofill dialog and had to be fast for that. But as it’s progressed we’ve needed a lot more flexibility with the filtering. I don’t want to implement full regexp support as it won’t be fast enough, but something like nutrimatic might be more warranted (though it can be slow as well).

The data, too, needs attention. The Broda word-list is serviceable, but provides options that aren’t appropriate for all puzzles. A co-worker pointed me to the free wordnik wordlist, which I was able to add as a compile-time alternative. Libreoffice also has its own wordlists, though they aren’t designed for puzzle writing. But beyond the words, I’m going to need:

  • Definitions of words
  • Synonyms of words
  • Multi-word combos (along with punctuation)
  • Word frequency counts, to give prioritization to words/phrases
  • Offline support. As much as possible, not requiring network access is preferred

In addition, to the extent there’s interest, I’d love to support this in other languages beyond English.

I know there are a number of options for these already for free desktops. If you work in this space (or are interested in seeing GNOME’s options improved here) drop me a line.

Acrostics

I also want to highlight that Tanmay has continued to work hard on Acrostic support long after his GSoC internship ended. Not content with just adding acrostic support to the game, he has been adding acrostics to the Editor as well. It's turned out to be quite a hard problem, with a lot of algorithmic complexity involved with auto-generating the grid in a reasonable amount of time. It has also required him to fix issues all over the codebase (including in glib!). Great work, Tanmay!

Thanks

  • Christian for libpanel advice and that cute histogram trick
  • Federico for helping me reason about complexity, CI fixes, and translations
  • Tanmay for tests and charset work (and future Acrostic work)
  • Pratham for anagram support (GSoC)
  • Rosanna for wording and UX suggestions
  • Davide for doc fixes and translations

Dealing with weird ELF libraries

Posted by Matthew Garrett on January 02, 2024 07:04 PM
Libraries are collections of code that are intended to be usable by multiple consumers (if you're interested in the etymology, watch this video). In the old days we had what we now refer to as "static" libraries, collections of code that existed on disk but which would be copied into newly compiled binaries. We've moved beyond that, thankfully, and now make use of what we call "dynamic" or "shared" libraries - instead of the code being copied into the binary, a reference to the library function is incorporated, and at runtime the code is mapped from the on-disk copy of the shared object[1]. This allows libraries to be upgraded without needing to modify the binaries using them, and if multiple applications are using the same library at once it only requires that one copy of the code be kept in RAM.

But for this to work, two things are necessary: when we build a binary, there has to be a way to reference the relevant library functions in the binary; and when we run a binary, the library code needs to be mapped into the process.

(I'm going to somewhat simplify the explanations from here on - things like symbol versioning make this a bit more complicated but aren't strictly relevant to what I was working on here)

For the first of these, the goal is to replace a call to a function (eg, printf()) with a reference to the actual implementation. This is the job of the linker rather than the compiler (eg, if you use the -c argument to tell gcc to simply compile to an object rather than linking an executable, it's not going to care about whether or not every function called in your code actually exists or not - that'll be figured out when you link all the objects together), and the linker needs to know which symbols (which aren't just functions - libraries can export variables or structures and so on) are available in which libraries. You give the linker a list of libraries, it extracts the symbols available, and resolves the references in your code with references to the library.

But how is that information extracted? Each ELF object has a fixed-size header that contains references to various things, including a reference to a list of "section headers". Each section has a name and a type, but the ones we're interested in are .dynstr and .dynsym. .dynstr contains a list of strings, representing the name of each exported symbol. .dynsym is where things get more interesting - it's a list of structs that contain information about each symbol. This includes a bunch of fairly complicated stuff that you need to care about if you're actually writing a linker, but the relevant entries for this discussion are an index into .dynstr (which means the .dynsym entry isn't sufficient to know the name of a symbol, you need to extract that from .dynstr), along with the location of that symbol within the library. The linker can parse this information and obtain a list of symbol names and addresses, and can now replace the call to printf() with a reference to libc instead.

(Note that it's not possible to simply encode this as "Call this address in this library" - if the library is rebuilt or is a different version, the function could move to a different location)

Experimentally, .dynstr and .dynsym appear to be sufficient for linking a dynamic library at build time - there are other sections related to dynamic linking, but you can link against a library that's missing them. Runtime is where things get more complicated.

When you run a binary that makes use of dynamic libraries, the code from those libraries needs to be mapped into the resulting process. This is the job of the runtime dynamic linker, or RTLD[2]. The RTLD needs to open every library the process requires, map the relevant code into the process's address space, and then rewrite the references in the binary into calls to the library code. This requires more information than is present in .dynstr and .dynsym - at the very least, it needs to know the list of required libraries.

There's a separate section called .dynamic that contains another list of structures, and it's the data here that's used for this purpose. For example, .dynamic contains a bunch of entries of type DT_NEEDED - this is the list of libraries that an executable requires. There's also a bunch of other stuff that's required to actually make all of this work, but the only thing I'm going to touch on is DT_HASH. Doing all this re-linking at runtime involves resolving the locations of a large number of symbols, and if the only way you can do that is by reading a list from .dynsym and then looking up every name in .dynstr that's going to take some time. The DT_HASH entry points to a hash table - the RTLD hashes the symbol name it's trying to resolve, looks it up in that hash table, and gets the symbol entry directly (it still needs to resolve that against .dynstr to make sure it hasn't hit a hash collision - if it has it needs to look up the next hash entry, but this is still generally faster than walking the entire .dynsym list to find the relevant symbol). There's also DT_GNU_HASH which fulfills the same purpose as DT_HASH but uses a more complicated algorithm that performs even better. .dynamic also contains entries pointing at .dynstr and .dynsym, which seems redundant but will become relevant shortly.

So, .dynsym and .dynstr are required at build time, and both are required along with .dynamic at runtime. This seems simple enough, but obviously there's a twist and I'm sorry it's taken so long to get to this point.

I bought a Synology NAS for home backup purposes (my previous solution was a single external USB drive plugged into a small server, which had uncomfortable single point of failure properties). Obviously I decided to poke around at it, and I found something odd - all the libraries Synology ships were entirely lacking any ELF section headers. This meant no .dynstr, .dynsym or .dynamic sections, so how was any of this working? nm asserted that the libraries exported no symbols, and readelf agreed. If I wrote a small app that called a function in one of the libraries and built it, gcc complained that the function was undefined. But executables on the device were clearly resolving the symbols at runtime, and if I loaded them into ghidra the exported functions were visible. If I dlopen()ed them, dlsym() couldn't resolve the symbols - but if I hardcoded the offset into my code, I could call them directly.

Things finally made sense when I discovered that if I passed the --use-dynamic argument to readelf, I did get a list of exported symbols. It turns out that ELF is weirder than I realised. As well as the aforementioned section headers, ELF objects also include a set of program headers. One of the program header types is PT_DYNAMIC. This typically points to the same data that's present in the .dynamic section. Remember when I mentioned that .dynamic contained references to .dynsym and .dynstr? This means that simply pointing at .dynamic is sufficient, there's no need to have separate entries for them.
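
Roughly, that lookup path is: walk the program headers, find PT_DYNAMIC, and everything else follows from the entries it points at. A sketch, with the same 64-bit and no-error-handling caveats as before:

#include <elf.h>
#include <stddef.h>

/* Find the dynamic table using only the program headers - no section headers
 * involved, which is why this still works on the Synology libraries. */
Elf64_Dyn *find_dynamic(unsigned char *base)
{
    Elf64_Ehdr *ehdr = (Elf64_Ehdr *)base;
    Elf64_Phdr *phdr = (Elf64_Phdr *)(base + ehdr->e_phoff);

    for (int i = 0; i < ehdr->e_phnum; i++)
        if (phdr[i].p_type == PT_DYNAMIC)
            /* the same data the .dynamic section header would have described */
            return (Elf64_Dyn *)(base + phdr[i].p_offset);
    return NULL;
}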

The same information can be reached from two different locations. The information in the section headers is used at build time, and the information in the program headers at run time[3]. I do not have an explanation for this. But if the information is present in two places, it seems obvious that it should be possible to reconstruct the missing section headers in my weird libraries? So that's what this does. It extracts information from the DYNAMIC entry in the program headers and creates equivalent section headers.

There's one thing that makes this more difficult than it might seem. The section header for .dynsym has to contain the number of symbols present in the section. And that information doesn't directly exist in DYNAMIC - to figure out how many symbols exist, you're expected to walk the hash tables and keep track of the largest number you've seen. Since every symbol has to be referenced in the hash table, once you've hit every entry the largest number is the number of exported symbols. This seemed annoying to implement, so instead I cheated, added code to simply pass in the number of symbols on the command line, and then just parsed the output of readelf against the original binaries to extract that information and pass it to my tool.
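
For the classic SysV DT_HASH table, that walk looks roughly like the sketch below (the format uses 32-bit words laid out as nbucket, nchain, bucket[nbucket], chain[nchain]; DT_GNU_HASH needs a different traversal, and my tool dodged the whole question by taking the count on the command line anyway):

#include <stdint.h>

/* Walk every bucket and chain, tracking the largest symbol index seen.
 * In this format nchain also equals the number of .dynsym entries, so the
 * walk doubles as a sanity check. */
uint32_t count_dynsyms(const uint32_t *hash)
{
    uint32_t nbucket = hash[0];
    uint32_t nchain = hash[1];
    const uint32_t *bucket = hash + 2;
    const uint32_t *chain = bucket + nbucket;
    uint32_t max = 0;

    for (uint32_t b = 0; b < nbucket; b++)
        for (uint32_t idx = bucket[b]; idx != 0 && idx < nchain; idx = chain[idx])
            if (idx > max)
                max = idx;

    return max + 1;   /* index 0 is the reserved null symbol */
}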

Somehow, this worked. I now have a bunch of library files that I can link into my own binaries to make it easier to figure out how various things on the Synology work. Now, could someone explain (a) why this information is present in two locations, and (b) why the build-time linker and run-time linker disagree on the canonical source of truth?

[1] "Shared object" is the source of the .so filename extension used in various Unix-style operating systems
[2] You'll note that "RTLD" is not an acronym for "runtime dynamic linker", because reasons
[3] For environments using the GNU RTLD, at least - I have no idea whether this is the case in all ELF environments


Wayland proxy load balancer

Posted by Martin Stransky on December 22, 2023 01:56 PM

Updated Dec 23

Wayland clients (applications) may face various difficulties that are not primarily caused by the applications themselves. There are three main Wayland compositors (Mutter/Gnome, KWin/KDE and WLRoots/Sway) and each of them behaves differently in some corner cases not exactly defined by the Wayland standards (or bends the specification somehow).

In the X11 world, the underlying X.Org implementation is the same for every desktop environment, and the differences are introduced by window managers. Wayland merges the window manager (like Metacity) and the renderer (the X server's role) into one component.

One of the top Firefox Wayland crash bugs is the 'Lost connection to Wayland compositor' one.

Mutter (and maybe other compositors) terminates a Wayland client if it's recognized as stalled. That usually means the Wayland client doesn't read messages from the Wayland display socket fast enough and the compositor's message output buffer is full. It may be a bug in the application itself (the event loop isn't being processed) or it may be caused by input devices like a 1000 Hz mouse which generates too many events.

Unfortunately the Wayland protocol doesn't implement any kind of display connection management. Once the connection is lost or disconnected, there isn't any way to restore it and the Wayland client is terminated. The most visible example is a Wayland compositor crash, which takes down all applications; there isn't any recovery point available.

There are various discussions going on [1],[2] but they're stalled too 😀, as is the discussion about PIP – a Picture-in-Picture Wayland protocol extension implementation [3].

But Firefox needs to solve the crashes caused by the message jam right now, as it's shipped to a wide audience. An initial idea that popped up in discussion was to create a proxy between Firefox and the Wayland compositor to cache messages and prevent the compositor's message queue from overflowing. It's a very simplified case of WayPipe, which routes Wayland communication over the network and adds network transparency to the Wayland protocol.

There's an initial successful proof-of-concept written in Rust, and I implemented it in C++ as the wayland-proxy module, which can be shipped with Firefox (mainly because my Rust knowledge is non-existent).
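
As a rough illustration of the idea (not the actual wayland-proxy code), a relay loop in plain C might look like the sketch below. It only shows the compositor-to-client direction and ignores file descriptor passing, buffer limits and partial-write corner cases, all of which a real proxy has to handle:

#include <poll.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE (1024 * 1024)

/* Keep draining the compositor socket into a local buffer so the compositor's
 * own output queue never fills up while the client (Firefox) is busy. */
void relay(int compositor_fd, int client_fd)
{
    static char buf[BUF_SIZE];
    size_t used = 0;

    for (;;) {
        struct pollfd fds[2] = {
            { .fd = compositor_fd, .events = used < BUF_SIZE ? POLLIN : 0 },
            /* only try to write to the client when we have buffered data */
            { .fd = client_fd, .events = used ? POLLOUT : 0 },
        };
        if (poll(fds, 2, -1) < 0)
            return;

        if (fds[0].revents & POLLIN) {
            ssize_t n = read(compositor_fd, buf + used, BUF_SIZE - used);
            if (n <= 0)
                return;
            used += n;
        }
        if (fds[1].revents & POLLOUT) {
            ssize_t n = write(client_fd, buf, used);
            if (n <= 0)
                return;
            memmove(buf, buf + n, used - n);
            used -= n;
        }
    }
}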

Wayland-proxy can be used as a standalone application or as a library included in a Wayland application. It's shipped with Fedora Firefox 121.0 right now, so let's see how it works.

UPDATE: Thanks to the valuable feedback in the comments, it's definitely worth looking at! There are more issues which need a different approach, and the Firefox problem also depends on Gtk3 and its way of handling the Wayland connection. I was also pointed to the Qt 6.6 robustness project, which implements Wayland reconnection at the Qt toolkit level.

Mozilla ships Firefox 121.0 with Wayland enabled

Posted by Martin Stransky on December 20, 2023 12:32 PM
[Image: Wayland pub at Brno city - even my hometown Brno has its own Wayland 😀]

Firefox on Linux hit another milestone as Mozilla defaults to the Wayland backend instead of XWayland X11 emulation in Firefox 121. It's a logical step, as XWayland emulation introduces bugs from both the Wayland and X11 worlds together, so it's better to run Wayland directly.

As Fedora has provided Firefox with the Wayland backend for years, this change mainly affects Ubuntu and its Firefox/Snap users (if Canonical decides to follow Mozilla here), Firefox shipped as a Flatpak, and the next Firefox ESR and Thunderbird releases.

And what to expect? Besides all the goodies, I need to mention Wayland's differences and regressions from X11.

  • Wayland doesn't allow applications to position themselves on the desktop, and an application can't place itself on a particular workspace.

    It means the Firefox session restore feature works differently on Wayland and all Firefox windows are restored on the first workspace.
  • A Wayland client can't place itself on top, above other clients, which is another Wayland security feature. It affects Firefox Picture-in-Picture (PIP) windows; they can't be placed on top by Firefox itself.

    You need to guide the Wayland compositor to keep it there. Fortunately a GNOME extension can be installed for that, and KDE allows you to create a window rule.

    Or you can right-click on the PIP window and select the ‘Always on Top’ option there.

    There's an ongoing discussion about a universal PIP Window interface as a Wayland protocol extension (similar to PIP on Android), but we're far from general agreement here.

And if you feel Wayland is too restrictive for you and isn't bringing you any benefit, it can be disabled in Firefox with the MOZ_ENABLE_WAYLAND env variable. Just add the

export MOZ_ENABLE_WAYLAND=0

line to your .bashrc file (or a similar file for a different shell), or run Firefox from a terminal as

MOZ_ENABLE_WAYLAND=0 firefox

or add it to the /usr/bin/firefox launch script.

Making SSH host certificates more usable

Posted by Matthew Garrett on December 19, 2023 07:48 PM
Earlier this year, after Github accidentally committed their private RSA SSH host key to a public repository, I wrote about how better support for SSH host certificates would allow this sort of situation to be handled in a user-transparent way without any negative impact on security. I was hoping that someone would read this and be inspired to fix the problem but sadly that didn't happen so I've actually written some code myself.

The core part of this is straightforward - if a server presents you with a certificate associated with a host key, then make the trust in that host be whoever signed the certificate rather than just trusting the host key. This means that if someone needs to replace the host key for any reason (such as, for example, them having published the private half), you can replace the host key with a new key and a new certificate, and as long as the new certificate is signed by the same key that the previous certificate was, you'll trust the new key and key rotation can be carried out without any user errors. Hurrah!

So obviously I wrote that bit and then thought about the failure modes and it turns out there's an obvious one - if an attacker obtained both the private key and the certificate, what stops them from continuing to use it? The certificate isn't a secret, so we basically have to assume that anyone who possesses the private key has access to it. We may have silently transitioned to a new host key on the legitimate servers, but a hostile actor able to MITM a user can keep on presenting the old key and the old certificate until it expires.

There's two ways to deal with this - either have short-lived certificates (ie, issue a new certificate every 24 hours or so even if you haven't changed the key, and specify that the certificate is invalid after those 24 hours), or have a mechanism to revoke the certificates. The former is viable if you have a very well-engineered certificate issuing operation, but still leaves a window for an attacker to make use of the certificate before it expires. The latter is something SSH has support for, but the spec doesn't define any mechanism for distributing revocation data.

So, I've implemented a new SSH protocol extension that allows a host to send a key revocation list to a client. The idea is that the client authenticates to the server, receives a key revocation list, and will no longer trust any certificates that are contained within that list. This seems simple enough, but a naive implementation opens the client to various DoS attacks. For instance, if you simply revoke any key contained within the received KRL, a hostile server could revoke any certificates that were otherwise trusted by the client. The easy way around this is for the client to ensure that any revoked keys are associated with the same CA that signed the host certificate - that way a compromised host can only revoke certificates associated with that CA, and can't interfere with anyone else.

Unfortunately that still means that a single compromised host can still trigger revocation of certificates inside that trust domain (ie, a compromised host a.test.com could push a KRL that invalidated the certificate for b.test.com), because there's no way in the KRL format to indicate that a given revocation is associated with a specific hostname. This means we need a mechanism to verify that the KRL update is legitimate, and the easiest way to handle that is to sign it. The KRL format specifies an in-band signature but this was deprecated earlier this year - instead KRLs are supposed to be signed with the sshsig format. But we control both the server and the client, which means it's easy enough to send a detached signature as part of the extension data.

Putting this all together: you ssh to a server you've never contacted before, and it presents you with a host certificate. Instead of the host key being added to known_hosts, the CA key associated with the certificate is added. From now on, if you ssh to that host and it presents a certificate signed by that CA, it'll be trusted. Optionally, the host can also send you a KRL and a signature. If the signature is generated by the CA key that you already trust, any certificates in that KRL associated with that CA key will be incorporated into local storage. The expected flow if a key is compromised is that the owner of the host generates a new keypair, obtains a new certificate for the new key, and adds the old certificate to a KRL that is signed with the CA key. The next time the user connects to that host, they receive the new key and new certificate, trust it because it's signed by the same CA key, and also receive a KRL signed with the same CA that revokes trust in the old certificate.

Obviously this breaks down if a user is MITMed with a compromised key and certificate immediately after the host is compromised - they'll see a legitimate certificate and won't receive any revocation list, so will trust the host. But this is the same failure mode that would occur in the absence of certificates, where the attacker simply presents the compromised key to the client before trust in the new key has been created. This seems no worse than the status quo, but means that most users will seamlessly transition to a new key and revoke trust in the old key with no effort on their part.

The work in progress tree for this is here - at the point of writing I've merely implemented this and made sure it builds, not verified that it actually works or anything. Cleanup should happen over the next few days, and I'll propose this to upstream if it doesn't look like there's any showstopper design issues.


Xorg being removed. What does this mean?

Posted by Peter Hutterer on December 14, 2023 04:13 AM

You may have seen the news that Red Hat Enterprise Linux 10 plans to remove Xorg. But Xwayland will stay around, and given the name overloading and them sharing a git repository there's some confusion over what is Xorg. So here's a very simple "picture". This is the xserver git repository:

$ tree -d -L 2 xserver
xserver
├── composite
├── config
├── damageext
├── dbe
├── dix
├── doc
│   └── dtrace
├── dri3
├── exa
├── fb
├── glamor
├── glx
├── hw
│   ├── kdrive
│   ├── vfb
│   ├── xfree86              <- this one is Xorg
│   ├── xnest
│   ├── xquartz
│   ├── xwayland
│   └── xwin
├── include
├── m4
├── man
├── mi
├── miext
│   ├── damage
│   ├── rootless
│   ├── shadow
│   └── sync
├── os
├── present
├── pseudoramiX
├── randr
├── record
├── render
├── test
│   ├── bigreq
│   ├── bugs
│   ├── damage
│   ├── scripts
│   ├── sync
│   ├── xi1
│   └── xi2
├── Xext
├── xfixes
├── Xi
└── xkb

The git repo produces several X servers, including the one designed to run on bare metal: Xorg (in hw/xfree86 for historical reasons). The other hw directories are the other X servers, including Xwayland. All the other directories are core X server functionality that's shared between all X servers [1]. Removing Xorg from a distro but keeping Xwayland means building with --disable-xfree86 --enable-xwayland [1]. That's simply it (plus the resulting distro packaging work of course).

Removing Xorg means you need something else that runs on bare metal and that is your favourite Wayland compositor. Xwayland then talks to that while presenting an X11-compatible socket to existing X11 applications.

Of course all this means that the X server repo will continue to see patches and many of those will also affect Xorg. For those who are running git master anyway. Don't get your hopes up for more Xorg releases beyond the security update background noise [2].

Xwayland on the other hand is actively maintained and will continue to see releases. But those releases are a sequence [1] of

$ git new-branch xwayland-23.x.y
$ git rm hw/{kdrive,vfb,xfree86,xnest,xquartz,xwin}
$ git tag xwayland-23.x.y

In other words, an Xwayland release is the xserver git master branch with all X servers but Xwayland removed. That's how Xwayland can see new updates and releases without Xorg ever seeing those (except on git master of course). And that's how your installed Xwayland has code from 2023 while your installed Xorg is still stuck on the branch created and barely updated after 2021.

I hope this helps a bit with the confusion of the seemingly mixed messages sent when you see headlines like "Xorg is unmaintained", "X server patches to fix blah", "Xorg is abandoned", "new Xwayland release".

[1] not 100% accurate but close enough
[2] historically an Xorg release included all other X servers (Xquartz, Xwin, Xvfb, ...) too so this applies to those servers too unless they adopt the Xwayland release model

100 Million Firmware Updates Supplied By The LVFS

Posted by Richard Hughes on December 06, 2023 09:23 PM

The LVFS has now supplied over 100 million updates to Linux machines all around the globe. The true number is unknown, as we allow users to re-distribute updates without any kind of tracking, and also allow large companies or agencies to mirror the entire LVFS so the archive can be used offline. The true number of updates deployed will probably be a lot higher. Just 8 years ago Red Hat asked me to “make firmware updates work on Linux” and now we have a thriving set of projects that respect both your freedom and your privacy, and a growing ecosystem of hardware vendors who consider Linux users first class citizens. Every month we have two or three new vendors join; the logistical, security and most importantly commercial implications of not being “on the LVFS” are now too critical for IHVs, ODMs and OEMs to ignore.

Red Hat can certainly take a lot of credit for the undeniable success of LVFS and fwupd, as they have been paying my salary and pushing me forward over the last decade and more. Customer use of fwupd and LVFS is growing and growing – and planning for new fwupd/LVFS device support now happens months in advance to ensure fwupd is ready-to-go in long term support distributions like Red Hat Enterprise Linux. With infrastructure supplied and support paid for by the Linux Foundation, the LVFS really has a stable base that will be used for years to come.

The number of devices supported by the LVFS goes up and up every week, and I'm glad that the community around fwupd is growing at the same pace as its popularity. Google and Collabora have also been amazing partners in encouraging and helping vendors to ship updates on the LVFS and supporting fwupd in ChromeOS — and their trust and support has been invaluable. I'm also glad the "side-projects" like "GNOME Firmware", "Host Security ID", "fwupd friendly firmware" and "uSWID as a SBoM format" also seem to be flourishing into independent projects in their own right.

Everybody is incredibly excited about the long term future of both fwupd and the LVFS and I’m looking forward to the next 100 million updates. A huge thank you to all that helped.

Why does Gnome fingerprint unlock not unlock the keyring?

Posted by Matthew Garrett on December 05, 2023 06:32 AM
There's a decent number of laptops with fingerprint readers that are supported by Linux, and Gnome has some nice integration to make use of that for authentication purposes. But if you log in with a fingerprint, the moment you start any app that wants to access stored passwords you'll get a prompt asking you to type in your password, which feels like it somewhat defeats the point. Mac users don't have this problem - authenticate with TouchID and all your passwords are available after login. Why the difference?

Fingerprint detection can be done in two primary ways. The first is that a fingerprint reader is effectively just a scanner - it passes a graphical representation of the fingerprint back to the OS and the OS decides whether or not it matches an enrolled finger. The second is for the fingerprint reader to make that determination itself, either storing a set of trusted fingerprints in its own storage or supporting being passed a set of encrypted images to compare against. Fprint supports both of these, but note that in both cases all that we get at the end of the day is a statement of "The fingerprint matched" or "The fingerprint didn't match" - we can't associate anything else with that.

Apple's solution involves wiring the fingerprint reader to a secure enclave, an independently running security chip that can store encrypted secrets or keys and only release them under pre-defined circumstances. Rather than the fingerprint reader providing information directly to the OS, it provides it to the secure enclave. If the fingerprint matches, the secure enclave can then provide some otherwise secret material to the OS. Critically, if the fingerprint doesn't match, the enclave will never release this material.

And that's the difference. When you perform TouchID authentication, the secure enclave can decide to release a secret that can be used to decrypt your keyring. We can't easily do this under Linux because we don't have an interface to store those secrets. The secret material can't just be stored on disk - that would allow anyone who had access to the disk to use that material to decrypt the keyring and get access to the passwords, defeating the object. We can't use the TPM because there's no secure communications channel between the fingerprint reader and the TPM, so we can't configure the TPM to release secrets only if an associated fingerprint is provided.

So the simple answer is that fingerprint unlock doesn't unlock the keyring because there's currently no secure way to do that. It's not intransigence on the part of the developers or a conspiracy to make life more annoying. It'd be great to fix it, but I don't see an easy way to do so at the moment.


PipeWire Camera Support in Firefox #2

Posted by Jan Grulich on November 24, 2023 02:45 PM

I wrote the first blog post about PipeWire cameras in Firefox in May and a lot has happened since then. The first PipeWire support arrived shortly after the blog post was published and was released as part of Firefox 116 (August). We didn't enable it by default, of course, since it's still a "work in progress", but many of you tried it (thank you for that) and we were able to fix some issues that I, as the only tester at the time, hadn't found. However, aside from all the crashes and minor issues we were able to fix relatively quickly, there was one major problem (or drawback) with the PipeWire camera that made it unusable with most popular video conferencing sites, such as Google Meet. Kind of a deal breaker, right? This has kept me busy ever since, but we are finally close to fixing it upstream. I'm going to explain why this was a problem and how we fixed it; forgive me in advance if I write anything wrong, I'm still learning and discovering things as they unfold.

There is a set of JavaScript APIs implemented by all the major browsers (the API documentation is here) that sites can use to query information about media devices. I will now describe a simplified workflow used with V4L2 on the aforementioned Google Meet once you start a meeting:

  • GMeet makes enumerateDevices() call to get information about available devices
  • Firefox can respond with the list of available cameras (+ audio devices obviously) on the system because the information about cameras is available and no permission is needed
  • GMeet makes getUserMedia() call to get access to the camera since it knows there is a camera available
  • Firefox will prompt the user to get access to the selected devices (including camera) and start streaming

Now the same situation, but with PipeWire:

  • GMeet makes enumerateDevices() call to get information about available devices
  • Firefox cannot respond with the list of available cameras because this enumeration request cannot ask for user permission and we cannot access PipeWire without it. Firefox will return an empty list of camera devices and there will be only audio devices announced
  • GMeet makes getUserMedia() call, but only to get access to the devices that were previously announced, so only audio devices
  • Firefox will prompt the user to get access only to the selected audio devices and no camera

How did we solve this?

The documentation here also covers this situation. The enumerateDevices() request is allowed to return a placeholder device for each type. This means that we can return a placeholder camera device, which tells Google Meet there is actually a camera device to ask for. With this device placeholder, the subsequent getUserMedia() request will also request access to camera devices. How do we know that a camera device is present without having access to PipeWire, you ask? The camera portal from xdg-desktop-portal has an IsCameraPresent property for exactly this purpose, and we use it to know whether to insert the camera device placeholder or not.
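
For the curious, the portal query itself is a single D-Bus property read. A small GLib sketch follows - this is not the code Firefox uses, it just shows the property involved (build against gio-2.0):

#include <gio/gio.h>

/* Ask xdg-desktop-portal whether a camera exists, without touching PipeWire */
gboolean camera_present(void)
{
    g_autoptr(GDBusConnection) bus =
        g_bus_get_sync(G_BUS_TYPE_SESSION, NULL, NULL);
    g_autoptr(GVariant) reply = g_dbus_connection_call_sync(
        bus,
        "org.freedesktop.portal.Desktop",
        "/org/freedesktop/portal/desktop",
        "org.freedesktop.DBus.Properties", "Get",
        g_variant_new("(ss)", "org.freedesktop.portal.Camera", "IsCameraPresent"),
        G_VARIANT_TYPE("(v)"), G_DBUS_CALL_FLAGS_NONE, -1, NULL, NULL);

    if (!reply)
        return FALSE;

    g_autoptr(GVariant) value = NULL;
    g_variant_get(reply, "(v)", &value);
    return g_variant_get_boolean(value);
}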

While such a solution sounds simple on paper, it required a significant amount of changes to the entire media handling stack. There is no small amount of PipeWire-specific code, so this fix also involves some restructuring so that all the backend-specific logic is in one place. And while I'm getting more and more familiar with the Firefox code, which is helping me to progress faster, there's still a lot to learn.

Anyway, the reason I'm writing this blog post now is that all the related changes have been approved and will hopefully be landing soon, making Firefox fully usable with PipeWire. Although not yet merged, Fedora users can use a COPR repository I created. The repository has Fedora Firefox builds with all the necessary changes backported and PipeWire camera enabled by default. Just note that while I've been testing and using it for the past few months and it's worked perfectly for me, you use it at your own risk. You'd better use it just to test PipeWire camera support, as the official Fedora Firefox package is the one we keep fully updated and my repo may lag behind. There will be a new PipeWire 1.0 release soon, which will be a big milestone for PipeWire, and I hope that PipeWire camera support in Firefox and browsers in general will be part of the PipeWire success story.

PSA: For Xorg GNOME sessions, use the xf86-input-wacom driver for your tablets

Posted by Peter Hutterer on November 10, 2023 03:22 AM

TLDR: see the title of this blog post, it's really that trivial.

Now that GodotWayland has been coming for ages and all new development focuses on a pile of software that steams significantly less, we're seeing cracks appear in the old Xorg support. Not intentionally, but there's only so much time that can be spent on testing and things that are more niche fall through. One of these was a bug I just had the pleasure of debugging, triggered by a GNOME on Xorg user using the xf86-input-libinput driver for tablet devices.

On the surface of it, this should be fine because libinput (and thus xf86-input-libinput) handles tablets just fine. But libinput is the new kid on the block. The old kid on said block is the xf86-input-wacom driver, older than libinput by slightly over a decade. And oh man, history has baked things into the driver that are worse than raisins in apple strudel [1].

The xf86-input-libinput driver was written as a wrapper around libinput and makes use of fancy things that (from libinput's POV) have always been around: things like input device hotplugging. Fancy, I know. For tablet devices the driver creates an X device for each new tool the first time it comes into proximity. Future events from that tool will go through that device. A second tool, be it a new pen or the eraser on the original pen, will create a second X device and events from that tool will go through that X device. Configuration on any device will thus only affect that particular pen. Almost like the whole thing makes sense.

The wacom driver of course doesn't do this. It pre-creates X devices for some possible types of tools (pen, eraser, and cursor [2] but not airbrush or artpen). When a tool goes into proximity the events are sent through the respective device, i.e. all pens go through the pen tool, all erasers through the eraser tool. To actually track pens there is the "Wacom Serial IDs" property that contains the current tool's serial number. If you want to track multiple tools you need to query the property on proximity in [4]. At the time this was within a reasonable error margin of a good idea.

Of course, and because MOAR CONFIGURATION! will save us all from the great filter, you can specify the "ToolSerials" xorg.conf option as e.g. "airbrush;12345;artpen" and get some extra X devices pre-created, in this case an airbrush and an artpen X device and an X device just for the tool with the serial number 12345. All other tools multiplex through the default devices. Again, at the time this was a great improvement. [5]

Anyway, where was I? Oh, right. The above should serve as a good approximation of a reason why the xf86-input-libinput driver does not try to be fully compatible with the xf86-input-wacom driver. In everyday use these things barely matter [6] but for the desktop environment, which needs to configure these devices, all these differences mean multiple code paths. Those paths need to be tested but they aren't, so things fall through the cracks.

So quite a while ago, we made the decision that until Xorg goes dodo, the xf86-input-wacom driver is the tablet driver to use in GNOME. So if you're using a GNOME on Xorg session [7], do make sure the xf86-input-wacom driver is installed. It will make both of us happier and that's a good aim to strive for.

[1] It's just a joke. Put the pitchforks down already.
[2] The cursor is the mouse-like thing Wacom sells. Which is called cursor [3] because the English language has a limited vocabulary and we need to re-use words as much as possible lest we run out of them.
[3] It's also called puck. Because [2].
[4] And by "query" I mean "wait for the XI2 event notifying you of a property change". Because of lolz the driver cannot update the property on proximity in but needs to schedule that as idle func so the property update for the serial always arrives at some unspecified time after the proximity in but hopefully before more motion events happen. Or not, and that's how hope dies.
[5] Think about this next time someone says they long for some unspecified good old days.
[6] Except the strip axis which on the wacom driver is actually a bit happily moving left/right as your finger moves up/down on the touch strip and any X client needs to know this. libinput normalizes this to...well, a normal value but now the X client needs to know which driver is running so, oh deary deary.
[7] e.g. because you're stockholmed into it by your graphics hardware

Why ACPI?

Posted by Matthew Garrett on November 01, 2023 06:30 AM
"Why does ACPI exist" - - the greatest thread in the history of forums, locked by a moderator after 12,239 pages of heated debate, wait no let me start again.

Why does ACPI exist? In the beforetimes power management on x86 was done by jumping to an opaque BIOS entry point and hoping it would do the right thing. It frequently didn't. We called this Advanced Power Management (Advanced because before this power management involved custom drivers for every machine and everyone agreed that this was a bad idea), and it involved the firmware having to save and restore the state of every piece of hardware in the system. This meant that assumptions about hardware configuration were baked into the firmware - failed to program your graphics card exactly the way the BIOS expected? Hurrah! It's only saved and restored a subset of the state that you configured and now potential data corruption for you. The developers of ACPI made the reasonable decision that, well, maybe since the OS was the one setting state in the first place, the OS should restore it.

So far so good. But some state is fundamentally device specific, at a level that the OS generally ignores. How should this state be managed? One way to do that would be to have the OS know about the device specific details. Unfortunately that means you can't ship the computer without having OS support for it, which means having OS support for every device (exactly what we'd got away from with APM). This, uh, was not an option the PC industry seriously considered. The alternative is that you ship something that abstracts the details of the specific hardware and makes that abstraction available to the OS. This is what ACPI does, and it's also what things like Device Tree do. Both provide static information about how the platform is configured, which can then be consumed by the OS and avoid needing device-specific drivers or configuration to be built-in.

The main distinction between Device Tree and ACPI is that Device Tree is purely a description of the hardware that exists, and so still requires the OS to know what's possible - if you add a new type of power controller, for instance, you need to add a driver for that to the OS before you can express that via Device Tree. ACPI decided to include an interpreted language to allow vendors to expose functionality to the OS without the OS needing to know about the underlying hardware. So, for instance, ACPI allows you to associate a device with a function to power down that device. That function may, when executed, trigger a bunch of register accesses to a piece of hardware otherwise not exposed to the OS, and that hardware may then cut the power rail to the device to power it down entirely. And that can be done without the OS having to know anything about the control hardware.
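
To make that a bit more concrete, here's a hedged sketch of what calling such a method looks like from the Linux side. Real drivers normally go through higher-level helpers (acpi_device_set_power() and friends) rather than evaluating methods by hand, so treat this as illustrative only:

#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/errno.h>

/* Ask the ACPI interpreter to run this device's _PS3 ("power off") method.
 * Whatever register pokes are needed happen inside the firmware-supplied AML,
 * so the OS never needs a driver for the power control hardware itself. */
static int example_power_off(struct device *dev)
{
    acpi_handle handle = ACPI_HANDLE(dev);   /* the device's ACPI companion */
    acpi_status status;

    if (!handle)
        return -ENODEV;

    status = acpi_evaluate_object(handle, "_PS3", NULL, NULL);
    return ACPI_SUCCESS(status) ? 0 : -EIO;
}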

How is this better than just calling into the firmware to do it? Because the fact that ACPI declares that it's going to access these registers means the OS can figure out that it shouldn't, because it might otherwise collide with what the firmware is doing. With APM we had no visibility into that - if the OS tried to touch the hardware at the same time APM did, boom, almost impossible to debug failures (This is why various hardware monitoring drivers refuse to load by default on Linux - the firmware declares that it's going to touch those registers itself, so Linux decides not to in order to avoid race conditions and potential hardware damage. In many cases the firmware offers a collaborative interface to obtain the same data, and a driver can be written to get that. This bug comment discusses this for a specific board)

Unfortunately ACPI doesn't entirely remove opaque firmware from the equation - ACPI methods can still trigger System Management Mode, which is basically a fancy way to say "Your computer stops running your OS, does something else for a while, and you have no idea what". This has all the same issues that APM did, in that if the hardware isn't in exactly the state the firmware expects, bad things can happen. While historically there were a bunch of ACPI-related issues because the spec didn't define every single possible scenario and also there was no conformance suite (eg, should the interpreter be multi-threaded? Not defined by spec, but influences whether a specific implementation will work or not!), these days overall compatibility is pretty solid and the vast majority of systems work just fine - but we do still have some issues that are largely associated with System Management Mode.

One example is a recent Lenovo one, where the firmware appears to try to poke the NVME drive on resume. There's some indication that this is intended to deal with transparently unlocking self-encrypting drives on resume, but it seems to do so without taking IOMMU configuration into account and so things explode. It's kind of understandable why a vendor would implement something like this, but it's also kind of understandable that doing so without OS cooperation may end badly.

This isn't something that ACPI enabled - in the absence of ACPI firmware vendors would just be doing this unilaterally with even less OS involvement and we'd probably have even more of these issues. Ideally we'd "simply" have hardware that didn't support transitioning back to opaque code, but we don't (ARM has basically the same issue with TrustZone). In the absence of the ideal world, by and large ACPI has been a net improvement in Linux compatibility on x86 systems. It certainly didn't remove the "Everything is Windows" mentality that many vendors have, but it meant we largely only needed to ensure that Linux behaved the same way as Windows in a finite number of ways (ie, the behaviour of the ACPI interpreter) rather than in every single hardware driver, and so the chances that a new machine will work out of the box are much greater than they were in the pre-ACPI period.

There's an alternative universe where we decided to teach the kernel about every piece of hardware it should run on. Fortunately (or, well, unfortunately) we've seen that in the ARM world. Most device-specific code simply never reaches mainline, and most users are stuck running ancient kernels as a result. Imagine every x86 device vendor shipping their own kernel optimised for their hardware, and now imagine how well that works out given the quality of their firmware. Does that really seem better to you?

It's understandable why ACPI has a poor reputation. But it's also hard to figure out what would work better in the real world. We could have built something similar on top of Open Firmware instead but the distinction wouldn't be terribly meaningful - we'd just have Forth instead of the ACPI bytecode language. Longing for a non-ACPI world without presenting something that's better and actually stands a reasonable chance of adoption doesn't make the world a better place.


Q3 Firefox Linux update

Posted by Martin Stransky on October 13, 2023 10:09 AM

Let's highlight some updates in Firefox development from a Linux perspective over the last three months.

The Wayland backend is gaining momentum at Mozilla upstream. It's enabled by default in Fedora/Arch Linux, but Mozilla is a bit hesitant and runs Wayland for Nightly/Beta only. Mozilla's official binaries, Ubuntu/Snap and Mozilla/Flatpak switch to XWayland, mainly due to missing test coverage of Wayland builds.

But Mozilla is migrating its test suite to Ubuntu 22.04, which also involves Wayland testing (Bug 1813588), so let's hope we see Wayland in release soon (Bug 1752398). Indeed there are still some Wayland bugs to fix and we keep an eye on them.

Dbus-glib is no longer needed to build Firefox. This old dependency from Gtk2 times was removed in Firefox 120.0, where we switched to the Gio DBus interface and an async implementation based on MozPromise. Dbus-glib can be removed from build roots for good, which mainly helps Flatpak packaging.

Kiosk mode has a new feature – with the '--kiosk-monitor' parameter you can place the Firefox kiosk window on a specified monitor. That supports Wayland/kiosk environments and multihead setups where one box covers several displays. We also switch to fullscreen immediately after Firefox starts to make sure we reliably cover the whole screen in kiosk mode.

The idle monitor/service was implemented using the org.gnome.Mutter.IdleMonitor DBus interface. The former X11 version crashes and doesn't work on Wayland. I may look at KDE/Sway interfaces too.

I also investigated the broken Gnome Shell search service in Fedora and found out that the Firefox DBus search interface needs to be 'Activatable'. That involves implementing the org.freedesktop.Application DBus interface, shipping a corresponding service file at /usr/share/dbus-1/services/ and having the application desktop file in the correct format (org.mozilla.firefox.desktop instead of the recent Fedora plain firefox.desktop).
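
For comparison, an application built on GApplication gets the org.freedesktop.Application interface exported on the session bus automatically, under an application id that has to match the desktop file name. Firefox can't simply reuse GApplication, but a minimal sketch (with a made-up application id) shows the contract that DBus activation expects:

#include <gio/gio.h>

static void on_activate(GApplication *app, gpointer user_data)
{
    /* called when something invokes org.freedesktop.Application.Activate */
    g_print("activated over DBus\n");
    g_application_hold(app);   /* keep the bus name owned */
}

int main(int argc, char **argv)
{
    /* the id must match the desktop file, e.g. org.example.Browser.desktop */
    g_autoptr(GApplication) app =
        g_application_new("org.example.Browser", G_APPLICATION_DEFAULT_FLAGS);
    g_signal_connect(app, "activate", G_CALLBACK(on_activate), NULL);
    return g_application_run(app, argc, argv);
}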

So I renamed the Fedora Firefox desktop file to org.mozilla.firefox.desktop and bundled the DBus search service file, and everything looked okay except … the Gnome Shell default launcher has 'firefox.desktop' hardcoded, so Firefox was removed from the default applications :D. Well, a desktop file change is allowed for new releases only. Lesson learned.

But there's a way. You can use the plain firefox.desktop name, but you need a service binary referenced from the file at /usr/share/dbus-1/services/ so DBus can run the service on demand. As we don't want to run Firefox for every Gnome search, I put a simple '/usr/bin/false' there and voilà! Search works and Firefox is still in the Gnome taskbar.

But, well, the desktop is not just Gnome. KDE/Plasma also has a vote, and it just can't stand a hack of this level and simply crashed. I don't blame it; a non-canonical desktop file name with the DBusActivatable flag may be too much for the poor launcher. Nobody is perfect.

So right now we have a broken Gnome search provider, but I know where the problem is and look forward to fixing it for Fedora 40, with the desktop file rename and an implementation of the org.freedesktop.Application DBus interface.

And that's all for Q3; let's see what the next quarter brings.

Defending abuse does not defend free software

Posted by Matthew Garrett on October 12, 2023 04:32 PM
The Free Software Foundation Europe and the Software Freedom Conservancy recently released a statement that they would no longer work with Eben Moglen, chairman of the Software Freedom Law Center. Eben was the general counsel for the Free Software Foundation for over 20 years, and was centrally involved in the development of version 3 of the GNU General Public License. He's devoted a great deal of his life to furthering free software.

But, as described in the joint statement, he's also acted abusively towards other members of the free software community. He's acted poorly towards his own staff. In a professional context, he's used graphically violent rhetoric to describe people he dislikes. He's screamed abuse at people attempting to do their job.

And, sadly, none of this comes as a surprise to me. As I wrote in 2017, after it became clear that Eben's opinions diverged sufficiently from the FSF's that he could no longer act as general counsel, he responded by threatening an FSF board member at an FSF-run event (various members of the board were willing to tolerate this, which is what led to me quitting the board). There's over a decade's evidence of Eben engaging in abusive behaviour towards members of the free software community, be they staff, colleagues, or just volunteers trying to make the world a better place.

When we build communities that tolerate abuse, we exclude anyone unwilling to tolerate being abused[1]. Nobody in the free software community should be expected to deal with being screamed at or threatened. Nobody should be afraid that they're about to have their sexuality outed by a former boss.

But of course there are some that will defend Eben based on his past contributions. There were people who were willing to defend Hans Reiser on that basis. We need to be clear that what these people are defending is not free software - it's the right for abusers to abuse. And in the long term, that's bad for free software.

[1] "Why don't people just get better at tolerating abuse?" is a terrible response to this. Why don't abusers stop abusing? There's fewer of them, and it should be easier.


Reconstructing an invalid TPM event log

Posted by Matthew Garrett on September 13, 2023 09:02 PM
TPMs contain a set of registers ("Platform Configuration Registers", or PCRs) that are used to track what a system boots. Each time a new event is measured, a cryptographic hash representing that event is passed to the TPM. The TPM appends that hash to the existing value in the PCR, hashes that, and stores the final result in the PCR. This means that while the PCR's value depends on the precise sequence and value of the hashes presented to it, the PCR value alone doesn't tell you what those individual events were. Different PCRs are used to store different event types, but there are still more events than there are PCRs so we can't avoid this problem by simply storing each event separately.

This is solved using the event log. The event log is simply a record of each event, stored in RAM. The algorithm the TPM uses to calculate the PCR values is known, so we can reproduce that by simply taking the events from the event log and replaying the series of events that were passed to the TPM. If the final calculated value is the same as the value in the PCR, we know that the event log is accurate, which means we now know the value of each individual event and can make an appropriate judgement regarding its security.
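
As a sketch of that replay calculation - assuming a SHA-256 PCR bank, OpenSSL for the hashing, and that the per-event digests have already been parsed out of the log:

#include <openssl/sha.h>
#include <string.h>

/* Replay a list of event digests and compare the result with the PCR value
 * read from the TPM. Most PCRs start at all zeroes; locality-initialised PCRs
 * (like PCR 0 under Boot Guard, see below) start from a different value. */
int replay_matches(const unsigned char digests[][SHA256_DIGEST_LENGTH],
                   size_t n_events,
                   const unsigned char actual_pcr[SHA256_DIGEST_LENGTH])
{
    unsigned char pcr[SHA256_DIGEST_LENGTH] = { 0 };
    unsigned char buf[2 * SHA256_DIGEST_LENGTH];

    for (size_t i = 0; i < n_events; i++) {
        memcpy(buf, pcr, SHA256_DIGEST_LENGTH);
        memcpy(buf + SHA256_DIGEST_LENGTH, digests[i], SHA256_DIGEST_LENGTH);
        SHA256(buf, sizeof(buf), pcr);     /* extend: PCR = H(PCR || event) */
    }
    return memcmp(pcr, actual_pcr, SHA256_DIGEST_LENGTH) == 0;
}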

If any value in the event log is invalid, we'll calculate a different PCR value and it won't match. This isn't terribly helpful - we know that at least one entry in the event log doesn't match what was passed to the TPM, but we don't know which entry. That means we can't trust any of the events associated with that PCR. If you're trying to make a security determination based on this, that's going to be a problem.

PCR 7 is used to track information about the secure boot policy on the system. It contains measurements of whether or not secure boot is enabled, and which keys are trusted and untrusted on the system in question. This is extremely helpful if you want to verify that a system booted with secure boot enabled before allowing it to do something security or safety critical. Unfortunately, if the device gives you an event log that doesn't replay correctly for PCR 7, you now have no idea what the security state of the system is.

We ran into that this week. Examination of the event log revealed an additional event other than the expected ones - a measurement accompanied by the string "Boot Guard Measured S-CRTM". Boot Guard is an Intel feature where the CPU verifies the firmware is signed with a trusted key before executing it, and measures information about the firmware in the process. Previously I'd only encountered this as a measurement into PCR 0, which is the PCR used to track information about the firmware itself. But it turns out that at least some versions of Boot Guard also measure information about the Boot Guard policy into PCR 7. The argument for this is that this is effectively part of the secure boot policy - having a measurement of the Boot Guard state tells you whether Boot Guard was enabled, which tells you whether or not the CPU verified a signature on your firmware before running it (as I wrote before, I think Boot Guard has user-hostile default behaviour, and that enforcing this on consumer devices is a bad idea).

But there's a problem here. The event log is created by the firmware, and the Boot Guard measurements occur before the firmware is executed. So how do we get a log that represents them? That one's fairly simple - the firmware simply re-calculates the same measurements that Boot Guard did and creates a log entry after the fact[1]. All good.

Except. What if the firmware screws up the calculation and comes up with a different answer? The entry in the event log will now not match what was sent to the TPM, and replaying will fail. And without knowing what the actual value should be, there's no way to fix this, which means there's no way to verify the contents of PCR 7 and determine whether or not secure boot was enabled.

But there's still a fundamental source of truth - the measurement that was sent to the TPM in the first place. Inspired by Henri Nurmi's work on sniffing Bitlocker encryption keys, I asked a coworker if we could sniff the TPM traffic during boot. The TPM on the board in question uses SPI, a simple bus that can have multiple devices connected to it. In this case the system flash and the TPM are on the same SPI bus, which made things easier. The board had a flash header for external reprogramming of the firmware in the event of failure, and all SPI traffic was visible through that header. Attaching a logic analyser to this header made it simple to generate a record of that. The only problem was that the chip select line on the header was attached to the firmware flash chip, not the TPM. This was worked around by simply telling the analysis software that it should invert the sense of the chip select line, ignoring all traffic that was bound for the flash and paying attention to all other traffic. This worked in this case since the only other device on the bus was the TPM, but would cause problems in the event of multiple devices on the bus all communicating.

With the aid of this analyser plugin, I was able to dump all the TPM traffic and could then search for writes that included the "0182" sequence that corresponds to the command code for a measurement event. This gave me a couple of accesses to the locality 3 registers, which was a strong indication that they were coming from the CPU rather than from the firmware. One was for PCR 0, and one was for PCR 7. This corresponded to the two Boot Guard events that we expected from the event log. The hash in the PCR 0 measurement was the same as the hash in the event log, but the hash in the PCR 7 measurement differed from the hash in the event log. Replacing the event log value with the value actually sent to the TPM resulted in the event log now replaying correctly, supporting the hypothesis that the firmware was failing to correctly reconstruct the event.

What now? The simple thing to do is for us to simply hard code this fixup, but longer term we'd like to figure out how to reconstruct the event so we can calculate the expected value ourselves. Unfortunately there doesn't seem to be any public documentation on this. Sigh.

[1] What stops firmware on a system with no Boot Guard faking those measurements? TPMs have a concept of "localities", effectively different privilege levels. When Boot Guard performs its initial measurement into PCR 0, it does so at locality 3, a locality that's only available to the CPU. This causes PCR 0 to be initialised to a different initial value, affecting the final PCR value. The firmware can't access locality 3, so can't perform an equivalent measurement, so can't fake the value.


Crosswords 0.3.11: Acrostic Panels

Posted by Jonathan Blandford on September 03, 2023 11:25 PM

Long time, no release.

When I last blogged about GNOME Crosswords, I had a design plan to improve the editing API. It’s been a busy summer since then. The crosswords team rewrote large chunks of code to implement and use this new API:

crosswords: 146 files changed, 10545 insertions(+), 4915 deletions(-)
libipuz:  53 files changed, 8224 insertions(+), 961 deletions(-)

There’s now over 38KLOC between the two codebases — this is starting to look like a real application!

Editor and libpanel

The biggest change this cycle is the implementation of the new editing interface. I started changing the code five months ago, but it took a while to land. We now use libpanel from GNOME Builder to manage the information panels. Libpanel has a lot of the functionality I want, though unfortunately I’m fighting its geometry handling.

[Images: New grid editor panel; New clues editor panel]

I really struggled getting the UI design for this to work, and I had a number of regrettable paths along the way. Fortunately, Niko agreed to help out with this, and showed up with some fabulous design work! I’m so much happier with the current approach, and am getting ready to implement more of his designs for the next cycle.

Behind the scenes, implementing this was a challenge. I blogged about those challenges previously, but in a nutshell, mutating a puzzle has a lot of side-effects which can leave you in an invalid state. For example, something simple like adding a block could completely change the numbering of the grid.

Federico and I fixed this by adding a number of functions to enforce heuristics and conventions. I also added ipuz_puzzle_fix_all() which will get a fully well-formed puzzle regardless of the state. It's turned out to be a really nice design pattern.

I have now released the Editor as a separate application in flathub. You can download it here.

Acrostics

Another major feature this release is Acrostic Puzzle support. Tanmay worked on that for his GSOC project and did fabulously (details in his blog post.) The end result is gorgeous:

[Image: Animation of an Acrostic puzzle being solved]

I had a great time mentoring Tanmay for the summer, and we’re already making plans on how to add acrostic support to the Editor.

GUADEC and GNOME Mobile

GUADEC in Latvia was a blast. Riga was fun, the countryside was surprisingly interesting, and the overall feel of the conference was lovely. As always, it was great to meet so many people, new and old.

[Image: A wild Goomba in Riga - a stone statue looking like a goomba]

GNOME Superstar Martin came to my Crosswords BOF in Riga and got it running on an actual mobile device. The end result was... actually usable! It seems like all the work we did on adaptive sizing paid dividends. There are some rough corners (and Martin filed a number of bugs) but it worked surprisingly well out of the box.

I also got to meet with my other GSOC student Pratham. His work on anagrams will land next release, so we’ll have more to talk about then.

libipuz.org

The final thing worth mentioning is that I bought the libipuz.org domain for the library. Unfortunately the main ipuz.org site has been down all summer, so I put a mirror of the ipuz spec up there so people can read it.

I also managed to reach Roy Leban by phone. He’s the original author of the ipuz spec, and was able to clarify some questions I had when interpreting it. This led me down a deep rabbit hole around the part of the spec regarding clue directions, as it required a substantial rewrite to avoid hardcoding Across/Down directions everywhere.

Finally, Pranjal showed up completely out of nowhere with an MR to add ipuz_puzzle_equal(). This was a tricky function to write, and one I’ve wanted for a really long time. This has been making all our tests so much better. Hero! Pranjal is interested in adding a sudoku loader/saver to libipuz — maybe there will be more in this space in the future.

Thanks

  • Niko, for massive help with the designs for the Editor
  • Tanmay, for Acrostic support
  • Martin, for testing Crosswords on mobile
  • Pranjal, for ipuz_puzzle_equal()
  • Federico, for testing fixes and overall support
  • Rosanna, for continued advice and crossword support
  • Pratham, for initial anagram support
  • Bart, for whatever magic he did to make libipuz.org work
  • The translators, for keeping us multi-lingual

Until next time!

Unix sockets, Cygwin, SSH agents, and sadness

Posted by Matthew Garrett on August 29, 2023 06:57 AM
Work involves supporting Windows (there's a lot of specialised hardware design software that's only supported under Windows, so this isn't really avoidable), but also involves git, so I've been working on extending our support for hardware-backed SSH certificates to Windows and trying to glue that into git. In theory this doesn't sound like a hard problem, but in practice oh good heavens.

Git for Windows is built on top of msys2, which in turn is built on top of Cygwin. This is an astonishing artifact that allows you to build roughly unmodified POSIXish code on top of Windows, despite the terrible impedance mismatches inherent in this. One is that until 2017, Windows had no native support for Unix sockets. That's kind of a big deal for compatibility purposes, so Cygwin worked around it. It's, uh, kind of awful. If you're not a Cygwin/msys app but you want to implement a socket they can communicate with, you need to implement this undocumented protocol yourself. This isn't impossible, but ugh.

But going to all this trouble helps you avoid another problem! The Microsoft version of OpenSSH ships an SSH agent that doesn't use Unix sockets, but uses a named pipe instead. So if you want to communicate between Cygwinish OpenSSH (as is shipped with git for Windows) and the SSH agent shipped with Windows, you need something that bridges between those. The state of the art seems to be to use npiperelay with socat, but if you're already writing something that implements the Cygwin socket protocol you can just use npipe to talk to the shipped ssh-agent and then export your own socket interface.

And, amazingly, this all works? I've managed to hack together an SSH agent (using Go's SSH agent implementation) that can satisfy hardware backed queries itself, but forward things on to the Windows agent for compatibility with other tooling. Now I just need to figure out how to plumb it through to WSL. Sigh.


Small Caps in Impress

Posted by Caolán McNamara on August 23, 2023 04:01 PM

Writer supports Small Caps, but Impress, and drawing shapes in general, never fully supported it. The option was available and the character dialog provided a preview, but Small Caps was rendered the same as All Caps.

This has lingered for years, and it's not easy as a user to work around it manually with varying font sizes, because underline/overline/strike-through decorations won't link up.


But finally, for Collabora Hack Week, I was able to carve out some time to fix this, so today's LibreOffice trunk finally has proper Small Caps rendering in Impress.

In addition, a buggy case seen in the preview of double overlines for mixed upper/lower case was also fixed.


Also noticed during all of this was the wrong width scaling used for the red wavy line underneath incorrect spelling in Impress when the text is superscript or subscript, so the line didn't stretch over the whole text; this is now corrected as well.

Finally, a missing implementation in the RTF export of shape text was added, so the Small Caps format can be copied and pasted from Impress into other applications.

Qt theming in Fedora Workstation

Posted by Jan Grulich on August 22, 2023 03:47 PM

We have been working on and using custom Qt theming in Fedora Workstation for many years now. By custom Qt theming, I’m talking about the QGnomePlatform and Adwaita-qt projects. If you haven’t heard of them, you can read my recent blog post explaining what they are. While these projects are in some ways better than what Qt upstream has to offer, there were also drawbacks and issues, and that’s why I made the final decision to discontinue both projects. The issues are explained in the aforementioned blog post, but one of the main drawbacks is that we are in this development alone, and not working directly in upstream makes it less attractive for other contributors. It’s also not used by default anywhere other than Fedora, so it’s not properly tested by other developers working on Qt applications on different distributions. These reasons led me to submit a Fedora 39 feature to remove our custom Qt theming in Fedora Workstation in favor of Qt’s defaults. The only problem is that if we just go with Qt’s defaults, we would go backwards a bit. This is because upstream Qt does not provide any decent client-side window decorations (problem #1), and the QGtkTheme in Qt5 (the QGnomePlatform equivalent) is a bit behind its Qt6 version, which recently gained many improvements and integration goodies (problem #2) from Axel Spoerl of the Qt Group, whom I met during this year’s KDE Akademy.

Solution to problem #1

QGnomePlatform used to be our solution to this problem, as it implemented its own version of the QWaylandAbstractDecoration plugin. This was a GTK 3-like decoration plugin that used Adwaita-qt for button rendering and QGnomePlatform bits (e.g. GSettings configuration) to get the titlebar layout. Since we are going to remove QGnomePlatform, we needed an alternative, so I started working on the QAdwaitaDecorations project. This is supposed to be an intermediate step, as I would like to have proper GNOME/GTK decorations directly in Qt upstream, but since I was in a hurry to get everything done in time for Fedora 39, we have this for now. The QAdwaitaDecorations plugin is based on the decorations we have in QGnomePlatform, but there is no dependency on GTK or Glib (e.g. GSettings) or on Adwaita-qt. We use xdg-desktop-portal to get the titlebar layout and do our own drawing instead. This decoration plugin now also has a GTK 4-like style, so the buttons and colors of the decorations are different.

Below is a screenshot of Wireshark (Qt6) using QAdwaitaDecorations plugin + QGtkTheme + Fusion:


Solution to problem #2

Since Qt5 is no longer actively developed, the only possible solution is to backport all the QGtkTheme improvements from Qt6, so I did that and modified some of those changes to avoid breaking binary compatibility. This results in ~15 related backports to Qt5 so far, and it seems to work pretty well. I also made sure that Fedora 38 and older will still use QGnomePlatform by default, so we don’t change the behavior for existing users. Also, a small change to our QtWayland package was needed to make it use the new decoration plugin by default.

Future plans

As mentioned, I would really like to have everything directly in Qt upstream (talking about QAdwaitaDecorations). That way we get other contributions and thus fixes/improvements for free and a lot more users. Another thing is that QGnomePlatform supports things that are not yet supported/implemented in QGtkTheme, like support for xdg-desktop-portal instead of just relying on GSettings. Not to mention that GTK 4 has been around for a while, and both QGnomePlatform and QGtkTheme are still GTK 3 based. I will definitely try to make some of these things happen for Fedora 40, but knowing myself, it’s better not to make any promises, as things usually don’t go according to plan.

GNOME 45 Core Apps Update

Posted by Michael Catanzaro on August 17, 2023 03:57 PM

It’s been a few months since I last reviewed the state of GNOME core apps. For GNOME 45, we have implemented the changes proposed in the “Imminent Core App Changes” section of that blog post:

  • Loupe enters core as GNOME’s new image viewer app, developed by Christopher Davis and Sophie Herold. Loupe will be branded as Image Viewer and replaces Eye of GNOME, which will no longer use the Image Viewer branding. Eye of GNOME will continue to be maintained by Felix Riemann, and contributions are still welcome there.
  • Snapshot enters core as GNOME’s new camera app, developed by Maximiliano Sandoval and Jamie Murphy. Snapshot will be branded as Camera and replaces Cheese. Cheese will continue to be maintained by David King, and contributions are still welcome there.
  • GNOME Photos has been removed from core without replacement. This application could have been retained if more developers were interested in it, but we have made the decision to remove it due to lack of volunteers interested in maintaining it. Photos will likely be archived eventually, unless a new maintainer volunteers to save it.

GNOME 45 beta will be released imminently with the above changes. Testing the release and reporting bugs is much appreciated.

We are also looking for volunteers interested in helping implement future core app changes. Specifically, improvements are required for Music to remain in core, and improvements are required for Geary to enter core. We’re also not quite sure what to do with Contacts. If you’re interested in any of these projects, consider getting involved.

New responsibilities

Posted by Bastien Nocera on August 14, 2023 09:31 AM

As part of the same process outlined in Matthias Clasen's "LibreOffice packages" email, my management chain has made the decision to stop all upstream and downstream work on desktop Bluetooth, multimedia applications (namely totem, rhythmbox and sound-juicer) and libfprint/fprintd. The rest of my upstream and downstream work will be reassigned depending on Red Hat's own priorities (see below), as I am transferred to another team that deals with one of a list of Red Hat’s priority projects.

I'm very disappointed, because those particular projects were already starved for resources: I spent less than 10% of my work time on them in the past year, with other projects and responsibilities taking most of my time.

This means that, in the medium-term at least, all those GNOME projects will go without a maintainer, reviewer, or triager:
- gnome-bluetooth (including Settings panel and gnome-shell integration)
- totem, totem-pl-parser, gom
- libgnome-volume-control
- libgudev
- geocode-glib
- gvfs AFC backend

Those freedesktop projects will be archived until further notice:
- power-profiles-daemon
- switcheroo-control
- iio-sensor-proxy
- low-memory-monitor

I will not be available for reviewing libfprint/fprintd, upower, grilo/grilo-plugins, gnome-desktop thumbnailer sandboxing patches, or any work related to XDG specifications.

Kernel work, reviews and maintenance (including recent work on SteelSeries headset and Logitech device kernel drivers, USB revoke for Flatpak Portal support, and core USB) are suspended until further notice.

All my Fedora packages were orphaned about a month and a half ago; it's likely that some are still orphaned, if there are any takers. RHEL packages were unassigned about 3 weeks ago and have been reassigned since then, so I cannot point you to the new maintainer(s).

If you are a partner, or a customer, I would recommend that you get in touch with your Red Hat contacts to figure out what the plan is going forward for the projects you might be involved with.

If you are a colleague that will take on all or part of the 90% of the work that's not being stopped, or a community member that was relying on my work to further advance your own projects, get in touch, I'll do my best to accommodate your queries, time permitting.

I'll try to make sure to update this post, or create a new one if and when any of the above changes.

Updating Fedora the unsupported way

Posted by Matthew Garrett on August 08, 2023 05:54 AM
I dug out a computer running Fedora 28, which was released 2018-05-01 - over 5 years ago. Backing up the data and re-installing seemed tedious, but the current version of Fedora is 38, and while Fedora supports updates from N to N+2 that was still going to be 5 separate upgrades. That seemed tedious, so I figured I'd just try to do an update from 28 directly to 38. This is, obviously, extremely unsupported, but what could possibly go wrong?

Running sudo dnf system-upgrade download --releasever=38 didn't successfully resolve dependencies, but sudo dnf system-upgrade download --releasever=38 --allowerasing passed and dnf started downloading 6GB of packages. And then promptly failed, since I didn't have any of the relevant signing keys. So I downloaded the fedora-gpg-keys package from F38 by hand and tried to install it, and got a signature hdr data: BAD, no. of bytes(88084) out of range error. It turns out that rpm doesn't handle cases where the signature header is larger than a few K, and RPMs from modern versions of Fedora exceed that. The obvious fix would be to install a newer version of rpm, but that wouldn't be easy without upgrading the rest of the system as well - or, alternatively, downloading a bunch of build depends and building it. Given that I'm already doing all of this in the worst way possible, let's do something different.

The relevant code in the hdrblobRead function of rpm's lib/header.c is:

int32_t il_max = HEADER_TAGS_MAX;
int32_t dl_max = HEADER_DATA_MAX;

if (regionTag == RPMTAG_HEADERSIGNATURES) {
il_max = 32;
dl_max = 8192;
}

which indicates that if the header in question is RPMTAG_HEADERSIGNATURES, it sets more restrictive limits on the size (no, I don't know why). So I installed rpm-libs-debuginfo, ran gdb against librpm.so.8, loaded the symbol file, and then did disassemble hdrblobRead. The relevant chunk ends up being:

0x000000000001bc81 <+81>: cmp $0x3e,%ebx
0x000000000001bc84 <+84>: mov $0xfffffff,%ecx
0x000000000001bc89 <+89>: mov $0x2000,%eax
0x000000000001bc8e <+94>: mov %r12,%rdi
0x000000000001bc91 <+97>: cmovne %ecx,%eax

which is basically "If ebx is not 0x3e, set eax to 0xfffffff - otherwise, set it to 0x2000". RPMTAG_HEADERSIGNATURES is 62, which is 0x3e, so I just opened librpm.so.8 in hexedit, went to byte 0x1bc81, and replaced 0x3e with 0xfe (an arbitrary invalid value). This has the effect of skipping the if (regionTag == RPMTAG_HEADERSIGNATURES) code and so using the default limits even if the header section in question is the signatures. And with that one byte modification, rpm from F28 would suddenly install the fedora-gpg-keys package from F38. Success!
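
If you'd rather not flip the byte interactively in hexedit, the same one-byte patch can be scripted; here's a minimal sketch of what the edit amounts to (the 0x1bc81 offset is of course specific to this particular librpm.so.8 build):

# Patch the byte that hdrblobRead compares against RPMTAG_HEADERSIGNATURES
# (0x3e) to an arbitrary invalid value (0xfe), so the restrictive signature
# header limits are never applied. The offset is specific to this build.
OFFSET = 0x1bc81

with open("librpm.so.8", "r+b") as f:
    f.seek(OFFSET)
    assert f.read(1) == b"\x3e", "unexpected byte - wrong offset or build?"
    f.seek(OFFSET)
    f.write(b"\xfe")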

But short-lived. dnf now believed packages had valid signatures, but sadly there were still issues. A bunch of packages in F38 had files that conflicted with packages in F28. These were largely Python 3 packages that conflicted with Python 2 packages from F28 - jumping this many releases meant that a bunch of explicit replaces and the like no longer existed. The easiest way to solve this was simply to uninstall Python 2 before upgrading and avoid the entire transition. Another issue was that some data files had moved from libxcrypt-common to libxcrypt, and removing libxcrypt-common would remove libxcrypt and a bunch of important things that depended on it (like, for instance, systemd). So I built a fake empty package that provided libxcrypt-common and removed the actual package. Surely everything would work now?

Ha no. The final obstacle was that several packages depended on rpmlib(CaretInVersions), and building another fake package that provided that didn't work. I shouted into the void and Bill Nottingham answered - rpmlib dependencies are synthesised by rpm itself, indicating that it has the ability to handle extensions that specific packages are making use of. This made things harder, since the list is hard-coded in the binary. But since I'm already committing crimes against humanity with a hex editor, why not go further? Back to editing librpm.so.8 and finding the list of rpmlib() dependencies it provides. There were a bunch, but I couldn't really extend the list. What I could do is overwrite existing entries. I tried this a few times but (unsurprisingly) broke other things since packages depended on the feature I'd overwritten. Finally, I rewrote rpmlib(ExplicitPackageProvide) to rpmlib(CaretInVersions) (adding an extra '\0' at the end of it to deal with it being shorter than the original string) and apparently nothing I wanted to install depended on rpmlib(ExplicitPackageProvide) because dnf finished its transaction checks and prompted me to reboot to perform the update. So, I did.
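
The string rewrite can be scripted in the same spirit; here's a sketch of what that hex edit amounts to, again only meaningful against this specific librpm.so.8:

# Overwrite rpmlib(ExplicitPackageProvide) in place with
# rpmlib(CaretInVersions) plus a NUL terminator; the leftover bytes of the
# longer original string sit after the terminator and are ignored.
old = b"rpmlib(ExplicitPackageProvide)"
new = b"rpmlib(CaretInVersions)\x00"
assert len(new) <= len(old)  # must fit in the existing space

with open("librpm.so.8", "r+b") as f:
    data = f.read()
    offset = data.find(old)
    assert offset != -1, "feature string not found - different rpm build?"
    f.seek(offset)
    f.write(new)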

And about an hour later, it rebooted and gave me a whole bunch of errors due to the fact that dbus never got started. A bit of digging revealed that I had no /etc/systemd/system/dbus.service, a symlink that was presumably introduced at some point between F28 and F38 but which didn't get automatically added in my case because well who knows. That was literally the only thing I needed to fix up after the upgrade, and on the next reboot I was presented with a gdm prompt and had a fully functional F38 machine.

You should not do this. I should not do this. This was a terrible idea. Any situation where you're binary patching your package manager to get it to let you do something is obviously a bad situation. And with hindsight performing 5 independent upgrades might have been faster. But that would have just involved me typing the same thing 5 times, while this way I learned something. And what I learned is "Terrible ideas sometimes work and so you should definitely act upon them rather than doing the sensible thing", so like I said, you should not do this in case you learn the same lesson.


Introducing Passim

Posted by Richard Hughes on July 28, 2023 08:57 AM

tl;dr: Passim is a local caching server that uses mDNS to advertise files by their SHA-256 hash. Named after the Latin word for “here, there and everywhere” it might save a lot of people a lot of money.

Introduction

Much of the software running on your computer that connects to other systems over the Internet needs to periodically download metadata or other information needed to perform other requests.

As part of running the passim/LVFS projects I’ve seen how downloading this “small” file once per 24h turns into tens of millions of requests per day — which is ~10TB of bandwidth! Everybody downloads the same file from a CDN, and although a CDN is not super-expensive, it’s certainly not free. Everybody on your local network (perhaps dozens of users in an office) has to download the same 1MB blob of metadata from a CDN over a perhaps-non-free shared internet link.

What if we could download the file from the Internet CDN on one machine, and the next machine on the local network that needs it instead downloads it from the first machine? We could put a limit on the number of times it can be shared, and the maximum age, so that we don’t store yesterday’s metadata forever and don’t turn a ThinkPad X220 into a machine distributing 1Gb/s to every other machine in the office. We could cut the CDN traffic by at least one order of magnitude, but possibly much more. This is better for the person paying the cloud bill, the person paying for the internet connection, and the planet as a whole.

This is what Passim might be. You add files to the daemon, either automatically or manually, and it stores them in /var/lib/passim/data with xattrs set on each file for the max-age and share-limit. When the file has been shared more than the share-limit number of times, or is older than the max-age, it is deleted and no longer advertised to other clients.
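
As a small illustration of that storage model, the cache can be inspected with nothing more than Python's xattr helpers; which attribute names show up is down to the daemon, so the output is simply whatever Passim happens to set:

# List Passim's cached blobs and whatever sharing metadata (max-age,
# share-limit, ...) the daemon has stored as extended attributes.
import os

CACHE_DIR = "/var/lib/passim/data"

for name in sorted(os.listdir(CACHE_DIR)):
    path = os.path.join(CACHE_DIR, name)
    attrs = {a: os.getxattr(path, a) for a in os.listxattr(path)}
    print(name, attrs)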

The daemon then advertises the availability of the file as an mDNS service subtype and provides a tiny single-threaded HTTP v1.1 server that supplies the file over HTTPS using a self-signed certificate.

The file is sent when requested from a URL like https://192.168.1.1:27500/filename.xml.gz?sha256=the_hash_value – any file requested without the checksum will not be supplied. Although this is a chicken-and-egg problem where you don’t know the payload checksum until you’ve checked the remote server, this is solved using a tiny <100 byte request to the CDN for the payload checksum (or a .jcat file) and then the multi-megabyte (or multi-gigabyte!) payload can be found using mDNS. Using a Jcat file also means you know the PKCS#7/GPG signature of the thing you’re trying to request. Using a Metalink request would work as well I think.
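
A client along these lines is easy to sketch in Python. The peer address, file name and hash below are placeholders, the port and URL shape come from the description above, and because the certificate is self-signed the sketch skips TLS verification and relies entirely on the checksum it already knows:

# Fetch a file from a Passim peer and verify it against the checksum we
# already obtained out-of-band (from the CDN or a .jcat file).
import hashlib
import ssl
import urllib.request

peer = "192.168.1.1"                  # placeholder, found via mDNS
filename = "filename.xml.gz"          # placeholder
expected_sha256 = "the_hash_value"    # placeholder, fetched from the CDN

ctx = ssl.create_default_context()
ctx.check_hostname = False            # self-signed certificate:
ctx.verify_mode = ssl.CERT_NONE       # trust the hash check, not the TLS cert

url = "https://%s:27500/%s?sha256=%s" % (peer, filename, expected_sha256)
with urllib.request.urlopen(url, context=ctx) as resp:
    payload = resp.read()

if hashlib.sha256(payload).hexdigest() != expected_sha256:
    raise RuntimeError("checksum mismatch - discard the file")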

Sharing Considerations

Here we’re assuming your local network (aka LAN) is a nice and friendly place, without evil people trying to overwhelm your system or feed you fake files. Although we request files by their hash (and thus can detect tampering) and we hopefully also use a signature, it still uses resources to send a file over the network.

We’ll assume that any network with working mDNS (as implemented in Avahi) is good enough to get metadata from other peers. If Avahi is not running, or mDNS is turned off on the firewall then no files will be shared.

The cached index is available to localhost without any kind of authentication as a webpage on https://localhost:27500/.

Only processes running as UID 0 (a.k.a. root) can publish content to Passim. Before sharing everything, though, consider that the effects of sharing can be subtle: if you download a security update for a Lenovo P1 Gen 3 laptop and share it with other laptops on your LAN, it also tells any attacker [with a list of all possible firmware updates] on your local network your laptop model, and that you’re running a system firmware that isn’t currently patched against the latest firmware bug.

My recommendation here is only to advertise files that are common to all machines. For instance:

  • AdBlocker metadata
  • Firmware update metadata
  • Remote metadata for update frameworks, e.g. apt-get/dnf etc.

Implementation Considerations

Any client MUST calculate the checksum of the supplied file and verify that it matches. There is no authentication or signing verification done so this step is non-optional. A malicious server could advertise the hash of firmware.xml.gz but actually supply evil-payload.exe — and you do not want that.

Comparisons

The obvious comparison to make is IPFS. I’ll try to make this as fair as possible, although I’m obviously somewhat biased.

IPFS

  • Existing project that has existed for many years and been tested by many people
  • Allows sharing with other users not on your local network
  • Not packaged in any distributions and not trivial to install correctly
  • Requires significant time to find resources
  • Does not prioritize local clients over remote clients
  • Requires an internet-to-IPFS “gateway” which cost me a lot of $$$ for a large number of files

Passim

  • New project that’s not even finished
  • Only allows sharing with computers on your local network
  • Returns results within 2s

One concern we had specifically with IPFS for firmware was ITAR/EAR legal considerations, e.g. we couldn’t share firmware containing strong encryption with users in some countries — which is actually most of the firmware the LVFS distributes. From an ITAR/EAR point of view Passim would be compliant (as it only shares locally, presumably in the same country) and IPFS certainly is not.

There’s a longer README in the git repo. There’s also a test patch that wires up fwupd with libpassim although it’s not ready for merging. For instance, I think it’s perfectly safe to share metadata but not firmware or distro package payloads – but for some people downloading payloads on a cellular link might be exactly what they want – so it’ll be configurable. For reference Windows Update also shares content (not just metadata) so maybe I’m worrying about nothing, and doing a distro upgrade from the computer next to them is exactly what people need. Small steps perhaps.

Comments welcome.

EDIT 2023-08-22: Made changes to reflect that we went from HTTP 1.0 to HTTP 1.1 with TLS.

Urgent VCL Mission

Posted by Caolán McNamara on July 25, 2023 07:37 PM

To mark the occasion of Noel's merge of Convert internal vcl bitmap formats transparency->alpha to align LibreOffice's internal concept of color opacity with everything else's.

Urgent VCL Mission

This work, "Urgent VCL Mission", is adapted from "Urgent Mission" by Randall Munroe, used under CC BY-NC 2.5, and licensed under same by me.

Amazingly there exist not one, but two fonts based on Randall's handwriting.

Onward and upward to the end of split-Alpha.

Roots of Trust are difficult

Posted by Matthew Garrett on July 11, 2023 07:58 AM
The phrase "Root of Trust" turns up at various points in discussions about verified boot and measured boot, and to a first approximation nobody is able to give you a coherent explanation of what it means[1]. The Trusted Computing Group has a fairly wordy definition, but (a) it's a lot of words and (b) I don't like it, so instead I'm going to start by defining a root of trust as "A thing that has to be trustworthy for anything else on your computer to be trustworthy".

(An aside: when I say "trustworthy", it is very easy to interpret this in a cynical manner and assume that "trust" means "trusted by someone I do not necessarily trust to act in my best interest". I want to be absolutely clear that when I say "trustworthy" I mean "trusted by the owner of the computer", and that as far as I'm concerned selling devices that do not allow the owner to define what's trusted is an extremely bad thing in the general case)

Let's take an example. In verified boot, a cryptographic signature of a component is verified before it's allowed to boot. A straightforward verified boot implementation has the firmware verify the signature on the bootloader or kernel before executing it. In this scenario, the firmware is the root of trust - it's the first thing that makes a determination about whether something should be allowed to run or not[2]. As long as the firmware behaves correctly, and as long as there aren't any vulnerabilities in our boot chain, we know that we booted an OS that was signed with a key we trust.

But what guarantees that the firmware behaves correctly? What if someone replaces our firmware with firmware that trusts different keys, or hot-patches the OS as it's booting it? We can't just ask the firmware whether it's trustworthy - trustworthy firmware will say yes, but the thing about malicious firmware is that it can just lie to us (either directly, or by modifying the OS components it boots to lie instead). This is probably not sufficiently trustworthy!

Ok, so let's have the firmware be verified before it's executed. On Intel this is "Boot Guard", on AMD this is "Platform Secure Boot", everywhere else it's just "Secure Boot". Code on the CPU (either in ROM or signed with a key controlled by the CPU vendor) verifies the firmware[3] before executing it. Now the CPU itself is the root of trust, and, well, that seems reasonable - we have to place trust in the CPU, otherwise we can't actually do computing. We can now say with a reasonable degree of confidence (again, in the absence of vulnerabilities) that we booted an OS that we trusted. Hurrah!

Except. How do we know that the CPU actually did that verification? CPUs are generally manufactured without verification being enabled - different system vendors use different signing keys, so those keys can't be installed in the CPU at CPU manufacture time, and vendors need to do code development without signing everything so you can't require that keys be installed before a CPU will work. So, out of the box, a new CPU will boot anything without doing verification[4], and development units will frequently have no verification.

As a device owner, how do you tell whether or not your CPU has this verification enabled? Well, you could ask the CPU, but if you're doing that on a device that booted a compromised OS then maybe it's just hotpatching your OS so when you do that you just get RET_TRUST_ME_BRO even if the CPU is desperately waving its arms around trying to warn you it's a trap. This is, unfortunately, a problem that's basically impossible to solve using verified boot alone - if any component in the chain fails to enforce verification, the trust you're placing in the chain is misplaced and you are going to have a bad day.

So how do we solve it? The answer is that we can't simply ask the OS, we need a mechanism to query the root of trust itself. There's a few ways to do that, but fundamentally they depend on the ability of the root of trust to provide proof of what happened. This requires that the root of trust be able to sign (or cause to be signed) an "attestation" of the system state, a cryptographically verifiable representation of the security-critical configuration and code. The most common form of this is called "measured boot" or "trusted boot", and involves generating a "measurement" of each boot component or configuration (generally a cryptographic hash of it), and storing that measurement somewhere. The important thing is that it must not be possible for the running OS (or any pre-OS component) to arbitrarily modify these measurements, since otherwise a compromised environment could simply go back and rewrite history. One frequently used solution to this is to segregate the storage of the measurements (and the attestation of them) into a separate hardware component that can't be directly manipulated by the OS, such as a Trusted Platform Module. Each part of the boot chain measures relevant security configuration and the next component before executing it and sends that measurement to the TPM, and later the TPM can provide a signed attestation of the measurements it was given. So, an SoC that implements verified boot should create a measurement telling us whether verification is enabled - and, critically, should also create a measurement if it isn't. This is important because failing to measure the disabled state leaves us with the same problem as before; someone can replace the mutable firmware code with code that creates a fake measurement asserting that verified boot was enabled, and if we trust that we're going to have a bad time.

(Of course, simply measuring the fact that verified boot was enabled isn't enough - what if someone replaces the CPU with one that has verified boot enabled, but trusts keys under their control? We also need to measure the keys that were used in order to ensure that the device trusted only the keys we expected, otherwise again we're going to have a bad time)
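
The measurement mechanics themselves are simple to model: a PCR-style register can only ever be extended, never set, so the final value is a hash chain over everything that was recorded, in order, and a later compromise can't rewrite what earlier code already measured. Here's a toy sketch of the idea (real TPMs do this in hardware, typically with SHA-1/SHA-256 PCR banks):

# Toy model of a TPM PCR extend: the register is replaced by the hash of
# (old value || hash of the new measurement), so its final value commits to
# the whole ordered sequence of measurements.
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

pcr = bytes(32)  # PCRs start out as all zeroes
pcr = extend(pcr, b"verified boot policy: enabled, vendor key hash ...")
pcr = extend(pcr, b"firmware image")
pcr = extend(pcr, b"bootloader")

# A verifier compares a signed attestation ("quote") of this value against
# what it expects for exactly that sequence of measurements.
print(pcr.hex())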

So, an effective root of trust needs to:

1) Create a measurement of its verified boot policy before running any mutable code
2) Include the trusted signing key in that measurement
3) Actually perform that verification before executing any mutable code

and from then on we're in the hands of the verified code actually being trustworthy, and it's probably written in C so that's almost certainly false, but let's not try to solve every problem today.

Does anything do this today? As far as I can tell, Intel's Boot Guard implementation does. Based on publicly available documentation I can't find any evidence that AMD's Platform Secure Boot does (it does the verification, but it doesn't measure the policy beforehand, so it seems spoofable), but I could be wrong there. I haven't found any general purpose non-x86 parts that do, but this is in the realm of things that SoC vendors seem to believe is some sort of value-add that can only be documented under NDAs, so please do prove me wrong. And then there are add-on solutions like Titan, where we delegate the initial measurement and validation to a separate piece of hardware that measures the firmware as the CPU reads it, rather than requiring that the CPU do it.

But, overall, the situation isn't great. On many platforms there's simply no way to prove that you booted the code you expected to boot. People have designed elaborate security implementations that can be bypassed in a number of ways.

[1] In this respect it is extremely similar to "Zero Trust"
[2] This is a bit of an oversimplification - once we get into dynamic roots of trust like Intel's TXT this story gets more complicated, but let's stick to the simple case today
[3] I'm kind of using "firmware" in an x86ish manner here, so for embedded devices just think of "firmware" as "the first code executed out of flash and signed by someone other than the SoC vendor"
[4] In the Intel case this isn't strictly true, since the keys are stored in the motherboard chipset rather than the CPU, and so taking a board with Boot Guard enabled and swapping out the CPU won't disable Boot Guard because the CPU reads the configuration from the chipset. But many mobile Intel parts have the chipset in the same package as the CPU, so in theory swapping out that entire package would disable Boot Guard. I am not good enough at soldering to demonstrate that.


Firefox on Linux update

Posted by Martin Stransky on July 07, 2023 02:36 PM

It may look like Firefox development is stalled and frozen, but nothing could be further from the truth. 22,300 patches have landed in the Firefox Mercurial repository since the new year, and we keep hacking 😀

Let’s look at what’s new in Firefox on Linux and what could be interesting for Fedora users.

  • A large WebRTC update was merged, with DMABuf screen sharing and xdg camera portal support, both developed by Jan Grulich. It should speed up screen sharing via portals (used by Wayland, Snap and Flatpak) and enable new webcams on Linux.
  • VA-API was enabled for Intel GPUs in Firefox 115.0. AMD is delayed due to various driver bugs, but you can force-enable it by setting media.hardware-video-decoding.force-enabled at about:config. The VA-API state is now printed on the about:support page.
  • David Turner implemented video hardware decoding support for H.264 on a Raspberry Pi 4 via V4L2-M2M, and there’s ongoing work to enable VP8/9 too. If you have an ARM device with HW video decoding, try it in the upcoming Firefox 116.0 (or a recent Beta).
  • Emilio Cobos Álvarez fixed many Linux/Gtk style bugs like wrong colors, menu round corner rendering and titlebar rendering, and also lots of Wayland bugs. He unified Firefox main window rendering on all desktops (we use client-side decorations – CSD – now) and fixed regressions on desktops like XFCE/TWM.
    The style bugs are especially tricky ones, as we removed the X11/Wayland display connection from web content processes (for security reasons) and so can’t use native styles directly any more.
  • Firefox 116 allows you to create Wayland- or X11-exclusive builds. It means you can build Firefox without the X11 (or Wayland) dependency, which saves space and resources. You also no longer need a Wayland build to run the DMABuf and VA-API backends.
  • Mozilla started to implement Firefox tests on Wayland (along with the testsuite migration to Ubuntu 22.04). This will allow shipping the Wayland backend by default in Mozilla binaries and Ubuntu/Snap, and reduces regressions during development.
  • Fedora ships Firefox 115.0 with PGO/LTO optimization again (built by GCC) after a long period of breakage. Mozilla/clang builds are still a bit faster than GCC ones (Mozilla binaries give me ~200 points in Speedometer 2.1 while Fedora ones ~190 points) but GCC binaries are slightly smaller (130 MB vs. 150 MB).

    But it’s still much better than the non-PGO ones; Fedora Firefox 114.0 did only ~140 points in the benchmark.

    Just out of curiosity I tested Chrome browsers too. Fedora Chromium gives me ~140 points and official Google Chrome reached ~200 points. Looks like Fedora Chromium needs some tweaks too :D.

No one fights alone. A guide to your first Firefox patch on Linux.

Posted by Martin Stransky on July 04, 2023 12:13 PM

Have you ever hit an annoying Firefox bug and wanted to fix it? Then you’re in the right place. Firefox is a great open source project with many volunteer contributors and a large community, and any patches and help are very welcome.

This post aims to help you write your first patch for Firefox and become an active Firefox contributor.

Get and build Firefox sources

First of all you need to get the Firefox sources and build them on Linux. Ubuntu is a natural choice for many, but I use Fedora as it’s the leading Wayland distro and I’m also a Red Hat employee. It doesn’t matter much, because Firefox installs its own build environment.

Firefox development happens on nightly, which is the latest up-to-date source tree, and all changes go there. To start you need the mercurial package installed.

Download the latest Firefox sources into a src directory with Mercurial:

hg clone http://hg.mozilla.org/mozilla-central/ src

A finished Firefox build may need ~ 40GB of free space so don’t be surprised. It’s quite a beast 😀

Create a .mozconfig file in the src directory to set the build options. Correct values are selected automatically so you don’t need to care much about it now. You can use my generic one for optimized builds:

. $topsrcdir/browser/config/mozconfig

mk_add_options BUILD_OFFICIAL=1
mk_add_options MOZILLA_OFFICIAL=1
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir
mk_add_options AUTOCLOBBER=1

ac_add_options --disable-debug
ac_add_options --enable-optimize

And here is one for debug / non-optimized builds, suitable for debugging:

. $topsrcdir/browser/config/mozconfig

mk_add_options BUILD_OFFICIAL=1
mk_add_options MOZILLA_OFFICIAL=1
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir
mk_add_options AUTOCLOBBER=1

ac_add_options --enable-debug
ac_add_options --disable-optimize

Then create a build environment and download the build tools for it. Run bootstrap and select Firefox for Desktop there:

./mach bootstrap

That downloads all required tools from Mozilla and puts them into the ~/.mozbuild directory. It also configures Mercurial to work with Firefox sources. If you decide to quit or you want to free disk space, just delete the .mozbuild and src dirs.

You should be ready to build Firefox from sources now:

./mach build

When the build is finished, the new firefox binary is in the src/objdir/dist/bin directory and you can run it from there.

Firefox hacking

Mozilla provides the powerful Searchfox tool, where Firefox sources can be browsed and investigated. From a Linux/Gtk perspective, some interesting code parts are:

  • widget/gtk – the Firefox/Linux/Gtk integration point, called the toolkit; almost all Linux related code is located here. The corresponding Bugzilla component is Core :: Widget/Gtk, where most Linux bugs are filed.
  • widget/gtk/nsWindow.cpp – every Firefox window (main browser window, popups, tooltips etc.) has its own nsWindow object. It creates the GtkWindow for the browser, handles Gtk signals and so on. Look at nsWindow::Create() for instance.
  • widget/gtk/DMABufSurface.cpp – implements dmabuf based surfaces. It’s used for various Linux/HW operations like video playback and WebGL rendering.
  • toolkit/xre – Firefox startup code. Sets up GdkDisplay, Wayland/X11 connections and the remote startup code.
  • dom/media/platforms/ffmpeg – the ffmpeg video playback implementation, VA-API related code.
  • gfx/thebes/gfxPlatformGtk.cpp – sets up the gfx environment on Linux/Gtk. Initializes and configures WebRender, VA-API, DMABuf and related features and enables/disables them according to the actual hardware.
  • gfx/gl – EGL/GLX related code, contains the OpenGL setup for Linux (and other systems).

You can use these hints as starting points for further investigation.

Mercurial mini-howto

Mercurial is a VCS used by Mozilla. I’ve never been good at it and both Git and Mercurial are still a mystery to me. But don’t worry, you need only a small subset of Mercurial commands to hack Firefox and submit your patches.

Show all changes in your sources:

hg diff

Revert all your changes:

hg revert --all

Or revert specified files:

hg revert file1 file2 ...

Commit your changes:

hg commit -m "commit message"

This is the most important Mercurial command you use. It saves your changes locally and you can submit them to Mozilla for review later.

Mercurial commits all changes by default but you can ask to commit selected files only:

hg commit -m "commit message" file1 file2 ...

After the commit, the change gets a revision number. You can print it with Mozilla's hg wip helper:

hg wip

And you can show the complete patch (message and changes) with:

hg log -v --patch -r revision_number

Now let’s look at the commit message. It’s an important part of the commit and has the following format for the Firefox project:

Bug XXXXX - What changed r?reviewer_name

Look at Bug 1818517 and the D172914 patch. They’re a good example of what the commit message looks like:

Bug 1818517 - Ignore crossing-due-to-grab-start r?stransky

The Bug field tells Mozilla tools where to attach your patch, while the r? field selects who is going to check and ack the change. You always need to specify these two entries.

Without a Bugzilla number and reviewer name, the patch can’t be uploaded or will hang there unnoticed.

The patch reviewer is a key person for your contribution. He/she may direct you on how to improve the patch, ack it and eventually check it into the project. Correct reviewer selection may be tricky and could discourage new contributors.

If you create a new patch for Linux you can always select me (r?stransky). Reviewers for other parts of Firefox are listed here.

Submit your patch for review

So we have a patch committed to your local repository and want to upload it to Mozilla for review. First you need to create accounts in Bugzilla (the bug tracking system) and Phabricator (patch tracking) and install the moz-phab tool to submit your patches.

Let’s say we have Bugzilla/Phabricator accounts set up and moz-phab is working. It’s time to submit your first patch. Make sure you enabled the Mozilla Mercurial extensions in the mach bootstrap command above. If not, please run it again and enable them.

The hg wip command lists patches/revisions in a nice format so you can select what to submit.


To submit our patch for Bug 1818517, use the moz-phab tool with the patch revision numbers:

moz-phab submit d67996f50ead d67996f50ead

and you have your first patch uploaded.

More useful Mercurial commands

Update your sources to the latest ones:

hg pull
hg up central

Switch sources to any revision/branch from history. Use hg wip to get the revision number and hg up to switch:

hg up d67996f50ead

The rebase tool allows you to re-arrange patches across branches, make the patch stream linear or rebase your patches onto the latest trunk. It can rebase a single patch or multiple changesets.

In case of a conflict a merge tool is called. I personally use Meld, as vimdiff is difficult for me to use. Use hg rebase --help to get the complete options list. I usually use:

hg rebase -s source_revision -d dest_revision -k

histedit is a great tool for commit management:

hg histedit

It allows you to re-arrange, merge and drop individual patches, change commit messages or show individual patches.

Where to start?

Want to help with the project and fix bugs reported by other people? Cool! Bugzilla is your friend, and there are trackers related to Linux/Gtk there.

And that’s all! Don’t hesitate to join the Mozilla Matrix chat, have fun and become a member of the Firefox developer community. If you’re blocked on any step listed here, drop me a note or ask on the Mozilla chat.

ASG! 2023 CfP Closes Soon

Posted by Lennart Poettering on July 03, 2023 10:00 PM

The All Systems Go! 2023 Call for Participation Closes in Three Days!

The Call for Participation (CFP) for All Systems Go! 2023 will close in three days, on 7th of July! We’d like to invite you to submit your proposals for consideration to the CFP submission site quickly!


All topics relevant to foundational open-source Linux technologies are welcome. In particular, however, we are looking for proposals including, but not limited to, the topics listed below.

The CFP will close on July 7th, 2023. A response will be sent to all submitters on or before July 14th, 2023. The conference takes place in 🗺️ Berlin, Germany 🇩🇪 on Sept. 13-14th.

All Systems Go! 2023 is all about foundational open-source Linux technologies. We are primarily looking for deeply technical talks by and for developers, engineers and other technical roles.

We focus on the userspace side of things, so while kernel topics are welcome they must have clear, direct relevance to userspace. The following is a non-comprehensive list of topics encouraged for 2023 submissions:

  • Image-Based Linux 🖼️
  • Secure and Measured Boot 📏
  • TPM-Based Local/Remote Attestation, Encryption, Authentication 🔑
  • Low-level container executors and infrastructure ⚙️
  • IoT, embedded and server Linux infrastructure
  • Reproducible builds 🔧
  • Package management, OS, container 📦, image delivery and updating
  • Building Linux devices and applications 🏗️
  • Low-level desktop 💻 technologies
  • Networking 🌐
  • System and service management 🚀
  • Tracing and performance measuring 🔍
  • IPC and RPC systems 🦜
  • Security 🔐 and Sandboxing 🏖️

For more information please visit our conference website!

gitlab.freedesktop.org now has a bugbot for automatic issue/merge request processing

Posted by Peter Hutterer on July 03, 2023 06:34 AM

As of today, gitlab.freedesktop.org provides easy hooks to invoke the gitlab-triage tool for your project. gitlab-triage allows for the automation of recurring tasks, for example something like

If the label FOO is set, close the issue and add a comment containing ".... blah ..."
Many projects have recurring tasks like this, e.g. the wayland project gets a lot of issues that are compositor (not protocol) issues. Being able to just set a label and have things happen is much more convenient than having to type out the same explanations over and over again.

The goal for us was to provide automated handling for these with as little friction as possible. And of course each project must be able to decide what actions should be taken. Usually gitlab-triage is run as part of project-specific scheduled pipelines but since we already have webhook-based spam-fighting tools we figured we could make this even easier.

So, bugbot was born. Any project registered with bugbot can use labels prefixed with "bugbot::" to have gitlab-triage invoked against the project's policies file. These labels thus serve as mini-commands for bugbot, though each project decides what happens for any particular label. bugbot effectively works like this:

sleep 30
for label in {issue|merge_request}.current_labels:
  if label.startswith("bugbot::"):
     wget https://gitlab.freedesktop.org/foo/bar/-/raw/{main|master}/.triage-policies.yml
     run-gitlab-triage --as-user @bugbot --use-file .triage-policies.yml
     break
And this is triggered on every issue/merge request update for any registered project, which means that all you need to do is set the label and you're done. The things of note here:
  • bugbot delays by 30 seconds, giving you time to unset an accidentally applied label before it takes effect
  • bugbot doesn't care about the label beyond the (hard-coded) "bugbot::" prefix
  • bugbot always runs your project's triage policies, from your main or master branch (whichever succeeds first)
  • The actions are performed as the bugbot user, not your user
The full documentation of what you can do in a policies file is available at the gitlab-triage documentation but let's look at a simple example that shouldn't even need explanation:
resource_rules:
  issues:
    rules:
      - name: convert bugbot label to other label
        conditions:
          labels:
            - "bugbot::foo"
        actions:
          labels:
            - "foo"
          remove_labels:
            - "bugbot::foo"
          comment: |
            Nice label you have there. Would be a shame 
            if someone removed it
          status: "close"
  merge_requests:
    rules:
      []
And the effect of this file can be seen in this issue here.

Registering a project

Bugbot is part of the damspam project and registering a project can be done with a single command. Note: this can only be done by someone with the Maintainer role or above.

Create a personal access token with API access and save the token value as $XDG_CONFIG_HOME/bugbot/user.token. Then run the following commands with your project's full path (e.g. mesa/mesa, pipewire/wireplumber, xorg/lib/libX11):

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ bugbot request-webhook foo/bar
After this you may remove the token file and the package
$ pip uninstall damspam
$ rm $XDG_CONFIG_HOME/bugbot/user.token
The bugbot command will file an issue in the freedesktop/fdo-bots repository. This issue will be automatically processed and should be done by the time you finish the above commands; see this issue for an example. Note: the issue processing requires a git push to an internal repo - if you script this for multiple repos please put a sleep(30) in to avoid conflicts.

Adding triage policies

Once registered, the .triage-policies.yml file must be added to the root directory of your project. Which bugbot commands you want to respond to (and the actions to take) is up to you, though there are two things of note: you should always remove the bugbot label you are reacting to, to avoid duplicate processing, and gitlab-triage does not create new labels, so any label in your actions must be manually created in the project first. Beyond that, the sky's the limit.

Remember you can test your policies file with

 $ gitlab-triage --dry-run --token $GITLAB_TOKEN \
   --source-id foo/bar  --resource-reference 1234

As usual, many thanks to Benjamin Tissoires for reviews and the magic of integrating this into infrastructure.

snegg - Python bindings for libei

Posted by Peter Hutterer on June 06, 2023 09:36 AM

After what was basically a flurry of typing, the snegg Python bindings for libei are now available. This is a Python package that provides bindings to the libei/libeis/liboeffis C libraries with a little bit of API improvement to make it not completely terrible. The main goal of these bindings (at least for now) is to provide some quick and easy way to experiment with what could possibly be done using libei - both server-side and client-side. [1] The examples directory has a minimal EI client (with portal support via liboeffis) and a minimal EIS implementation. The bindings are still quite rough and the API is nowhere near stable.

A proper way to support EI in Python would be to implement the protocol directly - there's no need for the C API quirkiness this way and you can make full use of things like async and whatnot. If you're interested in that, get in touch! Meanwhile, writing something roughly resembling xdotool is probably only a few hundred lines of Python code. [2]

[1] writing these also exposed a few bugs in libei itself so I'm happy 1.0 wasn't out just yet
[2] at least the input emulation parts of xdotool

PipeWire camera support in Firefox

Posted by Jan Grulich on May 29, 2023 12:18 PM

New year, new challenges.

We reached a major milestone with Chromium 110, the release where we finally got screen sharing enabled by default on Wayland; since then you no longer have to go into the preferences and enable the flag. That doesn’t mean my work there is over, but I’ve shifted my focus to something related but slightly different: PipeWire camera support.

Work on PipeWire camera support started in 2021 and was done by Michael Olbrich (Pengutronix). He submitted a huge change to Chromium to add this support and had trouble finding a reviewer, because there was actually no one in the Chromium project who knew anything about PipeWire. I saw his change request by accident, but we got in touch and decided to move this to WebRTC instead, because having it lower in the stack means we would get it automatically in other browsers, like Firefox. Michael attended a meeting we used to have regularly for screen sharing support in WebRTC, and we discussed how to implement PipeWire camera support in WebRTC instead and how to reuse some of the code we already had for screen sharing to avoid duplication. After a few submitted and reverted reviews (usually when things broke Chromium parts that are not covered by CI, which has happened to me many times), we ended up with PipeWire camera support in WebRTC at the beginning of this year.

Journey to PipeWire camera support in Firefox

Until recently, my work had mostly been 95% WebRTC and 5% Chromium, and I was not familiar with Firefox at all (not counting WebRTC backports). I actually started by fixing screen sharing support there before moving on to the camera, because I noticed a few issues after Firefox (finally) did a WebRTC rebase to one of the newer versions. They’ve actually started doing monthly WebRTC rebases, which is really a good thing and I’m glad to see it happening. Anyway, even though Firefox has a more recent WebRTC these days, when I started in February there was still no PipeWire camera support at all because WebRTC was still a few months behind, so I had to backport all the patches and make them work with Firefox. Only then could I finally start working on the actual PipeWire camera support from WebRTC. Working on the backports, I was still in the WebRTC space, so everything was somewhat familiar. Implementing the actual PipeWire support was a different story and took me some extra time to understand how everything works: the camera API on the WebRTC side, camera support on the Firefox side, and all the APIs specifically used in Firefox, but admittedly, learning about new things is fun too. After some trial and error it started to work and I was able to share my camera using the PipeWire camera backend from WebRTC. You have to trust me that the picture below was not taken using the V4L2 backend.


I went ahead and submitted my WebRTC backports and the PipeWire camera backend implementation to Firefox for review. Unfortunately, I was told that the code where I placed my implementation could also be used by the WebRTC JavaScript API, which is used by bots to check for camera presence on the client side; I didn’t know that as someone who had only recently started working on camera support. This was a problem because we get PipeWire access through xdg-desktop-portal, and this involves showing a dialog to the user asking for camera access. Showing a camera request dialog randomly to the user would not be a good experience. Going back to the drawing board, I talked to Andreas Pehrson (Mozilla/WebRTC). Andreas was a great help and we came up with a solution for how to implement it properly in Firefox and avoid the issues I mentioned before. This time it involved some re-org changes in WebRTC, where I split the xdg-desktop-portal and PipeWire implementations for PipeWire video capture, so we can request camera access in Firefox only when appropriate and only do the PipeWire work in the backend once access has been granted. So I implemented it again, this time according to what Andreas and I agreed on, and it worked.


This is now submitted for review again and hopefully this time it will only need some minor fixes and not a complete rewrite like before, and you will be able to try/use it sooner rather than later. The main change is submitted here, but it is accompanied by other changes with WebRTC backports or changes that make the backports buildable with Firefox. With the first version of the change, I had a Fedora COPR repository, but I had to discontinue it, because it was too hard to maintain it in a buildable state on top of a stable Firefox. But you can be sure that Fedora will be the very first consumer of these changes once they are merged.

Why do we need this?

For many reasons. I would recommend reading a blog post from Christian Schaller, where everything is explained in detail and which gives you more information about the camera stack. The main reasons are:

  • Security
    • Access to the camera must be granted by the user, so you can be sure that no one is using your camera behind your back.
  • Flexibility
    • Your camera can be accessed by multiple clients simultaneously.
  • Libcamera support
    • Needed for ARM devices or devices using ChromeOS

Chromium support

While support in WebRTC was finished a few months ago, Chromium originally didn’t use the WebRTC video capture API for camera support, so that had to be added. Michael implemented it and it is still pending review, so both Chromium and Firefox are implemented but waiting for approval.

Future plans

Most importantly, I want to get everything merged and working seamlessly, but I’m already aware of some issues and missing functionality in the PipeWire backend in WebRTC. We also have the same problem we used to have with screen sharing: it’s not enabled by default, unit tested, or feature complete, and these things take time to fix.

GNOME Logout Inhibit

Posted by Caolán McNamara on May 11, 2023 11:14 AM


I recently added support in LibreOffice, towards 7.6, for inhibiting GNOME logout when LibreOffice has open documents with unsaved changes.

GNOME Core Apps Update

Posted by Michael Catanzaro on May 10, 2023 09:32 PM

It’s been a while since my big core app reorganization for GNOME 3.22. Here is a history of core app changes since then:

  • GNOME 3.26 (September 2017) added Music, To Do (which has since been renamed to Endeavor), and Document Scanner (simple-scan). (I blogged about this at the time, then became lazy and stopped blogging about core app updates, until now.)
  • To Do was removed in GNOME 3.28 (March 2018) due to lack of consensus over whether it should really be a core app. As a result of this, we improved communication between the GNOME release team and design team to ensure both teams agree on future core app changes. Mea culpa.
  • Documents was removed in GNOME 3.32 (March 2019).
  • A new Developer Tools subcategory of core was created in GNOME 3.38 (September 2020), adding Builder, dconf Editor, Devhelp, and Sysprof. These apps are only interesting for software developers and are not intended to be installed by default in general-purpose operating systems like the rest of GNOME core.
  • GNOME 41 (September 2021) featured the first larger set of changes to GNOME core since GNOME 3.22. This release removed Archive Manager (file-roller), since Files (nautilus) is now able to handle archives, and also removed gedit (formerly Text Editor). It added Connections and a replacement Text Editor app (gnome-text-editor). It also added a new Mobile subcategory of core, for apps intended for mobile-focused operating systems, featuring the dialer app Calls. (To date, the Mobile subcategory has not been very successful: so far Calls is the only app included there.)
  • GNOME 42 (March 2022) featured a second larger set of changes. Screenshot was removed because GNOME Shell gained a built-in screenshot tool. Terminal was removed in favor of Console (kgx). We also moved Boxes to the Developer Tools subcategory, to recommend that it no longer be installed by default in general purpose operating systems.
  • GNOME 43 (September 2022) added D-Spy to Developer Tools.

OK, now we’re caught up on historical changes. So, what to expect next?

New Process for Core Apps Changes

Although most of the core app changes have gone smoothly, we ran into some trouble replacing Terminal with Console. Console provides a fresher and simpler user interface on top of vte, the same terminal backend used by Terminal, so Console and Terminal share much of the same underlying functionality. This means the work of the Terminal maintainers is actually key to the success of Console. Using a new terminal app rather than evolving Terminal allowed for bigger changes to the default user experience without upsetting users who prefer the experience provided by Terminal. I think Console is generally nicer than Terminal, but it is missing a few features that Fedora Workstation developers thought were important to have before replacing Terminal with Console. Long story short: this core app change was effectively rejected by one of our most important downstreams. Since then, Console has not seen very much development, and accordingly it is unlikely to be accepted into Fedora Workstation anytime soon. We messed up by adding the app to core before downstreams were comfortable with it, and at this point it has become unclear whether Console should remain in core or whether we should give up and bring back Terminal. Console remains for now, but I’m not sure where we go from here. Help welcome.

To prevent this situation from happening again, Chris and Sophie developed a detailed and organized process for adding or removing core apps, including a new Incubator category designed to provide notice to downstreams that we are considering adding new apps to GNOME core. The new Incubator is much more structured than my previous short-lived Incubator attempt in GNOME 3.22. When apps are added to Incubator, I’ve been proactively asking other Fedora Workstation developers to provide feedback to make sure the app is considered ready there, to avoid a repeat of the situation with Console. Other downstreams are also welcome to watch the  Incubator/Submission project and provide feedback on newly-submitted apps, which should allow plenty of heads-up so downstreams can let us know sooner rather than later if there are problems with Incubator apps. Hopefully this should ensure apps are actually adopted by downstreams when they enter GNOME core.

Imminent Core App Changes

Currently there are two apps in Incubator. Loupe is a new image viewer app developed by Chris and Sophie to replace Image Viewer (eog). Snapshot is a new camera app developed by Maximiliano and Jamie to replace Cheese. These apps are maturing rapidly and have received primarily positive feedback thus far, so they are likely to graduate from Incubator and enter GNOME core sooner rather than later. The time to provide feedback is now. Don’t be surprised if Loupe is included in core for GNOME 45.

In addition to Image Viewer and Cheese, we are also considering removing Photos. Photos is one of our “content apps” designed to allow browsing an entire collection of files independently of their filesystem locations. Historically, the other two content apps were Documents and Music. The content app strategy did not work very well for Documents, since a document browser doesn’t really offer many advantages over a file browser, but Photos and Music are both pretty decent at displaying your collection of pictures or songs, assuming you have such a collection. We have been discussing what to do with Photos and the other content apps for a very long time, at least since 2015. It took a very long time to reach some rough consensus, but we have finally agreed that the design of Photos still makes sense for GNOME: having a local app for viewing both local and cloud photos is still useful. However, Photos is no longer actively maintained. Several basic functionality bugs imperiled timely release of Fedora 37 last fall, and the app is less useful than previously because it no longer integrates with cloud services like Google Photos. (The Google integration depends on libgdata, which was removed from GNOME 44 because it did not survive the transition to libsoup 3.) Photos has failed the new core app review process due to lack of active maintenance, and will soon be removed from GNOME core unless a new maintainer steps up to take care of it. Volunteers welcome.

Future Core App Changes

Lastly, I want to talk about some changes that are not yet planned, but might occur in the future. Think of this entire section as brainstorming rather than any concrete plans.

Like Photos, we have also been discussing the status of Music. The popularity of DRM-encumbered cloud music services has increased, and local music storage does not seem to be as common as it used to be. If you do have local music, Music is pretty decent at handling it, but there are prominent bugs and missing features (like the ability to select which folders to index) detracting from the user experience. We do not have consensus on whether having a core app to play local music files still makes sense, since most users probably do not have a local music collection anymore. But perhaps all that is a moot point, because Videos (totem) 3.38 removed support for opening audio files, leaving us with no core apps capable of playing audio for the past 2.5 years. Previously, our default music player was Videos, which was really weird, and now we have none; Music can only play audio files that you’ve navigated to using Music itself, so it’s impossible for Music to be our default music player. My suggestion to rename Videos to Media Player and handle audio files again has not been well-received, so the most likely solution to this conundrum is to teach Music how to open audio files, likely securing its future in core. A merge request exists, but it does not look close to landing. Fedora Workstation is still shipping Rhythmbox rather than Music specifically due to this problem. My opinion is this needs to be resolved for Music to remain in core.

It would be nice to have an email client in GNOME core, since everybody uses email and local clients are much nicer than webmail. The only plausible candidate here is Geary. (If you like Evolution, consider that you might not like the major UI changes and many, many feature removals that would be necessary for Evolution to enter GNOME core.) Geary has only one active maintainer, and adding a big application that depends on just one person seems too risky. If more developers were interested in maintaining Geary, it would feel like a safer addition to GNOME core.

Contacts feels a little out of place currently. It’s mostly useful for storing email addresses, but you cannot actually do anything with them because we have no email application in core. Like Photos, Contacts has had several recent basic functionality bugs that imperiled timely Fedora releases, but these seem to have been largely resolved, so it’s not causing urgent problems. Still, for Contacts to remain in the long term, we’re probably going to need another maintainer here too. And perhaps it only makes sense to keep if we add Geary.

Finally, should Maps move to the Mobile category? It seems clearly useful to have a maps app installed by default on a phone, but I wonder how many desktop users really prefer to use Maps rather than a maps website.

GNOME 44 Core Apps

I’ll end this blog post with an updated list of core apps as of GNOME 44. Here they are:

  • Main category (26 apps):
    • Calculator
    • Calendar
    • Characters
    • Cheese
    • Clocks
    • Connections
    • Console (kgx)
    • Contacts
    • Disks (gnome-disk-utility)
    • Disk Usage Analyzer (baobab)
    • Document Scanner (simple-scan)
    • Document Viewer (evince)
    • Files (nautilus)
    • Fonts (gnome-font-viewer)
    • Help (yelp)
    • Image Viewer (eog)
    • Logs
    • Maps
    • Music
    • Photos
    • Software
    • System Monitor
    • Text Editor
    • Videos (totem)
    • Weather
    • Web (epiphany)
  • Developer Tools (6 apps):
    • Boxes
    • Builder
    • dconf Editor
    • Devhelp
    • D-Spy
    • sysprof
  • Mobile (1 app):
    • Calls

MSI and Insecure KMs

Posted by Richard Hughes on May 09, 2023 01:40 PM

As some of you may know, MSI suffered a data breach which leaked a huge amount of source code, documentation and low-level firmware PRIVATE KEYS. This is super bad as it now allows anyone to sign a random firmware image and install it as an official MSI firmware. It’s even more super bad than that, as the certificates leaked seem to be the KeyManifest keys, which actually control the layer below SecureBoot, this little-documented and even less well understood thing called BootGuard. I’ll not overplay the impact here, but there is basically no firmware security on most modern MSI hardware now. We already detect the leaked test keys from Lenovo and notify the user via the HSI test failure and I think we should do the same thing for MSI devices too. I’ve not downloaded the leak for obvious reasons, and I don’t think the KM hashes would be easy to find either.

So what can you do to help? Do you have an MSI laptop or motherboard affected by the leak? The full list is here (source: Binarly) and if you have one of those machines I’d ask if you could follow the instructions below, run MEInfo and attach it to the discussion please.

As for how to get MEInfo, Intel doesn’t want to make it easy for us. The Intel CSME System Tools are all different binaries, and are seemingly all compiled one-by-one for each specific MEI generation — and available only from a semi-legitimate place unless you’re an OEM or ODM. Once you have the archive of tools you either have to work out what CSME revision you have (e.g. Ice Point is 13.0) or do what I do and extract all the versions and just keep running them until one works. For example, choosing the wrong one will get you:

sudo ./CSME\ System\ Tools\ v13.50\ r3/MEInfo/LINUX64/MEInfo 
Intel (R) MEInfo Version: 13.50.15.1475
Copyright (C) 2005 - 2021, Intel Corporation. All rights reserved.
Error 621: Unsupported hardware platform. HW: Cometlake Platform. Supported HW: Jasplerlake Platform.

And choosing the right one will get you:

Intel (R) MEInfo Version: 14.1.60.1790
Copyright (C) 2005 - 2021, Intel Corporation. All rights reserved.

General FW Information
…
OEM Public Key Hash FPF                          2B4D5D79BD7EE3C192412A4501D88FB2066C853FF7B1060765395D671B15D30C

Now, how to access these hashes is what Intel keeps a secret, for no reason at all. I literally need to know what integer index to use when querying the HECI device. I’ve asked Intel, but I’ve been waiting since October 2022. For instance:

sudo strace -xx -s 4096  -e openat,read,write,close ./CSME\ System\ Tools\ v14.0.20+\ r20/MEInfo/LINUX64/MEInfo
…
write(3, "\x0a\x0a\x00\x00\x00\x23\x00\x40\x00\x00\x00\x00\x20\x00\x00\x00\x00", 17) = 17
read(3, "\x0a\x8a\x00\x00\x20\x00\x00\x00\x2b\x4d\x5d\x79\xbd\x7e\xe3\xc1\x92\x41\x2a\x45\x01\xd8\x8f\xb2\x06\x6c\x85\x3f\xf7\xb1\x06\x07\x65\x39\x5d\x67\x1b\x15\xd3\x0c", 4096) = 40
…

That contains all the information I need – the Comet Lake READ_FILE_EX ID is 0x40002300 and there’s a SHA256 hash that matches what the OEM Public Key Hash FPF console output said above. There are actually three accesses to get the same hash in three different places, so until I know why I’d like the entire output from MEInfo.
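
For what it’s worth, here is a rough sketch of how that hash could be pulled out of a reply like the one above. The layout is only my guess from the strace output, since Intel doesn’t document it: 4 bytes of command/status, 4 bytes of payload length, then the payload itself, which here is the 32-byte SHA256 of the OEM public key.

import struct

def parse_oem_key_hash(reply: bytes) -> str:
    # Assumed layout (guessed from the strace output, not documented by
    # Intel): 4-byte command/status, 4-byte payload length, then payload.
    if len(reply) < 8:
        raise ValueError("reply too short")
    command, length = struct.unpack_from("<II", reply, 0)
    payload = reply[8:8 + length]
    if length != 32 or len(payload) != 32:
        raise ValueError("unexpected payload length")
    return payload.hex().upper()

# The bytes from the read() call shown above:
reply = bytes.fromhex(
    "0a8a000020000000"
    "2b4d5d79bd7ee3c192412a4501d88fb2066c853ff7b1060765395d671b15d30c"
)
print(parse_oem_key_hash(reply))
# 2B4D5D79BD7EE3C192412A4501D88FB2066C853FF7B1060765395D671B15D30C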

The information I need uploading to the bug is then just these two files:

sudo ./THE_CORRECT_PATH/MEInfo/LINUX64/MEInfo &> YOUR_GITHUB_USERNAME-meinfo.txt
sudo strace -xx -s 4096  -e openat,read,write,close ./THE_CORRECT_PATH/MEInfo/LINUX64/MEInfo &> YOUR_GITHUB_USERNAME-meinfo-strace.txt

If I need more info I’ll ask on the ticket. Thanks!

libei and a fancy protocol

Posted by Peter Hutterer on May 09, 2023 12:51 AM

libei is the library for Emulated Input - see this post for an introduction. Like many projects, libei was started when it was still unclear if it could be the right solution to the problem. In the years (!) since, we've upgraded the answer to that question from "hopefully" to "yeah, I reckon" - doubly so since we added support for receiver contexts and got InputLeap working through the various portal changes.

Emulating or capturing input needs two processes to communicate for obvious reasons so the communication protocol is a core part of it. But initially, libei was a quickly written prototype and the protocol was hacked up on an as-needed let's-get-this-working basis. The rest of the C API got stable enough but the protocol was the missing bit. Long-term the protocol must be stable - without a stable protocol updating your compositor may break all flatpaks still shipping an older libei. Or updating a flatpak may not work with an older compositor. So in the last weeks/months, a lot of work has gone into making the protocol stable. This consisted of two parts: drop protobuf and make the various features interface-dependent, unashamedly quite like the Wayland protocol which is also split into a number of interfaces that can be independently versioned. Initially, I attempted to make the protocol binary compatible with Wayland but dropped that goal eventually - the benefits were minimal and the effort and limitations (due to different requirements) were quite significant.

The protocol is defined in a single XML file and can be used directly from language bindings (if any). The protocol documentation is quite extensive but it's relatively trivial in principle: the first 8 bytes of each message are the object ID, then we have 4 bytes for the message length in bytes, then 4 for the object-specific opcode. That opcode is one of the requests or events in the object's interface - which is defined at object creation time. Unlike Wayland, the majority of objects in libei are created server-side (the EIS implementation decides which seats are available and which devices in those seats). The remainder of the message is the arguments. Note that unlike other protocols the message does not carry a signature - prior knowledge of the message is required to parse the arguments. This is a direct effect of initially making it wayland-compatible and I didn't really find it worth the effort to add this.
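
For illustration, packing and unpacking that header might look something like the sketch below. This is not the libei code; the byte order and whether the length field includes the header are assumptions here, and the XML protocol definition is the authoritative source.

import struct

# Header layout as described above: 8-byte object ID, 4-byte message length
# (assumed here to include the header itself), 4-byte opcode. Little-endian
# is an assumption for the sake of the example.
HEADER = struct.Struct("<QII")

def pack_message(object_id: int, opcode: int, args: bytes) -> bytes:
    length = HEADER.size + len(args)
    return HEADER.pack(object_id, length, opcode) + args

def unpack_message(data: bytes):
    object_id, length, opcode = HEADER.unpack_from(data, 0)
    args = data[HEADER.size:length]
    # The arguments carry no signature: the caller must already know the
    # interface and version of object_id to be able to parse them.
    return object_id, opcode, args, data[length:]

msg = pack_message(object_id=3, opcode=1, args=b"\x01\x00\x00\x00")
print(unpack_message(msg))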

Anyway, long story short: swapping the protocol out didn't initially have any effect on the C library but with the changes came some minor updates to remove some of the warts in the API. Perhaps the biggest change is that the previous capabilities of a device are now split across several interfaces. Your average mouse-like emulated device will have the "pointer", "button" and "scroll" interfaces, or maybe the "pointer_absolute", "button" and "scroll" interface. The touch and keyboard interfaces were left as-is. Future interfaces will likely include gestures and tablet tools, I have done some rough prototyping locally and it will fit in nicely enough with the current protocol.

At the time of writing, the protocol is not officially stable but I have no intention of changing it short of some bug we may discover. Expect libei 1.0 very soon.

Twitter's e2ee DMs are better than nothing

Posted by Matthew Garrett on May 04, 2023 09:49 PM
(Edit 2023-05-10: This has now launched for a subset of Twitter users. The code that existed to notify users that device identities had changed does not appear to have been enabled - as a result, in its current form, Twitter can absolutely MITM conversations and read your messages)

Elon Musk appeared on an interview with Tucker Carlson last month, with one of the topics being the fact that Twitter could be legally compelled to hand over users' direct messages to government agencies since they're held on Twitter's servers and aren't encrypted. Elon talked about how they were in the process of implementing proper encryption for DMs that would prevent this - "You could put a gun to my head and I couldn't tell you. That's how it should be."

tl;dr - in the current implementation, while Twitter could subvert the end-to-end nature of the encryption, it could not do so without users being notified. If any user involved in a conversation were to ignore that notification, all messages in that conversation (including ones sent in the past) could then be decrypted. This isn't ideal, but it still seems like an improvement over having no encryption at all. More technical discussion follows.

For context: all information about Twitter's implementation here has been derived from reverse engineering version 9.86.0 of the Android client and 9.56.1 of the iOS client (the current versions at time of writing), and the feature hasn't yet launched. While it's certainly possible that there could be major changes in the protocol between now and launch, Elon has asserted that they plan to launch the feature this week so it's plausible that this reflects what'll ship.

For it to be impossible for Twitter to read DMs, they need to not only be encrypted, they need to be encrypted with a key that's not available to Twitter. This is what's referred to as "end-to-end encryption", or e2ee - it means that the only components in the communication chain that have access to the unencrypted data are the endpoints. Even if the message passes through other systems (and even if it's stored on other systems), those systems do not have access to the keys that would be needed to decrypt the data.

End-to-end encrypted messengers were initially popularised by Signal, but the Signal protocol has since been incorporated into WhatsApp and is probably much more widely used there. Millions of people per day are sending messages to each other that pass through servers controlled by third parties, but those third parties are completely unable to read the contents of those messages. This is the scenario that Elon described, where there's no degree of compulsion that could cause the people relaying messages to and from people to decrypt those messages afterwards.

But for this to be possible, both ends of the communication need to be able to encrypt messages in a way the other end can decrypt. This is usually performed using AES, a well-studied encryption algorithm with no known significant weaknesses. AES is a form of what's referred to as a symmetric encryption, one where encryption and decryption are performed with the same key. This means that both ends need access to that key, which presents us with a bootstrapping problem. Until a shared secret is obtained, there's no way to communicate securely, so how do we generate that shared secret? A common mechanism for this is something called Diffie Hellman key exchange, which makes use of asymmetric encryption. In asymmetric encryption, an encryption key can be split into two components - a public key and a private key. Both devices involved in the communication combine their private key and the other party's public key to generate a secret that can only be decoded with access to the private key. As long as you know the other party's public key, you can now securely generate a shared secret with them. Even a third party with access to all the public keys won't be able to identify this secret. Signal makes use of a variation of Diffie-Hellman called Extended Triple Diffie-Hellman that has some desirable properties, but it's not strictly necessary for the implementation of something that's end-to-end encrypted.

Although it was rumoured that Twitter would make use of the Signal protocol, and in fact there are vestiges of code in the Twitter client that still reference Signal, recent versions of the app have shipped with an entirely different approach that appears to have been written from scratch. It seems simple enough. Each device generates an asymmetric keypair using the NIST P-256 elliptic curve, along with a device identifier. The device identifier and the public half of the key are uploaded to Twitter using a new API endpoint called /1.1/keyregistry/register. When you want to send an encrypted DM to someone, the app calls /1.1/keyregistry/extract_public_keys with the IDs of the users you want to communicate with, and gets back a list of their public keys. It then looks up the conversation ID (a numeric identifier that corresponds to a given DM exchange - for a 1:1 conversation between two people it doesn't appear that this ever changes, so if you DMed an account 5 years ago and then DM them again now from the same account, the conversation ID will be the same) in a local database to retrieve a conversation key. If that key doesn't exist yet, the sender generates a random one. The message is then encrypted with the conversation key using AES in GCM mode, and the conversation key is then put through Diffie-Hellman with each of the recipients' public device keys. The encrypted message is then sent to Twitter along with the list of encrypted conversation keys. When each of the recipients' devices receives the message it checks whether it already has a copy of the conversation key, and if not performs its half of the Diffie-Hellman negotiation to decrypt the encrypted conversation key. Once it has the conversation key it decrypts the message and shows it to the user.
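
To make the shape of that scheme concrete, here is a hedged sketch using the Python cryptography library. It is my reconstruction of the flow described above rather than Twitter's actual code, and in particular the HKDF step (how the Diffie-Hellman shared secret becomes an AES key) is an assumption, since that detail isn't spelled out here.

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def wrap_key_for(own_private, peer_public) -> bytes:
    # ECDH over P-256, then HKDF-SHA256 to turn the shared secret into an
    # AES key. The HKDF step is my assumption, not a documented detail.
    shared = own_private.exchange(ec.ECDH(), peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"conversation-key-wrap").derive(shared)

# Each device holds a long-lived P-256 keypair; the public half is what
# gets uploaded to the key registry.
sender = ec.generate_private_key(ec.SECP256R1())
recipient = ec.generate_private_key(ec.SECP256R1())

# One symmetric key per conversation, generated once and then reused.
conversation_key = os.urandom(32)

# The message itself is encrypted with the conversation key (AES-GCM).
msg_nonce = os.urandom(12)
ciphertext = AESGCM(conversation_key).encrypt(msg_nonce, b"hello", None)

# The conversation key is wrapped for the recipient's registered device key.
wrap_nonce = os.urandom(12)
wrapped = AESGCM(wrap_key_for(sender, recipient.public_key())).encrypt(
    wrap_nonce, conversation_key, None)

# Recipient side: derive the same wrapping key, unwrap, then decrypt.
unwrapped = AESGCM(wrap_key_for(recipient, sender.public_key())).decrypt(
    wrap_nonce, wrapped, None)
print(AESGCM(unwrapped).decrypt(msg_nonce, ciphertext, None))

The property that matters for what follows is that anyone holding the private half of a registered device key can derive the wrapping key and recover the conversation key.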

What would happen if Twitter changed the registered public key associated with a device to one where they held the private key, or added an entirely new device to a user's account? If the app were to just happily send a message with the conversation key encrypted with that new key, Twitter would be able to decrypt that and obtain the conversation key. Since the conversation key is tied to the conversation, not any given pair of devices, obtaining the conversation key means you can then decrypt every message in that conversation, including ones sent before the key was obtained.

(An aside: Signal and WhatsApp make use of a protocol called Sesame which involves additional secret material that's shared between every device a user owns, hence why you have to do that QR code dance whenever you add a new device to your account. I'm grossly over-simplifying how clever the Signal approach is here, largely because I don't understand the details of it myself. The Signal protocol uses something called the Double Ratchet Algorithm to implement the actual message encryption keys in such a way that even if someone were able to successfully impersonate a device they'd only be able to decrypt messages sent after that point even if they had encrypted copies of every previous message in the conversation)

How's this avoided? Based on the UI that exists in the iOS version of the app, in a fairly straightforward way - each user can only have a single device that supports encrypted messages. If the user (or, in our hypothetical, a malicious Twitter) replaces the device key, the client will generate a notification. If the user pays attention to that notification and verifies with the recipient through some out of band mechanism that the device has actually been replaced, then everything is fine. But, if any participant in the conversation ignores this warning, the holder of the subverted key can obtain the conversation key and decrypt the entire history of the conversation. That's strictly worse than anything based on Signal, where such impersonation would simply not work, but even in the Twitter case it's not possible for someone to silently subvert the security.

So when Elon says Twitter wouldn't be able to decrypt these messages even if someone held a gun to his head, there's a condition applied to that - it's true as long as nobody fucks up. This is clearly better than the messages just not being encrypted at all in the first place, but overall it's a weaker solution than Signal. If you're currently using Twitter DMs, should you turn on encryption? As long as the limitations aren't too limiting, definitely! Should you use this in preference to Signal or WhatsApp? Almost certainly not. This seems like a genuine incremental improvement, but it'd be easy to interpret what Elon says as providing stronger guarantees than actually exist.


PSA: upgrade your LUKS key derivation function

Posted by Matthew Garrett on April 18, 2023 12:26 AM
Here's an article from a French anarchist describing how his (encrypted) laptop was seized after he was arrested, and material from the encrypted partition has since been entered as evidence against him. His encryption password was supposedly greater than 20 characters and included a mixture of cases, numbers, and punctuation, so in the absence of any sort of opsec failures this implies that even relatively complex passwords can now be brute forced, and we should be transitioning to even more secure passphrases.

Or does it? Let's go into what LUKS is doing in the first place. The actual data is typically encrypted with AES, an extremely popular and well-tested encryption algorithm. AES has no known major weaknesses and is not considered to be practically brute-forceable - at least, assuming you have a random key. Unfortunately it's not really practical to ask a user to type in 128 bits of binary every time they want to unlock their drive, so another approach has to be taken.

This is handled using something called a "key derivation function", or KDF. A KDF is a function that takes some input (in this case the user's password) and generates a key. As an extremely simple example, think of MD5 - it takes an input and generates a 128-bit output, so we could simply MD5 the user's password and use the output as an AES key. While this could technically be considered a KDF, it would be an extremely bad one! MD5s can be calculated extremely quickly, so someone attempting to brute-force a disk encryption key could simply generate the MD5 of every plausible password (probably on a lot of machines in parallel, likely using GPUs) and test each of them to see whether it decrypts the drive.
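
To make the cost difference concrete, here is a small illustrative sketch (not what LUKS actually does internally) comparing a naive MD5 "KDF" with PBKDF2 at a deliberately high iteration count; the attacker has to pay the same per-guess cost that you pay per unlock.

import hashlib, os, time

password = b"correct horse battery staple"
salt = os.urandom(16)

# Naive "KDF": one MD5 per guess - an attacker can test enormous numbers
# of candidate passwords per second.
t = time.perf_counter()
weak_key = hashlib.md5(password).digest()            # 128-bit output
print("md5:    %.6fs" % (time.perf_counter() - t))

# PBKDF2: every guess costs 1,000,000 HMAC-SHA256 iterations.
t = time.perf_counter()
strong_key = hashlib.pbkdf2_hmac("sha256", password, salt, 1_000_000, dklen=32)
print("pbkdf2: %.6fs" % (time.perf_counter() - t))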

(things are actually slightly more complicated than this - your password is used to generate a key that is then used to encrypt and decrypt the actual encryption key. This is necessary in order to allow you to change your password without having to re-encrypt the entire drive - instead you simply re-encrypt the encryption key with the new password-derived key. This also allows you to have multiple passwords or unlock mechanisms per drive)

Good KDFs reduce this risk by being what's technically referred to as "expensive". Rather than performing one simple calculation to turn a password into a key, they perform a lot of calculations. The number of calculations performed is generally configurable, in order to let you trade off between the amount of security (the number of calculations you'll force an attacker to perform when attempting to generate a key from a potential password) and performance (the amount of time you're willing to wait for your laptop to generate the key after you type in your password so it can actually boot). But, obviously, this tradeoff changes over time - defaults that made sense 10 years ago are not necessarily good defaults now. If you set up your encrypted partition some time ago, the number of calculations required may no longer be considered up to scratch.

And, well, some of these assumptions are kind of bad in the first place! Just making things computationally expensive doesn't help a lot if your adversary has the ability to test a large number of passwords in parallel. GPUs are extremely good at performing the sort of calculations that KDFs generally use, so an attacker can "just" get a whole pile of GPUs and throw them at the problem. KDFs that are computationally expensive don't do a great deal to protect against this. However, there's another axis of expense that can be considered - memory. If the KDF algorithm requires a significant amount of RAM, the degree to which it can be performed in parallel on a GPU is massively reduced. A Geforce 4090 may have 16,384 execution units, but if each password attempt requires 1GB of RAM and the card only has 24GB on board, the attacker is restricted to running 24 attempts in parallel.
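
Python's standard library happens to include scrypt, which is a convenient way to illustrate memory-hardness even though LUKS2 itself uses argon2id rather than scrypt; the n and r parameters below force roughly 32MB of RAM per guess, and that memory requirement is what caps GPU parallelism.

import hashlib, os

password = b"correct horse battery staple"
salt = os.urandom(16)

# scrypt needs about 128 * n * r bytes of RAM per derivation: here
# 128 * 2**15 * 8 = 32MB. A GPU with 24GB of RAM can therefore only run a
# few hundred of these in parallel, rather than one per execution unit.
key = hashlib.scrypt(password, salt=salt, n=2**15, r=8, p=1,
                     maxmem=64 * 1024 * 1024, dklen=32)
print(key.hex())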

So, in these days of attackers with access to a pile of GPUs, a purely computationally expensive KDF is just not a good choice. And, unfortunately, the subject of this story was almost certainly using one of those. Ubuntu 18.04 used the LUKS1 header format, and the only KDF supported in this format is PBKDF2. This is not a memory expensive KDF, and so is vulnerable to GPU-based attacks. But even so, systems using the LUKS2 header format used to default to argon2i, which is memory strong, but not designed to be resistant to GPU attack (thanks to the comments pointing out my misunderstanding here). New versions default to argon2id, which is. You want to be using argon2id.

What makes this worse is that distributions generally don't update this in any way. If you installed your system and it gave you pbkdf2 as your KDF, you're probably still using pbkdf2 even if you've upgraded to a system that would use argon2id on a fresh install. Thankfully, this can all be fixed-up in place. But note that if anything goes wrong here you could lose access to all your encrypted data, so before doing anything make sure it's all backed up (and figure out how to keep said backup secure so you don't just have your data seized that way).

First, make sure you're running as up-to-date a version of your distribution as possible. Having tools that support the LUKS2 format doesn't mean that your distribution has all of that integrated, and old distribution versions may allow you to update your LUKS setup without actually supporting booting from it. Also, if you're using an encrypted /boot, stop now - very recent versions of grub2 support LUKS2, but they don't support argon2id, and this will render your system unbootable.

Next, figure out which device under /dev corresponds to your encrypted partition. Run

lsblk

and look for entries that have a type of "crypt". The device above that in the tree is the actual encrypted device. Record that name, and run

sudo cryptsetup luksHeaderBackup /dev/whatever --header-backup-file /tmp/luksheader

and copy that to a USB stick or something. If something goes wrong here you'll be able to boot a live image and run

sudo cryptsetup luksHeaderRestore /dev/whatever --header-backup-file luksheader

to restore it.

(Edit to add: Once everything is working, delete this backup! It contains the old weak key, and someone with it can potentially use that to brute force your disk encryption key using the old KDF even if you've updated the on-disk KDF.)

Next, run

sudo cryptsetup luksDump /dev/whatever

and look for the Version: line. If it's version 1, you need to update the header to LUKS2. Run

sudo cryptsetup convert /dev/whatever --type luks2

and follow the prompts. Make sure your system still boots, and if not go back and restore the backup of your header. Assuming everything is ok at this point, run

sudo cryptsetup luksDump /dev/whatever

again and look for the PBKDF: line in each keyslot (pay attention only to the keyslots, ignore any references to pbkdf2 that come after the Digests: line). If the PBKDF is either "pbkdf2" or "argon2i" you should convert to argon2id. Run the following:

sudo cryptsetup luksConvertKey /dev/whatever --pbkdf argon2id

and follow the prompts. If you have multiple passwords associated with your drive you'll have multiple keyslots, and you'll need to repeat this for each password.

Distributions! You should really be handling this sort of thing on upgrade. People who installed their systems with your encryption defaults several years ago are now much less secure than people who perform a fresh install today. Please please please do something about this.


Booting modern Intel CPUs

Posted by Matthew Garrett on April 17, 2023 12:04 AM
CPUs can't do anything without being told what to do, which leaves the obvious problem of how do you tell a CPU to do something in the first place. On many CPUs this is handled in the form of a reset vector - an address the CPU is hardcoded to start reading instructions from when power is applied. The address the reset vector points to will typically be some form of ROM or flash that can be read by the CPU even if no other hardware has been configured yet. This allows the system vendor to ship code that will be executed immediately after poweron, configuring the rest of the hardware and eventually getting the system into a state where it can run user-supplied code.

The specific nature of the reset vector on x86 systems has varied over time, but it's effectively always been 16 bytes below the top of the address space - so, 0xffff0 on the 20-bit 8086, 0xfffff0 on the 24-bit 80286, and 0xfffffff0 on the 32-bit 80386. Convention on x86 systems is to have RAM starting at address 0, so the top of address space could be used to house the reset vector with as low a probability of conflicting with RAM as possible.

The most notable thing about x86 here, though, is that when it starts running code from the reset vector, it's still in real mode. x86 real mode is a holdover from a much earlier era of computing. Rather than addresses being absolute (ie, if you refer to a 32-bit address, you store the entire address in a 32-bit or larger register), they are 16-bit offsets that are added to the value stored in a "segment register". Different segment registers existed for code, data, and stack, so a 16-bit address could refer to different actual addresses depending on how it was being interpreted - jumping to a 16 bit address would result in that address being added to the code segment register, while reading from a 16 bit address would result in that address being added to the data segment register, and so on. This is all in order to retain compatibility with older chips, to the extent that even 64-bit x86 starts in real mode with segments and everything (and, also, still starts executing at 0xfffffff0 rather than 0xfffffffffffffff0 - 64-bit mode doesn't support real mode, so there's no way to express a 64-bit physical address using the segment registers, so we still start just below 4GB even though we have massively more address space available).
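
A quick worked example of that segmented addressing, as it behaves on the original 8086 (later parts add complications such as the A20 gate):

def real_mode_address(segment: int, offset: int) -> int:
    # The physical address is segment * 16 + offset, so different
    # segment:offset pairs can name the same byte.
    return ((segment << 4) + offset) & 0xFFFFF   # 20-bit bus on the 8086

print(hex(real_mode_address(0xF000, 0xFFF0)))   # 0xffff0 - the 8086 reset vector
print(hex(real_mode_address(0xFFFF, 0x0000)))   # 0xffff0 - same byte, different pair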

Anyway. Everyone knows all this. For modern UEFI systems, the firmware that's launched from the reset vector then reprograms the CPU into a sensible mode (ie, one without all this segmentation bullshit), does things like configure the memory controller so you can actually access RAM (a process which involves using CPU cache as RAM, because programming a memory controller is sufficiently hard that you need to store more state than you can fit in registers alone, which means you need RAM, but you don't have RAM until the memory controller is working, but thankfully the CPU comes with several megabytes of RAM on its own in the form of cache, so phew). It's kind of ugly, but that's a consequence of a bunch of well-understood legacy decisions.

Except. This is not how modern Intel x86 boots. It's far stranger than that. Oh, yes, this is what it looks like is happening, but there's a bunch of stuff going on behind the scenes. Let's talk about boot security. The idea of any form of verified boot (such as UEFI Secure Boot) is that a signature on the next component of the boot chain is validated before that component is executed. But what verifies the first component in the boot chain? You can't simply ask the BIOS to verify itself - if an attacker can replace the BIOS, they can replace it with one that simply lies about having done so. Intel's solution to this is called Boot Guard.

But before we get to Boot Guard, we need to ensure the CPU is running in as bug-free a state as possible. So, when the CPU starts up, it examines the system flash and looks for a header that points at CPU microcode updates. Intel CPUs ship with built-in microcode, but it's frequently old and buggy and it's up to the system firmware to include a copy that's new enough that it's actually expected to work reliably. The microcode image is pulled out of flash, a signature is verified, and the new microcode starts running. This is true in both the Boot Guard and the non-Boot Guard scenarios. But for Boot Guard, before jumping to the reset vector, the microcode on the CPU reads an Authenticated Code Module (ACM) out of flash and verifies its signature against a hardcoded Intel key. If that checks out, it starts executing the ACM. Now, bear in mind that the CPU can't just verify the ACM and then execute it directly from flash - if it did, the flash could detect this, hand over a legitimate ACM for the verification, and then feed the CPU different instructions when it reads them again to execute them (a Time of Check vs Time of Use, or TOCTOU, vulnerability). So the ACM has to be copied onto the CPU before it's verified and executed, which means we need RAM, which means the CPU already needs to know how to configure its cache to be used as RAM.

Anyway. We now have an ACM loaded and verified, and it can safely be executed. The ACM does various things, but the most important from the Boot Guard perspective is that it reads a set of write-once fuses in the motherboard chipset that represent the SHA256 of a public key. It then reads the initial block of the firmware (the Initial Boot Block, or IBB) into RAM (or, well, cache, as previously described) and parses it. There's a block that contains a public key - it hashes that key and verifies that it matches the SHA256 from the fuses. It then uses that key to validate a signature on the IBB. If it all checks out, it executes the IBB and everything starts looking like the nice simple model we had before.
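
As a purely conceptual sketch (the real ACM is signed Intel code running out of cache-as-RAM, and the actual Boot Guard structures, key sizes and padding are different), the verification logic amounts to something like:

import hashlib
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.exceptions import InvalidSignature

def verify_ibb(fused_key_hash: bytes, oem_public_key, ibb: bytes,
               ibb_signature: bytes) -> bool:
    # Conceptual Boot Guard-style check: write-once fuses pin the OEM key
    # by its SHA256, and that key then vouches for the Initial Boot Block.
    key_der = oem_public_key.public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo)
    if hashlib.sha256(key_der).digest() != fused_key_hash:
        return False                      # firmware carries the wrong key
    try:
        oem_public_key.verify(ibb_signature, ibb,
                              padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return False                      # IBB was tampered with
    return True

# Simulate provisioning: the OEM generates a key and its hash is "fused".
oem_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
fuses = hashlib.sha256(oem_key.public_key().public_bytes(
    serialization.Encoding.DER,
    serialization.PublicFormat.SubjectPublicKeyInfo)).digest()

ibb = b"initial boot block contents"
sig = oem_key.sign(ibb, padding.PKCS1v15(), hashes.SHA256())
print(verify_ibb(fuses, oem_key.public_key(), ibb, sig))   # True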

Except, well, doesn't this seem like an awfully complicated bunch of code to implement in real mode? And yes, doing all of this modern crypto with only 16-bit registers does sound like a pain. So, it doesn't. All of this is happening in a perfectly sensible 32 bit mode, and the CPU actually switches back to the awful segmented configuration afterwards so it's still compatible with an 80386 from 1986. The "good" news is that at least firmware can detect that the CPU has already configured the cache as RAM and can skip doing that itself.

I'm skipping over some steps here - the ACM actually does other stuff around measuring the firmware into the TPM and doing various bits of TXT setup for people who want DRTM in their lives, but the short version is that the CPU bootstraps itself into a state where it works like a modern CPU and then deliberately turns a bunch of the sensible functionality off again before it starts executing firmware. I'm also missing out the fact that this entire process only kicks off after the Management Engine says it can, which means we're waiting for an entirely independent x86 to boot an entire OS before our CPU even starts pretending to execute the system firmware.

Of course, as mentioned before, on modern systems the firmware will then reprogram the CPU into something actually sensible so OS developers no longer need to care about this[1][2], which means we've bounced between multiple states for no reason other than the possibility that someone wants to run legacy BIOS and then boot DOS on a CPU with like 5 orders of magnitude more transistors than the 8086.

tl;dr why can't my x86 wake up with the gin protected mode already inside it

[1] Ha uh except that on ACPI resume we're going to skip most of the firmware setup code so we still need to handle the CPU being in fucking 16-bit mode because suspend/resume is basically an extremely long reboot cycle

[2] Oh yeah also you probably have multiple cores on your CPU and well bad news about the state most of the cores are in when the OS boots because the firmware never started them up so they're going to come up in 16-bit real mode even if your boot CPU is already in 64-bit protected mode, unless you were using TXT in which case you have a different sort of nightmare that if we're going to try to map it onto real world nightmare concepts is one that involves a lot of teeth. Or, well, that used to be the case, but ACPI 6.4 (released in 2021) provides a mechanism for the OS to ask the firmware to wake the CPU up for it so this is invisible to the OS, but you're still relying on the firmware to actually do the heavy lifting here


Crosswords 0.3.8: Change Management

Posted by Jonathan Blandford on April 02, 2023 04:00 PM

It’s time for another Crosswords release. This is a somewhat quieter release on the surface as it doesn’t have as many user-visible changes. But like the last release, a lot happened under the hood in preparation for the next phase.

This release marks a change in focus. I’ve shifted my work to the editor instead of the game. I hadn’t given the editor much attention over the past year and it’s overdue for updates. I have a lot of features planned for it; it’s time to make progress on them.

Crosswords Editor

The first change I made was to revamp the workflow for creating a new puzzle. The old editor would let you change the puzzle type while editing it — something that was technically neat but not actually useful to setters. I have different editing sections planned based on the puzzle type, which means it makes sense to restrict a window to one type. For example, editing an acrostic grid is totally different from editing a crossword grid.

The new greeter also cleans up a  weird flow where the first tab would lock once you’d picked your initial values.

[Screenshot: new greeter dialog for the Crossword Editor ("New puzzle greeter")]

To do this, I added a greeter that lets you select the type of puzzle right from the outset. We also took advantage of the fact that we added a separate GType for each puzzle type. It’s very loosely inspired by the new project greeter from GNOME Builder.

The second thing I spent time on wasn’t actually a code change, but a design plan. An implementation challenge I’ve had is balancing letting people use all the crazy features that the ipuz spec allows, and adding guardrails to let people write standard puzzles without thinking about those extra features. The problem with those features is that you can easily end up with a legal-but-weird puzzle file. As an example, imagine a crossword where the numbering of all the clues is out of order. That’s a legal .ipuz file and possibly valid in fringe circumstances, but rarely what any puzzle designer actually wants.

There is a new design doc for how to handle intermediate states. The details are complicated, but the overall approach involves adding lint() and fixup() functions to the puzzle types. This will let us make the changes we want, but then let the editor get the puzzle back to a reasonable state.
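
The design doc is the authoritative description, but the rough shape is a pair of per-puzzle-type hooks. The sketch below is in Python purely for brevity (libipuz itself is C) and the names are placeholders rather than the actual API:

from dataclasses import dataclass, field

@dataclass
class LintIssue:
    message: str
    fixable: bool = True

@dataclass
class CrosswordPuzzle:
    clue_numbers: list[int] = field(default_factory=list)

    def lint(self) -> list[LintIssue]:
        # Report legal-but-weird states, like out-of-order clue numbers.
        issues = []
        if self.clue_numbers != sorted(self.clue_numbers):
            issues.append(LintIssue("clue numbering is out of order"))
        return issues

    def fixup(self) -> None:
        # Normalize the puzzle back to a conventional state.
        self.clue_numbers = sorted(self.clue_numbers)

puzzle = CrosswordPuzzle(clue_numbers=[3, 1, 2])
for issue in puzzle.lint():
    print(issue.message)
puzzle.fixup()
assert puzzle.lint() == []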

Many thanks to Federico, who very kindly let me call him on the weekends to talk through the issues and iterate to a proposal. I’ve started updating libipuz to implement this new design.

Crosswords: Adaptive Layout

This is the third release in which I’ve blogged about the adaptive layout, and it’s the first time I feel good about the results. It has been incredibly challenging to get Crosswords to work well at a variety of sizes. This cycle, I introduced the concept of both a natural size and a minimum size to the game. This results in a mixture of user control and screen-driven sizing. Here is an example of a crossword scaling to the very small and to the very large.

[Screencast: https://blogs.gnome.org/jrb/files/2023/04/Screencast-from-2023-04-01-19-04-31.webm]
[Screencast: https://blogs.gnome.org/jrb/files/2023/04/Screencast-from-2023-04-01-19-09-22.webm]

I hope I can put this feature down for now!

Misc fixes and thanks

There were a number of other fixes this release. Most excitingly, we have a number of new contributors too! Crosswords is a potential GNOME GSoC program and some prospective students spent time trying to learn the code base. Here are the improvements and credits.

[Screenshot: Puzzle Set tagging dialog]

  • First, we now render the puzzle set description labels to look more like tags. Thanks to Pratham for working on this.
  • Thanks to Tanmay for fixing a bug where horizontal clue enumerations weren’t rendering correctly. He also contributed fixes to the keyboard layout dialog, and started a promising thumbnailer (targeting the next release).
  • Thanks to Philip for layout fixes, a new icon, and for having a preternatural ability to find bugs in my code as soon as I declare them “done.”
  • I fixed an issue where we didn’t lock down the puzzle correctly after winning the game. I’ve been trying to track this down for six months.
  • Thanks as always to the  translators for translations.

Until next time!

New gitlab.freedesktop.org spamfighting abilities

Posted by Peter Hutterer on March 29, 2023 07:31 AM

As of today, gitlab.freedesktop.org allows anyone with a GitLab Developer role or above to remove spam issues. If you are reading this article a while after it's published, it's best to refer to the damspam README for up-to-date details. I'm going to start with the TLDR first.

For Maintainers

Create a personal access token with API access and save the token value as $XDG_CONFIG_HOME/damspam/user.token. Then run the following commands with your project's full path (e.g. mesa/mesa, pipewire/wireplumber, xorg/lib/libX11):

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ damspam request-webhook foo/bar
# clean up, no longer needed.
$ pip uninstall damspam
$ rm $XDG_CONFIG_HOME/damspam/user.token

The damspam command will file an issue in the freedesktop/fdo-bots repository. This issue will be automatically processed by a bot and should be done by the time you finish the above commands, see this issue for an example. Note: the issue processing requires a git push to an internal repo - if you script this for multiple repos please put a sleep(30) in to avoid conflicts.

Once the request has been processed (and again, this should be instant), any issue in your project that gets assigned the label Spam will be processed automatically by damspam. See the next section for details.

For Developers

Once the maintainer for your project has requested the webhook, simply assign the Spam label to any issue that is spam. The issue creator will be blocked (i.e. cannot login), this issue and any other issue filed by the same user will be closed and made confidential (i.e. they are no longer visible to the public). In the future, one of the GitLab admins can remove that user completely but meanwhile, they and their spam are gone from the public eye and they're blocked from producing more. This should happen within seconds of assigning the Spam label.

For GitLab Admins

Create a personal access token with API access for the @spambot user and save the token value as $XDG_CONFIG_HOME/damspam/spambot.token. This is so you can operate as spambot instead of your own user. Then run the following command to remove all tagged spammers:

$ pip install git+https://gitlab.freedesktop.org/freedesktop/damspam
$ damspam purge-spammers

The last command will list any users that are spammers (together with an issue that should make it simple to check whether it is indeed spam) and, after interactive confirmation, purge them as requested. At the time of writing, the output looks like this:

$ damspam purge-spammers
0: naughtyuser              : https://gitlab.freedesktop.org/somenamespace/project/-/issues/1234: [STREAMING@TV]!* LOOK AT ME
1: abcuseless               : https://gitlab.freedesktop.org/somenamespace/project/-/issues/4567: ((@))THIS STREAM IS IMPORTANT
2: anothergit               : https://gitlab.freedesktop.org/somenamespace/project/-/issues/8778: Buy something, really
3: whatawasteofalife        : https://gitlab.freedesktop.org/somenamespace/project/-/issues/9889: What a waste of oxygen I am
Purging a user means a full delete including all issues, MRs, etc. This is nonrecoverable!
Please select the users to purge:
[q]uit, purge [a]ll, or the index: 
     
Purging the spammers will hard-delete them and remove anything they ever did on gitlab. This is irreversible.

How it works

There are two components at play here: hookiedookie, a generic webhook dispatcher, and damspam which handles the actual spam issues. Hookiedookie provides an HTTP server and "does things" with JSON data on request. What it does is relatively generic (see the Settings.yaml example file) but it's set up to be triggered by a GitLab webhook and thus receives this payload. For damspam the rules we have for hookiedookie come down to something like this: if the URL is "webhooks/namespace/project" and damspam is set up for this project and the payload is an issue event and it has the "Spam" label in the issue labels, call out to damspam and pass the payload on. Other rules we currently use are automatic reload on push events or the rule to trigger the webhook request processing bot as above.
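
A stripped-down sketch of that kind of rule is below. It is not the actual hookiedookie/damspam code; the payload fields follow GitLab's documented issue-event webhook format, and the handler itself is a placeholder:

from flask import Flask, abort, request

app = Flask(__name__)

# project path -> the per-project secret token installed with the webhook
KNOWN_TOKENS = {"foo/bar": "123e4567-e89b-12d3-a456-426614174000"}

@app.post("/webhooks/<path:project>")
def dispatch(project: str):
    # The webhook must present the write-only token for this project.
    if request.headers.get("X-Gitlab-Token") != KNOWN_TOKENS.get(project):
        abort(403)
    payload = request.get_json()
    if payload.get("object_kind") != "issue":
        return "ignored", 200
    labels = [label.get("title") for label in payload.get("labels", [])]
    if "Spam" in labels:
        handle_spam(project, payload)      # hand off to the damspam side
    return "ok", 200

def handle_spam(project: str, payload: dict) -> None:
    # Placeholder for what damspam does: close the issue, make it
    # confidential, block the author, then recurse over their other issues.
    print("spam reported in", project, "issue",
          payload["object_attributes"]["iid"])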

This is also the reason a maintainer has to request the webhook. When the request is processed, the spambot installs a webhook with a secret token (a uuid) in the project. That token will be sent as header (a standard GitLab feature). The project/token pair is also added to hookiedookie and any webhook data must contain the project name and matching token, otherwise it is discarded. Since the token is write-only, no-one (not even the maintainers of the project) can see it.

damspam gets the payload forwarded but is otherwise unaware of how it is invoked. It checks the issue, fetches the data needed, does some safety check and if it determines that yes, this is spam, then it closes the issue, makes it confidential, blocks the user and then recurses into every issue this user ever filed. Not necessarily in that order. There are some safety checks, so you don't have to worry about it suddenly blocking every project member.

Why?

For a while now, we've suffered from a deluge of spam (and worse) that makes it through the spam filters. GitLab has a Report Abuse feature for this but it's... woefully incomplete. The UI guides users to do the right thing - as reporter you can tick "the user is sending spam" and it automatically adds a link to the reported issue. But: none of this useful data is visible to admins. Seriously, look at the official screenshots. There is no link to the issue, all you get is a username, the user that reported it and the content of a textbox that almost never has any useful information. The link to the issue? Not there. The selection that the user is a spammer? Not there.

For an admin, this is frustrating at best. To verify that the user is indeed sending spam, you have to find the issue first. Which, at best, requires several clicks and digging through the profile activities. At worst you know that the user is a spammer because you trust the reporter but you just can't find the issue for whatever reason.

But even worse: reporting spam does nothing immediately. The spam stays up until an admin wakes up, reviews the abuse reports and removes that user. Meanwhile, the spammer can happily keep filing issues against the project. Overall, it is not a particularly great situation.

With hookiedookie and damspam, we're now better equipped to stand against the tide of spam. Anyone who can assign labels can help fight spam and the effect is immediate. And it's - for our use-cases - safe enough: if you trust someone to be a developer on your project, we can trust them to not willy-nilly remove issues pretending they're spam. In fact, they probably could've deleted issues beforehand already anyway if they wanted to make them disappear.

Other instances

While we're definitely aiming at gitlab.freedesktop.org, there's nothing in particular that requires this instance. If you're the admin for a public gitlab instance feel free to talk to Benjamin Tissoires or me to check whether this could be useful for you too, and what changes would be necessary.

We need better support for SSH host certificates

Posted by Matthew Garrett on March 24, 2023 05:12 PM
Github accidentally committed their SSH RSA private key to a repository, and now a bunch of people's infrastructure is broken because it needs to be updated to trust the new key. This is obviously bad, but what's frustrating is that there's no inherent need for it to be - almost all the technological components needed to both reduce the initial risk and to make the transition seamless already exist.

But first, let's talk about what actually happened here. You're probably used to the idea of TLS certificates from using browsers. Every website that supports TLS has an asymmetric pair of keys divided into a public key and a private key. When you contact the website, it gives you a certificate that contains the public key, and your browser then performs a series of cryptographic operations against it to (a) verify that the remote site possesses the private key (which prevents someone just copying the certificate to another system and pretending to be the legitimate site), and (b) generate an ephemeral encryption key that's used to actually encrypt the traffic between your browser and the site. But what stops an attacker from simply giving you a fake certificate that contains their public key? The certificate is itself signed by a certificate authority (CA), and your browser is configured to trust a preconfigured set of CAs. CAs will not give someone a signed certificate unless they prove they have legitimate ownership of the site in question, so (in theory) an attacker will never be able to obtain a fake certificate for a legitimate site.

This infrastructure is used for pretty much every protocol that can use TLS, including things like SMTP and IMAP. But SSH doesn't use TLS, and doesn't participate in any of this infrastructure. Instead, SSH tends to take a "Trust on First Use" (TOFU) model - the first time you ssh into a server, you receive a prompt asking you whether you trust its public key, and then you probably hit the "Yes" button and get on with your life. This works fine up until the point where the key changes, and SSH suddenly starts complaining that there's a mismatch and something awful could be happening (like someone intercepting your traffic and directing it to their own server with their own keys). Users are then supposed to verify whether this change is legitimate, and if so remove the old keys and add the new ones. This is tedious and risks users just saying "Yes" again, and if it happens too often an attacker can simply redirect target users to their own server and through sheer fatigue at dealing with this crap the user will probably trust the malicious server.

Why not certificates? OpenSSH actually does support certificates, but not in the way you might expect. There's a custom format that's significantly less complicated than the X509 certificate format used in TLS. Basically, an SSH certificate just contains a public key, a list of hostnames it's good for, and a signature from a CA. There's no pre-existing set of trusted CAs, so anyone could generate a certificate that claims it's valid for, say, github.com. This isn't really a problem, though, because right now nothing pays attention to SSH host certificates unless there's some manual configuration.

(It's actually possible to glue the general PKI infrastructure into SSH certificates. Please do not do this)

So let's look at what happened in the Github case. The first question is "How could the private key have been somewhere that could be committed to a repository in the first place?". I have no unique insight into what happened at Github, so this is conjecture, but I'm reasonably confident in it. Github deals with a large number of transactions per second. Github.com is not a single computer - it's a large number of machines. All of those need to have access to the same private key, because otherwise git would complain that the private key had changed whenever it connected to a machine with a different private key (the alternative would be to use a different IP address for every frontend server, but that would instead force users to repeatedly accept additional keys every time they connect to a new IP address). Something needs to be responsible for deploying that private key to new systems as they're brought up, which means there's ample opportunity for it to accidentally end up in the wrong place.

Now, best practices suggest that this should be avoided by simply placing the private key in a hardware security module (HSM) that performs the cryptographic operations, ensuring that nobody can ever get at the private key. The problem faced here is that HSMs typically aren't going to be fast enough to handle the number of requests per second that Github deals with. This can be avoided by using something like a Nitro Enclave, but you're still going to need a bunch of these in different geographic locales, because otherwise your front ends will be limited by the need to talk to an enclave on the other side of the planet, and now you're back to distributing the private key to a bunch of systems anyway.

What if we could have the best of both worlds - the performance of private keys that just happily live on the servers, and the security of private keys that live in HSMs? Unsurprisingly, we can! The SSH private key could be deployed to every front end server, but every minute each server could call out to an HSM-backed service and request a new SSH host certificate signed by a private key in the HSM. If clients are configured to trust the key that's signing the certificates, then it doesn't matter what the private key on the servers is - the client will see that there's a valid certificate and will trust the key, even if it changes. Restricting the validity of the certificate to a small window of time means that if a key is compromised an attacker can't do much with it - the moment you become aware of the compromise you stop signing new certificates, and once all the existing ones expire the old private key becomes useless. You roll out a new private key with new certificates signed by the same CA, and clients just carry on trusting it without any manual involvement.
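
Here's a sketch of what that could look like with plain OpenSSH tooling. In a real deployment the CA private key would live behind the HSM-backed signing service rather than in a file on disk, and the signing step would be an API call, but the shape is the same:

    # On the signing service: issue a host certificate that's only valid for
    # a short window (five minutes here).
    ssh-keygen -s host_ca -I "frontend host key" -h \
        -n github.example.com -V +5m \
        /etc/ssh/ssh_host_ed25519_key.pub

    # On each front end, sshd presents the certificate alongside its key.
    # In /etc/ssh/sshd_config:
    #   HostKey         /etc/ssh/ssh_host_ed25519_key
    #   HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
    # Reload sshd whenever a fresh certificate is fetched.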

Why don't we have this already? The main problem is that client tooling just doesn't handle this well. OpenSSH has no way to do TOFU for CAs, just the keys themselves. This means there's no way to do a git clone ssh://git@github.com/whatever and get a prompt asking you to trust Github's CA. Instead, you need to add a @cert-authority github.com (key) line to your known_hosts file by hand, and since approximately nobody's going to do that there's only marginal benefit in going to the effort to implement this infrastructure. The most important thing we can do to improve the security of the SSH ecosystem is to make it easier to use certificates, and that means improving the behaviour of the clients.
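
For completeness, this is the manual step in question - a single line in known_hosts telling the client to trust any host certificate for the listed hosts that was signed by the named CA (the key below is truncated and fake):

    # ~/.ssh/known_hosts
    @cert-authority github.com,*.github.com ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAA...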

It should be noted that certificates aren't the only approach to handling key migration. OpenSSH supports a protocol for key rotation, basically by allowing the server to provide a set of multiple trusted keys that the client can cache, and then invalidating old ones. Unfortunately this still requires that the "new" private keys be deployed in the same way as the old ones, so any screwup that results in one private key being leaked may well also result in the additional keys being leaked. I prefer the certificate approach.
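
For what it's worth, the mechanism being described here is (I believe) OpenSSH's UpdateHostKeys extension (hostkeys-00@openssh.com), which can be enabled on the client side today:

    # ~/.ssh/config - opt in to server-supplied host key rotation.
    # "ask" prompts before caching keys the server advertises; "yes" accepts
    # them silently once the connection has been authenticated.
    Host *
        UpdateHostKeys ask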

Finally, I've seen a couple of people imply that the blame here should be attached to whoever or whatever caused the private key to be committed to a repository in the first place. This is a terrible take. Humans will make mistakes, and your systems should be resilient against that. There's no individual at fault here - there's a series of design decisions that made it possible for a bad outcome to occur, and in a better universe they wouldn't have been necessary. Let's work on building that better universe.
