Fedora desktop Planet

Free Software Charities

Posted by Michael Catanzaro on March 25, 2021 01:32 PM

I believe we have reached a point where it is time to discontinue donations to the Free Software Foundation, in light of the outrageously poor judgment shown by its board of directors in reinstating Richard Stallman to the board. I haven’t seen other free software community members calling for cutting off donations yet. Even the open letter doesn’t call for this. Edit: I’ve been pointed to the line “We urge those in a position to do so to stop supporting the Free Software Foundation,” which surely implies a call to stop donating, so I was wrong about that.

I have no doubt there will be follow-up blog posts explaining why cutting off donations is harmful to the community and the FSF’s staff, and will hurt the FSF in the long term… but seriously, enough is enough. If we don’t draw the line here, there will never be any line anywhere. Continued support for the FSF is continued complicity, and is harming rather than helping advance the ideals of the free software community. So, where should you donate instead?

The Software Freedom Conservancy is a great choice. It hosts many member projects you’re probably familiar with, including Outreachy and git, among many others. It does some GPL compliance work and takes a strict view on software freedom. For US donors, it is a 501(c)(3), so your donations may be tax-deductible. This is where I send my free software donations not directed to GNOME. Read its statement on recent events at the FSF.

Another good option, especially for people in the EU, is the Free Software Foundation Europe, an independently run sister organization of the FSF. If you’re not familiar with FSFE, think of it as a more moderate version of the FSF, with a special focus on Europe. It shares the same goals as the FSF, but with more reasonable leadership and much less popcorn. Read its statement on recent events at the FSF.

Most people reading this blog are likely GNOME users. Contributing to GNOME is a great way to support the desktop you use to get things done. Monthly sustaining donations are especially appreciated. Despite the historical association between GNOME and GNU, GNOME has had little to do with GNU for a long time now. The GNOME Foundation is run independently and has formally signed the open letter calling for the resignation of the FSF’s board of directors. For US donors, it is a 501(c)(3). (If you use KDE, donate to KDE here.)

It should be obvious, but this is a personal blog. None of the above organizations have endorsed cutting off donations to the FSF, to my knowledge. They would probably find it to be in poor taste to abuse a crisis to solicit funds. But I doubt they’d object if you send some money their way.

Exploring my doorbell

Posted by Matthew Garrett on March 15, 2021 06:58 PM
I've talked about my doorbell before, but started looking at it again this week because sometimes it simply doesn't send notifications to my Home Assistant setup - the push notifications appear on my phone, but the doorbell never triggers the HTTP callback it's meant to[1]. This is obviously suboptimal, but it's also tricky to debug a device when you have no access to it.

Normally I'd just head straight in with a screwdriver, but the doorbell is shared with the other units in this building and it seemed a little anti-social to interfere with a shared resource. So I bought some broken units from ebay and pulled one of them apart. There are several boards inside, but one of them had a conveniently empty connector at the top with "TX", "RX" and "GND" labelled. Sticking a USB-serial converter on this gave me output from U-Boot, and then kernel output. Confirmation that my doorbell runs Linux, but unfortunately it didn't give me a shell prompt. My next approach would often be to just dump the flash and look for vulnerabilities that way, but this device uses TSOP-48 packaged NAND flash rather than the more convenient SPI NOR flash that I already have adapters to access. Dumping this sort of NAND isn't terribly hard, but the easiest way to do it involves desoldering it from the board and plugging it into something like a Flashcat USB adapter, and my soldering's not good enough to put it back on the board afterwards. So I wanted another approach.

U-Boot gave a short countdown to hit a key before continuing with boot, and for once hitting a key actually did something. Unfortunately it then prompted for a password, and giving the wrong one resulted in boot continuing[2]. In the past I've had good luck forcing U-Boot to drop to a prompt by simply connecting one of the data lines on SPI flash to ground while it's trying to read the kernel - the failed read causes U-Boot to error out. It turns out the same works fine on raw NAND, so I just edited the kernel boot arguments to append "init=/bin/sh" and soon I had a shell.

From here on, things were made easier by virtue of the device using the YAFFS filesystem. Unlike many flash filesystems, it's read/write, so I could make changes that would persist through to the running system. There was a convenient copy of telnetd included, but it segfaulted on startup, which reduced its usefulness. Fortunately there was also a copy of Netcat[3]. If you make a fifo somewhere on the filesystem, you can cat the fifo to a shell, pipe the shell to a netcat listener, and then pipe netcat's output back to the fifo. The shell's output all gets passed to whatever connects to netcat, and whatever's sent to netcat gets passed through the fifo back to the shell. This is, obviously, horribly insecure, but it was enough to get a root shell over the network on the running device.
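
In case the description is hard to visualize, here it is as a shell sketch (the fifo path and port are arbitrary):

mkfifo /tmp/f
# nc listens on the port; whatever the remote end sends goes into the fifo,
# the shell reads it, and the shell's output is piped back to the connection
cat /tmp/f | /bin/sh -i 2>&1 | nc -l -p 1234 > /tmp/f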

The doorbell runs various bits of software, one of which is Lighttpd to provide a local API and access to the device. Another component ("nxp-client") connects to the vendor's cloud infrastructure and passes cloud commands back to the local webserver. This is where I found something strange. Lighttpd was refusing to start because its modules wanted library symbols that simply weren't present on the device. My best guess is that a firmware update went wrong and left the device in a partially upgraded state - and without a working local webserver, there was no way to perform any further updates. This may explain why this doorbell was sitting on ebay.

Anyway. Now that I had a shell, I could simply dump the flash by copying it directly off the /dev/mtdblock devices - since I had netcat, I could just pipe stuff through that back to my actual computer. Now that I had access to the filesystem I could extract it locally and start digging into it more deeply. One incredibly useful tool for this is qemu-user. qemu is a general purpose hardware emulation platform, usually used to emulate entire systems. But in qemu-user mode, it instead only emulates the CPU. When a piece of code tries to make a system call to access the kernel, qemu-user translates that to the appropriate calling convention for the host kernel and makes that call instead. Combined with binfmt_misc, you can configure a Linux system to be able to run Linux binaries from other architectures. One of the best things about this is that, because they're still using the host convention for making syscalls, you can run the host strace on them and see what they're doing.
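
As a concrete sketch of that workflow (assuming the doorbell's CPU is ARM and the dumped filesystem is extracted to ./rootfs; the binary path here is illustrative):

# -L points qemu-user at the extracted root so the target's dynamic linker
# and libraries are found; strace sees the guest's syscalls because
# qemu-user forwards them to the host kernel
strace -f qemu-arm -L ./rootfs ./rootfs/usr/bin/nxp-client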

What I found was that nxp-client was calling back to the cloud platform, setting up an encrypted communication channel (using ChaCha20 and a bunch of key setup stuff I couldn't be bothered picking apart) and then waiting for commands from the cloud. It would then proxy those through to the local webserver. Since I couldn't run the local lighttpd, I just wrote a trivial Python app using http.server and waited to see what requests I got. The first was a GET to a CGI script called editcgi.cgi, along with a path name. I mocked up the GET request to respond with what was on the actual filesystem. The cloud then proceeded to POST to editcgi.cgi, with the same pathname and with new file contents. editcgi.cgi is apparently able to read and write to files on the filesystem.

But this is on the interface that's exposed to the cloud client, so this didn't appear immediately useful - and, indeed, trying to hit the same CGI binary over the local network gave me a 401 unauthorized error. There's a local API spec for these doorbells, but they all refer to scripts in the bha-api namespace, and this script was in the plain cgi-bin namespace. But then I noticed that the bha-api namespace didn't actually exist in the filesystem - instead, lighttpd's mod_alias was configured to rewrite requests to bha-api through to files in cgi-bin. And by using the documented API to get a session token, I could call editcgi.cgi to read and write arbitrary files on the doorbell. Which means I can drop an extra script in /etc/rc.d/rc3.d and get a shell on my doorbell.
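
The rough shape of the chain, as a hypothetical sketch (getsession.cgi is from the doorbell's documented local API; the editcgi.cgi parameter names below are guesses, not verified):

# obtain a session token using valid local credentials
curl -s "http://doorbell/bha-api/getsession.cgi?http-user=USER&http-password=PASS"
# ...extract the session token from the response, then read an arbitrary
# file through the aliased CGI script
curl "http://doorbell/cgi-bin/editcgi.cgi?sessionid=SESSION&file=/etc/passwd"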

This all requires the ability to have local authentication credentials, so it's not a big security deal other than that it allows you to retain access to a monitoring device even after you've moved out and had your credentials revoked. I'm sure it's all fine.

[1] I can ping the doorbell from the Home Assistant machine, so it's not that the network is flaky
[2] The password appears to be hy9$gnhw0z6@ if anyone else ends up in this situation
[3] https://twitter.com/mjg59/status/654578208545751040


Unauthenticated MQTT endpoints on Linksys Velop routers enable local DoS

Posted by Matthew Garrett on March 09, 2021 07:55 PM
(Edit: this is CVE-2021-1000002)

Linksys produces a series of wifi mesh routers under the Velop line. These routers use MQTT to send messages to each other for coordination purposes. In the version I tested against, there was zero authentication on this - anyone on the local network is able to connect to the MQTT interface on a router and send commands. As an example:
mosquitto_pub -h 192.168.1.1 -t "network/master/cmd/nodes_temporary_blacklist" -m '{"data": {"client": "f8:16:54:43:e2:0c", "duration": "3600", "action": "start"}}'
will ask the router to block the client with MAC address f8:16:54:43:e2:0c from the network for an hour. Various other MQTT topics pass parameters to shell scripts without quoting them or escaping metacharacters, so more serious outcomes may be possible.
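
The lack of authentication also means anyone on the local network can passively watch the mesh coordination traffic (a one-liner sketch; -t '#' subscribes to every topic, -v prints topic names alongside payloads):

mosquitto_sub -h 192.168.1.1 -t '#' -v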

The vendor has released two firmware updates since my report - I have not verified whether either fixes this, but the changelog does not indicate any security issues were addressed.

Timeline:

2020-07-30: Submitted through the vendor's security vulnerability report form, indicating that I plan to disclose in either 90 days or after a fix is released. The form turns out to file a Bugcrowd submission.
2020-07-30: I claim the Bugcrowd submission.
2020-08-19: Vendor acknowledges the issue, is able to reproduce and assigns it a P3 priority.
2020-12-15: I ask if there's an update.
2021-02-02: I ask if there's an update.
2021-02-03: Bugcrowd raises a blocker on the issue, asking the vendor to respond.
2021-02-17: I ask for permission to disclose.
2021-03-09: In the absence of any response from the vendor since 2020-08-19, I violate Bugcrowd disclosure policies and unilaterally disclose.


Making hibernation work under Linux Lockdown

Posted by Matthew Garrett on February 21, 2021 08:37 AM
Linux draws a distinction between code running in kernel (kernel space) and applications running in userland (user space). This is enforced at the hardware level - in x86-speak[1], kernel space code runs in ring 0 and user space code runs in ring 3[2]. If you're running in ring 3 and you attempt to touch memory that's only accessible in ring 0, the hardware will raise a fault. No matter how privileged your ring 3 code, you don't get to touch ring 0.

Kind of. In theory. Traditionally this wasn't well enforced. At the most basic level, since root can load kernel modules, you could just build a kernel module that performed any kernel modifications you wanted and then have root load it. Technically user space code wasn't modifying kernel space code, but the distinction was more semantic than useful. But it got worse - root could also map memory ranges belonging to PCI devices[3], and if the device could perform DMA you could just ask the device to overwrite bits of the kernel[4]. Or root could modify special CPU registers ("Model Specific Registers", or MSRs) that alter CPU behaviour via the /dev/cpu/*/msr interface, and compromise the kernel boundary that way.

It turns out that there were a number of ways root was effectively equivalent to ring 0, and the boundary was more about reliability (ie, a process running as root that ends up misbehaving should still only be able to crash itself rather than taking down the kernel with it) than security. After all, if you were root you could just replace the on-disk kernel with a backdoored one and reboot. Going deeper, you could replace the bootloader with one that automatically injected backdoors into a legitimate kernel image. We didn't have any way to prevent this sort of thing, so attempting to harden the root/kernel boundary wasn't especially interesting.

In 2012 Microsoft started requiring vendors ship systems with UEFI Secure Boot, a firmware feature that allowed[5] systems to refuse to boot anything without an appropriate signature. This not only enabled the creation of a system that drew a strong boundary between root and kernel, it arguably required one - what's the point of restricting what the firmware will stick in ring 0 if root can just throw more code in there afterwards? What ended up as the Lockdown Linux Security Module provides the tooling for this, blocking userspace interfaces that can be used to modify the kernel and enforcing that any modules have a trusted signature.

But that comes at something of a cost. Most of the features that Lockdown blocks are fairly niche, so the direct impact of having it enabled is small. Except that it also blocks hibernation[6], and it turns out some people were using that. The obvious question is "what does hibernation have to do with keeping root out of kernel space", and the answer is a little convoluted and is tied into how Linux implements hibernation. Basically, Linux saves system state into the swap partition and modifies the header to indicate that there's a hibernation image there instead of swap. On the next boot, the kernel sees the header indicating that it's a hibernation image, copies the contents of the swap partition back into RAM, and then jumps back into the old kernel code. What ensures that the hibernation image was actually written out by the kernel? Absolutely nothing, which means a motivated attacker with root access could turn off swap, write a hibernation image to the swap partition themselves, and then reboot. The kernel would happily resume into the attacker's image, giving the attacker control over what gets copied back into kernel space.

This is annoying, because normally when we think about attacks on swap we mitigate them by requiring an encrypted swap partition. But in this case, our attacker is root, and so already has access to the plaintext version of the swap partition. Disk encryption doesn't save us here. We need some way to verify that the hibernation image was written out by the kernel, not by root. And thankfully we have some tools for that.

Trusted Platform Modules (TPMs) are cryptographic coprocessors[7] capable of doing things like generating encryption keys and then encrypting things with them. You can ask a TPM to encrypt something with a key that's tied to that specific TPM - the OS has no access to the decryption key, nor does any other TPM. So we can have the kernel generate an encryption key, encrypt part of the hibernation image with it, and then have the TPM encrypt it. We store the encrypted copy of the key in the hibernation image as well. On resume, the kernel reads the encrypted copy of the key, passes it to the TPM, gets the decrypted copy back and is able to verify the hibernation image.

That's great! Except root can do exactly the same thing. This tells us the hibernation image was generated on this machine, but doesn't tell us that it was done by the kernel. We need some way to be able to differentiate between keys that were generated in kernel and ones that were generated in userland. TPMs have the concept of "localities" (effectively privilege levels) that would be perfect for this. Userland is only able to access locality 0, so the kernel could simply use locality 1 to encrypt the key. Unfortunately, despite trying pretty hard, I've been unable to get localities to work. The motherboard chipset on my test machines simply doesn't forward any accesses to the TPM unless they're for locality 0. I needed another approach.

TPMs have a set of Platform Configuration Registers (PCRs), intended for keeping a record of system state. The OS isn't able to modify the PCRs directly. Instead, the OS provides a cryptographic hash of some material to the TPM. The TPM takes the existing PCR value, appends the new hash to that, and then stores the hash of the combination in the PCR - a process called "extension". This means that the new value of the PCR depends not only on the value of the new data, it depends on the previous value of the PCR - and, in turn, that previous value depended on its previous value, and so on. The only way to get to a specific PCR value is to either (a) break the hash algorithm, or (b) perform exactly the same sequence of writes. On system reset the PCRs go back to a known value, and the entire process starts again.
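
For anyone who wants to watch extension happen from userspace, here's a sketch using the tpm2-tools CLI (the kernel patchset does the equivalent in-kernel; the PCR and hash choices are illustrative):

tpm2_pcrread sha256:23                 # read the current value
# extend with the SHA-256 of some known data; the PCR's new value becomes
# a hash over (old value || digest), so it depends on the whole write history
tpm2_pcrextend 23:sha256=$(echo -n "known value" | sha256sum | cut -d' ' -f1)
tpm2_pcrread sha256:23                 # the value has changed irreversibly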

Some PCRs are different. PCR 23, for example, can be reset back to its original value without resetting the system. We can make use of that. The first thing we need to do is to prevent userland from being able to reset or extend PCR 23 itself. All TPM accesses go through the kernel, so this is a simple matter of parsing the write before it's sent to the TPM and returning an error if it's a sensitive command that would touch PCR 23. We now know that any change in PCR 23's state will be restricted to the kernel.

When we encrypt material with the TPM, we can ask it to record the PCR state. This is given back to us as metadata accompanying the encrypted secret. Along with the metadata is an additional signature created by the TPM, which can be used to prove that the metadata is both legitimate and associated with this specific encrypted data. In our case, that means we know what the value of PCR 23 was when we encrypted the key. That means that if we simply extend PCR 23 with a known value in-kernel before encrypting our key, we can look at the value of PCR 23 in the metadata. If it matches, the key was encrypted by the kernel - userland can create its own key, but it has no way to extend PCR 23 to the appropriate value first. We now know that the key was generated by the kernel.

But what if the attacker is able to gain access to the encrypted key? Let's say a kernel bug is hit that prevents hibernation from resuming, and you boot back up without wiping the hibernation image. Root can then read the key from the partition, ask the TPM to decrypt it, and then use that to create a new hibernation image. We probably want to prevent that as well. Fortunately, when you ask the TPM to encrypt something, you can ask that the TPM only decrypt it if the PCRs have specific values. "Sealing" material to the TPM in this way allows you to block decryption if the system isn't in the desired state. So, we define a policy that says that PCR 23 must have the same value at resume as it did on hibernation. On resume, the kernel resets PCR 23, extends it to the same value it did during hibernation, and then attempts to decrypt the key. Afterwards, it resets PCR 23 back to the initial value. Even if an attacker gains access to the encrypted copy of the key, the TPM will refuse to decrypt it.
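
Sealing against a PCR policy can likewise be sketched with tpm2-tools (an illustration of the same idea done from userspace, not the patchset's in-kernel code; file names are arbitrary):

# build a policy that requires PCR 23 to hold its current value
tpm2_startauthsession --session=session.ctx
tpm2_policypcr --session=session.ctx --pcr-list="sha256:23" --policy=pcr23.policy
tpm2_flushcontext session.ctx
# seal secret.bin under that policy; unsealing later fails if PCR 23 differs
tpm2_createprimary --key-context=primary.ctx
tpm2_create --parent-context=primary.ctx --policy=pcr23.policy \
    --sealing-input=secret.bin --public=key.pub --private=key.priv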

And that's what this patchset implements. There's one fairly significant flaw at the moment, which is simply that an attacker can just reboot into an older kernel that doesn't implement the PCR 23 blocking and set up state by hand. Fortunately, this can be avoided using another aspect of the boot process. When you boot something via UEFI Secure Boot, the signing key used to verify the booted code is measured into PCR 7 by the system firmware. In the Linux world, the Shim bootloader then measures any additional keys that are used. By either using a new key to tag kernels that have support for the PCR 23 restrictions, or by embedding some additional metadata in the kernel that indicates the presence of this feature and measuring that, we can have a PCR 7 value that verifies that the PCR 23 restrictions are present. We then seal the key to PCR 7 as well as PCR 23, and if an attacker boots into a kernel that doesn't have this feature the PCR 7 value will be different and the TPM will refuse to decrypt the secret.

While there's a whole bunch of complexity here, the process should be entirely transparent to the user. The current implementation requires a TPM 2, and I'm not certain whether TPM 1.2 provides all the features necessary to do this properly - if so, extending it shouldn't be hard, but also all systems shipped in the past few years should have a TPM 2, so that's going to depend on whether there's sufficient interest to justify the work. And we're also at the early days of review, so there's always the risk that I've missed something obvious and there are terrible holes in this. And, well, given that it took almost 8 years to get the Lockdown patchset into mainline, let's not assume that I'm good at landing security code.

[1] Other architectures use different terminology here, such as "supervisor" and "user" mode, but it's broadly equivalent
[2] In theory rings 1 and 2 would allow you to run drivers with privileges somewhere between full kernel access and userland applications, but in reality we just don't talk about them in polite company
[3] This is how graphics worked in Linux before kernel modesetting turned up. XFree86 would just map your GPU's registers into userland and poke them directly. This was not a huge win for stability
[4] IOMMUs can help you here, by restricting the memory PCI devices can DMA to or from. The kernel then gets to allocate ranges for device buffers and configure the IOMMU such that the device can't DMA to anything else. Except that region of memory may still contain sensitive material such as function pointers, and attacks like this can still cause you problems as a result.
[5] This describes why I'm using "allowed" rather than "required" here
[6] Saving the system state to disk and powering down the platform entirely - significantly slower than suspending the system while keeping state in RAM, but also resilient against the system losing power.
[7] With some handwaving around "coprocessor". TPMs can't be part of the OS or the system firmware, but they don't technically need to be an independent component. Intel have a TPM implementation that runs on the Management Engine, a separate processor built into the motherboard chipset. AMD have one that runs on the Platform Security Processor, a small ARM core built into their CPU. Various ARM implementations run a TPM in Trustzone, a special CPU mode that (in theory) is able to access resources that are entirely blocked off from anything running in the OS, kernel or otherwise.


A pre-supplied "custom" keyboard layout for X11

Posted by Peter Hutterer on February 18, 2021 01:57 AM

Last year I wrote about how to create a user-specific XKB layout, followed by a post explaining that this won't work in X. But there's a pandemic going on, which is presumably the only reason people haven't all switched to Wayland yet. So it was time to figure out a workaround for those still running X.

This Merge Request (scheduled for xkeyboard-config 2.33) adds a "custom" layout to the evdev.xml and base.xml files. These XML files are parsed by the various GUI tools to display the selection of available layouts. An entry in there will thus show up in the GUI tool.

Our rulesets, i.e. the files that convert a layout/variant configuration into the components to actually load, already have wildcard matching [1]. So the custom layout will resolve to the symbols/custom file in your XKB data dir - usually /usr/share/X11/xkb/symbols/custom.

This file is not provided by xkeyboard-config. It can be created by the user though and whatever configuration is in there will be the "custom" keyboard layout. Because xkeyboard-config does not supply this file, it will not get overwritten on update.

From XKB's POV it is just another layout and it thus uses the same syntax. For example, to override the +/* key on the German keyboard layout with a key that produces a/b/c/d on the various Shift/Alt combinations, use this:


default
xkb_symbols "basic" {
    include "de(basic)"
    key <AD12> { [ a, b, c, d ] };
};
This example includes the "basic" section from the symbols/de file (i.e. the default German layout), then overrides the 12th alphanumeric key from left in the 4th row from bottom (D) with the given symbols. I'll leave it up to the reader to come up with a less useful example.
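
Once the file is in place, one quick way to try it on a running X session (a sketch; this prints the resulting keymap description and compiles it straight into the current display):

setxkbmap -layout custom -print | xkbcomp - "$DISPLAY"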

There are a few drawbacks:

  • If the file is missing and the user selects the custom layout, the results are... undefined. For run-time configuration like GNOME it doesn't really matter - the layout compilation fails and you end up with the one the device already had (i.e. the default one built into X, usually the US layout).
  • If the file is missing and the custom layout is selected in the xorg.conf, the results are... undefined. I tested it and ended up with the US layout but that seems more by accident than design. My recommendation is to not do that.
  • No variants are available in the XML files, so the only accessible section is the one marked default.
  • If a commandline tool uses a variant of custom, the GUI will not reflect this. If the GUI goes boom, that's a bug in the GUI.

So overall, it's a hack[2]. But it's a hack that fixes real user issues and given we're talking about X, I doubt anyone notices another hack anyway.

[1] If you don't care about GUIs, setxkbmap -layout custom -variant foobar has been possible for years.
[2] Sticking with the UNIX principle, it's a hack that fixes the issue at hand, is badly integrated, and weird to configure.

fwupd 1.5.6

Posted by Richard Hughes on February 16, 2021 12:16 PM

Today I released fwupd 1.5.6 with the usual smattering of new features and bugfixes. These are some of the more interesting ones:

With the help of a lot of people we added support for quite a bit of new hardware. The slightly odd GD32VF103 as found in the Longan Nano is now supported, and more of the DFU ST devices with huge amounts of flash. The former should enable us to support the Pinecil device soon and the latter will be a nice vendor announcement in the future. We’ve also added support for RMI PS2 devices as found in some newer Lenovo ThinkPads, the Starlabs LabTop L4 and the new System76 Keyboard. We’ve refactored the udev and usb backends into self-contained modules, allowing someone else to contribute new bluetooth peripheral functionality in the future. There are more than a dozen teams of people all working on fwupd features at the moment. Exciting times!

One problem that has been reported was that downloads from the datacenter in the US were really slow from China, specifically because the firewall was deliberately dropping packets. I assume compressed firmware looks quite a lot like a large encrypted message from a firewall’s point of view, and thus it was only letting through ~20% of the traffic. All non-export-controlled public firmware is now also mirrored onto IPFS, and we experimentally fall back to peer-to-peer downloads where the HTTP download failed. You can prefer IPFS downloads using fwupdmgr --ipfs update although you need to have a running ipfs daemon on your local computer. If this works well for you, let me know and we might add support for downloading metadata in the future too.

We’ve fully integrated the fwupd CI with oss-fuzz, a popular fuzzing service from Google. Generating horribly corrupt firmware files has found a few little memory leaks, files that cause fwupd to spin in a loop and even the odd crash. It was a lot of work to build each fuzzer into a small static binary using an Ubuntu 16.04-based container, but it was well worth all the hard work. All new PRs will run the same fuzzers checking for regressions, which also means new plugins now also have to implement building new firmware (so the test payload can be a few tens of bytes, not 32kB), rather than just parsing it.

On some Lenovo hardware there’s a “useful” feature called Boot Order Lock that means whatever the OS adds as a BootXXXX entry, the old boot list gets restored on the next boot. This breaks firmware updates using fwupdx64.efi, and until we can detect BOL from a kernel interface we also check if our EFI entry has been deleted by the firmware on the next boot and give the user a more helpful message than just “it failed”. Also, on some Lenovo hardware we’re limiting the number of UEFI updates to be deployed on one reboot as they appear to have slightly quirky capsule coalesce behavior. In the same vein we’re also checking the system clock is set approximately correctly (as in, not set to before 2020…) so we can tell the user to check the clock on the machine rather than just failing with an obscure certificate error.

Now that there are systems that can be switched to coreboot (and back to EDK2 again), we’ve polished up the “switch-branch” feature. We’re also checking both BIOSWE and BLE before identifying systems that can be supported. We’re including the lockdown status in uploaded UEFI reports, and have added SBAT metadata to the fwupd EFI binary, which will be required for future versions of shim and grub – so for distro fwupd binaries the packager will need to set meson build options like -Defi_sbat_distro_id=. There are examples in the fwupd source tree.
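
For packagers, a hypothetical configure invocation might look like this (the option values here are made up, and the full set of efi_sbat_* options should be checked against the examples in the fwupd source tree):

# hypothetical distro values, for illustration only
meson build \
    -Defi_sbat_distro_id="fedora" \
    -Defi_sbat_distro_summary="The Fedora Project"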

Auto-updating XKB for new kernel keycodes

Posted by Peter Hutterer on January 22, 2021 12:58 AM

Your XKB keymap contains two important parts. One is the mapping from the hardware scancode to some internal representation, for example:

  <AB10> = 61;  

Which basically means Alphanumeric key in row B (from bottom), 10th key from the left. In other words: the /? key on a US keyboard.

The second part is mapping that internal representation to a keysym, for example:

  key <AB10> {        [     slash,    question        ]       }; 

This is the actual layout mapping - once in place this key really produces a slash or question mark (on level2, i.e. when Shift is down).

This two-part approach exists so either part can be swapped without affecting the other. Swap the second part to an exclamation mark and paragraph symbol and you have the French version of this key, swap it to dash/underscore and you have the German version of the key - all without having to change the keycode.

Back in the golden days of everyone-does-what-they-feel-like, keyboard manufacturers (presumably happily so) changed the key codes and we needed model-specific keycodes in XKB. The XkbModel configuration is a leftover from these trying times.

The Linux kernel's evdev API has largely done away with this. It provides a standardised set of keycodes, defined in linux/input-event-codes.h, and ensures, with the help of udev [0], that all keyboards actually conform to that. An evdev XKB keycode is a simple "kernel keycode + 8" [1] and that applies to all keyboards. On top of that, the kernel uses semantic definitions for the keys as they'd be in the US layout. KEY_Q is the key that would, behold!, produce a Q. Or an A in the French layout because they just have to be different, don't they? Either way, with evdev the Xkb Model configuration largely points to nothing and only wastes a few cycles with string parsing.
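
You can check the offset on any evdev-driven X session (a quick sketch; KEY_Q is 16 in the kernel's header, so the matching X keycode should be 24):

grep -w KEY_Q /usr/include/linux/input-event-codes.h   # -> #define KEY_Q 16
xmodmap -pke | grep -w 24                              # -> keycode 24 = q Q ...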

The second part, the keysym mapping, uses two approaches. One is to use a named #define like the "slash", "question" outlined above (see X11/keysymdef.h for the defines). The other is to use unicode directly like this example from the Devanagari layout:

  key <AB10> { [ U092f, U095f, slash, question ] };

As you can see, mix and match is available too. Using Unicode code points of course makes the layouts less immediately readable but on the other hand we don't need to #define the whole of Unicode. So from a maintenance perspective it's a win.

However, there's a third type of key that we care about: functional keys. Those are the multimedia (historically: "internet") keys that most devices have these days. Volume up, touchpad on/off, cycle display connectors, etc. Those keys are special in that they don't have a Unicode representation and they are always mapped to the same fixed functionality. Even Dvorak users want their volume keys to do what it says on the key.

Because they have no Unicode code points, those keys are defined, historically, in XF86keysyms.h:

  #define XF86XK_MonBrightnessUp    0x1008FF02  /* Monitor/panel brightness */

And mapping a key like this looks like this [2]:

  key <I21>   {       [ XF86Calculator        ] };

The only drawback: every key needs to be added manually. This has been done for some, but not for others. And some keys were added with different names than what the kernel uses [3].

So we're in this weird situation where we have a flexible keymap system but the kernel already tells us what a key does anyway and we don't want to change that. Virtually all keys added in the last decade or so fall into that group, but to actually make use of them requires a #define in xorgproto and an update to the keycodes and symbols in xkeyboard-config. That again introduces discrepancies and we end up in the situation we're at right now: some keys don't work until someone files a bug, and then the users still need to wait for several components to be released and those releases to trickle into the distributions.

10 years ago would've been a good time to make this more efficient. The situation wasn't that urgent then: most of the kernel keycodes added are >255, which means they cannot be used in X anyway. [4] The second best time to do it is now. What we need is basically a pass-through from kernel keycode to symbol, and that's currently sitting in various MRs:

- xkeyboard-config can generate the keycodes/evdev file based on the list of kernel keycodes, so all kernel keycodes are mapped to internal representations by default

- xorgproto has reserved a range within the XF86 keysym reserved range for pass-through mappings, i.e. any KEY_FOO define from the kernel is mapped to XF86XK_Foo with a specific value [5]. The #define format is fixed so it can be parsed.

- xkeyboard-config parses these XF86 keysyms and sets up a keysym mapping in the default keymap.

This is semi-automatic, i.e. there are helper scripts that detect changes and notify us, hooked into the CI, but the actual work must be done manually. These keysyms immediately become set-in-stone API so we don't want some unsupervised script to go wild on them.

There's a huge backlog of keys to be added (dating back to pre-v3.18 kernels) and I'll go through them one-by-one over the next weeks to make sure they're correct. But eventually they'll be done and we'll have a full keymap for all kernel keys, immediately available in the XKB layout.

The last part of all of this is a calendar reminder for me to do this after every new kernel release. Let's hope this crucial part isn't the first to fail.

[0] 60-keyboard.hwdb has a mere ~1800 lines!
[1] Historical reasons, you don't want to know. *jedi wave*
[2] the XK_ part of the key name is dropped, implementation detail.
[3] This can also happen when a kernel define is renamed/aliased but we cannot easily do so for this header.
[4] X has an 8 bit keycode limit and that won't change until someone develops XKB2 with support for 32-bit keycodes, i.e. never.

[5] The actual value is an implementation detail and no client must care


Parsing HID Unit Items

Posted by Peter Hutterer on January 13, 2021 11:28 AM

This post explains how to parse the HID Unit Global Item as explained by the HID Specification, page 37. The table there is quite confusing and it took me a while to fully understand it (Benjamin Tissoires was really the one who cracked it). I couldn't find any better explanation online which means either I'm incredibly dense and everyone's figured it out or no-one has posted a better explanation. On the off-chance it's the latter [1], here are the instructions on how to parse this item.

We know a HID Report Descriptor consists of a number of items that describe the content of each HID Report (read: an event from a device). These Items include things like Logical Minimum/Maximum for axis ranges, etc. A HID Unit item specifies the physical unit to apply. For example, a Report Descriptor may specify that X and Y axes are in mm which can be quite useful for all the obvious reasons.

Like most HID items, a HID Unit Item consists of a one-byte item tag and 1, 2 or 4 byte payload. The Unit item in the Report Descriptor itself has the binary value 0110 01nn where the nn is either 1, 2, or 3 indicating 1, 2 or 4 bytes of payload, respectively. That's standard HID.

The payload is divided into nibbles (4-bit units) and goes from LSB to MSB. The lowest-order 4 bits (first byte & 0xf) define the unit System to apply: one of SI Linear, SI Rotation, English Linear or English Rotation (well, or None/Reserved). The rest of the nibbles are in this order: "length", "mass", "time", "temperature", "current", "luminous intensity". In something resembling code this means:


system = value & 0xf
length_exponent = (value & 0xf0) >> 4
mass_exponent = (value & 0xf00) >> 8
time_exponent = (value & 0xf000) >> 12
...
The System defines which unit is used for length (e.g. SILinear means length is in cm). The actual value of each nibble is the exponent for the unit in use [2]. In something resembling code:

switch (system) {
case SILinear:
    print("length is in cm^{length_exponent}");
    break;
case SIRotation:
    print("length is in rad^{length_exponent}");
    break;
case EnglishLinear:
    print("length is in in^{length_exponent}");
    break;
case EnglishRotation:
    print("length is in deg^{length_exponent}");
    break;
case None:
case Reserved:
    print("boo!");
    break;
}

For example, the value 0x321 means "SI Linear" (0x1) so the remaining nibbles represent, in ascending nibble order: Centimeters, Grams, Seconds, Kelvin, Ampere, Candela. The length nibble has a value of 0x2 so it's square cm, the mass nibble has a value of 0x3 so it is cubic grams (well, it's just an example, so...). This means that any report containing this item comes in cm²g³. As a more realistic example: 0xF011 would be cm/s.

If we changed the lowest nibble to English Rotation (0x4), i.e. our value is now 0x324, the units represent: Degrees, Slug, Seconds, F, Ampere, Candela [3]. The length nibble 0x2 means square degrees, the mass nibble is cubic slugs. As a more realistic example, 0xF014 would be degrees/s.

Any nibble with value 0 means the unit isn't in use, so the example from the spec with value 0x00F0D121 is SI linear, units cm² g s⁻³ A⁻¹, which is... Voltage! Of course you knew that and totally didn't have to double-check with wikipedia.
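
A quick shell sketch of pulling those nibbles apart, including the twos-complement handling from [2] (the value is chosen to match the Voltage example above):

value=0x00F0D121
system=$(( value & 0xf ))                    # 1 -> SI Linear
lexp=$(( (value >> 4) & 0xf ))               # 2 -> length is cm^2
texp=$(( (value >> 12) & 0xf ))              # 0xD -> raw nibble
[ "$texp" -gt 7 ] && texp=$(( texp - 16 ))   # signed nibble: -3 -> s^-3
echo "system=$system length_exp=$lexp time_exp=$texp"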

Because bits are expensive and the base units are of course either too big or too small or otherwise not quite right, HID also provides a Unit Exponent item. The Unit Exponent item (a separate item to Unit in the Report Descriptor) then describes the exponent to be applied to the actual value in the report. For example, a Unit Exponent of -3 means 10⁻³ to be applied to the value. If the report descriptor specifies an item of Unit 0x00F0D121 (i.e. V) and Unit Exponent -3, the value of this item is mV (milliVolt), Unit Exponent of 3 would be kV (kiloVolt).

Now, in hindsight all this is pretty obvious and maybe even sensible. It'd have been nice if the spec had explained it a bit more clearly, but then I would have nothing to write about, so I guess overall I call it a draw.

[1] This whole adventure was started because there's a touchpad out there that measures touch pressure in radians, so at least one other person out there struggled with the docs...
[2] The nibble value is twos complement (i.e. it's a signed 4-bit integer). Values 0x1-0x7 are exponents 1 to 7, values 0x8-0xf are exponents -8 to -1.
[3] English Linear should've trolled everyone and used Centimetres instead of Centimeters in SI Linear.

Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248

Posted by Lennart Poettering on January 12, 2021 11:00 PM

TL;DR: It's now easy to unlock your LUKS2 volume with a FIDO2 security token (e.g. YubiKey or Nitrokey FIDO2). And TPM2 unlocking is easy now too.

Blogging is a lot of work, and a lot less fun than hacking. I mostly focus on the latter because of that, but from time to time I guess stuff is just too interesting to not be blogged about. Hence here, finally, another blog story about exciting new features in systemd.

With the upcoming systemd v248 the systemd-cryptsetup component of systemd (which is responsible for assembling encrypted volumes during boot) gained direct support for unlocking encrypted storage with three types of security hardware:

  1. Unlocking with FIDO2 security tokens (well, at least with those which implement the hmac-secret extension, most do). i.e. your YubiKeys (series 5 and above), or Nitrokey FIDO2 and such.

  2. Unlocking with TPM2 security chips (pretty ubiquitous on non-budget PCs/laptops/…)

  3. Unlocking with PKCS#11 security tokens, i.e. your smartcards and older YubiKeys (the ones that implement PIV). (Strictly speaking this was supported on older systemd already, but was a lot more "manual".)

For completeness' sake, let's keep in mind that the component also allows unlocking with these more traditional mechanisms:

  1. Unlocking interactively with a user-entered passphrase (i.e. the way most people probably already deploy it, supported since about forever)

  2. Unlocking via key file on disk (optionally on removable media plugged in at boot), supported since forever.

  3. Unlocking via a key acquired through trivial AF_UNIX/SOCK_STREAM socket IPC. (Also new in v248)

  4. Unlocking via recovery keys. These are pretty much the same thing as a regular passphrase (and in fact can be entered wherever a passphrase is requested) — the main difference being that they are always generated by the computer, and thus have guaranteed high entropy, typically higher than user-chosen passphrases. They are generated in a way that makes them easy to type, in many cases even if the local key map is misconfigured. (Also new in v248)

In this blog story, let's focus on the first three items, i.e. those that talk to specific types of hardware for implementing unlocking.

To make working with security tokens and TPM2 easy, a new, small tool was added to the systemd tool set: systemd-cryptenroll. Its only purpose is to make it easy to enroll your security token/chip of choice into an encrypted volume. It works with any LUKS2 volume, and embeds a tiny bit of meta-information into the LUKS2 header with parameters necessary for the unlock operation.

Unlocking with FIDO2

So, let's see how this fits together in the FIDO2 case. Most likely this is what you want to use if you have one of these fancy FIDO2 tokens (which need to implement the hmac-secret extension, as mentioned). Let's say you already have your LUKS2 volume set up, and previously unlocked it with a simple passphrase. Plug in your token, and run:

# systemd-cryptenroll --fido2-device=auto /dev/sda5

(Replace /dev/sda5 with the underlying block device of your volume).

This will enroll the key as an additional way to unlock the volume, and embeds all necessary information for it in the LUKS2 volume header. Before we can unlock the volume with this at boot, we need to allow FIDO2 unlocking via /etc/crypttab. For that, find the right entry for your volume in that file, and edit it like so:

myvolume /dev/sda5 - fido2-device=auto

Replace myvolume and /dev/sda5 with the right volume name, and underlying device of course. Key here is the fido2-device=auto option you need to add to the fourth column in the file. It tells systemd-cryptsetup to use the FIDO2 metadata now embedded in the LUKS2 header, wait for the FIDO2 token to be plugged in at boot (utilizing systemd-udevd, …) and unlock the volume with it.

And that's it already. Easy-peasy, no?

Note that all of this doesn't modify the FIDO2 token itself in any way. Moreover you can enroll the same token in as many volumes as you like. Since all enrollment information is stored in the LUKS2 header (and not on the token) there are no bounds on any of this. (OK, well, admittedly, there's a cap on LUKS2 key slots per volume, i.e. you can't enroll more than a bunch of keys per volume.)
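
Not required, but if you want to double-check that the enrollment really landed in the header, you can dump it (the systemd-fido2 token name is what current systemd uses, as far as I know):

# the output should include a Tokens: section with a systemd-fido2 entry
# plus the keyslot it references
cryptsetup luksDump /dev/sda5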

Unlocking with PKCS#11

Let's now have a closer look how the same works with a PKCS#11 compatible security token or smartcard. For this to work, you need a device that can store an RSA key pair. I figure most security tokens/smartcards that implement PIV qualify. How you actually get the keys onto the device might differ though. Here's how you do this for any YubiKey that implements the PIV feature:

# ykman piv reset
# ykman piv generate-key -a RSA2048 9d pubkey.pem
# ykman piv generate-certificate --subject "Knobelei" 9d pubkey.pem
# rm pubkey.pem

(This chain of commands erases what was stored in the PIV feature of your token before, be careful!)

For tokens/smartcards from other vendors a different series of commands might work. Once you have a key pair on it, you can enroll it with a LUKS2 volume like so:

# systemd-cryptenroll --pkcs11-token-uri=auto /dev/sda5

Just like the same command's invocation in the FIDO2 case this enrolls the security token as an additional way to unlock the volume, any passphrases you already have enrolled remain enrolled.

For the PKCS#11 case you need to edit your /etc/crypttab entry like this:

myvolume /dev/sda5 - pkcs11-uri=auto

If you have a security token that implements both PKCS#11 PIV and FIDO2 I'd probably enroll it as FIDO2 device, given it's the more contemporary, future-proof standard. Moreover, it requires no special preparation in order to get an RSA key onto the device: FIDO2 keys typically just work.

Unlocking with TPM2

Most modern (non-budget) PC hardware (and other kinds of hardware too) nowadays comes with a TPM2 security chip. In many ways a TPM2 chip is a smartcard that is soldered onto the mainboard of your system. Unlike your usual USB-connected security tokens you thus cannot remove them from your PC, which means they address quite a different security scenario: they aren't immediately comparable to a physical key you can take with you that unlocks some door; rather, they are a key you leave at the door, but one that refuses to be turned by anyone but you.

Even though this sounds a lot weaker than the FIDO2/PKCS#11 model, TPM2 chips still bring benefits for securing your systems: because the cryptographic key material stored in TPM2 devices cannot be extracted (at least that's the theory), if you bind your hard disk encryption to it, attackers cannot just copy your disk and analyze it offline — they always need access to the TPM2 chip too to have a chance to acquire the necessary cryptographic keys. Thus, they can still steal your whole PC and analyze it, but they cannot just copy the disk without you noticing and analyze the copy.

Moreover, you can bind the ability to unlock the harddisk to specific software versions: for example you could say that only your trusted Fedora Linux can unlock the device, but not any arbitrary OS some hacker might boot from a USB stick they plugged in. Thus, if you trust your OS vendor, you can entrust storage unlocking to the vendor's OS together with your TPM2 device, and thus can be reasonably sure intruders cannot decrypt your data unless they both hack your OS vendor and steal/break your TPM2 chip.

Here's how you enroll your LUKS2 volume with your TPM2 chip:

# systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/sda5

This looks almost as straightforward as the two earlier systemd-cryptenroll command lines — if it wasn't for the --tpm2-pcrs= part. With that option you can specify to which TPM2 PCRs you want to bind the enrollment. TPM2 PCRs are a set of (typically 24) hash values that every TPM2-equipped system calculates at boot from all the software that is invoked during the boot sequence, in a secure, unfakable way (this is called "measurement"). If you bind unlocking to a specific value of a specific PCR, you thus require that the system follow the same sequence of software at boot to re-acquire the disk encryption key. Sounds complex? Well, that's because it is.

For now, let's see how we have to modify your /etc/crypttab to unlock via TPM2:

myvolume /dev/sda5 - tpm2-device=auto

This part is easy again: the tpm2-device= option is what tells systemd-cryptsetup to use the TPM2 metadata from the LUKS2 header and to wait for the TPM2 device to show up.
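
If you'd like to verify the whole thing without rebooting, something like this should work (a sketch, assuming the usual install path of the helper; "testvol" is an arbitrary mapping name):

/usr/lib/systemd/systemd-cryptsetup attach testvol /dev/sda5 - tpm2-device=auto
/usr/lib/systemd/systemd-cryptsetup detach testvol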

Bonus: Recovery Key Enrollment

FIDO2, PKCS#11 and TPM2 security tokens and chips pair well with recovery keys: since you don't need to type in your password every day anymore it makes sense to get rid of it, and instead enroll a high-entropy recovery key you then print out or scan off screen and store in a safe, physical location. i.e. forget about good ol' passphrase-based unlocking, go for FIDO2 plus recovery key instead! Here's how you do it:

# systemd-cryptenroll --recovery-key /dev/sda5

This will generate a key, enroll it in the LUKS2 volume, show it to you on screen and generate a QR code you may scan off screen if you like. The key has very high entropy, and can be entered wherever you can enter a passphrase. Because of that you don't have to modify /etc/crypttab to make the recovery key work.

Future

There's still plenty of room for further improvement in all of this. In particular for the TPM2 case: what the text above doesn't really mention is that binding your encrypted volume unlocking to specific software versions (i.e. kernel + initrd + OS versions) actually sucks hard: if you naively update your system to newer versions you might lose access to your TPM2 enrolled keys (which isn't terrible, after all you did enroll a recovery key — right? — which you then can use to regain access). To solve this some more integration with distributions would be necessary: whenever they upgrade the system they'd have to make sure to enroll the TPM2 again — with the PCR hashes matching the new version. And whenever they remove an old version of the system they need to remove the old TPM2 enrollment. Alternatively TPM2 also knows a concept of signed PCR hash values. In this mode the distro could just ship a set of PCR signatures which would unlock the TPM2 keys. (But quite frankly I don't really see the point: whether you drop in a signature file on each system update, or enroll a new set of PCR hashes in the LUKS2 header doesn't make much of a difference). Either way, to make TPM2 enrollment smooth, some more integration work with your distribution's system update mechanisms needs to happen. And yes, because of this OS updating complexity the example above — where I referenced your trusty Fedora Linux — doesn't actually work IRL (yet? hopefully…). Nothing updates the enrollment automatically after you initially enrolled it, hence after the first kernel/initrd update you have to manually re-enroll things again, and again, and again … after every update.

The TPM2 could also be used for other kinds of key policies, which we might look into adding later too. For example, Windows uses TPM2 stuff to allow short (4 digits or so) "PINs" for unlocking the harddisk, i.e. kind of a low-entropy password you type in. The reason this is reasonably safe is that in this case the PIN is passed to the TPM2, which enforces that no more than some limited number of unlock attempts may be made within some time frame, and that after too many attempts the PIN is invalidated altogether. Thus making dictionary attacks harder (which would normally be easier given the short length of the PINs).

Postscript

(BTW: Yubico sent me two YubiKeys for testing and Nitrokey a Nitrokey FIDO2, thank you! — That's why you see all those references to YubiKey/Nitrokey devices in the text above: it's the hardware I had to test this with. That said, I also tested the FIDO2 stuff with a SoloKey I bought, where it also worked fine. And yes, you!, other vendors!, who might be reading this, please send me your security tokens for free, too, and I might test things with them as well. No promises though. And I am not going to give them back, if you do, sorry. ;-))

fwupd 1.5.5

Posted by Richard Hughes on January 11, 2021 10:39 AM

I’ve just released fwupd 1.5.5 with the following new features:

  • Add a plugin to update PixArt RF devices; the hardware this enables we’ll announce in a few weeks hopefully
  • Add new hardware to use the elantp (for TouchPads) and rts54hid (for USB Hubs) plugins
  • Allow specifying more than one VendorID for a device, which allows ATA devices to use the OUI-assigned vendor if set
  • Detect the AMD TSME encryption state for HSI-4 — use fwupdmgr security --force to help test
  • Detect the AMI PK test key is not installed for HSI-1 — a failure here is very serious
As usual, this release fixes quite a few bugs too:

  • Fix flashing a fingerprint reader that is in use; in theory the window to hit this is vanishingly small, but on some hardware we ask the user to authorise the request using the very device that we’re trying to update…
  • Fix several critical warnings when parsing invalid firmware, found using hongfuzz, warming my office on these cold winter days
  • Fix updating DFU devices that use DNLOAD_BUSY, which fixes fwupd support for some future hardware
  • Ignore the legacy UEFI OVMF dummy GUID so that we can test the dbx updates using qemu on older releases like RHEL
  • Make libfwupd more thread safe to fix a crash in gnome-software — many thanks to Philip Withnall for explaining a lot of the GMainContext threading complexities to me
  • We now never show unprintable chars from invalid firmware in the logs — as a result of fuzzing insane things the logs would often be full of gobbledygook, but no longer
I’m now building 1.5.5 into Fedora 33 and Fedora 32; packages should appear soon.

Firefox – we’re finally getting HW acceleration on Linux

Posted by Martin Stransky on January 10, 2021 08:40 PM

[Figure: A first image from the original WebRender article, published three years ago.]

Firefox 84.0 is a big milestone for Firefox Linux development as it comes with HW acceleration by default for some Linux users. Stock Mozilla Firefox 84.0 enables WebRender (the HW accelerated backend) for Gnome/X.org; Gnome/Wayland will be supported in Firefox 85.0. Fedora is a bit ahead and enables WebRender for Gnome/Wayland in Firefox 84.0 too.

WebRender is by default restricted to AMD/Intel graphics cards, as NVIDIA is known for various issues with both the proprietary and Nouveau drivers.

And why is it enabled only on Gnome for now? KDE, for instance, is also a popular desktop environment. I think it’s because Gnome itself utilizes HW acceleration, so when Gnome works on your box there’s a reasonable assumption that Firefox will work too. KDE provides choices for disabling or restricting the HW acceleration setup (for instance, it supports disabled screen compositing), so it’s more difficult to cover the various scenarios.

Another excluded group is XWayland users. That means you have Wayland as a desktop compositor but, for some reason, you use the X11 emulation layer and run Firefox as an X11 application. It’s a valid scenario; Firefox with the Wayland backend still suffers from some annoying bugs, mostly related to popup windows.

But don’t worry, Mozilla folks are going to bring WebRender to most Linux users on various desktops and graphics hardware. Jan created a brief Linux WebRender state overview. And you can help with it! Please check whether you have WebRender enabled, and if not, try to enable it. Test various web pages, video playback and WebGL, and report your experience. You can use the comments below or drop me a mail at stransky@redhat.com.


    The first fully tested Fedora Firefox package

    Posted by Martin Stransky on January 08, 2021 11:11 AM
    [Image: Mozilla testsuite running natively on Wayland. There is still room for improvement 🙂]

    We hit a big milestone in Firefox deployment on Fedora with the firefox-84.0.2 package. It’s the first fully tested Firefox package released to Fedora users. Let’s see what’s so exciting about it.

    Mozilla has a large testsuite as a part of its development and release process. When any new patch hits the Firefox repository, it’s built and tested for functional and speed regressions. The testsuite is also a developer’s nightmare, as it contains some old and outdated test environments and it may be difficult to get patches through it.

    So if you download a Mozilla Firefox binary you can be sure it’s generally working, at least on X.org and Ubuntu 18.04, which is the main test setup there. But what about distro builds?

    There are many differences in how Firefox is built by various distributions. The very first difference is the compiler. Mozilla builds with a custom-patched Clang, which is not available to distributions (not that they want it anyway).

    GCC is the favorite compiler among Linux hackers, and that brings an extra maintenance burden to Firefox distro maintainers. Compiler crashes and code miscompilation are companions of every new GCC version (not to mention the extra fun with LTO).

    Distros also tend to modify Firefox sources with various integration patches or patches from Nightly, run Firefox with the Wayland backend, or use system NSS, so maintaining such a package is like dancing in a minefield.

    And here comes the Mozilla testsuite. If we manage to run it in our environment, with our package modifications, it greatly helps to reduce unwelcome surprises like Bug 1893474. It needs to run in an automated manner and test every Firefox build we may release to users, on all arches.

    And that’s done now. Almost. The latest Firefox Fedora builds (F32, F33) run the testsuite as part of the build process, and test results are generated as a new package (firefox-testresults).

    When I say we’re almost there, I mean the most difficult work is done, as the tests run in Koji. There are still some failing tests which need to be inspected and then disabled or fixed. The testsuite needs to be run on Wayland; it currently checks the X11 backend. And it should be integrated into the Fedora test infrastructure to show test results in some user-friendly way.

    WebRTC/Chromium updates in 2020

    Posted by Jan Grulich on December 18, 2020 12:49 PM

    In 2019, I made my first contribution to WebRTC. It was all about screen sharing support in Linux Wayland sessions, using xdg-desktop-portal and PipeWire. Back then, things were quite simple: we only had PipeWire 0.2, and all portal backends supported only screen sharing (no window sharing). While this was relatively easy, it was not ideal, as each screen sharing request involved two portal dialogs to get the screen content onto the web page itself. For me it was a big success, because I made quite a significant contribution to a big project, one used by many people and by all modern web browsers.

    At the beginning of 2020, the year everyone would like to erase from their memories, we got PipeWire 0.3 (with a slightly different API), and later that year, with xdg-desktop-portal-gtk and xdg-desktop-portal-kde, people were finally able to share application windows. Support for all of this was lacking in WebRTC, because none of it existed when the original code was written. I wanted to tackle all the issues at once: bring support for window sharing and get rid of the portal “dialog hell”, which was even worse with the new window sharing capabilities in the portal backends.

    [Image: screenshot of the Chromium screen sharing preview dialog and the portal dialogs it triggers.]

    This is what the situation looked like. With each request to share a screen, you got the preview dialog from Chromium. This dialog consists of three pages: one for screen sharing, making one portal request; a second for window sharing, which is another portal request; and the last one just to allow you to share a web page you have opened. You had to confirm both portal dialogs, then confirm the Chromium dialog, and finally you got one more portal dialog (ouch) to get the screen content onto the web page itself.

    I had a solution. I identified every portal call with an ID and shared this ID (and thus the portal call) in Chromium between both pages of the preview dialog and with the request made for the web page itself. With this solution we only had ONE portal dialog. It was a perfect solution (or at least it seemed to be). I started working on this at the beginning of the year and exchanged many emails with people from the Chromium UX team, because I also wanted to make some minor UI changes in the preview dialog. Unfortunately, those were rejected for consistency across all platforms. It was not a big deal, so I submitted my changes for review, keeping the UI as it was and just adding all the necessary bits into Chromium and WebRTC to make it all work.

    I wish I could say things went smoothly from then on, but the opposite is true. It took a while to get everything reviewed, which is probably no surprise given this weird year, with many people working from home in less than ideal conditions. Anyway, a few months passed, and I ended up rewriting my changes many times, not even counting the hours I spent on it. All this resulted in me being obsessed with the change; it mattered to me so much to get it merged. I was constantly thinking about how to make it better, and I was often fixing issues in the evening (as the reviewers were mostly US based) instead of spending time with my family, or even wasting time with my beloved Playstation. This had a really negative impact on my mental health. I realized it had to stop, and I simply gave up, because I couldn’t continue this way and needed a break. I abandoned both changes (WebRTC and Chromium) and decided to pick just the changes I would be able to successfully upstream. I probably made my change too ambitious and complicated, or maybe it’s just Chromium not being ready for this kind of change, because some tweaks were specific to my use case. It’s also hard to say I wish the upstream devs had helped me more, because there is so much to understand around Wayland, portals, PipeWire, and the way it all works together.

    Anyway, with a fresh start, and without pressure after giving up on the big change, I picked the most important changes and submitted them separately. I was surprised how smoothly this went and how fast those changes were upstreamed. Quite simply, those changes were small, understandable and easy to review. I haven’t given up on fixing the “dialog hell” completely; I have some other ideas, but next time I will try to submit them step by step, and will keep some distance and my free time.

    And what are the changes you can expect in upcoming Chromium release in 2021?

    Support for PipeWire 0.3

    You can now build Chromium/WebRTC against both PipeWire 0.2 and PipeWire 0.3. There is a new “rtc_pipewire_version” option you can pass to your build configs.
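    As a sketch, the relevant GN args might look like the following (rtc_use_pipewire and the exact value strings are my assumptions; check the build documentation for your checkout):

    # args.gn (a sketch, not definitive syntax)
    rtc_use_pipewire = true        # assumed: enables the PipeWire capturer
    rtc_pipewire_version = "0.3"   # the new option; "0.2" selects the older library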

    Window sharing support

    There is probably no description needed. You will be able to share application windows in case you don’t want to share whole screen.

    Support for DmaBuf and MemFd buffer types

    This should allow faster transfer of your screen content from your Wayland compositor, through PipeWire to your browser.

    Less portal dialogs involved

    If you look back at the screenshot I posted above, you can see there are two portal dialogs opened just for the Chromium preview dialog. I at least managed to reduce this to just one portal dialog, by removing the page for window sharing: the screen share request already handles both screens and windows.

    I think you can expect the above-mentioned changes in Chromium 89, and I hope you will appreciate at least some of these improvements, even though I didn’t deliver everything I wanted to. Also, thanks to Martin Stránský from our Firefox team, you can expect all these changes to also be part of Firefox.

    Happy holidays and see you in a better year.

    Understanding systemd-resolved, Split DNS, and VPN Configuration

    Posted by Michael Catanzaro on December 18, 2020 01:21 AM

    So, systemd-resolved is enabled by default in Fedora 33. Most users won’t notice the difference, but if you use VPNs — or depend on DNSSEC, more on that at the bottom of this post — then systemd-resolved might be a big deal for you. When testing Fedora 33, we found one bug report where a user discovered that systemd-resolved broke his VPN configuration. After this bug was fixed, and nobody reported any further issues, I was pretty confident that migration to systemd-resolved would go smoothly. Then Fedora 33 was released, and I noticed a significant number of users on Ask Fedora and Reddit asking for help with broken VPNs, problems that Fedora 33 beta testers had failed to detect. This was especially surprising to me because Ubuntu has enabled systemd-resolved by default since Ubuntu 16.10, so we were four full years behind Ubuntu here, which should have been plenty of time for any problems to be ironed out. So what went wrong?

    First, let’s talk about how things worked before systemd-resolved, so we can see what was wrong and why we needed a change. We’ll see how split DNS with systemd-resolved is different than traditional DNS. Finally, we’ll learn how custom VPN software must configure systemd-resolved to avoid problems that result in broken DNS.

    I want to note that, although I wrote the Fedora change proposal and have done some evangelism on behalf of systemd-resolved, I’m not a systemd developer and haven’t contributed any code to systemd-resolved.

    Traditional DNS with nss-dns

    Let’s first see how things worked before systemd-resolved. There are two important configuration files to discuss. The first is /etc/nsswitch.conf, which controls which NSS modules are invoked by glibc when performing name resolution. Note these are glibc Name Service Switch modules, which are totally unrelated to Firefox’s NSS, Network Security Services, which unfortunately uses the same acronym. Also note that, in Fedora (and also Red Hat Enterprise Linux), /etc/nsswitch.conf is managed by authselect and must not be edited directly. If you want to change it, you need to edit /etc/authselect/user-nsswitch.conf instead, then run sudo authselect apply-changes.

    Anyway, in Fedora 32, the hosts line in /etc/nsswitch.conf looked like this:

    hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname

    That means: first invoke nss-files, which looks at /etc/hosts to see if the hostname is hardcoded there. If it’s not, then invoke nss-mdns4_minimal, which uses avahi to implement mDNS resolution. [NOTFOUND=return] means it’s OK for avahi to not be installed; in that case, it just gets ignored. (Edit: this was wrong. Mantas mentioned in the comment below that this is to allow returning early for queries to .local domains, which should never go to the remaining nss modules.) Then most DNS resolution is performed by nss-dns. And finally, we have nss-myhostname, which is just there to guarantee that your own local hostname is always resolvable. Anyway, nss-dns is the key part here. nss-dns is what reads /etc/resolv.conf.

    Next, let’s look at /etc/resolv.conf. This file contains a list of up to three DNS servers to use. The servers are attempted in order. If the first server in the list is broken, then the second server will be used. If the second server is broken, the third server will be used. If the third server is also broken, then everything fails, because no matter how many servers you list here, all except the first three are ignored. In Fedora 32, /etc/resolv.conf was, by default, a plain file managed by NetworkManager. It might look like this:

    # Generated by NetworkManager
    nameserver 192.168.122.1

    That’s a pretty common example. It means that all DNS requests should be sent to my router. My router must have configured this via DHCP, causing NetworkManager to dutifully add it to /etc/resolv.conf.

    Traditional DNS Problems

    Traditional DNS is all well and good for a simple case like we had above, but turns out it’s really broken once you start adding VPNs to the mix. Let’s consider two types of VPNs: a privacy VPN that is always enabled and which is the default route for all web traffic, and a corporate VPN that only receives traffic for internal company resources. (To switch between these two different types of VPN configuration, use the checkbox “Use this connection only for resources on its network” at the bottom of the IPv4 and IPv6 tabs of your VPN’s configuration in System Settings.)

    Now, what happens if we connect to both VPNs? The VPN that you connect to first gets listed first in /etc/resolv.conf, followed by the VPN that you connect to second, followed by your local DNS server. Assuming the DNS servers are all working properly, that means:

    • If you connect to your privacy VPN first and your corporate VPN second, all DNS requests will be sent to your privacy VPN, and you won’t be able to visit internal corporate websites. (This scenario is exactly why I became interested in systemd-resolved. After joining Red Hat, I discovered that I couldn’t access various redhat.com websites if I connected to my VPNs in the wrong order.)
    • If you connect to your corporate VPN first and your privacy VPN second, then all your DNS goes to your corporate VPN, and none to your privacy VPN. As that defeats the point of using the privacy VPN, we can be confident it’s not what users expect to happen.
    • If you ever connect the VPNs in the opposite order — say, if your connection to one temporarily drops, and you need to reconnect — then you’ll get the opposite behavior. If you don’t notice this pattern behind the failures, it can make problems difficult to reproduce.

    You don’t need two VPNs for this to be a problem, of course. Let’s say you have no privacy VPN, only a corporate VPN.  Well, your employer may fire you if it notices DNS requests it doesn’t like. If you’re making 30 requests per hour to facebook.com, youtube.com, or more salacious websites, that sure looks like you’re not doing very much work. It’s really never in the employee’s best interests to send more DNS than necessary to an employer.

    If you use only a privacy VPN, the failure case is arguably even more severe. Let’s say your privacy VPN’s DNS server temporarily goes offline. Then, because /etc/resolv.conf is a list, glibc will fall back to using your normal DNS, probably either your ISP’s DNS server, or your router that forwards everything to your ISP. And now your DNS query has gone to your ISP. If you’re making the wrong sort of DNS requests in the wrong sort of countries — say, if you’re visiting websites opposed to your government — this could get you imprisoned or executed.

    Finally, either type of VPN will break resolution of local domains, e.g. fritz.box, because only your router can resolve that properly, but you’re sending your DNS query to your VPN’s DNS server. So local resources will be broken for as long as you’re connected to a VPN.

    All things considered, the status quo prior to systemd-resolved was pretty terrible. The need for something better should be clear. Now let’s look at how systemd-resolved fixes this.

    Modern DNS with nss-resolve

    First, let’s look at /etc/nsswitch.conf, which looks a bit different in Fedora 33:

    hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] myhostname dns

    nss-myhostname and nss-dns have switched places, but that’s just a minor change that ensures your local hostname is always local even if your DNS server thinks otherwise. (March 2021 Update: nss-myhostname has been moved before nss-mdns4_minimal for Fedora 34, so our new configuration is files myhostname mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns.)

    The important change here is the addition of resolve [!UNAVAIL=return]. nss-resolve uses systemd-resolved to resolve hostnames, via either its varlink API (with systemd 247) or its D-Bus API (with older versions of systemd). If systemd-resolved is running, glibc will stop there, and refuse to continue on to nss-myhostname or nss-dns even if nss-resolve doesn’t return a result, since both nss-myhostname and nss-dns are obsoleted by nss-resolve. But if systemd-resolved is not running, then it continues on (and, if resolving something other than the local hostname, will wind up using nss-dns and reading /etc/resolv.conf, as before).

    Importantly, when nss-resolve is used, glibc does not read /etc/resolv.conf when performing name resolution, so any configuration that you put there is totally ignored. That means any script or program that writes to /etc/resolv.conf is probably broken. /etc/resolv.conf still exists, though: it’s managed by systemd-resolved to maintain compatibility with programs that manually read /etc/resolv.conf and do their own name resolution, bypassing glibc. Although systemd-resolved supports several different modes for managing /etc/resolv.conf, the default mode, and the mode used in both Fedora and Ubuntu, is for /etc/resolv.conf to be a symlink to /run/systemd/resolve/stub-resolv.conf, which now looks like this:

    # This file is managed by man:systemd-resolved(8). Do not edit.
    #
    # This is a dynamic resolv.conf file for connecting local clients to the
    # internal DNS stub resolver of systemd-resolved. This file lists all
    # configured search domains.
    #
    # Run "resolvectl status" to see details about the uplink DNS servers
    # currently in use.
    #
    # Third party programs should typically not access this file directly, but only
    # through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
    # different way, replace this symlink by a static file or a different symlink.
    #
    # See man:systemd-resolved.service(8) for details about the supported modes of
    # operation for /etc/resolv.conf.
    
    nameserver 127.0.0.53
    options edns0 trust-ad
    search redhat.com lan

    The redhat.com search domain is coming from my corporate VPN, but the rest of this /etc/resolv.conf should look like yours. Notably, 127.0.0.53 is systemd-resolved’s local stub responder. This allows programs that manually read /etc/resolv.conf to continue to work without changes: they will just wind up talking to systemd-resolved on 127.0.0.53 rather than directly connecting to your real DNS server, as before.
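    An easy way to confirm you’re in this default stub mode is to inspect the symlink itself (output trimmed):

    $ ls -l /etc/resolv.conf
    lrwxrwxrwx. 1 root root ... /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf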

    A Word about Ubuntu

    Although Ubuntu has used systemd-resolved for four years now, it has not switched from nss-dns to nss-resolve, contrary to upstream recommendations. This means that on Ubuntu, glibc still reads /etc/resolv.conf, finds 127.0.0.53 listed there, and then makes an IP connection to systemd-resolved rather than talking to it via varlink or D-Bus, as occurs on Fedora. The practical effect is that, on Ubuntu, you can still manually edit /etc/resolv.conf and applications will respond to those changes, unlike Fedora. Of course, that would be a disaster, since it would cause all of your DNS configuration in systemd-resolved to be completely ignored. But it’s still possible on Ubuntu. On Fedora, that won’t work at all.

    If you’re using custom VPN software that doesn’t work with systemd-resolved, chances are it probably tries to write to /etc/resolv.conf.

    Split DNS with systemd-resolved

    OK, so now we’ve looked at how /etc/nsswitch.conf and /etc/resolv.conf have changed, but we haven’t actually explained how split DNS is configured. Instead of sending all your DNS requests to the first server listed in /etc/resolv.conf, systemd-resolved is able to split your DNS on the basis of DNS routing domains.

    IP Routing Domains, DNS Routing Domains, and DNS Search Domains: Oh My!

    systemd-resolved works with DNS routing domains and DNS search domains. A DNS routing domain determines only which DNS server your DNS query goes to.  It doesn’t determine where IP traffic goes to: that would be an IP routing domain. Normally, when people talk about “routing domains,” they probably mean IP routing domains, not DNS routing domains, so be careful not to confuse these two concepts. For the rest of this article, I will use “routing domain” or “DNS domain” to mean DNS routing domain.

    A DNS search domain is also different. When you query a name that is only a single label — a domain without any dots — a search domain gets appended to your query. For example, because I’m currently connected to my Red Hat VPN, I have a search domain configured for redhat.com. This means that if I make a query to a domain that is only a single label, redhat.com will be appended to the query. For example, I can query bugzilla and this will be treated as a query for bugzilla.redhat.com. This probably won’t work in your web browser, because web browsers like to convert single-label domains into web searches, but it does work at the DNS level.

    In systemd-resolved, each DNS routing domain may or may not be used as a search domain. By default, systemd-resolved will add search domains for every configured routing domain that is not prefixed by a tilde. For example, ~example.com is a routing domain only, while example.com is both a routing domain and a search domain. There is also a global routing domain,  ~.

    Example Split DNS Configurations

    Let’s look at a complex example with three network interfaces:

    $ resolvectl
    Global
    Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: stub
    
    Link 2 (enp4s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 
    Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
    Current DNS Server: 192.168.1.1 
    DNS Servers: 192.168.1.1 
    DNS Domain: lan
    
    Link 5 (tun0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 
    Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
    Current DNS Server: 10.8.0.1 
    DNS Servers: 10.8.0.1 
    DNS Domain: ~.
    
    Link 9 (tun1)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 
    Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
    Current DNS Server: 10.9.0.1 
    DNS Servers: 10.9.0.1 10.9.0.2
    DNS Domain: example.com

    To simplify this example, I’ve removed several uninteresting network interfaces from the output above: my unused second Ethernet interface, my unused Wi-Fi interface wlp5s0, and two virtual network interfaces that I presume are used by libvirt. This means we only have three interfaces to consider: normal Ethernet enp4s0, the privacy VPN tun0, and the corporate VPN tun1. I’m currently running NetworkManager 1.26.4, so I have also fudged the output a bit to make it look like it would if I were using NetworkManager 1.26.6 — I’ll discuss the difference below — so that this example will be good for the future. Let’s look at a few points of note:

    • enp4s0 is configured with +DefaultRoute and no routing domains.
    • tun0 is configured with +DefaultRoute and a global routing domain, ~.
    • tun1 is configured with -DefaultRoute and a routing domain for example.com. (It also has a search domain for example.com, because it doesn’t start with a tilde.)

    systemd-resolved first decides which network interface is most appropriate for your DNS query based on the domain name you are querying, then sends your query to the DNS server associated with that interface. In this case, queries for example.com, foo.example.com, etc. will be sent to 10.9.0.1, since that is the DNS server configured for tun1, which is associated with the domain example.com. All other requests go to 10.8.0.1, since tun0 has the global domain ~. Nothing ever goes to 192.168.1.1, because a privacy VPN is enabled, and that would be a privacy disaster. Very simple, right?

    If you do not use a privacy VPN, you will not have any ~. domain configured. In this case, your query will go to all interfaces that have +DefaultRoute. For example, if tun0 were removed from the above configuration, then queries not for example.com would be sent to 192.168.1.1, my router, which is good because tun1 is my corporate VPN and should only receive DNS queries corresponding to its own DNS domains.

    Enter NetworkManager

    How does systemd-resolved come up with the above configuration? It doesn’t. Everything I wrote in the previous section assumes that you are using NetworkManager, because systemd-resolved doesn’t actually make any decisions about where to send your DNS. That is all the responsibility of higher-level network management software, typically NetworkManager. If you use custom VPN software — anything that’s not a NetworkManager VPN plugin — then that software is also responsible for configuring systemd-resolved and playing nice with NetworkManager.

    NetworkManager normally does a very good job of configuring systemd-resolved to work as you would expect, so most users should not need to make any changes. But if your DNS isn’t working as you expect, and you run resolvectl and find that systemd-resolved’s configuration is not what you want, do not report a bug against systemd-resolved! Report a bug against NetworkManager instead (if you’re confident there is a real bug).

    If you don’t use NetworkManager, you can still make systemd-resolved do what you want, but you’re on your own. It will not configure itself for you.

    NetworkManager 1.26.6

    If you’re reading this in December 2020, you’re probably using NetworkManager 1.26.4 or earlier. Things are slightly different here, because NetworkManager recently landed a major behavior change. Previously, NetworkManager would always configure a ~. domain for exactly one network interface. This means that the value of systemd-resolved’s DefaultRoute settings was always ignored, since ~. takes precedence. Accordingly, NetworkManager did not bother to configure DefaultRoute at all. I told you that I fudged the output of the example above a little. In actuality, NetworkManager 1.26.4 has configured +DefaultRoute on my tun1 corporate VPN. That doesn’t make sense, because it should only receive DNS for example.com, but it previously did not matter, because there was previously always a ~. domain on some interface. If you’re not using any VPNs, then your Ethernet or Wi-Fi interface would receive the ~. domain. But since 1.26.6, NetworkManager now only ever configures a ~. domain when you are using a privacy VPN, so the DefaultRoute setting now matters.

    Prior to NetworkManager 1.26.6, you could rely on resolvectl domain alone to see where your DNS goes, because there was always a ~. domain. Since NetworkManager 1.26.6 no longer always creates a ~. domain, that no longer works. You’ll need to look at the full output of resolvectl instead, since that will show you the DefaultRoute settings, which are now important.

    My Corporate VPN is Missing a Routing Domain, What Should I Do?

    Say your corporate VPN is example.com. You want all requests for example.com to be resolved by the VPN, and they are, because NetworkManager creates an appropriate routing domain for it. But you also want requests for some other domain, say example.org, to be resolved by the VPN as well. What do you do?

    Most VPN protocols allow the VPN to tell NetworkManager which domains should be resolved by the VPN. Others allow specifying this in the connection profile that you import into NetworkManager. Sadly, not all VPNs actually do this properly, since it doesn’t matter for traditional non-split DNS. Worse, there is no graphical configuration in GNOME System Settings to fix this. There really should be. But for now, you’ll have to use nmcli to set the ipv4.dns-search and ipv6.dns-search properties of your VPN connection profile, as sketched below. Confusingly, even though that setting says “search,” it also creates a routing domain. Hopefully you never have to mess with this. If you do, consider contacting your IT department to ask them to fix your VPN configuration to properly declare its DNS routing domains, so you don’t have to fix it manually. (This actually sometimes works!) You might have to do this more than once, if you discover additional domains that need to be resolved by the corporate VPN.
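    As a minimal sketch, assuming a hypothetical connection profile named “Corporate VPN” and a missing domain example.org:

    # The + prefix appends to the list instead of replacing it
    $ nmcli connection modify "Corporate VPN" +ipv4.dns-search example.org
    # Reactivate the profile so the change takes effect
    $ nmcli connection up "Corporate VPN"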

    Custom VPN Software

    By “custom VPN software,” I mean any VPN that is not a NetworkManager plugin. That includes proprietary VPN applications offered by VPN services, and also packaged software like openvpn or wg-quick, when invoked by something other than NetworkManager.

    If your custom VPN software is broken, you could report a bug against your VPN software to ask for support for systemd-resolved, but it’s really best to ditch your custom software and configure your VPN using NetworkManager instead, if possible. There are really only two good reasons to use custom VPN software: if NetworkManager doesn’t have a plugin appropriate for your corporate VPN, or if you need to use Wireguard and your desktop doesn’t support Wireguard yet. (NetworkManager itself supports Wireguard, but GNOME does not yet, because Wireguard is special and not treated the same as other VPNs. Help welcome.)

    If you use NetworkManager to configure your VPN, as desktop developers intend for you to do, then NetworkManager will take care of configuring systemd-resolved appropriately. Fedora ships with several NetworkManager VPN plugins installed by default, so the vast majority of VPN users should be able to configure their VPN directly in System Settings. This also allows you to control your VPN using your desktop environment’s VPN integration, rather than using the command line or a custom proprietary application.

    OpenVPN users will want to look into using the unofficial update-systemd-resolved script. However, NetworkManager has good support for OpenVPN, and this is totally unnecessary if you configure your VPN with NetworkManager. So it’s probably better to use NetworkManager instead.

    Now, what if you maintain custom VPN software and want it to work properly with systemd-resolved, or what if you can’t use NetworkManager for whatever reason? First, stop trying to write to /etc/resolv.conf, at least if it’s managed by systemd-resolved. You’ll instead want to use the systemd-resolved D-Bus API to configure an appropriate routing domain for your VPN interface. Read this documentation. You could also shell out to resolvectl, but it’s probably better to use the D-Bus API unless your VPN is managed by a shell script. Privacy VPNs (or corporate VPNs that wish to eschew split DNS and hijack all the user’s DNS) can also use the resolvconf compatibility script, but note this will only work properly with NetworkManager 1.26.6 and newer, because the best you can do with it is add a global routing domain to a network interface, but that’s not going to work as expected if another network interface already has a global routing domain. Did I mention that you might want to use the D-Bus API instead? With the D-Bus API, you can remove the global routing domain from any other network interfaces, to ensure only your VPN’s interface gets a global routing domain.
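    For the shell script case, here’s a sketch using resolvectl, reusing the interface names and addresses from the example earlier in this post:

    # Corporate-style VPN: route only example.com queries to the VPN's DNS server
    $ resolvectl dns tun1 10.9.0.1
    $ resolvectl domain tun1 example.com

    # Privacy-style VPN: claim the global DNS routing domain instead
    $ resolvectl dns tun0 10.8.0.1
    $ resolvectl domain tun0 '~.'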

    Split DNS Without systemd-resolved

    Quick tangent: systemd-resolved is not the only software available that implements split DNS. Previously, the most popular solution for this was to use dnsmasq. This has always been available in Fedora, but you had to go out of your way to install and configure it, so almost nobody did. Other custom solutions were possible too — I know one developer who runs Unbound locally — but systemd-resolved and dnsmasq are the only options supported by NetworkManager.

    One significant difference between systemd-resolved and dnsmasq is that systemd-resolved, as a system daemon, allows for multiple sources of configuration. In contrast, NetworkManager runs dnsmasq as a subprocess, so only NetworkManager itself is allowed to configure dnsmasq. For most users, this distinction will not matter, but it’s important for custom VPN software.

    Servers and DNSSEC

    You might have noticed that the rest of this blog post focused pretty much exclusively on desktop use cases. Your server is probably not using a VPN. It’s probably not using mDNS. It’s probably not expected to be able to resolve local hostnames. Conclusion: most servers don’t need split DNS! Servers do benefit from systemd-resolved’s systemwide DNS cache, so running systemd-resolved on servers is still a good idea. But it’s not nearly as important for servers as it is for desktops.

    There are some disadvantages for servers as well. First, systemd-resolved is not intended to be used on DNS servers. If you’re running a DNS server, you’ll need to disable systemd-resolved before setting up BIND or Unbound instead. That is one extra step to get your DNS server working relative to before, so enabling systemd-resolved by default is an inconvenience here, but that’s hardly difficult to do, so not a big deal.
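    Roughly, that extra step looks like this (a sketch; point the static nameserver entry at wherever your new DNS server will listen):

    # Stop systemd-resolved and keep it from starting at boot
    $ sudo systemctl disable --now systemd-resolved.service
    # /etc/resolv.conf is a symlink to the stub resolver; replace it with a static file
    $ sudo rm /etc/resolv.conf
    $ echo 'nameserver 127.0.0.1' | sudo tee /etc/resolv.conf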

    However, systemd-resolved currently has several bugs in how it handles DNSSEC, and this is potentially a big deal if you depend on that. If you’re a desktop user, you’ll probably never notice, because DNSSEC on desktops is a total failure. Due to widespread and unfixable compatibility issues, it’s very unlikely that we would be able to enable DNSSEC validation by default in the next 10-15 years. If you have a desktop computer that never leaves your home and a good ISP, or a server sitting in a data center, then you can probably safely turn it on manually in /etc/systemd/resolved.conf, but this is highly inadvisable for laptops. So DNSSEC is currently useful for securing DNS between DNS servers, but not for securing DNS between your devices and your DNS server. (For that, we plan to use DNS over TLS instead.) And we’ve already established that DNS servers should not use systemd-resolved. So what’s the problem?

    Well, it turns out DNS servers are not the only server software that expects DNSSEC to work properly. In particular, broken DNSSEC can result in broken mail servers. Other stuff might break too. If you’re running a server that needs functional DNSSEC, you’re going to need to disable systemd-resolved for now. These problems with DNSSEC resulted in some extremely vocal opposition to the Fedora 33 systemd-resolved change proposal, which unfortunately we didn’t properly appreciate until too late in the Fedora 33 development cycle. The good news is that these problems are being treated as bugs to be fixed. In particular, I am keeping an eye on this bug and this bug. Development is currently very active, so I’m hopeful that systemd-resolved’s DNSSEC support will look much better in time for Fedora 34.
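    If you do decide to experiment with DNSSEC despite the caveats above, it’s a single setting in resolved.conf (DNSSEC=allow-downgrade is the more forgiving alternative value):

    # /etc/systemd/resolved.conf
    [Resolve]
    DNSSEC=yes

    # then restart the service:
    $ sudo systemctl restart systemd-resolved.service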

    Tell Me More!

    Wow, you made it to the end of a long blog post, and you still want to know more? Next step is to read my colleague Zbigniew’s Fedora Magazine article, which describes some of the concepts I’ve already mentioned in greater detail. (However, when reading that article, be aware of the NetworkManager 1.26.6 changes I mentioned above. The article predates NetworkManager 1.26.6, so you will see in the examples that a ~. global routing domain is assigned to non-VPN interfaces. That will no longer happen.)

    Conclusion

    Split DNS is designed to just work, like the rest of the modern Linux desktop, and it should for everyone not using custom VPN software. If you do run into trouble with custom VPN software, the bottom line is to try using a NetworkManager VPN plugin instead, if possible. In the short term, you will also need to disable systemd-resolved if you depend on DNSSEC, but hopefully that won’t be necessary for much longer. Everyone else should hopefully never notice that systemd-resolved is there.

    Happy resolving!

    fwupd 1.5.3

    Posted by Richard Hughes on December 08, 2020 10:48 AM

    Today we released fwupd 1.5.3 which has the usual smattering of fixes and enhancements you’d expect. One notable fix is that we now allow setting the GMainContext when used for sync methods, as some people reported problems with the way we implemented the sync libfwupd methods in previous 1.5.x releases. We’re now defaulting to the default thread context allowing the app to override if required, which seems to fix a lot of problems.

    We’ve also merged some support code to support PS/2 devices. This included adding the device firmware ID for serio class hardware. In general I’m happy to help vendors with patches that affect the core parts of fwupd (e.g. things inside ./libfwupdplugin or ./src) but plugins themselves should now either be written by the IHV or by a consulting company employed by the IHV, ODM or OEM. There are now dozens of companies adding support for new hardware all at the same time and although I’m happy to review code, I still can’t write it all :) There are lots of consulting companies to choose from now.

    We’ve also added back some compatibility code that allows apps running with old shared versions of libfwupd to talk to a new running fwupd daemon – which shouldn’t be possible, but then Snap decided to do just that and everything exploded.

    Aleksander also merged a few patches to fix firmware updates over QMI on some hardware and to validate firmware update method combinations on mobile broadband hardware.

    In the last release we also switched from libsoup to curl, but accidentally broke the RHEL build as it doesn’t quite have a new enough libcurl for us to use. There is now fallback code in place for these older versions.

    It's templates all the way down - part 3

    Posted by Peter Hutterer on December 04, 2020 04:00 AM

    In Part 1 I've shown you how to create your own distribution image using the freedesktop.org CI templates. In Part 2, I've shown you how to truly build nested images. In this part, I'll talk about the ci-fairy tool that is part of the same repository of ci-templates.

    When you're building a CI pipeline, there are some tasks that most projects need in some way or another. The ci-fairy tool is a grab-bag of solutions for these. Some of those solutions are for a pipeline itself, others are for running locally. So let's go through the various commands available.

    Using ci-fairy in a pipeline

    It's as simple as including the template in your .gitlab-ci.yml file.


    include:
      - 'https://gitlab.freedesktop.org/freedesktop/ci-templates/-/raw/master/templates/ci-fairy.yml'
    Of course, if you want to track a specific sha instead of following master, just sub that sha there. freedesktop.org projects can include ci-fairy like this:

    include:
      - project: 'freedesktop/ci-templates'
        ref: master
        file: '/templates/ci-fairy.yml'
    Once that's done, you have access to a .fdo.ci-fairy job template that you can extends from. This will download an image from quay.io that is capable of git, python, bash and obviously ci-fairy. This image is a fixed one and referenced by a unique sha so even if where we keep working on ci-fairy upstream you should never see regression, updating requires you to explicitly update the sha of the included ci-fairy template. Obviously, if you're using master like above you'll always get the latest.

    Due to how the ci-templates work, it’s good to set the FDO_UPSTREAM_REPO variable to the upstream project name. This means ci-fairy will be able to find the equivalent origin/master branch where that’s not available in the merge request. Note, this is not your personal fork but the upstream one, e.g. "freedesktop/ci-templates" if you are working on ci-templates itself.
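    In your .gitlab-ci.yml that’s just a top-level variable, for example (a minimal sketch, reusing the project name from above):

    variables:
      FDO_UPSTREAM_REPO: 'freedesktop/ci-templates'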

    Checking commit messages

    ci-fairy has a command to check commits for a few basic expectations in commit messages. This currently includes things like enforcing an 80 char subject line limit, that there is an empty line after the subject line, that no fixup or squash commits are in the history, etc. If you have complex requirements you need to write your own, but for most projects this job ensures that there are no obvious errors in the git commit log:


    check-commit:
      extends:
        - .fdo.ci-fairy
      script:
        - ci-fairy check-commits --signed-off-by
      except:
        - master@upstream/project
    Since you don't ever want this to fail on an already merged commit, exclude this job the master branch of the upstream project - the MRs should've caught this already anyway.

    Checking merge requests

    To rebase a contributor’s merge request, the contributor must tick the checkbox to Allow commits from members who can merge to the target branch. The default value is off, which is frustrating (gitlab is working on it though) and causes unnecessary delays in processing merge requests. ci-fairy has a command to check for this value on an MR and fail if it isn’t set - contributors ideally pay attention to the pipeline and fix this accordingly.


    check-merge-request:
      extends:
        - .fdo.ci-fairy
      script:
        - ci-fairy check-merge-request --require-allow-collaboration
      allow_failure: true
    As a tip: run this job towards the end of the pipeline to give collaborators a chance to file an MR before this job fails.

    Using ci-fairy locally

    The two examples above are the most useful ones for CI pipelines, but ci-fairy also has some useful local commands. For that you'll have to install it, but that's as simple as


    $ pip3 install git+http://gitlab.freedesktop.org/freedesktop/ci-templates
    A big focus of ci-fairy’s local commands is that they should, usually, be able to work without any specific configuration if you run them in the repository itself.

    Linting

    Just hacked on the CI config?


    $ ci-fairy lint
    and done, you get the same error back that the online linter for your project would return.

    Pipeline checks

    Just pushed to the repo?


    $ ci-fairy wait-for-pipeline
    Pipeline https://gitlab.freedesktop.org/username/project/-/pipelines/238586
    status: success | 7/7 | created: 0 | pending: 0 | running: 0 | failed: 0 | success: 7 ....
    The command is self-explanatory, I think.

    Summary

    There are a few other parts to ci-fairy, including templating and even minio handling. I recommend looking at e.g. the libinput CI pipeline, which uses much of ci-fairy’s functionality. And check the online documentation for ci-fairy; who knows, there may be something useful in there for you.

    The useful contribution of ci-fairy is primarily that it tries to detect the settings for each project automatically, regardless of whether it's run inside a MR pipeline or just as part of a normal pipeline. So the same commands will work without custom configuration on a per-project basis. And for many things it works without API tokens, so the setup costs are just the pip install.

    If you have recurring jobs, let us know, we're always looking to add more useful functionality to this little tool.

    Accurate Conclusions from Bogus Data: Methodological Issues in “Collaboration in the open-source arena: The WebKit case”

    Posted by Michael Catanzaro on November 29, 2020 11:46 PM

    Nearly five years ago, when I was in grad school, I stumbled across the paper Collaboration in the open-source arena: The WebKit case when trying to figure out what I would do for a course project in network theory (i.e. graph theory, not computer networking; I’ll use the words “graph” and “network” interchangeably). The paper evaluates collaboration networks, which are graphs where collaborators are represented by nodes and relationships between collaborators are represented by edges. Our professor had used collaboration networks as examples during lecture, so it seemed at least mildly relevant to our class, and I wound up writing a critique on this paper for the class project. In this paper, the authors construct collaboration networks for WebKit by examining the project’s changelog files to define relationships between developers. They perform “community detection” to visually group developers who work closely together into separate clusters in the graphs. Then, the authors use those graphs to arrive at various conclusions about WebKit (e.g. “[e]ven if Samsung and Apple are involved in expensive patent wars in the courts and stopped collaborating on hardware components, their contributions remained strong and central within the WebKit open source project,” regarding the period from 2008 to 2013).

    At the time, I contacted the authors to let them know about some serious problems I found with their work. Then I left the paper sitting in a short-term to-do pile on my desk, where it has been sitting since Obama was president, waiting for me to finally write this blog post. Unfortunately, nearly five years later, the authors’ email addresses no longer work, which is not very surprising after so long — since I’m no longer a student, the email I originally used to contact them doesn’t work anymore either — so I was unable to contact them again to let them know that I was finally going to publish this blog post. Anyway, suffice to say that the conclusions of the paper were all correct; however, the networks used to arrive at those conclusions suffered from three different mistakes, each of which was, on its own, serious enough to invalidate the entire work.

    So if the analysis of the networks was bogus, how did the authors arrive at correct conclusions anyway? The answer is confirmation bias. The study was performed by visually looking at networks and then coming to non-rigorous conclusions about the networks, and by researching the WebKit community to learn what is going on with the major companies involved in the project. The authors arrived at correct conclusions because they did a good job at the latter, then saw what they wanted to see in the graphs.

    I don’t want to be too harsh on the authors of this paper, though, because they decided to publish their raw data and methodology on the internet. They even published the python scripts they used to convert WebKit changelogs into collaboration graphs. Had they not done so, there is no way I would have noticed the third (and most important) mistake that I’ll discuss below, and I wouldn’t have been able to confirm my suspicions about the second mistake. You would not be reading this right now, and likely nobody would ever have realized the problems with the paper. The authors of most scientific papers are not nearly so transparent: many researchers today consider their source code and raw data to be either proprietary secrets to be guarded, or simply not important enough to merit publication. The authors of this paper deserve to be commended, not penalized, for their openness. Mistakes are normal in research papers, and open data is by far the best way for us to be able to detect mistakes when they happen.

    <figure aria-describedby="caption-attachment-9269" class="wp-caption aligncenter" id="attachment_9269" style="width: 684px">Collaboration network<figcaption class="wp-caption-text" id="caption-attachment-9269">A collaboration network from the paper. The paper reports that this network represents collaboration between September 2008 (when Google began contributing to WebKit) and February 2011 (the departure of Nokia from the project). Because the authors posted their data online, I noticed that this was a mistake in the paper: the graph actually represents the period between February 2011 and July 2012. The paper’s analysis of this graph is therefore questionable, but note this was only a minor mistake compared to the three major mistakes that impact this network. Note the suspiciously-high number of unaffiliated (“Other”) contributors in a corporate-dominated project.</figcaption></figure>

    The rest of this blog post is a simplified version of my original school paper from 2016. I’ve removed maybe half the original content, including some flowery academic language and unnecessary references to class material (e.g. “community detection was performed using fast modularity maximization to generate an alternate visualization of the network,” good for high scores on class papers, not so good for blog posts). But rewriting everything to be informal takes a long time, and I want to finish this today so it’s not still on my desk five more years from now, so the rest of this blog post is still going to be much more formal than normal. Oh well. Tone shift now!

    We (“we” means “I”) examine various methodological issues discovered by analyzing the paper. The first section discusses the effects on the collaboration network of choosing a poor definition of collaboration. The second section discusses a major source of error in detecting the company affiliation of many contributors. The third section describes a serious mistake in the data collection process. Each of these issues is quite severe, and any one alone calls into question the validity of the entire study. It must be noted that such issues are not necessarily unique to this paper, and must be kept in mind for all future studies that utilize collaboration networks.

    Mistake #1: Poorly-defined Collaboration

    The precise definition used to model collaboration has tremendous impact on the usefulness of the resultant collaboration network. Many collaboration networks are built using definitions of collaboration that are self-evidently useful, where there is little doubt that edges in the network represent real-world collaboration. The paper adopts an approach to building collaboration networks where developers are represented by nodes, and an edge exists between two nodes if the corresponding developers modified the same source code file during the time period under consideration for the construction of the network. However, it is not clear that this definition of collaboration is actually useful. Consider that it is a regular occurrence for developers who do not know each other and may have never communicated to modify the same files. Consider also that modifying a common file does not necessarily reflect any shared interest in a particular portion of the software project. For instance, a file might be modified when making an interface change in another file, or when fixing a build error occurring on a particular platform. Such occurrences are, in fact, extremely common in the WebKit project. Additionally, consider that there exist particular source code files that are unusually central to the project, and must be modified more frequently than other files. It is highly likely that almost all developers will at one point or another make some change in such a file, and therefore be connected via a collaboration edge to all other developers who have ever modified that file. (My original critique shows a screenshot of the revision history of WebPageProxy.cpp, to demonstrate that the developers modifying this file were working on unrelated projects.)

    It is true, as assumed by the paper, that particular developers work on different portions of the WebKit source code, and collaborate more with particular other developers. For instance, developers who work for the same company typically, though not always, collaborate most with other developers from that same company. However, the paper’s naive definition of collaboration should ensure that most developers will be considered to have collaborated equally with most other developers, regardless of the actual degree of collaboration. For instance, consider developers A and B who regularly collaborate on a particular source file. Now, developer C, who works on a platform that does not use this file and would not ordinarily need to modify it, makes a change to some cross-platform interface in another file that requires updating this file. Developer C is now considered to have collaborated with developers A and B on this file! Clearly, this is not a desirable result, as developers A and B have collaborated far more on the development of the file. Moreover, consider that an edge exists between two developers in the collaboration network if they have ever both modified any file anywhere in WebKit during the time period under review; then we can expect to form a network that is almost complete (a “full” graph where edges exists between most nodes). It is evident that some method of weighting collaboration between different contributors would be desirable, as the unweighted collaboration network does not seem useful.

    One might argue that the networks presented in the paper clearly show developers exist in subcommunities on the peripheries of the network, that the network is clearly not complete, and that therefore this definition of collaboration sufficed, at least to some extent. However, this is only due to another methodological error in the study. Mistake #3, discussed later, explains how the study managed to produce collaboration networks with noticeable subcommunities despite these issues.

    We note that the authors chose this same definition of collaboration in their more recent work on OpenStack, so there exist multiple studies using this same flawed definition of collaboration. We speculate that this definition of collaboration is unlikely to be more suitable for OpenStack or for other software projects than it is for WebKit. The software engineering research community must explore alternative models of collaboration when undertaking future studies of software development collaboration networks in order to more accurately reflect collaboration.

    Mistake #2: Misdetected Contributor Affiliation

    One difficulty when building collaboration networks is the need to correctly match each contributor with the correct company affiliation. Although many free software projects are dominated by unaffiliated contributors, others, like WebKit, are primarily developed by paid contributors. Looking at the number of times a particular email domain appears in WebKit changelog entries made during 2015, most contributors commit using corporate emails, but many developers commit to WebKit using personal email accounts, such as Gmail accounts; additionally, many developers use generic webkit.org email aliases, which were previously available to active WebKit contributors. These developers may or may not be affiliated with companies that contribute to the project. Use of personal email addresses is a source of inaccuracy when constructing collaboration networks, as it results in an undercount of corporate contributions.  We can expect this issue has led to serious inaccuracies in the reported collaboration networks.

    This substantial source of error is neither mentioned nor accounted for; all contributors using such email accounts were therefore miscategorized as unaffiliated. However, the authors clearly recognized this issue, as it has been accounted for in their more recent work covering OpenStack by cross-referencing email addresses from git revision history with a database containing corporate affiliations maintained by the OpenStack Foundation. Unfortunately, no such effort was made for the WebKit data set.

    The WebKit project was previously dominated by contributors with chromium.org email domains. This domain is equivalent to webkit.org in that it can be used by contributors to the Chromium project regardless of corporate affiliation; however, most contributors with Chromium emails are actually Google employees. The high use of Chromium emails by Google employees appears to have led to a dramatic — by roughly an entire order of magnitude — undercount of Google’s contributors to the WebKit project, as only contributors with google.com emails were considered to be Google employees. The vast majority of Google employees used chromium.org emails, and so were counted as unaffiliated developers. This explains the extraordinarily high number of unaffiliated developers in the networks presented by the paper, despite the fact that WebKit development is, in reality, dominated by corporate contributors.

    Mistake #3: Missing Most Changelog Data

    The paper incorrectly claims to have gathered its data from both WebKit’s Subversion revision history and from its changelog files. We must draw a distinction between changelog entries and Subversion revision history. Changelog entries are inserted into changelog files that are committed into the Subversion repository; they are completely separate from the Subversion history. Each subproject within the WebKit project has its own set of changelog files used to record changes under the corresponding directory.

    In fact, the paper processed only the changelog files. This was actually a good choice, as WebKit’s changelog files are much more accurate than the Subversion history, for two reasons. Firstly, it is easy for a contributor to change the email address entered into a changelog file, e.g. after a change in company affiliation. However, it is difficult to change the email address used to commit to Subversion, as this requires requesting a new Subversion account from the Subversion administrator; accordingly, contributors are more likely to use older email addresses, lacking accurate company affiliation, in Subversion revisions than in changelog files.  Secondly, many Subversion revisions are not directly committed by contributors, but rather are actually committed by the commit queue bot, which runs various tests before committing the revision. Subversion revisions are also, more rarely, committed by a completely different contributor than the patch author. In both cases, the proper contributor’s name will appear in only the changelog file, and not the Subversion data. Some developers are dramatically more likely to use the commit queue than others. Various other reviews of WebKit contribution history that examine data from Subversion history rather than from changelog files are flawed for this reason. Fortunately, by relying on changelog files rather than Subversion metadata, the authors avoid this problem.

    Unfortunately, a serious error was made in processing the changelog data. WebKit has many different sets of changelog files, stored in various project subdirectories (JavaScriptCore, WebCore, WebKit, etc.), as well as toplevel changelogs stored in the root directory of the project. Regrettably, the authors were unaware of the changelogs in subdirectories, and based their analysis only on the toplevel changelogs, which contain only changes that occurred in subdirectories that lack their own changelog files. In practice, this inadvertently restricted the scope of the analysis to a very small minority of changes, primarily to build system files, manual tests, and the WebKit website. That is, the reported collaboration networks do not reflect collaboration on any actual source code files. All source code files are contained in subdirectories with their own changelog files, and therefore no source code files were actually considered in the analysis of collaboration on source code changes.
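
    To make the scale of this omission concrete, here is a rough sketch of how one could compare the volume of toplevel versus per-subproject changelog entries in a WebKit checkout (the paths and the entry-header pattern are illustrative, based on the 2015-era repository layout):

        # Count changelog entry headers (date lines like "2015-03-25  Name  <email>")
        # in the toplevel changelog and a few per-subproject changelogs.
        for f in ChangeLog Source/WebCore/ChangeLog Source/JavaScriptCore/ChangeLog Tools/ChangeLog; do
            [ -f "$f" ] && printf '%s: %s entries\n' "$f" "$(grep -Ec '^2[0-9]{3}-[0-9]{2}-[0-9]{2}' "$f")"
        done

    Even a rough count along these lines shows the toplevel changelog covering only a small minority of the project’s changes.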

    We speculate that the analysis’s focus on build system files likely exaggerates the effects of clustering in the network, as different companies used different build systems and thus were less likely to edit the build systems used by other companies, and that an analysis based on the correct data would display less of a clustering effect. Certainly, there would be dramatically more edges in the already-dense networks, because an edge exists between two developers if there exists any one file in WebKit that both developers have modified. Omitting all of the source code files from the analysis therefore dramatically reduces the likelihood of edges existing between nodes in the network.

    Conclusion

    We found that the original study was impacted by an unsuitable definition of collaboration used to build the collaboration networks, severe errors in counting contributor affiliation (including the classification of most Google employees as unaffiliated developers), and the omission of almost all the required data from the analysis, including all data on modifications to source code files. The authors constructed and studied essentially meaningless networks. Nevertheless, the authors were able to derive many accurate conclusions about the WebKit project from their inaccurate collaboration networks. Such conclusions illustrate the dangers of seeking to find particular meanings or explanations through visual inspection of collaboration networks. Researchers must work forwards from the collaboration networks to arrive at their conclusions, rather than backwards by attempting to match the networks to conclusions gained from prior knowledge.

    Original Report

    Wow, OK, you actually read this far? Since this blog post criticizes an academic paper, and since this blog post does not include various tables and examples that support my arguments, I’ve attached my original analysis in full. It is a boring, student-quality grad school project written with the objective of scoring the highest-possible grade in a class rather than for clarity, and you probably don’t want to look at it unless you are investigating the paper in detail. (If you download that, note that I no longer work for Igalia, and the paper was not authorized by Igalia either; I used my company email to disclose my affiliation and maybe impress my professor a little.) Phew, now I can finally remove this from my desk!

    fwupd 1.5.2

    Posted by Richard Hughes on November 23, 2020 04:36 PM

    The last few posts I did about fwupd releases were very popular, so I’ll do the same thing again: I’ve just tagged fwupd 1.5.2. This release changes a few things:

  • Add a build time flag to indicate if packages are supported – this would be set for “traditional” package builds done by the distro, and unset by things like the Fedora COPR build, the Flatpak or Snap bundles. There are too many people expecting that the daily snap or flatpak packages represent the “official fwupd” and we wanted to make it clear to people using these snapshots that we’ve done basically no QA on the snapshots.
  • A plugin for the Pinebook Pro laptop has been added, although it needs further work from PINE64 before it will work correctly. At the moment there’s no way of getting the touchpad version, or finding out which keyboard layout is installed so we can tag the correct firmware file. It’s nearly there and is still very useful for playing with the hardware on the PB Pro.
  • Components can now set their icon from the LVFS metadata, if supported by the fwupd plugin. This allows us to tag “generic” ESRT devices as things like EC devices, or, ahem, batteries.
  • I’ve been asked by a few teams, including the Red Hat Edge team, the CoreOS team and also by Google to switch from libsoup to libcurl for downloading data – as this reduces the image size by over 5MB. Even NetworkManager depends on libcurl now, and this seemed like a sensible thing to do given fwupd is now being used in so many different places.
  • Fall back to FAT32 internal partitions for detecting ESP, as some users were complaining that fwupd did not properly detect their ESP that didn’t have the correct partition GUID set. Although I think fixing the GUID is the right thing to do, the system firmware also falls back, and pragmatically so should we.
  • Fix detection of ColorHug version on older firmware versions, which was slightly embarrassing as ColorHug is one of the devices in the device regression tests, but we were not testing an old enough firmware version to detect this bug.
  • Fix reading BCM57XX vendor and device ids from firmware – firmware for the Talos II machine is already on the LVFS and can replace the non-free firmware there in almost all situations now.
  • For this release we had to improve synaptics-mst reliability when writing data, fixing occasional failures seen when installing firmware onto a common dock model. A 200ms delay makes the difference between success and failure; although not strictly required, it seemed pragmatic to add.
  • Fix replugging the MSP430 device which was the last device that was failing a specific ODM QA. This allows us to release a ton of dock firmware on the LVFS.
  • Fix a deadlock seen when calling libfwupd from QT programs. This was because we were calling a sync method from threads without a context, which we’ve now added.
  • In 1.5.0 we switched to the async libfwupd by default, and accidentally dropped the logic to only download the remote metadata as required. Most users only need to download the tiny .jcat file every day, and the much larger .xml.gz is only downloaded if the signature has changed in the last 24h. Of course, it’s all hitting the CDN, but it’s not nice to waste bandwidth for no reason.
  • As Snap is bundling libfwupd with gnome-software now, we had to restore recognizing GPG and PKCS7 signature types. This allows a new libfwupd to talk to an old fwupd daemon which is something we’d not expected before.
  • We’re also now setting the SMBIOS chassis type to portable if a DeviceTree battery exists, although I’d much rather see a ChassisType in the DT specification one day. This allows us to support HSI on platforms like the PineBook Pro, although the number of tests is still minimal without more buy-in from ARM.
  • We removed the HSI update and attestation suffixes; we decided they complicated the HSI specification and didn’t really fit in. Most users won’t even care and the spec is explicitly WIP so expect further changes like this in the future.
  • If you’re running 1.5.0 or 1.5.1 you probably want to update to this release now, as it fixes a hard-to-debug hang we introduced in 1.5.0; the update commands are sketched below. If you’re running 1.4.x you might want to let the libcurl changes settle, although we’ve been using it without issue for more than a week on a ton of hardware here. Expect 1.5.3 in a few weeks time, assuming we’re all still alive by then. :)
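
    On Fedora, updating and checking which version you ended up with is a one-liner each (a sketch; the update will also arrive through the normal update channels):

        sudo dnf upgrade --refresh fwupd
        fwupdmgr --version    # prints the client and daemon versions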

    Refreshed UI for Fedora Media Writer

    Posted by Jan Grulich on November 13, 2020 10:06 AM

    For those who don’t know, Fedora Media Writer is a tool to create a bootable live USB drive with your favorite flavor of Fedora. It is written in C++ with the UI written in QML, and it is supported on Linux, Windows and Mac OS X. It was developed by Martin Bříza, my former colleague from Red Hat, who did an amazing job in the past. Fedora Media Writer (FMW) primarily targets Fedora Workstation and therefore the UI looks like a GNOME app using the Adwaita theme. Unfortunately the Adwaita theme changed over time, and FMW was originally written using QtQuickControls 1 (deprecated these days), so it needed a UI overhaul.

    I started working on FMW during the summer, slowly migrating it to QtQuickControls 2. The original UI had lots of custom QML widgets, basically standard widgets with an Adwaita skin on them. I still wanted FMW to use the Adwaita theme, because Qt doesn’t have any native QML components for Windows, Mac OS X or GNOME, and writing those would require lots of work. I therefore decided to write a new QQC2-based Adwaita theme which can be used on all platforms.

    To avoid duplicating half of the code we already have in Adwaita-qt (a QStyle to make QWidgets look like Adwaita), such as information about widget sizes and colors, I reworked Adwaita-qt to provide a library, so it can be used by projects like this one and they don’t need to be updated every time Adwaita changes. It was more work than I anticipated, because it needed quite a lot of changes to separate things into a library and also to make it build and work on all platforms where I want to use FMW. The good news is that the work is now done and I have made a pre-release of Adwaita-qt. For now the library provides information about widget sizes, the color palette and the colors used by all widgets, but I plan to extend it in the future with the Adwaita-qt rendering part, allowing the library to render basic widgets for you. That’s something I would like to use for example in QGnomePlatform (the GNOME platform theme) to render buttons in window decorations.

    The work on the QQC2 Adwaita theme itself was an interesting experience and probably the most enjoyable part: every time you write a new component and port the app to use it, you see the result of your work, and watching the app slowly migrate towards a more modern UI makes you happy with the result. There is not much more to say about the theme, as it’s basically a QML variant of the widgets we have in Adwaita-qt, with the difference that it should look exactly the same on all platforms thanks to using Adwaita-qt; in the past, with QQC1, all the colors were derived from the system QPalette, making it look slightly different on each platform. If you wonder why the QQC2 theme is not part of Adwaita-qt, where it will most likely end up, it’s because it’s not complete yet and contains only the components used in FMW itself. Anyway, I finished the port to QQC2 this week with some late fixes, then spent a week updating all the build systems (Windows, GitHub CI, Mac OS X) to properly build and produce builds for you to test, and made a new pre-release yesterday \o/

    The work on this port is most likely not 100% finished, as I expect some minor issues to appear here and there, but I tried to make this a 1:1 copy of the previous version, so don’t expect any major changes. I will be glad if you try it and let me know what you think. Thank you, and an especially big thanks goes to Martin Bříza for his help during the development and for the work he did on this project in the past.

    You can get it from the following locations:

    Here are some images for comparison:

    [Screenshots comparing the previous and the refreshed UI]

    New fwupd 1.5.1 release

    Posted by Richard Hughes on November 02, 2020 03:46 PM

    Hot on the heels of 1.5.0, I’ve just tagged and uploaded fwupd 1.5.1. Most importantly, it fixes the regression we recently introduced for an as-yet-unnamed OEM who wants to ship dock firmware. Any day now, I promise.

    Other interesting things we fixed:

    • Delete unused EFI variables when deploying firmware — which frees up a lot of space if you’ve ever enabled the fwupdx64.efi debugging…
    • Fix duplicate probe warning for the Logitech Unifying device — which was really cosmetic, but wasting resources is never nice.
    • Include the amount of NVRAM in use in the LVFS failure report — which might let us explain some of the dbx updates failing.
    • Make bcm57xx hotplug more reliable — although it’s uncommon to hotplug PCI devices, when using an eGPU enclosure (like I do for the device tests!) this needs to work!
    • Recognize authorized ThunderBolt value of 2 — which we found in the wild recently.
    • Remove the duplicate parent-child data in FwupdDevice and FuDevice — although not strictly a bugfix, duplicating this data made no sense and caused confusion.
    • Use UDisks to find out if swap files and devices are encrypted — which further adds more code depending on UDisks. I’ve added a Recommends: udisks2 in the Fedora package, but see the wiki if you’re running a minimal system.

    As before, Fedora 33 and 32 updates in the usual places.

    on abandoning the X server

    Posted by Adam Jackson on October 28, 2020 03:01 PM

    There's been some recent discussion about whether the X server is abandonware. As the person arguably most responsible for its care and feeding over the last 15 years or so, I feel like I have something to say about that.

    The thing about being the maintainer of a public-facing project for nearly the whole of your professional career is it's difficult to separate your own story from the project. So I'm not going to try to be dispassionate, here. I started working on X precisely because free software had given me options and capabilities that really matter, and I feel privileged to be able to give that back. I can't talk about that without caring about it.

    So here's the thing: X works extremely well for what it is, but what it is is deeply flawed. There's no shame in that, it's 33 years old and still relevant, I wish more software worked so well on that kind of timeframe. But using it to drive your display hardware and multiplex your input devices is choosing to make your life worse.

    It is, however, uniquely well suited to a very long life as an application compatibility layer. Though the code happens to implement an unfortunate specification, the code itself is quite well structured, easy to hack on, and not far off from being easily embeddable.

    The issue, then, is how to get there. And I don't have any real desire to get there while still pretending that the xfree86 hardware-backed server code is a real thing. Sorry, I guess, but I've worked on xfree86-derived servers for very nearly as long as XFree86-the-project existed, and I am completely burnt out on that on its own merits, let alone doing that and also being release manager and reviewer of last resort. You can only apply so much thrust to the pig before you question why you're trying to make it fly at all.

    So, is Xorg abandoned? To the extent that that means using it to actually control the display, and not just keep X apps running, I'd say yes. But xserver is more than xfree86. Xwayland, Xwin, Xephyr, Xvnc, Xvfb: these are projects with real value that we should not give up. A better way to say it is that we can finally abandon xfree86.

    And if that sounds like a world you'd like to see, please, come talk to us, let's make it happen. I'd be absolutely thrilled to see someone take this on, and I'm happy to be your guide through the server internals.

    Sandboxing inside the sandbox: No rogue thumbnailers inside Flatpak

    Posted by Bastien Nocera on October 28, 2020 11:20 AM

     A couple of years ago, we sandboxed thumbnailers using bubblewrap to avoid drive-by downloads taking advantage of thumbnailers with security issues.

     It's a great tool, and it's a tool that Flatpak relies upon to create its own sandboxes. But that also meant that we couldn't use it inside the Flatpak sandboxes themselves, and those aren't always as closed as they could be, to support legacy applications.

     We've finally implemented support for sandboxing thumbnailers within Flatpak, using the Spawn D-Bus interface (indirectly).

    This should all land in GNOME 40, though it should already be possible to integrate it into your Flatpaks. Make sure to use the latest gnome-desktop development version, and that the flatpak-spawn utility is new enough in the runtime you're targeting (it's been updated in the freedesktop.org runtimes #1, #2, #3, but it takes time to trickle down to GNOME versions). Example JSON snippets:

        {
            "name": "flatpak-xdg-utils",
            "buildsystem": "meson",
            "sources": [
                {
                    "type": "git",
                    "url": "https://github.com/flatpak/flatpak-xdg-utils.git",
                    "tag": "1.0.4"
                }
            ]
        },
        {
            "name": "gnome-desktop",
            "buildsystem": "meson",
            "config-opts": ["-Ddebug_tools=true", "-Dudev=disabled"],
            "sources": [
                {
                    "type": "git",
                    "url": "https://gitlab.gnome.org/GNOME/gnome-desktop.git"
                }
            ]
        }

    (We also sped up GStreamer-based thumbnailers by allowing them to use a cache, and added profiling information to the thumbnail test tools, which could prove useful if you want to investigate performance or bugs in that area)

    Edit: correct a link, thanks to the commenters for the notice

    New fwupd 1.5.0 release

    Posted by Richard Hughes on October 26, 2020 01:04 PM

    Today we tagged the 1.5.0 release of fwupd. Quite a bit has changed since the last release and I figured a blog post probably made sense to explain things.

    From a firmware engineer’s point of view, the most useful addition is the ability to build composite images, for instance building a firmware.dfuse file from different A.dfu and B.dfu images. At the moment there are commands in fwupdtool to convert one file format to another, but not to merge or alter them. Many firmware files are really just containers which can store multiple images, each with an optional id, index and address. This new fwupd feature also allows us to create very small complicated container binaries for fuzzing.

    This can be used by writing a `firmware.builder.xml` file like:

     <?xml version="1.0" encoding="UTF-8"?>
     <firmware gtype="FuBcm57xxFirmware">
       <version>1.2.3</version>
       <image>
         <version>4.5.6</version>
         <id>header</id>
         <idx>456</idx>
         <addr>0x456</addr>
         <filename>header.bin</filename>
       </image>
       <image>
         <version>7.8.9</version>
         <id>payload</id>
         <data>aGVsbG8=</data>
       </image>
     </firmware>
    

    …and then using something like fwupdtool firmware-convert firmware.builder.xml firmware.dfu builder dfu on the CLI.

    Notably, each subclass of FuFirmware (for instance FuBcm57xxFirmware) can define properties it expects in the XML so it can really be quite expressive and useful.

    From the developer point of view, the most interesting additions are the async API to libfwupd and also the addition of FwupdPlugin so we can convey enumerated system errors to the end user. This means we can finally stop the workaround of building “dummy devices” with the update error set for a generic plugin failure, e.g. efivarfs not being mounted. Expect updates to GNOME Software and GNOME Firmware to support both when all this hits Fedora stable.

    From the end user point of view, we have lots of new devices supported, including:

    • Goodix fingerprint sensors
    • Elan Touchpads
    • ChromeOS Quiche and Gingerbread
    • Broadcom BCM5719 network adapter

    The latter is the most interesting, as the BCM5719 has two branches of firmware; one from Broadcom and one free software re-implementation from meklort. Once both versions of the firmware have been uploaded to the LVFS, the user can simply type fwupdmgr switch-branch to switch from the proprietary firmware to the free software one, or back again, as sketched below. We’re hoping to use this in other places in the future, for instance EDK2 to Coreboot on platforms without BootGuard enabled.
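
    The switch itself is a single interactive command; a minimal sketch (fwupdmgr lists the devices that have alternate firmware branches and asks which one to install):

        fwupdmgr switch-branch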

    Which brings me nicely to the Host Security ID. We’re not officially launching the HSI specification yet, as we’re waiting to hear back from various silicon vendors about how (and if) they can support the new initiative. HSI is something that might be hugely interesting to users for whom platform security is important, and especially to people specifying and purchasing hardware for a specific purpose. Although fwupdmgr security is now available, you’ll need to use --force as it’s not officially an API stable “thing” yet. If you do play with HSI, be sure to upload results to the LVFS if you can, and then we’ll know if the various plugins are working as designed. I’m sure we’ll be talking more about HSI in the future…
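
    If you want to try it today, this is all it takes (the --force flag simply acknowledges that the results are not yet API stable):

        fwupdmgr security --force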

    We’ve also done some work for teams building fwupd into products we never imagined; for CoreOS the ModemManager and flashrom plugins are split off as sub-packages so that we don’t drag lots of extra deps onto the minimal image. We’ve also made PolicyKit optional at build time as it doesn’t make sense on super-embedded devices, although you’re limited to only installing signed firmware. For the server SSH-only case we’re also using pkttyagent to request user passwords if running without GUI.

    Finally, it was a ton of work testing and fixing timing bugs for composite devices found in various laptop docks, so the people waiting for those updates probably want to update to 1.5.0 too. Updates are already on the LVFS and will be available soon. You know who you are.

    As usual, tarball releases are in the normal place and are available as a Fedora 33 update and Fedora 32 update too. Please let us know if you have any problems with 1.5.0 in the issue tracker.

    Firefox on Fedora with OpenH264

    Posted by Martin Stransky on September 30, 2020 08:12 PM

    Firefox on Fedora, which sits in the updates [F32][F31] right now, comes with the Cisco OpenH264 decoder enabled for video playback and fdk-aac-free used for audio decoding.

    It’s implemented with the GMP (Gecko Media Plugin) API, so OpenH264 is not used through the ffmpeg library but through Firefox’s sandboxed plugin interface, the same one Firefox uses for the Widevine CDM plugin.

    The OpenH264 GMP video playback is a fallback solution for when the system ffmpeg is missing and the internal ffvpx library can’t decode the stream, so ffmpeg from RPM Fusion is always a better alternative if you can install it.

    The video streams are decoded by the system-wide OpenH264 2.1.1, which is shipped by Fedora as the mozilla-openh264 rpm package. Even if the Mozilla OpenH264 (1.8.1) plugin is installed in your profile and listed on the about:plugins page, the Fedora system one is used.

    You can also remove the old plugin from your profile (see Bug 1648024 for details) and with the updated Firefox packages ([F32][F31]) it won’t be installed again.

    OpenH264 plugin utilization, version and playback parameters can be confirmed from the GMP log; just run Firefox from a terminal as:

    MOZ_LOG="GMP:5" firefox

    and look for OpenH264 version / path and so on.
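
    The GMP log is fairly chatty; a quick way to pick out just the OpenH264 lines is to pipe it through grep (a sketch; the exact log text may differ between versions):

        MOZ_LOG="GMP:5" firefox 2>&1 | grep -i openh264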

    Also, if you find any bug during video playback (and you definitely will find one :-)), please report it at bugzilla.redhat.com or in the Cisco bug tracker, along with a link to the video.

    Firefox 81 on Fedora with VA-API, WebRTC and X11

    Posted by Martin Stransky on September 29, 2020 11:44 AM

    Mozilla Firefox continues to implement more and more HW acceleration features on Linux, and Fedora comes with extra integration patches to enable them.

    Firefox 78 introduced VA-API video decoding on Wayland. Firefox 81 (the latest stable release) comes with VA-API decoding for WebRTC streams and enables VA-API on X11 for supported platforms.

    Let’s start with X11. Firefox implements VA-API on top of DMABuf, where particular video frames are exported by vaExportSurfaceHandle from libva and imported into OpenGL/EGL by the EGL_EXT_image_dma_buf_import extension. That’s the default OpenGL sequence on Wayland, where VA-API was implemented first.

    There are two options for bringing VA-API to X11 – vaPutSurface/GLX or vaExportSurfaceHandle/EGL. GLX is the default Firefox OpenGL backend on X11; it’s well supported on all devices including NVIDIA proprietary drivers and it’s generally well tested. On the other hand, Firefox does not implement native X11 pixmap textures in any HW accelerated backend; shm-based textures are used everywhere. There was an attempt to implement VA-API/vaPutSurface with the Basic compositor, but it was rejected.

    vaExportSurfaceHandle/EGL is the second option for X11. EGL is well supported by Intel/AMD drivers but almost completely missing on NVIDIA, both the proprietary and the free drivers (a partial solution may be the GLX/EGL wrapper by Adam Jackson). But with EGL we can recycle most of the work done for Wayland and use the same code path, which led to the final EGL/X11 implementation by Robert Mader.

    The EGL backend is not used by default yet; we’re working to enable it for Mesa drivers. Until that happens you need to run Firefox with the MOZ_X11_EGL env variable set to switch from GLX to EGL, so run Firefox as:

    MOZ_X11_EGL=1 firefox

    As a consequence the dmabuf/vaapi preferences have been changed. Enable VA-API decoding by setting media.ffmpeg.vaapi.enabled to true in about:config. That’s a single option for both the Wayland and X11 backends as well as for WebRTC decoding (see below). Also make sure WebRender (Firefox’s HW accelerated backend) is enabled.

    Users of Mozilla stock builds or of distros other than Fedora also need to set media.ffvpx.enabled to false until VA-API decoding is implemented in the in-tree ffvpx library.
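
    Once the preferences are set, a quick terminal test can confirm the decoder is actually used (a sketch; the PlatformDecoderModule log module name is an assumption on my side, and the log text may vary):

        MOZ_X11_EGL=1 MOZ_LOG="PlatformDecoderModule:5" firefox 2>&1 | grep -i vaapi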


    And now the WebRTC VA-API decoding. I tested the implementation on the BlueJeans video conferencing system, as it’s used by Red Hat internally. BlueJeans uses VP8 video streams, and the Intel HW decoder produces YV12 video frames which have to be handled by DMABuf surfaces. We also want to route WebRTC video through the external decoder by default [1][2].

    If you use Fedora Firefox builds you are all set here; the patches [1][2] are included in the builds.

    Users of other distros or of Mozilla Firefox stock builds must set media.ffmpeg.low-latency.enabled and media.navigator.mediadatadecoder_vpx_enabled to true in about:config.

    20 Million Downloads from the LVFS

    Posted by Richard Hughes on September 28, 2020 08:12 AM

    A few hours ago the LVFS provided its 20 millionth firmware update and although it’s just another somewhat unusual base-10 number, it’s an achievement I’m immensely proud of. As one of my friends said last week, “20 million of anything is a big deal”. Right from the start, the fwupd daemon and LVFS website data provider was a result of collaboration between many different companies and open source projects, and is now cemented as an integral part of the firmware ecosystem. People building open source projects, especially low level infrastructure like this, are not good at celebrating success and it’s no wonder so many talented maintainers burn out over long years of dedicated service. This post celebrates some of the things we’ve done.

    Little known to most people, fwupd and the LVFS grew out of the frustration of distributing the ColorHug firmware. If you bought one of those devices all those years ago, you can know you were a tiny part in starting all this. I still use ColorHug devices for all kinds of automated firmware testing, perhaps even more so than for screen calibration. My experience building OpenHardware devices really pushed me to make the LVFS free-for-all, on the logic that I wouldn’t have been able to justify even a $100/year subscription. Certainly making the service free in all respects meant that it was almost risk-free for companies to test the service.

    Now the LVFS analyses uploaded firmware for security problems, keeps millions of devices up to date, and also helps governments buy secure hardware using initiatives like the upcoming Host Security ID that I’ll talk more about in future blog posts. How many devices we’ve updated is impossible to know exactly as many large companies and departments mirror the entire LVFS; we just know it’s at least 20 million. In reality it’s probably a single digit multiple of that, although there’s no real way of knowing. We know 1.5 million people have sent the optional “it works for me” report we ask from CLI users, and given the CLI downloads account for ~1% of all downloads it could be a lot higher than 20 million.

    A huge number of devices are supported on the LVFS now. There are currently 2393 different public firmwares uploaded by 1401 users from 106 different vendors, using 39 different protocols to update hardware. We’ve run 40,378 automated tests on those files, and extracted 1,170,543 file volume objects which can be scanned by YARA. All impressive numbers I’m sure you’ll agree.

    There is one specific person I would like to thank first, my co-maintainer for both projects: Mario Limonciello who is a Senior Principal Software Development Engineer at Dell. Mario has reviewed thousands of my patches over the years, and contributed hundreds himself. I really appreciate his trust and also his confidence to tell me the half-baked and incomplete thing I’m proposing is actually insane. Together we’ve created an architecture that’s easy to maintain with a clean and modern design. Dell were also the first major laptop OEM to tell their suppliers “you need to ship firmware updates using the LVFS” and so did a lot of the initial plumbing work. We re-used most of the initial plugins when other OEMs decided they’d like to join the initiative later.

    Peter Jones is another talented member of our team at Red Hat and wrote a lot of the low level UEFI code we’ve used millions of times. Peter has to understand all the crazy broken things that firmware vendors decide to do, and is responsible for most of the EFI code in fwupd. Over the years fwupd has absorbed two of his projects, fwupdate and most recently dbxtool. Without Peter there would have been no UpdateCapsule support, and that’s about half the updates on the LVFS.

    Also notable to mention here is Logitech. They’ve shipped a ton of firmware (literally, millions) for their Unifying hardware and were also early adopters of the LVFS and fwupd. Nestor Lopez Casado, many thanks for all your help over the years and I’m glad MouseJack made all this a requirement :)

    I’d also like to thank Lenovo; not a specific person, as Lenovo is split up into ThinkPad, ThinkStation and ThinkCenter groups and in each the engineers would probably like to remain anonymous. Lenovo as a combined group is shipping a huge amount of firmware now via the LVFS, and most of the Lenovo supply chain is already wired up to supporting the LVFS. A lot of the ODMs for Lenovo have had to actually install Fedora and learn how to program with GLib C to create fwupd plugins to support laptop models that are not even on the shelves yet. Training up dozens of people in “how to write a fwupd plugin and deal with a grumpy maintainer” took a lot of time, but now we have companies building custom silicon submitting ready-to-roll plugins, fixes and even enhancements to the GUI tools.

    The LVFS isn’t just a way for OEMs to distribute firmware, like a shared FTP site. The LVFS plumbs itself into the ODM and ISV relationships, so we can get a pipeline right from the firmware author all the way to the end user. As ODMs such as Wistron and Foxconn use the LVFS for one OEM, it’s very simple for them to also support other OEMs. The feedback loop from vendors to users and back to vendors again has been invaluable when debugging problems with specific firmware releases.

    More recently various groups at Google have also been pushing suppliers of the Chromebook ecosystem to use fwupd and the LVFS. I’ve been told there’s now a “fwupd group” inside Google and I know that there are more than a few different models that will only be updatable using fwupd. Google are also using some of the consulting companies familiar with LVFS and fwupd so that it’s not just me explaining how this works over and over to various different small ODMs. I think in the next year this consulting side will explode and help grow the ecosystem even further.

    I’d love to list all the OEMs, ODMs, ISVs and consultants that have helped over the last 5 years, but this blog entry would be even larger than it already is. Just know that I appreciate your support, help and guidance.

    Talking of growing the ecosystem, the Linux Foundation are taking over the actual site maintenance; I’m not a sysadmin, and Terraform and Docker still scare me. This Christmas we’ll move the little VM I have running in Amsterdam to a proper scalable architecture with 24/7 support. This will let us provide some kind of uptime guarantee and also means I don’t have to worry about applying a security update when I’m on holiday without internet access. The Linux Foundation have been paying the entire CDN cost for the last few years, and for that my wallet is truly grateful.

    Finally, I must also thank Red Hat for letting me work on this stuff. Over the last few years fwupd has gone from my “20%” hobby to almost taking over all my developer time. Red Hat doesn’t get enough credit for all the essential plumbing they tirelessly do, and without them paying my salary every month there is no way this kind of free-to-upload and free-to-download service could exist. I firmly believe that fwupd is mutually beneficial to all the Red Hat partners like Intel, Dell and Lenovo – both so that more hardware gets purchased from OEMs, and also so that customers running Fedora and RHEL have up to date firmware.

    At a conference last year, I presented a talk where the penultimate slide was “the LVFS is just a website that runs cron jobs” and I had someone I respect come up to me afterwards and tell me something in a stern voice I’ll remember forever: “You didn’t just create a website – you changed an industry!”

    Let’s look forward to the next 20 million updates.

    Firefox GNOME Shell search provider

    Posted by Martin Stransky on September 25, 2020 04:22 PM

    Firefox has a long history of GTK desktop integration. We use the native Gtk3 theme to style widgets, we try to use system colors where possible, honor global dark themes, and so on; we feel that the application should play nicely with the rest of the system.

    Plasma/KDE users may notice that Firefox is a GTK application and that KDE integration is missing, as is integration with various tiling window managers. This isn’t intentional, merely a lack of manpower to code and maintain it.

    Another step in the GNOME integration effort is for Firefox to provide a global search experience. It has been available in Fedora by default via downstream patches, but the patches have been integrated upstream, so it’s available to all users of Mozilla stock builds since Firefox 78 (which also includes the recent ESR line).

    The Firefox GNOME Shell search provider can be enabled in three easy steps:

    • Install the Firefox desktop and search provider files. You can use the prepared ones from Mozilla or create your own. Copy firefox.desktop to /usr/share/applications/ and firefox-search-provider.ini to /usr/share/gnome-shell/search-providers (the exact commands are sketched after these steps).
    • Run Firefox, go to about:config and set
      browser.gnome-search-provider.enabled to true.
    • Restart Firefox.
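
    For the first step, the copy commands look like this (assuming you downloaded the two files from Mozilla into your current directory):

        sudo cp firefox.desktop /usr/share/applications/
        sudo cp firefox-search-provider.ini /usr/share/gnome-shell/search-providers/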

    Now, when Firefox is running, you should see search results when you hit the ‘Super’ key in GNOME and type something. If you don’t get anything, check:

    • Is the D-Bus service running? Use the D-Feet tool to inspect whether Firefox provides the org.mozilla.Firefox.SearchProvider interface on the session bus, or try the terminal check after this list.
    • Are Firefox search results shown? If not, go to GNOME Settings -> Search and look for Firefox. If Firefox is missing from the list of search engines, your firefox-search-provider.ini isn’t properly installed. To see Firefox search results, move Firefox up to the top.
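
    If you prefer the terminal to D-Feet for the D-Bus check, something like this should work (assuming the well-known bus name matches the interface name above):

        # list session bus names owned by Firefox while it is running
        busctl --user list | grep -i mozilla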

    Epiphany 3.38 and WebKitGTK 2.30

    Posted by Michael Catanzaro on September 16, 2020 05:00 PM

    It’s that time of year again: a new GNOME release, and with it, a new Epiphany. The pace of Epiphany development has increased significantly over the last few years thanks to an increase in the number of active contributors. Most notably, Jan-Michael Brummer has solved dozens of bugs and landed many new enhancements, Alexander Mikhaylenko has polished numerous rough edges throughout the browser, and Andrei Lisita has landed several significant improvements to various Epiphany dialogs. That doesn’t count the work that Igalia is doing to maintain WebKitGTK, the WPE graphics stack, and libsoup, all of which is essential to delivering quality Epiphany releases, nor the work of the GNOME localization teams to translate it to your native language. Even if Epiphany itself is only the topmost layer of this technology stack, having more developers working on Epiphany itself allows us to deliver increased polish throughout the user interface layer, and I’m pretty happy with the result. Let’s take a look at what’s new.

    Intelligent Tracking Prevention

    Intelligent Tracking Prevention (ITP) is the headline feature of this release. Safari has had ITP for several years now, so if you’re familiar with how ITP works to prevent cross-site tracking on macOS or iOS, then you already know what to expect here.  If you’re more familiar with Firefox’s Enhanced Tracking Protection, or Chrome’s nothing (crickets: chirp, chirp!), then WebKit’s ITP is a little different from what you’re used to. ITP relies on heuristics that apply the same to all domains, so there are no blocklists of naughty domains that should be targeted for content blocking like you see in Firefox. Instead, a set of innovative restrictions is applied globally to all web content, and a separate set of stricter restrictions is applied to domains classified as “prevalent” based on your browsing history. Domains are classified as prevalent if ITP decides the domain is capable of tracking your browsing across the web, or non-prevalent otherwise. (The public-friendly terminology for this is “Classification as Having Cross-Site Tracking Capabilities,” but that is a mouthful, so I’ll stick with “prevalent.” It makes sense: domains that are common across many websites can track you across many websites, and domains that are not common cannot.)

    ITP is enabled by default in Epiphany 3.38, as it has been for several years now in Safari, because otherwise only a small minority of users would turn it on. ITP protections are designed to be effective without breaking too many websites, so it’s fairly safe to enable by default. (You may encounter a few broken websites that have not been updated to use the Storage Access API to store third-party cookies. If so, you can choose to turn off ITP in the preferences dialog.)

    For a detailed discussion covering ITP’s tracking mitigations, see Tracking Prevention in WebKit. I’m not an expert myself, but the short version is this: full third-party cookie blocking across all websites (to store a third-party cookie, websites must use the Storage Access API to prompt the user for permission); cookie-blocking latch mode (“once a request is blocked from using cookies, all redirects of that request are also blocked from using cookies”); downgraded third-party referrers (“all third-party referrers are downgraded to their origins by default”) to avoid exposing the path component of the URL in the referrer; blocked third-party HSTS (“HSTS […] can only be set by the first-party website […]”) to stop abuse by tracker scripts; detection of cross-site tracking via link decoration and 24-hour expiration time for all cookies created by JavaScript on the landing page when detected; a 7-day expiration time for all other cookies created by JavaScript (yes, this applies to first-party cookies); and a 7-day extendable lifetime for all other script-writable storage, extended whenever the user interacts with the website (necessary because tracking companies began using first-party scripts to evade the above restrictions). Additionally, for prevalent domains only, domains engaging in bounce tracking may have cookies forced to SameSite=strict, and Verified Partitioned Cache is enabled (cached resources are re-downloaded after seven days and deleted if they fail certain privacy tests). Whew!

    WebKit has many additional privacy protections not tied to the ITP setting and therefore not discussed here — did you know that cached resources are partitioned based on the first-party domain? — and there’s more that’s not very well documented which I don’t understand and haven’t mentioned (tracker collusion!), but that should give you the general idea of how sophisticated this is relative to, say, Chrome (chirp!). Thanks to John Wilander from Apple for his work developing and maintaining ITP, and to Carlos Garcia for getting it working on Linux. If you’re interested in the full history of how ITP has evolved over the years to respond to the changing threat landscape (e.g. tracking prevention tracking), see John’s WebKit blog posts. You might also be interested in WebKit’s Tracking Prevention Policy, which I believe is the strictest anti-tracking stance of any major web engine. TL;DR: “we treat circumvention of shipping anti-tracking measures with the same seriousness as exploitation of security vulnerabilities. If a party attempts to circumvent our tracking prevention methods, we may add additional restrictions without prior notice.” No exceptions.

    Updated Website Data Preferences

    As part of the work on ITP, you’ll notice that Epiphany’s cookie storage preferences have changed a bit. Since ITP enforces full third-party cookie blocking, it no longer makes sense to have a separate cookie storage preference for that, so I replaced the old tri-state cookie storage setting (always accept cookies, block third-party cookies, block all cookies) with two switches: one to toggle ITP, and one to toggle all website data storage.

    Previously, it was only possible to block cookies, but this new setting will additionally block localStorage and IndexedDB, web features that allow websites to store arbitrary data in your browser, similar to cookies. It doesn’t really make much sense to block cookies but allow other types of data storage, so the new preferences should better enforce the user’s intent behind disabling cookies. (This preference does not yet block media keys, service workers, or legacy offline web application cache, but it probably should.) I don’t really recommend disabling website data storage, since it will cause compatibility issues on many websites, but this option is there for those who want it. Disabling ITP is also not something I want to recommend, but it might be necessary to access certain broken websites that have not yet been updated to use the Storage Access API.

    Accordingly, Andrei has removed the old cookies dialog and moved cookie management into the Clear Personal Data dialog, which is a better place because anyone clearing cookies for a particular website is likely to also want to clear other personal data. (If you want to delete a website’s cookies, then you probably don’t want to leave its SQL databases intact, right?) He had to remove the ability to clear data from a particular point in time, because WebKit doesn’t support this operation for cookies, but that function is probably rarely used and I think the benefit of the change should outweigh the cost. (We could bring it back in the future if somebody wants to try implementing that feature in WebKit, but I suspect not many users will notice.) Treating cookies as separate and different from other forms of website data storage no longer makes sense in 2020, and it’s good to have finally moved on from that antiquated practice.

    New HTML Theme

    Carlos Garcia has added a new Adwaita-based HTML theme to WebKitGTK 2.30, and removed support for rendering HTML elements using the GTK theme (except for scrollbars). Trying to use the GTK theme to render web content was fragile and caused many web compatibility problems that nobody ever managed to solve. The GTK developers were never very fond of us doing this in the first place, and the foreign drawing API required to do so has been removed from GTK 4, so this was also good preparation for getting WebKitGTK ready for GTK 4. Carlos’s new theme is similar to Adwaita, but gradients have been toned down or removed in order to give a flatter, neutral look that should blend in nicely with all pages while still feeling modern.

    This should be a fairly minor style change for Adwaita users, but a very large change for anyone using custom themes. I don’t expect everyone will be happy, but please trust that this will at least result in better web compatibility and fewer tricky theme-related bug reports.

    [Screenshot. Left: Adwaita GTK theme controls rendered by WebKitGTK 2.28. Right: hardcoded Adwaita-based HTML theme with toned down gradients.]

    Although scrollbars will still use the GTK theme as of WebKitGTK 2.30, that will no longer be possible to do in GTK 4, so themed scrollbars are almost certain to be removed in the future. That will be a noticeable disappointment in every app that uses WebKitGTK, but I don’t see any likely solution to this.

    Media Permissions

    Jan-Michael added new API in WebKitGTK 2.30 to allow muting individual browser tabs, and hooked it up in Epiphany. This is good when you want to silence just one annoying tab without silencing everything.

    Meanwhile, Charlie Turner added WebKitGTK API for managing autoplay policies. Videos with sound are now blocked from autoplaying by default, while videos with no sound are still allowed. Charlie hooked this up to Epiphany’s existing permission manager popover, so you can change the behavior for websites you care about without affecting other websites.

    [Screenshot of the new media autoplay permission settings. Caption: Configure your preferred media autoplay policy for a website near you today!]

    Improved Dialogs

    In addition to his work on the Clear Data dialog, Andrei has also implemented many improvements and squashed bugs throughout each view of the preferences dialog, the passwords dialog, and the history dialog, and refactored the code to be much more maintainable. Head over to his blog to learn more about his accomplishments. (Thanks to Google for sponsoring Andrei’s work via Google Summer of Code, and to Alexander for help mentoring.)

    Additionally, Adrien Plazas has ported the preferences dialog to use HdyPreferencesWindow, bringing a pretty major design change to the view switcher:

    [Screenshot. Left: Epiphany 3.36 preferences dialog. Right: Epiphany 3.38. Note the download settings are present in the left screenshot but missing from the right screenshot because the right window is using flatpak, and the download settings are unavailable in flatpak.]

    User Scripts

    User scripts (like Greasemonkey) allow you to run custom JavaScript on websites. WebKit has long offered user script functionality alongside user CSS, but previous versions of Epiphany only exposed user CSS. Jan-Michael has added the ability to configure a user script as well. To enable, visit the Appearance tab in the preferences dialog (a somewhat odd place, but it really needs to be located next to user CSS due to the tight relationship there). Besides allowing you to do, well, basically anything, this also significantly enhances the usability of user CSS, since now you can apply certain styles only to particular websites. The UI is a little primitive — your script (like your CSS) has to be one file that will be run on every website, so don’t try to design a complex codebase using your user script — but you can use conditional statements to limit execution to specific websites as you please, so it should work fairly well for anyone who has need of it. I fully expect 99.9% of users will never touch user scripts or user styles, but it’s nice for power users to have these features available if needed.

    HTTP Authentication Password Storage

    Jan-Michael and Carlos Garcia have worked to ensure HTTP authentication passwords are now stored in Epiphany’s password manager rather than by WebKit, so they can now be viewed and deleted from Epiphany, which required some new WebKitGTK API to do properly. Unfortunately, WebKitGTK saves network passwords using the default network secret schema, meaning its passwords (saved by older versions of Epiphany) are all mixed together: we have no way to know which application owns those passwords, so we don’t have any way to know which passwords were stored by WebKit and which can be safely managed by Epiphany going forward. Accordingly, all previously-stored HTTP authentication passwords are no longer accessible; you’ll have to use seahorse to look them up manually if you need to recover them. HTTP authentication is not very commonly used nowadays except for internal corporate domains, so hopefully this one-time migration snafu will not be a major inconvenience to most users.
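
    If you would rather recover them from a terminal than click through seahorse, something like this may work (the schema name is my assumption about how the default network secret schema is labelled, so treat it as a starting point):

        secret-tool search --all xdg:schema org.gnome.keyring.NetworkPassword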

    New Tab Animation

    Jan-Michael has added a new animation when you open a new tab. If the newly-created tab is not visible in the tab bar, then the right arrow will flash to indicate success, letting you know that you actually managed to open the page. Opening tabs out of view happens too often currently, but at least it’s a nice improvement over not knowing whether you actually managed to open the tab or not. This will be improved further next year, because Alexander is working on a completely new tab widget to replace GtkNotebook.

    [Video: https://blogs.gnome.org/mcatanzaro/files/2020/09/Screencast-from-09-15-2020-082111-PM.webm]

    New View Source Theme

    Jim Mason changed view source mode to use a highlight.js theme designed to mimic Firefox’s syntax highlighting, and added dark mode support.

    [Screenshot showing dark mode support in view source mode. Caption: Embrace the dark.]

    And More…

    • WebKitGTK 2.30 now supports video formats in image elements, thanks to Philippe Normand. You’ll notice that short GIF-style videos will now work on several major websites where they previously didn’t.
    • I added a new WebKitGTK 2.30 API to expose the paste as plaintext editor command, which was previously internal but fully-functional. I’ve hooked it up in Epiphany’s context menu as “Paste Text Only.” This is nice when you want to discard markup when pasting into a rich text editor (such as the WordPress editor I’m using to write this post).
    • Jan-Michael has implemented support for reordering pinned tabs. You can now drag to reorder pinned tabs any way you please, subject to the constraint that all pinned tabs stay left of all unpinned tabs.
    • Jan-Michael added a new import/export menu, and the bookmarks import/export features have moved there. He also added a new feature to import passwords from Chrome. Meanwhile, ignapk added support for importing bookmarks from HTML (compatible with Firefox).
    • Jan-Michael added a new preference to web apps to allow running them in the background. When enabled, closing the window will only hide the window: everything will continue running. This is useful for mail apps, music players, and similar applications.
    • Continuing Jan-Michael’s list of accomplishments, he removed Epiphany’s previous hidden setting to set a mobile user agent header after discovering that it did not work properly, and replaced it by adding support in WebKitGTK 2.30 for automatically setting a mobile user agent header depending on the chassis type detected by logind. This results in a major user experience improvement when using Epiphany as a mobile browser. Beware: this functionality currently does not work in flatpak because it requires the creation of a new desktop portal.
    • Stephan Verbücheln has landed multiple fixes to improve display of favicons on hidpi displays.
    • Zach Harbort fixed a rounding error that caused the zoom level to display oddly when changing zoom levels.
    • Vanadiae landed some improvements to the search engine configuration dialog (with more to come) and helped investigate a crash that occurs when using the “Set as Wallpaper” function under Flatpak. The crash is pretty tricky, so we wound up disabling that function under Flatpak for now. He also updated screenshots throughout the user help.
    • Sabri Ünal continued his effort to document and standardize keyboard shortcuts throughout GNOME, adding a few missing shortcuts to the keyboard shortcuts dialog.

    Epiphany 3.38 will be the final Epiphany 3 release, concluding a decade of releases that start with 3. We will match GNOME in following a new version scheme going forward, dropping the leading 3 and the confusing even/odd versioning. Onward to Epiphany 40!

    worse is better: making late buffer swaps tear

    Posted by Adam Jackson on September 10, 2020 07:40 PM

    In an ideal world, every frame your application draws would appear on the screen exactly on time. Sadly, as anyone living in the year 2020 CE can attest, this is far from an ideal world. Sometimes the scene gets more complicated and takes longer to draw than you estimated, and sometimes the OS scheduler just decides it has more important things to do than pay attention to you.

    When this happens, for some applications, it would be best if you could just get the bits on the screen as fast as possible rather than wait for the next vsync. The Present extension for X11 has an option to let you do exactly this:

    If 'options' contains PresentOptionAsync, and the 'target-msc'
    is less than or equal to the current msc for 'window', then
    the operation will be performed as soon as possible, not
    necessarily waiting for the next vertical blank interval. 

    But you don't usually use Present directly; usually Present is the mechanism for GLX and Vulkan to put bits on the screen. So, today I merged some code to Mesa to enable the corresponding features in those APIs, namely GLX_EXT_swap_control_tear and VK_PRESENT_MODE_FIFO_RELAXED_KHR. If all goes well these should be included in Mesa 21.0, with a backport to 20.2.x not out of the question. As the GLX extension name suggests, this can introduce some visual tearing when the buffer swap does come in late, but for fullscreen games or VR displays that can be an acceptable tradeoff in exchange for reduced stuttering.
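
    Once a Mesa build with this lands on your system, you should be able to check for the new capabilities from a terminal; a sketch, assuming the mesa-utils and vulkan-tools packages are installed:

        glxinfo | grep -o GLX_EXT_swap_control_tear
        vulkaninfo 2>/dev/null | grep -i fifo_relaxed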

    power-profiles-daemon: new project announcement

    Posted by Bastien Nocera on September 10, 2020 06:00 PM

    Despite what this might look like, I don't actually enjoy starting new projects: it's a lot easier to clean up some build warnings, or add a CI, than it is to start from an empty directory.

    But sometimes needs must, and I've just released version 0.1 of such a project. Below you'll find an excerpt from the README, which should answer most of the questions. Please read the README directly in the repository if you're getting to this blog post more than a couple of days after it was first published.

    Feel free to file new issues in the tracker if you have ideas on possible power-saving or performance enhancements. Currently the only supported “Performance” mode interacts with Intel CPUs with P-State support. More hardware support is planned.

    TLDR: this will land in the GNOME 3.40 development branch soon; Fedora packages are done, and API docs are available.

    From the README:

    Introduction

    power-profiles-daemon offers to modify system behaviour based upon user-selected power profiles. There are three different power profiles: a "balanced" default mode, a "power-saver" mode, and a "performance" mode. The first two of those are available on every system. The "performance" mode is only available on select systems and is implemented by different "drivers" based on the system or systems it targets.

    In addition to those two or three modes (depending on the system), "actions" can be hooked up to change the behaviour of a particular device. For example, this can be used to disable fast-charging for some USB devices when in power-saver mode.

    GNOME's Settings and shell both include interfaces to select the current mode, but they are also expected to adjust the behaviour of the desktop depending on the mode, such as turning the screen off more aggressively after inactivity when in power-saver mode.

    Note that power-profiles-daemon does not save the currently active profile across system restarts and will always start with the "balanced" profile selected.

    Why power-profiles-daemon

    The power-profiles-daemon project was created to help provide a solution for two separate use cases on desktops, laptops, and other devices running a “traditional Linux desktop”.

    The first one is a "Low Power" mode, that users could toggle themselves, or have the system toggle for them, with the intent to save battery. Mobile devices running iOS and Android have had a similar feature available to end-users and application developers alike.

    The second use case was to allow a "Performance" mode on systems where the hardware maker would provide and design such a mode. The idea is that the Linux kernel would provide a way to access this mode which usually only exists as a configuration option in some machines' "UEFI Setup" screen.

    This second use case is the reason why we didn't implement the "Low Power" mode in UPower, as was originally discussed.

    As the daemon would change kernel settings, we would need to run it as root, and make its API available over D-Bus, as has been customary for more than 10 years. We would also design that API to be as easy as possible to build graphical interfaces on top of.
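    To make that concrete (my example, not from the README; I'm assuming the daemon's well-known system bus name ends up being net.hadess.PowerProfiles), switching profiles from a script would then be a single property write:

    $ gdbus call --system --dest net.hadess.PowerProfiles \
            --object-path /net/hadess/PowerProfiles \
            --method org.freedesktop.DBus.Properties.Set \
            net.hadess.PowerProfiles ActiveProfile "<'power-saver'>"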

    Why not...

    This section contains explanations of why this new daemon was written rather than re-using or modifying an existing one. Each project obviously has its own goals and needs, and these comparisons are not meant as a slight on those projects.

    As the code bases of both the projects listed and power-profiles-daemon are ever-evolving, the comparisons were understood to be correct when they were made.

    thermald

    thermald only works on Intel CPUs, and is very focused on allowing maximum performance based on a "maximum temperature" for the system. As such, it could be seen as complementary to power-profiles-daemon.

    tuned and TLP

    Both projects have similar goals, allowing for tweaks to be applied for a variety of workloads that go far beyond the workloads and use cases that power-profiles-daemon targets.

    A fair number of the tweaks that could apply to devices running GNOME or another free desktop are either potentially destructive (e.g. some of the SATA power-saving modes can result in corrupted data), or work well enough to be put into place by default (e.g. audio codec power-saving), even if we need to disable the power saving on some hardware that reacts badly to it.

    Both are good projects to use for the purpose of experimenting with particular settings to see if they'd be something that can be implemented by default, or to put some fine-grained, static, policies in place on server-type workloads which are not as fluid and changing as desktop workloads can be.

    auto-cpufreq

    It doesn't take user intent into account, doesn't have a D-Bus interface, and seems to want to work automatically by monitoring CPU usage, which kind of goes against a user's wishes: a user might still want to conserve as much energy as possible under high CPU usage.

    Avoid “Tag: v-3.38.0-fixed-brown-paper-bag”

    Posted by Bastien Nocera on September 10, 2020 03:47 PM

    Over the past couple of (gasp!) decades, I've had my fair share of release blunders: forgetting to clean the tree before making a tarball by hand, forgetting to update the NEWS file, forgetting to push after creating the tarball locally, forgetting to update the appdata file (causing problems on Flathub)...

    That's where check-news.sh comes in, to replace the check-news function of the autotools. Ideally you would:

    - make sure your CI runs a dist job

    - always use a merge request to do releases

    - integrate check-news.sh into your meson build (though I would relax the appdata checks for devel releases); a sketch of that integration follows below
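    A sketch of that meson integration (hedged: I'm guessing at the arguments check-news.sh takes, so check the script itself for the real ones):

    # run at `meson dist` time so a stale NEWS file fails the tarball job
    meson.add_dist_script('check-news.sh', meson.project_version(), 'NEWS')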

    Videos in GNOME 3.38

    Posted by Bastien Nocera on September 08, 2020 01:31 PM

    This is going to be a short post, as changes to Videos have been few and far between in the past couple of releases.

    The major change to the latest release is that we've gained Tracker 3 support through a grilo plugin (which meant very few changes to our own code). But the Tracker 3 libraries are incompatible with the Tracker 2 daemon that's usually shipped in distributions, including on this author's development system.

    So we made use of the ability of Tracker to run inside a Flatpak sandbox along with the video player, removing the need to have Tracker installed by the distribution on the host. This should also make it easier to give users control of the directories they want to use to store their movies in the future.

    The release candidate for GNOME 3.38 is available right now as the stable version on Flathub.

    User-specific XKB configuration - putting it all together

    Posted by Peter Hutterer on September 04, 2020 04:36 AM

    This is the continuation from these posts: part 1, part 2, part 3

    This is the part where it all comes together, with (BYO) fireworks and confetti, champagne and hoorays. Or rather, this is where I'll describe how to actually set everything up. It's a bit complicated because while libxkbcommon does the parsing legwork now, we haven't actually changed other APIs and the file formats, which are still 1990s-style nerd cool and require significant experience in CS [1] to understand what goes where.

    The below relies on software using libxkbcommon and libxkbregistry. At the time of writing, libxkbcommon is used by all mainstream Wayland compositors but not by the X server. libxkbregistry is not yet used anywhere because I'm typing this before we've had a release for it. But at least now I have a link to point people to.

    libxkbcommon has an xkbcli-scaffold-new-layout tool that creates the template files shown below. At the time of writing, this tool must be run from the git repo build directory; it is not installed.

    I'll explain here how to add the us(banana) variant and the custom:foo option, and I will optimise for simplicity and brevity.

    Directory structure

    First, create the following directory layout:


    $ tree $XDG_CONFIG_HOME/xkb
    /home/user/.config/xkb
    ├── compat
    ├── keycodes
    ├── rules
    │   ├── evdev
    │   └── evdev.xml
    ├── symbols
    │   ├── custom
    │   └── us
    └── types
    If $XDG_CONFIG_HOME is unset, fall back to $HOME/.config.

    Rules files

    Create the rules file and add an entry to map our custom:foo option to a section in the symbols/custom file.


    $ cat $XDG_CONFIG_HOME/xkb/rules/evdev
    ! option = symbols
    custom:foo = +custom(foo)

    // Include the system 'evdev' file
    ! include %S/evdev
    Note that no entry is needed for the variant, that is handled by wildcards in the system rules file. If you only want a variant and no options, you technically don't need this rules file.

    Second, create the xml file used by libxkbregistry to display your new entries in the configuration GUIs:


    $ cat $XDG_CONFIG_HOME/xkb/rules/evdev.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xkbConfigRegistry SYSTEM "xkb.dtd">
    <xkbConfigRegistry version="1.1">
    <layoutList>
    <layout>
    <configItem>
    <name>us</name>
    </configItem>
    <variantList>
    <variant>
    <configItem>
    <name>banana</name>
    <shortDescription>banana</shortDescription>
    <description>US(Banana)</description>
    </configItem>
    </variant>
    </variantList>
    </layout>
    </layoutList>
    <optionList>
    <group allowMultipleSelection="true">
    <configItem>
    <name>custom</name>
    <description>custom options</description>
    </configItem>
    <option>
    <configItem>
    <name>custom:foo</name>
    <description>This option does something great</description>
    </configItem>
    </option>
    </group>
    </optionList>
    </xkbConfigRegistry>
    Our variant needs to be added as a layoutList/layout/variantList/variant, the option to the optionList/group/option. libxkbregistry will combine this with the system-wide evdev.xml file in /usr/share/X11/xkb/rules/evdev.xml.

    Overriding and adding symbols

    Now to the actual mapping. Add a section to each of the symbols files that matches the variant or option name:


    $ cat $XDG_CONFIG_HOME/xkb/symbols/us
    partial alphanumeric_keys modifier_keys
    xkb_symbols "banana" {
    name[Group1]= "Banana (us)";

    include "us(basic)"

    key <CAPS> { [ Escape ] };
    };
    With this, the us(banana) layout will be a US keyboard layout but with the CapsLock key mapped to Escape. What about our option? Mostly the same, let's map the tilde key to nothing:

    $ cat $XDG_CONFIG_HOME/xkb/symbols/custom
    partial alphanumeric_keys modifier_keys
    xkb_symbols "foo" {
    key <TLDE> { [ VoidSymbol ] };
    };
    A note here: NoSymbol means "don't overwrite it" whereas VoidSymbol is "map to nothing".

    Notes

    You may notice that the variant and option sections are almost identical. XKB doesn't care about variants vs options, it only cares about components to combine. So the sections do what we expect of them: variants include enough other components to make them a full keyboard layout, options merely define a few keys so they can be combined with layouts (variants). Due to how the lookups work, you could load the option template as layout custom(foo).

    For the actual documentation of keyboard configuration you'll need to google around; there are quite a few posts on how to map keys. All that has changed is where things are loaded from and how they are loaded, not the actual data formats.

    If you wanted to install this as system-wide custom rules, replace $XDG_CONFIG_HOME with /etc.

    The above is a replacement for xmodmap. It does not require a script to be run manually to apply the config, the existing XKB integration will take care of it. It will work in Wayland (but as said above not in X, at least not for now).

    A final word

    Now, I fully agree that this is cumbersome, clunky and feels outdated. This is easy to fix, all that is needed is for someone to develop a better file format, make sure it's backwards compatible with the full spec of the XKB parser (the above is a fraction of what it can do), that you can generate the old files from the new format to reduce maintenance, and then maintain backwards compatibility with the current format for the next ten or so years. Should be a good Google Decade of Code beginner project.

    [1] Cursing and Swearing

    No user-specific XKB configuration in X

    Posted by Peter Hutterer on September 04, 2020 12:39 AM

    This is the continuation from these posts: part 1, part 2, part 3 and part 4.

    In the posts linked above, I describe how it's possible to have custom keyboard layouts in $HOME or /etc/xkb that will get picked up by libxkbcommon. This only works for the Wayland stack, the X stack doesn't use libxkbcommon. In this post I'll explain why it's unlikely this will ever happen in X.

    As described in the previous posts, users configure with rules, models, layouts, variants and options (RMLVO). What XKB uses internally though are keycodes, compat, geometry, symbols and types (KcCGST) [1].

    There are, effectively, two KcCGST keymap compilers: libxkbcommon and xkbcomp. libxkbcommon can go from RMLVO to a full keymap; xkbcomp relies on other tools (e.g. setxkbmap) which in turn use a utility library called libxkbfile to parse rules files. The X server has a copy of the libxkbfile code. It doesn't use libxkbfile itself but it relies on the header files provided by it for some structs.

    Wayland's keyboard configuration works like this:

    • the compositor decides on the RMLVO keyboard layout, through an out-of-band channel (e.g. gsettings, weston.ini, etc.)
    • the compositor invokes libxkbcommon to generate a KcCGST keymap and passes that full keymap to the client
    • the client compiles that keymap with libxkbcommon and feeds any key events into libxkbcommon's state tracker to get the right keysyms
    The advantage we have here is that only the full keymap is passed between entities. Changing how that keymap is generated does not affect the client. This, coincidentally [2], is also how Xwayland gets the keymap passed to it and why Xwayland works with user-specific layouts.

    X works differently. Notably, KcCGST can come in two forms: the partial form specifying component names only, and the full keymap. The partial form looks like this:


    $ setxkbmap -print -layout fr -variant azerty -option ctrl:nocaps
    xkb_keymap {
    xkb_keycodes { include "evdev+aliases(azerty)" };
    xkb_types { include "complete" };
    xkb_compat { include "complete" };
    xkb_symbols { include "pc+fr(azerty)+inet(evdev)+ctrl(nocaps)" };
    xkb_geometry { include "pc(pc105)" };
    };
    This defines the component names but not the actual keymap, punting that to the next part in the stack. This will turn out to be the Achilles heel. Keymap handling in the server has two distinct approaches:
    • During keyboard device init, the input driver passes RMLVO to the server, based on defaults or xorg.conf options
    • The server has its own rules file parser and creates the KcCGST component names (as above)
    • The server forks off xkbcomp and passes the component names to stdin
    • xkbcomp generates a keymap based on the components and writes it out as XKM file format
    • the server reads in the XKM format and updates its internal structs
    This has been the approach for decades. To give you an indication of how fast-moving this part of the server is: XKM caching was the latest feature added... in 2009.

    Driver initialisation is nice, but barely used these days. You set your keyboard layout in e.g. GNOME or KDE and that will apply it in the running session. Or run setxkbmap, for those with a higher affinity to neckbeards. setxkbmap works like this:

    • setxkbmap parses the rules file to convert RMLVO to KcCGST component names
    • setxkbmap calls XkbGetKeyboardByName and hands those component names to the server
    • The server forks off xkbcomp and passes the component names to stdin
    • xkbcomp generates a keymap based on the components and writes it out as XKM file format
    • the server reads in the XKM format and updates its internal structs
    Notably, the RMLVO to KcCGST conversion is done on the client side, not the server side. And the only way to send a keymap to the server is that XkbGetKeyboardByName request - which only takes KcCGST, you can't even pass it a full keymap. This is also a long-standing potential issue with XKB: if your client tools use different XKB data files than the server, you don't get the keymap you expected.

    Other parts of the stack do basically the same as setxkbmap which is just a thin wrapper around libxkbfile anyway.

    Now, you can use xkbcomp on the client side to generate a keymap, but you can't hand it as-is to the server. xkbcomp can do this (using libxkbfile) by updating the XKB state one-by-one (XkbSetMap, XkbSetCompatMap, XkbSetNames, etc.). But at this point you're at the stage where you ask the server to knowingly compile a wrong keymap and then update parts of it afterwards.
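    For illustration, that client-side round-trip looks roughly like this (standard xkbcomp usage; treat it as a sketch rather than a recommendation):

    $ xkbcomp $DISPLAY keymap.xkb    # fetch and decompile the server's current keymap
    $ vi keymap.xkb                  # edit to taste
    $ xkbcomp keymap.xkb $DISPLAY    # upload again, applied piecewise as described above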

    So, realistically, the only way to get user-specific XKB layouts into the X server would require updating libxkbfile to provide the same behavior as libxkbcommon, update the server to actually use libxkbfile instead of its own copy, and updating xkbcomp to support the changes in part 2, part 3. All while ensuring no regressions in code that's decades old, barely maintained, has no tests, and, let's be honest, not particularly pretty to look at. User-specific XKB layouts are somewhat a niche case to begin with, so I don't expect anyone to ever volunteer and do this work [3], much less find the resources to review and merge that code. The X server is unlikely to see another real release and this is definitely not something you want to sneak in in a minor update.

    The other option would be to extend XKB-the-protocol with a request to hand a full keymap to the server. Given the inertia involved and that the server won't see more full releases, this is not going to happen.

    So as a summary: if you want custom keymaps on your machine, switch to Wayland (and/or fix any remaining issues preventing you from doing so) instead of hoping this will ever work on X. xmodmap will remain your only solution for X.

    [1] Geometry is so pointless that libxkbcommon doesn't even implement this. It is a complex format to allow rendering a picture of your keyboard but it'd be a per-model thing and with evdev everyone is using the same model, so ...
    [2] totally not coincidental btw
    [3] libxkbcommon has been around for a decade now and no-one has volunteered to do this in the years since, so...

    User-specific XKB configuration - part 3

    Posted by Peter Hutterer on August 31, 2020 04:51 AM

    This is the continuation from these posts: part 1, part 2

    Let's talk about everyone's favourite [1] keyboard configuration system again: XKB. If you recall, the goal is to make it simple for users to configure their own custom layouts. Now, as described earlier, XKB-the-implementation doesn't actually have a concept of a "layout" as such, it has "components" and something converts your layout desires into the combination of components. RMLVO (rules, model, layout, variant, options) is what you specify and gets converted to KcCGST (keycodes, compat, geometry, symbols, types). This is a one-way conversion, the resulting keymap no longer has references to the RMLVO arguments. Today's post is about that conversion, and we're only talking about libxkbcommon as XKB parser because anything else is no longer maintained.

    The peculiar thing about XKB data files (provided by xkeyboard-config [3]) is that the filename is part of the API. You say layout "us" variant "dvorak", the rules file translates this to symbols 'us(dvorak)' and the parser will understand this as "load file 'symbols/us' and find the dvorak section in that file". [4] The default "us" keyboard layout will require these components:


    xkb_keymap {
    xkb_keycodes { include "evdev+aliases(qwerty)" };
    xkb_types { include "complete" };
    xkb_compat { include "complete" };
    xkb_symbols { include "pc+us+inet(evdev)" };
    xkb_geometry { include "pc(pc105)" };
    };
    So the symbols are really: file symbols/pc, add symbols/us and then the section named 'evdev' from symbols/inet [5]. Types are loaded from types/complete, etc. The lookup paths for libxkbcommon are $XDG_CONFIG_HOME/xkb, /etc/xkb, and /usr/share/X11/xkb, in that order.

    Most of the symbols sections aren't actually full configurations. The 'us' default section only sets the alphanumeric rows, everything else comes from the 'pc' default section (hence: include "pc+us+inet(evdev)"). And most variant sections just include the default one, usually called 'basic'. For example, this is the 'euro' variant of the 'us' layout which merely combines a few other sections:


    partial alphanumeric_keys
    xkb_symbols "euro" {

    include "us(basic)"
    name[Group1]= "English (US, euro on 5)";

    include "eurosign(5)"

    include "level3(ralt_switch)"
    };
    Including things works as you'd expect: include "foo(bar)" loads section 'bar' from file 'foo' and this works for 'symbols/', 'compat/', etc., it'll just load the file in the same subdirectory. So yay, the directory is kinda also part of the API.

    Alright, now you understand how KcCGST files are loaded, much to your despair.

    For user-specific configuration, we could already load a 'custom' layout from the user's home directory. But it'd be nice if we could just add a variant to an existing layout. Like "us(banana)", because potassium is important or somesuch. This wasn't possible because the filename is part of the API. So our banana variant had to be in $XDG_CONFIG_HOME/xkb/symbols/us and once we found that "us" file, we could no longer include the system one.

    So as of two days ago, libxkbcommon now extends the parser to have merged KcCGST files, or in other words: it'll look for the symbols/us file in each lookup path, in order, until it finds the section needed. With that, you can now copy this into your $XDG_CONFIG_HOME/xkb/symbols/us file and have it work as variant:


    partial alphanumeric_keys
    xkb_symbols "banana" {

    include "us(basic)"
    name[Group1]= "English (Banana)";

    // let's assume there are some keymappings here
    };
    And voila, you now have a banana variant that can combine with the system-level "us" layout.
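    To verify that the variant actually resolves, you can compile it without touching your session (assuming a libxkbcommon new enough to ship the xkbcli tool):

    $ xkbcli compile-keymap --layout us --variant banana
    # prints the full compiled keymap on success, errors out otherwise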

    And because there must be millions [6] of admins out there that maintain custom XKB layouts for a set of machines, the aforementioned /etc/xkb lookup path was also recently added to libxkbcommon. So we truly now have the typical triplet of lookup paths:

    • vendor-provided ones in /usr/share/X11/xkb,
    • host-specific ones in /etc/xkb, and
    • user-specific ones in $XDG_CONFIG_HOME/xkb [7].
    Good times, good times.

    [1] Much in the same way everyone's favourite Model T colour was black
    [2] This all follows the UNIX philosophy, there are of course multiple tools involved and none of them know what the other one is doing
    [3] And I don't think Sergey gets enough credit for maintaining that pile of language oddities
    [4] Note that the names don't have to match, we could map layout 'us' to the symbols in 'banana' but life's difficult enough as it is
    [5] I say "add" when it's sort of a merge-and-overwrite and, yes, of course there are multiple ways to combine layouts, thanks for asking
    [6] Actual number may be less
    [7] Notice how "X11" is missing in the latter two? If that's not proof that we want to get rid of X, I don't know what is!

    libei - a library to support emulated input

    Posted by Peter Hutterer on August 25, 2020 04:43 AM

    Let's talk about eggs. X has always supported XSendEvent() which allows anyone to send any event to any client [1]. However, this event had a magic bit to make it detectable, so clients detect and subsequently ignore it. Spoofing input that just gets ignored is of course not productive, so in the year 13 BG [2] the XTest extension was conceived. XTest has a few requests that allow you to trigger a keyboard event (press and release, imagine the possibilities), buttons and pointer motion. The name may seem odd until someone explains to you that it was primarily written to support automated testing of X servers. But no-one has the time to explain that.

    Having a separate extension worked around the issue of detectability and thus any client could spoof input events. Security concerns were addressed with "well, just ifdef out that extension then" which worked great until other applications started using it for input emulation. Since around 2008, XTest events are emulated through special XTest devices in the server but that is solely to make the implementation less insane. Technically this means that XTest events are detectable again, except that no-one bothers to actually do that. Having said that, these devices only make it possible to detect an XTest event, but not which client sent that event. And, due to how the device hierarchy works, it's really hard to filter out those events anyway.

    Now it's 2020 and we still have an X server that basically allows access to anything and any client to spoof input. This level of security is industry standard for IoT devices but we are trying to be more restrictive than that on your desktop, lest the stuff you type actually matters. The accepted replacement for X is of course Wayland, but Wayland-the-protocol does not provide an extension to emulate input. This makes sense, emulating input is not exactly a display server job [3] but it does leave us with a bunch of things no longer working.

    libei

    So let's talk about libei. This new library for Emulated Input consists of two components: libei, the client library to be used in applications, and libeis, the part to be used by an Emulated Input Server, to be used in the compositors. libei provides the API to connect to an EIS implementation and send events. libeis provides the API to receive those events and pass them on to the compositor. Let's see how this looks in the stack:


    +--------------------+             +------------------+
    | Wayland compositor |---wayland---| Wayland client B |
    +--------------------+\            +------------------+
    | libinput | libeis  | \_wayland______
    +----------+---------+                \
         |          |            +-------+------------------+
    /dev/input/     +---brei-----| libei | Wayland client A |
                                 +-------+------------------+

    "brei" is the communication "Bridge for EI" and is a private protocol between libei and libeis.

    Emulated input is not particularly difficult, mostly because when you're emulating input, you know exactly what you are trying to do. There is no interpretation of bad hardware data like libinput has to do, it's all straightforward. Hence the communication between libei and libeis is relatively trivial, it's all button, keyboard, pointer and touch events. Nothing fancy here and I won't go into details because that part will work as you expect it to. The selling point of libei is elsewhere, primarily in separation, distinction and control.

    libei is a separate library to libinput or libwayland or Xlib or whatever else you may have. By definition, it is thus a separate input stream in both the compositor and the client. This means the compositor can do things like display a warning message while emulated input is active. Or it can re-route the input automatically, e.g. assign a separate wl_seat to the emulated input devices so they can be independently focused, etc. Having said that, the libeis API is very similar to libinput's API so integration into compositors is quite trivial because most of the hooks to process incoming events are already in place.

    libei distinguishes between different clients, so the compositor is always aware of who is currently emulating input. You have synergy, xdotool and a test suite running at the same time? The compositor is aware of which client is sending which events and can thus handle those accordingly.

    Finally, libei provides control over the emulated input. This works on multiple levels. A libei client has to request device capabilities (keyboard, touch, pointer) and the compositor can restrict to a subset of those (e.g. "no keyboard for you"). Second, the compositor can suspend/resume a device at any time. And finally, since the input events go through the compositor anyway, it can discard events it doesn't like. For example, even where the compositor allowed keyboards and the device is not suspended, it may still filter out Escape key presses. Or rewrite those to be Caps Lock because we all like a bit of fun, don't we?

    The above isn't technically difficult either, libei itself is not overly complex. The interaction between an EI client and an EIS server is usually the following:

    • client connects to server and says hello
    • server disconnects the client, or accepts it
    • client requests one or more devices with a set of capabilities
    • server rejects those devices or allows those devices and/or limits their capabilities
    • server resumes the device (because they are suspended by default)
    • client sends events
    • client disconnects, server disconnects the client, server suspends the device, etc.
    Again, nothing earth-shattering here. There is one drawback though: the server must approve of a client and its devices, so a client is not able to connect, send events and exit. It must wait until the server has approved the devices before it can send events. This means tools like xdotool need to be handled in a special way, more on that below.

    Flatpaks and portals

    With libei we still have the usual difficulties: a client may claim it's synergy when really it's bad-hacker-tool. There's not that much we can do outside a sandbox, but once we are guarded by a portal, things look different:


    +--------------------+
    | Wayland compositor |_
    +--------------------+ \
    | libinput | libeis  |  \_wayland______
    +----------+---------+                 \
         |     [eis-0.socket]               \
    /dev/input/   |      \\           +-------+------------------+
                  |       ==========>| libei | Wayland client A |
                  |       after      +-------+------------------+
                  |       initial              /
         initial  |       handover            / initial request
       connection |                          /  dbus[org.freedesktop.portal.EmulatedInput]
    +--------------------+                  /
    | xdg-desktop-portal |<----------------/
    +--------------------+
    The above shows the interaction when a client is run inside a sandbox with access to the org.freedesktop.portal.Desktop bus. The libei client connects to the portal (it won't see the EIS server otherwise), the portal hands it back a file descriptor to communicate on and from then on it's like a normal EI session. What the portal implementation can do though is write several Restrictions for EIS on the server side of the file descriptor using libreis. Usually, the portal would write the app-id of the client (thus guaranteeing a reliable name) and maybe things like "don't allow keyboard capabilities for any device". Once the fd is handed to the libei client, the restrictions cannot be loosened anymore so a client cannot overwrite its own name.

    So the full interaction here becomes:

    • Client connects to org.freedesktop.portal.EmulatedInput
    • Portal implementation verifies that the client is allowed to emulate input
    • Portal implementation obtains a socket to the EIS server
    • Portal implementation sends the app id and any other restrictions to the EIS server
    • Portal implementation returns the socket to the client
    • Client creates devices, sends events, etc.
    For a client to connect to the portal instead of the EIS server directly is currently a one line change.

    Note that the portal implementation is still in its very early stages and there are changes likely to happen. The general principle is unlikely to change though.

    Xwayland

    Turns out we still have a lot of X clients around so somehow we want to be able to use those. Under Wayland, those clients connect to Xwayland which translates X requests to Wayland requests. X clients will use XTest to emulate input which currently goes to where the dodos went. But we can add libei support to Xwayland and connect XTest, at which point the diagram looks like this:


    +--------------------+             +------------------+
    | Wayland compositor |---wayland---| Wayland client B |
    +--------------------+\            +------------------+
    | libinput | libeis  | \_wayland______
    +----------+---------+                \
         |          |            +-------+------------------+
    /dev/input/     +---brei-----| libei |     XWayland     |
                                 +-------+------------------+
                                                |
                                                | XTEST
                                                |
                                          +-----------+
                                          |  X client |
                                          +-----------+
    XWayland is basically just another Wayland client but it knows about XTest requests, and it knows about the X clients that request those. So it can create a separate libei context for each X client, with pointer and keyboard devices that match the XTest abilities. The end result of that is you can have a xdotool libei context, a synergy libei context, etc. And of course, where XWayland runs in a sandbox it could use the libei portal backend. At which point we have a sandboxed X client using a portal to emulate input in the Wayland compositor. Which is pretty close to what we want.

    Where to go from here?

    So, at this point the libei repo is still sitting in my personal gitlab namespace. Olivier Fourdan has done most of the Xwayland integration work and there's a basic Weston implementation. The portal work (tracked here) is the one needing the most attention right now, followed by the implementations in the real compositors. I think I have tentative agreement from the GNOME and KDE developers that all this is a good idea. Given that the goal of libei is to provide a single approach to emulate input that works on all(-ish) compositors [4], that's a good start.

    Meanwhile, if you want to try it, grab libei from git, build it, and run the eis-demo-server and ei-demo-client tools. And for portal support, run the eis-fake-portal tool, just so you don't need to mess with the real one. At the moment, those demo tools will have a client connecting a keyboard and pointer and sending motion, button and 'qwerty' keyboard events every few seconds. The latter with client and/or server-set keymaps because that's possible too.

    Eggs

    What does all this have to do with eggs? "Ei", "Eis", "Brei", and "Reis" are, respectively, the German words for "egg", "ice" or "ice cream", "mush" (think: porridge) and "rice". There you go, now you can go on holidays to a German speaking country and sustain yourself on a nutritionally imbalanced diet.

    [1] The whole "any to any" is a big thing in X and just shows that in the olden days you could apparently trust, well, apparently anyone
    [2] "before git", i.e. too bothersome to track down so let's go with the Copyright notice in the protocol specification
    [3] XTest the protocol extension exists presumably because we had a hammer and a protocol extension thus looked like a nail
    [4] Let's not assume that every compositor having their own different approach to emulation is a good idea

    Updating Secure Boot dbx with fwupd and the LVFS

    Posted by Richard Hughes on August 17, 2020 08:37 PM

    This is one of those blog posts which is going to escalate quickly; in my effort to make it understandable I might simplify some of the theory, so if you know all this stuff already please scroll down a few paragraphs and try not to be pedantic.

    UEFI Secure Boot is a protection technology that is designed to protect a system against malicious code being executed early in the boot process. It defines a way to “lock down” the platform so only binaries signed with a specific key will run. The certificate authority is managed by Microsoft, which is unimportant generally, but was hugely important before Microsoft agreed to sign loaders like shim, as it meant hardware wouldn’t run Linux. SecureBoot isn’t evil by itself, and having SecureBoot turned on protects your hardware from real world attacks and in theory secures most of the boot process to where the Linux kernel takes over. In the case where you don’t love or trust Microsoft you can configure another certificate authority to manage the trust somewhere else, but that’s a blog post for another day.

    Anyway — I alluded that Microsoft has a special process for signing our Linux binaries so we could still keep running Linux on hardware that can’t turn off SecureBoot for one reason or another. What happened with the BootHole set of CVEs was that a researcher from Eclypsium found a nasty bug in the GRUB project (whose binary was signed by Microsoft) which allowed complete circumvention of the SecureBoot architecture. This would mean a bootkit that previously would only work with SB turned off (which is fairly uncommon, as Microsoft forces OEMs to turn it on by default) could now be easily deployed onto hardware with SB enabled using an old version of shim packaged in the bootkit.

    Luckily, this scenario was planned for by the UEFI people, and as well as the “binaries must be signed by this certificate” functionality we also have a “the binary must not have this checksum” backup plan. Microsoft recently invoked the backup plan and added quite a few checksums of binaries like shim and grub to the latest dbx update. They’ve done this three times since 2010, but traditionally only for things like the Symantec recovery binary which most people won’t have installed. The signed “blocklist of checksums” is called the dbx internally; it can only be updated, never downgraded, for obvious reasons.

    If you actually deploy the current UEFI Revocation List (a.k.a dbx) on your Linux box right now using something like dbxtool it’ll most probably apply fine, and then when you reboot you’ll get a nice red screen and a SECURE BOOT VIOLATION message, and then for most people the computer will be a useless brick. The grub and shim installed into /boot/efi by your OS is now being blocked, and so nothing works. You can’t even boot a Linux installer to reinstall as the grub on the live media is going to be blocked too.

    What we used to do was have a dbxtool.service that just applied the latest dbx (if you were not using a LiveCD) and then just hope it all still boots. This mostly worked, as the dbx contained checksums of things you were not likely to be using, rather than things you are most likely to have. You might also be thinking this isn’t an awesome way to deploy a single-use binary to millions of computers. It’s like shipping firmware updates in an rpm. Hmm. That gives me an idea.

    What we could do is have a fwupd plugin that reads the current dbx and creates a device:

    │   
    └─UEFI dbx:
          Device ID:           362301da643102b9f38477387e2193e57abaa590
          Summary:             UEFI Revocation Database
          Current version:     77
          Minimum Version:     77
          Install Duration:    1 second
          GUIDs:               5971a208-da00-5fce-b5f5-1234342f9cf7 ← UEFI\CRT_A9087D1044AD18F7A94916D284CBC01827CF23CD8F60B79072C9CAA1FEF4D649&ARCH_X64
                               f8ba2887-9411-5c36-9cee-88995bb39731 ← UEFI\CRT_A1117F516A32CEFCBA3F2D1ACE10A87972FD6BBE8FE0D0B996E09E65D802A503&ARCH_X64
          Device Flags:        • Internal device
                               • Updatable
                               • Supported on remote server
                               • Needs a reboot after installation
    

    Then we could push the dbx updates onto the LVFS so that they get downloaded as required rather than stored into /usr for all eternity. We could put all the checks into fwupd to verify that the user doesn’t have any blocked binaries in the ESP (as fwupd already knows how to mount the ESPs to deploy UEFI capsules) so that we don’t accidentally brick any systems.

    And we could wire this up to the GUI so that it’s super easy to use, making it a one-click install to deploy locally, or scriptable using the CLI.

    If you install fwupd and gnome-software from git master, this is exactly what you can do right now. To test this on your non-production system, you can add something like this to /etc/fwupd/remotes.d/dbx.conf:

    [fwupd Remote]
    Enabled=true
    Title=UEFI dbx
    Keyring=gpg
    MetadataURI=https://fwupd.org/downloads/firmware-1d780afd8789afc6bccff638e5e8376604ba082aa32dc06db71d6768824a1efa.xml.gz
    ReportURI=https://fwupd.org/lvfs/firmware/report
    OrderBefore=lvfs,fwupd
    

    Then fwupdmgr refresh gets you the metadata and fwupdmgr update applies the update only if it’s safe to do so. Again: Do not do this on a system you don’t have backups for.
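    On a test machine the whole flow then looks roughly like this:

    $ fwupdmgr refresh       # fetch metadata, including the new dbx remote
    $ fwupdmgr get-devices   # the "UEFI dbx" device should now be listed
    $ fwupdmgr update        # only applies the dbx if no blocked binaries are in the ESP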

    I’m going to test this myself on a few more machines and then once the shim and grub builds hit Fedora stable we can think about packaging up the new fwupd so that we can deploy the dbx configuration using the LVFS easily. Of course, nobody is taking away doing it manually with dbxtool, but now it’s integrated into the system and easy for the user to deploy. Comments and corrections most welcome.

    Kubernetes Watches will ghost you without warning

    Posted by Dan Williams on August 05, 2020 02:25 PM

    Alternate title: if you’re ahead of Clayton you’re doing well, at least for a few hours.

    Consider the following code in the Kubernetes e2e test framework:

    ctx, cancel := context.WithTimeout(context.Background(), csiPodRunningTimeout)
    defer cancel()
    pvcWatch, err := f.ClientSet.CoreV1().PersistentVolumeClaims(f.Namespace.Name).Watch(ctx, metav1.ListOptions{})
    framework.ExpectNoError(err, "create PVC watch")
    defer pvcWatch.Stop()
    
    ...
    
loop:
for {
    	select {
    	case event := <-pvcWatch.ResultChan():
    		framework.Logf("PVC event %s: %#v", event.Type, event.Object)
    		switch event.Type {
    		case watch.Modified:
    			pvc, ok := event.Object.(*v1.PersistentVolumeClaim)
    			if !ok {
    				framework.Failf("PVC watch sent %#v instead of a PVC", event.Object)
    			}
    			_, set := pvc.Annotations["volume.kubernetes.io/selected-node"]
    			if set {
    				nodeAnnotationSet = true
    			} else if nodeAnnotationSet {
    				nodeAnnotationReset = true
    			}
    		case watch.Deleted:
    			break loop
    		case watch.Error:
    			// Can this occur when the apiserver is under heavy load?
    			// If yes, then we should bail out of the test here early and
    			// skip further checks instead of treating it as a test failure.
    			framework.Failf("PVC watch failed prematurely: %v", event.Object)
    		}
    	case <-ctx.Done():
    		framework.Failf("Timeout while waiting to observe PVC list")
    	}
    }
    

    The problem is hard to spot unless you’re familiar with Kubernetes watches, and perhaps even if you are familiar but don’t work on OpenShift.

    Watches and Channels

    Watches can and do terminate at any time, gracefully or not. Sometimes a new apiserver leader is elected and the old one terminates watches and clients must reconnect to the new leader. Sometimes the leader just goes away because its node got rebooted. Sometimes there’s a network hiccup and the HTTP connection backing the watch times out. Regardless of the cause, they happen and your code needs to handle them. OpenShift CI forces frequent leader elections to specifically catch these issues before they get to customers.

    A watch stuffs events into a Go channel. The code using the watch reads events out of the channel, usually in a for loop (to continuously grab events) with a select block (to ensure individual read operations don’t block, which enables cancelation when the channel returned by ctx.Done() is closed). Reading from a Go channel (case event, ok := <-pvcWatch.ResultChan()) can also return an optional second boolean, which is false once the channel has been closed and drained.

    The testcase loop doesn’t exit until either the testcase times out and the ctx.Done() channel is closed, one of the event handler cases fails the testcase, or the PersistentVolumeClaim is deleted. So what happens if the Watch is closed unexpectedly and nothing checks whether the channel is closed?

    The read returns the zero value immediately. The code continuously executes the framework.Logf("PVC event %s: %#v", event.Type, event.Object) line for 30 seconds until the test terminates. Depending on how fast your machine is, this can be millions of log lines and lots of CPU.

    How do we fix it?

    1. Assume watches can terminate at any time, and that you need to restart the watch if it does. That’s what the SharedInformer framework does for you.
    2. If you’re going to use watches directly always handle channel closure and restart your watch.
    for {
    	select {
    	case event, ok := <-pvcWatch.ResultChan():
    		if !ok {
    			framework.Failf("PVC watch ended prematurely")
    		}
    
    		framework.Logf("PVC event %s: %#v", event.Type, event.Object)
    		switch event.Type {
    		// ... handle Modified/Deleted/Error exactly as before ...
    		}
    	case <-ctx.Done():
    		framework.Failf("Timeout while waiting to observe PVC list")
    	}
    }

    Filesystem deduplication is a sidechannel

    Posted by Matthew Garrett on July 27, 2020 07:57 PM
    First off - nothing I'm going to talk about in this post is novel or overly surprising, I just haven't found a clear writeup of it before. I'm not criticising any design decisions or claiming this is an important issue, just raising something that people might otherwise be unaware of.

    With that out of the way: Automatic deduplication of data is a feature of modern filesystems like zfs and btrfs. It takes two forms - inline, where the filesystem detects that data being written to disk is identical to data that already exists on disk and simply references the existing copy rather than storing it again, and offline, where tooling retroactively identifies duplicated data and removes the duplicate copies (zfs supports inline deduplication, btrfs currently only supports offline). In a world where disks end up with multiple copies of cloud or container images, deduplication can free up significant amounts of disk space.

    What's the security implication? The problem is that deduplication doesn't recognise ownership - if two users have copies of the same file, only one copy of the file will be stored[1]. So, if user a stores a file, the amount of free space will decrease. If user b stores another copy of the same file, the amount of free space will remain the same. If user b is able to check how much free space is available, user b can determine whether the file already exists.
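    A rough sketch of that probe from a shell, assuming inline deduplication is enabled on a pool mounted at /pool and nothing else is writing to it at the time:

    before=$(df --output=avail /pool | tail -n1)
    cp suspected-file /pool/probe && sync
    after=$(df --output=avail /pool | tail -n1)
    echo "delta: $((before - after)) blocks"   # a delta near zero suggests an identical copy already existed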

    This doesn't seem like a huge deal in most cases, but it is a violation of expected behaviour (if user b doesn't have permission to read user a's files, user b shouldn't be able to determine whether user a has a specific file). But we can come up with some convoluted cases where it becomes more relevant, such as law enforcement gaining unprivileged access to a system and then being able to demonstrate that a specific file already exists on that system. Perhaps more interestingly, it's been demonstrated that free space isn't the only sidechannel exposed by deduplication - deduplication has an impact on access timing, and can be used to infer the existence of data across virtual machine boundaries.

    As I said, this is almost certainly not something that matters in most real world scenarios. But with so much discussion of CPU sidechannels over the past couple of years, it's interesting to think about what other features also end up leaking information in ways that may not be obvious.

    (Edit to add: deduplication isn't enabled on zfs by default and is explicitly triggered on btrfs, so unless it's something you've enabled then this isn't something that affects you)

    [1] Deduplication is usually done at the block level rather than the file level, but given zfs's support for variable sized blocks, identical files should be deduplicated even if they're smaller than the maximum record size


    Using Red Hat Satellite with the LVFS

    Posted by Richard Hughes on July 23, 2020 03:10 PM

    A few weeks ago I alluded that you could run the LVFS in an offline mode, where updates could be synced to a remote location and installed on desktops and servers in a corporate setting without internet access. A few big companies asked for more details, and so we created some official documentation which should help. It works, but you need to script things manually and set up your system in a custom way.

    For those companies using Red Hat Satellite there’s now an even easier way. Marc Richter has created a public Red Hat knowledge base article on how to configure Satellite 6 so that firmware updates can be deployed on client systems without access to the public internet. This is something that’s been requested by Red Hat customers for some time, and I really appreciate all the research and debugging Marc had to do so that this kind of content could be written. Feel free to share the link to this content as it’s available without a Red Hat subscription. Comments welcome!

    User-specific XKB configuration - part 2

    Posted by Peter Hutterer on July 06, 2020 05:18 AM

    This is the continuation from this post.

    Several moons have bypassed us [1] in the time since the first post, and Things Have Happened! If you recall (and of course you did because you just re-read the article I so conveniently linked above), libxkbcommon supports an include directive for the rules files and it will load a rules file from $XDG_CONFIG_HOME/xkb/rules/ which is the framework for custom per-user keyboard layouts. Alas, those files are just sitting there, useful but undiscoverable.

    To give you a very approximate analogy, the KcCGST format I described last time provides the ingredients to a meal (pasta, mince, tomato). The rules file is the machine-readable instruction set to assemble your meal but it relies a lot on wildcards. Feed it "spaghetti, variant bolognese" and the actual keymap ends up being the various components put together: "pasta(spaghetti)+sauce(tomato)+mince". But for this to work you need to know that spag bol is available in the first place, i.e. you need the menu. This applies to keyboard layouts too - the keyboard configuration panel needs to present a list so the users can clickedy click-click on whatever layout they think is best for them.

    This menu of possible layouts is provided by the xkeyboard-config project but for historical reasons [2], it is stored as an XML file named after the ruleset: usually /usr/share/X11/xkb/rules/evdev.xml [3]. Configuration utilities parse that file directly which is a bit of an issue when your actual keymap compiler starts supporting other include paths. Your fancy new layout won't show up because everything insists on loading the system layout menu only. This bit is part 2, i.e. this post here.

    If there's one thing that the world doesn't have enough of yet, it's low-level C libraries. So I hereby present to you: libxkbregistry. This library has now been merged into the libxkbcommon repository and provides a simple C interface to list all available models, layouts and options for a given ruleset. It sits in the same repository as libxkbcommon - long term this will allow us to better synchronise any changes to XKB handling or data formats as we can guarantee that the behaviour of both components is the same.

    Speaking of data formats, we haven't actually changed any of those which means they're about as easy to understand as your local COVID19 restrictions. In the previous post I outlined the example for the KcCGST and rules file, what you need now with libxkbregistry is an XKB-compatible XML file named after your ruleset. Something like this:


    $ cat $HOME/.config/xkb/rules/evdev.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xkbConfigRegistry SYSTEM "xkb.dtd">
    <xkbConfigRegistry version="1.1">
    <layoutList>
    <layout>
    <configItem>
    <name>us</name>
    </configItem>
    <variantList>
    <variant>
    <configItem>
    <name>banana</name>
    <shortDescription>banana</shortDescription>
    <description>US (Banana)</description>
    </configItem>
    </variant>
    </variantList>
    </layoutList>
    <optionList>
    <group allowMultipleSelection="true">
    <configItem>
    <name>custom</name>
    <description>Custom options</description>
    </configItem>
    <option>
    <configItem>
    <name>custom:foo</name>
    <description>Map Tilde to nothing</description>
    </configItem>
    </option>
    <option>
    <configItem>
    <name>custom:baz</name>
    <description>Map Z to K</description>
    </configItem>
    </option>
    </group>
    </optionList>
    </xkbConfigRegistry>
    This looks more complicated than it is: we have models (not shown here), layouts which can have multiple variants, and options which are grouped together in option groups (to make options mutually exclusive). libxkbregistry will merge this with the system layouts in what is supposed to be the most obvious merge algorithm. The simple summary of that is that you can add to existing system layouts but you can't modify those - the above example will add a "banana" variant to the US keyboard layout without modifying "us" itself or any of its other variants. The second part adds two new options based on my previous post.

    Now, all that is needed is to change every user of evdev.xml to use libxkbregistry. The gnome-desktop merge request is here for a start.
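    To give a flavour of the API, here's a minimal sketch of listing every layout/variant pair known to the registry (function names as I recall them at the time of writing; check the installed header for the authoritative ones):

    #include <stdio.h>
    #include <xkbcommon/xkbregistry.h>

    int main(void)
    {
        struct rxkb_context *ctx = rxkb_context_new(RXKB_CONTEXT_NO_FLAGS);
        if (!ctx || !rxkb_context_parse_default_ruleset(ctx))
            return 1;

        /* iterates the system layouts merged with any user-provided evdev.xml */
        for (struct rxkb_layout *l = rxkb_layout_first(ctx); l;
             l = rxkb_layout_next(l)) {
            const char *variant = rxkb_layout_get_variant(l);
            printf("%s(%s): %s\n", rxkb_layout_get_name(l),
                   variant ? variant : "",
                   rxkb_layout_get_description(l));
        }
        rxkb_context_unref(ctx);
        return 0;
    }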

    [1] technically something that goes around something else doesn't bypass it but the earth is flat, the moon is made of cheese, facts don't matter anymore and stop being so pedantic about things already!
    [2] it's amazing what you can handwave away with "for historical reasons". Life would be better if there was less history to choose reasons from.
    [3] there's also evdev.extras.xml for very niche layouts which is a separate file for historical reasons [2], despite there being a "popularity" XML attribute

    Now firmware can depend on available client features

    Posted by Richard Hughes on June 29, 2020 03:56 PM

    At the moment we just blindly assume the capabilities of the front-end client when installing firmware. We can somewhat work around this limitation by requiring a new enough fwupd daemon version, but the GUI client software may be much older than the fwupd version or just incomplete. If you maintain a text or graphical client that uses fwupd to deploy updates then there’s an additional API call I’d like you to start using so we can fix this limitation.

    This would allow, for instance, the firmware to specify that it requires the client to be able to show a runtime detach image. This would not be set by a dumb command line tool using FwupdClient, but would be set by a GUI client that is capable of downloading a URL and showing a PNG to the user.

    Clients that do not register features are assumed to be dumb and won’t be offered firmware that has a hard requirement on showing a post-install “you need to restart the hardware manually” image and caption. The three known actions you can register for client feature support are can-report, detach-action and the recently added update-action. See this commit for more details about what each feature actually means.

    If you’re using libfwupd then it’s a simple call to fwupd_client_set_feature_flags() otherwise you’ll have to call the SetFeatureFlags() on the main D-Bus interface before requesting the list of updates. Simple!
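    For the libfwupd case, a hedged sketch of what that call might look like (the flag names mirror the kebab-case features above, but double-check them against the libfwupd headers):

    #include <fwupd.h>

    gboolean
    register_client_features(GCancellable *cancellable, GError **error)
    {
        g_autoptr(FwupdClient) client = fwupd_client_new();
        /* declare everything this client is able to render for the user */
        return fwupd_client_set_feature_flags(client,
                                              FWUPD_FEATURE_FLAG_CAN_REPORT |
                                              FWUPD_FEATURE_FLAG_DETACH_ACTION |
                                              FWUPD_FEATURE_FLAG_UPDATE_ACTION,
                                              cancellable, error);
    }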

    Making my doorbell work

    Posted by Matthew Garrett on June 24, 2020 05:09 AM
    I recently moved house, and the new building has a Doorbird to act as a doorbell and open the entrance gate for people. There's a documented local control API (no cloud dependency!) and a Home Assistant integration, so this seemed pretty straightforward.

    Unfortunately not. The Doorbird is on a separate network that's shared across the building, provided by Monkeybrains. We're also a Monkeybrains customer, so our network connection is plugged into the same router and antenna as the Doorbird one. And, as is common, there's port isolation between the networks in order to avoid leakage of information between customers. Rather perversely, we are the only people with an internet connection who are unable to ping my doorbell.

    I spent most of the past few weeks digging myself out from under a pile of boxes, but we'd finally reached the point where spending some time figuring out a solution to this seemed reasonable. I spent a while playing with port forwarding, but that wasn't ideal - the only server I run is in the UK, and having packets round trip almost 11,000 miles so I could speak to something a few metres away seemed like a bad plan. Then I tried tethering an old Android device with a data-only SIM, which worked fine but only in one direction (I could see what the doorbell could see, but I couldn't get notifications that someone had pushed a button, which was kind of the point here).

    So I went with the obvious solution - I added a wifi access point to the doorbell network, and my home automation machine now exists on two networks simultaneously (nmcli device modify wlan0 ipv4.never-default true is the magic for "ignore the gateway that the DHCP server gives you" if you want to avoid this), and I could now do link local service discovery to find the doorbell if it changed addresses after a power cut or anything. And then, like magic, everything worked - I got notifications from the doorbell when someone hit our button.

    But knowing that an event occurred without actually doing something in response seems fairly unhelpful. I have a bunch of Chromecast targets around the house (a mixture of Google Home devices and Chromecast Audios), so just pushing a message to them seemed like the easiest approach. Home Assistant has a text to speech integration that can call out to various services to turn some text into a sample, and then push that to a media player on the local network. You can group multiple Chromecast audio sinks into a group that then presents as a separate device on the network, so I could then write an automation to push audio to the speaker group in response to the button being pressed.
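
    You can exercise the same machinery by hand through Home Assistant's REST API, which is a handy way to test the speaker group before wiring up the automation. A sketch, assuming the stock google_translate TTS platform; the host, token and entity id are placeholders:

    # make the Chromecast speaker group announce a phrase
    curl -X POST \
      -H "Authorization: Bearer $HASS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"entity_id": "media_player.speaker_group", "message": "Someone is at the door"}' \
      http://homeassistant.local:8123/api/services/tts/google_translate_say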

    That's nice, but it'd also be nice to do something in response. The Doorbird exposes API control of the gate latch, and Home Assistant exposes that as a switch. I'm using Home Assistant's Google Assistant integration to expose devices Home Assistant knows about to voice control. Which means when I get a house-wide notification that someone's at the door I can just ask Google to open the door for them.

    So. Someone pushes the doorbell. That sends a signal to a machine that's bridged onto that network via an access point. That machine then sends a protobuf command to speakers on a separate network, asking them to stream a sample it's providing. Those speakers call back to that machine, grab the sample and play it. At this point, multiple speakers in the house say "Someone is at the door". I then say "Hey Google, activate the front gate" - the device I'm closest to picks this up and sends it to Google, where something turns my speech back into text. It then looks at my home structure data and realises that the "Front Gate" device is associated with my Home Assistant integration. It then calls out to the home automation machine that received the notification in the first place, asking it to trigger the front gate relay. That device calls out to the Doorbird and asks it to open the gate. And now I have functionality equivalent to a doorbell that completes a circuit and rings a bell inside my home, and a button inside my home that completes a circuit and opens the gate, except it involves two networks inside my building, callouts to the cloud, at least 7 devices inside my home that are running Linux and I really don't want to know how many computational cycles.

    The future is wonderful.

    (I work for Google. I do not work on any of the products described in this post. Please god do not ask me how to integrate your IoT into any of this)


    It's templates all the way down - part 2

    Posted by Peter Hutterer on June 17, 2020 04:37 AM

    In Part 1 I've shown you how to create your own distribution image using the freedesktop.org CI templates. In Part 2, we'll go a bit further than that by truly embracing nested images.

    Our assumption here is that we have two projects (or jobs), with the second one relying heavily on the first one. For example, the base project and a plugin, or a base project and its language bindings. What we'll get out of this blog post is a setup where we have

    • a base image in the base project
    • an image extending that base image in a different project
    • automatic rebuilds of that extended image when the base image changes
    And none of your contributors have to care about this. It's all handled automatically and filing an MR against a project will build against the right image. So let's get started.

    Our base project has CI that pushes an image to its registry. The .gitlab-ci.yml contains something like this:


    .fedora32:
      variables:
        FDO_DISTRIBUTION_VERSION: '32'
        FDO_DISTRIBUTION_TAG: 'base.0'

    build-img:
      extends:
        - .fedora32
        - .fdo.container-build@fedora
      variables:
        FDO_DISTRIBUTION_PACKAGES: "curl wget"

    This will build a fedora/32:base.0 image in the project's container registry. That image is built once and then re-used by any job extending .fdo.distribution-image@fedora. So far, so Part 1.

    Now, the second project needs to test things on top of this base image, for example Rust language bindings. You want to use the same image that the base project uses (and has successfully completed its CI on) but you need some extra packages or setup. This is where FDO_BASE_IMAGE comes in. In our dependent project, we have this:


    .fedora32:
      variables:
        FDO_DISTRIBUTION_VERSION: '32'
        FDO_DISTRIBUTION_TAG: 'rust.0'

    build-rust-image:
      extends:
        - .fedora32
        - .fdo.container-build@fedora
      variables:
        FDO_BASE_IMAGE: "registry.freedesktop.org/baseproject/name/fedora/32:base.0"
        # extra packages we want to install and things we need to set up
        FDO_DISTRIBUTION_PACKAGES: "rust cargo"
        FDO_DISTRIBUTION_EXEC: "bash -x some-setup-script.sh"

    test-rust:
      extends:
        - .fedora32
        - .fdo.distribution-image@fedora
      script:
        - cargo build -p myproject-bindings

    And voilà, you now have two images: the base image with curl and wget in the base project and an extra image with rust and cargo in the dependent project. And all that is required is to reference the FDO_BASE_IMAGE; everything else is the same. Note how the FDO_BASE_IMAGE is a full path in this example since we assume it's in a different project. For dependent images within the same project, you can just use the image path without the host.

    The dependency only matters while the image is built, after that the dependent image is just another standalone image. So even if the base project removes the base image, you still have yours to test on.

    But eventually you will need to change the base image and you want the dependent image to update as well. The best solution here is to have a CI job as part of the base repo that pokes the dependent repo's CI whenever the base image updates. The CI templates add the pipeline id as a label to an image when it is built. In your base project, you can thus have a job like this:


    poke-dependents:
      extends:
        - .fedora32
        - .fdo.distribution-image@fedora
      image: something-with-skopeo-and-jq
      script:
        # FDO_DISTRIBUTION_IMAGE still has indirections
        - DISTRO_IMAGE=$(eval echo ${FDO_DISTRIBUTION_IMAGE})
        # retrieve info from the registry and extract the pipeline id
        - JSON_IMAGE=$(skopeo inspect docker://$DISTRO_IMAGE)
        - IMAGE_PIPELINE_ID=$(echo $JSON_IMAGE | jq -r '.Labels["fdo.pipeline_id"]')
        - |
          if [[ x"$IMAGE_PIPELINE_ID" == x"$CI_PIPELINE_ID" ]]; then
            curl -X POST \
              -F "token=$AUTH_TOKEN_VALUE" \
              -F "ref=master" \
              -F "variables[SOMEVARIABLE]=somevalue" \
              https://gitlab.freedesktop.org/api/v4/projects/dependent${SLASH}project/trigger/pipeline
          fi
      variables:
        SLASH: "%2F"

    Let's dissect this: First, we use the .fdo.distribution-image@fedora template to get access to FDO_DISTRIBUTION_IMAGE. We don't need to use the actual image though, anything with skopeo and jq will do. Then we fetch the pipeline id label from the image and compare it to the current pipeline ID. If they are the same, our image was rebuilt as part of this pipeline and we poke the other project's pipeline with SOMEVARIABLE set to somevalue. The auth token is a standard GitLab token you need to create to allow triggering the pipeline in the dependent project.

    In that dependent project you can have a job like this:


    rebuild-rust-image:
      extends: build-rust-image
      rules:
        - if: '$SOMEVARIABLE == "somevalue"'
      variables:
        FDO_FORCE_REBUILD: 1

    This job is only triggered when the variable is set, and it will force a rebuild of the container image. If you want custom rebuilds of images, set the variables accordingly.

    So, as promised above, we now have a base image and a separate image building on that, together with auto-rebuild hooks. The gstreamer-plugins-rs project uses this approach. The base image is built by gstreamer-rs during its CI run which then pokes gstreamer-plugins-rs to rebuild selected dependent images.

    The above is most efficient when the base project knows of the dependent projects. Where this is not the case, the dependent project will need a scheduled pipeline to poll the base project and extract the image IDs from that, possibly using creation dates and whatnot. We'll figure that out when we have a use-case for it.

    Firefox on Fedora finally gets VA-API on Wayland.

    Posted by Martin Stransky on June 03, 2020 09:50 AM

    [Video: I used the Toy Story 3 trailer as a test video and saw it a thousand times during the VA-API debugging. I should definitely watch the movie one day.]

    Yes, it’s finally here. One and a half years after Tom Callaway, Engineering Manager @ Red Hat, added the patch to Chromium, we also get hardware-accelerated video playback for Firefox. It’s a shame it took so long, but I’m still learning.

    The VA-API support in Firefox is a bit specific as it works under Wayland only right now. There isn’t any technical reason for that; I just don’t have enough time to implement it for X11, so Bug 1619523 is waiting for brave hackers.

    There are a lot of people who greatly contributed to the Firefox Wayland port. Jan Horak (Red Hat) did all the uneasy Wayland patch reviews I threw at him. Jonas Ådahl (Red Hat) has helped me with the Wayland backend since the first Wayland patch four years ago. Robert Mader faced various Mutter/Gtk compositor bugs, Kenny Levinsen implemented adaptive Wayland vsync handlers, and Jan Andre Ikenmeyer has been tirelessly triaging new Wayland bugs and cleaning up Bugzilla. Sotaro Ikeda (Mozilla) reviewed almost all Wayland patches for the graphics subsystem, Jean-Yves Avenard (Mozilla) reviewed VA-API video patches and Jeff Gilbert (Mozilla) faced my OpenGL Wayland patches.

    The contributor list is not exhaustive as I mentioned only the most active ones who come to mind right now. There are a lot of people who contribute to Firefox/Wayland. You’re the best!

    How to enable it in Fedora?

    When you run a GNOME Wayland session on Fedora you get Firefox with the Wayland backend by default. Make sure you have the latest Firefox 77.0 for Fedora 32 / Fedora 31.

    You also need working VA-API acceleration and the ffmpeg and libva packages. They are provided by the RPM Fusion repository: enable it and install ffmpeg, libva and libva-utils.
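
    A sketch of the setup, using RPM Fusion's documented release-package URL:

    # enable the RPM Fusion free repository and install the VA-API bits
    sudo dnf install \
      https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
    sudo dnf install ffmpeg libva libva-utils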

    Intel graphics card

    There are two drivers for Intel cards: libva-intel-driver (provides i965_drv_video.so) and intel-media-driver (iHD_drv_video.so). Firefox works with libva-intel-driver only; intel-media-driver is broken due to sandboxing issues (Bug 1619585). I strongly recommend avoiding it at all costs, and don’t disable the media sandbox to work around it.

    AMD graphics card

    The AMD open source drivers decode video with the radeonsi_drv_video.so library, which is provided by the mesa-dri-drivers package and comes with Fedora by default.

    NVIDIA graphics cards

    I have no idea how NVIDIA cards are supported because I don’t own any. Please refer to the Fedora VA-API page for details.

    Test VA-API state

    When you have the driver set up it’s time to prove it. Run vainfo in a terminal and check which media formats are decoded in hardware.

    [Screenshot: vainfo output]

    That’s the vainfo output from my laptop with integrated Intel UHD Graphics 630. It loads the i965_drv_video.so driver and decodes the H.264/VP8/VP9 video formats. I don’t expect much more from it – it seems to be working.

    Configure Firefox

    It’s time to whip up the lazy fox 🙂 At about:config set gfx.webrender.enabled and widget.wayland-dmabuf-vaapi.enabled to true. Restart the browser, go to about:support and make sure WebRender is enabled…

    [Screenshot: about:support showing WebRender enabled]

    …and Window Protocol is Wayland/drm.

    [Screenshot: about:support showing Window Protocol wayland/drm]

    Right now you should be able to decode and play clips on your graphics card alone, without the CPU doing the decoding work.

    Get more info from Firefox log

    VA-API video playback may fail for various reasons: an incompatible video codec, a large video size, missing system libraries and so on. All those errors can be diagnosed with the Firefox media log. Run in a terminal

    MOZ_LOG="PlatformDecoderModule:5" MOZ_ENABLE_WAYLAND=1 firefox

    and you should see something like

    [Screenshot: Firefox media log output]

    “VA-API FFmpeg init successful” confirms that VA-API is up and running, VP9 is the video format, and the “Got one VAAPI frame output…” line confirms that frame decoding works.

    VA-API and Youtube

    Unfortunately YouTube tends to serve various video formats, from H.264 to AV1. The actual codec info is shown by right-clicking on the video and choosing the “Stats for nerds” option.

    [Screenshot: YouTube “Stats for nerds” overlay. Surprisingly, “avc1” means H.264 video. You can also expect AV1 and VP8/VP9 there.]

    The YouTube video codec can be changed with the enhanced-h264ify Firefox add-on, so disable all software-decoded formats there. And that’s it. If you’re running Fedora you should be set for now.

    [Video: We’re done, bro!]

    VA-API with stock Mozilla binaries

    Stock Mozilla Firefox 77.0 is missing some important stability/performance VA-API fixes which landed in Firefox 78.0 and are backported to the Fedora Firefox package. You should grab the latest Nightly, Developer Edition or Beta binaries and run them under Wayland as

    MOZ_ENABLE_WAYLAND=1 ./firefox

    Mozilla binaries decode VP8/VP9 with the bundled libvpx library, which is missing the VA-API decode path. If your hardware supports it and you want to use VA-API for VP8/VP9 decoding, you need to disable the bundled libvpx and force external ffmpeg: go to about:config and set media.ffvpx.enabled to false. Fedora sets that by default when VA-API is enabled.

    Disrupted CVE Assignment Process

    Posted by Michael Catanzaro on May 27, 2020 09:00 PM

    Due to an invalid TLS certificate on MITRE’s CVE request form, I have — ironically — been unable to request a new CVE for a TLS certificate verification vulnerability for a couple weeks now. (Note: this vulnerability does not affect WebKit and I’m only aware of one vulnerable application, so impact is limited; follow the link if you’re curious.) MITRE, if you’re reading my blog, your website’s contact form promises a two-day response, but it’s been almost three weeks now, still waiting.

    Update May 29: I received a response today stating my request has been forwarded to MITRE’s IT department, and less than an hour later the issue is now fixed. I guess that’s score +1 for blog posts. Thanks for fixing this, MITRE.

    [Screenshot: browser security warning on MITRE’s CVE request form]

    Of course, the issue is exactly the same as it was five years ago: the server is misconfigured to send only the final server certificate with no chain of trust, guaranteeing failure in Epiphany or with command-line tools. But the site does work in Chrome, and sometimes works in Firefox… what’s going on? Again, same old story. Firefox is accepting incomplete certificate chains based on which websites you’ve visited in the past, so you might be able to get to the CVE request form or not depending on which websites you’ve previously visited in Firefox, but a fresh profile won’t work. Chrome has started downloading the missing intermediate certificate automatically from the issuer, which Firefox refuses to implement for fear of allowing the certificate authority to track which websites you’re visiting. Eventually, we’ll hopefully have this feature in GnuTLS, because Firefox-style nondeterministic certificate verification is nuts and we have to do one or the other to be web-compatible, but for now that is not supported and we reject the certificate. (I fear I may have delayed others from implementing the GnuTLS support by promising to implement it myself and then never delivering… sorry.)
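
    Incidentally, you don’t need a browser to spot this class of misconfiguration. A quick sketch with openssl, pointed at the form’s host: a correctly configured server lists its intermediate certificates too, a broken one only the leaf.

    # print every certificate the server presents during the TLS handshake
    openssl s_client -connect cveform.mitre.org:443 -showcerts < /dev/null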

    We could have a debate on TLS certificate verification and the various benefits or costs of the Firefox vs. Chrome approach, but in the end it’s an obvious misconfiguration and there will be no further CVE requests from me until it’s fixed. (Update May 29: the issue is now fixed. :) No, I’m not bypassing the browser security warning, even though I know exactly what’s wrong. We can’t expect users to take these seriously if we skip them ourselves.

    xisxwayland checks for Xwayland ... or not

    Posted by Peter Hutterer on May 19, 2020 10:30 AM

    One of the more common issues we encounter debugging things is that users don't always know whether they're running on a Wayland or X11 session. Which I guess is a good advertisement for how far some of the compositors have come. The question "are you running on Xorg or Wayland" thus comes up a lot and suggestions previously included things like "run xeyes", "grep xinput list", "check xrandr" and so on and so forth. None of those are particularly scriptable, so there's a new tool around now: xisxwayland.

    Run without arguments, it simply exits with exit code 0 if the X server is Xwayland, or 1 otherwise. Which means you can use it like this:


    $ cat my-xorg-only-script.sh
    #!/bin/bash

    if xisxwayland; then
        echo "This is an Xwayland server!";
        exit 1
    fi

    ...
    Or, in the case where you have a human user (gasp!), you can ask them to run:

    $ xisxwayland --verbose
    Xwayland: YES
    And even non-technical users should be able to interpret that.

    Note that the script checks for Xwayland (hence the name) via the $DISPLAY environment variable, just like any X application. It does not check whether there's a Wayland compositor running, but for most use-cases this doesn't matter anyway. For those where it matters you get to write your own script. Congratulations, I guess.
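
    If you do end up writing that script, a rough starting point is to look for the compositor's socket, assuming the session exports WAYLAND_DISPLAY as usual (a sketch, not equivalent to what xisxwayland does):

    #!/bin/bash
    # rough check for a running Wayland compositor via its socket;
    # xisxwayland instead inspects the X server behind $DISPLAY
    if [ -e "${XDG_RUNTIME_DIR}/${WAYLAND_DISPLAY:-wayland-0}" ]; then
        echo "Found a Wayland compositor socket"
    fi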

    Patching Vendored Rust Dependencies

    Posted by Michael Catanzaro on May 18, 2020 03:56 PM

    Recently I had a difficult time trying to patch a CVE in librsvg. The issue itself was simple to patch because Federico kindly backported the series of commits required to fix it to the branch we are using downstream. Problem was, one of the vendored deps in the old librsvg tarball did not build with our modern rustc, because the code contained a borrow error that was not caught by older versions of rustc. After finding the appropriate upstream fix, I tried naively patching the vendored dep, but that failed because cargo tries very hard to prevent you from patching its dependencies, and complains if the dependency does not match its checksum in Cargo.lock. I tried modifying the checksum in Cargo.lock, but then it complains that you modified the Cargo.lock. It seems cargo is designed to make patching dependencies as difficult as possible, and that not much thought was put into how cargo would be used from rpmbuild with no network access.

    Anyway, it seems the kosher way to patch Rust dependencies is to add a [patch] section to librsvg’s Cargo.toml, but I could not figure out how to make that work. Eventually, I got some help: you can edit the .cargo-checksum.json of the vendored dependency and change “files” to an empty object, like so:

    diff --git a/vendor/cssparser/.cargo-checksum.json b/vendor/cssparser/.cargo-checksum.json
    index 246bb70..713372d 100644
    --- a/vendor/cssparser/.cargo-checksum.json
    +++ b/vendor/cssparser/.cargo-checksum.json
    @@ -1 +1 @@
    -{"files":{".cargo-ok":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",".travis.yml":"f1fb4b65964c81bc1240544267ea334f554ca38ae7a74d57066f4d47d2b5d568","Cargo.toml":"7807f16d417eb1a6ede56cd4ba2da6c5c63e4530289b3f0848f4b154e18eba02","LICENSE":"fab3dd6bdab226f1c08630b1dd917e11fcb4ec5e1e020e2c16f83a0a13863e85","README.md":"c5781e673335f37ed3d7acb119f8ed33efdf6eb75a7094b7da2abe0c3230adb8","build.rs":"b29fc57747f79914d1c2fb541e2bb15a003028bb62751dcb901081ccc174b119","build/match_byte.rs":"2c84b8ca5884347d2007f49aecbd85b4c7582085526e2704399817249996e19b","docs/.nojekyll":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","docs/404.html":"025861f76f8d1f6d67c20ab624c6e418f4f824385e2dd8ad8732c4ea563c6a2e","docs/index.html":"025861f76f8d1f6d67c20ab624c6e418f4f824385e2dd8ad8732c4ea563c6a2e","src/color.rs":"c60f1b0ab7a2a6213e434604ee33f78e7ef74347f325d86d0b9192d8225ae1cc","src/cow_rc_str.rs":"541216f8ef74ee3cc5cbbc1347e5f32ed66588c401851c9a7d68b867aede1de0","src/from_bytes.rs":"331fe63af2123ae3675b61928a69461b5ac77799fff3ce9978c55cf2c558f4ff","src/lib.rs":"46c377e0c9a75780d5cb0bcf4dfb960f0fb2a996a13e7349bb111b9082252233","src/macros.rs":"adb9773c157890381556ea83d7942dcc676f99eea71abbb6afeffee1e3f28960","src/nth.rs":"5c70fb542d1376cddab69922eeb4c05e4fcf8f413f27563a2af50f72a47c8f8c","src/parser.rs":"9ed4aec998221eb2d2ba99db2f9f82a02399fb0c3b8500627f68f5aab872adde","src/rules_and_declarations.rs":"be2c4f3f3bb673d866575b6cb6084f1879dff07356d583ca9a3595f63b7f916f","src/serializer.rs":"4ccfc9b4fe994aab3803662bbf31cc25052a6a39531073a867b14b224afe42dd","src/size_of_tests.rs":"e5f63c8c18721cc3ff7a5407e84f9889ffa10e66da96e8510a696c3e00ad72d5","src/tests.rs":"80b02c80ab0fd580dad9206615c918e0db7dff63dfed0feeedb66f317d24b24b","src/tokenizer.rs":"429b2cba419cf8b923fbcc32d3bd34c0b39284ebfcb9fc29b8eb8643d8d5f312","src/unicode_range.rs":"c1c4ed2493e09d248c526ce1ef8575a5f8258da3962b64ffc814ef3bdf9780d0"},"package":"8a807ac3ab7a217829c2a3b65732b926b2befe6a35f33b4bf8b503692430f223"}
    \ No newline at end of file
    +{"files":{},"package":"8a807ac3ab7a217829c2a3b65732b926b2befe6a35f33b4bf8b503692430f223"}

    Then cargo will stop complaining and you can patch the dependency. Success!
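
    If you have jq around, the same edit can be scripted instead of done by hand. A sketch, using the cssparser path from the example above:

    # blank the per-file checksums so cargo skips verification for this crate
    cd vendor/cssparser
    jq -c '.files = {}' .cargo-checksum.json > tmp.json && mv tmp.json .cargo-checksum.json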

    Converting to encrypted swap

    Posted by Richard Hughes on May 14, 2020 01:03 PM

    I’m working on a firmware platform security specification which we will announce soon. Most of the things we test are firmware protections the user cannot change, but there are some runtime checks we do to make sure we can trust the results from the kernel. For instance, if you load unknown random modules into the kernel (which means it becomes “tainted”) you can’t actually trust the values reported. Another basic sanity check we do is checking for encrypted swap space.

    My Lenovo P50 was installed with Fedora 29ish, a long time ago, with encrypted /home and unencrypted swap. It’s been upgraded quite a few times and I’m not super keen on re-installing it now. I wanted to upgrade to encrypted swap so I could pass the same requirements that I’m going to be asking people to meet.

    Please don’t just copy and paste the below, as you will have a different swap partition to me. If you choose the wrong partition you will either overwrite your data or your root partition, so be careful. Caveat emptor, and all that.

    So, let’s get started. Let’s turn off the existing swap partition:

    [root@localhost ~]# cat /proc/swaps
    Filename				Type		Size	Used	Priority
    /dev/nvme0n1p4                          partition	5962748	0	-2
    [root@localhost ~]# swapoff /dev/nvme0n1p4
    

    Let’s overwrite the existing partition with zeros, as it might have data that we’d consider private:

    dd if=/dev/zero of=/dev/nvme0n1p4 bs=102400
    

    We then need to change /etc/fstab from

    # Created by anaconda on Mon Dec  9 09:05:10 2019
    ...
    UUID=97498951-0a49-4110-b838-dd90d02ea11f none                    swap    defaults        0 0
    ...
    

    to

    ...
    /dev/mapper/swap                          none                    swap    defaults        0 0    
    ...
    

    We then need to append to /etc/crypttab:

    swap /dev/nvme0n1p4 /dev/urandom swap,cipher=aes-cbc-essiv:sha256,size=256
    

    Reboot, and cat /proc/swaps will then show your swap on a dm device. Done!
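
    If you’d rather not just eyeball /proc/swaps, a couple of extra commands confirm the mapping after the reboot (the name matches the crypttab entry above):

    # the swap should now be a dm-crypt device keyed from /dev/urandom
    swapon --show
    cryptsetup status swap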

    Wayland doesn't support $BLAH

    Posted by Peter Hutterer on May 08, 2020 12:40 AM

    Gather round children, it's analogy time! First, some definitions:

    • Wayland is a protocol to define the communication between a display server (the "compositor") and a client, i.e. an application though the actual communication is usually done by a toolkit like GTK or Qt.
    • A Wayland Compositor is an implementation of a display server that (usually but not necessarily) handles things like displaying stuff on screen and handling your input devices, among many other things. Common examples for Wayland Compositors are GNOME's mutter, KDE's KWin, weston, sway, etc.[1]

    And now for the definitions we need for our analogy:

    • HTTP is a protocol to define the communication between a web server and a client (usually called the "browser")
    • A Web Browser is an implementation that (sometimes but not usually) handles things like displaying web sites correctly, among many other things. Common examples for Web Browsers are Firefox, Chrome, Edge, Safari, etc. [2]

    And now for the analogy:

    The following complaints are technically correct but otherwise rather pointless to make:

    • HTTP doesn't support CSS
    • HTTP doesn't support adblocking
    • HTTP doesn't render this or that website correctly
    And much in the same style, the following complaints are technically correct but otherwise rather pointless to make:
    • Wayland doesn't support keyboard shortcuts
    • Wayland doesn't support screen sharing
    • Wayland doesn't render this or that application correctly
    In most of the cases you may encounter (online or on your setup), saying "Wayland doesn't support $BLAH" is like saying "HTTP doesn't support $BLAH". The problem isn't with Wayland itself, it's a missing piece or a bug in the compositor you're using.

    Likewise, saying "I don't like Wayland" is like saying "I don't like HTTP". The vast majority of users will have negative feelings towards the browser, not the transport protocol.

    [1] Because they're implementations of a display server they can speak multiple protocols and some compositors can also be X11 window managers, much in the same way as you can switch between English and your native language(s).
    [2] Because they're implementations of a web browser they can speak multiple protocols and some browsers can also be FTP file managers, much in the same way as... you get the point