May 16, 2022

v3dv status update 2022-05-16

We haven’t posted updates to the work done on the V3DV driver since
we announced the driver becoming Vulkan 1.1 Conformant.

But after reaching that milestone, we’ve been very busy working on more improvements, so let’s summarize the work done since then.

Multisync support

As mentioned in past posts, for the Vulkan driver we tried to focus as much as possible on the userspace part. So we tried to re-use the already existing kernel interface that we had for V3D, used by the OpenGL driver, without modifying or extending it.

This worked fine in general, except for synchronization. The V3D kernel interface only supported one synchronization object per submission. This didn’t map properly to Vulkan synchronization, which is more detailed and complex and allows defining several semaphores/fences. We initially handled the situation with workarounds, and left some optional features as unsupported.

After our 1.1 conformance work, our colleague Melissa Wen started to work on adding support for multiple semaphores on the V3D kernel side. Then she also implemented the changes on V3DV to use this new feature. If you want more technical info, she wrote a very detailed explanation on her blog (part1 and part2).

For now the driver has two codepaths that are used depending on whether the kernel supports this new feature or not. That also means that, depending on the kernel, the V3DV driver could expose a slightly different set of supported features.

More common code – Migration to the common synchronization framework

For a while, Mesa developers have been making a great effort to refactor and move common functionality to a single place, so it can be used by all drivers, reducing the amount of code each driver needs to maintain.

During these months we have been porting V3DV to some of that infrastructure, from small bits (common VkShaderModule to NIR code), to a really big one: common synchronization framework.

As mentioned, the Vulkan synchronization model is really detailed and powerful. But that also means it is complex. V3DV support for Vulkan synchronization included heavy use of threads. For example, V3DV needed to rely on a CPU wait (polling with threads) to implement vkCmdWaitEvents, as the GPU lacked a mechanism for this.

This was common to several drivers. So at some point there were multiple versions of complex synchronization code, one per driver. But, some months ago, Jason Ekstrand refactored Anvil support and collaborated with other driver developers to create a common framework. Obviously each driver would have their own needs, but the framework provides enough hooks for that.

After some GitLab and IRC chats, Jason provided a merge request with the port of V3DV to this new common framework, which we iterated on and tested through the review process.

Also, with this port we got timeline semaphore support for free. Thanks to this change, we have ~1.2k fewer total lines of code (and more features!).

Again, we want to thank Jason Ekstrand for all his help.

Support for more extensions:

Since the 1.1 conformance announcement, the following extensions have been implemented and exposed:

  • VK_EXT_debug_utils
  • VK_KHR_timeline_semaphore
  • VK_KHR_create_renderpass2
  • VK_EXT_4444_formats
  • VK_KHR_driver_properties
  • VK_KHR_16bit_storage and VK_KHR_8bit_storage
  • VK_KHR_imageless_framebuffer
  • VK_KHR_depth_stencil_resolve
  • VK_EXT_image_drm_format_modifier
  • VK_EXT_line_rasterization
  • VK_EXT_inline_uniform_block
  • VK_EXT_separate_stencil_usage
  • VK_KHR_separate_depth_stencil_layouts
  • VK_KHR_pipeline_executable_properties
  • VK_KHR_shader_float_controls
  • VK_KHR_spirv_1_4

If you want more details about VK_KHR_pipeline_executable_properties, Iago recently wrote a blog post about it (here).

Android support

Android support for V3DV was added thanks to the work of Roman Stratiienko, who implemented this and submitted Mesa patches. We also want to thank the Android RPi team, and the Lineage RPi maintainer (Konsta), who also created and tested an initial version of that support, which was used as the baseline for the code that Roman submitted. I haven’t tested it myself (it’s on my personal TO-DO list), but LineageOS images for the RPi4 are already available.

Performance

In addition to new functionality, we have also been working on improving performance. Most of the focus was on the V3D shader compiler, as improvements there are shared between the OpenGL and Vulkan drivers.

But one of the features specific to the Vulkan driver (a port to OpenGL is still pending) is double-buffer mode, which is only available when MSAA is not enabled. This mode splits the tile buffer size in half, so the driver can start processing the next tile while the current one is being stored to memory.

In theory this could improve performance by reducing tile-store overhead, so it would be most beneficial when vertex/geometry shaders aren’t too expensive. However, it comes at the cost of reducing the tile size, which also causes some overhead on its own.

Testing shows that this helps in some cases (e.g. the Vulkan Quake ports) but hurts in others (e.g. Unreal Engine 4), so for the time being we don’t enable it by default. It can be enabled selectively by adding V3D_DEBUG=db to the environment variables. The idea for the future is to implement a heuristic that decides when to activate this mode.

FOSDEM 2022

If you are interested in watching an overview of the improvements and changes to the driver during the last year, we gave a presentation at FOSDEM 2022: “v3dv: Status Update for Open Source Vulkan Driver for Raspberry Pi 4”.

Can we fix bearer tokens?

Last month I wrote about how bearer tokens are just awful, and a week later Github announced that someone had managed to exfiltrate bearer tokens from Heroku that gave them access to, well, a lot of Github repositories. This has inevitably resulted in a whole bunch of discussion about a number of things, but people seem to be largely ignoring the fundamental issue that maybe we just shouldn't have magical blobs that grant you access to basically everything even if you've copied them from a legitimate holder to Honest John's Totally Legitimate API Consumer.

To make it clearer what the problem is here, let's use an analogy. You have a safety deposit box. To gain access to it, you simply need to be able to open it with a key you were given. Anyone who turns up with the key can open the box and do whatever they want with the contents. Unfortunately, the key is extremely easy to copy - anyone who is able to get hold of your keyring for a moment is in a position to duplicate it, and then they have access to the box. Wouldn't it be better if something could be done to ensure that whoever showed up with a working key was someone who was actually authorised to have that key?

To achieve that we need some way to verify the identity of the person holding the key. In the physical world we have a range of ways to achieve this, from simply checking whether someone has a piece of ID that associates them with the safety deposit box all the way up to invasive biometric measurements that supposedly verify that they're definitely the same person. But computers don't have passports or fingerprints, so we need another way to identify them.

When you open a browser and try to connect to your bank, the bank's website provides a TLS certificate that lets your browser know that you're talking to your bank instead of someone pretending to be your bank. The spec allows this to be a bi-directional transaction - you can also prove your identity to the remote website. This is referred to as "mutual TLS", or mTLS, and a successful mTLS transaction ends up with both ends knowing who they're talking to, as long as they have a reason to trust the certificate they were presented with.

That's actually a pretty big constraint! We have a reasonable model for the server - it's something that's issued by a trusted third party and it's tied to the DNS name for the server in question. Clients don't tend to have stable DNS identity, and that makes the entire thing sort of awkward. But, thankfully, maybe we don't need to? We don't need the client to be able to prove its identity to arbitrary third party sites here - we just need the client to be able to prove it's a legitimate holder of whichever bearer token it's presenting to that site. And that's a much easier problem.

Here's the simple solution - clients generate a TLS cert. This can be self-signed, because all we want to do here is be able to verify whether the machine talking to us is the same one that had a token issued to it. The client contacts a service that's going to give it a bearer token. The service requests mTLS auth without being picky about the certificate that's presented. The service embeds a hash of that certificate in the token before handing it back to the client. Whenever the client presents that token to any other service, the service ensures that the mTLS cert the client presented matches the hash in the bearer token. Copy the token without copying the mTLS certificate and the token gets rejected. Hurrah hurrah hats for everyone.
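To make the binding step concrete, here is a minimal sketch of the check a service could perform when a token is presented, written in C with GLib purely for brevity. It assumes the service already has the DER bytes of the client certificate from the mTLS handshake and the digest string it embedded in the token at issuance (stored here as lowercase hex; RFC 8705 itself uses a base64url-encoded cnf/x5t#S256 claim). The function name and parameters are hypothetical, not any particular library's API.

#include <glib.h>

/* Does the certificate presented over mTLS match the hash that was
 * baked into the bearer token when it was issued?
 *
 * cert_der/cert_len : raw DER bytes of the client certificate, as
 *                     obtained from your TLS stack.
 * bound_hash_hex    : lowercase hex SHA-256 digest extracted from the
 *                     token's binding claim.
 */
gboolean
token_matches_presented_cert (const guchar *cert_der,
                              gsize         cert_len,
                              const char   *bound_hash_hex)
{
  g_autofree char *presented_hash =
      g_compute_checksum_for_data (G_CHECKSUM_SHA256, cert_der, cert_len);

  /* A constant-time comparison would be preferable in production;
   * g_strcmp0 keeps the sketch short. */
  return g_strcmp0 (presented_hash, bound_hash_hex) == 0;
}

Copy the token to another machine without the private key for that certificate and the mTLS handshake (and therefore this check) fails, which is the whole point.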

Well except for the obvious problem that if you're in a position to exfiltrate the bearer tokens you can probably just steal the client certificates and keys as well, and now you can pretend to be the original client and this is not adding much additional security. Fortunately pretty much everything we care about has the ability to store the private half of an asymmetric key in hardware (TPMs on Linux and Windows systems, the Secure Enclave on Macs and iPhones, either a piece of magical hardware or Trustzone on Android) in a way that avoids anyone being able to just steal the key.

How do we know that the key is actually in hardware? Here's the fun bit - it doesn't matter. If you're issuing a bearer token to a system then you're already asserting that the system is trusted. If the system is lying to you about whether or not the key it's presenting is hardware-backed then you've already lost. If it lied and the system is later compromised then sure all your apes get stolen, but maybe don't run systems that lie and avoid that situation as a result?

Anyway. This is covered in RFC 8705 so why aren't we all doing this already? From the client side, the largest generic issue is that TPMs are astonishingly slow in comparison to doing a TLS handshake on the CPU. RSA signing operations on TPMs can take around half a second, which doesn't sound too bad, except your browser is probably establishing multiple TLS connections to subdomains on the site it's connecting to and performance is going to tank. Fixing this involves doing whatever's necessary to convince the browser to pipe everything over a single TLS connection, and that's just not really where the web is right at the moment. Using EC keys instead helps a lot (~0.1 seconds per signature on modern TPMs), but it's still going to be a bottleneck.

The other problem, of course, is that ecosystem support for hardware-backed certificates is just awful. Windows lets you stick them into the standard platform certificate store, but the docs for this are hidden in a random PDF in a Github repo. Macs require you to do some weird bridging between the Secure Enclave API and the keychain API. Linux? Well, the standard answer is to do PKCS#11, and I have literally never met anybody who likes PKCS#11 and I have spent a bunch of time in standards meetings with the sort of people you might expect to like PKCS#11 and even they don't like it. It turns out that loading a bunch of random C bullshit that has strong feelings about function pointers into your security critical process is not necessarily something that is going to improve your quality of life, so instead you should use something like this and just have enough C to bridge to a language that isn't secretly plotting to kill your pets the moment you turn your back.

And, uh, obviously none of this matters at all unless people actually support it. Github has no support at all for validating the identity of whoever holds a bearer token. Most issuers of bearer tokens have no support for embedding holder identity into the token. This is not good! As of last week, all three of the big cloud providers support virtualised TPMs in their VMs - we should be running CI on systems that can do that, and tying any issued tokens to the VMs that are supposed to be making use of them.

So sure, this isn't trivial. But it's also not impossible, and making this stuff work would improve the security of, well, everything. We literally have the technology to prevent attacks like the one Github suffered. What do we have to do to get people to actually start working on implementing that?


May 15, 2022

Pika Backup 0.4 Released with Schedule Support

Pika Backup is an app focused on backups of personal data. It’s internally based on BorgBackup and provides fast incremental backups.

Pika Backup version 0.4 has been released today. This release wraps up a year of development work. After the huge jump to supporting scheduled backups and moving to GTK 4 and Libadwaita, I am planning to go back to slightly more frequent and smaller releases. At the same time, well-tested and reliable releases will remain a priority of the project.

Release Overview

The release contains 41 resolved issues, 27 changelog entries, and a codebase that, despite many cleanups, nearly doubled in size. Here is a short rundown of the changes.

      • Ability to schedule regular backups.
      • Support for deleting old archives.
      • Revamped graphical interface including a new app icon.
      • Better compression and performance for backups.
      • Several smaller issues rectified.

You can find a more complete list of changes in Pika’s Changelog.

Thanks!

Pika Backup’s backbone is the BorgBackup software. If you want to support them you can check out their funding page. The money will be well spent.

A huge thank you to everyone who helped with this release. Especially Alexander, Fina, Zander, but also everyone else, the borg maintainers, the translators, the beta testers, people who reported issues or contributed ideas, and last but not least, all the people that gave me so much positive and encouraging feedback!

Resources

May 14, 2022

Crosswords 0.3

I’m pleased to announce Crosswords 0.3. This is the first version that feels ready for public consumption. Unlike the version I announced five months ago, it is much more robust and has some key new features.

New in this version:

  • Available on flathub: After working on it out of the source tree for a long while, I finally got the flatpaks working. Download it and try it out—let me know what you think! I’d really appreciate any sort of feedback, positive or negative.

  • Puzzle Downloaders: This is a big addition. We now support external downloaders and puzzle sets. This means it’s possible to have a tool that downloads puzzles from the internet. It also lets us ship sets of puzzles for the user to solve. With a good set of puzzles, we could even host them on gnome.org. These can be distributed externally, or they can be distributed via flatpak extensions (if not installed locally). I wrapped xword-dl and puzzlepull with a downloader to add some newspaper downloaders, and I’m looking to add more original puzzles shortly.

  • Dark mode: I thought that this was the neatest feature in GNOME 42. We now support both light and dark mode and honor the system setting. CSS is also heavily used to style the app, allowing for future visual modifications and customizations. I’m interested in allowing CSS customization on a per-puzzle-set basis.

  • Hint Button: This is a late addition. It can be used to suggest a random word that fits in the current row. It’s not super convenient, but I also don’t want to make the game too easy! We use Peter Broda’s wordlist as the basis for this.
  • .puz support: Internally we use the unencumbered .ipuz format. This is a pretty decent format and supports a wide variety of crossword features. But there are a lot of crosswords out there that use the .puz format, and I know people have large collections of puzzles in that format. I wrote a simple converter to load these.

Next steps

I hope to release this a bit more frequently now that we have gotten to this stage. Next on my immediate list:

  • Revisit clue formats; empirically, crossword files in the wild play a little fast-and-loose with escaping and formatting (e.g. random entities and underline-escaping).
  • Write a Puzzle Set manager that will let you decide which downloaders/puzzles to show, as well as launch gnome-software to search for more.
  • Document the external Puzzle Set format to allow others to package up games.
  • Fix any bugs that are found.

I also plan to work on the Crossword editor and get that ready for a release on flathub. The amazing-looking AdwEntryRow should make parts of the design a lot cleaner.

But, no game is good without being fun! I am looking to expand the list of puzzles available. Let me know if you write crosswords and want to include them in a puzzle set.

Thanks

I couldn’t have gotten this release out without lots of help. In particular:

  • Federico, for helping to refactor the main code and lots of great advice
  • Nick, for explaining how to get apps into flathub
  • Alexander, for his help with getting toasts working with a new behavior
  • Parker, for patiently explaining to me how the world of Crosswords worked
  • The folks on #gtk for answering questions and producing an excellent toolkit
  • And most importantly Rosanna, for testing everything and for her consistent cheering and support

Download on FLATHUB

Voice

Voice is a new public voice communication application being built on GNOME 42.

Voice will let you listen to and share short, personal and enjoyable Voicegrams from GNOME executives, employees and volunteers via electronic mail and on the World Wide Web. Xiph.org’s Ogg Vorbis is a patent-free audio codec that more and more Free Software programs, including GNOME Voice (https://www.gnomevoice.org/), have implemented, so you can listen to Voicegram recordings with good/fair recording quality by opening the Voicegram file $HOME/Music/GNOME.ogg in the G_USER_DIRECTORY_MUSIC folder with Evolution or Nautilus.

Currently it records sound waves from the live microphone into $HOME/Music/GNOME.ogg (or $HOME/Musikk/GNOME.ogg on Norwegian bokmål systems) and plays back an audio stream from api.perceptron.stream:8000/56.ogg simultaneously on GNOME 42.
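As a side note, the reason the file lands in Music or Musikk is GLib's XDG special-directory lookup. Here is a minimal sketch of building that path in C (the file name is simply the one mentioned above, not a fixed GLib convention):

#include <glib.h>

int
main (void)
{
  /* Resolves to e.g. /home/user/Music, or /home/user/Musikk on a
   * Norwegian bokmål system, following the xdg-user-dirs settings. */
  const char *music_dir = g_get_user_special_dir (G_USER_DIRECTORY_MUSIC);

  if (music_dir == NULL)
    music_dir = g_get_home_dir ();

  /* "GNOME.ogg" is the file name used by Voice as described above. */
  g_autofree char *voicegram = g_build_filename (music_dir, "GNOME.ogg", NULL);

  g_print ("Voicegram path: %s\n", voicegram);

  return 0;
}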

The second Voice 0.0.2 release with live microphone recording into $HOME/Music/GNOME.ogg and a concert experience with Sondre Lerche (Honolulu, Hawaii) and presenter Neil McGovern (Executive Director, GNOME Foundation) is available from https://download.gnome.org/sources/gnome-voice/0.0/gnome-voice-0.0.2.tar.xz

More information about Voice is available on https://wiki.gnome.org/Apps/Voice and http://www.gnomevoice.org/

Builder GTK 4 Porting, Part IV

This week was a little slower as I was struggling with an adjustment to my new medication. Things progress nonetheless.

Text Editor

I spent a little time this week triaging some incoming Text Editor issues and feature requests. I’d really like this application to get into maintenance mode soon because I have plenty of other projects to maintain.

  • Added support for gnome-text-editor - to open a file from standard input, even when communicating with a single-instance application from the terminal.
  • Branch GNOME 42 so we can add new strings.
  • Fix a no-data-loss crash during shutdown.

Template-GLib

  • Fix template evaluation on macOS.
  • Make boolean expression precedence more predictable.
  • Cleanup output of templates with regards to newlines.

libpanel

  • Propagate modified page state to tabs
  • Some action tweaks to make things more keyboard shortcut friendly.

Builder

  • Merged support for configuration editing from Georges.
  • Add lots of keybindings using our new keybinding engine.
  • Track down and triage an issue where shortcut controllers do not capture/bubble to popovers. Added workarounds for common popovers in Builder.
  • Teach Builder to load keybindings from plugins again and auto-manage them.
  • Lots of tweaks to the debugger UI and where widgetry is placed.
  • Added syntax highlighting for debugger disassembly.
  • Added menus and toggles for various logging and debugger hooks. You can get a breakpoint on g_warning() or g_critical() by checking a box.
  • Finally added the ability to select a build target as the default build target.
  • More menuing fixes all over the place, particularly with treeviews and sourceviews.
  • Fix keyboard navigation and activation for the new symbol-tree
  • Port the find-other-file plugin to the new workspace design which no longer requires using global search.
  • GTK 4 doesn’t appear to scroll to cells in textviews as reliably as I’d like, so I dropped the animation code in Builder and we jump straight to the target before showing popovers.
  • Various work on per-language settings overrides by project.
  • Drop the Rust rls plugin as we can pretty much just rely on rust-analyzer now.
  • Lots of CSS tweaks to make things fit a bit better with upcoming GNOME styling.
  • Fix broken dialog which prevented SDK updates from occurring with other dependencies.

A screenshot of Builder's find-other-file plugin

A screenshot of Builder's debugger

A screenshot showing the build target selection dialog

A screenshot of the run menu A screenshot of the logging menu

May 13, 2022

Trying out systemd’s Portable Services

I recently acquired a monome grid, a set of futuristic flashing buttons which can be used for controlling software, making music, and/or playing the popular 90’s game Lights Out.

Lights Out game

There’s no sound from the device itself; all it outputs is a USB serial connection. Software instruments connect to the grid to receive button presses and control the lights via the widely supported Open Sound Control protocol. I am using monome-rs to convert the grid signals into MIDI, send them to Bitwig Studio and make interesting noises, which I am excited to share with you in the future, but first we need to talk about software packaging.

Monome provide a system service named serialosc, which connects to the grid hardware (over USB serial) and provides the Open Sound Control endpoint. This program is not packaged by Linux distributions, and that is fine; it’s rather niche hardware and distro maintainers shouldn’t have to support every last weird device. On the other hand, it’s rather crude to build it from source myself, install it into /usr/local, add a system service, etc. etc. Is there a better way?

First I tried bundling serialosc along with my control app using Flatpak, which is semi-possible – you can build and run the service, but it can’t see the device unless you set the super-insecure “--device=all” mode, and it still can’t detect the device because udev is not available, so you would have to hardcode the device node /dev/ttyACM0 and hotplug no longer works … basically this is not a good option.

Then I read about systemd’s Portable Services. This is a way to ship system services in containers, which sounds like the correct way to treat something like serialosc. So I followed the portable walkthrough and within a couple of hours the job was done: here’s a PR that could add Portable Services packaging upstream: https://github.com/monome/serialosc/pull/62

I really like the concept here, it has some pretty clear advantages as a way to distribute system services:

  • Upstreams can deliver Linux binaries of services in a (relatively) distro-independent way
  • It works on immutable-/usr systems like Fedora Silverblue and Endless OS
  • It encourages containerized builds, which can lower the barrier for developing local improvements.

This is quite a new technology and I have mixed feelings about the current implementation. Firstly, when I went to screenshot the service for this blog post, I discovered it had broken:

screenshot of terminal showing "Failed at step EXEC - permission denied"

Fixing the error was a matter of – disabling SELinux. On my Fedora 35 machine the SELinux profile seems totally unready for Portable Services, as evidenced in this bug I reported, and this similar bug someone else reported a year ago which was closed without fixing. I accept that you get a great OS like Fedora for free in return for being a beta tester of SELinux, but this suggests portable services are not yet ready for widespread use.

That’s better:

I used the recommended mkosi build to create the serialosc container, which worked as documented and was pretty quick. All mkosi operations have to run as root. This is unfortunate and also interacts badly with the Git safe.directory feature (the source trees are owned by me, not root, so Git’s default config raises an error).

It’s a little surprising the standard OCI container format isn’t supported, only disk images or filesystem trees. Initially I built a 27MB squashfs, but this didn’t work as the container rootfs has to be read/write (another surprise), so I deployed a tree of files in the end. The service container is Fedora-based and comes out at 76MB across 5,200 files – that’s significant bloat around a service implemented in a few thousand lines of C code. If mkosi supported Alpine Linux as a base OS we could likely reduce this overhead significantly.

The build/test workflow could be optimised but is already workable, the following steps take about 30 seconds with a warm mkosi.cache directory:

  • sudo mkosi -t directory -o /opt/serialosc --force
  • sudo portablectl reattach --enable --now /opt/serialosc --profile trusted

We are using the “trusted” profile because the service needs access to udev and /dev, by the way.

All in all, the core pieces are already in place for a very promising new technology that should make it easier for 3rd parties to provide Linux system-level software in a safe and convenient way, well done to the systemd team for a well executed concept. All it lacks is some polish around the tooling and integration.

Maps Spring Cleaning

Thought it was time to share some news on Maps again.


After the 42.0 release I have been putting in some time to do some spring cleaning in Maps, to slim down the code a little bit.

This would also mean less stuff to care about later on when porting to GTK4.

First we have the “no network” view.

We used to use the network monitor functionality from GIO to determine whether the machine has a usable connection to the public internet, to avoid showing incomplete map tile data (where some parts might be cached from earlier while others are missing).
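For context, the removed check was conceptually along these lines; this is a minimal GIO sketch in C, not the actual Maps code (Maps itself is written in GJS):

#include <gio/gio.h>

/* Rough equivalent of the old "no network" check: ask GIO's network
 * monitor (typically backed by NetworkManager) whether we appear to
 * have full connectivity before showing map tiles. */
gboolean
looks_online (void)
{
  GNetworkMonitor *monitor = g_network_monitor_get_default ();

  return g_network_monitor_get_network_available (monitor) &&
         g_network_monitor_get_connectivity (monitor) == G_NETWORK_CONNECTIVITY_FULL;
}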

Unfortunately there have been issues with some setups not playing well with NetworkManager (such as some third-party VPN software), so we have had several bug reports around this over the years.

At one point we even added a CLI option to bypass the network checking as a sort of “workaround”. But now we have decided to remove this (along with some widgetry) and just rely on the user having a working connection. The search and route functionality should still behave well and show error messages when they are unable to fetch results over the connection.

Moreover, we dropped the dependency on GOA (gnome-online-accounts) and the remaining support for user check-in using Foursquare, as this has been pretty flaky (possibly due to quota issues, or something like that). Facebook support was already removed a few releases ago (and prior to that, logging in to Facebook using GOA hadn't been working for many years).

Next thing is the process for signing up with an OpenStreetMap account for editing points-of-interest.

Previously we had an implementation (which, by the way, was my first large contribution to Maps back in 2015) based on a literal translation from Java to JS of the “signpost” library used by JOSM, which basically “programmatically runs” the HTML forms for requesting access to an OAuth access token when signing in with the supplied user name and password. It then presents the verification code using a WebKit web view.

This had a few problems: it involves handling passwords inside the stand-alone application, which goes against best practices for OAuth. Furthermore, it implies a dependency on WebKitGTK (the GTK port of WebKit), which is yet another dependency that needs porting to GTK 4.

So now, with the new implementation, we “divert” to the user's default browser to present the login (or, if they're already logged in to OSM in the browser session via cookies, they will be prompted directly to grant access to the “GNOME Maps” application without giving credentials again). This implementation is pretty similar to how Cawbird handles signing in to Twitter.
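The “divert to the browser” step essentially boils down to handing the OAuth authorization URL to the desktop's default URI handler. Here is a minimal C sketch of that idea (Maps is GJS, and the URL below is a placeholder rather than the exact OSM endpoint):

#include <gio/gio.h>

int
main (void)
{
  g_autoptr (GError) error = NULL;

  /* Placeholder authorization URL; the real request also carries the
   * client id, redirect URI, scope, etc. */
  const char *auth_url = "https://www.openstreetmap.org/oauth2/authorize";

  /* Opens the user's default browser, where they log in (or reuse an
   * existing session cookie) and grant access to the application. */
  if (!g_app_info_launch_default_for_uri (auth_url, NULL, &error))
    g_printerr ("Failed to open browser: %s\n", error->message);

  return 0;
}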


There are also some new visual updates.

The search results now have icons for additional place types, such as fast food joints, theaters, dog parks, and drink shops. I did some scouting around in the GNOME icon library 😄

Also, as a continuation of one of the last features added for 42.0, where Maps remembers whether the scale was shown or hidden the last time you ran it: I realized the feature for hiding the scale is itself a bit hidden (you need to look for it in the keyboard shortcuts window), and it is also not accessible on a touch-only device (such as the PinePhone). So I added a checkbox for it under the layers menu (where I think it fits in).


And that's about it for this time.

Next time, I think there will be some more under-the-hood changes.

#43 Foundation News

Update on what happened across the GNOME project in the week from May 06 to May 13.

GNOME Foundation

Thib reports

The GNOME Foundation’s Board of Directors would like to inform the members of the GNOME Foundation that regrettably, Alberto Ruiz submitted his resignation from the Board of Directors for a combination of work and personal reasons.

We are pleased to announce that we have appointed Martín Abente Lahaye to serve on the Board as per the bylaws. Martín is an active community member, the author of Flatseal and Portfolio, and has experience promoting and deploying FOSS in educational and impact settings through his previous work with One Laptop Per Child Paraguay, Sugar Labs and Endless. We believe Martín is a great asset to help us implement our roadmap and are excited to start working together.

Full details in this Discourse post. Welcome aboard, Martín Abente Lahaye!

Thib says

The GNOME Foundation’s Board of Directors has been working on a high-level roadmap to give the Foundation a clear direction. Rob posted the three initiatives that will keep us busy. Full details in the post, but the strategy is to:

  1. Make the newcomers initiative more sustainable, with paid contributors polishing the documentation and welcoming people on our platform
  2. Make GNOME an appealing platform to develop for, by making Flathub support payments
  3. Make GNOME a local-first platform to empower individuals by reducing their dependency on the cloud and more generally the Internet

In short: funnel more people to contribute to the GNOME project, make them able to earn money from the skills they developed, and allow them to create apps that have a strong impact on the world.

Thib reports

The GNOME Foundation’s Board of Directors has proposed an amendment to the bylaws of the Foundation to allow a limited number of non-members to run for the Board of Directors if they are backed by enough Foundation members.

The rationale behind this amendment is that our roadmap is about getting more people from outside our community to understand and support the GNOME Project. The initiatives we work on are ambitious and need to be funded properly to have the full impact we want them to have. Fundraising is not something most of the current board and Foundation members are very experienced with.

Given the initial feedback and concerns some members have raised, the Board has decided to submit this amendment to a full vote. If you are a member of the Foundation, you should have received instructions to vote. If you haven’t, please reach out to Andrea Veri.

In short: we want to open the Board of Directors to people with different skills than the ones we have today, to maximise our chances of making GNOME useful for more people, with proper safeguards so it’s still representative of the Foundation members.

Core Apps and Libraries

Libadwaita

Building blocks for modern GNOME apps using GTK4.

Alexander Mikhaylenko says

Libadwaita has gained AdwPropertyAnimationTarget for animating object properties in addition to AdwCallbackAnimationTarget:

/* Target the widget's "opacity" property. */
AdwAnimationTarget *target = adw_property_animation_target_new (G_OBJECT (widget), "opacity");

/* Animate the target's value from 0 to 1 over 250 ms, then start it. */
g_autoptr (AdwAnimation) animation = adw_timed_animation_new (widget, 0, 1, 250, target);

adw_animation_play (animation);

GTK

Cross-platform widget toolkit for creating graphical user interfaces.

Ivan Molodetskikh reports

GTK 4 has seen a few performance improvements to ListView and ColumnView scrolling. Additionally, a fix for unstable FPS after opening a popup menu has finally been merged. Both of these are already available on the gnome-nightly Flatpak runtime and should be included in GTK 4.6.4.

Software

Lets you install and update applications and system extensions.

Philip Withnall announces

Basic webapps support has just landed in gnome-software, the culmination of a project by Phaedrus Leeds

Third Party Projects

flxzt says

Last week I released Rnote v0.5. It now features an additional eraser mode and new shape types such as ellipses from foci and Bézier curves. Additionally, the undo/redo system was overhauled, handwriting latency was reduced, and you can now enable automatic saving. There were also some UI tweaks, stability improvements, and a lot of internal code refactoring.

Fina announces

I released the first version of Warp, a one-to-one file transfer application. Warp allows you to securely send files to one another via the Internet or a local network by exchanging a word-based code.

To try it out visit https://flathub.org/apps/details/app.drey.Warp

Workbench

A sandbox to learn and prototype with GNOME technologies.

sonnyp reports

Blueprint support has landed in Workbench. This new markup language from James Westman not only provides a much nicer experience for hand-writing GTK UI definitions, but also comes with a language server which will allow us to present educational hints. GNOME Builder already supports the Blueprint language server – definitely worth checking out.

sonnyp announces

lwildberg made great progress on Vala support in Workbench. The sub-process can now compile, run, preview and update! We are making the architecture easy to use for additional language support. Stay tuned.

That’s all for this week!

See you next week, and be sure to stop by #thisweek:gnome.org with updates on your own projects!

May 11, 2022

Why is the open source driver release from NVIDIA so important for Linux?

Background
Today NVIDIA announced that they are releasing an open source kernel driver for their GPUs, so I want to share with you some background information and how I think this will impact Linux graphics and compute going forward.

One thing many people are not aware of is that Red Hat is the only Linux OS company that has a strong presence in the Linux compute and graphics engineering space. There are of course a lot of other people working in the space too, like engineers working for Intel, AMD and NVIDIA, people working for consultancy companies like Collabora, or individual community members, but Red Hat as an OS integration company has been very active in trying to ensure we have a maintainable and shared upstream open source stack. This engineering presence is also what has allowed us to move important technologies forward, like getting HiDPI support for Linux some years ago, or working with NVIDIA to get glvnd implemented to remove a pain point for our users, since the original OpenGL design only allowed for one OpenGL implementation to be installed at a time. We see ourselves as the open source community’s partner here, fighting to keep the Linux graphics stack coherent and maintainable, and as a partner for the hardware OEMs to work with when they need help pushing major new initiatives around GPUs for Linux forward. And as the only Linux vendor with a significant engineering footprint in GPUs, we have been working closely with NVIDIA. People like Kevin Martin, the manager for our GPU technologies team; Ben Skeggs, the maintainer of Nouveau; Dave Airlie, the upstream kernel maintainer for the graphics subsystem; Nouveau contributor Karol Herbst; and our accelerator lead Tom Rix have all taken part in meetings, code reviews and discussions with NVIDIA. So let me talk a little about what this release means (and also what it doesn’t mean) and what we hope to see come out of this long term.

First of all, what is in this new driver?
What has been released is an out-of-tree source code kernel driver which has been tested to support CUDA use cases on datacenter GPUs. There is code in there to support display, but it is not complete or fully tested yet. Also, this is only the kernel part; a big part of a modern graphics driver is found in the firmware and userspace components, and those are still closed source. But it does mean we have an NVIDIA kernel driver now that will start being able to consume the GPL-only APIs in the Linux kernel, although this initial release doesn’t consume any APIs the old driver wasn’t already using. The driver also only supports NVIDIA Turing chip GPUs and newer, which means it is not targeting GPUs from before 2018. So for the average Linux desktop user, while this is a great first step and hopefully a sign of what is to come, it is not something you are going to start using tomorrow.

What does it mean for the NVIDIA binary driver?
Not too much immediately. This binary kernel driver will continue to be needed for older pre-Turing NVIDIA GPUs, and until the open source kernel module is fully tested and extended for display use cases you are likely to continue using it for your system even if you are on Turing or newer. Also, as mentioned above regarding the firmware and userspace bits, the binary driver is going to continue to be around even once the open source kernel driver is fully capable.

What does it mean for Nouveau?
Let me start with the obvious: this is actually great news for the Nouveau community and the Nouveau driver, and NVIDIA has done a great favour to the open source graphics community with this release. And for those unfamiliar with Nouveau, Nouveau is the in-kernel graphics driver for NVIDIA GPUs today, which was originally developed as a reverse-engineered driver but which over recent years has actually had active support from NVIDIA. It is fully functional, but is severely hampered by not having the ability to, for instance, re-clock the NVIDIA card, meaning that it can’t give you full performance like the binary driver can. This was something we were working with NVIDIA to remedy, but this new release provides us with a better path forward. So what does this new driver mean for Nouveau? Less initially, but a lot in the long run. To give a little background first: the Linux kernel does not allow multiple drivers for the same hardware, so in order for a new NVIDIA kernel driver to go in, the current one will have to go out or at least be limited to a different set of hardware. The current one is Nouveau. And just like with the binary driver, a big chunk of Nouveau is not in the kernel, but in the userspace pieces found in Mesa and the Nouveau-specific firmware that NVIDIA currently kindly makes available. So regardless of the long-term effort to create a new open source in-tree kernel driver based on this new open source driver for NVIDIA hardware, Nouveau will very likely be staying around to support pre-Turing hardware, just like the NVIDIA binary kernel driver will.

The plan we are working towards from our side, but which is likely to take a few years to come to full fruition, is to come up with a way for the NVIDIA binary driver and Mesa to share a kernel driver. The details of how we will do that are something we are still working on and discussing with our friends at NVIDIA, to address both the needs of the NVIDIA userspace and the needs of the Mesa userspace. Along with that evolution we hope to work with NVIDIA engineers to refactor the userspace bits of Mesa that are now targeting just Nouveau to be able to interact with this new kernel driver, and also work so that the binary driver and Nouveau can share the same firmware. This has clear advantages for both the open source community and NVIDIA. For the open source community it means that we will now have a kernel driver and firmware that allows things like changing the clocking of the GPU to provide the kind of performance people expect from NVIDIA graphics cards, and it means that we will have an open source driver that will have access to the firmware and kernel updates from day one for new generations of NVIDIA hardware. For the ‘binary’ driver, and I put that in quotes because it will now be less binary :), it means, as stated above, that it can start taking advantage of the GPL-only APIs in the kernel, distros can ship it and enable secure boot, and it gets an open source consumer of its kernel driver, allowing it to go upstream.
Whether this new shared kernel driver will be known as Nouveau or something completely different is still an open question, and of course it happening at all depends on whether we, the rest of the open source community and NVIDIA are able to find a path together to make it happen, but so far everyone seems to be of good will.

What does this release mean for linux distributions like Fedora and RHEL?

Over time it provides a pathway to radically simplify supporting NVIDIA hardware, due to the opportunities discussed elsewhere in this document. Long term we hope to be able to provide a better user experience with NVIDIA hardware in terms of out-of-the-box functionality. That means day-1 support for new chipsets, a high-performance open source Mesa driver for NVIDIA, and the ability to sign the NVIDIA driver alongside the rest of the kernel to enable things like Secure Boot support. Since this first release is targeting compute, one can expect that these options will first be available for compute users and then for graphics at a later time.

What are the next steps?
Well, there is a lot of work to do here. NVIDIA needs to continue the effort to make this new driver feature-complete for both compute and graphics/display use cases; we’d like to work together to come up with a plan for what the future unified kernel driver can look like and a model around it that works for both the community and NVIDIA; and we need to add things like a Mesa Vulkan driver. We at Red Hat will be playing an active part in this work as the only Linux vendor with the capacity to do so, and we will also work to ensure that the wider open source community has a chance to participate fully, like we do for all open source efforts we are part of.

If you want to hear more about this, I did talk with Chris Fisher and Linux Action News about this topic. Note: I did state some timelines in that interview which I didn’t make clear were my guesstimates and not in any form official NVIDIA timelines, so apologies for the confusion.

May 09, 2022

Evolving a strategy for 2022 and beyond

As a board, we have been working on several initiatives to make the Foundation a better asset for the GNOME Project. We’re working on a number of threads in parallel, so I wanted to explain the “big picture” a bit more to try and connect together things like the new ED search and the bylaw changes.

We’re all here to see free and open source software succeed and thrive, so that people can be truly empowered with agency over their technology, rather than being passive consumers. We want to bring GNOME to as many people as possible so that they have computing devices that they can inspect, trust, share and learn from.

In previous years we’ve tried to boost the relevance of GNOME (or technologies such as GTK) or solicit donations from businesses and individuals with existing engagement in FOSS ideology and technology. The problem with this approach is that we’re mostly addressing people and organisations who are already supporting or contributing to FOSS in some way. To truly scale our impact, we need to look to the outside world, build better awareness of GNOME outside of our current user base, and find opportunities to secure funding to invest back into the GNOME project.

The Foundation supports the GNOME project with infrastructure, arranging conferences, sponsoring hackfests and travel, design work, legal support, managing sponsorships, advisory board, being the fiscal sponsor of GNOME, GTK, Flathub… and we will keep doing all of these things. What we’re talking about here are additional ways for the Foundation to support the GNOME project – we want to go beyond these activities, and invest into GNOME to grow its adoption amongst people who need it. This has a cost, and that means in parallel with these initiatives, we need to find partners to fund this work.

Neil has previously talked about themes such as education, advocacy, privacy, but we’ve not previously translated these into clear specific initiatives that we would establish in addition to the Foundation’s existing work. This is all a work in progress and we welcome any feedback from the community about refining these ideas, but here are the current strategic initiatives the board is working on. We’ve been thinking about growing our community by encouraging and retaining diverse contributors, and addressing evolving computing needs which aren’t currently well served on the desktop.

Initiative 1: Welcoming newcomers. The community is already spending a lot of time welcoming newcomers and teaching them best practices. Those activities are as time-consuming as they are important, but currently a handful of individuals are running initiatives such as GSoC, Outreachy and outreach to universities. These activities help bring diverse individuals and perspectives into the community, and help them develop the skills and experience of collaborating to create Open Source projects. We want to make those efforts more sustainable by finding sponsors for these activities. With funding, we can hire people to dedicate their time to operating these programs, including paid mentors, and create materials to support newcomers in future, such as developer documentation, examples and tutorials. This is the initiative that needs to be refined the most before we can turn it into something real.

Initiative 2: Diverse and sustainable Linux app ecosystem. I spoke at the Linux App Summit about the work that GNOME and Endless have been supporting in Flathub, but this is an example of something which has a great overlap between commercial, technical and mission-based advantages. The key goal here is to improve the financial sustainability of participating in our community, which in turn has an impact on the diversity of who we can expect to afford to enter and remain in our community. We believe the existence of this is critically important for individual developers and contributors to unlock earning potential from our ecosystem, through donations or app sales. In turn, a healthy app ecosystem also improves the usefulness of the Linux desktop as a whole for potential users. We believe that we can build a case for commercial vendors in the space to join an advisory board alongside GNOME, KDE, etc., to input into the governance and contribute to the costs of growing Flathub.

Initiative 3: Local-first applications for the GNOME desktop. This is what Thib has been starting to discuss on Discourse, in this thread. There are many different threats to free access to computing and information in today’s world. The GNOME desktop and apps need to give users convenient and reliable access to technology which works similarly to the tools they already use everyday, but keeps them and their data safe from surveillance, censorship, filtering or just being completely cut off from the Internet. We believe that we can seek both philanthropic and grant funding for this work. It will make GNOME a more appealing and comprehensive offering for the many people who want to protect their privacy.

The idea is that these initiatives all sit on the boundary between the GNOME community and the outside world. If the Foundation can grow and deliver these kinds of projects, we are reaching to new people, new contributors and new funding. These contributions and investments back into GNOME represent a true “win-win” for the newcomers and our existing community.

(Originally posted to GNOME Discourse, please feel free to join the discussion there.)

May 07, 2022

Builder GTK 4 Porting, Part III

Another week of porting Builder which ultimately sent me on a few fun tangents. I especially enjoyed the work on Template-GLib which brought me back to my days working on languages and runtimes.

GTK

  • Prototype and submit a merge request to add support for setting an action parent on a widget. This allows you to alter the normal GtkActionMuxer action resolution. Very handy for situations like Builder and other document-oriented applications.

GtkSourceView

  • GtkSourceView updates to make it possible to write an external “snippet editor” application (Günther is working on one already)
  • Improve Solarized style schemes a bit for better IDE integration by specifying colors for diff as they can be used by git-based gutter renderers.
  • A little more performance work on gutter renderers as they have always been a major source of runtime costs.
  • Allow gutter renderers to overdraw atop the textview so they can do some more fancy things.

VTE

  • Figured out why I was getting spinning CPU with the VTE port for GTK 4. Submitted a diagnosis/fix upstream.

Builder

  • Prototyped and landed a new shortcut manager which uses a keybindings.json-like file similar to VS Code’s (albeit with slightly different syntax which makes more sense for GTK applications). Plugins will be able to override and extend these, as will the user.
  • Prototyped and landed support for showing “selection area” within the gutter renderer.
  • Compress information into a tighter space in the gutter, as we have a lot of information to display there. There is still more we can do, should anyone have free time to work on this.
  • Improve breakpoints drawing now that we have some overlapping selections to worry about.
  • Lots of menu cleanup across plugins.
  • More porting from our old IdeSourceView into a new implementation.
  • Lots of object-lifecycle fixes now that Builder more aggressively shuts down components.

Keybindings look something like this.

I would like to point out how wonderful the new shortcut components are in GTK 4, particularly if you’re writing complex applications that have to manage layered shortcuts, user overrides, and such. I’m thinking applications in the class of Inkscape, GIMP, Builder, Darktable, and such will really benefit from this someday.

A screenshot of Builder's updated line selections

Template-GLib

One of the things that Builder needed was the ability to express when a shortcut is active. In VS Code, they have a "when" element which works well enough for this. However, it has some basic expressions that need to be evaluated at runtime.

Builder already has an expression engine which is used for project templates and whatnot, and it even supports GObject Introspection. It was in sore need of some updates but is very capable for the problem here. I have a tendency to only implement enough to solve problems, so this was a nice return to finishing some of that up.

One of the nice things here, when the use case is so focused, is that there is no need for a runtime, JIT, etc. It’s just an AST that has a tmpl_expr_eval() API where you pass in scope and all the objects continue to live in C land, nothing special to deal with.
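To give a feel for the API shape, here is a small sketch of parsing and evaluating a “when”-style expression against a scope. Only tmpl_expr_eval() is named above; the other helper names (tmpl_expr_from_string(), tmpl_scope_new(), tmpl_scope_set_object()) are my assumption of the Template-GLib API from memory, so check the headers for the exact signatures.

#include <tmpl-glib.h>

/* Sketch: evaluate an expression such as
 * "typeof(widget).is_a(typeof(Ide.Page))" with "widget" bound in the
 * scope. Helper names other than tmpl_expr_eval() are assumptions. */
gboolean
evaluate_when (const char *expr_str,
               GObject    *widget)
{
  g_autoptr (GError) error = NULL;
  g_auto (GValue) value = G_VALUE_INIT;
  TmplScope *scope = tmpl_scope_new ();
  TmplExpr *expr = tmpl_expr_from_string (expr_str, &error);
  gboolean ret = FALSE;

  if (expr == NULL)
    {
      g_warning ("Parse error: %s", error->message);
      tmpl_scope_unref (scope);
      return FALSE;
    }

  /* Objects stay plain GObjects on the C side; the expression can call
   * into them through GObject Introspection. */
  tmpl_scope_set_object (scope, "widget", widget);

  if (tmpl_expr_eval (expr, scope, &value, &error))
    ret = G_VALUE_HOLDS_BOOLEAN (&value) && g_value_get_boolean (&value);
  else
    g_warning ("Eval error: %s", error->message);

  tmpl_expr_unref (expr);
  tmpl_scope_unref (scope);

  return ret;
}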

  • Add anonymous functions (no lambdas though).
  • Allow assigning functions to a symbol.
  • Improve function calling under various situations (named, anonymous, and GObject Introspection functions).
  • Make constructors work.
  • Add a bunch of builtins for things like asserts, casts, math.
  • Fix typeof() builtin for a number of GObject Introspection cases.
  • Add unit tests!
  • Add null keyword.
  • You can now do things like typeof(widget).is_a(typeof(Ide.Page))
  • The require Gtk version "4.0" style imports made it so you couldn’t call .require() due to the Bison parser. That is fixed and you can use @keyword for reserved keywords.
  • A nop has been added with the keyword pass.
  • Fix a bunch of valgrind discovered things.
  • Add linewise comments using the # python style.

I don’t know how much time I’ll spend on this in the future, or if it will go another few years without much work. But with a maintainer who had time, it could be a nice little glue language without all the “muchness” that more well known languages tend to bring in.

May 06, 2022

#42 Numerous Emojis

Update on what happened across the GNOME project in the week from April 29 to May 06.

Core Apps and Libraries

Characters

A simple utility application to find and insert unusual characters.

Alexander Mikhaylenko says

Characters now supports composite emoji such as people with skin tone and gender modifiers, or country flags. It also displays emoji in the proper order instead of sorted by their codepoints.

Circle Apps and Libraries

Apostrophe

A distraction free Markdown editor.

Manu announces

I’ve released a new version of Apostrophe with small bugfixes and improvements and updated translations. I’m also working on a GTK4 port, so I don’t plan on releasing more features until that is complete.

Third Party Projects

ranfdev announces

I’m releasing Geopard, a simple, colorful Gemini client. Browse the space and reach hundreds of Gemini capsules! It lets you read stories, download files, play games… https://flathub.org/apps/details/com.ranfdev.Geopard

Maximiliano announces

Introducing Citations, your new manager for BibTeX references. This is a (for now) very small app to manage your bibliography and get easy-to-copy citations for LaTeX and other formats. The app is in heavy development, and in the next weeks I will improve the performance and add more features before its stable release.

You can get Citations on Flathub.

Peter Eisenmann reports

Announcing OS-Installer: A generic OS-Installer that can be customized by distributions to install an OS. Translations welcome!

Workbench

A sandbox to learn and prototype with GNOME technologies.

sonnyp says

During the Linux App Summit, Julian Sparber prototyped a docked GTK Inspector in Workbench. We exchanged lots of interesting ideas around developer tooling and closing the gap with web inspectors.

sonnyp reports

A new version of Workbench is out - featuring

A brand new Library of examples - including

  • WebSocket client
  • Toasts
  • Application Window
  • Desktop notifications

Help needed and contributions welcome for new examples and demos.

Platform tools (Adwaita Demo, GTK Demo, GTK Widget Factory and GTK Icon Browser) for the current platform version are now available in the main menu.

But also

  • The Console can be collapsed by resizing it
  • Prevent system.exit from quitting Workbench
  • Allow calling GObject.registerClass multiple times
  • Prevent crash when using non GtkBuildable objects
  • Allow using DBus and Gio.Application
  • Allow using network
  • Enable GtkWindow objects preview
  • Design improvements

Get it on Flathub

Miscellaneous

sonnyp says

Last weekend, many of us were at the Linux App Summit. It was fantastic! Thank you KDE, GNOME, organizers, volunteers, sponsors and everyone involved. Make sure to check out the talks, here are some of my favorites:

Sophie Herold announces

I have registered the domain drey.app to make it publicly available for app ids. If you want to use app.drey.<YourApp> please register it first.

The idea behind providing this domain is that gnome.org is reserved for core apps, and app ids derived from repository hosting can be long, short-lived, and tied to one author. The name app.drey derives from the term for a squirrel’s nest.

That’s all for this week!

See you next week, and be sure to stop by #thisweek:gnome.org with updates on your own projects!

May 04, 2022

GNOME Foundation Board Elections 2022

My involvement with GNOME started in my teens and has continued over the years where it influenced my studies, career, and even the place I chose to live. One of my desires in my journey has been to help the GNOME project achieve its goals and fulfill its vision of building an open source desktop environment that is accessible and easy to use to a general audience. Sitting on the Board has enabled me to contribute to these efforts more directly and has also taught me plenty about community governance and nonprofit sustainability.

My Board term is ending now and I will not run for reelection, for a few reasons. Firstly, I believe that a rotation of board members can help increase community engagement and transparency. The current model our Board has of renewing part of its membership every year does, in my opinion, a great job at ensuring continuity of board programs while allowing new voices and perspectives to come on board and maximize their impact.

Another reason why I will not be running for reelection is that I am convinced I can be more beneficial to the GNOME project by contributing to more operational tasks and running some of our programs, instead of the position of governance and oversight expected of the Board members. I would like for my seat on the Board to be filled by someone with skills and enthusiasm for reaching out to broader audiences beyond GNOME, someone capable of bridging our plans and vision with opportunities that can bring funding, diversity, and sustainability to the Foundation.

I am not going anywhere. You will still see me around the chat channels, forums, and conferences. I want to focus on improving our Newcomers onboarding experience as well as increasing the rate at which Outreachy/GSoC interns become long-term contributors. This also involves helping application developers monetize their work and making sure volunteers are given employment opportunities that allow them to continue working on open source software. I also want to refocus on my coding contributions, while learning new things and keeping up with modern technologies.

All in all, I am looking forward to meeting my fellow GNOME friends in GUADEC this year after such a long time with no travel. o/

May 03, 2022

Linux App Summit 2022

Engineers at Codethink get some time and money each year to attend conferences, and part of the deal is we have to write a report afterwards. Having written the report I thought… this could be a bit more widely shared! So, excuse the slightly formal tone of this report, but here are some thoughts on LAS 2022.

Talk highlights

1. Energy Conservation and Energy Efficiency with Free Software – Joseph De Veaugh-Geiss

Video, Abstract

If you follow climate science you’ll know we have some major problems coming. Solving these will take big societal changes towards degrowth, which was not the topic of this talk at all. This talk was rather about what Free Software developers might do to help out in the wider context of reducing energy use.

I recommend watching in full as it was well delivered and interesting. Here are some of the points I noted:

  • Aviation was estimated to cause 2.5% of CO₂ emissions in 2017, and “ICT” between 1.5% and 2.8%.
    • Speaker didn’t know if that number includes Bitcoin or other “proof-of-work” things
    • This number does include Bitcoin. (Thanks to Joseph for clarifying 🙂)
    • Two things that make the number high: short-lifespan electronic devices, video streaming/conferencing
  • Software projects could be more transparent about how much energy they use.
    • Imagine 3 word processors which each provide a GitHub badge showing energy consumption for specific use cases, so you can choose the one which uses the least power
  • KDE’s PDF reader (Okular) recently achieved the Blauer Engel eco-certification
  • There is an ongoing initiative in KDE to measure and improve energy consumption of software – a “measurement party” in Berlin is the next step.
    • Their method of measuring power consumption requires a desktop PC and a special plug that can measure power at a reasonably high frequency.
  • Free software already has a good story here (as we are often the ones keeping old hardware alive after manufacturers drop support), and there’s an open letter asking the EU to legislate such that manufacturers have to give this option.

2. Flathub – now and next

Video, Abstract
There is big growth in the number of users. The talk had various graphs, including a chart showing something like 10 PB of data downloaded.

The goal of Flathub is not to package other people’s apps, rather to reduce barriers to Linux app developers. One remaining barrier is money – as there’s no simple way to finance development of a Linux app at present. So together with Codethink, GNOME and Flathub are working on a way that app users can make donations to app authors.

Notes

There was a larger presence from Canonical than I am used to, and it seems they are once again growing their desktop team, and bringing back the face-to-face Ubuntu Developer Summit (rebranded Ubuntu Summit). All good news.

COVID safety:

  • all participants except the speaker were masked during talks
  • the seats were spaced and our “green pass” (vaccine or -ve test) was checked on day 1
  • at the social events the masks all came off; these were mostly in the outside terrace area.

Online participation:

  • maybe 30% of the talks were online and these worked well; in fact the Flathub talk had one speaker on stage and one projected behind him as a giant head, and even that worked well.
  • the online attendees were effectively invisible in the venue; it would be better if there were a screen projecting the online chat so we could have some interaction with them.

The organisation was top notch, the local team did a great job and the venue was also perfect for this kind of event. Personally I met more folks from KDE than I ever have and felt like a lot of important desktop-related knowledge sharing took place. Definitely recommend the conference for anyone with an interest in this area.

I don’t have any good photos to share, but check https://twitter.com/hashtag/las2022 for the latest. Hope to see you there next year!

Scaling the Foundation to Contribute to GNOME

This article follows one published by a former director, Allan Day, who detailed the evolution of the Board of Directors. The article you’re reading goes further on what I believe is needed to help us scale the Foundation to become an active contributor to the GNOME Project, beyond its traditional support activities.

Some Misconceptions

The Foundation is not the GNOME Project’s vendor, and it’s not the GNOME Project’s shepherd either. For more about the balance of powers within the Project, see Tobias Bernard’s blog series.

Board members aren’t here to help from an operational perspective (e.g. gathering the release notes, doing moderation in our various channels, doing project management, issue triaging…). Becoming a board member is also not the logical next step for long-term contributors. Board members who have the best chances to fulfill their mission are those with a specific skill set. It’s usually quite different from what it takes to be a good hacker.

Being a board member is more than simply attending meetings. It’s about taking an active role in shaping the GNOME Foundation’s strategy. Developing new ideas can be a substantial time commitment – far more than approving or vetoing decisions someone else came up with.

What the GNOME Foundation Is About

The GNOME Foundation is about giving the GNOME Project and community the means to exist and work in the same general direction. There are technical means, such as a place for the code (GitLab), a place for discussion (Discourse, Matrix/IRC) or even outreach materials (such as YouTube release videos). There are legal means: the GNOME Foundation is responsible of the GNOME name and logo.

Finally the GNOME Foundation is about contributing to the GNOME Project (and should increase its contributions!) to make it an appealing platform to use and build for. In short: it’s about giving GNOME a purpose to expand beyond its user base, sell the idea to donors, and eventually hire people to make it happen.

A Good Director

As said earlier, a good director is not necessarily a good hacker. It’s a person who is able to help with organisational challenges rather than operational ones. In practice, a good director is not only able to understand how the GNOME Project works, but also how it can have a positive social impact, what changes are required in practice, and who would be willing to support such changes.

Purpose & Strategy

A good director knows what the GNOME Project is and what forces are at play within it (e.g. what is the role of the release team, what is the weight of the design team when it comes to adding new features to GNOME, what release cycle the project is following, etc.). Knowing where the GNOME Project excels, they should also be aware of where the project is lagging behind.

Based on this understanding of the project, the directors need to create a strategy that:

  • Takes advantage of the strengths of the project;
  • Answers a need and, additionally, may have a positive social impact;
  • Has a clear target population it solves problems for;
  • Has a clear target population willing to fund it;
  • Is realistic, and can be turned into actionable and budgeted items

Additionally, a good member of the board is capable of reviewing the strategic initiatives and assessing fairly whether they have been successful or not. In practice this means each strategic initiative should come with a set of indicators to measure progress and success.

Finances & Sustainability

The Foundation is a not-for-profit but still operates under a budget. When possible, directors need to help connect the Foundation to potential donors. A good director needs to take the finances into consideration when drafting a strategy.

Outreach

Directors need to help with the Outreach of the project. Outreach plays an important role in understanding the impact of software on society, how GNOME can make it more positive, who needs the most help, who would be interested in improving GNOME, and finally who has a budget for it.

A well-connected director who is able to introduce the Foundation to potential donor organisations would significantly help the Foundation scale to the point where it can go beyond supporting the GNOME Project and become an active contributor to it.

Key Takeaways

  • Being a member of the Board of Directors takes time and commitment
  • It is not the logical next step after years of contributing. This is a different role, with a specific skill set requirement
  • Strategic thinking and planning are difficult. Materialising the strategy is even harder!
  • The better connected a director is to those who could fund initiatives, the more the Foundation can do, so long as they don’t get carried too far away from the GNOME Project.

May 02, 2022

Fitting Everything Together

TLDR: Hermetic /usr/ is awesome; let's popularize image-based OSes with modernized security properties built around immutability, SecureBoot, TPM2, adaptability, auto-updating, factory reset, uniformity – built from traditional distribution packages, but deployed via images.

Over the past years, systemd gained a number of components for building Linux-based operating systems. While these components individually have been adopted by many distributions and products for specific purposes, we did not publicly communicate a broader vision of how they should all fit together in the long run. In this blog story I hope to provide that from my personal perspective, i.e. explain how I personally would build an OS and where I personally think OS development with Linux should go.

I figure this is going to be a longer blog story, but I hope it will be equally enlightening. Please understand though that everything I write about OS design here is my personal opinion, and not that of my employer.

For the last 12 years or so I have been working on Linux OS development, mostly around systemd. In all those years I had a lot of time thinking about the Linux platform, and specifically traditional Linux distributions and their strengths and weaknesses. I have seen many attempts to reinvent Linux distributions in one way or another, to varying success. After all this most would probably agree that the traditional RPM or dpkg/apt-based distributions still define the Linux platform more than others (for 25+ years now), even though some Linux-based OSes (Android, ChromeOS) probably outnumber the installations overall.

And over all those 12 years I kept wondering, how would I actually build an OS for a system or for an appliance, and what are the components necessary to achieve that. And most importantly, how can we make these components generic enough so that they are useful in generic/traditional distributions too, and in other use cases than my own.

The Project

Before figuring out how I would build an OS it's probably good to figure out what type of OS I actually want to build, what purpose I intend to cover. I think a desktop OS is probably the most interesting. Why is that? Well, first of all, I use one of these for my job every single day, so I care immediately, it's my primary tool of work. But more importantly: I think building a desktop OS is one of the most complex overall OS projects you can work on, simply because desktops are so much more versatile and variable than servers or embedded devices. If one figures out the desktop case, I think there's a lot more to learn from, and reuse in the server or embedded case, than going the other way. After all, there's a reason why so much of the widely accepted Linux userspace stack comes from people with a desktop background (including systemd, BTW).

So, let's see how I would build a desktop OS. If you press me hard, and ask me why I would do that given that ChromeOS already exists and more or less is a Linux desktop OS: there's plenty I am missing in ChromeOS, but most importantly, I am a lot more interested in building something people can easily and naturally rebuild and hack on, i.e. Google-style over-the-wall open source with its skewed power dynamic is not particularly attractive to me. I much prefer building this within the framework of a proper open source community, out in the open, and basing all this strongly on the status quo ante, i.e. the existing distributions. I think it is crucial to provide a clear avenue to build a modern OS based on the existing distribution model, if there shall ever be a chance to make this interesting for a larger audience.

(Let me underline though: even though I am going to focus on a desktop here, most of this is directly relevant for servers as well, in particular container host OSes and suchlike, or embedded devices, e.g. car IVI systems and so on.)

Design Goals

  1. First and foremost, I think the focus must be on an image-based design rather than a package-based one. For robustness and security it is essential to operate with reproducible, immutable images that describe the OS or large parts of it in full, rather than operating always with fine-grained RPM/dpkg style packages. That's not to say that packages are not relevant (I actually think they matter a lot!), but I think they should be less of a tool for deploying code and more one for building the objects to deploy. A different way to see this: any OS built like this must be easy to replicate in a large number of instances, with minimal variability. Regardless if we talk about desktops, servers or embedded devices: focus for my OS should be on "cattle", not "pets", i.e. that from the start it's trivial to reuse the well-tested, cryptographically signed combination of software over a large set of devices the same way, with a maximum of bit-exact reuse and a minimum of local variances.

  2. The trust chain matters, from the boot loader all the way to the apps. This means all code that is run must be cryptographically validated before it is run. All storage must be cryptographically protected: public data must be integrity checked; private data must remain confidential.

    This is in fact where big distributions currently fail pretty badly. I would go as far as saying that SecureBoot on Linux distributions is mostly security theater at this point, if you so will. That's because the initrd that unlocks your FDE (i.e. the cryptographic concept that protects the rest of your system) is not signed or protected in any way. It's trivial to modify for an attacker with access to your hard disk in an undetectable way, and collect your FDE passphrase. The involved bureaucracy around the implementation of UEFI SecureBoot of the big distributions is to a large degree pointless if you ask me, given that once the kernel is assumed to be in a good state, as the next step the system invokes completely unsafe code with full privileges.

    This is a fault of current Linux distributions though, not of SecureBoot in general. Other OSes use this functionality in more useful ways, and we should correct that too.

  3. Pretty much the same thing: offline security matters. I want my data to be reasonably safe at rest, i.e. cryptographically inaccessible even when I leave my laptop in my hotel room, suspended.

  4. Everything should be cryptographically measured, so that remote attestation is supported for as much software shipped on the OS as possible.

  5. Everything should be self-descriptive, with single sources of truth that are closely attached to the object itself, instead of stored externally.

  6. Everything should be self-updating. Today we know that software is never bug-free, and thus requires a continuous update cycle. Not only the OS itself, but also any extensions, services and apps running on it.

  7. Everything should be robust in respect to aborted OS operations, power loss and so on. It should be robust towards hosed OS updates (regardless if the download process failed, or the image was buggy), and not require user interaction to recover from them.

  8. There must always be a way to put the system back into a well-defined, guaranteed safe state ("factory reset"). This includes that all sensitive data from earlier uses becomes cryptographically inaccessible.

  9. The OS should enforce clear separation between vendor resources, system resources and user resources: conceptually and when it comes to cryptographical protection.

  10. Things should be adaptive: the system should come up and make the best of the system it runs on, adapt to the storage and hardware. Moreover, the system should support execution on bare metal equally well as execution in a VM environment and in a container environment (i.e. systemd-nspawn).

  11. Things should not require explicit installation. i.e. every image should be a live image. For installation it should be sufficient to dd an OS image onto disk. Thus, strong focus on "instantiate on first boot", rather than "instantiate before first boot".

  12. Things should be reasonably minimal. The image the system starts its life with should be quick to download, and not include resources that can as well be created locally later.

  13. System identity, local cryptographic keys and so on should be generated locally, not be pre-provisioned, so that there's no leak of sensitive data during the transport onto the system possible.

  14. Things should be reasonably democratic and hackable. It should be easy to fork an OS, to modify an OS and still get reasonable cryptographic protection. Modifying your OS should not necessarily imply that your "warranty is voided" and you lose all good properties of the OS, if you so will.

  15. Things should be reasonably modular. The privileged part of the core OS must be extensible, including on the individual system. It's not sufficient to support extensibility just through high-level UI applications.

  16. Things should be reasonably uniform, i.e. ideally the same formats and cryptographic properties are used for all components of the system, regardless if for the host OS itself or the payloads it receives and runs.

  17. Even taking all these goals into consideration, it should still be close to traditional Linux distributions, and take advantage of what they are really good at: integration and security update cycles.

Now that we know our goals and requirements, let's start designing the OS along these lines.

Hermetic /usr/

First of all the OS resources (code, data files, …) should be hermetic in an immutable /usr/. This means that a /usr/ tree should carry everything needed to set up the minimal set of directories and files outside of /usr/ to make the system work. This /usr/ tree can then be mounted read-only into the writable root file system that then will eventually carry the local configuration, state and user data in /etc/, /var/ and /home/ as usual.

Thankfully, modern distributions are surprisingly close to working without issues in such a hermetic context. Specifically, Fedora works mostly just fine: it has adopted the /usr/ merge and the declarative systemd-sysusers and systemd-tmpfiles components quite comprehensively, which means the directory trees outside of /usr/ are automatically generated as needed if missing. In particular /etc/passwd and /etc/group (and related files) are appropriately populated, should they be missing entries.

In my model a hermetic OS is hence comprehensively defined within /usr/: combine the /usr/ tree with an empty, otherwise unpopulated root file system, and it will boot up successfully, automatically creating the strictly necessary files and resources along the way.
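
To make the hermetic property a bit more concrete, here is a minimal sketch of the kind of declarative drop-ins a vendor could ship inside /usr/ so the rest of the tree can be regenerated at boot; the service name and paths are invented for illustration:

    # /usr/lib/sysusers.d/myservice.conf (hypothetical): declare a system user,
    # created by systemd-sysusers if /etc/passwd lacks it
    u myservice - "Example service user" /var/lib/myservice

    # /usr/lib/tmpfiles.d/myservice.conf (hypothetical): create state directories
    # at boot if they are missing
    d /var/lib/myservice 0750 myservice myservice -
    d /var/log/myservice 0750 myservice myservice -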

Monopolizing vendor OS resources and definitions in an immutable /usr/ opens multiple doors to us:

  • We can apply dm-verity to the whole /usr/ tree, i.e. guarantee structural, cryptographic integrity on the whole vendor OS resources at once, with full file system metadata.

  • We can implement updates to the OS easily: by implementing an A/B update scheme on the /usr/ tree we can update the OS resources atomically and robustly, while leaving the rest of the OS environment untouched.

  • We can implement factory reset easily: erase the root file system and reboot. The hermetic OS in /usr/ has all the information it needs to set up the root file system afresh — exactly like in a new installation.

Initial Look at the Partition Table

So let's have a look at a suitable partition table, taking a hermetic /usr/ into account. Let's conceptually start with a table of four entries:

  1. An UEFI System Partition (required by firmware to boot)

  2. Immutable, Verity-protected, signed file system with the /usr/ tree in version A

  3. Immutable, Verity-protected, signed file system with the /usr/ tree in version B

  4. A writable, encrypted root file system

(This is just for initial illustration here, as we'll see later it's going to be a bit more complex in the end.)

The Discoverable Partitions Specification provides suitable partition type UUIDs for all of the above partitions. Which is great, because it makes the image self-descriptive: simply by looking at the image's GPT table we know what to mount where. This means we do not need a manual /etc/fstab, and a multitude of tools such as systemd-nspawn and similar can operate directly on the disk image and boot it up.
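
As a rough illustration of that self-descriptiveness (the image name is a placeholder):

    # Show what systemd would mount where, derived purely from the GPT partition types
    systemd-dissect fooOS.raw

    # Boot the very same image as a container, no /etc/fstab required
    systemd-nspawn --image=fooOS.raw --boot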

Booting

Now that we have a rough idea how to organize the partition table, let's look a bit at how to boot into that. Specifically, in my model "unified kernels" are the way to go, specifically those implementing Boot Loader Specification Type #2. These are basically kernel images that have an initial RAM disk attached to them, as well as a kernel command line, a boot splash image and possibly more, all wrapped into a single UEFI PE binary. By combining these into one we achieve two goals: they become extremely easy to update (i.e. drop in one file, and you update kernel+initrd) and more importantly, you can sign them as one for the purpose of UEFI SecureBoot.
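
For illustration, here is a sketch of how such a unified kernel image could be assembled and signed by hand, following the recipe commonly documented for the systemd EFI stub at the time of writing; file names and keys are placeholders, and the section offsets are the usual ones from that recipe rather than something this post prescribes:

    objcopy \
      --add-section .osrel=/etc/os-release   --change-section-vma .osrel=0x20000   \
      --add-section .cmdline=cmdline.txt     --change-section-vma .cmdline=0x30000 \
      --add-section .linux=vmlinuz           --change-section-vma .linux=0x2000000 \
      --add-section .initrd=initrd.img       --change-section-vma .initrd=0x3000000 \
      /usr/lib/systemd/boot/efi/linuxx64.efi.stub fooOS_0.7.efi

    # Sign the result once, as a whole, for UEFI SecureBoot (keys are placeholders)
    sbsign --key db.key --cert db.crt --output fooOS_0.7.efi.signed fooOS_0.7.efi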

In my model, each version of such a kernel would be associated with exactly one version of the /usr/ tree: both are always updated at the same time. An update then becomes relatively simple: drop in one new /usr/ file system plus one kernel, and the update is complete.

The boot loader used for all this would be systemd-boot, of course. It's a very simple loader, and implements the aforementioned boot loader specification. This means it requires no explicit configuration or anything: it's entirely sufficient to drop in one such unified kernel file, and it will be picked up, and be made a candidate to boot into.

You might wonder how to configure the root file system to boot from with such a unified kernel that contains the kernel command line and is signed as a whole and thus immutable. The idea here is to use the usrhash= kernel command line option implemented by systemd-veritysetup-generator and systemd-fstab-generator. It does two things: it will search and set up a dm-verity volume for the /usr/ file system, and then mount it. It takes the root hash value of the dm-verity Merkle tree as the parameter. This hash is then also used to find the /usr/ partition in the GPT partition table, under the assumption that the partition UUIDs are derived from it, as per the suggestions in the discoverable partitions specification (see above).
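
A hedged sketch of how that root hash could be produced and wired up (device paths are placeholders):

    # Generate the dm-verity hash data for the /usr/ file system;
    # this prints a "Root hash:" line
    veritysetup format /dev/disk/by-partlabel/fooOS_0.7 /dev/disk/by-partlabel/fooOS_0.7-verity

    # That root hash then goes onto the signed, immutable kernel command line:
    #   usrhash=<root hash printed above>
    # systemd-veritysetup-generator sets up the volume at boot, and the /usr/
    # partition is found via the partition UUIDs derived from the same hash.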

systemd-boot (if not told otherwise) will do a version sort of the kernel image files it finds, and then automatically boot the newest one. Picking a specific kernel to boot will also fixate which version of the /usr/ tree to boot into, because — as mentioned — the Verity root hash of it is built into the kernel command line the unified kernel image contains.

In my model I'd place the kernels directly into the UEFI System Partition (ESP), in order to simplify things. (systemd-boot also supports reading them from a separate boot partition, but let's not complicate things needlessly, at least for now.)

So, with all this, we now already have a boot chain that goes something like this: once the boot loader is run, it will pick the newest kernel, which includes the initial RAM disk and a secure reference to the /usr/ file system to use. This is already great. But a /usr/ alone won't make us happy; we also need a root file system. In my model, that file system would be writable, and the /etc/ and /var/ hierarchies would be located directly on it. Since these trees potentially contain secrets (SSH keys, …) the root file system needs to be encrypted. We'll use LUKS2 for this, of course. In my model, I'd bind this to the TPM2 chip (for compatibility with systems lacking one, we can find a suitable fallback, which then provides weaker guarantees, see below). A TPM2 is a security chip available in most modern PCs. Among other things it contains a persistent secret key that can be used to encrypt data, in such a way that you can only decrypt the data again if you have access to the chip and can prove you are running validated software. The cryptographic measuring I mentioned earlier is what allows this to work. But … let's not get lost too much in the details of TPM2 devices; that'd be material for a novel, and this blog story is going to be way too long already.

What does using a TPM2 bound key for unlocking the root file system get us? We can encrypt the root file system with it, and you can only read or make changes to the root file system if you also possess the TPM2 chip and run our validated version of the OS. This protects us against an evil maid scenario to some level: an attacker cannot simply copy the hard disk of your laptop while you leave it in your hotel room and get at your data, because unless the attacker also steals the TPM2 chip the disk cannot be decrypted. The attacker also cannot simply modify the root file system, as such changes would be detected on the next boot, since they weren't made with the right cryptographic key.

So, now we have a system that already can boot up somewhat completely, and run userspace services. All code that is run is verified in some way: the /usr/ file system is Verity protected, and the root hash of it is included in the kernel that is signed via UEFI SecureBoot. And the root file system is locked to the TPM2 where the secret key is only accessible if our signed OS + /usr/ tree is used.

(One brief intermission here: so far all the components I am referencing exist already, and have shipped in systemd and other projects, including the TPM2 based disk encryption. There's one thing missing at the moment however that still needs to be developed (happy to take PRs!): right now TPM2 based LUKS2 unlocking is bound to PCR hash values. This is hard to work with when implementing updates — what we'd need instead is unlocking by signatures of PCR hashes. TPM2 supports this, but we don't support it yet in our systemd-cryptsetup + systemd-cryptenroll stack.)
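
As a concrete example of the TPM2 binding that exists today (the device path is a placeholder, and as noted above this is PCR-bound rather than signature-bound):

    # Enroll a TPM2-bound key into the LUKS2 root volume, tied to PCR 7
    # (the SecureBoot policy measurement)
    systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/disk/by-partlabel/root

    # Show what is currently enrolled
    systemd-cryptenroll /dev/disk/by-partlabel/root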

One of the goals mentioned above is that cryptographic key material should always be generated locally on first boot, rather than pre-provisioned. This of course has implications for the encryption key of the root file system: if we want to boot into this system we need the root file system to exist, and thus a key already generated that it is encrypted with. But where precisely would we generate it, if we have no installer which could generate it while installing (as is done in traditional Linux distribution installers)? My proposed solution here is to use systemd-repart, which is a declarative, purely additive repartitioner. It can run from the initrd to create and format partitions on boot, before transitioning into the root file system, and it can encrypt the partitions it creates, automatically enrolling a TPM2-bound key.

So, let's revisit the partition table we mentioned earlier. Here's what in my model we'd actually ship in the initial image:

  1. An UEFI System Partition (ESP)

  2. An immutable, Verity-protected, signed file system with the /usr/ tree in version A

And that's already it. No root file system, no B /usr/ partition, nothing else. Only two partitions are shipped: the ESP with the systemd-boot loader and one unified kernel image, and the A version of the /usr/ partition. Then, on first boot systemd-repart will notice that the root file system doesn't exist yet, and will create, format and encrypt it, enrolling the key into the TPM2. It will also create the second /usr/ partition (B) that we'll need for later A/B updates (which will be created empty for now, until the first update operation actually takes place, see below). Once done the initrd will combine the fresh root file system with the shipped /usr/ tree, and transition into it. Because the OS is hermetic in /usr/ and contains all the systemd-tmpfiles and systemd-sysusers information it can then set up the root file system properly and create any directories and symlinks (and maybe a few files) necessary to operate.

Besides the fact that the root file system's encryption keys are generated on the system we boot from and never leave it, it is also pretty nice that the root file system will be sized dynamically, taking into account the physical size of the backing storage. This is perfect, because on first boot the image will automatically adapt to what it has been dd'ed onto.
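
To make this a bit more tangible, here is a sketch of the kind of repart.d drop-ins such an image might ship in /usr/lib/repart.d/; names and sizes are illustrative, not a tested configuration, and similar drop-ins would cover the Verity/signature slots and the home partition:

    # 50-usr-b.conf: reserve the empty "B" /usr/ slot for later A/B updates
    [Partition]
    Type=usr
    Label=_empty

    # 60-root.conf: create, format and TPM2-encrypt the root file system on first boot
    [Partition]
    Type=root
    Format=btrfs
    Encrypt=tpm2
    FactoryReset=yes

    # 70-swap.conf: encrypted swap, also erased on factory reset
    [Partition]
    Type=swap
    Format=swap
    Encrypt=tpm2
    FactoryReset=yes
    SizeMaxBytes=4G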

Factory Reset

This is a good point to talk about the factory reset logic, i.e. the mechanism to place the system back into a known good state. This is important for two reasons: in our laptop use case, once you want to pass the laptop to someone else, you want to ensure your data is fully and comprehensively erased. Moreover, if you have reason to believe your device was hacked you want to revert the device to a known good state, i.e. ensure that exploits cannot persist. systemd-repart already has a mechanism for it. In the declarations of the partitions the system should have, entries may be marked to be candidates for erasing on factory reset. The actual factory reset is then requested by one of two means: by specifying a specific kernel command line option (which is not too interesting here, given we lock that down via UEFI SecureBoot; but then again, one could also add a second kernel to the ESP that is identical to the first, with the only difference that it lists this command line option: thus when the user selects this entry it will initiate a factory reset) — and via an EFI variable that can be set and is honoured on the immediately following boot. So here's how a factory reset would then go down: once the factory reset is requested it's enough to reboot. On the subsequent boot systemd-repart runs from the initrd, where it will honour the request and erase the partitions marked for erasing. Once that is complete the system is back in the state we shipped it in: only the ESP and the /usr/ file system will exist, but the root file system is gone. And from here we can continue as on the original first boot: create a new root file system (and any other partitions), and encrypt/set it up afresh.
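
A minimal sketch of how such a reset request could be made, assuming partitions are marked with FactoryReset=yes as in the repart.d sketch above:

    # Kernel command line of a dedicated "factory reset" boot entry (a sketch);
    # systemd-repart in the initrd then erases every partition marked
    # FactoryReset=yes and recreates it afresh on the following boot:
    systemd.factory_reset=yes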

So now we have a nice setup, where everything is either signed or encrypted securely. The system can adapt to the system it is booted on automatically on first boot, and can easily be brought back into a well defined state identical to the way it was shipped in.

Modularity

But of course, such a monolithic, immutable system is only useful for very specific purposes. If /usr/ can't be written to – at least in the traditional sense – one cannot just go and install a new software package that one needs. So here two goals are superficially conflicting: on one hand one wants modularity, i.e. the ability to add components to the system, and on the other immutability, i.e. that precisely this is prohibited.

So let's see what I propose as a middle ground in my model. First, what's the precise use case for such modularity? I see a couple of different ones:

  1. For some cases it is necessary to extend the system itself at the lowest level, so that the components added in extend (or maybe even replace) the resources shipped in the base OS image, so that they live in the same namespace, and are subject to the same security restrictions and privileges. Exposure to the details of the base OS and its interface for this kind of modularity is at the maximum.

    Example: a module that adds a debugger or tracing tools into the system. Or maybe an optional hardware driver module.

  2. In other cases, more isolation is preferable: instead of extending the system resources directly, additional services shall be added in that bring their own files, can live in their own namespace (but with "windows" into the host namespaces), however still are system components, and provide services to other programs, whether local or remote. Exposure to the details of the base OS for this kind of modularity is restricted: it mostly focuses on the ability to consume and provide IPC APIs from/to the system. Components of this type can still be highly privileged, but the level of integration is substantially smaller than for the type explained above.

    Example: a module that adds a specific VPN connection service to the OS.

  3. Finally, there's the actual payload of the OS. This stuff is relatively isolated from the OS and definitely from each other. It mostly consumes OS APIs, and generally doesn't provide OS APIs. This kind of stuff runs with minimal privileges, and in its own namespace of concepts.

    Example: a desktop app, for reading your emails.

Of course, the lines between these three types of modules are blurry, but I think distinguishing them does make sense, as I think different mechanisms are appropriate for each. So here's what I'd propose in my model to use for this.

  1. For the system extension case I think the systemd-sysext images are appropriate. This tool operates on system extension images that are very similar to the host's disk image: they also contain a /usr/ partition, protected by Verity. However, they just include additions to the host image: binaries that extend the host. When such a system extension image is activated, it is merged via an immutable overlayfs mount into the host's /usr/ tree. Thus any file shipped in such a system extension will suddenly appear as if it was part of the host OS itself. For optional components that should more or less be considered part of the OS, this is a very simple and powerful way to combine an immutable OS with an immutable extension. Note that most likely extensions for an OS matching this tool should be built at the same time within the same update cycle scheme as the host OS itself. After all, the files included in the extensions will have dependencies on files in the system OS image, and care must be taken that these dependencies remain in order.

  2. For adding in additional somewhat isolated system services in my model, Portable Services are the proposed tool of choice. Portable services are in most ways just like regular system services; they could be included in the system OS image or an extension image. However, portable services use RootImage= to run off separate disk images, thus within their own namespace. Images set up this way have various ways to integrate into the host OS, as they are in most ways regular system services, which just happen to bring their own directory tree. Also, unlike regular system services, for them sandboxing is opt-out rather than opt-in. In my model, here too the disk images are Verity protected and thus immutable. Just like the host OS they are GPT disk images that come with a /usr/ partition and Verity data, along with signing.

  3. Finally, the actual payload of the OS, i.e. the apps. To be useful in real life here it is important to hook into existing ecosystems, so that a large set of apps are available. Given that on Linux flatpak (or on servers OCI containers) are the established format that pretty much won they are probably the way to go. That said, I think both of these mechanisms have relatively weak properties, in particular when it comes to security, since immutability/measurements and similar are not provided. This means, unlike for system extensions and portable services a complete trust chain with attestation and per-app cryptographically protected data is much harder to implement sanely.

What I'd like to underline here is that the main system OS image, as well as the system extension images and the portable service images are put together the same way: they are GPT disk images, with one immutable file system and associated Verity data. The latter two should also contain a PKCS#7 signature for the top-level Verity hash. This uniformity has many benefits: you can use the same tools to build and process these images, but most importantly: by using a single way to validate them throughout the stack (i.e. Verity, in the latter cases with PKCS#7 signatures), validation and measurement is straightforward. In fact it's so obvious that we don't even have to implement it in systemd: the kernel has direct support for this Verity signature checking natively already (IMA).

So, by composing a system at runtime from a host image, extension images and portable service images we have a nicely modular system where every single component is cryptographically validated on every single IO operation, and every component is measured, in its entire combination, directly in the kernel's IMA subsystem.

(Of course, once you add the desktop apps or OCI containers on top, then these properties are lost further down the chain. But well, a lot is already won, if you can close the chain that far down.)

Note that system extensions are not designed to replicate the fine grained packaging logic of RPM/dpkg. Of course, systemd-sysext is a generic tool, so you can use it for whatever you want, but there's a reason it does not bring support for a dependency language: the goal here is not to replicate traditional Linux packaging (we have that already, in RPM/dpkg, and I think they are actually OK for what they do) but to provide delivery of larger, coarser sets of functionality, in lockstep with the underlying OS' life-cycle and in particular with no interdependencies, except on the underlying OS.
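
A rough sketch of what building and activating such an extension could look like; the extension name and contents are invented for illustration:

    # Layout of a hypothetical "debugtools" extension tree or image:
    #   usr/bin/strace
    #   usr/lib/extension-release.d/extension-release.debugtools
    # where the extension-release file must match the host's os-release, e.g.:
    #   ID=fooos
    #   VERSION_ID=0.7

    # Drop the image into place and merge it into /usr/ via a read-only overlayfs
    cp debugtools.raw /var/lib/extensions/
    systemd-sysext merge
    systemd-sysext status

    # Remove it again
    systemd-sysext unmerge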

Also note that depending on the use case it might make sense to also use system extensions to modularize the initrd step. This is probably less relevant for a desktop OS, but for server systems it might make sense to package up support for specific complex storage in a systemd-sysext system extension, which can be applied to the initrd that is built into the unified kernel. (In fact, we have been working on implementing signed yet modular initrd support to general purpose Fedora this way.)

Note that portable services are composable from system extensions too, by the way. This makes them even more useful, as you can share a common runtime between multiple portable services, or even use the host image as the common runtime for portable services. In this model a common runtime image is shared between one or more system extensions, and composed at runtime via an overlayfs instance.
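
For the portable service case, attaching such an image could look roughly like this (image and unit names are hypothetical):

    # Attach a portable service image; its units become visible to the host
    portablectl attach /var/lib/portables/vpn-gateway.raw
    systemctl enable --now vpn-gateway.service

    # Later, stop and detach it again
    systemctl disable --now vpn-gateway.service
    portablectl detach vpn-gateway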

More Modularity: Secondary OS Installs

Having an immutable, cryptographically locked down host OS is great I think, and if we have some moderate modularity on top, that's also great. But oftentimes it's useful to be able to depart from or compromise on that for some specific use cases, i.e. to provide a bridge, for example, to allow workloads designed around RPM/dpkg package management to coexist reasonably nicely with such an immutable host.

For this purpose in my model I'd propose using systemd-nspawn containers. The containers are focused on OS containerization, i.e. they allow you to run a full OS with init system and everything as payload (unlike for example Docker containers which focus on a single service, and where running a full OS in it is a mess).

Running systemd-nspawn containers for such secondary OS installs has various nice properties. One of course is that systemd-nspawn supports the same level of cryptographic image validation that we rely on for the host itself. Thus, to some level the whole OS trust chain is reasonably recursive if desired: the firmware validates the OS, and the OS can validate a secondary OS installed within it. In fact, we can run our trusted OS recursively on itself and get similar security guarantees! Besides these security aspects, systemd-nspawn also has really nice properties when it comes to integration with the host. For example, the --bind-user= switch permits binding a host user record and their home directory into a container as a simple one-step operation. This makes it extremely easy to have a single user and $HOME but share it concurrently with the host and a zoo of secondary OSes in systemd-nspawn containers, each of which could even run a different distribution.
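
A minimal sketch of such a secondary OS install, assuming a hypothetical image and user name:

    # Boot a full secondary OS from a disk image as a container, and map the
    # host user "alice" (record and home directory) into it
    systemd-nspawn --image=debian-stable.raw --boot --bind-user=alice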

Developer Mode

Superficially, an OS with an immutable /usr/ appears much less hackable than an OS where everything is writable. Moreover, an OS where everything must be signed and cryptographically validated makes it hard to insert your own code, given you are unlikely to possess access to the signing keys.

To address this issue other systems have supported a "developer" mode: when entered the security guarantees are disabled, and the system can be freely modified, without cryptographic validation. While that's a great concept to have I doubt it's what most developers really want: the cryptographic properties of the OS are great after all, it sucks having to give them up once developer mode is activated.

In my model I'd thus propose two different approaches to this problem. First of all, I think there's value in allowing users to additively extend/override the OS via local developer system extensions. With this scheme the underlying cryptographic validation would remain intact, but – if this form of development mode is explicitly enabled – the developer could add in more resources from local storage that are not tied to the OS builder's chain of trust, but a local one (i.e. simply backed by encrypted storage of some form).

The second approach is to make it easy to extend (or in fact replace) the set of trusted validation keys, with local ones that are under the control of the user, in order to make it easy to operate with kernel, OS, extension, portable service or container images signed by the local developer without involvement of the OS builder. This is relatively easy to do for components further down the trust chain, i.e. the elements further up the chain should optionally accept additional certificates to validate against.

(Note that systemd currently has no explicit support for a "developer" mode like this. I think we should add that sooner or later however.)

Democratizing Code Signing

Closely related to the question of developer mode is the question of code signing. If you ask me, the status quo of UEFI SecureBoot code signing in the major Linux distributions is pretty sad. The work to get stuff signed is massive, but in effect it delivers very little in return: because initrds are entirely unprotected and reside on partitions lacking any form of cryptographic integrity protection, any attacker can trivially modify the boot process of any such Linux system and freely collect FDE passphrases as they are entered. There's little value in signing the boot loader and kernel in a complex bureaucracy if it then happily loads entirely unprotected code that processes the actually relevant security credentials: the FDE keys.

In my model, through use of unified kernels this important gap is closed, hence UEFI SecureBoot code signing becomes an integral part of the boot chain from firmware to the host OS. Unfortunately, code signing and having something a user can locally hack are to some level conflicting. However, I think we can improve the situation here, and put more emphasis on enrolling developer keys in the trust chain easily. Specifically, I see one relevant approach here: enrolling keys directly in the firmware is something that we should make less of a theoretical exercise and more something we can realistically deploy. See this work in progress making this more automatic and eventually safe. Other approaches are thinkable (including some that build on existing MokManager infrastructure), but given the politics involved, are harder to conclusively implement.

Running the OS itself in a container

What I explain above is put together with running on a bare metal system in mind. However, one of the stated goals is to make the OS adaptive enough to also run in a container environment (specifically: systemd-nspawn) nicely. Booting a disk image on bare metal or in a VM generally means that the UEFI firmware validates and invokes the boot loader, and the boot loader invokes the kernel which then transitions into the final system. This is different for containers: here the container manager immediately calls the init system, i.e. PID 1. Thus the validation logic must be different: cryptographic validation must be done by the container manager. In my model this is solved by shipping the OS image not only with a Verity data partition (as is already necessary for the UEFI SecureBoot trust chain, see above), but also with another partition, containing a PKCS#7 signature of the root hash of said Verity partition. This of course is exactly what I propose for both the system extension and portable service image. Thus, in my model the images for all three uses are put together the same way: an immutable /usr/ partition, accompanied by a Verity partition and a PKCS#7 signature partition. The OS image itself then has two ways "into" the trust chain: either through the signed unified kernel in the ESP (which is used for bare metal and VM boots) or by using the PKCS#7 signature stored in the partition (which is used for container/systemd-nspawn boots).

Parameterizing Kernels

A fully immutable and signed OS has to establish trust in the user data it makes use of before doing so. In the model I describe here, for /etc/ and /var/ we do this via disk encryption of the root file system (in combination with integrity checking). But the point where the root file system is mounted comes relatively late in the boot process, and thus cannot be used to parameterize the boot itself. In many cases it's important to be able to parameterize the boot process however.

For example, for the implementation of the developer mode indicated above it's useful to be able to pass this fact safely to the initrd, in combination with other fields (e.g. a hashed root password for allowing in-initrd logins for debug purposes). After all, if the initrd is pre-built by the vendor and signed as a whole together with the kernel, it cannot be modified to carry such data directly (which is in fact how parameterizing the initrd was to a large degree traditionally done).

In my model this is achieved through system credentials, which allow passing parameters to systems (and services, for that matter) in an encrypted and authenticated fashion, bound to the TPM2 chip. This means that we can securely pass data into the initrd so that it can be authenticated and decrypted only on the system it is intended for and with the unified kernel image it was intended for.
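
As a sketch of the mechanism (the credential name and consumer are invented; systemd-creds and the *CredentialEncrypted= unit settings are the pieces referred to here):

    # Encrypt a parameter so that only this machine's TPM2 can decrypt it
    echo -n "some-debug-setting" | systemd-creds encrypt --with-key=tpm2 --name=devmode - devmode.cred

    # A unit (or an initrd service) can then consume it, decrypted at activation time:
    #   [Service]
    #   LoadCredentialEncrypted=devmode:/path/to/devmode.cred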

Swap

In my model the OS would also carry a swap partition, for the simple reason that only then can systemd-oomd.service provide the best results. Also see In defence of swap: common misconceptions.

Updating Images

We have a rough idea how the system shall be organized now, let's next focus on the deployment cycle: software needs regular update cycles, and software that is not updated regularly is a security problem. Thus, I am sure that any modern system must be automatically updated, without this requiring avoidable user interaction.

In my model, this is the job for systemd-sysupdate. It's a relatively simple A/B image updater: it operates either on partitions, on regular files in a directory, or on subdirectories in a directory. Each entry has a version (which is encoded in the GPT partition label for partitions, and in the filename for regular files and directories): whenever an update is initiated the oldest version is erased, and the newest version is downloaded.

With the setup described above a system update becomes a really simple operation. On each update the systemd-sysupdate tool downloads a /usr/ file system partition, an accompanying Verity partition, and a PKCS#7 signature partition, and drops them into the host's partition table (where they possibly replace the oldest version stored so far). Then it downloads a unified kernel image and drops it into the EFI System Partition's /EFI/Linux (as per Boot Loader Specification; possibly erasing the oldest such file there). And that's already the whole update process: four files are downloaded from the server, unpacked and put in the most straightforward of ways into the partition table or file system. Unlike in other OS designs there's no mechanism required to explicitly switch to the newer version: the aforementioned systemd-boot logic will automatically pick the newest kernel once it is dropped in.
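
A hedged sketch of the corresponding sysupdate.d transfer definition for the /usr/ partition, loosely modelled on the documented examples; the URL and naming scheme are placeholders, and similar drop-ins would cover the Verity/signature partitions and the kernel in the ESP:

    # /usr/lib/sysupdate.d/50-usr.conf
    [Transfer]
    ProtectVersion=%A

    [Source]
    Type=url-file
    Path=https://download.example.com/fooOS/
    MatchPattern=fooOS_@v.usr.raw.xz

    [Target]
    Type=partition
    Path=auto
    MatchPattern=fooOS_@v
    MatchPartitionType=usr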

Above we talked a lot about modularity, and how to put systems together as a combination of a host OS image, system extension images for the initrd and the host, portable service images and systemd-nspawn container images. I already emphasized that these image files are actually always the same: GPT disk images with partition definitions that match the Discoverable Partition Specification. This comes in very handy when thinking about updating: we can use the exact same systemd-sysupdate tool for updating these other images as we use for the host image. The uniformity of the on-disk format allows us to update them uniformly too.

Boot Counting + Assessment

Automatic OS updates do not come without risks: if they happen automatically and an update goes wrong, your system might be automatically updated into a brick. This of course is less than ideal. Hence it is essential to address this reasonably automatically. In my model, there's systemd's Automatic Boot Assessment for that. The mechanism is simple: whenever a new unified kernel image is dropped into the system it will be stored with a small integer counter value included in the filename. Whenever the unified kernel image is selected for booting by systemd-boot, the counter is decreased by one. Once the system booted up successfully (which is determined by userspace) the counter is removed from the file name (which indicates "this entry is known to work"). If the counter ever hits zero, this indicates that the system tried to boot this kernel a couple of times and failed each time; it is thus apparently "bad". In this case systemd-boot will not consider the kernel anymore, and revert to the next older one (that doesn't have a counter of zero).

By sticking the boot counter into the filename of the unified kernel we can directly attach this information to the kernel, and thus need not concern ourselves with cleaning up secondary information about the kernel when the kernel is removed. Updating with a tool like systemd-sysupdate hence remains a very simple operation: drop one old file, add one new file.
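
To illustrate how the counter is encoded in the file name, following the documented "+LEFT-DONE" scheme (the mount point and image name are illustrative):

    # Freshly deployed kernel, three boot attempts left:
    /efi/EFI/Linux/fooOS_0.8+3.efi
    # After one unsuccessful attempt (two tries left, one done):
    /efi/EFI/Linux/fooOS_0.8+2-1.efi
    # After a successful boot the counter is dropped, marking the entry as good:
    /efi/EFI/Linux/fooOS_0.8.efi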

Picking the Newest Version

I already mentioned that systemd-boot automatically picks the newest unified kernel image to boot, by looking at the version encoded in the filename. This is done via a simple strverscmp() call (well, truth be told, it's a modified version of that call, different from the one implemented in libc, because real-life package managers use more complex rules for comparing versions these days, and hence it made sense to do that here too). The concept of having multiple entries of some resource in a directory, and picking the newest one automatically is a powerful concept, I think. It means adding/removing new versions is extremely easy (as we discussed above, in systemd-sysupdate context), and allows stateless determination of what to use.

If systemd-boot can do that, what about system extension images, portable service images, or systemd-nspawn container images that do not actually use systemd-boot as the entrypoint? All these tools actually implement the very same logic, but on the partition level: if multiple suitable /usr/ partitions exist, then the newest is determined by comparing the GPT partition label of them.

This is in a way the counterpart to the systemd-sysupdate update logic described above: we always need a way to determine which partition to actually use after the update took place, and this is easy every time: enumerate the possible entries, and pick the newest as per the (modified) strverscmp() result.

Home Directory Management

In my model the device's users and their home directories are managed by systemd-homed. This means they are relatively self-contained and can be migrated easily between devices. The numeric UID assignment for each user is done at the moment of login only, and the files in the home directory are mapped as needed via a uidmap mount. It also allows us to protect the data of each user individually with a credential that belongs to the user itself. i.e. instead of binding confidentiality of the user's data to the system-wide full-disk-encryption each user gets their own encrypted home directory where the user's authentication token (password, FIDO2 token, PKCS#11 token, recovery key…) is used as authentication and decryption key for the user's data. This brings a major improvement for security as it means the user's data is cryptographically inaccessible except when the user is actually logged in.

It also allows us to correct another major issue with traditional Linux systems: the way data encryption works during system suspend. Traditionally on Linux the disk encryption credentials (e.g. the LUKS passphrase) are kept in memory even while the system is suspended. This is a bad choice for security, since many (most?) of us probably never turn off our laptops but suspend them instead. But if the decryption key is always present in unencrypted form while the system is suspended, it could potentially be read from there by a sufficiently equipped attacker.

By encrypting the user's home directory with the user's authentication token we can first safely "suspend" the home directory before going to the system suspend state (i.e. flush out the cryptographic keys needed to access it). This means any process currently accessing the home directory will be frozen for the time of the suspend, but that's expected anyway during a system suspend cycle. Why is this better than the status quo ante? In this model the home directory's cryptographic key material is erased during suspend, but it can be safely reacquired on resume, from system code. If the system is only encrypted as a whole however, then the system code itself couldn't reauthenticate the user, because it would be frozen too. By separating home directory encryption from the root file system encryption we can avoid this problem.
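
A brief sketch of the systemd-homed side of this, with a hypothetical user name:

    # Create a LUKS2-backed, self-contained home directory
    homectl create alice --storage=luks --fs-type=btrfs

    # Flush the home directory's key material (what systemd-homed arranges around
    # suspend), and reacquire it again afterwards
    homectl lock alice
    homectl unlock alice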

Partition Setup

So we discussed the organization of the partitions of OS images multiple times in the above, each time focusing on a specific aspect. Let's now summarize how this should look all together.

In my model, the initial, shipped OS image should look roughly like this:

  • (1) A UEFI System Partition, with systemd-boot as boot loader and one unified kernel
  • (2) A /usr/ partition (version "A"), with a label fooOS_0.7 (under the assumption we called our project fooOS and the image version is 0.7).
  • (3) A Verity partition for the /usr/ partition (version "A"), with the same label
  • (4) A partition carrying the Verity root hash for the /usr/ partition (version "A"), along with a PKCS#7 signature of it, also with the same label

On first boot this is augmented by systemd-repart like this:

  • (5) A second /usr/ partition (version "B"), initially with a label _empty (which is the label systemd-sysupdate uses to mark partitions that currently carry no valid payload)
  • (6) A Verity partition for that (version "B"), similar to the above case, also labelled _empty
  • (7) And ditto a Verity root hash partition with a PKCS#7 signature (version "B"), also labelled _empty
  • (8) A root file system, encrypted and locked to the TPM2
  • (9) A home file system, integrity protected via a key also in TPM2 (encryption is unnecessary, since systemd-homed adds that on its own, and it's nice to avoid duplicate encryption)
  • (10) A swap partition, encrypted and locked to the TPM2
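
To make this a little more concrete, here is a minimal sketch of what the systemd-repart drop-ins for the locally created partitions 8, 9 and 10 could look like. The file names and the exact option set are illustrative assumptions, not the canonical configuration:

# /usr/lib/repart.d/60-root.conf (hypothetical)
[Partition]
Type=root
Format=btrfs
Encrypt=tpm2

# /usr/lib/repart.d/70-home.conf (hypothetical; the TPM2-bound integrity
# protection mentioned above is not shown in this sketch)
[Partition]
Type=home
Format=btrfs

# /usr/lib/repart.d/80-swap.conf (hypothetical)
[Partition]
Type=swap
Format=swap
Encrypt=tpm2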

Then, on the first OS update the partitions 5, 6, 7 are filled with a new version of the OS (let's say 0.8) and thus get their label updated to fooOS_0.8. After a boot, this version is active.

On a subsequent update the three partitions fooOS_0.7 get wiped and replaced by fooOS_0.9 and so on.

On factory reset, the partitions 8, 9, 10 are deleted, so that systemd-repart recreates them, using a new set of cryptographic keys.

Here's a graphic that hopefully illustrates how the partition table evolves from the shipped image, through first boot, multiple update cycles and an eventual factory reset:

Partitions Overview

Trust Chain

So let's summarize the intended chain of trust (for bare metal/VM boots) that ensures every piece of code in this model is signed and validated, and any system secret is locked to TPM2.

  1. First, firmware (or possibly shim) authenticates systemd-boot.

  2. Once systemd-boot picks a unified kernel image to boot, it is also authenticated by firmware/shim.

  3. The unified kernel image contains an initrd, which is the first userspace component that runs. It finds any system extensions passed into the initrd, and sets them up through Verity. The kernel will validate the Verity root hash signature of these system extension images against its usual keyring.

  4. The initrd also finds credentials passed in, then securely unlocks (which means: decrypts + authenticates) them with a secret from the TPM2 chip, locked to the kernel image itself.

  5. The kernel image also contains a kernel command line which contains a usrhash= option that pins the root hash of the /usr/ partition to use (a sketch of such a command line follows after this list).

  6. The initrd then unlocks the encrypted root file system, with a secret bound to the TPM2 chip.

  7. The system then transitions into the main system, i.e. the combination of the Verity protected /usr/ and the encrypted root file system. It then activates two more encrypted (and/or integrity protected) volumes for /home/ and swap, also with a secret tied to the TPM2 chip.
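
As a sketch, the kernel command line embedded in the unified kernel image could look roughly like this; the hash is shown as a placeholder rather than a real value:

usrhash=<sha256-root-hash-of-the-usr-verity-partition> ro quiet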

Here's an attempt to illustrate the above graphically:

Trust Chain

This is the trust chain of the basic OS. Validation of system extension images, portable service images, systemd-nspawn container images always takes place the same way: the kernel validates these Verity images along with their PKCS#7 signatures against the kernel's keyring.

File System Choice

In the above I left the choice of file systems unspecified. For the immutable /usr/ partitions squashfs might be a good candidate, but any other that works nicely in a read-only fashion and generates reproducible results is a good choice, too. The home directories as managed by systemd-homed should certainly use btrfs, because it's the only general purpose file system supporting online grow and shrink, which systemd-homed can take advantage of to manage storage.

For the root file system btrfs is likely also the best idea. That's because we intend to use LUKS/dm-crypt underneath, which by default only provides confidentiality, not authenticity of the data (unless combined with dm-integrity). Since btrfs (unlike xfs/ext4) does full data checksumming it's probably the best choice here, since it means we don't have to use dm-integrity (which comes at a higher performance cost).

OS Installation vs. OS Instantiation

In the discussion above a lot of focus was put on setting up the OS and completing the partition layout and such on first boot. This means installing the OS becomes as simple as dd-ing (i.e. "streaming") the shipped disk image into the final HDD medium. Simple, isn't it?

Of course, such a scheme is just too simple for many setups in real life. Whenever multi-boot is required (i.e. co-installing an OS implementing this model with another unrelated one), dd-ing a disk image onto the HDD is going to overwrite user data that was supposed to be kept around.

In order to cover this case, in my model, we'd use systemd-repart (again!) to allow streaming the source disk image into the target HDD in a smarter, additive way. The tool after all is purely additive: it will add in partitions or grow them if they are missing or too small. systemd-repart already has all the necessary provisions to not only create a partition on the target disk, but also copy blocks from a raw installer disk. An install operation would then become a two-step process: one invocation of systemd-repart that adds the /usr/ partition, its Verity partition and the signature partition to the target medium, populated with a copy of the same partitions of the installer medium. And one invocation of bootctl that installs the systemd-boot boot loader in the ESP. (Well, there's one thing missing here: the unified OS kernel also needs to be dropped into the ESP. For now, this can be done with a simple cp call. In the long run, this should probably be something bootctl can do as well, if told so.)
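
Purely as an illustrative sketch (the device name, directories and file names are made up, and the installer's repart.d definitions are assumed to use CopyBlocks= to stream the relevant partitions from the installer medium):

$ systemd-repart --definitions=/run/installer/repart.d --dry-run=no /dev/sda
$ bootctl install --esp-path=/somewhere/esp
$ cp /run/installer/esp/EFI/Linux/fooOS_0.7.efi /somewhere/esp/EFI/Linux/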

So, with this we have a simple scheme that covers all bases: we can either just dd an image to disk, or we can stream an image onto an existing HDD, adding a couple of new partitions and files to the ESP.

Of course, in reality things are more complex than that even: there's a good chance that the existing ESP is simply too small to carry multiple unified kernels. In my model, the way to address this is by shipping two slightly different systemd-repart partition definition file sets: the ideal case when the ESP is large enough, and a fallback case, where it isn't and where we then add in an additional XBOOTLDR partition (as per the Discoverable Partitions Specification). In that mode the ESP carries the boot loader, but the unified kernels are stored in the XBOOTLDR partition. This scenario is not quite as simple as the XBOOTLDR-less scenario described first, but is equally well supported in the various tools. Note that systemd-repart can be told size constraints on the partitions it shall create or augment, thus to implement this scheme it's enough to invoke the tool with the fallback partition scheme if invocation with the ideal scheme fails.
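
A sketch of that fallback logic, assuming two hypothetical definition directories shipped with the image:

$ systemd-repart --definitions=/usr/lib/repart.ideal.d --dry-run=no /dev/sda || \
      systemd-repart --definitions=/usr/lib/repart.fallback.d --dry-run=no /dev/sda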

Either way: regardless how the partitions, the boot loader and the unified kernels ended up on the system's hard disk, on first boot the code paths are the same again: systemd-repart will be called to augment the partition table with the root file system, and properly encrypt it, as was already discussed earlier here. This means: all cryptographic key material used for disk encryption is generated on first boot only, the installer phase does not encrypt anything.

Live Systems vs. Installer Systems vs. Installed Systems

Traditionally on Linux three types of systems were common: "installed" systems, i.e. that are stored on the main storage of the device and are the primary place people spend their time in; "installer" systems which are used to install them and whose job is to copy and setup the packages that make up the installed system; and "live" systems, which were a middle ground: a system that behaves like an installed system in most ways, but lives on removable media.

In my model I'd like to remove the distinction between these three concepts as much as possible: each of these three images should carry the exact same /usr/ file system, and should be suitable to be replicated the same way. Once installed the resulting image can also act as an installer for another system, and so on, creating a certain "viral" effect: if you have one image or installation it's automatically something you can replicate 1:1 with a simple systemd-repart invocation.

Building Images According to this Model

The above explains what the image should look like and how its first boot and update cycle will modify it. But this leaves one question unanswered: how to actually build the initial image for OS instances according to this model?

Note that there's nothing too special about the images following this model: they are ultimately just GPT disk images with Linux file systems, following the Discoverable Partitions Specification. This means you can use any set of tools of your choice that can put together compliant GPT disk images.

I personally would use mkosi for this purpose though. It's designed to generate compliant images, and has a rich toolset for SecureBoot and signed/Verity file systems already in place.

What is key here is that this model doesn't depart from RPM and dpkg; instead it builds on top of them: in this model they are excellent for putting together images on the build host, but deployment onto the runtime host does not involve individual packages.

I think one should not underestimate the value traditional distributions bring, regarding security, integration and general polish. The concepts I describe above are inherited from this, but depart from the idea that distribution packages are a runtime concept and make them a build-time concept instead.

Note that the above is pretty much independent from the underlying distribution.

Final Words

I have no illusions, general purpose distributions are not going to adopt this model as their default any time soon, and it's not even my goal that they do that. The above is my personal vision, and I don't expect people to buy into it 100%, and that's fine. However, what I am interested in is finding the overlaps, i.e. work with people who buy 50% into this vision, and share the components.

My goals here thus are to:

  1. Get distributions to move to a model where images like this can be built from the distribution easily. Specifically this means that distributions make their OS hermetic in /usr/.

  2. Find the overlaps, share components with other projects to revisit how distributions are put together. This is already happening, see systemd-tmpfiles and systemd-sysusers support in various distributions, but I think there's more to share.

  3. Make people interested in building actual real-world images based on general purpose distributions adhering to the model described above. I'd love a "GnomeBook" image with full trust properties, that is built from true Linux distros, such as Fedora or ArchLinux.

FAQ

  1. What about ostree? Doesn't ostree already deliver what this blog story describes?

    ostree is fine technology, but with respect to security and robustness properties it's not too interesting I think, because unlike image-based approaches it cannot really deliver integrity/robustness guarantees over the whole tree easily. To be able to trust an ostree setup you have to establish trust in the underlying file system first, and the complexity of the file system makes that challenging. To provide an effective offline-secure trust chain through the whole depth of the stack it is essential to cryptographically validate every single I/O operation. In an image-based model this is trivially easy, but in the ostree model it's not possible with current file system technology, and even if this is added in one way or another in the future (though I am not aware of anyone doing on-access file-based integrity that spans a whole hierarchy of files and is compatible with ostree's hardlink farm model) I think validation would still happen at too high a level, since Linux file system developers have made it very clear that their implementations are not robust against rogue images. (There's this stuff planned, but doing structural authentication ahead of time instead of on access makes the idea too weak — and I'd expect too slow — in my eyes.)

    With my design I want to deliver similar security guarantees as ChromeOS does, but ostree is much weaker there, and I see no perspective of this changing. In a way ostree's integrity checks are similar to RPM's and enforced on download rather than on access. In the model I suggest above, it's always on access, and thus safe against offline attacks (i.e. evil maid attacks). In today's world, I think offline security is absolutely necessary though.

    That said, ostree does have some benefits over the model described above: it naturally shares file system inodes if many of the modules/images involved share the same data. It's thus more space efficient on disk (and thus also in RAM/cache to some degree) by default. In my model it would be up to the image builders to minimize shipping overly redundant disk images, by making good use of suitably composable system extensions.

  2. What about configuration management?

    At first glance immutable systems and configuration management don't go that well together. However, do note that in the model I propose above the root file system with all its contents, including /etc/ and /var/, is actually writable and can be modified like on any other typical Linux distribution. The only exception is /usr/ where the immutable OS is hermetic. That means configuration management tools should work just fine in this model – up to the point where they are used to install additional RPM/dpkg packages, because that's something not allowed in the model above: packages need to be installed at image build time and thus on the image build host, not the runtime host.

  3. What about non-UEFI and non-TPM2 systems?

    The above is designed around the feature set of contemporary PCs, and this means UEFI and TPM2 being available (simply because the PC is pretty much defined by the Windows platform, and current versions of Windows require both).

    I think it's important to make the best of the features of today's PC hardware, and then find suitable fallbacks on more limited hardware. Specifically this means: if there's desire to implement something like this on non-UEFI or non-TPM2 hardware we should look for suitable fallbacks for the individual functionality, but generally try to add glue to the old systems so that conceptually they behave more like the new systems instead of the other way round. Or in other words: most of the above is not strictly tied to UEFI or TPM2, and for many cases there are already reasonable fallbacks in place for more limited systems. Of course, without TPM2 many of the security guarantees will be weakened.

  4. How would you name an OS built that way?

    I think a desktop OS built this way if it has the GNOME desktop should of course be called GnomeBook, to mimic the ChromeBook name. ;-)

    But in general, I'd call hermetic, adaptive, immutable OSes like this "particles".

How can you help?

  1. Help making Distributions Hermetic in /usr/!

    One of the core ideas of the approach described above is to make the OS hermetic in /usr/, i.e. make it carry a comprehensive description of what needs to be set up outside of it when instantiated. Specifically, this means that system users that are needed are declared in systemd-sysusers snippets, and skeleton files and directories are created via systemd-tmpfiles. Moreover additional partitions should be declared via systemd-repart drop-ins.

    At this point some distributions (such as Fedora) are (probably more by accident than on purpose) already mostly hermetic in /usr/, at least for the most basic parts of the OS. However, this is not complete: many daemons require specific resources to be set up in /var/ or /etc/ before they can work, and the relevant packages do not carry systemd-tmpfiles descriptions that add them if missing. So there are two ways you could help here: politically, it would be highly relevant to convince distributions that an OS that is hermetic in /usr/ is highly desirable and that it's a worthy goal for packagers to get there. More specifically, it would be desirable if RPM/dpkg packages would ship with enough systemd-tmpfiles information so that configuration files the packages strictly need for operation are symlinked (or copied) from /usr/share/factory/ if they are missing (even better, of course, would be if packages already worked upstream with an empty /etc/ and /var/, creating what they need themselves and defaulting to sensible behavior in the absence of configuration files).
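
    As a rough illustration of what such hermetic packaging metadata looks like, a hypothetical foobard package could ship something along these lines (all names are made up):

    # /usr/lib/sysusers.d/foobard.conf: declare the system user the daemon needs
    u foobard - "Foobar Daemon"

    # /usr/lib/tmpfiles.d/foobard.conf: create the state directory, and copy the
    # default configuration from /usr/share/factory/ if it is missing
    d /var/lib/foobard 0750 foobard foobard -
    C /etc/foobard.conf - - - -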

    Note that distributions that adopted systemd-sysusers, systemd-tmpfiles and the /usr/ merge are already quite close to providing an OS that is hermetic in /usr/. These were the big, the major advancements: making the image fully hermetic should be less controversial – at least that's my guess.

    Also note that making the OS hermetic in /usr/ is not just useful in scenarios like the above. It also means that stuff like this and like this can work well.

  2. Fill in the gaps!

    I already mentioned a couple of missing bits and pieces in the implementation of the overall vision. In the systemd project we'd be delighted to review/merge any PRs that fill in the voids.

  3. Build your own OS like this!

    Of course, while we built all these building blocks and they have been adopted to various levels and for various purposes in the various distributions, no one so far has built an OS that puts things together just like that. It would be excellent if we had communities that work on building images like what I propose above, i.e. if you want to work on making a secure GnomeBook as I suggest above a reality, that would be more than welcome.

    What could this look like specifically? Pick an existing distribution, write a set of mkosi descriptions plus some additional drop-in files, and then build this on some build infrastructure. While doing so, report the gaps, and help us address them.

Further Documentation of Used Components and Concepts

  1. systemd-tmpfiles
  2. systemd-sysusers
  3. systemd-boot
  4. systemd-stub
  5. systemd-sysext
  6. systemd-portabled, Portable Services Introduction
  7. systemd-repart
  8. systemd-nspawn
  9. systemd-sysupdate
  10. systemd-creds, System and Service Credentials
  11. systemd-homed
  12. Automatic Boot Assessment
  13. Boot Loader Specification
  14. Discoverable Partitions Specification
  15. Safely Building Images

Earlier Blog Stories Related to this Topic

  1. The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions
  2. The Wondrous World of Discoverable GPT Disk Images
  3. Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248
  4. Portable Services with systemd v239
  5. mkosi — A Tool for Generating OS Images

And that's all for now.

Instantaneous RTP synchronization & retrieval of absolute sender clock times with GStreamer

Over the last few weeks, GStreamer’s RTP stack got a couple of new and quite useful features. As this is difficult to configure, mostly because there are so many different possible configurations, I decided to write a bit about it with some example code.

The features are RFC 6051-style rapid synchronization of RTP streams, which can be used for inter-stream (e.g. audio/video) synchronization as well as inter-device (i.e. network) synchronization, and the ability to easily retrieve absolute sender clock times per packet on the receiver side.

Note that each of these was already possible before with GStreamer via different mechanisms with different trade-offs. Obviously, not having working audio/video synchronization would simply not be acceptable, and I previously talked about how to do inter-device synchronization with GStreamer, for example at the GStreamer Conference 2015 in Düsseldorf.

The example code below will make use of the GStreamer RTSP Server library but can be applied to any kind of RTP workflow, including WebRTC. It is written in Rust but the same can also be achieved in any other language. The full code can be found in this repository.

And for reference, the merge requests to enable all this are [1], [2] and [3]. You probably don’t want to backport those to an older version of GStreamer though as there are dependencies on various other changes elsewhere. All of the following needs at least GStreamer from the git main branch as of today, or the upcoming 1.22 release.

Baseline Sender / Receiver Code

The starting point of the example code can be found here in the baseline branch. All the important steps are commented so it should be relatively self-explanatory.

Sender

The sender is starting an RTSP server on the local machine on port 8554 and provides a media with H264 video and Opus audio on the mount point /test. It can be started with

$ cargo run -p rtp-rapid-sync-example-send

After starting the server it can be accessed via GStreamer with e.g. gst-play-1.0 rtsp://127.0.0.1:8554/test or similarly via VLC or any other software that supports RTSP.

This does not do anything special yet but lays the foundation for the following steps. It creates an RTSP server instance with a custom RTSP media factory, which in turn creates custom RTSP media instances. All this is not needed at this point yet but will allow for the necessary customization later.

One important aspect here is that the base time of the media’s pipeline is set to zero

pipeline.set_base_time(gst::ClockTime::ZERO);
pipeline.set_start_time(gst::ClockTime::NONE);

This allows the timeoverlay element that is placed in the video part of the pipeline to render the clock time over the video frames. We’re going to use this later to confirm on the receiver that the clock time on the sender and the one retrieved on the receiver are the same.

let video_overlay = gst::ElementFactory::make("timeoverlay", None)
    .context("Creating timeoverlay")?;
[...]
video_overlay.set_property_from_str("time-mode", "running-time");

It actually only supports rendering the running time of each buffer, but in a live pipeline with the base time set to zero the running time and pipeline clock time are the same. See the documentation for some more details about the time concepts in GStreamer.

Overall this creates the following RTSP stream producer bin, which will be used also in all the following steps:

Receiver

The receiver is a simple playbin pipeline that plays an RTSP URI given via command-line parameters and runs until the stream is finished or an error has happened.

It can be run with the following once the sender is started

$ cargo run -p rtp-rapid-sync-example-send -- "rtsp://192.168.1.101:8554/test"

Please don’t forget to replace the IP with the IP of the machine that is actually running the server.

All the code should be familiar to anyone who ever wrote a GStreamer application in Rust, except for one part that might need a bit more explanation

pipeline.connect_closure(
    "source-setup",
    false,
    glib::closure!(|_playbin: &gst::Pipeline, source: &gst::Element| {
        source.set_property("latency", 40u32);
    }),
);

playbin is going to create an rtspsrc, and at that point it will emit the source-setup signal so that the application can do any additional configuration of the source element. Here we’re connecting a signal handler to that signal to do exactly that.

By default rtspsrc introduces a latency of 2 seconds, which is a lot more than what is usually needed. For live, non-VOD RTSP streams this value should be around the network jitter, and here we’re configuring it to 40 milliseconds.

Retrieval of absolute sender clock times

Now as the first step we’re going to retrieve the absolute sender clock times for each video frame on the receiver. They will be rendered by the receiver at the bottom of each video frame and will also be printed to stdout. The changes between the previous version of the code and this version can be seen here and the final code here in the sender-clock-time-retrieval branch.

When running the sender and receiver as before, the video from the receiver should look similar to the following

The upper time that is rendered on the video frames is rendered by the sender, the bottom time is rendered by the receiver and both should always be the same unless something is broken here. Both times are the pipeline clock time when the sender created/captured the video frame.

In this configuration the absolute clock times of the sender are provided to the receiver via the NTP / RTP timestamp mapping provided by the RTCP Sender Reports. That’s also the reason why it takes about 5s for the receiver to know the sender’s clock time: RTCP packets are not scheduled very often, and by default the first one is only sent after about 5s. The RTCP interval can be configured on rtpbin together with many other things.

Sender

On the sender-side the configuration changes are rather small and not even absolutely necessary.

rtpbin.set_property_from_str("ntp-time-source", "clock-time");

By default the RTP NTP time used in the RTCP packets is based on the local machine’s walltime clock converted to the NTP epoch. While this works fine, this is not the clock that is used for synchronizing the media and as such there will be drift between the RTP timestamps of the media and the NTP time from the RTCP packets, which will be reset every time the receiver receives a new RTCP Sender Report from the sender.

Instead, we configure rtpbin here to use the pipeline clock as the source for the NTP timestamps used in the RTCP Sender Reports. This doesn’t give us (by default at least, see later) an actual NTP timestamp but it doesn’t have the drift problem mentioned before. Without further configuration, in this pipeline the used clock is the monotonic system clock.

rtpbin.set_property("rtcp-sync-send-time", false);

rtpbin normally uses the time when a packet is sent out for the NTP / RTP timestamp mapping in the RTCP Sender Reports. This is changed with this property to instead use the time when the video frame / audio sample was captured, i.e. it does not include all the latency introduced by encoding and other processing in the sender pipeline.

This doesn’t make any big difference in this scenario but usually one would be interested in the capture clock times and not the send clock times.

Receiver

On the receiver-side there are a few more changes. First of all we have to opt-in to rtpjitterbuffer putting a reference timestamp metadata on every received packet with the sender’s absolute clock time.

pipeline.connect_closure(
    "source-setup",
    false,
    glib::closure!(|_playbin: &gst::Pipeline, source: &gst::Element| {
        source.set_property("latency", 40u32);
        source.set_property("add-reference-timestamp-meta", true);
    }),
);

rtpjitterbuffer will start putting the metadata on packets once it knows the NTP / RTP timestamp mapping, i.e. after the first RTCP Sender Report is received in this case. Between the Sender Reports it is going to interpolate the clock times. The normal timestamps (PTS) on each packet are not affected by this and are still based on whatever clock is used locally by the receiver for synchronization.

To actually make use of the reference timestamp metadata we add a timeoverlay element as video-filter on the receiver:

let timeoverlay =
    gst::ElementFactory::make("timeoverlay", None).context("Creating timeoverlay")?;

timeoverlay.set_property_from_str("time-mode", "reference-timestamp");
timeoverlay.set_property_from_str("valignment", "bottom");

pipeline.set_property("video-filter", &timeoverlay);

This will then render the sender’s absolute clock times at the bottom of each video frame, as seen in the screenshot above.

And last we also add a pad probe on the sink pad of the timeoverlay element to retrieve the reference timestamp metadata of each video frame and then printing the sender’s clock time to stdout:

let sinkpad = timeoverlay
    .static_pad("video_sink")
    .expect("Failed to get timeoverlay sinkpad");
sinkpad
    .add_probe(gst::PadProbeType::BUFFER, |_pad, info| {
        if let Some(gst::PadProbeData::Buffer(ref buffer)) = info.data {
            if let Some(meta) = buffer.meta::<gst::ReferenceTimestampMeta>() {
                println!("Have sender clock time {}", meta.timestamp());
            } else {
                println!("Have no sender clock time");
            }
        }

        gst::PadProbeReturn::Ok
    })
    .expect("Failed to add pad probe");

Rapid synchronization via RTP header extensions

The main problem with the previous code is that the sender’s clock times are only known once the first RTCP Sender Report is received by the receiver. There are many ways to configure rtpbin to make this happen faster (e.g. by reducing the RTCP interval or by switching to the AVPF RTP profile) but in any case the information would be transmitted outside the actual media data flow and it can’t be guaranteed that it is actually known on the receiver from the very first received packet onwards. This is of course not a problem in every use-case, but for the cases where it is, there is a solution.

RFC 6051 defines an RTP header extension that allows transmitting the NTP timestamp corresponding to an RTP packet directly together with that very packet. And that’s what the next changes to the code make use of.

The changes between the previous version of the code and this version can be seen here and the final code here in the rapid-synchronization branch.

Sender

To add the header extension on the sender-side it is only necessary to add an instance of the corresponding header extension implementation to the payloaders.

let hdr_ext = gst_rtp::RTPHeaderExtension::create_from_uri(
    "urn:ietf:params:rtp-hdrext:ntp-64",
    )
    .context("Creating NTP 64-bit RTP header extension")?;
hdr_ext.set_id(1);
video_pay.emit_by_name::<()>("add-extension", &[&hdr_ext]);

This first instantiates the header extension based on the uniquely defined URI for it, then sets its ID to 1 (see RFC 5285) and then adds it to the video payloader. The same is then done for the audio payloader.
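
For completeness, the audio side could look like this (a sketch: audio_pay is the assumed name of the audio payloader variable in this example, and 2 is simply the next free extension ID):

let hdr_ext = gst_rtp::RTPHeaderExtension::create_from_uri(
    "urn:ietf:params:rtp-hdrext:ntp-64",
)
.context("Creating NTP 64-bit RTP header extension")?;
hdr_ext.set_id(2);
audio_pay.emit_by_name::<()>("add-extension", &[&hdr_ext]);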

By default this will add the header extension to every RTP packet that has a different RTP timestamp than the previous one. In other words: on the first packet that corresponds to an audio or video frame. Via properties on the header extension this can be configured but generally the default should be sufficient.

Receiver

On the receiver-side no changes would actually be necessary. The use of the header extension is signaled via the SDP (see RFC 5285) and it will be automatically made use of inside rtpbin as another source of NTP / RTP timestamp mappings in addition to the RTCP Sender Reports.

However, we configure one additional property on rtpbin

source.connect_closure(
    "new-manager",
    false,
    glib::closure!(|_rtspsrc: &gst::Element, rtpbin: &gst::Element| {
        rtpbin.set_property("min-ts-offset", gst::ClockTime::from_mseconds(1));
    }),
);

Inter-stream audio/video synchronization

The reason for configuring the min-ts-offset property on the rtpbin is that the NTP / RTP timestamp mapping is not only used for providing the reference timestamp metadata but it is also used for inter-stream synchronization by default. That is, for getting correct audio / video synchronization.

With RTP alone there is no mechanism to synchronize multiple streams against each other as the packet’s RTP timestamps of different streams have no correlation to each other. This is not too much of a problem as usually the packets for audio and video are received approximately at the same time but there’s still some inaccuracy in there.

One approach to fix this is to use the NTP / RTP timestamp mapping for each stream, either from the RTCP Sender Reports or from the RTP header extension, and that’s what is made use of here. And because the mapping is provided very often via the RTP header extension but the RTP timestamps are only accurate up to the clock rate (1/90000 s for video and 1/48000 s for audio in this case), we configure a threshold of 1ms for adjusting the inter-stream synchronization. Without this it would be adjusted almost continuously by a very small amount back and forth.

Other approaches for inter-stream synchronization are provided by RTSP itself before streaming starts (via the RTP-Info header), but due to a bug this is currently not made use of by GStreamer.

Yet another approach would be via the clock information provided by RFC 7273, about which I already wrote previously and which is also supported by GStreamer. This also allows inter-device, network synchronization and is used for that purpose as part of e.g. AES67, Ravenna, SMPTE 2022 / 2110 and many other protocols.

Inter-device network synchronization

Now for the last part, we’re going to add actual inter-device synchronization to this example. The changes between the previous version of the code and this version can be seen here and the final code here in the network-sync branch. This does not use the clock information provided via RFC 7273 (which would be another option) but uses the same NTP / RTP timestamp mapping that was discussed above.

When starting the receiver multiple times on different (or the same) machines, each of them should play back the media synchronized to each other and exactly 2 seconds after the corresponding audio / video frames are produced on the sender.

For this, both the sender and all receivers are using an NTP clock (pool.ntp.org in this case) instead of the local monotonic system clock for media synchronization (i.e. as the pipeline clock). Instead of an NTP clock it would also be possible to use any other mechanism for network clock synchronization, e.g. PTP or the GStreamer netclock.
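
A minimal sketch of how such a clock could be created before the synchronization step shown below (assuming the gstreamer-net crate imported as gst_net; the address and port are illustrative):

let clock = gst_net::NtpClock::new(None, "pool.ntp.org", 123, gst::ClockTime::ZERO);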

println!("Syncing to NTP clock");
clock
    .wait_for_sync(gst::ClockTime::from_seconds(5))
    .context("Syncing NTP clock")?;
println!("Synced to NTP clock");

This code instantiates a GStreamer NTP clock and then synchronously waits up to 5 seconds for it to synchronize. If that fails then the application simply exits with an error.

Sender

On the sender side all that is needed is to configure the RTSP media factory, and as such the pipeline used inside it, to use the NTP clock

factory.set_clock(Some(&clock));

This causes all media inside the sender’s pipeline to be synchronized according to this NTP clock and to also use it for the NTP timestamps in the RTCP Sender Reports and the RTP header extension.

Receiver

On the receiver side the same has to happen

pipeline.use_clock(Some(&clock));

In addition a couple more settings have to be configured on the receiver though. First of all we configure a static latency of 2 seconds on the receiver’s pipeline.

pipeline.set_latency(gst::ClockTime::from_seconds(2));

This is necessary as GStreamer can’t know the latency of every receiver (e.g. different decoders might be used), and also because the sender latency can’t be automatically known. Each audio / video frame will be timestamped on the receiver with the NTP timestamp when it was captured / created, but since then all the latency of the sender, the network and the receiver pipeline has passed and for this some compensation must happen.

Which value to use here depends a lot on the overall setup, but 2 seconds is a (very) safe guess in this case. The value only has to be larger than the sum of sender, network and receiver latency and in the end has the effect that the receiver is showing the media exactly that much later than the sender has produced it.

And last we also have to tell rtpbin that

  1. sender and receiver clock are synchronized to each other, i.e. in this case both are using exactly the same NTP clock, and that no translation to the pipeline’s clock is necessary, and
  2. that the outgoing timestamps on the receiver should be exactly the sender timestamps and that this conversion should happen based on the NTP / RTP timestamp mapping

source.set_property_from_str("buffer-mode", "synced");
source.set_property("ntp-sync", true);

And that’s it.

A careful reader will also have noticed that all of the above would also work without the RTP header extension, but then the receivers would only be synchronized once the first RTCP Sender Report is received. That’s what the test-netclock.c / test-netclock-client.c example from the GStreamer RTSP server is doing.

As usual with RTP, the above is by far not the only way of doing this and GStreamer also supports various other synchronization mechanisms. Which one is the correct one for a specific use-case depends on a lot of factors.

Save the Date: Berlin Mini GUADEC

Since GUADEC is hard to get to from Europe and some of us don’t do air travel, we’re going to do another edition of Berlin Mini GUADEC this year!

We have a pretty solid local community in Berlin these days, and there are a lot of other contributors living reasonably close by in and around central Europe. Last year’s edition was very fun and productive with minimal organizational effort, and this year will be even better!

Location and other details are TBA, but it’s going to be in Berlin during the conference and BoF days (July 20th to 25th).

See you in Berlin!

May 01, 2022

Engaging with the OSI Elections 2022.1

Every year, the Open Source Initiative (OSI) holds an election to add a few new directors to its board of directors. This year, I decided to try and engage with that process, asking the candidates some real questions.

This is an account of that, which I’m hoping is food for thought for any and all other people who care about FOSS governance, and it’s also a record of what happened, since I expect that to be soon forgotten if I don’t write it down myself. It’s been more than a month, yeah, so … ancient history. I had a thing. Still, historical record and all.

Background

Ostensibly, these elections are serious affairs. The OSI is a high-profile organization, with a robust list of Big Tech sponsor companies funding it. And “open source” as a term is the OSI’s property: the OSI is in charge of the trademark and defends it when it is misused; the OSI also maintains the formal “open source” definition and the list of licenses that you are permitted to call “open source”. That’s actually not a massive list of duties, so you might well wonder why the OSI is so high-profile to begin with; I think that’s one of the big philosophical questions, certainly worth revisiting (especially with regard to running for governance positions) but, for the time being, suffice it to say that it’s where we (“we” meaning “FOSS in the general sense”) are.

Nevertheless, these elections kinda just plod through without a lot of interest or engagement.  You might remember that in 2021, the election-management- or voting-system went a little haywire and the OSI had to redo the election. But, by and large, they aren’t really news when they happen. That’s in pretty stark contrast to the public back-and-forth that happens for Debian Project Leader (DPL) elections and the brouhaha over recent leadership “maneuvering” (scare quotes intentional) in the FSF.

The OSI board candidates can each write a candidacy-page text that gets put on the wiki, but it can say whatever they want. In short, to you the voter, there’s no genuine back-and-forth provided. No debates, no time allotted, no required position papers, etc. For the past few years, however, Luis Villa has made an effort to pose questions to the candidates. I think that’s great. Although not everyone answers, some do. But Villa is just one voter, and he doesn’t ask everything. And some of what he does ask is more of a “tell us in your own words” prompt than it is anything particularly drilling down on a specific point. So it’s not a real incisive process on those grounds.

Plus, this year I saw some things that concerned me about the ballot, and when I got the OSI’s “this much time until the election” email, I realized it was now or never. So this year was my first attempt to ask the candidates questions that I, as a vote-caster, wanted to know the answers to.

The ballot and the candidates

OSI membership itself is a little odd; I’m a student member, but I had to jump through a lot of hoops to even figure out how to join as a student member in the first place, and if I hadn’t been old-conference-pals with the then-OSI executive director I’m not sure I would have succeeded. Not a lot of info on the site; lotta dead links, etc. And even then, the process is basically trade-some-personal-email-based. That could use updating as well, but I digress.

The relevant point is that my “student” membership is a subclass of the “individual” membership. Individual members are people who have joined OSI of their own, personal accord. There is one other type of membership: “affiliate” membership, which is open to organizations (specifically, to other non-profit orgs, FOSS projects, user groups, and educational institutions). Affiliate members are institutions, but they are not the same as OSI corporate sponsors.

That matters because the OSI board is also a little odd, with four of the seats chosen by the individual members and four chosen by the affiliate members, plus two more seats chosen by the other members of the board itself. Sponsors don’t get any seats. Add to that the fact that the individual-seat board members and the board-selected board members have a two-year term while the affiliate-seat board members have a three-year term.

Whatever. The relevant idea is that individuals and affiliates are separate memberships, and there is one election held for the individual-member board seats and a separate election held of the affiliate-member board seats. I can vote for “individual” seat candidates; I can’t vote for “affiliate” seats. The affiliate organizations get an organization-wide vote for the “affiliate” board seats; they do not get to vote on the “individual” board seats. And that’s by design.

Which brings us to this year’s candidates. Two individual seats were up for voting and two affiliate seats were up for voting. There’s an election page that currently has some of the details on it, but the pages of the candidates were only ever on the wiki (side note: don’t have two disjoint CMSes for your organization. It makes the baby pandas cry.)

[There used to be a blog post linking to all of the bio pages below but, for some reason I haven’t been able to track down an answer to, OSI pulled down that page– //blog.opensource.org/meet-osis-2022-candidates-for-board-of-directors — and it’s not archived in Wayback Machine. I’ll update that if it reappears. I locally archived all the links below before posting this, so if those also vanish, lemme know.]

The individual candidates running for the 2 individual-voter seats were:

Editorial note: at the moment, the above “candidacy page” links show the questions I eventually posed the candidates; feel free to not read those yet if you want to stay spoiler-free….

The affiliate candidates running for the 2 affiliate-voter seats were:

…and, just for comparison, the current board prior to Election Day was:

  • Aeva Black {individual} (Microsoft)
  • Megan Byrd-Sanicki {individual} – outgoing – (Google)
  • Catharina Maracke {individual} (Lawyer; lists several associations with unclear professional-affiliation status)
  • Josh Simmons {individual} – outgoing – (Salesforce)

plus:

  • Thierry Carrez {affiliate} (OpenStack)
  • Pamela Chestek {affiliate} – up for reelection
  • Hong Phuc Dang {affiliate} (FOSSASIA)
  • Italo Vignoli {affiliate} – outgoing – (The Document Foundation)

and:

  • Justin Colannino {board-appointed} (Microsoft)
  • Tracy Hinds {board-appointed; serves as Treasurer}

(Side note: it’s sometimes slightly tricky to come up with a real precise “employer”/”affiliation” tag to parentheticize next to each candidate; the meanings vary between individual and affiliate seats, and there are people who are self-employed, etc. It’s a spectrum.)

So, there we have it. As a matter of principle, I may care about the overall make-up of the OSI board, but as a voter I am offered input only on the “individual” board seats. So that’s where I focused my attention.

Issues

Before you even look at the candidate pages, a couple of things stand out. First, there are two organizations who have multiple people running this year. There are two Red Hatters running on the same ballot, and there are two OpenUK-ers running, with one of them running on the individual ballot and the other on the affiliate ballot. Slightly less noticeable is that there is a Googler running to fill a seat vacated by an outgoing Googler.

It’s clearly a problem for any one organization (for profit or otherwise) to hold on to a disproportionate number of seats on the OSI board. This is single-digit territory, mind you. That being said, some of these organizations are quite large and could hypothetically have people in totally disjoint divisions running, with different experiences, reporting to different supervisors, and operating in mutual isolation from one another. We just don’t know.

Furthermore, it’s arguably most problematic for the two OpenUK-ers who are running simultaneously for a seat on two different ballots, then somewhat less problematic for the two Red Hatters both running for an individual seat, and further less problematic for the Googler running as another Googler exits. That’s because the “individual” and “affiliate” voters are different constituencies; the individual-seat voters can see two Red Hatters on the ballot and say “yeah, we definitely need more diversity than THAT”, but the double-dipping maneuver goes more easily unnoticed when one candidate is presented to the individual voter block and the other candidate is presented to the affiliate voter block. That’s compounded by the fact that the affiliate “voters” vote at the organizational level; they’re more likely to cast those votes based on some internal organizational process that we just can’t know from outside.

The second thing that stands out is that one of the candidates for an “individual” seat is actually affiliated with one of the affiliate organizations. That’s a second red flag; the affiliate seats exist to give the affiliate organizations an institutional say in OSI operations. Affiliate employees and officers should definitely not be eligible to run for an “individual” seat for that reason. If they want to participate, they need to run for one of the affiliate seats that is set out in the bylaws for that purpose. Doubling down on the inappropriateness, this affiliate is OpenUK, who was also a red-flagged participant by virtue of running two candidates at the same time, on different ballots.

Moreover, the candidate in question (Brock) is, in fact, the CEO of the affiliate organization. Which is exponentially more problematic, even without the other two prior red flags.

The other thing to notice is that several candidates are employed by corporate sponsors of OSI. This, too, cedes a disproportionate degree of influence over OSI’s actions to one seat.

Last and certainly least — and I want to be generous in the way I phrase this — there are some candidates whose public profile maybe leaves a few open questions to be filled in. That’s not to say that one has to be formally associated with some Big Fancy FOSS HQ to participate in the OSI; far from it. In fact, I think it’s great that anyone can run, and I’m far more interested in the OSI hearing input from individual computer users than I am in giving yet another soapbox to a well-heeled commercial outfit that already has other ways to throw its weight around (and isn’t shy about using them). But some of the candidates’ candidacy pages are a tad sparse on detail, and it’s not immediately clear where they participate in the open-source community. So knowing more of that background information would go a long way toward establishing how well-suited and interesting they ultimately are as candidates.

So it’s a troubling ballot to look at. There’s an ostensibly non-profit organization that’s an official OSI affiliate trying to run its CEO as an individual candidate while also running a second member (a board director) on the appropriate, affiliate ballot in the same election. There are also two financial sponsors running candidates on the individual ballot, one of them (Red Hat) running two candidates at the same time for the two open seats.

If you’re an individual voter at this stage, it looks rough. To be perfectly frank, OpenUK is violating multiple ethical principles here and clearly should not be allowed to run an officer (much less its CEO) as an individual candidate. That, after all, is the entire reason that OSI has set out this distinction between “individual” and “affiliate” board seats in the first place; robbing the individual voters of their representation is blatantly wrong.

Only slightly less troubling are the organizations running multiple candidates at the same time; that raises questions about collusion or coordination on the part of the organization, which obviously would be unethical.

Finally, no financial sponsor ought to be allowed to run a candidate on the individual ballot unless that candidate will publicly agree to take steps to act solely on behalf of the individual voters, rather than the sponsor (I’m not trying to be overly specific about this risk; “recusal” and “conflict of interest” policy stuff is all I mean). This is an obvious-enough avenue to trouble that the OSI has theoretically adopted a conflict-of-interest policy … with a disclosure requirement. But they have not posted any disclosure statements since 2019. Why not? Dunno.

Assessment

So where do we go from here, if we’re individual voters concerned about the governance of OSI? Well, Brock’s candidacy is egregiously unethical and should not have been allowed in the first place. But it has been allowed for 2022, so fixing that for the future is a matter of amending the broken bylaws … in the future. On the sponsor-employee front, there is at least a conflict-of-interest policy to point towards and ask for clear answers. And on the multiple-candidates-from-one-organization front, although this also really does need to be fixed in the bylaws, we can at least raise the issue in public, and request some clear answers.

So that’s what I ultimately set out to do for 2022. But let’s save that (and an analysis of the candidacy pages, which factored into what questions I posed) for part II. This has gotten long enough already. Do feel free to contemplate what you would ask the candidates (and even to stay spoiler-free by not reading my own questions, if you’re that hardcore).

I know it’s easy to think that the little project-governance details don’t matter, but they’re important. Procedures matter; limits and checks on authority matter; equitable influence for all participants matters; honesty matters. Without an eye firmly fixed on those subjects, they can get eroded fast.

April 30, 2022

Paying technical debt in our accessibility infrastructure

This is somewhat of an extended abstract for the talk I want to give at GUADEC.

Currently, very few people work on GNOME's accessibility infrastructure, which is basically what the free desktop ecosystem uses regardless of GUI toolkit. After Oracle acquired Sun Microsystems in 2010, paid work on accessibility virtually disappeared. There is little volunteer work on it, and the accessibility stack has been accumulating technical debt since then.

What is legacy code? What sort of technical debt is there in our accessibility stack?

  • It has few or no tests.

  • It does not have a reproducible environment for building and testing.

  • It has not kept up with the rest of the system's evolution.

  • Few people know how it works.

It's a vicious circle, and I'd like to help break the loop.

Quick reminder: What does the accessibility infrastructure do?

An able-bodied person with good eyesight uses a desktop computer by looking at the screen, and interacting with things via the keyboard and mouse. GUI toolkits rely very heavily on this assumption, and hardware does, too — think of all the work expended in real-time flicker-free graphics, GPUs, frame-by-frame profilers, etc.

People who can't see well, or at all, or who can't use a regular keyboard in the way applications require them to (only got one hand? try pressing a modifier and a key at the opposite ends of the keyboard!), or who can't use a mouse effectively (shaky hands? arthritis pain? can't double-click? can't do fine motor control to left-click vs. right-click?), they need different technologies.

Or an adapter that translates the assumptions of "regular" applications into an interaction model they can use.

The accessibility stack for each platform, including GNOME, is that kind of adapter.

In subsequent blog posts I'll describe our accessibility infrastructure in more detail.

Times change

I've been re-familiarizing myself with the accessibility code. The last time I looked at it was in the early 2000s, when Sun contracted with Ximian to fix "easy" things like adding missing accessible roles to widgets, or associating labels with their target widgets. Back then everything assumed X11, and gross hacks were used to snoop events from GTK and forward them to the accessibility infrastructure. The accessibility code still used CORBA for inter-process communication!

Nowadays, things are different. When GNOME dropped CORBA, the accessibility code was ported in emergency mode to DBus. GTK3 and then GTK4 happened, and Wayland too. The accessibility infrastructure didn't quite keep up, and now we have a particularly noticeable missing link between the toolkit and the accessibility stack: GTK removed the event snooping code that the accessibility code used to forward all events to itself, and so for example not all features of Orca (the screen reader) fully work with GTK4 apps.

Also, in Wayland application windows don't know their absolute position in the screen. The compositor may place them anywhere or transform their textures in arbitrary ways. In addition, Wayland wants to move away from the insecure X11 model where any rogue application can set itself up as an accessibility technology (AT) and sniff all the events in all applications.

Both our toolkit infrastructure and the world's security needs have changed.

What I have been doing

I've been re-familiarizing myself with how the accessibility stack works, and I hope to detail it in subsequent blog posts.

For now, I've done the following:

  • Add continuous integration to at-spi2-core. Three of the basic modules in the accessibility infrastructure — at-spi2-core, at-spi2-atk, pyatspi2 — didn't have any CI configured for them. Atk has CI, Orca doesn't, but I haven't explored the Orca code yet.

  • I intend to merge at-spi2-core/atk/at-spi2-atk/pyatspi2 into a single repository, since they are very tightly coupled with each other and it will be easier to do end-to-end tests if they are in the same repository. Currently those tests are spread out between at-spi2-atk and pyatspi2.

  • Made the README in at-spi2-core friendlier; turned it into Markdown and updated it for the current state of things.

  • Did a bit of refactoring in at-spi2-core after setting up static analysis in its CI.

  • Did a bit of exploratory refactoring there, but found out that I have no easy way to test it. Hence the desire to make it possible to test all the accessibility code together.

  • Fixed a crash in GTK when one uses gnome-text-editor with the screen reader on. (merge request awaiting review)

  • Currently trying to debug this missing annotation in gnome-shell.

A little call for help

Is anyone familiar with testing things in Gitlab that need to be launched by dbus-broker? I think this may require either running systemd in the CI container (a cumbersome proposition), or using a virtual machine with systemd instead of a container. Anyway — if you care about Fedora or dbus-broker, please help.

April 28, 2022

fwupd 1.8.0 and 50 million updates

I’ve just tagged the 1.8.0 release of fwupd, with these release notes — there’s lots of good stuff there as always. More remarkable is that LVFS has now supplied over 50 million updates to Linux machines all around the globe. The true number is going to be unknown, as we allow vendors to distribute updates without any kind of logging, and also allow companies or agencies to mirror the entire LVFS so the archive can be used offline. The true number of updates deployed will be a lot higher than 50 million, which honestly blows my tiny mind. Just 7 years ago Christian asked me to “make firmware updates work on Linux” and now we have a thriving client project that respects both your freedom and your privacy, and a thriving ecosystem of hardware vendors who consider Linux users first class citizens. Of course, there are vendors who are not shipping updates for popular hardware, but they’re now in the minority — and every month we have two or three new vendor account requests. The logistical, security and most importantly commercial implications of not being “on the LVFS” are now too critical even for tier-1 IHVs, ODMs and OEMs to ignore.

I’m still amazed to see Reddit posts, YouTube videos and random people on Twitter talk about the thing that’s been my baby for the last few years. It’s both frightening as hell (because of the responsibility) and incredibly humbling at the same time. Red Hat can certainly take a lot of credit for the undeniable success of LVFS and fwupd, as they have been the people paying my salary and pushing me forward over the last decade and more. Obviously I’m glad everything is being used by the distros like Ubuntu and Arch, although for me it’s Fedora that’s at least technically the one pushing Linux forward these days. I’ve seen Fedora grow in market share year on year, and I’m proud to be one of the people pushing the exciting Future Features into Fedora.

So what happens next? I guess we have the next 50 million updates to look forward to. The LVFS has been growing ever so slightly exponentially since it was first conceived so that won’t take very long now. We’ve blasted through 1MM updates a month, and now regularly ship more than 2MM updates a month and with the number of devices supported growing like it has (4004 different streams, with 2232 more planned), it does seem an exciting place to be. I’m glad that the number of committers for fwupd is growing at the same pace as the popularity, and I’m not planning to burn out any time soon. Google has also been an amazing partner in encouraging vendors to ship updates on the LVFS and shipping fwupd in ChromeOS — and their trust and support has been invaluable. I’m also glad the “side-projects” like “GNOME Firmware“, “Host Security ID“, “fwupd friendly firmware” and “uSWID as an SBoM” also seem to be flourishing into independent projects in their own right. It does seem now is the right time to push the ecosystem towards transparency, open source and respecting the users privacy. Redistributing closed source firmware may be an unusual route to get there, but it’s certainly working. There are a few super-sekret things I’m just not allowed to share yet, but it’s fair to say that I’m incredibly excited about the long term future.

From the bottom of my heart, thank you all for your encouragement and support.

April 26, 2022

Testing my System Code in /usr/ Without Modifying /usr/

I recently blogged about how to run a volatile systemd-nspawn container from your host's /usr/ tree, for quickly testing stuff in your host environment, sharing your home directory, but all that without making a single modification to your host, and on an isolated node.

The one-liner discussed in that blog story is great for testing during system software development. Let's have a look at another systemd tool that I regularly use to test things during systemd development, in a relatively safe environment, but still taking full benefit of my host's setup.

For a while now, systemd has been shipping with a simple component called systemd-sysext. Its primary use case goes something like this: on the one hand, OSes with immutable /usr/ hierarchies are fantastic for security, robustness, updating and simplicity, but on the other hand not being able to quickly add stuff to /usr/ is just annoying.

systemd-sysext is supposed to bridge this contradiction: when invoked it will merge a bunch of "system extension" images into /usr/ (and /opt/ as a matter of fact) through the use of read-only overlayfs, making all files shipped in the image instantly and atomically appear in /usr/ during runtime — as if they had always been there. Now, let's say you are building your locked-down OS, with an immutable /usr/ tree, and it comes without the ability to log in, without debugging tools, without anything you want and need when trying to debug and fix something in the system. With systemd-sysext you could use a system extension image that contains all this, drop it into the system, and activate it with systemd-sysext so that it genuinely extends the host system.

(There are many other use cases for this tool. For example, you could build systems that at their base use a generic image, but that get extended with additional, more specific functionality or drivers by installing one or more system extensions. The tool is generic, use it for whatever you want, but for now let's not get lost in listing all the possibilities.)

What's particularly nice about the tool is that it supports automatically discovered dm-verity images, with signatures and everything. So you can even do this in a fully authenticated, measured, safe way. But I am digressing…

Now that we (hopefully) have a rough understanding of what systemd-sysext is and does, let's discuss how specifically we can use this in the context of system software development, to safely use and test bleeding edge development code — built freshly from your project's build tree — in your host OS, without having to risk that the host OS is corrupted or becomes unbootable by stuff that didn't quite yet work the way it was envisioned:

The images systemd-sysext merges into /usr/ can be of two kinds: disk images with a file system/verity/signature, or simple, plain directory trees. To make these images available to the tool, they can be placed or symlinked into /usr/lib/extensions/, /var/lib/extensions/, /run/extensions/ (and a bunch of others). So if we now install our freshly built development software into a subdirectory of those paths, then that's entirely sufficient to make them valid system extension images in the sense of systemd-sysext, and they can thus be merged into /usr/ to try them out.

To be more specific: when I develop systemd itself, here's what I do regularly, to see how my new development version would behave on my host system. As preparation I checked out the systemd development git tree first of course, hacked around in it a bit, then built it with meson/ninja. And now I want to test what I just built:

sudo DESTDIR=/run/extensions/systemd-test meson install -C build --quiet --no-rebuild &&
        sudo systemd-sysext refresh --force

Explanation: first, we'll install my current build tree as a system extension into /run/extensions/systemd-test/. And then we apply it to the host via the systemd-sysext refresh command. This command will search for all installed system extension images in the aforementioned directories, then unmount (i.e. "unmerge") any previously merged dirs from /usr/ and then freshly mount (i.e. "merge") the new set of system extensions on top of /usr/. And just like that, I have installed my development tree of systemd into the host OS, and all that without actually modifying/replacing even a single file on the host at all. Nothing here actually hit the disk!

Note that all this works on any system really; it is not necessary that the underlying OS is even designed with immutability in mind. Just because the tool was developed with immutable systems in mind it doesn't mean you couldn't use it on traditional systems where /usr/ is mutable as well. In fact, my development box actually runs regular Fedora, i.e. is RPM-based and thus has a mutable /usr/ tree. As long as system extensions are applied, the whole of /usr/ becomes read-only though.

Once I am done testing, when I want to revert to how things were without the image installed, it is sufficient to call:

sudo systemd-sysext unmerge

And there you go, all files my development tree generated are gone again, and the host system is as it was before (and /usr/ mutable again, in case one is on a traditional Linux distribution).

Also note that a reboot (regardless if a clean one or an abnormal shutdown) will undo the whole thing automatically, since we installed our build tree into /run/ after all, i.e. a tmpfs instance that is flushed on boot. And given that the overlayfs merge is a runtime thing, too, the whole operation was executed without any persistence. Isn't that great?

(You might wonder why I specified --force on the systemd-sysext refresh line earlier. That's because systemd-sysext actually does some minimal version compatibility checks when applying system extension images. For that it will compare the host's /etc/os-release file with the image's /usr/lib/extension-release.d/extension-release.<name>, and refuse operation if the image is not actually built for the host OS version. Here we don't want to bother with dropping that file in there; we already know that the extension image is compatible with the host, as we just built it on it. --force allows us to skip the version check.)
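
If you did want to satisfy that check instead of skipping it, a rough sketch of dropping in such a compatibility marker could look like this (the ID/VERSION_ID values are illustrative and would have to match your host's /etc/os-release):

sudo mkdir -p /run/extensions/systemd-test/usr/lib/extension-release.d
printf 'ID=fedora\nVERSION_ID=35\n' | \
        sudo tee /run/extensions/systemd-test/usr/lib/extension-release.d/extension-release.systemd-test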

You might wonder: what about the combination of the idea from the previous blog story (regarding running containers off the host's /usr/ tree) with system extensions? Glad you asked. Right now we have no support for this, but it's high on our TODO list (patches welcome, of course!). A new switch for systemd-nspawn called --system-extension= that would allow merging one or more such extensions into the container tree being booted would be stellar. With that, a single command would let me run a container off my host OS but with a development version of systemd dropped in, all without any persistence. How awesome would that be?

(Oh, and in case you wonder, all of this only works on distributions that have completed the /usr/ merge. On legacy distributions that didn't do that and still scatter parts of the OS across the hierarchy outside of /usr/, the above won't work, since merging /usr/ trees via overlayfs is pretty pointless if the OS is not hermetic in /usr/.)

And that's all for now. Happy hacking!

April 25, 2022

scikit-survival 0.17.2 released

I’m pleased to announce the release of scikit-survival 0.17.2. This release fixes several small issues with packaging scikit-survival and the documentation. For a full list of changes in scikit-survival 0.17.2, please see the release notes.

Most notably, binary wheels are now available for Linux, Windows, and macOS (Intel). This has been possible thanks to the cibuildwheel build tool, which makes it incredibly easy to use GitHub Actions for building those wheels for multiple versions of Python. Therefore, you can now use pip without building everything from source by simply running

pip install scikit-survival

As before, pre-built conda packages are available too, by running

 conda install -c sebp scikit-survival

Support for Apple Silicon M1

Currently, there are no pre-built packages for Mac with Apple Silicon M1 hardware (also known as macos/arm64). There are two main reasons for that. The biggest problem is the lack of CI servers that run on Apple Silicon M1. This makes it difficult to systematically test scikit-survival on such hardware. Second, some of scikit-survival’s dependencies do not offer wheels for macos/arm64 yet, namely ecos and osqp.

However, conda-forge figured out a way to cross-compile packages, and does offer scikit-survival for macos/arm64. I tried to adapt their conda recipe, but have so far failed to achieve a working cross-compilation process. Cross-compiling with cibuildwheel would be an alternative, but currently doesn’t make much sense if run-time dependencies ecos and osqp do not provide wheels for macos/arm64. I hope these issues can be resolved in the near future.

Adopting Matrix at the GNOME Foundation

This blog post was also posted as a wall of text on GNOME’s Discourse to share the results of the survey with the broader GNOME Community.

The topic of our Instant Messaging platform of choice is quite old. In May 2021 I covered the history of IRC and Matrix in the GNOME Community.

This followed a survey to ask people which platform they were the most comfortable on, and which one they wanted to keep for instant messaging in the GNOME Project.

When the bridge was introduced in 2016, things were not great. Nobody took the time to fine tune it, and it was left in a pretty sorry state. Things have improved a lot since we started talking to people who maintain and host the bridge. We progressively tweaked it to avoid the inconveniences it initially brought.

Although things are not perfect, people complain less and less about the bridge, and the most outstanding issues are gone. But there are still some issues in our current setup: from an external perspective, it’s difficult to know where to find the GNOME community. Some discussions are on Matrix, some both on Matrix and IRC. In addition, we do not meet the safety and abuse management standards we would want to.

Let’s have a look at where we’re at now, what the data of the (1 year old) survey tells us, and how to move forward.

Surveying the land - what do we know?

This section is courtesy of my good friend, data scientist and community manager Greg Sutcliffe who did an incredible job at processing the data from the survey and extracting meaningful information from it.

To shape the solution we’re after, we can start by looking to the survey that was carried out last year. The information it contains, combined with our vision for the future, will help us plan our next steps.

We know we’re a polarised community, split between Matrix and IRC. It’s thus a simple sanity check to see this in the survey data, and that is indeed the case:

We can dive into this data a little more - by splitting it along other axes, we can get a feel for the variation between sub-groups. We can look at two splits - “member status” and “time in GNOME”. How does chat usage vary here? It turns out that the first doesn’t say a great deal - members & non-members alike use both IRC and Matrix. However, the second (the “time” subgroups) has more to say:

We can see a strong gradient towards IRC as time-in-GNOME increases (look at the “Used Regularly” bar of IRC in each plot). Conversely, Matrix is more popular among those with less than 5 years of GNOME-time.

While that question looked at usage today, the survey also asked two other important questions - how much do you like the platform, and would you use it in the future if it was the community standard. Let’s see those:

The first is ordered by mean opinion, hence the colour order is different, and it shows a few things. Firstly, the obvious one - Matrix has a higher opinion than any of the others. But look again at IRC - despite having a very similar score to RocketChat and Telegram (which are both neutral), IRC is not neutral, it is bimodal. As many people like it as dislike it - so the average is neutral, but not for the same reason.

In terms of how willing people are to move platform, again, Matrix is the clear lead here. There’s not much more to say here.

So, what do we conclude? We have evidence of our polarisation, to go with our own knowledge. Removing either IRC or Matrix out of the blue would be a good way to create a strong community split and maybe even rogue instances/channels. That’s not in our interest, and it’s clear that both need to co-exist for some considerable time.

Drawing up plans - what’s next?

With that said, it doesn’t mean one platform shouldn’t take priority over the other. Quite the opposite: given we cannot make things as seamless as we’d like for both worlds at once, we should have one firm preference for one of the two platforms. One for which we want the experience to be excellent - but we have to choose just one.

To answer that, look to the split in preference between long-time members and newer members. Any organisation needs to consider its succession planning in order to remain successful over many years - if newer members prefer Matrix, then that should weigh more heavily than their more junior status might otherwise allow for. In combination with the data that people are more comfortable with Matrix in general anyway, I think the answer is clear - Matrix should be the default and preferred platform, with IRC remaining bridged and available but secondary in nature. The federated and open nature of Matrix only makes this more attractive. People can join our conversations whether they have a Matrix.org account, a KDE one, a Mozilla one, or whichever provider they want to use - it is the modern-day mailing list, in effect.

It could be tempting to proactively restrict our federation in “allow list” mode rather than in “block list” mode, only allowing other instances to federate with us after they have signed a code of conduct agreement and we have established ways to escalate issues to them. In practice, this is a good way to asphyxiate a community with a lot of bureaucracy for little benefit. There are other tools to handle moderation in a more agile (and efficient!) manner.

Completely removing the instance from the public federation is not something we can really afford to do either: since our gnome.org server was set up, it has been in touch with 12393 other homeservers (though for the sake of transparency, not all of them are necessarily still active)! It would also be a surprising stance for an open project like ours.

TLDR

  • The community is used to both IRC and Matrix, the rest of the solutions are clearly behind.
  • People who use IRC also seem unhappy with IRC, which is not the case for Matrix.
  • The newer to our community the members are, the more likely they are to enjoy using Matrix.
  • Given how polarised we are on the IRC/Matrix question, removing one out of the blue would create a community split, so we need to keep the bridge and tweak it if needed.
  • The question of whether we self-host our instance or keep it hosted at Element Hosted Services will be answered separately.

Of snaps and stratagem

My desktop machine has been running Kubuntu for a while now. As the new LTS came out, I decided to update it. One major change in this release is that Firefox is no longer provided natively; instead it is a Snap package. If you have any experience with computers you might guess that this will cause issues. It surely did.

First the upgrade failed on Firefox and left me with broken packages. After fixing that by hand I did have Firefox but it could not be launched. The launcher app only shows a "Find Firefox" entry but not the browser itself. It can only be run from the command line. For comparison, all apps installed via Flatpak show up properly.

I could go on about this and Snap's other idiosyncrasies (like its insistence on polluting the output of mount with unnecessary garbage) but there is a wider, recurring issue here. Over the past years the following dance has taken place several times.

  1. A need for some sort of new functionality appears.
  2. A community project (or several) is started to work on it.
  3. Canonical starts their own project aiming to do the same.
  4. Canonical makes the project require a CLA or equivalent and makes the license inconvenient (GPLv3 or stronger).
  5. Everyone else focuses on the shared project, which eventually gains critical mass and wins.
  6. Canonical adopts the community project and quietly drops their own.

This happened with Bzr (which is unfortunate since it was kind of awesome) vs Git. This happened with Mir vs Wayland. This happened with Upstart vs systemd. This happened with Unity vs Gnome/KDE. The details of how and why things went this way vary for each project but the "story arc" was approximately the same.

The interesting step is #4. We don't really know why they do that, presumably so that they can control the platform and/or license the code out under licenses for money. This seems to cause the community at large to decide that they really don't want Canonical to control such a major piece of core infrastructure and double down on step 5.

Will this also happen with Snaps? The future is uncertain, as always, but there are signs that this is already happening. For example the app delivery setup on the Steam Deck is done with Flatpaks. This makes business sense. If I was Valve I sure as hell would not want to license such core functionality from a third party company because it would mean a) recurring license fees and b) no direct control of the direction of the software stack and c) if things do go south you can't even fork the project and do your own thing because of the license.

If this ends up happening and Snaps eventually lose out to Flatpaks then I'd like to send a friendly suggestion to the people who make these sorts of decisions at Canonical. The approach you have used for the past ten plus years clearly does not seem to be working in the long term. Maybe you should consider a different tactic the next time this issue pops up. There's nothing wrong with starting your own new projects as such. That is, after all, how all innovation works, but if the outcomes have no chance of getting widely adopted because of issues like these then you are just wasting a lot of effort. There may also be unintended side effects. For example the whole Mir debacle probably delayed Wayland adoption by several years. 

A more condensed version would go something like this:

History shows us that those who oppose cooperation have eventually ended up losing.

Full disclosure bit: In the past I have worked for Canonical but I did not have anything to do with these decisions and I didn't even know where and how they were made. I still don't. Everything in this blog post is my own speculation based on public information.

April 24, 2022

Writing a #[no_std] compatible crate in Rust

I’ve been toying around with Rust during Easter. It has been a while since I last had a go at it for UEFI binaries. Turns out that the uefi-rs crate has gotten tons of love in terms of usability, stability and built-in protocol interfaces.

Now, no_std is useful for a myriad of use cases like embedded platforms; it can even work in environments with no memory allocation. You can find an example UEFI app here; it is nothing fancy, it just crawls the directory tree of the main EFI System Partition.

For the purpose of my pet project I wanted to add a Boot Loader Spec parser that I could also use as a library with std:: as well as no_std. This introduces the requirement that your no_std environment be hooked up with an allocator, at which point you can consume the alloc crate.

Otherwise you can only use the data types in libcore (as well as crates that only depend on libcore), which is a small subset of std stuff. uefi-rs sets this up for you, but this documentation might come in handy.

This would be your minimal no_std lib.rs:

#![no_std]
fn my_function () -> bool {
    true
}

to std or no to std

Now, the first problem we have here is that we want to also be able to compile this module with the standard library. How can we do both? Well, it turns out there is a convention for this: enter cargo features:

[features]
default = ["std"]
std = []

At which point you can use #![cfg_attr] to make no_std a conditional attribute upon compilation in the absence of the std feature.

#![cfg_attr(not(feature = "std"), no_std)]
fn my_function () -> bool {
    true
}

And you can drop the std feature at compile time like this:

$ cargo build --no-default-features

Or consume the crate as a dependency in your toml like this:

[dependencies]
nostdable-crate = { version = "0.1.1", default-features = false }

optionally use core:: and alloc::

So, let’s say we have the following function:

#![cfg_attr(not(feature = "std"), no_std)]

fn my_function () -> String {
    String::from("foobar")
}

Here we are using std’s String, which is not available, so this is going to fail on a no_std setup as is. libcore’s core::str will only work for static strings, as there is no allocation available. In these environments you need an allocator so that you can import the alloc:: crate; if you have an allocator primitive, you need to implement GlobalAlloc and register it with #[global_allocator]. This is not needed in UEFI so I didn’t have to do it.
So the question is, how do I set things up so that I can use core and alloc types conditionally? This would be it:

#![cfg_attr(not(feature = "std"), no_std)]

// SETUP ALLOCATOR WITH #[global_allocator]
// see: https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html

#[cfg(not(feature = "std"))]
extern crate alloc;

#[cfg(not(feature = "std"))]
use alloc::string::String;

fn my_function () -> String {
    String::from("foobar")
}
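
In case you are curious, here is a hypothetical sketch of what that allocator hookup could look like on a target that is not UEFI (uefi-rs already registers a real allocator for you, so you would not write this there); MyAllocator and its internals are placeholders:

#[cfg(not(feature = "std"))]
mod no_std_allocator {
    use core::alloc::{GlobalAlloc, Layout};

    // Placeholder type: delegate these calls to whatever allocation
    // primitive your target environment actually provides.
    struct MyAllocator;

    unsafe impl GlobalAlloc for MyAllocator {
        unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
            unimplemented!()
        }

        unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
            unimplemented!()
        }
    }

    // Register it so that alloc:: types know where to get memory from.
    #[global_allocator]
    static ALLOCATOR: MyAllocator = MyAllocator;
}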

If you are using Vec<> the same applies, you’d have to conditionally use it from alloc:: or from std:: accordingly.
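
Putting it together, a minimal sketch of how the conditional imports could look for both String and Vec (the function bodies are just for illustration):

#![cfg_attr(not(feature = "std"), no_std)]

#[cfg(not(feature = "std"))]
extern crate alloc;

// Without std, pull the allocating types from alloc; with std they
// are already in the prelude, so no import is needed.
#[cfg(not(feature = "std"))]
use alloc::{string::String, vec::Vec};

fn my_function () -> String {
    String::from("foobar")
}

fn my_other_function () -> Vec<String> {
    let mut values = Vec::new();
    values.push(my_function());
    values
}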

Conclusions

I really thought that porting an existing crate to #[no_std] would be a lot more work and a lot more constraining. In general, if you depend on, or could port your code to, stuff that is present in both std:: and core:: + alloc::, you should be good to go. If you want to target an environment where an allocator is not possible, then porting relatively common Rust code becomes a lot more complicated, as you need to find a way to write your code with no allocations whatsoever.

In my original implementation I did some file io:: operations so my entire crate API was filled with -> io::Result<...>. io:: extirpation was 90% of the porting effort, as I didn’t have any external dependencies. If you have a crate that relies on a complicated dependency tree you might have a harder time porting your existing code.

If you want to have a look at my Boot Loader Spec crate for a full example it’s in my gitlab.

April 19, 2022

WebKit frame delivery

Part of my work on WebKit at Igalia is keeping an eye on performance, especially when it concerns embedded devices. In my experience, there tend to be two types of changes that cause the biggest performance gains – either complete rewrites (e.g. WebRender for Firefox) or tiny, low-hanging fruit that get overlooked because the big picture is so multi-faceted and complex (e.g. making unnecessary buffer copies, removing an errant sleep(5) call). I would say that it’s actually the latter kind of bug that’s harder to diagnose and fix, even though the code produced at the end of it tends to be much smaller. What I’m writing about here falls firmly into the latter group.

For a good while now, Alejandro García has been producing internal performance reports for WPE WebKit. A group of us will gather and do some basic testing of WPE WebKit on the same hardware – a Raspberry Pi 3B+. This involves running things like MotionMark and going to popular sites like YouTube and Google Maps and noting how well they work. We do this periodically and it’s a great help in tracking down regressions or obvious, user-facing issues. Another part of it is monitoring things like memory and CPU usage during this testing. Alex noted that we have a lot of idle CPU time during benchmarks, and at the same time our benchmark results fall markedly behind Chromium. Some of this is expected, Chromium has a more advanced rendering architecture that better makes use of the GPU, but a well-written benchmark should certainly be close to saturating the CPU and we certainly have CPU time to spare to improve our results. Based on this idea, Alex crafted a patch that would pre-emptively render frames (this isn’t quite what it did, but for the sake of brevity, that’s how I’m going to describe it). This showed significant improvements in MotionMark results and proved at the very least that it was legitimately possible to improve our performance in this synthetic test without any large changes.

This patch couldn’t land as it was due to concerns with it subtly changing the behaviour of how requestAnimationFrame callbacks work, but the idea was sound and definitely pointed towards an area where there may be a relatively small change that could have a large effect. This laid the groundwork for us to dig deeper and debug what was really happening here. As a fan of computer graphics and video games, I know that frame delivery/cadence is quite a hot topic in that area. I had the idea that something was unusual, but the code that gets a frame to the screen in WebKit is spread across many classes (and multiple threads) and isn’t so easy to reason about. On the other hand, it isn’t so hard to write a tool to analyse it from the client side and this would also give us a handy comparison with other browsers too.

So I went ahead and wrote a tool that would help us analyse exactly how frames are delivered from the client side, including under load, and the results were illuminating. The tool visualises the time between frames, the time between requesting a frame and receiving the corresponding callback and the time passed between performance.now() called at the start of a callback and the timestamp received as the callback parameter. To simulate ‘load’, it uses performance.now() and waits until the given amount of time has elapsed since the start of the callback before returning control to the main loop. If you want to sustain a 60fps update, you have about 16ms to finish whatever you’re doing in your requestAnimationFrame callback. If you exceed that, you won’t be able to maintain 60fps. Here’s Firefox under a 20ms load:

Firefox frame-times under a 20ms rendering load

And here’s Chrome under the same parameters:

Chrome frame-times under a 20ms rendering load

Now here’s WPE WebKit, using the Wayland backend, without any patches:

WebKit/WPE Wayland under a 20ms rendering load

One of these graphs does not look like the others, right? We can immediately see that when a frame exceeds the 16.67ms/60fps rendering budget in WebKit/WPE under the Wayland backend, it hard drops to 30fps. Other browsers don’t wait for a vsync to kick off rendering work and so are able to achieve frame-rates between 30fps and 60fps, when measured over multiple frames (all of these browsers are locked to the screen refresh, there is no screen tearing present). The other noticeable thing in these is that the green line is missing on the WebKit test – this shows that the timestamp delivered to the frame callback is exactly the same as performance.now(), whereas the timestamps in both Chrome and Firefox appear to be the time of the last vsync. This makes sense from an animation point of view and would mean that animations that are timed using this timestamp would move at a rate that’s more consistent with the screen refresh when under load.

What I’m omitting from this write-up is the gradual development of this tool, the subtle and different behaviour of different WebKit backends and the huge amount of testing and instrumentation that was required to reach these conclusions. I also wrote another tool to visualise the WebKit render pipeline, which greatly aided in writing potential patches to fix these issues, but perhaps that’s the topic of another blog post for another day.

In summary, there are two identified bugs, or at least, major differences in behaviour with other browsers here, both of which affect both fluidity and synthetic test performance. I’m not too concerned about the latter, but that’s a hard sell to a potential client that’s pointing at concrete numbers that say WebKit is significantly worse than some competing option. The first bug is that if a frame goes over budget and we miss a screen refresh (a vsync signal), we wait for the next one before kicking off rendering again. This is what causes the hard drop from 60fps to 30fps. As it concerns Linux, this only affects the Wayland WPE backend because that’s the only backend that implements vsync signals fully, so this doesn’t affect GTK or other WPE backends. The second bug, which is less of a bug, as a reading of the spec (steps 9 and 11.10) would certainly indicate that WebKit is doing the correct thing here, is that the timestamp given to requestAnimationFrame callbacks is the current time and not the vsync time as it is in other browsers (which makes more sense for timing animations). I have patches for both of these issues and they’re tracked in bug 233312 and bug 238999.

With these two issues fixed, this is what the graph looks like.

Patched WebKit/WPE with a 20ms rendering load

And another nice consequence is that MotionMark 1.2 has gone from:

WebKit/WPE MotionMark 1.2 results

to:

Patched WebKit/WPE MotionMark 1.2 results

Much better 🙂

No ETA on these patches landing; perhaps I’ve drawn some incorrect conclusions, or done something in a way that won’t work long term, or is wrong in some fashion that I’m not aware of yet. Also, this will most affect users of WPE/Wayland, so don’t get too excited if you’re using GNOME Web or similar. Fingers crossed though! A huge thanks to Alejandro García who worked with me on this and did an awful lot of debugging and testing (as well as the original patch that inspired this work).

April 17, 2022

The Freedom Phone is not great at privacy

The Freedom Phone advertises itself as a "Free speech and privacy first focused phone". As documented on the features page, it runs ClearOS, an Android-based OS produced by Clear United (or maybe one of the bewildering array of associated companies, we'll come back to that later). It's advertised as including Signal, but what's shipped is not the version available from the Signal website or any official app store - instead it's this fork called "ClearSignal".

The first thing to note about ClearSignal is that the privacy policy link from that page 404s, which is not a great start. The second thing is that it has a version number of 5.8.14, which is strange because upstream went from 5.8.10 to 5.9.0. The third is that, despite Signal being GPL 3, there's no source code available. So, I grabbed jadx and started looking for differences between ClearSignal and the upstream 5.8.10 release. The results were, uh, surprising.

First up is that they seem to have integrated ACRA, a crash reporting framework. This feels a little odd - in the absence of a privacy policy, it's unclear what information this gathers or how it'll be stored. Having a piece of privacy software automatically uploading information about what you were doing in the event of a crash with no notification other than a toast that appears saying "Crash Report" feels a little dubious.

Next is that Signal (for fairly obvious reasons) warns you if your version is out of date and eventually refuses to work unless you upgrade. ClearSignal has dealt with this problem by, uh, simply removing that code. The MacOS version of the desktop app they provide for download seems to be derived from a release from last September, which for an Electron-based app feels like a pretty terrible idea. Weirdly, for Windows they link to an official binary release from February 2021, and for Linux they tell you how to use the upstream repo properly. I have no idea what's going on here.

They've also added support for network backups of your Signal data. This involves the backups being pushed to an S3 bucket using credentials that are statically available in the app. It's ok, though, each upload has some sort of nominally unique identifier associated with it, so it's not trivial to just download other people's backups. But, uh, where does this identifier come from? It turns out that Clear Center, another of the Clear family of companies, employs a bunch of people to work on a ClearID[1], some sort of decentralised something or other that seems to be based on KERI. There's an overview slide deck here which didn't really answer any of my questions and as far as I can tell this is entirely lacking any sort of peer review, but hey it's only the one thing that stops anyone on the internet being able to grab your Signal backups so how important can it be.

The final thing, though? They've extended Signal's invitation support to encourage users to get others to sign up for Clear United. There's an exposed API endpoint called "get_user_email_by_mobile_number" which does exactly what you'd expect - if you give it a registered phone number, it gives you back the associated email address. This requires no authentication. But it gets better! The API to generate a referral link to send to others sends the name and phone number of everyone in your phone's contact list. There does not appear to be any indication that this is going to happen.

So, from a privacy perspective, I'm going to go with things being some distance from ideal. But what's going on with all these Clear companies anyway? They all seem to be related to Michael Proper, who founded the Clear Foundation in 2009. They are, perhaps unsurprisingly, heavily invested in blockchain stuff, while Clear United also appears to be some sort of multi-level marketing scheme which has a membership agreement that includes the somewhat astonishing claim that:

Specifically, the initial focus of the Association will provide members with supplements and technologies for:

9a. Frequency Evaluation, Scans, Reports;

9b. Remote Frequency Health Tuning through Quantum Entanglement;

9c. General and Customized Frequency Optimizations;


- there's more discussion of this and other weirdness here. Clear Center, meanwhile, has a Chief Physics Officer? I have a lot of questions.

Anyway. We have a company that seems to be combining blockchain and MLM, has some opinions about Quantum Entanglement, bases the security of its platform on a set of novel cryptographic primitives that seem to have had no external review, has implemented an API that just hands out personal information without any authentication and an app that appears more than happy to upload all your contact details without telling you first, has failed to update this app to keep up with upstream security updates, and is violating the upstream license. If this is their idea of "privacy first", I really hate to think what their code looks like when privacy comes further down the list.

[1] Pointed out to me here

April 13, 2022

Updates on Boatswain

Since I wrote the announcement of Boatswain, things have progressed quite a lot. As I prepare for the 1.0 release, more features and bugfixes get in, and it’s getting dangerously close to achieving all features I personally want from it.

Stream Deck Mini & Original (v1)

Thanks to a generous Stream Deck Mini donation, I managed to fix a couple of bugs in the HID code that controls it. It is now able to upload icons to buttons, and properly fetch the serial number of the device.

Later on, a kind individual helped test and debug the Stream Deck Original (v1) code. I only have a 2nd generation Original, and the HID protocol changed significantly between them, so this testing was invaluable. Another couple of bugs specific to the Original v1 were fixed in no time after they were reported.

Because Stream Deck Original (v2), XL, and MK.2 seem to share the same HID protocol, I’m cautiously confident that they all should be fine.

MPRIS

Something that has been quite high on my list was to integrate Boatswain with media players. I originally thought about adding media player capabilities to Boatswain, but the moment I returned to a sane state, I realized I really did not want to get in the media players business. Let’s leave media players to the people who know how to write media players 🙂

The alternative to writing my own media player inside Boatswain was to integrate it with existing ones. Fortunately for us, people solved this integration problem 16 years ago, and it’s called MPRIS. The MPRIS API is absolutely insane, and it makes me want to retire, but it is what it is, and it was the shortest path between me and my goals.

As far as I could test, it works with GNOME Music, Rhythmbox, Spotify, and Totem. There’s no reason to think it won’t work with other MPRIS media players.
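
If you have never poked at MPRIS, this is roughly what driving a player over D-Bus looks like from the command line (the Rhythmbox bus name is just an example; any MPRIS-capable player exposes the same interface):

# Toggle playback in an MPRIS-capable player over the session bus.
gdbus call --session \
      --dest org.mpris.MediaPlayer2.rhythmbox \
      --object-path /org/mpris/MediaPlayer2 \
      --method org.mpris.MediaPlayer2.Player.PlayPause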

Cross-device Profile Switching

Now that I have two Stream Decks, one little feature I thought would be nice to have is the ability for one device to change the profile of the other. This is not rocket science, nor is it a breakthrough in user experience, but it does allow using e.g. the Mini as a toplevel controller, and the Original as the actual holder of buttons.

This was by far not a priority for the 1.0, but one of my goals with Boatswain is to have fun, which is something I barely have with coding these days, and implementing this definitely sounded like fun! I’m happy I did this, because it did allow me to find other bugs in unrelated parts of the codebase, all of which are fixed.

Assets & Hardware Simulator

Part of the beauty of Boatswain is that my goal with it is to make it pretty, easy to use, obvious, and most importantly, the app must Just Work. To have the latter, Boatswain is forcing me to work on various parts of the stack (systemd, portals, etc) to actually ensure everything is in place for it. But to make it pretty, obvious, and aesthetically pleasing, Boatswain requires assets.

Lots of assets.

And it is in times like this that I bother the GNOME design team. Jakub and Sam have been producing these assets, and let me tell you, they are truly outstanding at their work. I am so happy that these visual craftsmen are part of the community! Good, well rendered icons are single handedly the most important part of what makes Stream Deck buttons look sophisticated.

Boatswain in fact now has a small library of custom icons. I’ll grow it over time, adding more icons from the GNOME icon library as needed, but it already is rather nice:

Icon library in Boatswain

In particular, in order to test how these icons would look without having a real Stream Deck to use, Jakub did what anyone would do in this situation: a photorealistic 3D model of the Stream Deck Original, with camera work, lighting, diffusion, and motion for button presses. To test these icons. Have a look:

If anyone wants to play with this, the Blender file can be found in Boatswain’s repository.

More OBS Studio

By the time I announced the project, Boatswain already had some integration with OBS Studio through obs-websockets. I’ve added more actions to it, and now it feels pretty complete in terms of regular features – I have been using it to drive my own streams now, and if it’s enough to me, I’m happy already.

The OBS Studio plugin is now able to mute and unmute audio sources (including global audio sources), and show and hide other sources. I think this should cover most cases already. I don’t plan on implementing every single request that obs-websockets exposes, so this is probably as far as I will go with the OBS Studio plugin for now. (Of course, if anyone wants anything else and is up for writing the code and maintaining it, I’ll happily accept the contribution.)

So, What’s Next Now?

As Boatswain continues to grow, it’s only natural to ask when it will be published on Flathub. The Boatswain release is currently blocked on systemd 251, which is the first version that will contain the udev rules that allow Stream Deck devices to be accessible from regular user sessions.

I’m waiting for this systemd release because I want to be able to at least point people to a systemd release they should have if they want a smooth, zero-setup process. Without this release, everyone who tries to use Boatswain would have to add a custom udev rule under /etc, which is of course awful and something I can’t provide any support for.

Once systemd 251 is out, I’ll roll a release, publish on Flathub, and submit it to the GNOME Circle. Flathub will be the only distribution method I am willing to support, any other packaging form should be considered unofficial and unsupported.

Buy Me a Coffee at ko-fi.com

April 11, 2022

Getting web proxys and certificates working on Linux or "if it's all the same to you, I'd rather take a thousand years of the Sarlacc pit, thankyouverymuch"

In my day job I'm a consultant. Every now and then my customer changes. This means setting up a new development environment and all that. Recently I started working for a Very Big customer who have a Very Corporative network setup. Basically:

  • All network traffic must go through a corporate HTTP proxy. This means absolutely everything. Not one bit gets routed outside it.
  • Said proxy has its own SSL certificate that all clients must trust and use for all traffic. Yes, this is a man-in-the-middle attack, but a friendly one at that, so it's fine.

This seems like a simple enough problem to solve. Add the proxy to the system settings, import the certificate to the global cert store and be done with it.

As you could probably guess by the title, this is not the case. At all. The journey to get this working (which I still have not been able to do, just so you know) is a horrible tale of never ending misery, pain and despair. Everything about this is so incredibly broken and terrible that it makes a duct taped Gentoo install from 2004 look like the highest peak of usability ever to have graced us mere mortals with its presence.

The underlying issue

When web proxies originally came into being, people added support for them in the least invasive and most terrible way possible: using environment variables. Enter http_proxy, https_proxy and their kind. Then the whole Internet security thing happened and people realised that this was far too convenient a way to steal all your traffic. So programs stopped using those envvars.

Oh, I'm sorry, that's not how it went at all.

Some programs stopped using those envvars whereas others did not. New programs were written and they, too, either used those envvars or didn't, basically at random. Those that eschewed envvars had a problem because proxy support is important, so they did the expected thing: everyone and their dog invented their own way of specifying a proxy. Maybe they created a configuration file, maybe they hid the option somewhere deep in the guts of their GUI configuration menus. Maybe they added their own envvars. Who's to say what is the correct way?

This was, obviously, seen as a bad state of things so modern distros have a centralised proxy setting in their GUI configurator and now everyone uses that.

Trololololololooooo! Of course they don't. I mean, some do, others don't. There is no logic to which do or don't. For example you might think that GUI apps would obey the GUI option whereas command line programs would not, but in reality it's a complete crapshoot. There is no way to tell. There does not even seem to be any consensus on what the value of said option string should be (as we shall see later).

Since things were not broken enough already, the same thing happened with SSL certificates. Many popular applications will not use the system's cert store at all. Instead they prefer to provide their own artisanal hand-crafted certificates because the ones provided by the operating system have cooties. The official reason is probably "security" because as we all know if someone has taken over your computer to the extent that they can insert malicious security certificates into root-controlled locations, sticking to your own hand-curated certificate set is enough to counter any other attacks they could possibly do.

What does all of this cause then?

Pain.

More specifically the kind of pain where you need to do the same change in a gazillion different places in different ways and hope you get it right. When you don't, anything can and will happen or not happen. By my rough estimate, in order to get a basic development environment running, I had to manually alter proxy and certificate settings in roughly ten different applications. Two of these were web browsers (actually six, because I tried regular, snap and flatpak versions of both) and let me tell you that googling how to add proxies and certificates to browsers so you could access the net is slightly complicated by the fact that until you get them absolutely correct you can't get to Google.

Apt obeys the proxy envvars but lately Canonical has started replacing application debs with Snaps and snapd obviously does not obey those envvars, because why would you. Ye olde google says that it should obey either /etc/environment or snap set system proxy.http=. Experimental results would seem to indicate that it does neither. Or maybe it does and there exists a second, even more secret set of config settings somewhere.

Adding a new certificate requires that it is in a specific DERP format as opposed to the R.E.M. format out there in the corner. Or maybe the other way around. In any case you have to a) know what format your cert blob is in and b) manually convert it between the two using the openssl command line program. If you don't, the importer script will just mock you for getting it wrong (and not tell you what the right thing would have been) instead of doing the conversion transparently (which it could do, since it is almost certainly using OpenSSL behind the scenes).
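
For reference, on Debian-ish systems the dance usually boils down to something like this (file names are illustrative):

# Convert a DER-encoded certificate to PEM, which update-ca-certificates expects.
openssl x509 -inform der -in corp-proxy.der -out corp-proxy.crt
# Install it into the system trust store (the .crt extension is mandatory here).
sudo cp corp-proxy.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates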

Even if every single option you can possibly imagine seems to be correct, 99% of the time Outlook webmail (as mandated by the customer) forces Firefox into an eternal login loop. The same settings work on a coworker's machine without issues.

This is the actual error message Outlook produces. I kid you not.

Flatpak applications do not seem to inherit any network configuration settings from the host system. Chromium does not have a setting page for proxies (Firefox does) but instead has a button to launch the system proxy setting app, which does not launch the system proxy setting app. Instead it shows a page saying that the Flatpak version obeys system settings while not actually obeying said settings. If you try to be clever and start the Flatpak with a custom command, set proxy envvars and then start Chromium manually, you find that it just ignores the system settings it said it would obey and thus you can't actually tell it to use a custom proxy.

Chromium does have a way to import new root certificates but it then marks them as untrusted and refuses to use them. I could not find a menu option to change their state. So it would seem the browser has implemented a fairly complex set of functionality that can't be used for the very purpose it was created for.


The text format for the environment variables looks like https_proxy=http://my-proxy.corporation.com:80. You can also write this in the proxy configuration widget in system settings. This will cause some programs to completely fail. Some, like Chromium, fail silently whereas others, like fwupdmgr, fail with Could not determine address for server "http". If there is a correct format for this string, the entry widget does not validate it.

There were a bunch of other funnities like these but I have fortunately forgotten them. Some of the details above might also be slightly off because I have been battling with this thing for about a week already. Also, repeated bashes against the desk may have caused me head bairn damaeg.

How should things work instead?

There are two different kinds of programs. The first are those that only ever use their own certificates and do not provide any way to add new ones. These can keep on doing their own thing. For some use cases that is exactly what you want and doing anything else would be wrong. The second group does support new certificates. These should, in addition to their own way of adding new certificates, also use certificates that have been manually added to the system cert store as if they had been imported in the program itself.

There should be one, and only one, place for setting both certs and proxies. You should be able to open that widget, set the proxies and import your certificate and immediately after that every application should obey these new rules. If there is ever a case that an application does not use the new settings by default, it is always a bug in the application.

For certificates specifically the imported certificate should go to a separate group like "certificates manually added by the system administrator". In this way browsers and the like could use their own certificates and only bring in the ones manually added rather than the whole big ball of mud certificate clump from the system. There are valid reasons not to autoimport large amounts of certificates from the OS so any policy that would mandate that is DoA.

In this way the behaviour is the same, but the steps needed to make it happen are shorter, simpler, more usable, easier to document and there is only one of them. As an added bonus you can actually uninstall certificates and be fairly sure that copies of them don't linger in any of the tens of places they were shoved into.

Counters to potential arguments

In case this blog post gets linked to the usual discussion forums, there are a couple of kneejerk responses that people will post in the comments, typically without even reading the post. To save everyone time and effort, here are counterarguments to the most obvious issues raised.

"The problem is that the network architecture is wrong and stupid and needs to be changed"

Yes indeed. I will be more than happy to bring you in to meet the people in charge of this Very Big Corporation's IT systems so you can work full time on convincing them to change their entire network infrastructure. And after the 20+ years it'll take I shall be the first one in line to shake your hand and tell everyone that you were right.

"This guy is clearly an incompetent idiot."

Yes I am. I have gotten this comment every time my blog has been linked to news forums so let's just accept that as a given and move on.

"You have to do this because of security."

The most important thing in security is simplicity. Any security system that depends on human beings needing to perform needlessly complicated things is already broken. People are lazy, they will start working around things they consider wrong and stupid and in so doing undermine security way more than they would have done with a simpler system. In other words:

And finally

Don't even get me started on the VPN.

April 09, 2022

Ingenuity’s One Year of Mars Flights

Next week it’s going to be a year since Ingenuity made the first flight in the atmosphere of another planet. An atmosphere with about 1% of the density of the one on Earth.

Something that sounded unlikely became a great success with 24 flights so far. To celebrate the occasion I revisited an older project of mine and used more appropriate footage shot by my friend VOPO at Fuerteventura, Canary Islands. I rendered some overlays of the Ingenuity model kindly provided by NASA.

Ingenuity with Perseverance

The music was composed on the Dirtywave M8 Tracker and went straight from the device, no mastering. The time constraints of weeklybeats are brutal. I used a hint of the Close Encounters of the Third Kind main theme, but because the flight is unrealistically dynamic, it leans on a mangled classic Jungle Jungle break.

Previously

Blender Eevee is Magic

My mention on twitter of my everlasting adoration of the realtime engine in Blender kind of exploded (by my standards), so I figured I’d write a blog post.

Boatswain

Georges wrote a great little app to configure Elgato Stream Deck buttons on non-proprietary platforms. I’ve been following Georges’ ventures into streaming. He’s done an amazing job promoting GNOME development and has been helping with OBS a lot too.

To make Boatswain work great by default, rather than just presenting people with a monstrous shit work mountain to even get going, Georges needed some graphical assets to have the configurator ship with ready made presets.

Boatswain Screenshot

Sadly I’ve already invested in an appliance that covers my video streaming and recording needs, so I wasn’t going to buy the gear to test how things look. Instead I made the device virtual, and rendered an approximation of it in Blender.

For GNOME wallpapers there are cases where I have to resort to waiting on Cycles, but if you compare this 2 minute render (left) to the 0.5s render (right) it’s really not worth it in many cases.

Cycles left, Eevee right

Personally, switching to AMD GPUs a couple of years back has removed much of the pain of updating the OS, and the reason I was allowed to do that is Eevee.

There was some interest in how the shader is set up, so I’m happily making the file available for dissection.

To do it properly, I’d probably want to improve the actual display shader to rasterize the bitmaps in a more sophisticated manner than just displaying a bitmap with no filtering. But I’d say even this basic setup has served the purpose of checking the viability of a symbol rendered on a lousy display.

April 07, 2022

Why is Kopper and Zink important? AKA the future of OpenGL

Since Kopper got merged today upstream I wanted to write a little about it as I think the value it brings can be unclear for the uninitiated.

Adam Jackson in our graphics team has been working for the last few months, together with other community members like Mike Blumenkrantz, on implementing Kopper. For those unaware, Zink is an OpenGL implementation running on top of Vulkan, and Kopper is the layer that translates OpenGL and GLX window handling to Vulkan WSI handling. This means that you can get full OpenGL support even if your GPU only has a Vulkan driver available, and it also means you can, for instance, run GNOME on top of this stack thanks to the addition of Kopper to Zink.

During the lifecycle of the soon-to-be-released Fedora Workstation 36, we expect to let you turn on doing OpenGL through Kopper and Zink as an experimental feature once we update Fedora 36 to Mesa 22.1.
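If you are curious and your Mesa build already includes the Zink driver, one common way to try it out today is to force the Mesa loader to pick Zink through an environment variable and check which renderer gets reported (this assumes glxinfo from mesa-demos is installed; it is a quick sanity check, not the Fedora feature switch itself):

$ MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo -B

If Zink is in use, the OpenGL renderer string reported by glxinfo will mention zink.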

So you might ask: why would I care about this as an end user? Well, initially you probably will not care much, but over time it is likely that GPU makers will stop developing native OpenGL drivers and just focus on their Vulkan drivers. At that point Zink and Kopper provide you with a backwards compatibility solution for your OpenGL applications. For Linux distributions it will also, at some point, significantly reduce the amount of code we need to ship and maintain, as we can just rely on Zink and Kopper everywhere, which of course reduces the workload for maintainers.

This is not going to be an overnight transition though; Zink and Kopper will need some time to stabilize and further improve performance. At the moment performance is generally a bit slower than the native drivers, although we have seen some examples of games that actually got better performance with specific driver combinations, and over time we expect the negative performance delta to shrink. The delta is unlikely to ever fully go away due to the cost of translating between the two APIs, but on the other hand, in a few years we will be in a situation where all current/new applications use Vulkan natively (or through Proton), so the software that relies on OpenGL will be older software; combined with faster GPUs, you should still get more than good enough performance. And at that point Zink will be a lifesaver for your old OpenGL-based applications and games.

06 Apr 2022

SwiftTermApp: SSH Client for iOS

For the past couple of years, programming in Swift has been a guilty pleasure of mine - I would sneak out after getting the kids to sleep to try out the latest innovations in iOS, such as SwiftUI and RealityKit. I have decided to ship a complete app based on this work, and I put together an SSH client for iOS and iPadOS using my terminal emulator, which I call “SwiftTermApp.”

What it lacks in terms of an original name, it makes up for by having solid fundamentals in place: a comprehensive terminal emulator with all the features you expect from a modern one, good support for international input and output, tasteful use of libssh2, keyboard accessories for your Unix needs, storage of your secrets in the iOS keychain, extensive compatibility tests, an embrace of the latest and greatest iOS APIs I could find, and routine fuzzing and profiling to ensure a solid foundation.

While I am generally pleased with the application for personal use, my goal is to make this app valuable to users who routinely use SSH to connect to remote hosts - and nothing brings more clarity to a product than a user’s feedback.

I would love for you to try this app and help me identify opportunities and additional features for it. These are some potential improvements to the app, and I could use your help prioritizing them:

To reduce my development time and maximize my joy, I built this app with SwiftUI and the latest features from Swift and iOS, so it won't work on older versions of iOS. In particular, I am pretty happy with what Swift async enabled me to do, which I hope to blog about soon.

SwiftTermApp is part of a collection of open-source code built around the Unix command line that I have been authoring on and off for the past 15 years. First in C#, now also in Swift. If you are interested in some of the other libraries, check out my UI toolkits for console applications (gui.cs for C#, and TermKit for Swift) and my xterm/vt100 emulator libraries (XtermSharp for C# and SwiftTerm for Swift). I previously wrote about how they came to be.

Update: Join the discussion



For a few months during the development of the SwiftTerm library, I worked to ensure great compatibility with other terminal emulators using esctest and vttest. I put my MacPro to good use during the evenings to run the Swift fuzzer and track down countless bugs and denial-of-service errors, used Instruments religiously to improve the performance of the terminal emulator, and built a good test suite to prevent regressions.


April 05, 2022

Automating my home network with Salt

I'm a lousy sysadmin.

For years, my strategy for managing my machines in my home network has been one of maximum neglect:

  • Avoid distribution upgrades or reinstalls as much as possible, because stuff breaks or loses its configuration.

  • Keep my $HOME intact across upgrades, to preserve my personal configuration files as much as possible, even if it accumulates vast amounts of cruft.

  • Cry when I install a long-running service on my home server, like SMB shares or a music server, because I know it will break when I have to reinstall or upgrade.

About two years ago, I wrote some scripts to automate at least the package installation step, and to set particularly critical configuration files like firewall rules. These scripts helped me lose part of my fear of updates/reinstalls; after running them, I only needed to do a little manual work to get things back in working order. The scripts also made me start to think actively about what I am comfortable with in terms of distro defaults, and what I really need to change after a default installation.

Salt

In my mind there exists this whole universe of scary power tools for large-scale sysadmin work. Of course you would want some automation if you have a server farm to manage. Of course nobody would do this by hand if you had 3000 workstations somewhere. But for my puny home network with one server and two computers? Surely those tools are overkill?

Thankfully that is not so!

My colleague Richard Brown has been talking about the Salt Project for a few years. It is similar to tools for provisioning and configuration management like Ansible or Puppet.

What I have liked about Salt so far is that the documentation is very good, and it has let me translate my little setup into its configuration language while learning some good practices along the way.

I started with the Salt walkthrough, which is pretty nice.

TL;DR: the salt-master is the central box that keeps and distributes configuration to other machines, and those machines are called salt-minions. You write some mostly-declarative YAML in the salt-master, and propagate that configuration to the minions. Salt knows how to "create a user" or "install a package" or "restart a service when a config file changes" without you having to use distro-specific commands.
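To give a flavour of what that YAML looks like, here is a minimal sketch of a top.sls on the salt-master that maps state files to minions; the minion and state names are just placeholders:

# top.sls, at the root of your file_roots
base:
  '*':                  # every minion gets the common states
    - common-packages
  'myserver':           # only the minion called 'myserver' gets these
    - media-server

Each entry such as common-packages refers to a state file (common-packages.sls) that holds the actual declarative states, like the examples later in this post.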

My home setup and how I want it to be

pambazo - Has a RAID and serves media and stores backups. This is my home server.

tlacoyo - A desktop box, my main workstation.

torta - My laptop, which has seen very little use during the pandemic; in principle it should have an identical setup to my desktop box.

I open the MDNS firewall ports on those three machines so I can use somehost.local to access them directly, without having to set up DNS. Maybe I should learn how to do the latter.

All the machines need my basic configuration files (Emacs, shell prompt), and a few must-have programs (Emacs, Midnight Commander, git, podman).

My workstations need my gitlab/github SSH keys, my Suse VPN keys, and some basic infrastructure for development; I need to be able to reinstall and reconstruct them quickly.

All the machines should get the same configuration for the two printers we have at home (a laser that can do two-sided printing, and an inkjet for photos).

My home server of course needs all the configuration for the various services it runs.

I also have a few short-lived virtual machines to test distro images. In those I only need my must-have packages.

Setting up the salt master

My home server works as the "salt master", which is the machine that holds the configuration that should be distributed to other boxes. I am just using the stock salt-master package from openSUSE. The only things I changed in its default configuration were the paths where it looks for configuration, so I can use a git checkout in my home directory instead of /etc/salt. This goes in /etc/salt/master:

# configuration for minions
file_roots:
  base:
    - /home/federico/src/salt-states

# sensitive data to be distributed to minions
pillar_roots:
  base:
    - /home/federico/src/salt-pillar

Setting up minions

It is easy enough to install the salt-minion package and set up its configuration to talk to the salt-master, but I wanted a way to bootstrap that. Salt-bootstrap is exactly that. I can run this on a newly-installed machine:

curl -o bootstrap-salt.sh -L https://bootstrap.saltproject.io
sudo sh bootstrap-salt.sh -w -A 192.168.1.10 -i my-hostname stable

The first line downloads the bootstrap-salt.sh script.

The second line:

  • -w - use the distro's packages for salt-minion, not the upstream ones.

  • -A 192.168.1.10 - the IP address of my salt-master. At this point the machine that is to become a minion doesn't have MDNS ports open yet, so it can't find pambazo.local directly and it needs its IP address.

  • -i my-hostname - Name to give to the minion. Salt lets you have the minion's name different from the hostname, but I want them to be the same. That is, I want my tlacoyo.local machine to be a minion called tlacoyo, etc.

  • stable - Use a stable release of salt-minion, not a development one.

When the script runs, it creates a keypair and asks the salt-master to register its public key. Then, on the salt-master I run this:

salt-key -a my-hostname

This accepts the minion's public key, and it is ready to be configured.
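Before and after accepting, a couple of commands are handy for double-checking things on the salt-master (the minion name is the one from the example above):

salt-key -L                        # list accepted, rejected and pending keys
sudo salt 'my-hostname' test.ping  # confirm the master can reach the new minion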

The very basics: set the hostname, open up MDNS in the firewall

I want my hostname to be the same as the minion name. Make it so!

'set hostname to be the same as the minion name':
  network.system:
    - hostname: {{ grains['id'] }}
    - apply_hostname: True
    - retain_settings: True

Salt uses Jinja templates to preprocess its configuration files. One of the variables it makes available is grains, which contains information inherent to each minion: its CPU architecture, amount of memory, OS distribution, and its minion id. Here I am using {{ grains['id'] }} to look up the minion id, and then set it as the hostname.
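If you are curious about which grains a minion exposes, you can simply ask for them; for example:

sudo salt 'tlacoyo' grains.items                  # dump every grain for that minion
sudo salt 'tlacoyo' grains.item id os osrelease   # or just a few specific ones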

To set up the firewall for desktop machines, I used YaST and then copied the resulting configuration to Salt:

/etc/firewalld/firewalld.conf:
  file.managed:
    - source: salt://opensuse/desktop-firewalld.conf
    - mode: 600
    - user: root
    - group: root

/etc/firewalld/zones/desktop.xml:
  file.managed:
    - source: salt://opensuse/desktop-firewall-zone.xml
    - mode: 644
    - user: root
    - group: root

firewalld:
  service.running:
    - enable: True
    - watch:
      - file: /etc/firewalld/firewalld.conf
      - file: /etc/firewalld/zones/desktop.xml

file.managed is how Salt lets you copy files to destination machines. In the first block, salt://opensuse/desktop-firewalld.conf gets copied to /etc/firewalld/firewalld.conf. The salt:// prefix indicates a path under your location for salt-states; this is the git checkout with Salt's configuration that I mentioned above.

The last block tells Salt to enable the firewalld service, and to restart it when either of the two files changes.

Indispensable packages

I cannot live without these:

'indispensable packages':
  pkg.installed:
    - pkgs:
      - emacs
      - git
      - mc
      - ripgrep

pkg.installed takes an array of package names. Here I hard-code the names which those packages have in openSUSE. Salt lets you do all sorts of magic with Jinja and the salt-pillar mechanism to have distro-specific package names, if you have a heterogeneous environment. All my machines are openSUSE, so I don't need to do that.
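For reference, that kind of per-distro magic can look roughly like this; the web server example is purely illustrative and not part of my setup:

{# pick the package name based on the minion's distro family #}
{% set web_server = 'httpd' if grains['os_family'] == 'RedHat' else 'apache2' %}

'install web server':
  pkg.installed:
    - name: {{ web_server }}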

My username, and personal configuration files

This creates my user:

federico:
  user.present:
    - fullname: Federico Mena Quintero
    - home: /home/federico
    - shell: /bin/bash
    - usergroup: False
    - groups:
      - users

This just copies a few configuration files:

{% set managed_files = [
  [ 'bash_profile',  '.bash_profile',         644 ],
  [ 'bash_logout',   '.bash_logout',          644 ],
  [ 'bashrc',        '.bashrc',               644 ],
  [ 'starship.toml', '.config/starship.toml', 644 ],
  [ 'gitconfig',     '.gitconfig',            644 ],
] %}

{% for file_in_salt, file_in_homedir, mode in managed_files %}

/home/federico/{{ file_in_homedir }}:
  file.managed:
    - source: salt://opensuse/federico-config-files/{{ file_in_salt }}
    - user: federico
    - group: users
    - mode: {{ mode }}

{% endfor %}

I use a Jinja array to define the list of files and their destinations, and a for loop to reduce the amount of typing.

Install some flatpaks

Install the flatpaks I need:

'Add flathub repository':
  cmd.run:
    - name: flatpak remote-add --if-not-exists --user flathub https://flathub.org/repo/flathub.flatpakrepo
    - runas: federico

{% set flatpaks = [
   'com.github.tchx84.Flatseal',
   'com.microsoft.Teams',
   'net.ankiweb.Anki',
   'org.gnome.Games',
   'org.gnome.Solanum',
   'org.gnome.World.Secrets',
   'org.freac.freac',
   'org.zotero.Zotero',
] %}

install-flatpaks:
  cmd.run:
    - name: flatpak install --user --or-update --assumeyes {{ flatpaks | join(' ') }}
    - runas: federico
    - require:
      - 'Add flathub repository'

Set up one of their configuration files:

/home/federico/.var/app/org.freac.freac/config/.freac/freac.xml:
  file.managed:
    - source: salt://opensuse/federico-config-files/freac.xml
    - user: federico
    - group: users
    - mode: 644
    - makedirs: True

This last file is the configuration for fre:ac, a CD audio ripper. Flatpaks store their configuration files under ~/.var/app/flatpak-name. I configured the app once by hand in its GUI and then copied its configuration file to my salt-states.

Etcetera

The above is not all my setup; obviously things like the home server have a bunch of extra packages and configuration files. However, the patterns above are practically all I need to set up everything else.

How it works in practice

I have a git checkout of my salt-states. When I change them and I want to distribute the new configuration to my machines, I push to the git repository on my home server, and then just run a script that does this:

#!/bin/sh
set -e
cd /home/federico/src/salt-states
git pull
sudo salt '*' state.apply

The salt '*' state.apply causes all machines to get an updated configuration. Salt tells you what changed on each and what stayed the same. The slowest part seems to be updating zypper's package repositories; apart from that, Salt is fast enough for me.
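When I am only iterating on one machine, or on a single state file, I narrow the target and the state instead of applying everything everywhere; the state name below is hypothetical:

sudo salt 'tlacoyo' state.apply                     # apply all states, but only on tlacoyo
sudo salt 'tlacoyo' state.apply opensuse.flatpaks   # apply a single (hypothetical) state file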

Playing with this for the first time

After setting up the bare salt-master on my home server, I created a virtual machine and immediately registered it as a salt-minion and created a snapshot for it. I wanted to go back to that "just installed, nothing set up" state easily to test my salt-states as if for a new setup. Once I was confident that it worked, I set up salt-minions on my desktop and laptop in exactly the same way. This was very useful!

A process of self-actualization

I have been gradually moving my accumulated configuration cruft from my historical $HOME to Salt. This made me realize that I still had obsolete dotfiles lying around like ~/.red-carpet and ~/.realplayerrc. That software is long gone. My dotfiles are much cleaner now!

I was also able to remove my unused scripts in ~/bin (I still had the one I used to connect to Ximian's SSH tunnel, and the one I used to connect to my university's PPP modem pool), realize which ones I really need, move them to ~/.local/bin and make them managed under Salt.

To clean up that cruft across all machines, I have something like this:

/home/federico/.dotfile-that-is-no-longer-used:
  file.absent

That deletes the file.

In the end I did reach my original goal: I can reinstall a machine, and then get it to a working state with a single command.

This has made me more confident to install cool toys in my home server... like a music server, which brings me endless joy. Look at all this!

One thing that would be nice in Flatpak

Salt lets you know what changed when you apply a configuration, or it can do a dry run where it tells you what will change without actually modifying anything. For example, Salt knows how to query the package database for each distro and tell you if a package needs to be updated, or if an existing config file is different from the one that will be propagated.
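The dry run is just state.apply with test=True:

sudo salt '*' state.apply test=True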

It would be nice if the flatpak command-line tool would return this information. In the examples above, I use flatpak remote-add --if-not-exists and flatpak install --or-update to achieve idempotency, but Salt is not able to know what Flatpak actually did; it just runs the commands and returns success or failure.
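A partial mitigation, staying with the cmd.run approach from above, is to add an unless check so that Salt at least reports the state as unchanged once the remote is already configured; it still cannot tell what flatpak install actually did:

'Add flathub repository':
  cmd.run:
    - name: flatpak remote-add --if-not-exists --user flathub https://flathub.org/repo/flathub.flatpakrepo
    - runas: federico
    - unless: flatpak remote-list --user | grep -q flathub   # skip the command if flathub is already there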

Some pending things

Some programs keep state along with configuration in the same user-controlled files, and that state gets lost if each run of Salt just overwrites those files:

  • fre:ac, a CD audio ripper, has one of those "tip of the day" windows at startup. On each run, it updates the "number of the last shown tip" somewhere in its configuration for the user. However, since that configuration is in the same file that holds the paths for ripped music and the encoder parameters I want to use, the "tip of the day" gets reset to the beginning every time that Salt rewrites the config file.

  • When you load a theme in Emacs, say, with custom-enabled-theme in custom.el, it stores a checksum of the theme you picked after first confirming that you indeed want to load the theme's code — to prevent malicious themes or something. However, that checksum is stored in another variable in custom.el, so if Salt overwrites that file, Emacs will ask me again if I want to allow that theme.

In terms of Salt, maybe I need to use a finer-grained method than copying whole configuration files. Salt allows changing individual lines in config files, instead of overwriting the whole file. Copy the file if it doesn't exist; just update the relevant lines later?
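A sketch of that idea, with a hypothetical config file and setting: file.managed with replace: False only creates the file when it is missing, and file.replace then pins the one line I actually care about on every run:

/home/federico/.config/someapp/settings.conf:
  file.managed:
    - source: salt://opensuse/federico-config-files/someapp-settings.conf
    - user: federico
    - group: users
    - mode: 644
    - makedirs: True
    - replace: False    # only copy the template if the file doesn't exist yet

'pin the encoder setting in someapp':
  file.replace:
    - name: /home/federico/.config/someapp/settings.conf
    - pattern: '^encoder=.*$'
    - repl: 'encoder=flac'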

I need to distill my dconf settings for GNOME programs installed on the system (as opposed to flatpaks), so that I can restore them later.

I'm sure there's detritus under ~/.config that should be managed by Salt... maybe I need to do another round of self-actualization and clean that up.

We have a couple of Windows machines kicking around at home, because schoolwork. Salt works on Windows, too, and I'd love to set them up for automatic backups to the home server.

April 03, 2022

Plans for GNOME 43 and Beyond

GNOME 42 is fresh out the door, bringing some exciting new features – like the new dark style preference and the new UI for taking screenshots and screen casts. Since we’re right on the heels of a release, I want to keep some momentum going and share my plans for features I want to implement during the upcoming release cycles.

Accent Colors and Libadwaita Recoloring API

The arrival of libadwaita allows us to do a few new things with the GNOME platform, since we have a platform library to help developers implement new platform features and implement them properly. For example, libadwaita gave us the opportunity to implement a global dark style preference with machinery that allows developers to choose whether they support it and easily adjust their app’s styling when it’s enabled. Alexander Mikhaylenko spent a long time reworking Adwaita so that it works with recoloring, and I want to take full advantage of that with the next two features: global accent colors and a recoloring API.

Libadwaita makes it simple to implement a much-wanted personalization feature: customizable accent colors. Global accent colors will be opt-in for app developers. For the backend I want accent colors to be desktop- and platform-agnostic like the new dark style preference. I plan to submit a proposal for this to xdg-desktop-portal in the near future. In GNOME it’d probably be best to show only a few QA-tested accents in the UI, but libadwaita would support arbitrary colors so that apps from KDE, GNOME, elementary OS, and more all use the same colors if they support the preference.

Developers using the recoloring API will be able to programmatically change colors in their apps and have dependent colors update automatically. They’ll be able to easily create presets which can be used, for example, to recolor the window based on a text view’s color scheme. Technically this is already possible with CSS in libadwaita 1.0, but the API will make it simpler. Instead of having to consider every single color, they’ll only need to set a few and libadwaita will handle the rest properly. The heuristics used here will also be used to ensure that accent colors have proper contrast against an app’s background.

libadwaita recoloring demo

There’s no tracking issue for this, but if you’re interested in this work you may want to track the libadwaita repository: https://gitlab.gnome.org/GNOME/libadwaita/

Adaptive Nautilus and Improved File Chooser

The GTK file chooser has a few issues. For example, it doesn’t support GNOME features like starred files, and it needs to be patched by downstream vendors (e.g. PureOS, Mobian) to work on mobile form factors. In order to keep up with platform conventions, ideally the file chooser should become part of GNOME’s core rather than part of GTK. There’s some discussion to be had on solutions, but I believe that it would be smart to keep the file chooser and our file browser in sync by making the file chooser a part of the file browser.

With all of that in mind I plan to make Nautilus adaptive for mobile form factors and add a new file chooser mode to it. Having the file chooser live in Nautilus instead of GTK allows us to support GNOME platform features at GNOME’s pace rather than GTK’s pace, follow GNOME design patterns, and implement features like a grid view with thumbnails without starting from scratch.

Nautilus with thumbnails of various files

If you’re interested in seeing this progress, keep track of the Nautilus repository: https://gitlab.gnome.org/GNOME/nautilus/

Loupe (Image Viewer)

For a while now I’ve been working on and off on Loupe, a new image viewer written in Rust using GTK4 and libadwaita. I plan for Loupe to be adaptive, touchpad- and touchscreen-friendly, and easy to use. I also want it to integrate with Nautilus, so that Loupe will follow the sorting settings you have for a folder in Nautilus.

Image of Loupe's main screen Loupe displaying a picture of a field of red flowers

In the long term we want Loupe to gain simple image editing capabilities, namely cropping, rotation, and annotations. With annotations Loupe can integrate with the new screenshot flow, allowing users to take screenshots and annotate them without needing any additional programs.

If Loupe sounds like an interesting project to you, feel free to track development on GitLab: https://gitlab.gnome.org/BrainBlasted/loupe/

Rewrite Baobab in Rust, and Implement new Design

Baobab (a.k.a. Disk Usage Analyzer) is written in Vala. Vala does not have access to a great ecosystem of libraries, and the tooling has left something to be desired. Rust, however, has a flourishing library ecosystem and wonderful tooling. Rust also has great GTK bindings that are constantly improving. By rewriting Baobab in Rust, I will be able to take full advantage of the ecosystem while improving the performance of its main function: analyzing disk usage. I’ve already started work in this direction, though it’s not available on GitLab yet.

In addition to the rewrite, I also plan to implement a redesign based on Allan Day’s mockups. The new design will modernize the UI, using new patterns and fixing a few major UI gripes people have with the current design.

You can keep track of progress on Baobab on GitLab: https://gitlab.gnome.org/GNOME/baobab

Opening Neighboring Files from FileChooser Portal

The xdg-desktop-portal file picker doesn’t allow opening neighboring files when you choose a file. Apps like image browsers, web browsers, and program runners all need a sandbox hole if they want to function without friction. If you use a web browser as a flatpak you may have run into this issue: opening an html file won’t load associated HTML files or media files. If you are working on a website locally, you need to serve it with a web server in order to preview it – e.g. with python -m http.server.

I want to work on a portal that allows developers to request access to neighboring files when opening one file. With this portal I could ship Loupe as a flatpak without requiring any sandbox holes, and apps like Lutris or Bottles would also be more viable as flatpaks.

If you want to learn more and follow along with progress on this issue, see the xdg-desktop-portal issue on GitHub: https://github.com/flatpak/xdg-desktop-portal/issues/463

Accessibility Fixups

GTK4 makes accessibility simpler than ever. However, there are still improvements to be made when it comes to making core apps accessible. I want to go through our core app set, test them with the accessibility tooling available, and document and fix any issues that come up.

Sponsoring my Work

I hope to be able to work on all of these items (and more I haven’t shared) this year. However, I am currently looking for work. Right now I would need to be looking for work full-time or working on something else full-time instead of working on these initiatives – I don’t have the mental bandwidth to do both. If you want to see this work get done, I could really use your support. I have three places you can support me:

If I get enough sponsorships, I plan to start posting regular (bi-weekly) updates for GitHub sponsors and Patrons. I also may start streaming some of my work on Twitch or YouTube, since people seem to be interested in seeing how things get done in GNOME.

That’s all for now – thanks for reading until the end, and I hope we can get some or all of this done by the time of the next update.

April 02, 2022

Radio for GNOME 42 is here, there and everywhere

Radio for GNOME 42 is the Public Network Radio Software for Accessing Free Audio Broadcasts from the Internet.

Radio for GNOME 42 is available from http://download.gnome.org/sources/gnome-radio/16.0/gnome-radio-16.0.42.tar.xz and https://wiki.gnome.org/Apps/Radio

Latest Information about Radio for GNOME 42 is available on https://wiki.gnome.org/Apps/Radio and http://www.gnomeradio.org/

March 30, 2022

An Erroneous Preliminary Injunction Granted in Neo4j v. PureThink

[ A version of this article was also posted on Software Freedom Conservancy's blog. ]

Bad Early Court Decision for AGPLv3 Has Not Yet Been Appealed

We at Software Freedom Conservancy proudly and vigilantly watch out for your rights under copyleft licenses such as the Affero GPLv3. Toward this goal, we have studied the ongoing Neo4j, Inc. v. PureThink, LLC case in the Northern District of California, and the preliminary injunction appeal decision in the Ninth Circuit Court this month. The case is complicated, and we've seen much understandable confusion in the public discourse about the status of the case and the impact of the Ninth Circuit's decision to continue the trial court's preliminary injunction while the case continues. While it's true that part of the summary judgment decision in the lower court bodes badly for an important provision in AGPLv3§7¶4, the good news is that the case is not over, nor was the appeal (decided this month) even an actual appeal of the decision itself! This lawsuit is far from completion.

A Brief Summary of the Case So Far

The primary case in question is a dispute between Neo4j, a proprietary relicensing company, against a very small company called PureThink, run by an individual named John Mark Suhy. Studying the docket of the case, and a relevant related case, and other available public materials, we've come to understand some basic facts and events. To paraphrase LeVar Burton, we encourage all our readers to not take our word (or anyone else's) for it, but instead take the time to read the dockets and come to your own conclusions.

After canceling their formal, contractual partnership with Suhy, Neo4j alleged multiple claims in court against Suhy and his companies. Most of these claims centered around trademark rights regarding “Neo4j” and related marks. However, the claims central to our concern relate to a dispute between Suhy and Neo4j regarding Suhy's clarification in downstream licensing of the Enterprise version that Neo4j distributed.

Specifically, Neo4j attempted to license the codebase under something they (later, in their Court filings) dubbed the “Neo4j Sweden Software License” — which consists of a LICENSE.txt file containing the entire text of the Affero General Public License, version 3 (“AGPLv3”) (a license that I helped write), and the so-called “Commons Clause” — a toxic proprietary license. Neo4j admits that this license mash-up (if legitimate, which we at Software Freedom Conservancy and Suhy both dispute), is not an “open source license”.

There are many complex issues of trademark and breach of other contracts in this case; we agree that there are lots of interesting issues there. However, we focus on the matter of most interest to us and many FOSS activists: Suhy's permission to remove the “Commons Clause”. Neo4j accuses Suhy of improperly removing the “Commons Clause” from the codebase (and subsequently redistributing the software under pure AGPLv3) in paragraph 77 of their third amended complaint. (Note that Suhy denied these allegations in court — asserting that his removal of the “Commons Clause” was legitimate and permitted.)

Neo4j filed for summary judgment on all the issues, and throughout their summary judgment motion, Neo4j argued that the removal of the “Commons Clause” from the license information in the repository (and/or Suhy's suggestions to others that removal of the “Commons Clause” was legitimate) constituted behavior that the Court should enjoin or otherwise prohibit. The Court partially granted Neo4j's motion for summary judgment. Much of that ruling is not particularly related to FOSS licensing questions, but the section regarding licensing deeply concerns us. Specifically, to support the Court's order that temporarily prevents Suhy and others from saying that the Neo4j Enterprise edition that was released under the so-called “Neo4j Sweden Software License” is a “free and open source” version and/or alternative to proprietary-licensed Neo4j EE, the Court held that removal of the “Commons Clause” was not permitted. (BTW, the court confuses “commercial” and “proprietary” in that section — it seems they do not understand that FOSS can be commercial as well.)

In this instance, we're not as concerned with the names used for the software as with the copyleft licensing question — because it's the software's license, not its name, that either assures users can exercise their fundamental software rights or prevents them from doing so. Notwithstanding our disinterest in the naming issue, we'd all likely agree that — if “AGPLv3 WITH Commons-Clause” were a legitimate form of licensing — such a license is not FOSS. The primary issue, therefore, is not whether or not this software is FOSS, but whether or not the “Commons Clause” can be legitimately removed by downstream licensees when presented with a license of “AGPLv3 WITH Commons-Clause”. We believe the Court held incorrectly by concluding that Suhy was not permitted to remove the “Commons Clause”. Their order that enjoins Suhy from calling the resulting code “FOSS” — even if it's a decision that bolsters a minor goal of some activists — is problematic because the underlying holding (if later upheld on appeal) could seriously harm FOSS and copyleft.

The Confusion About the Appeal

Because this was an incomplete summary judgment and the case is ongoing, the injunction against Suhy making such statements is a preliminary injunction, and cannot be made permanent until the case actually completes in the trial court. The decision by the Ninth Circuit appeals court regarding this preliminary injunction has been widely reported by others as an “appeal decision” on the issue of what can be called “open source”. However, this is not an appeal of the entire summary judgment decision, and certainly not an appeal of the entire case (which cannot even be appealed until the case completes). The Ninth Circuit decision merely affirms that Suhy remains under the preliminary injunction (which prohibits him and his companies from taking certain actions and saying certain things publicly) while the case continues. In fact, the standard that an appeals Court uses when considering an appeal of a preliminary injunction differs from the standard for ordinary appeals. Generally speaking, appeals Courts are highly deferential to trial courts regarding preliminary injunctions, and appeals of actual decisions have a much more stringent standard.

The Affero GPL Right to Restriction Removal

In their partial summary judgment ruling, the lower Court erred because they rejected an important and (in our opinion) correct counter-argument made by Suhy's attorneys. Specifically, Suhy's attorneys argued that Neo4j's license expressly permitted the removal of the “Commons Clause” from the license. AGPLv3 was, in fact, drafted to permit such removal in this precise fact pattern.

Specifically, the AGPLv3 itself has the following provisions (found in AGPLv3§0 and AGPLv3§7¶4):

  • “This License” refers to version 3 of the GNU Affero General Public License.
  • “The Program” refers to any copyrightable work licensed under this License. Each licensee is addressed as “you”.
  • If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term.

That last term was added to address a real-world, known problem with GPLv2. Frequently throughout the time when GPLv2 was the current version, original copyright holders and/or licensors would attempt to license work under the GPL with additional restrictions. The problem was rampant and caused much confusion among licensees. As an attempted solution, the FSF (the publisher of the various GPL's) loosened its restrictions on reuse of the text of the GPL — in hopes that would provide a route for reuse of some GPL text, while also avoiding confusion for licensees. Sadly, many licensors continued to take the confusing route of using the entire text of a GPL license with an additional restriction — attached either before or after, or both. Their goals were obvious and nefarious: they wanted to confuse the public into “thinking” the software was under the GPL, but in fact restrict certain other activities (such as commercial redistribution). They combined this practice with proprietary relicensing (i.e., a sole licensor selling separate proprietary licenses while releasing a (seemingly FOSS) public version of the code as demoware for marketing). Their goal was to build on the popularity of the GPL, but in direct opposition to the GPL's policy goals; they manipulated the GPL to open-wash bad policies rather than give actual rights to users. This tactic even permitted bad actors to sell “gotcha” proprietary licenses to those who were legitimately confused. For example, a company would look for users who were operating commercially with the code in compliance with GPLv2, but who hadn't noticed that the company's code carried the statement: “Licensed GPLv2, but not for commercial use”. The user had seen GPLv2, and knew from its brand reputation that it gave certain rights, but hadn't realized that the additional restriction outside of the GPLv2's text might actually be valid. The goal was to catch users in a sneaky trap.

Neo4j tried to use the AGPLv3 to set one of those traps. Neo4j, despite the permission in the FSF's GPL FAQ to “use the GPL terms (possibly modified) in another license provided that you call your license by another name and do not include the GPL preamble”, left the entire AGPLv3 intact as the license of the software — adding only a note at the front and at the end. However, their users can escape the trap, because GPLv3 (and AGPLv3) added a clause (which doesn't exist in GPLv2) to defend users from this. Specifically, AGPLv3§7¶4 includes a key provision to help this situation.

Specifically, the clause was designed to give more rights to downstream recipients when bad actors attempt this nasty trick. Indeed, I recall from my direct participation in the A/GPLv3 drafting that this provision was specifically designed for the situation where the original, sole copyright holder/licensor [0] added additional restrictions. And, I'm not the only one who recalls this. Richard Fontana (now a lawyer at IBM's Red Hat, but previously legal counsel to the FSF during the GPLv3 process), wrote on a mailing list [1] in response to the Neo4j preliminary injunction ruling:

For those who care about anecdotal drafting history … the whole point of the section 7 clause (“If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term.”) was to address the well known problem of an original GPL licensor tacking on non-GPL, non-FOSS, GPL-norm-violating restrictions, precisely like the use of the Commons Clause with the GPL. Around the time that this clause was added to the GPLv3 draft, there had been some recent examples of this phenomenon that had been picked up in the tech press.

Fontana also pointed us to the FSF's own words on the subject, written during their process of drafting this section of the license (emphasis ours):

Unlike additional permissions, additional requirements that are allowed under subsection 7b may not be removed. The revised section 7 makes clear that this condition does not apply to any other additional requirements, however, which are removable just like additional permissions. Here we are particularly concerned about the practice of program authors who purport to license their works under the GPL with an additional requirement that contradicts the terms of the GPL, such as a prohibition on commercial use. Such terms can make the program non-free, and thus contradict the basic purpose of the GNU GPL; but even when the conditions are not fundamentally unethical, adding them in this way invariably makes the rights and obligations of licensees uncertain.

While the intent of the original drafter of a license text is not dispositive over the text as it actually appears in the license, all this information was available to Neo4j as they drafted their license. Many voices in the community had told them that the provision in AGPLv3§7¶4 was added specifically to prevent what Neo4j was trying to do. The FSF, the copyright holder of the actual text of the AGPLv3, also publicly gave Neo4j permission to draft a new license, using any provisions they like from AGPLv3 and putting them together in a new way. But Neo4j made a conscious choice not to do that, and instead constructed their license in the exact manner that allowed Suhy's removal of the “Commons Clause”.

In addition, that provision in AGPLv3§7¶4 has little meaning if it's not intended to bind the original licensor! Many other provisions (such as AGPLv3§10¶3) protect users against further restrictions imposed later in the distribution chain of licensees. This clause was targeted from its inception against the exact, specific bad behavior that Neo4j engaged in here.

We don't dispute that copyright and contract law give Neo4j authority to license their work under any terms they wish — including terms that we consider unethical or immoral. In fact, we already pointed out above that Neo4j had permission to pick and choose only some text from AGPLv3. As long as they didn't use the name “Affero”, “GNU” or “General Public” or include any of the Preamble text in the name/body of their license — we'd readily agree that Neo4j could have put together a bunch of provisions from the AGPLv3, and/or the “Commons Clause”, and/or any other license that suited their fancy. They could have made an entirely new license. Lawyers commonly do share text of licenses and contracts to jump-start writing new ones. That's a practice we generally support (since it's sharing a true commons of ideas freely — even if the resulting license might not be FOSS).

But Neo4j consciously chose not to do that. Instead, they license their software “subject to the terms of the GNU AFFERO GENERAL PUBLIC LICENSE Version 3, with the Commons Clause”. (The name “Neo4j Sweden Software License” only exists in the later Court papers, BTW, not with “The Program” in question.) Neo4j defines “This License” to mean “version 3 of the GNU Affero General Public License.”. Then, Neo4j tells all licensees that “If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term”. Yet, after all that, Neo4j had the audacity to claim to the Court that they didn't actually mean that last sentence, and the Court rubber-stamped that view.

Simply put, the Court erred when it said: “Neither of the two provisions in the form AGPLv3 that Defendants point to give licensees the right to remove the information at issue.”. The Court then used that error as a basis for its ruling to temporarily enjoin Suhy from stating that software with “Commons Clause” removed by downstream is “free and open source”, or tell others that he disagrees with the Court's (temporary) conclusion about removing the “Commons Clause” in this situation.

What Next?

The case isn't over. The lower Court still has various issues to consider — including a DMCA claim regarding Suhy's removal of the “Commons Clause”. We suspect that's why the Court only made a preliminary injunction against Suhy's words, and did not issue an injunction against the actual removal of the clause! The issue as to whether the clause can be removed is still pending, and the current summary judgment decision doesn't address the DMCA claim from Neo4j's complaint.

Sadly, the Court has temporarily enjoined Suhy from “representing that Neo4j Sweden AB’s addition of the Commons Clause to the license governing Neo4j Enterprise Edition violated the terms of AGPL or that removal of the Commons Clause is lawful, and similar statements”. But they haven't enjoined us, and our view on the matter is as follows:

Clearly, Neo4j gave explicit permission, pursuant to the AGPLv3, for anyone who would like to remove the “Commons Clause” from their LICENSE.txt file in version 3.4 and other versions of their Enterprise edition where it appears. We believe that you have full permission, pursuant to AGPLv3, to distribute that software under the terms of the AGPLv3 as written. In saying that, we also point out that we're not a law firm, our lawyers are not your lawyers, and this is not legal advice. However, after our decades of work in copyleft licensing, we know well the reasons and motivations for this policy in the license (described above), and given the error by the Court, it's our civic duty to inform the public that the licensing conclusions (upon which they based their temporary injunction) are incorrect.

Meanwhile, despite what you may have read last week, the key software licensing issues in this case have not been decided — even by the lower Court. For example, the DMCA issue is still before the trial court. Furthermore, if you do read the docket of this case, it will be obvious that neither party is perfect. We have not analyzed every action Suhy took, nor do we have any comment on any action by Suhy other than this: we believe that Suhy's removal of the “Commons Clause” was fully permitted by the terms of the AGPLv3, and that Neo4j gave him that permission in that license. Suhy also did a great service to the community by taking action that obviously risked litigation against him. Misappropriation and manipulation of the strongest and most freedom-protecting copyleft license ever written to bolster a proprietary relicensing business model is an affront to FOSS and its advancement. It's even worse when the Courts are on the side of the bad actor. Neo4j should not have done this.

Finally, we note that the Court was rather narrow on what it said regarding the question of “What Is Open Source?”. The Court ruled that one individual and his companies — when presented with ambiguous licensing information in one part of a document, who then find that another part of the document grants permission to repair and clarify the licensing information, and do so — are temporarily forbidden from telling others that the resulting software is, in fact, FOSS, after making such a change. The ruling does not set precedent, nor does it bind anyone other than the Defendants as to what they can or cannot say is FOSS. That is why we can say it is FOSS: the AGPLv3 is an OSI-approved license, and the AGPLv3 permits removal of the toxic “Commons Clause” in this situation.

We will continue to follow this case and write further when new events occur.


[0] We were unable to find anywhere in the Court record that shows Neo4j used a Contributor Licensing Agreement (CLA) or Copyright Assignment Agreement (©AA) that sufficiently gave them exclusive rights as licensor of this software. We did however find evidence online that Neo4j accepted contributions from others. If Neo4j is, in fact, also a licensor of others' AGPLv3'd derivative works that have been incorporated into their upstream versions, then there are many other arguments (in addition to the one presented herein) that would permit removal of the “Commons Clause”. This issue remains an open question of fact in this case.

[1] Fontana made these statements on a mailing list governed by an odd confidentiality rule called CHR (which was originally designed for in-person meetings with a beginning and an end, not a mailing list). Nevertheless, Fontana explicitly waived CHR (in writing) to allow me to quote his words publicly.

March 29, 2022

How to install a bunch of debs

Recently, I needed to check if a regression in Ubuntu 22.04 Beta was triggered by the mesa upgrade. Ok, sounds simple, let me just install the older mesa version.

Let’s take a look.

Oh, wow, there are about 24 binary packages (excluding the packages for debug symbols) included in mesa!

Because it’s no longer published in Ubuntu 22.04, we can’t use our normal apt way to install those packages. And downloading those one by one and then installing them sounds like too much work.

Step Zero: Prerequisites

If you are an Ubuntu (or Debian!) developer, you might already have ubuntu-dev-tools installed. If not, it has some really useful tools!

$ sudo apt install ubuntu-dev-tools

Step One: Create a Temporary Working Directory

Let’s create a temporary directory to hold our deb packages. We don’t want to get them mixed up with other things.

$ mkdir mesa-downgrade; cd mesa-downgrade

Step Two: Download All the Things

One of the useful tools is pull-lp-debs. The first argument is the source package name. In this case, I next need to specify what version I want; otherwise it will give me the latest version which isn’t helpful. I could specify a series codename like jammy or impish but that won’t give me what I want this time.

$ pull-lp-debs mesa 21.3.5-1ubuntu2

By the way, there are several other variations on pull-lp-debs:

  • pull-lp-source – downloads source package from Launchpad.
  • pull-lp-debs – downloads deb package(s) from Launchpad.
  • pull-lp-ddebs – downloads dbgsym/ddebs package(s) from Launchpad.
  • pull-lp-udebs – downloads udebs package(s) from Launchpad.
  • pull-debian-* – same as pull-lp-* but for Debian packages.

I use the LP and Debian source versions frequently when I just want to check something in a package but don’t need the full git repo.
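For instance, either of these fetches only the source package, with no git clone required; the version and series are the ones from this post:

$ pull-lp-source mesa 21.3.5-1ubuntu2   # a specific version
$ pull-lp-source mesa jammy             # or the current version in a given series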

Step Three: Install Only What We Need

This command allows us to install just what we need.

$ sudo apt install --only-upgrade --mark-auto ./*.deb

--only-upgrade tells apt to only install packages that are already installed. I don’t actually need all 24 packages installed; I just want to change the versions for the stuff I already have.

--mark-auto tells apt to keep these packages marked in dpkg as automatically installed. This allows any of these packages to be suggested for removal once there isn’t anything else depending on them. That’s useful if you don’t want to have old libraries installed on your system in case you do manual installation like this frequently.

Finally, the apt install syntax has a quirk: It needs a path to a file because it wants an easy way to distinguish from a package name. So adding ./ before filenames works.

I guess this is a bug. apt should be taught that libegl-mesa0_21.3.5-1ubuntu2_amd64.deb is a file name not a package name.

Step Four: Cleanup

Let’s assume that you installed old versions. To get back to the current package versions, you can just upgrade like normal.

$ sudo apt dist-upgrade

If you do want to stay on this unsupported version a bit longer, you can specify which packages to hold:

$ sudo apt-mark hold libegl-mesa0

And you can use apt-mark showhold and apt-mark unhold to see what packages you have held and to release the holds. Remember you won’t get security updates or other bug fixes for held packages!

And when you’re done with the debs we download, you can remove all the files:

$ cd .. ; rm -ri mesa-downgrade

Bonus: Downgrading back to supported

What if you did the opposite and installed newer stuff than is available in your current release? Perhaps you installed from jammy-proposed and you want to get back to jammy? Here’s the syntax for libegl-mesa0; note the /jammy suffix on the package name.

$ sudo apt install libegl-mesa0/jammy

But how do you find these packages? Use apt list. Here’s one suggested way:

$ apt list --installed --all-versions | grep --after-context=1 'local]'

Finally, I should mention that apt is designed to upgrade packages, not downgrade them. You can break things by downgrading. For instance, a database could upgrade its format to a new version, but I wouldn’t expect it to be able to reverse that just because you attempt to install an older version.