October 21, 2020

Accessibility in GTK 4

The big news in last week's GTK 3.99.3 release is that we have a first non-trivial backend for our new accessibility implementation. Therefore, now is a good time to take a deeper look at accessibility in GTK 4.

Overview

Let's start with a quick review of how accessibility works on Linux. The actors in this are applications and assistive technologies (ATs) such as screen readers (for instance, Orca), magnifiers and the like.

The purpose of ATs generally is to provide users with alternative ways to interact with the application that are tailored to their needs (say, an enlarged view, text read out aloud, or voice commands). To do this, ATs need a lot of detailed information about the application's UI, and this is where the accessibility stack comes into play—it is the connecting layer between the application (or its toolkit) and the ATs.

Applications and ATs talk to each other on the accessibility bus, which is a separate instance of a D-Bus session bus, using the interfaces described by the AT-SPI project. The UI elements of the application are represented as objects on the bus that implement somewhat abstracted interfaces, such as Text or Value. Applications emit signals to communicate changes in the UI, and ATs can call methods on the objects to get information or make changes (e.g. change the current value of a Value interface to move the GtkScale that it represents).
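
As an aside, you can see that this really is a separate bus by asking the regular session bus for its address (an illustration using busctl; it assumes the AT-SPI bus launcher is running, and the returned address will vary):

$ busctl --user call org.a11y.Bus /org/a11y/bus org.a11y.Bus GetAddress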

What has changed

In GTK 2 and 3, this was done in an awkwardly indirect way: GTK widgets have auxiliary accessible objects that are implementations of ATK interfaces (translation 1: GTK ➙ ATK). These are then turned into AT-SPI objects (translation 2: ATK ➙ AT-SPI) that are represented on the accessibility bus by adapter code in at-spi2-atk. On the other end, ATs then use pyatspi to convert the AT-SPI interfaces into Python objects (translation 3: AT-SPI ➙ Python).

This multi-step process was inefficient, lossy, and hard to maintain; it required implementing the same functionality over at least three components, and it led to discrepancies between the documented AT-SPI methods and properties, and the ones actually sent over the accessibility bus.

In GTK 4, we are simplifying the application side by cutting out ATK and at-spi2-atk. Widgets now implement a GtkAccessible interface that lets them set a number of roles, states, properties and relations that are more or less directly taken from the WAI-ARIA spec published by the W3C. The AT-SPI backend for GTK's accessibility API then takes these ARIA-inspired attributes (and the knowledge of the widgets themselves) and represents the widgets as objects on the accessibility bus, implementing the relevant AT-SPI interfaces for them.

This is a much more direct approach, and matches what Qt and web browsers already do.

Application API

Here are the highlights of the accessibility API that you are most likely to run into when using GTK 4 in applications:

Setting an accessible role. A role is a description of the semantics of a widget, and ATs will use it to decide what kind of behavior should be presented to their users. Setting a role is a one-time operation, which means it has to be done at widget creation time, either in class_init, or during instance initialization:

gtk_widget_class_set_accessible_role (widget_class,  
                                      GTK_ACCESSIBLE_ROLE_BUTTON);

Updating a widget’s accessible state or properties. This should be done whenever the widget’s accessible representation changes:

gtk_accessible_update_property (GTK_ACCESSIBLE (widget),
                                GTK_ACCESSIBLE_PROPERTY_VALUE_MIN, minimum,
                                GTK_ACCESSIBLE_PROPERTY_VALUE_NOW, value,
                                GTK_ACCESSIBLE_PROPERTY_VALUE_MAX, maximum,
                                -1);

The GTK reference documentation has an overview of the accessibility API, which includes guidance for application developers and widget writers.

What’s next?

For GTK 4.0, we are focusing on completing the AT-SPI backend. But with the new API and the backend separation, we have a clear path towards making accessibility backends for other platforms, which is something we want to look into for subsequent GTK 4 releases.

On Linux, we want to work with other stakeholders on modernizing the AT-SPI interfaces to finally overcome the CORBA legacy that is still visible in some places. A part of that will be moving away from an accessibility bus toward peer-to-peer connections between the application and the ATs; this would enhance the security of the accessibility stack and plug a hole in the sandboxing used by technologies such as Flatpak.

In the future we want to introduce tools to ensure that application developers will be aware of missing accessibility annotations, such as providing a label attribute, or a labelled-by relation, to icons and images in their UIs, and to ensure that every UI element is correctly represented in the accessible tree. We already have a test backend for the GtkAccessible interface which can be used to write unit tests and verify that roles and attributes are updated where necessary.
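
For illustration, a unit test against this backend might look roughly like the following sketch (based on GTK's accessibility testing macros; the widget and assertion shown are just examples):

static void
test_button_role (void)
{
  GtkWidget *button = gtk_button_new_with_label ("Close");

  /* Assert on the accessible tree that the test backend exposes */
  gtk_test_accessible_assert_role (GTK_ACCESSIBLE (button),
                                   GTK_ACCESSIBLE_ROLE_BUTTON);
}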

October 20, 2020

Cargo-style dependency management for C, C++ and other languages with Meson

My previous blog post about modern C++ got a surprising amount of feedback. Some people even reimplemented the program in other languages, including one in Go, two different ones in Rust and even this slightly brain-bending C++ reimplementation as a declarative-style pipeline. It also got talked about on Reddit and Hacker News. Two major comments kept popping up:

  1. There are potential bugs if the program is ever extended to support non-ASCII text
  2. It is hard to use external dependencies in C++ (and by extension C)

Let's solve both of these problems at the same time. There are many "lite" solutions for doing Unicode-aware text processing, but we're going to use the 10 000 kilogram hammer: International Components for Unicode. The actual code changes are not that big; interested parties can look up the details in this GitHub repo. The thing that is relevant for this blog post is the build and dependency management. The Meson build definition is quite short:

project('cppunicode', 'cpp',
        default_options: ['cpp_std=c++17',
                          'default_library=static'])
             
icu_dep = dependency('icu-i18n')
thread_dep = dependency('threads')
executable('cppunicode', 'cppunicode.cpp',
           dependencies: [icu_dep, thread_dep])

The threads dependency is for the multithreaded parts (see the end of this post). I developed this on Linux and used the convenient system-provided ICU. Windows and Mac do not provide system libs, so we need to build ICU from scratch on those platforms. This is achieved by running the following command in your project's source root:

$ meson wrap install icu
Installed icu branch 67.1 revision 1

This contacts Meson's WrapDB server and downloads build definition files for ICU. That is all you need to do. The build files do not need any changes. When you start building the project, Meson will automatically download and build the dependency.

[Screenshots: the download step, the build running in Visual Studio, and the final built result running on macOS]

One notable downside of this approach is that WrapDB does not have all that many packages yet. However, I have been told that given the next Meson release (in a few weeks) and some upstream patches, it is possible to build the entire GTK widget toolkit as a subproject, even on Windows.

If anyone wants to contribute to the project, contributions are most welcome. You can, for example, convert existing projects and submit them to WrapDB, or become a reviewer. The Meson web site has the relevant documentation.

Appendix: making it parallel

Several people pointed out that while the original program worked fine, it only uses one thread, which may be a bottleneck, and that "in C++ it is hard to execute work in parallel". This is again one of those things that has gotten a lot better in the last few years. The "correct" solution would be to use the parallel version of transform_reduce. Unfortunately most parallel STL implementations are still in the process of being implemented, so we can't use those in multiplatform code. We can, however, roll our own fairly easily, without needing to create or lock a single mutex by hand. The code has the actual details, but the (slightly edited) gist of it is this:

for(const auto &e:
    std::filesystem::recursive_directory_iterator(".")) {
    futures.emplace_back(std::async(std::launch::async,
                                    count_word_files,
                                    e));
    if(futures.size() > num_threads) {
        pop_future(word_counts, futures);
    }
}
while(!futures.empty()) {
   pop_future(word_counts, futures);
}

Here the count_word_files function calculates the number of words in a single file and the pop_future function joins individual results into the final result. By using a share-nothing architecture, pure functions and value types, all business logic code can be written as if it were single-threaded, and the details of thread and mutex management can be left to library code. Haskell fans would be proud (or possibly horrified, not really sure).
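
For a concrete picture, here is a minimal sketch of what pop_future might look like; the real implementation is in the repo linked above, and the types here (a map from words to counts) are assumptions based on the snippet:

#include <future>
#include <string>
#include <unordered_map>
#include <vector>

using WordCounts = std::unordered_map<std::string, int>;

void pop_future(WordCounts &word_counts,
                std::vector<std::future<WordCounts>> &futures) {
    // Block until the oldest task finishes and take its result.
    WordCounts partial = futures.front().get();
    futures.erase(futures.begin());
    // Fold the per-file counts into the total; only this thread
    // ever touches word_counts, so no locking is needed.
    for(const auto &[word, count]: partial) {
        word_counts[word] += count;
    }
}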

October 19, 2020

libsecret is accepting Outreachy interns as well

Like other projects in GNOME, libsecret also has an open project for Outreachy internship: Create a portable library for reading/writing libsecret keyring format.

libsecret is a library that allows applications to store/retrieve user secrets (typically passwords). While it usually works as a client against a separate D-Bus service, it can also use a local file as database. The project is about refactoring the file database so it can easily gain more advanced features like hardware-based security, etc. That might sound intimidating as it touches cryptography, but don’t worry and reach out to us if you are interested 🙂
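
To give a flavor of the client API, storing a password looks roughly like this sketch (adapted from the libsecret documentation; the schema and attribute names are illustrative):

#include <libsecret/secret.h>

/* An illustrative schema; real applications define their own. */
static const SecretSchema example_schema = {
    "org.example.Password", SECRET_SCHEMA_NONE,
    {
        { "username", SECRET_SCHEMA_ATTRIBUTE_STRING },
        { NULL, 0 },
    }
};

static void
store_example_password (void)
{
  GError *error = NULL;

  /* Store "s3cret" in the default collection, keyed by a username attribute */
  secret_password_store_sync (&example_schema, SECRET_COLLECTION_DEFAULT,
                              "Example label", "s3cret",
                              NULL, &error,
                              "username", "alice",
                              NULL);
  if (error != NULL)
    g_error_free (error);
}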

October 18, 2020

2020-10-18 Sunday

  • Up early, family worship, sermon on Daniel 3, pizza lunch. H's friends over in the evening at the house.

Maps in GNOME 3.38

It's been a while since the last blog post, and it's been a while since 3.38.0 was released; in fact there has already been the stable on-schedule 3.38.1 release. On top of that, a sneaky asynchronous programming bug showed up that, in some circumstances such as high-latency connections or fast typing, could give out-of-sync search results, so I cut an extra 3.38.1.1 patch release.

But now to the summary of the 3.38 user-facing changes. I think all of this has been covered in previous posts, but I guess it's always nice to have a bit of a summary.

For one thing, we now have a night mode, utilizing Mapbox' dark tile set:

Another feature that had been requested for quite some time was a hybrid aerial mode (showing labels for street names, names of places and so on):


And thanks to work by James Westman, the first step towards an adaptive UI (e.g. for smaller screens) has been implemented: in narrow mode a bottom action bar appears, “taking over” some of the buttons from the header bar and leaving the search entry and main menu in the header bar.

And a video of the feature in real life says more than pictures here:


We had also planned to make the routing sidebar adaptive for 3.38, enabling it to take over the view (with a back button, as in other similar phone-enabled UIs) when in narrow mode, by utilizing the adaptive widgets from libhandy. But unfortunately some issues showed up under Wayland with some graphics drivers (related to Clutter), so we had to roll this back before the release.

And stay tuned for updates on upcoming stuff for GNOME 40!

October 17, 2020

2020-10-17 Saturday

  • Drilled a hole in tarmac & hardcore with Pete's diamond core, cut up a scaffold pole & set it into the ground with resin from Mark; great to see M&M's surfacing work.
  • Toiled away at slideware on & off:
  • Helped E. with her maths revision & homework, and H. with her Oxford test preparation. Bed, tired.

October 15, 2020

Endless 3.9.0 Beta 1 Trip Report

Back in April I wrote a Trip Report of the last major Endless OS Beta release. It was fun! I use Endless OS every day and have a very soft spot for it after Product Managing it for a couple of rewarding yet high-stress years. Now the team has released their first beta of the 3.9 release series. If I were still there, this is the sort of thing the team would get in a blizzard of Google Docs and Phabricator tickets. Sadly for them, open source means they don’t get to escape my opinions that easily.

I’m going to mention again, because I’ve recently had to sit through a couple of excruciating update experiences on macOS, Windows and Manjaro, that image-based update experiences like those used by Endless, Fedora Silverblue and Chrome OS are just plain better. Aside from that, the most important thing to note is that this comes with the new 3.38 GNOME release, so it inherits all the improvements from that, although many of the app improvements will have already come through to users directly through Flathub. Now GNOME is released software, but an operating system isn’t just GNOME, it’s a whole thing, so you should really remember that this is a beta, which means that the team doesn’t think this is finished yet. I’m sure they’re aware of some or all of the comments I’m making below - this is intended to be helpful, to let them know what people find useful, or would like them to prioritise improving.

Over the past few releases Endless have been reducing the number of patches and changes they have on top of GNOME, but this release contains a massive rearchitecture: moving the Endless desktop from patches to GNOME extensions. This means if you install the Extensions app from the App Center you can actually turn this all off and get a pretty pure GNOME! This only works because of years of investment in adding select Endless capabilities to upstream GNOME and removing differentiations that didn’t, at the end of the day, differentiate. This also results in Hack migrating to Extensions and apps, which is a big change for that group as well. The user benefits from this change will be felt in future cycles, as the Endless team don’t have to spend as much time rebasing the same work again and again but can work on delivering solutions their users need.

User facing, the biggest deal on the desktop is that Endless is now using the same applications page as GNOME. This does mean fewer icons per page (but larger targets), and the pager has moved from a vertical orientation to a horizontal one. I suspect this is because drag-and-drop page changes don’t work as well vertically as they do horizontally with the bottom bar, but it could also be to seem closer in experience to the Android and iOS experiences that Endless' target users know so well.

This also means Endless inherits the newer upstream folder overlay behaviour as well, which should have more contrast and better focus for users. A big change is that all applications on the system are now visible and can’t be hidden from the desktop, only available in Search. This leads to some annoyance with the “get” apps and other links like YouTube, which are now obnoxious and difficult to get rid of. Like Apple users before me, I have been forced to create a “Trash” folder. My recommendation is that these advertising apps should become proper apps, which would allow them to be uninstalled and treated the same as anything else. Years back there was some early concepting around an ‘app launch’ portal that would open a software store if the app was not installed, allowing cross-app one-way intents - you could of course just do this with URIs, although it may need a bit more thinking through than a blog post. It’s really important for Endless to keep the capability to highlight to customers in a shop that a range of software is available without having to put everything in the box.

This also highlights some other long-standing issues with apps that can’t be found in the App Center when you press the “Show Details” button. It’s just not a great experience and feels very disconnected; whilst the copy in the error message you see when you press the button isn’t great, the truth is that you just shouldn’t ever see it. Also, apps from the read-only OS partition like Eye of GNOME seem to direct to the Flathub version rather than picking the “native” version. I’d always be in favour of a slimmer base image and layering the apps on top, but at the moment it feels like we’re a little bit stuck in the middle. None of these issues are unfixable, but it’s the nature of moving in the right direction that you end up exposing the hacks that have previously been hidden.

There’s a bunch of issues with drag and drop on the apps pane. This certainly feels in progress, so I suspect we’ll see changes here in future betas. App reordering to a different screen works, but there’s not much indication as to what will actually happen, and you need to get right over to the very edge of the screen to move the pane. If you miss the target it’s very unhelpful as well, with the icon vanishing and little spatial understanding of where it’s gone. It’s particularly hard to move something to the first place in the grid, as the push-aside animation doesn’t appear to operate in that case. Less important, but you can’t drag from one folder to another, although the highlight state suggests that it should work. Also an edge case, but there’s no ability to ‘compress’ panes to fill up the empty space, so it feels like things are a bit strung out, which can make it pretty annoying to create folders. I’m aware that I’m up in the top percentile of apps installed, although the visibility changes mentioned earlier will increase the number of users who experience these sorts of issues. Despite all of that, animations are snappy and the process feels solid, which is a real improvement from the previous experience. There’s a big step forward for moving between panes, as two-finger trackpad gestures work really well.

System search with the new bigger targets is great as well; however, symbolic icons like ‘suspend’ don’t get the full-size treatment, which looks a little odd in the pane next to lovely large app icons. There are also still a few legacy 64x64 app icons that should really be removed in 2020, even though Endless' target resolution is still 1080p.

The Discovery Feed has been removed - sad times but it didn’t get the app support needed and it never really found a natural place in the desktop for actually discovering the content inside. I still feel that it’s a strong concept (it’s not a million miles away from the Google Android homescreen, or projects I’ve worked on in the past like Lumi) but it’ll need another iteration and a clearer route to app adoption to be successful. This was a good choice, the desktop is clearer and more focused without it.

This is a faster, more streamlined Endless experience with tons more polish. There’s the expected level of bugs from a beta but this is very promising.

Does C++ still deserve the bad rap it has had for so long?

Traditionally C++ has been seen by many (and you know who you are) as just plain bad: the code is unreadably verbose, error messages are undecipherable, it's unsafe, compilation takes forever and so on. In fact making fun of C++ is even a fun pastime for some. All of this was certainly true in the 90s and even as recently as 10 years ago. But is it still the case? What would be a good way to determine this?

A practical experiment

Let's write a fairly simple program that solves a sort-of-real-worldish problem and see what comes out of it. The problem we want to solve is the following:
Find all files with the extension .txt recursively in the subdirectories of the current directory, count the number of times each word appears in them and print the ten most common words and the number of times they are used.

We're not going to go for extreme performance or handling all possible corner cases, instead going for a reasonable implementation. The full code can be found by following this link.

The first thing we need is a way to detect files with a given extension and words within text. This calls for regular expressions:

const std::regex fname_regex("\\.txt$", std::regex_constants::icase);

const std::regex word_regex("([a-z]{2,})", std::regex_constants::icase);

This might be a bit verbose, but quite readable. This only works for ASCII text, but for the purposes of this experiment it is sufficient. To calculate the number of times each word is seen, we need a hash table:

std::unordered_map<std::string, int> word_counts;

Now we need to iterate over all files in the current directory subtree.

for(const auto &e: std::filesystem::recursive_directory_iterator("."))

Skip everything that is not a .txt file.

if(!e.is_regular_file()) {
    continue;
}
if(!std::regex_search(e.path().c_str(), fname_regex)) {
    continue;
}

Process the file line by line:

std::ifstream current_file(e.path());
for (std::string line; std::getline(current_file, line); )

For each line we need to find all words that match the regular expression.

std::sregex_iterator word_end;
for(auto it = std::sregex_iterator(line.begin(), line.end(), word_regex); it != word_end; ++it)

This is a bit verbose, but it takes care of all the state tracking needed to run the regex multiple times on the same input string. Doing the same by hand is fairly tedious and error prone. Now that we have the matches, we need to convert them to standalone lowercase words.

std::string word{it->str()};
for(auto &c : word) {
    c = std::tolower(c);
}

Lowercasing strings is a bit cumbersome, granted, but this is at least fairly readable. Then on to incrementing the word count.

++word_counts[word];

That's all that is needed to count the words. Finding the most common words can't be done directly in the hash map, so we need to convert the data to an array. It needs some setup code.

struct WordCount {
    std::string word;
    int count;
};

std::vector<WordCount> word_array;
word_array.reserve(word_counts.size());

Since we know how many entries will be in the array, we can reserve enough space for it in advance. Then we need to iterate over all entries and add them to the array.

for(const auto &[word, count]: word_counts) {
    word_array.emplace_back(WordCount{word, count});
}

This uses the new structured bindings feature, which is a lot more readable than the old way of dealing with iterator objects or pairs with their firsts and their seconds and all that horror. Fortunately no more.

The simple way of getting the 10 most used entries is to sort the array and then grab the 10 first elements. Let's do something slightly swaggier instead [0]. We'll do a partial sort and discard all entries after 10. For that we need a descending sorting criterion as a lambda.

auto count_order_desc = [](const WordCount &w1, const WordCount &w2) { return w2.count < w1.count; };

Using it is straightforward.

const auto final_size = std::min(10, (int)word_array.size());
std::partial_sort(word_array.begin(), word_array.begin() + final_size, word_array.end(), count_order_desc);
word_array.erase(word_array.begin() + final_size, word_array.end());

All that remains is to print the final result.

for(const auto& word_info: word_array) {
    std::cout << word_info.count << " " << word_info.word << "\n";
}

Safety

As safety and security are important features in current software development, let's examine how safe this program is. There are two main safety points: thread safety and memory safety. As this program has only one thread, it can be formally proven to be thread safe. Memory safety also divides into two main things to note: use-after-frees (or dangling references) and out-of-bounds array accesses.

In this particular program there are no dangling references. All data is in value types and the only references are iterators. They are all scoped so that they never live past their usages. Most are not even stored into named variables but instead passed directly as function arguments (see especially the call to partial_sort). Thus they can not be accessed after they have expired. There is no manual resource management of any kind. All resources are freed automatically. Running the code under Valgrind yields zero errors and zero memory leaks.

There is one place in the code where an out-of-bounds access is possible: when creating the iterator pointing to 10 elements past the beginning of the array [2]. If you were to forget to check that the array has at least 10 elements, that would be an immediate error. Fortunately most standard libraries have a way to enable checks for this. Here's what GCC reports at runtime if you get this wrong:

Error: attempt to advance a dereferenceable (start-of-sequence) iterator 100000 steps, which falls outside its valid range.
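
For reference, with GCC one way to enable these checks is libstdc++'s debug mode (shown here as an illustration; the exact switches depend on your standard library):

g++ -D_GLIBCXX_DEBUG -std=c++17 -o cppexample cppexample.cpp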

Not really a perfect safety record, but overall it's fairly good and certainly a lot better than implementing the same by manually indexing arrays.

Problems

Compiling the program takes several seconds, which is not great. This is mostly due to the regex header, which is known to be heavyweight. Some of the compiler messages encountered during development were needlessly verbose. The actual errors were quite silly, though, such as passing arguments to functions in the wrong order. This could definitely be better. It is reported that the use of concepts in C++20 will fix most of this issue, but there are no practical real-world usage reports yet.

Unique features

This code compiles, runs and works [1] on all major platforms using only the default toolchain provided by the OS vendor, without any external dependencies or downloads. This is something that no other programming language can provide to date. The closest you can get is plain C, but its standard library does not have the necessary functionality. Compiling the code is also simple and can be done with a single command:

g++ -o cppexample -O2 -g -Wall -std=c++17 cppexample.cpp

Conclusions

If we try to look at the end result with objective eyes it would seem to be fairly nice. The entire implementation takes fewer than 60 lines of code. There's nothing immediately terrible that pops up. What is happening at each step is fairly understandable and readable. There's nothing that one could immediately hate and despise for being "awful", as is tradition.

This is not to say you could still not hate C++. If you want to, go right ahead. There are certainly reasons for it. But if you choose to do that, make sure you hate it for real actual reasons of today, rather than things that were broken decades ago but have been fixed since.

[0] There is an even more efficient solution using a priority queue that does not need the intermediate array. Implementing it is left as an exercise to the reader.
[1] Reader beware, I have only proven this code to be portable, not tried it.
[2] The implementation in [0] does not have this problem as no iterators are ever modified by hand.

on "binary security of webassembly"

Greets!

You may have seen an interesting paper cross your radar a couple months ago: Everything Old is New Again: Binary Security of WebAssembly, by Daniel Lehmann, Johannes Kinder and Michael Pradel. The paper makes some strong claims and I would like to share some thoughts on it.

reader-response theory

For context, I have been working on web browsers for the last 8 years or so, most recently on the JavaScript and WebAssembly engine in Firefox. My work mostly consists of implementing new features, which if you are familiar with software development translates as "writing bugs". Almost all of those bugs are security bugs, potentially causing Firefox to go from being an agent of the user to an agent of the Mossad, or of cryptocurrency thieves, or anything else.

Mitigating browser bug flow takes a siege mentality. Web browsers treat all web pages and their corresponding CSS, media, JavaScript, and WebAssembly as hostile. We try to reason about global security properties, and translate those properties into invariants ensured at compile-time and run-time, for example to ensure that a web page from site A can't access cookies from site B.

In this regard, WebAssembly has some of the strongest isolation invariants in the whole platform. A WebAssembly module has access to nothing, by default: neither functionality nor data. Even a module's memory is isolated from the rest of the browser, both by construction (that's just how WebAssembly is specified) and by run-time measures (given that pointers are 32 bits in today's WebAssembly, we generally reserve a multi-gigabyte region for a module's memory that can contain nothing else).

All of this may seem obvious, but consider that a C++ program compiled to native code on a POSIX platform can use essentially everything that the person running it has access to: your SSH secrets, your email, all of your programs, and so on. That same program compiled to WebAssembly does not -- any capability it has must have been given to it by the person running the program. For POSIX-like programs, the WebAssembly community is working on a POSIX for the web that standardizes a limited-capability access to data and functionality from the world, and in web browsers, well of course the module has access only to the capabilities that the embedding web page gives to it. Mostly, as the JS run-time accompanying the WebAssembly is usually generated by emscripten, this set of capabilities is a function of the program itself.

Of course, complex WebAssembly systems may contain multiple agents, acting on behalf of different parties. For example, a module might, through capabilities provided by the host, be able to ask flickr to delete a photo, but might also be able to crawl a URL for photos. Probably in this system, crawling a web page shouldn't be able to "trick" the WebAssembly module into deleting a photo. The C++ program compiled to WebAssembly could have a bug of course, in which case, you get to keep both pieces.

I mention all of this because we who work on WebAssembly are proud of this work! It is a pleasure to design and build a platform for high-performance code that provides robust capabilities-based security properties.

the new criticism

Therefore it was with skepticism that I started reading the Lehmann et al paper. The paper focusses on WebAssembly itself, not any particular implementation thereof; what could be wrong about WebAssembly?

I found the answer to be quite nuanced. To me, the paper shows three interesting things:

  1. Memory-safety bugs in C/C++ programs when compiled to WebAssembly can cause control-flow edges that were not present in the source program.

  2. Unexpected control-flow in a web browser can sometimes end up in a call to eval with the permissions of the web page, which is not good.

  3. It's easier in some ways to exploit bugs in a C/C++ program when compiled to WebAssembly than when compiled natively, because many common mitigations aren't used by the WebAssembly compiler toolchain.

Firstly, let's discuss the control-flow point. Let's say that the program has a bug, and you have made an exploit to overwrite some memory location. What can you do with it? Well, consider indirect calls (call_indirect). This is what a compiler will emit for a vtable method call, or for a call to a function pointer. The possible targets for the indirect call are stored in a table, which is a side array of all possible call_indirect targets. The actual target is selected at run-time based on an index; WebAssembly function pointers are just indices into this table.

So if a function loads an index into the indirect call table from memory, and some exploit can change this index, then you can cause a call site to change its callee. Although there is a run-time type check that occurs at the call_indirect site to ensure that the callee is called with the right type, many functions in a module can have compatible types and thus be callable without an error.
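
To make the shape of the problem concrete, here is a contrived C sketch of my own (not an example from the paper): the two functions share a signature, so if an overflow rewrites the stored function-pointer index, the call_indirect type check cannot tell them apart.

#include <string.h>

typedef void (*handler)(const char *);

void log_message(const char *msg) { /* benign */ }
void run_script(const char *code) { /* imagine this forwards to eval */ }

struct widget {
  char name[16];     /* attacker-controlled input lands here... */
  handler on_click;  /* ...and can overwrite this table index */
};

void set_name(struct widget *w, const char *input) {
  strcpy(w->name, input);  /* classic overflow into on_click */
}

void click(struct widget *w) {
  w->on_click(w->name);    /* compiles to call_indirect */
}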

OK, so that's not great. But what can you do with it? Well it turns out that emscripten will sometimes provide JavaScript's eval to the WebAssembly module. Usually it will be called only with a static string, but anything can happen. If an attacker can redirect a call site to eval instead of one of the possible targets from the source code, you can (e.g.) send the user's cookies to evil.com.

There's a similar vulnerability regarding changing the operand to eval, instead. Strings are represented in linear memory as well, and there's no write protection on them, even if they are read-only data. If your write primitive can change the string being passed to eval, that's also a win for the attacker. More details in the paper.

This observation brings us to the last point, which is that many basic mitigations in (e.g.) POSIX deployments aren't present in WebAssembly. There are no OS-level read-only protections for static data, and the compiler doesn't enforce this either. Also, WebAssembly programs have to bundle their own malloc, but the implementations provided by emscripten don't implement the "hardening" techniques. There is no address-space layout randomization, so exploits are deterministic. And so on.

on mitigations

It must be said that for most people working on WebAssembly, security "mitigations" are... unsatisfactory. They aren't necessary for memory-safe programs, and they can't prevent memory-unsafe programs from having unexpected behavior. Besides, we who work on WebAssembly are more focussed on the security properties of the WebAssembly program as embedded in its environment, but not on the program itself. Garbage in, garbage out, right?

In that regard, I think that one answer to this paper is just "don't". Don't ship memory-unsafe programs, or if you do, don't give them eval capabilities. No general mitigation will make these programs safe. Writing your program in e.g. safe Rust is a comprehensive fix to this class of bug.

But, we have to admit also that shipping programs written in C and C++ is a primary goal of WebAssembly, and that no matter how hard we try, some buggy programs will get shipped, and therefore that there is marginal value to including mitigations like read-only data or even address space randomization. We definitely need to work on getting control-flow integrity protections working well with the WebAssembly toolchain, probably via multi-table support (part of the reference types extension; my colleague Paulo Matos just landed a patch in this area). And certainly Emscripten should work towards minimizing the capabilities set exposed to WebAssembly by the generated JavaScript, notably by compiling away uses of eval by embind.

Finally, I think that many of the problems identified by this paper will be comprehensively fixed in a different way by more "managed" languages. The problem is that C/C++ pointers are capabilities into all of undifferentiated linear memory. By contrast, handles to GC-managed objects are unforgeable: given object A, you can't get to object B except if object A references B. It would be great if we could bring some of the benefits of this more capability-based approach to in-memory objects to languages like C and C++; more on that in a future note, I think.

chapeau

In the end, despite my initial orneriness, I have to admit that the paper authors point out some interesting areas to work on. It's clear that there's more work to do. I was also relieved to find that my code is not at fault in this particular instance :) Onwards and upwards, and until next time, happy hacking!

Specify Form-Factors in Your Librem 5 Apps

While more and more applications are being redesigned to take smartphones like the Librem 5 into account, PureOS still offers lots of desktop applications which are not ready to run on such devices yet.

As a user you want to know which applications are relevant to install, so PureOS Store will by default only present mobile-ready applications, while still letting you opt into showing all applications to take full advantage of the Librem 5’s convergent docked mode. As a user you also want to know which applications are relevant to run at a given time, so Phosh will let you run desktop-only applications only when the phone is docked.

This requires the applications to provide some information on which form-factors they can handle. If you are an application developer and you want your applications to work as expected on the Librem 5, please provide the relevant information as shown below.

To make your application appear in PureOS Store, add the following lines to your AppStream metainfo:

<?xml version="1.0" encoding="UTF-8"?>
<component type="desktop">
  <custom>
    <value key="Purism::form_factor">workstation</value>
    <value key="Purism::form_factor">mobile</value>
  </custom>
</component>

To make your application appear in Phosh, add the following lines to your desktop entry:

[Desktop Entry]
…
# Translators: Do NOT translate or transliterate this text (these are enum types)!
X-Purism-FormFactor=Workstation;Mobile;
…

If you don’t add these, your application will be assumed to only be compatible with the desktop mode.

You may be intrigued by the Purism namespace in these solutions: we came up with these ad-hoc solutions to provide the form-factor compatibility information we need until better-fleshed-out solutions are ready. You can read more about this issue here.

October 14, 2020

Sandboxing inside the sandbox: No rogue thumbnailers inside Flatpak

A couple of years ago, we sandboxed thumbnailers using bubblewrap to avoid drive-by downloads taking advantage of thumbnailers with security issues.

It's a great tool, and it's a tool that Flatpak relies upon to create its own sandboxes. But that also meant that we couldn't use it inside the Flatpak sandboxes themselves, and those aren't always as closed as they could be, in order to support legacy applications.

We've finally implemented support for sandboxing thumbnailers within Flatpak, using the Spawn D-Bus interface (indirectly).

This should all land in GNOME 40, though it should already be possible to integrate it into your Flatpaks. Make sure to use the latest gnome-desktop development version, and that the flatpak-spawn utility is new enough in the runtime you're targeting (it's been updated in the freedesktop.org runtimes #1, #2, #3, but it takes time to trickle down to GNOME versions). Example JSON snippets:

{
    "name": "flatpak-xdg-utils",
    "buildsystem": "meson",
    "sources": [
        {
            "type": "git",
            "url": "https://github.com/flatpak/flatpak-xdg-utils.git",
            "tag": "1.0.4"
        }
    ]
},
{
    "name": "gnome-desktop",
    "buildsystem": "meson",
    "config-opts": ["-Ddebug_tools=true", "-Dudev=disabled"],
    "sources": [
        {
            "type": "git",
            "url": "https://gitlab.gnome.org/GNOME/gnome-desktop.git"
        }
    ]
}

(We also sped up GStreamer-based thumbnailers by allowing them to use a cache, and added profiling information to the thumbnail test tools, which could prove useful if you want to investigate performance or bugs in that area)

October 13, 2020

gedit crowdfunding

The gedit text editor has a long history of development: it was created in 1998, at the beginnings of GNOME. So it is one of the oldest GNOME applications still alive, and it is usually installed by default with Linux distributions that provide GNOME as their desktop environment.

It is this – the fact that many Linux users know and have gedit installed – that motivates me to improve it, to make it a top-notch core application. It is not an easy undertaking though: the codebase is old and large, and there are several underlying software components (libraries) that are critical for the main functioning of gedit.

I started to contribute to gedit in 2011, and I'm now its main developer. I'm a freelance software developer, and would like to devote as much time as possible to gedit development, including the underlying libraries.

That's why there is now a crowdfunding campaign for gedit! It's on the Liberapay platform; you can also take a look at my profile, where I've described my contribution milestones, my main projects, and why Liberapay was chosen.

Your donations are critical help that makes it possible for the development of gedit to continue: to maintain it, develop new features and bring it to the next level.

Thanks!

malloc as a service

Greetings, internet! Today I have the silliest of demos for you: malloc-as-a-service.

[Interactive demo: an input box backed by walloc]

The above input box, if things managed to work, loads up a simple bare-bones malloc implementation, and exposes "malloc" and "free" bindings. But the neat thing is that it's built without emscripten: it's a standalone C file that compiles directly to WebAssembly, with no JavaScript run-time at all. I share it here because it might come in handy to people working on WebAssembly toolchains, and also because it was an amusing experience to build.

wat?

The name of the allocator is "walloc", in which the w is for WebAssembly.

Walloc was designed with the following priorities, in order:

  1. Standalone. No stdlib needed; no emscripten. Can be included in a project without pulling in anything else.

  2. Reasonable allocation speed and fragmentation/overhead.

  3. Small size, to minimize download time.

  4. Standard interface: a drop-in replacement for malloc.

  5. Single-threaded (currently, anyway).

Emscripten includes a couple of good malloc implementations (dlmalloc and emmalloc) which probably you should use instead. But if you are really looking for a bare-bones malloc, walloc is fine.

You can check out all the details at the walloc project page; a selection of salient bits are below.

Firstly, to build walloc, it's just a straight-up compile:

clang -DNDEBUG -Oz --target=wasm32 -nostdlib -c -o walloc.o walloc.c

The resulting walloc.o is a conforming WebAssembly file on its own, but which also contains additional symbol table and relocation sections which allow wasm-ld to combine separate compilation units into a single final WebAssembly file. walloc.c on its own doesn't import or export anything, in the WebAssembly sense; to make bindings visible to JS, you need to add a little wrapper:

typedef __SIZE_TYPE__ size_t;

#define WASM_EXPORT(name) \
  __attribute__((export_name(#name))) \
  name

// Declare these as coming from walloc.c.
void *malloc(size_t size);
void free(void *p);
                          
void* WASM_EXPORT(walloc)(size_t size) {
  return malloc(size);
}

void WASM_EXPORT(wfree)(void* ptr) {
  free(ptr);
}

If you compile that to exports.o and link via wasm-ld --no-entry --import-memory -o walloc.wasm exports.o walloc.o, you end up with the walloc.wasm used in the demo above. See your inspector for the URL.

The resulting wasm file is about 2 kB (uncompressed).

Walloc isn't the smallest allocator out there. A simple bump-pointer allocator that never frees is the fastest thing you can have. There is also an alternate allocator for Rust, wee_alloc, which is said to be smaller than walloc, though I think it is less space-efficient for small objects. But still, walloc is pretty small.

implementation notes

When a C program is compiled to WebAssembly, the resulting wasm module (usually) has associated linear memory. It can be linked in a way that the memory is created by the module when it's instantiated, or such that the module is given a memory by its host. The above example passed --import-memory to the linker, allowing the host to bound memory usage for the module instance.

The linear memory has the usual data, stack, and heap segments. The data and stack are placed first. The heap starts at the &__heap_base symbol. (This symbol is computed and defined by the linker.) All bytes above &__heap_base can be used by the wasm program as it likes. So &__heap_base is the lower bound of memory managed by walloc.

                                              memory growth ->
+----------------+-----------+-------------+-------------+----
| data and stack | alignment | walloc page | walloc page | ...
+----------------+-----------+-------------+-------------+----
^ 0              ^ &__heap_base            ^ 64 kB aligned

Interestingly, there are a few different orderings of data and stack used by different toolchains. It used to even be the case that the stack grew up. This diagram from the recent "Everything Old is New Again: Binary Security of WebAssembly" paper by Lehmann et al is a good summary:

The sensible thing to prevent accidental overflow (underflow, really) is to have the stack grow down to 0, with data at higher addresses. But this can cause WebAssembly code that references data to take up more bytes, because addresses are written using variable-length "LEB" encodings that favor short offsets, so it isn't the default, right now at least.

Anyway! The upper bound of memory managed by walloc is the total size of the memory, which is aligned on 64-kilobyte boundaries. (WebAssembly ensures this alignment.) Walloc manages memory in 64 kB pages as well. It starts with whatever memory is initially given to the module, and will expand the memory if it runs out. The host can specify a maximum memory size, in pages; if no more pages are available, walloc's malloc will simply return NULL; handling out-of-memory is up to the caller.

Walloc has two allocation strategies: small and large objects.

big bois

A large object is more than 256 bytes.

There is a global freelist of available large objects, each of which has a header indicating its size. When allocating, walloc does a best-fit search through that list.

struct large_object {
  struct large_object *next;
  size_t size;
  char payload[0];
};
struct large_object* large_object_free_list;

Large object allocations are rounded up to 256-byte boundaries, including the header.

If there is no object on the freelist that can satisfy an allocation, walloc will expand the heap by the size of the allocation, or by half of the current walloc heap size, whichever is larger. The resulting page or pages form a large object that can satisfy the allocation.

If the best object on the freelist has more than a chunk of space on the end, it is split, and the tail put back on the freelist. A chunk is 256 bytes.

+-------------+---------+---------+-----+-----------+
| page header | chunk 1 | chunk 2 | ... | chunk 255 |
+-------------+---------+---------+-----+-----------+
^ +0          ^ +256    ^ +512                      ^ +64 kB

As each page is 65536 bytes, and each chunk is 256 bytes, there are therefore 256 chunks in a page. The first chunk in a page that begins an allocated object, large or small, contains a header chunk. The page header has a byte for each of the 256 chunks in the page. The byte is 255 if the corresponding chunk starts a large object; otherwise the byte indicates the size class for packed small-object allocations (see below).

+-------------+---------+---------+----------+-----------+
| page header | large object 1    | large object 2 ...   |
+-------------+---------+---------+----------+-----------+
^ +0          ^ +256    ^ +512                           ^ +64 kB

When splitting large objects, we avoid starting a new large object on a page header chunk. A large object can only span where a page header chunk would be if it includes the entire page.

Freeing a large object pushes it on the global freelist. We know a pointer is a large object by looking at the page header. We know the size of the allocation, because the large object header precedes the allocation. When the next large object allocation happens after a free, the freelist will be compacted by merging adjacent large objects.
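
As a sketch of that pointer classification (not walloc's actual code, which lives in the project repo):

#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE    65536
#define CHUNK_SIZE   256
#define LARGE_OBJECT 255

/* Look up the page-header byte covering a pointer's chunk. */
static uint8_t chunk_kind(void *p) {
  uintptr_t addr = (uintptr_t) p;
  uint8_t *page = (uint8_t *) (addr & ~(uintptr_t) (PAGE_SIZE - 1));
  size_t chunk = (addr & (PAGE_SIZE - 1)) / CHUNK_SIZE;
  return page[chunk];  /* LARGE_OBJECT, or a small size class */
}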

small fry

Small objects are allocated from segregated freelists. The granule size is 8 bytes. Small object allocations are packed in a chunk of uniform allocation size. There are size classes for allocations of each size from 1 to 6 granules, then 8, 10, 16, and 32 granules; 10 sizes in all. For example, an allocation of 12 granules will be satisfied from a 16-granule chunk. Each size class has its own free list.

struct small_object_freelist {
  struct small_object_freelist *next;
};
struct small_object_freelist small_object_freelists[10];
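
For illustration, mapping a request size onto one of these classes could look like the following sketch (walloc's real code may differ):

#include <stddef.h>

#define GRANULE_SIZE 8

static const unsigned size_class_granules[10] =
  { 1, 2, 3, 4, 5, 6, 8, 10, 16, 32 };

/* Round a byte count up to granules, then pick the smallest class
   that fits; e.g. 12 granules lands in the 16-granule class. */
static int size_class_for(size_t bytes) {
  size_t granules = (bytes + GRANULE_SIZE - 1) / GRANULE_SIZE;
  for (int i = 0; i < 10; i++)
    if (granules <= size_class_granules[i])
      return i;
  return -1;  /* more than 32 granules: handled as a large object */
}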

When allocating, if there is nothing on the corresponding freelist, walloc will allocate a new large object, then change its chunk kind in the page header to the size class. It then goes through the fresh chunk, threading the objects through each other onto a free list.

+-------------+---------+---------+------------+---------------------+
| page header | large object 1    | granules=4 | large object 2' ... |
+-------------+---------+---------+------------+---------------------+
^ +0          ^ +256    ^ +512    ^ +768       ^ +1024               ^ +64 kB

In this example, we imagine that the 4-granules freelist was empty, and that the large object freelist contained only large object 2, running all the way to the end of the page. We allocated a new 4-granules chunk, splitting the first chunk off the large object, and pushing the newly trimmed large object back onto the large object freelist, updating the page header appropriately. We then thread the 4-granules (32-byte) allocations in the fresh chunk together (the chunk has room for 8 of them), treating them as if they were instances of struct small_object_freelist, pushing them onto the global freelist for 4-granules allocations.

           in fresh chunk, next link for object N points to object N+1
                                 /--------\                     
                                 |        |
            +------------------+-^--------v-----+----------+
granules=4: | (padding, maybe) | object 0 | ... | object 7 |
            +------------------+----------+-----+----------+
                               ^ 4-granule freelist now points here 

The size classes were chosen so that any wasted space (padding) is less than the size class.

Freeing a small object pushes it back on its size class's free list. Given a pointer, we know its size class by looking in the chunk kind in the page header.

and that's it

Hey have fun with the thing! Let me know if you find it useful. Happy hacking and until next time!

October 11, 2020

Introducing Atypica

A while ago, I envisioned building a new* professional video production collective for commercial and non-commercial projects, both as a “creative outlet” for one of my long-standing passions, and as a way to build a specialized service offering that can act as a bridge between my own Montreal-based marketing agency and other collaborators or artists and freelancers.

For various reasons—including my brain getting monopolized by a handful of demanding contracts, and encountering a lot of “new and interesting” challenges in life—I had to shelve that project until I could give it proper attention.

As things quieted down over the past year or so, and after I finished an epic yak shaving journey (including but not limited to “bringing Getting Things GNOME back to life with a new release“), I recently dusted off my old notes and plans to present my vision for this particular creative offering.

The result is the Atypica video production collective.

I spun this into a standalone website and name, because I didn’t necessarily want to make it all about idéemarque, and I didn’t want to overload idéemarque’s own offering anyway–video production is a large-enough topic, and it involves a different-enough set of people in general, that it warranted its own website to adequately explain.

Whenever I get the opportunity to add some more of our video productions that meet or exceed this level of quality in the future, I will; and if anyone you know wants to hire a great video producer and editor who brings a human touch to any topic, now you know where to find my specific offering in that space.

Next, I will be reworking idéemarque’s offering and website. That’s a story for another day.


P.S.: a note on the FLOSS community’s sometimes surprising level of expectations—upon publication of this blog post, someone messaged me to make the case, in very strongly-worded terms, that my video work is not worth presenting; that I “should really remove” my portfolio, and “maybe stick to programming or do product management”, implying I have no idea how to make a worthwhile video, as evidenced by <insert list of nitpicks about some lighting or motion imperfections found in my 2019 GStreamer conference video>. Never mind that I had to fix the colors and motion of hundreds of clips in post-production to overcome some technical limitations I encountered on-site, and did thousands of (invisible) cuts to fix the interviewees’ sentences because sound quality and story flow were more important to me than Oscar-winning visuals…
So, while I appreciate (constructive) feedback, if any photography connoisseur is watching that video and feels compelled to rectify my shortcomings, please have the following in mind, rather than assuming lack of knowledge or competence: picture the constraints of a jet-lagged run-and-gunner who couldn’t bring even a third of his equipment (let alone any crew to help) on site, and simultaneously had to: handle all technical aspects; chase down attendees all day to encourage them to overcome their shyness and speak to the camera in-between talks; wrestle for physical space throughout the day in less-than-optimal shooting conditions; record over 150 B-roll scenes to complement the interviews in a diversified way to truly convey the atmosphere of the event; try to accomplish all of that in only a few hours. It might not be the most perfect video in the eyes of a demanding DP, but I do think I’ve done a fairly good job in the circumstances.

Luckily I have a really tough skin (as I’ve been involved in free & open-source software for over 15 years), but not many “real world” marketeers or PMs are as understanding as me. Already rare in this segment of the industry due to cultural and economic factors, if other benevolent marketeers received such dismissive criticism when publishing creative work for FLOSS projects, it is no wonder why they are typically nowhere to be found in our community. In the business world, I’ve had CEOs express wild enthusiasm over much, much less technically-refined videos I produced before these—I guess I’m glad they have no taste! ;)


*: I used to run a video production collective decades ago, but none of the material produced back then would meet my quality standards today (whether from a technological or “technique” standpoint), and the team that was part of the collective back then almost all became bankers (they also gave up on their award-winning jazz band. True story.)


Pitivi 2020.09 — Hocus focus

The Pitivi team proudly presents Pitivi version 2020.09, a new milestone towards the most reliable video editor.

New features

Since the 2014 fundraiser and until Pitivi 0.999, released in 2018, we included only critical features, to focus on quality. One example is the proxy functionality, which allows precise editing by seamlessly using optimized media when the original media format is not officially supported.

Life and GSoC happen, and we developed many good features over this period. Most of these originate from the Google Summer of Code program in which we took part, but not all. The new features came with accompanying unit tests and they have been merged only after careful code reviews. Even so, we kept them in the development branch.

This approach led to extra work for taking care of two branches. In addition to a “stable” branch out of which we were making releases, we also maintained a “development” branch in which we were merging cool features. Luckily, despite not much appearing to be happening with the project due to releases containing only bug fixes, contributors kept showing up, and not only because of GSoC.

Since the previous release we came to our senses and reconsidered the earlier decision. Pitivi 2020.09 includes a ridiculous number of new features, for your delight. Read the full list below and get ready to be blown away by what our contributors built.

Quality

The user survey we conducted in 2013 revealed that the most important point was to have a “stable as hell” basic editor. Since the previous release two years ago, we fixed the last outstanding bugs in GStreamer Editing Services (GES), Pitivi’s reusable video editing backend.

At this point, the extensive GES and Pitivi unit tests and the gst-validate testing framework developed for GStreamer/GES allow us to make changes without being too afraid of introducing regressions. Thanks to the quality assurance we have in place, we are now switching to a date-based versioning scheme for Pitivi. The intention is to release often what we have at that point and make gradual changes.

Even though there won’t be a Pitivi 1.0, after 15 years of hard work we are proud that GES has reached the “1.0” level. We are very happy to finally reach this essential milestone in the project’s life!

This was possible thanks to Igalia, who has been sponsoring the development of GStreamer Editing Services, stabilizing it and implementing many features such as integration with Pixar’s OpenTimelineIO, timeline nesting, and clip speed control (yet to be leveraged in Pitivi).

We’ll be looking for ways to create a constant stream of funding for development by drawing interest from institutions backed by public money. Until that happens, if ever, we’ll run occasional fundraisers to improve stability. More about this in a separate blog post.

What’s new

This release includes lots of new features, originating from GSoC internships spanning four years, from the University of Nebraska-Lincoln’s SOFT-261 class (Spring 2020), and from individual contributors.

GSoC 2017
  • A plugin system for extending Pitivi’s functionality, targeted in the medium term at teams of editors.
  • A developer console plugin allows interacting with the project in a Python console.
  • A mechanism for providing custom UI for effects, instead of the automatically generated UI.
  • Custom UI for the frei0r-filter-3-point-color-balance and alpha effects.
  • Easy Ken-Burns effect.
GSoC 2018
  • The new Greeter perspective replaces the Welcome wizard dialog and allows a slick selection of a recently opened project.
  • When being resized, the Viewer shows its size as a percentage of the project video size.
  • The Viewer size snaps at 50% when being resized.
  • Scaled proxies if optimized media is too much for your machine.
GSoC 2019
  • Timeline markers.
  • Support for nested timelines by importing entire XGES files as a clip.
  • The Effects Library has been redesigned to allow quick access to effects.
  • Ability to favorite effects in the Effects Library.
  • Improved workflow for adding effects.
  • The refreshed clip effects UI allows working on multiple effects at a time.
GSoC 2020
  • Refactored Media Library to use the same logic for the different view modes.
  • Refactored Render Dialog UI to avoid overwhelming people. Tell us what you think about it.
University of Nebraska-Lincoln’s SOFT-261 class, Spring 2020
  • Restoring the editing state when reopening a project.
  • Safe areas visualization in the Viewer.
  • Easy alignment for video clips.
  • Composition guidelines in the Viewer.
  • Solid color clips.
  • Ability to mute an entire layer.

Individual hackers

  • Custom UI for the frei0r-filter-alphaspot effect.
  • Interactive Intro for newcomers to get familiar with the UI elements.
  • The action search is a shortcut to everything possible in the timeline, if you can find it. Press “/”.
  • Ability to hide an entire layer.
  • New keyboard shortcuts for pros.

Under the hood

Since the Pitivi 0.15.2 release that came at the time of our 2014 crowdfunding campaign, we have basically changed everything under the hood, with too many bug fixes to count:

  • We rewrote the timeline canvas twice due to the changing technological landscape around us.
  • We’ve redone the clip transformation UI in the viewer twice.
  • We’ve redone the undo-redo system.
  • We’ve redone the Media Library.
  • We’ve redone the Effects Library.

Please see the corresponding 2020.09 milestone for details.

Download

You can install Pitivi 2020.09 right away with flatpak. Follow the installation instructions from flathub.

What’s next

We keep the highest priority issues in the 2020.12 milestone. Basically a mix of bugfixing, cleanup and features.

Thanks

Many thanks to all the contributors!

October 09, 2020

Rift CV1 – multi-threaded tracking

This video shows the completion of work to split the tracking code into 3 threads – video capture, fast analysis and long analysis.

If the projected pose of an object doesn’t line up with the LEDs where we expect it to be, the frame is sent off for more expensive analysis in another thread. That way, it doesn’t block tracking of other objects – the fast analysis thread can continue with the next frame.

When a new blob is detected in a video frame, it is assigned an ID and tracked between frames using motion flow. When the long-analysis results become available at some point in the future, the ID lets us find the blobs that still exist in the most recent video frame. If those blobs are still unknown in the new frame, the code labels them with the LED IDs it found – and then, hopefully, in the next frame the fast analysis locks onto the object again.
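As an illustration of that carry-over idea only (hypothetical types and names, sketched in Rust rather than taken from the OpenHMD C code), the labelling step might look roughly like this:

use std::collections::HashMap;

// Hypothetical sketch: carry LED labels from a delayed long-analysis result
// into the newest frame, keyed by the per-blob ID assigned at detection time.
struct Blob {
    id: u32,             // assigned when the blob is first detected
    led_id: Option<u32>, // which LED we believe this blob is, if known
}

fn apply_long_analysis(newest_frame: &mut [Blob], results: &HashMap<u32, u32>) {
    for blob in newest_frame.iter_mut() {
        // Only label blobs the fast pass hasn't already identified; motion
        // flow kept the IDs stable between frames in the meantime.
        if blob.led_id.is_none() {
            if let Some(&led) = results.get(&blob.id) {
                blob.led_id = Some(led);
            }
        }
    }
}

fn main() {
    let mut frame = vec![Blob { id: 7, led_id: None }];
    let mut results = HashMap::new();
    results.insert(7, 3); // the long analysis decided blob 7 is LED 3
    apply_long_analysis(&mut frame, &results);
    assert_eq!(frame[0].led_id, Some(3));
}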

There are some obvious next things to work on:

  • It’s hard to decide what constitutes a ‘good’ pose match, especially around partially visible LEDs at the edges. More experimentation and refinement is needed.
  • The IMU dead-reckoning between frames is bad – the accelerometer biases, especially for the controllers, tend to make them zoom off very quickly and lose tracking. More filtering, bias extraction and investigation should improve that, and help with staying locked onto fast-moving objects.
  • The code that decides whether an LED is expected to be visible in a given pose could use some improvement.
  • Often the orientation of a device is good but the position is wrong – a matching mode that only searches for translational matches could help.
  • Taking the gravity vector of the device into account can help reject invalid poses, as could some tests of plausible locations based on the limits of human movement.

Code is at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

October 07, 2020

scikit-survival 0.14 with Improved Documentation Released

Today marks the release of version 0.14.0 of scikit-survival. The biggest change in this release is actually not in the code, but in the documentation. This release features a complete overhaul of the documentation. Most importantly, the documentation has a more modern feel to it, thanks to the visually pleasing pydata Sphinx theme, which also powers pandas.

Moreover, the documentation now contains a User Guide section that bundles several topics surrounding the use of scikit-survival. Some of these were available as separate Jupyter notebooks previously, such as the guide on Evaluating Survival Models. There are two new guides: The first one is on penalized Cox models. It provides a hands-on introduction to Cox’s proportional hazards model with $\ell_2$ (Ridge) and $\ell_1$ (LASSO) penalty. The second guide is on Gradient Boosted Models and covers how gradient boosting can be used to obtain a non-linear proportional hazards model or a non-linear accelerated failure time model by using regression tree base learners. The second part of this guide covers a variant of gradient boosting that is most suitable for high-dimensional data and is based on component-wise least squares base learners.

To make it easier to get started, all notebooks can now be run in a Jupyter notebook, right from your browser, just by clicking on

In addition to the vastly improved documentation, this release includes important bug fixes. It fixes several bugs in CoxnetSurvivalAnalysis, where predict, predict_survival_function, and predict_cumulative_hazard_function returned wrong values if features of the training data were not centered. Moreover, the score function of ComponentwiseGradientBoostingSurvivalAnalysis and GradientBoostingSurvivalAnalysis will now correctly compute the concordance index if loss='ipcwls' or loss='squared'.

For a full list of changes in scikit-survival 0.14.0, please see the release notes.

Pre-built conda packages are available for Linux, macOS, and Windows via

 conda install -c sebp scikit-survival

Alternatively, scikit-survival can be installed from source following these instructions.

What is GNOME OS?

With the release of GNOME 3.38.0, we started producing and distributing bootable VM images for debugging and testing features before they hit any distribution repository. We called the images GNOME OS. The name itself is not new, and what it stands for has not changed dramatically since it was introduced, but let’s restate its goals.
 
GNOME OS aims to better facilitate development of GNOME by providing a working system for development, design, and user testing purposes.
 
The main feature of GNOME OS is that we can produce a new system image for each commit made in any of our modules. The ability to have these VM images is truly amazing since we are dealing with hundreds of modules that depend and integrate with each other, and with the lower layers of the OS stack. This effort represents a game changer with regards to being able to automate the boot and session initialization, testing design and implementation changes, catching regressions earlier in the development cycle, and many other possibilities.
 
GNOME OS will also allow the engagement team to more easily create visual assets ahead of the release, present features and raise bug fixes with the free and open source software community at large, and make initiatives like the release video a lot simpler. Journalists will be able to get their hands on the new release of GNOME before the final release. 

What GNOME OS is not

Despite its name, GNOME OS should not be regarded as GNOME’s own platform or general purpose operating system. As much as I would personally like it to be, and have talked about it in the past, we have to recognize that it’s not, and for good reasons. In its current state, GNOME OS is still an incomplete reference system. It’s the closest approximation that we have to a GNOME platform OS, but we don’t actually have the resources that would be required to support a fully realized, general purpose OS for everyday use.
 
Maintaining an OS would, at the bare minimum, require keeping up with security fixes (CVEs), doing hardware enablement, and having some kind of user support story. Each one of these is a gigantic task requiring a dedicated team to do properly, and GNOME is not currently set up to be able to tackle them. Distributions put a lot of effort into building their platforms to maintain the changes coming from various upstreams and QAing them together.
 
Furthermore, many GNOME contributors are already working for companies that develop distributions (Canonical, Red Hat, SUSE, Endless, etc.) and it’s unclear how many would be willing or able to maintain another one.

Who is it for?

Firstly, it’s for designers, so they no longer have to suffer through countless hours of trying to build software themselves in order to test the latest development versions of some of our core modules (most notably GNOME Shell). Tightening that feedback loop is incredibly valuable for delivering a polished product. After that, it’s for the release team, so it can validate releases before slinging them out the door; for developers and translators, so they can have a complete system to test and debug their changes on; and for our downstream distributors and OS vendors, so that they can have a known-to-be-working baseline against which to compare their own products. Last but not least, it’s for the machines and robots that keep an eye out for regressions.
 
Enjoy GNOME OS!  

October 05, 2020

Rift CV1 update

This is another in my series of updates on developing positional tracking for the Oculus Rift CV1 in OpenHMD.

In the last post I ended with a TODO list. Since then I’ve crossed off a few things from that, and fixed a handful of very important bugs that were messing things up. I took last week off work, which gave me some extra hacking hours and enthusiasm too, and really helped push things forward.

Here’s the updated list:

  • The full model search for re-acquiring lock when we start, or when we lose tracking, takes a long time. More work is needed to avoid that expensive path as much as possible.
  • Multiple cameras interfere with each other.
    • Capturing frames from all cameras and analysing them happens on a single thread, and any delay in processing causes USB packets to be missed.
    • I plan to split this into 1 thread per camera doing capture and analysis of the ‘quick’ case with good tracking lock, and a 2nd thread that does the more expensive analysis when it’s needed. Partially Fixed
  • At the moment the full model search also happens on the video capture thread, stalling all video input for hundreds of milliseconds – by which time any fast motion means the devices are no longer where we expect them to be.
    • This means that by the next frame, it has often lost tracking again, requiring a new full search… making it late for the next frame, etc.
    • The latency of position observations after a full model search is not accounted for at all in the current fusion algorithm, leading to incorrect reporting. Partially Fixed
  • More validation is needed on the camera pose transformations. For the controllers, the results are definitely wrong – I suspect because the controller LED models are supplied (in the firmware) in a different orientation to the HMD and I used the HMD as the primary test. Much Improved
  • Need to take the position and orientation of the IMU within each device into account. This information is in the firmware information but ignored right now. Fixed
  • Filtering! This is a big ticket item. The quality of the tracking depends on many pieces – how well the pose of devices is extracted from the computer vision and how quickly, and then very much on how well the information from the device IMU is combined with those observations. I have read so many papers on this topic, and started work on a complex Kalman filter for it.
  • Improve the model to LED matching. I’ve done quite a bit of work on refining the model matching algorithm, and it works very well for the HMD. It struggles more with the controllers, where there are fewer LEDs and the 2 controllers are harder to disambiguate. I have some things to try out for improving that – using the IMU orientation information to disambiguate controllers, and using better models for what size/brightness we expect an LED to be for a given pose.
  • Initial calibration / setup. Rather than assuming the position of the headset when it is first sighted, I’d like to have a room calibration step and a calibration file that remembers the position of the cameras.
  • Detecting when cameras have been moved. When cameras observe the same device simultaneously (or nearly so), it should be possible to detect if cameras are giving inconsistent information and do some correction.
  • Hot-plug detection of cameras, and re-starting them when they go offline or encounter spurious USB protocol errors. The latter happens often enough to be annoying during testing.
  • Other things I can’t think of right now.

As you can see, a few of the top-level items have been fixed, or mostly so. I split the computer vision for the tracking into several threads:

  • 1 thread shared between all sensors to capture USB packets and assemble them into frames
  • 1 thread per sensor to analyse the frame and update poses

The goal with that initial split was to prevent the processing of multiple sensors from interfering with each other, but I found that it also has a strong benefit even with a single sensor. I realised something in the last week that I probably should have noted earlier: the Rift sensors capture a video frame every 19.2ms, but that frame then takes a full 17ms to deliver across the USB. This means that when everything was in one thread, even with 1 sensor there was only about 2.2ms for the full analysis to take place, or else we’d miss a packet of the next frame and have to throw it away. With the analysis now happening in a separate thread and a ping-pong double buffer in place, the analysis can take quite a bit longer without losing any video frames.
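As a toy illustration (hypothetical Rust, not OpenHMD’s C implementation), a bounded channel of depth one behaves much like that ping-pong double buffer: capture can fill the next frame while analysis still owns the previous one, and capture never stalls waiting for analysis.

use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // Depth-1 channel: one frame in flight, like a ping-pong double buffer.
    let (tx, rx) = sync_channel::<Vec<u8>>(1);

    let capture = thread::spawn(move || {
        for frame_no in 0..5u8 {
            let frame = vec![frame_no; 64]; // stand-in for USB packets assembled into a frame
            // If analysis is still busy with an older frame, drop this one
            // instead of stalling capture and missing USB packets.
            if tx.try_send(frame).is_err() {
                eprintln!("analysis behind, dropping frame {}", frame_no);
            }
        }
    });

    let analysis = thread::spawn(move || {
        while let Ok(frame) = rx.recv() {
            // fast-pass pose validation would happen here
            println!("analysing frame starting with byte {}", frame[0]);
        }
    });

    capture.join().unwrap();
    analysis.join().unwrap();
}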

I plan to add a 2nd per-sensor thread that will divide the analysis further. The current thread will do only fast pass validation of any existing tracking lock, and will defer any longer term analysis to the other thread. That means that if we have a good lock on the HMD, but can’t (for example) find one of the controllers, searching for the controller will be deferred and the fast pass thread will move onto the next frame and keep tracking lock on the headset.

I fixed some bugs in the calculations that move between frames of reference – converting to/from the global position and orientation in the world to the position and orientation relative to each camera sensor when predicting what the appearance of the LEDs should be. I also added in the IMU offset and orientation of the LED models from the firmware, to make the predictions more accurate when devices move in the time between camera exposures.

Yaw Correction: when a device is observed by a sensor, the orientation is incorporated into what the IMU is measuring. The IMU can sense gravity and knows which way is up or down, but not which way is forward. The observation from the camera now corrects for that yaw drift, to keep things pointing the way you expect them to.
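A toy sketch of the idea (illustrative only, with made-up names, not the OpenHMD code): gravity pins down pitch and roll, so only the heading needs nudging toward what the camera observed.

use std::f64::consts::PI;

// Represent the horizontal "forward" direction of the IMU estimate and of
// the camera observation as 2D vectors, and blend their headings.
fn yaw_correction(imu_fwd: (f64, f64), cam_fwd: (f64, f64), gain: f64) -> f64 {
    let imu_yaw = imu_fwd.1.atan2(imu_fwd.0);
    let cam_yaw = cam_fwd.1.atan2(cam_fwd.0);
    let mut err = cam_yaw - imu_yaw;
    // Wrap the error into (-PI, PI] so we always rotate the short way round.
    while err > PI { err -= 2.0 * PI; }
    while err <= -PI { err += 2.0 * PI; }
    imu_yaw + gain * err // corrected heading
}

fn main() {
    // The IMU thinks we face +x, the camera says +y; apply 20% of the error.
    let corrected = yaw_correction((1.0, 0.0), (0.0, 1.0), 0.2);
    println!("corrected heading: {:.3} rad", corrected);
}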

Some other bits:

  • Fixing numerical overflow issues in the OpenHMD maths routines
  • Capturing the IMU orientation and prediction that most closely corresponds to the moment each camera image is recorded, instead of when the camera image finishes transferring to the PC (which is 17ms later)
  • Improving the annotated debug view, to help understand what’s happening in the tracking computer vision steps
  • A 1st order estimate of device velocity to help improve the next predicted position

I posted a longer form video walkthrough of the current code in action, and discussing some of the remaining shortcomings.

As previously, the code is available at https://github.com/thaytan/OpenHMD/tree/rift-correspondence-search

Librsvg is accepting interns for Outreachy's December 2020 round

There are two projects in librsvg available for Outreachy applicants in the December 2020 / March 2021 round:

  • Revamp the text engine - Do you know about international text layout? Can you read a right-to-left language, or do you write in a language that requires complex shaping? Would you like to implement the SVG 2 text specification in a pleasant Rust code base? This project requires someone who can write Rust comfortably; it will require reading and refactoring some existing code. You don't need to be an expert in exotic lifetimes and trait bounds and such; the code doesn't use them.

  • Implement SVG2/CSS3 features - Are you excited by all the SVG2 features in Inkscape, and would like to add support for them in librsvg? Would you like to do small changes to many parts of the code to implement small features, one at a time? Do you like test-driven development? This project requires someone who can write Rust code at a medium level; you'll learn a lot by cutting&pasting from existing code and refactoring things to implement SVG2 features.

Important: Outreachy's December 2020 / March 2021 round is available only for students in the Southern hemisphere. People in the Northern hemisphere can wait until the 2021 mid-year round.

You can see GNOME's projects in Outreachy for this round. The deadline for initial contributions and project applications is October 31, 2020 at 16:00 UTC.

Tracker 3.0: What’s New?

This is part 2 of a series. Part 1 is here. Come back next week to find out more about the history and future of Tracker.

It was only a single line in the release notes. There weren’t any new graphics to show in the video. We leave cool UIs to others.

So what do we have to show, after a year of focused effort and a series of disruptive changes to Tracker and GNOME?

A complete redesign

Where earlier efforts failed, Tracker has made full-text search a first-class feature in GNOME.

However, the shortcomings of the 2000s-era design have been clear for a while.

Back in 2005, around the time your grandparents first met each other on Myspace, it seemed a great idea to aggregate all the metadata we could find into a single database. The old tracker-store database from Tracker 2.x includes the search index created by Tracker Miner FS right next to user data stored by apps like Notes, Photos and Contacts. This was going to allow cool features like tagging people in your photo collection with their phone number and online status (before the surveillance-advertising industry showed how creepy that actually is). Ivan Frade’s decade-old talk “Semantic social desktop & mobile devices” is a great insight into the thinking of the time.

I’m going to dig into Tracker’s origins in a future article, but for now — note that “Security on graphs” is listed in Ivan’s presentation as a “to-do” item.

Security

In a world of untrusted Flatpak apps, “to-do” isn’t good enough. Any app that uses the system search service, indeed any app that stores RDF data with Tracker, requests the Flatpak D-Bus permission for org.freedesktop.Tracker1. This gives access to the entire tracker-store database, right down to the search terms indexed from the ‘Documents’ folder. Imagine your documents as a savoury snack, stored in a big monolithic building. You accidentally install and run a malicious app, represented here as a hungry seagull…

To solve this, we had to make access control more granular. It didn’t make sense to retrofit this to tracker-store. During 2019, Carlos casually eliminated the monolithic tracker-store altogether and in its place implemented a desktop-wide distributed database, taking Tracker from a “public-by-default” model to a “private-by-default” one.

The new libtracker-sparql-3 API lets apps store SPARQL data anywhere they like. You can keep it private, if you just want a lightweight database. Nautilus and Notes are already doing this, to store starred files and note data respectively.

If and when you want to publish data, it’s done by creating a TrackerEndpoint on DBus. Using a SPARQL federated query, one Tracker SPARQL database can pull data from multiple others in a single query. This, for example, allows Photos to merge photo metadata from the search index with album metadata stored in its own database. (I wrote more about this back in March).

The search index created by Tracker Miner FS is published at org.freedesktop.Tracker3.Miner.Files, but we don’t let Flatpak apps access this directly. A new Flatpak portal gates access to search based on content type. You can now install a music player app and let it search ONLY your music collection, where previously your options were “let the app search everything” or “break it”.

A clearer architecture

“I’m finally starting to ‘get’ tracker 3. And it’s like an epiphany.”

Antonio

If someone asked “What actually is Tracker?” I used to find it tricky to answer. We narrowed the focus down to two things: a lightweight database, and a search engine.

For the last 3 years we worked on separating these two concerns, and as of Tracker 3.0 we are done. Were we starting from scratch, we could find clearer names for the two parts than ‘tracker’ and ‘tracker-miners’, but we kept the repo and package names the same to avoid making the 2.x to 3.x transition harder for distributors.

The name “Tracker” refers to the overall project. You can use “GNOME Tracker” for clarity where needed. The project maintains two code repositories:

  • Tracker SPARQL: a distributed database, provided as a GObject C library and implementing the full SPARQL 1.1 query standard.
  • Tracker Miners: a content indexer for the desktop, providing the Tracker Miner FS system service and its companion Tracker Extract.

The tracker3 commandline tool can operate on any Tracker SPARQL database, and it has some extensions for searching and managing the Tracker Miner FS indexer.

Decentralisation

The headline feature is that there’s one less reason to claim “Flatpak’s sandbox is a lie!”. But decentralisation brings more benefits too:

  • You can backup app data by running tracker3 export on the app’s SPARQL database. Useful for Notes, Photos and more.
  • Apps can bundle Tracker Miners inside Flatpak, allowing them to run on platforms that don’t ship a suitable version of Tracker Miners in the base OS.
  • Apps are no longer limited to the Nepomuk data model when storing data. Tracker Miner FS still uses the Nepomuk ontologies, but apps can write their own. Distributed queries work even across different data models.
  • Tracker’s test suite now sets up a private database for testing using public API, avoiding some hideous hacks.
  • Apps’ test suites can also set up private databases and even a private instance of the indexer. GNOME’s search and content apps have rather low test coverage at present. I suspect this is partly because the old design of Tracker made it hard to write good tests.
  • A distributed database is fundamentally a cool thing that you definitely need.

Stability

A system service, like a Victorian child, should be “seen and not heard”. Nobody wants the indexer to drain the battery, burn out the fan or lock up the desktop.

We prioritize any issue which reports the Tracker daemons have been behaving badly. In collaboration with many helpful bug reporters, we removed two codepaths in 3.0 that could trigger high CPU usage. One major change is we no longer index all plain text files, only those with an allowed extension. If you unpack Linux kernel tarballs in your Music folder, this is for you! (Remember Tracker isn’t designed to index source code). We also dropped a buggy and pointless codepath that tried and mostly failed to extract metadata from random image/* type files using GStreamer’s Discoverer API.

Tracker Extract is designed for robustness but it also needs to report errors. If extraction of foo.flac fails it may indicate a bug in Tracker, or in GStreamer, or libflac, or (more likely) the file is corrupt or mis-labelled. In Tracker Miners 3 we have improved how extraction errors are reported — instead of using the journal, we log errors to disk (at ~/.cache/tracker3/files/errors/). This prevents any ‘spamming’ of the journal when many errors are detected. You can check for errors by running tracker3 status. Perhaps Nautilus could make these errors visible in future too.

Since Tracker Miners 3.0.0 was released, distro beta testers found two issues that could cause high CPU usage. These are fixed in the 3.0.1 release. If you see any issues with Tracker Miners 3.0.1, please report them on GitLab!

Here I also want to mention Benjamin’s excellent work to improve resource management for system services. Tracker Miner FS tries to avoid heavy resource use, but filesystem IO is infinitely complicated and we cannot defend against every possible situation. Strange filesystems or bugs in dependencies can cause high CPU or IO consumption. If the kernel’s scheduler is not smart, it may focus on these tasks at the expense of the important shell and app processes, leaving the desktop effectively locked. Benjamin’s work lets the kernel know to prioritize a responsive desktop above system services like Tracker Miner FS. Mac OS X has been able to do this since 2013.

Whatever this decade brings, it should be free from desktop lockups!

Standards

The 2.x to 3.x transition was difficult partly because Tracker was missing some big pieces of the SPARQL standard. Implementing a 3.x-to-2.x translation layer was out of the question — we had no motivation to re-implement the quirks of 2.x just so apps could avoid porting to 3.x.

I don’t see another major version break in Tracker’s near future, but we are now prepared. Tracker implements almost all of the SPARQL 1.1 standard.

SPARQL is not without its drawbacks — more on that in a future article — but aside from a few simple C and DBus interfaces, all of Tracker’s functionality is accessible through this W3C standard query language. Better to reuse standards than to make our own.

…and more

We have a new website and improved documentation. The tracker3 commandline tool saw loads of cleanups and improvements. Files are automatically re-processed when the relevant tracker-extract module changes — a ten-year-old feature request. Debugging is nicer, as a keyword-enabled TRACKER_DEBUG variable replaces the old TRACKER_VERBOSITY. Deprecated APIs and dependencies are gone, including the venerable intltool. The core and Nepomuk ontologies are slimmed down and better organised. We measure test coverage, and test coverage is higher than ever. We enabled Coverity static analysis too, which has found some obscure bugs. I’m no doubt forgetting some things.

A few of these changes impact everyone, but mostly the improvements benefit power users, app developers, and ourselves as maintainers. It’s crucial that a volunteer-driven project like Tracker is easy and fun to maintain, otherwise it can only fail. I think we have paved the way for a bright future.

Come back next week to find out more about the background and future of Tracker.

October 02, 2020

Integrating codespell into your CI

If you have never heard of codespell, it's a command line utility to check for common misspellings with the possibility to add your own dictionaries. I have looked lately at integrating it as part of the CI for some of my projects. 

Gitlab CI

Currently codespell doesn't provide a Docker image that could be used for CI. You can create your own or use a pretty simple image I have set up and pushed to hub.docker.com.

The Docker image is as simple as

FROM python:3.8

RUN pip3 install codespell

Once you have an image up and ready, you can add it to your .gitlab-ci.yml

codespell:
  image: "docker.io/bilelmoussaoui/codespell"
  script:
    - codespell -S "*.png,*.po,.git,*.jpg" -f

The -S argument takes a comma-separated list of glob patterns, directories or files to ignore. The -f argument makes codespell check the file names as well.

You can find out more about the possible options by running codespell --help

Github Actions

Codespell provides a Github Action so integrating it as a workflow should be straightforward 

on:
  push:
    branches: [master]
  pull_request:

name: Spell Check

jobs:
  codespell:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: codespell-project/actions-codespell@master
        with:
          check_filenames: true

See the list of arguments you can pass to the action.

By default codespell uses two dictionaries, "clear" and "rare". You can tweak that list to use something else by passing, for example, --builtin "clear,usage,code".

October 01, 2020

GNOME Internet Radio Locator 3.4.0 with C-SPAN for Fedora Core 32

GNOME Internet Radio Locator 3.4.0 features updated language translations and a new, improved map marker palette. In addition to C-SPAN from the United States Supreme Court, Congress and Senate, it now also includes streaming radio from Washington, United States of America (WAMU/NPR); London, United Kingdom (BBC World Service); Berlin, Germany (Radio Eins); Norway (NRK); and Paris, France (France Inter/Info/Culture), as well as 119 other radio stations from around the world, with live audio streaming implemented through GStreamer. The project lives on www.gnomeradio.org, and Fedora 32 RPM packages for version 3.4.0 of GNOME Internet Radio Locator are now available:

gnome-internet-radio-locator.spec

gnome-internet-radio-locator-3.4.0-1.fc32.src.rpm

gnome-internet-radio-locator-3.4.0-1.fc32.x86_64.rpm

To install GNOME Internet Radio Locator 3.4.0 on Fedora Core 32 in Terminal:

sudo dnf install http://www.gnomeradio.org/~ole/fedora/RPMS/x86_64/gnome-internet-radio-locator-3.4.0-1.fc32.x86_64.rpm

Closing the gap (in flexbox 😇)

Flexbox had a lot of early problems, but by mid-May 2020, where our story begins, both Firefox and Chromium had done a lot of work on improving it. WebKit, however, hadn’t caught up. Prioritizing the incredible amount of work a web engine requires is difficult. The WebKit implementation was still passable for very many (most) uses of the core features, and it didn’t have problems that caused crashes or otherwise urgently demanded attention, so engineers dedicated their limited time to other things. The net result, however, was that as this choice was repeated many times, the comparative state of WebKit’s flexbox implementation fell behind pretty significantly.
Web Platform Tests (WPT) is a huge ongoing effort from many people to come up with a very extensive list of tests that help both spec editors and implementors make sure we have great compatibility. In the case of flexbox, for example, there are currently 773 tests (2926 subtests), and WebKit was failing a good number of them. This matters a lot because there are things that flexbox is ideal for, and it is exceptionally widely used. In mid-May, Igalia was contracted to improve things here, and in this post I’ll explain and illustrate how we did that.

The Challenge

The main issues were (in no particular order):
  • min-width:auto and min-height:auto handling
  • Nested flexboxes in column flows
  • Flexboxes inside tables and vice versa
  • Percentages in heights with indefinite sizes
  • WebKit CI not running many WPT flexbox tests
  • and of course… lack of gap support in Flexbox
Modifying flexbox layout code is a challenge in itself. Tiny modifications in the source code can cause huge differences in the final layout. You might even have a patch that passes all the tests and still regresses multiple popular web sites.
The good news is that we were able to tackle most of those issues. Let’s review what changes you could eventually expect from future releases of Safari (note that Apple doesn’t disclose information about future products and/or releases) and the other WebKit-based browsers (like GNOME Web).

Flexbox gaps 🥳🎉

Probably one of the features in WebKit most awaited by web developers. It’s finally here, after Firefox and Chrome landed it not so long ago. The implementation was initially inspired by the one in Chrome, but it diverged a bit in the final version of the patch. The important thing is that the behaviour should be the same; at least, all the WPT tests related to gaps are passing now in WebKit trunk.
<div style="display: flex; flex-wrap: wrap; gap: 1ch">
  <div style="background: magenta; color: white">Lorem</div>
  <div style="background: green; color: white">ipsum</div>
  <div style="background: orange; color: white">dolor</div>
  <div style="background: blue; color: white">sit</div>
  <div style="background: brown; color: white">amet</div>
</div>

Tables as flex items

Tables should obey the flex container sizing whenever they are flex items. As can be seen in the example below, the tables’ layout code was kicking in and ignoring the constraints set by the flex container. Tables should do what the flex algorithm mandates, and thus they should allow being stretched/squeezed as required.
<div style="display:flex; width:100px; background:red;">
  <div style="display:table; width:10px; max-width:10px; height:100px; background:green;">
    <div style="width:100px; height:10px; background:green;"></div>
  </div>
</div>

Tables with items exceeding the 100% of available size

This is the case of tables placed inside flex items. The automatic table layout algorithm was generating tables with unlimited widths when the sum of the sizes of their columns (expressed in percentages) exceeded 100%. It was impossible to simultaneously fulfill the constraints set by the table and flexbox algorithms.
<div style="display:flex; width:100px; height:100px; align-items:flex-start; background:green;">
  <div style="flex-grow:1; flex-shrink:0;">
    <table style="height:50px; background:green;" cellpadding="0" cellspacing="0">
      <tr>
        <td style="width:100%; background:green;"> </td>
        <td style="background:green;"> </td>
      </tr>
    </table>
  </div>
</div>

Note how, before the fix, the table grew indefinitely to the right (I cropped the “Before” picture to fit in the post).

Alignment in single-line flexboxes

An interesting case. The code considered single-line flexboxes to be those where all the flex items were placed in a single line after computing the required space for them. Though sensible, that’s not what a single-line flexbox is: it’s a flex container with flex-wrap:nowrap. This means that a flex container with flex-wrap:wrap whose children need no more than one flex line is still not a single-line flex container from the spec’s point of view (corollary: implementing specs is hard).
<div style="display: flex; flex-wrap: wrap; align-content: flex-end; width: 425px; height: 70px; border: 2px solid black">
  <div style="height: 20px">This text should be at the bottom of its container</div>
</div>

Percentages in flex items with indefinite sizes

One of the trickiest ones. Although it didn’t involve a lot of code, it caused two serious regressions (in YouTube’s upload form and when viewing Twitter videos in fullscreen) which required some preliminary fixes and delayed the landing of this patch a bit. Note that this behaviour was quite contentious from a pure specification point of view, as it changed many times over the years; defining good behaviour here is really complicated. Without going into too much detail, flexbox has a couple of cases where sizes are considered definite even though they are theoretically indefinite. In this case, if the flex container’s main size is definite, then the post-flexing size of the flex items is also treated as definite.
<div style="display: flex; flex-direction: column; height: 150px; width: 150px; border: 2px solid black;">
  <div>
    <div style="height: 50%; overflow: hidden;">
      <div style="width: 50px; height: 50px; background: green;"></div>
    </div>
  </div>
  <div style="flex: none; width: 50px; height: 50px; background: green;"></div>
</div>

Hit testing with overlapping flex items

There were some issues with pointer events passing through overlapping flex items (due to negative margins, for example). This was fixed by making the hit-testing code walk the order-modified document order in reverse (the opposite of painting order), instead of using the raw order from the DOM.
<div style="display:flex; border: 1px solid black; width: 300px;">
  <a style="width: 200px;" href="#">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua</a>
  <div style="margin-left: -200px; width: 130px; height: 50px; background: orange;"></div>
</div>

In the “Before” case, hit testing was bypassing the orange block, and thus the cursor was showing a hand because it detected that it was hovering a link. After the fix, the cursor is properly rendered as an arrow because the orange block covers the link underneath.

Computing percentages with scrollbars

In this case the issue was that, when computing percentages in heights, we were incorrectly including the size of the scrollbars too.
<div style="display: inline-flex; height: 10em;">
  <div style="overflow-x: scroll;">
    <div style="width: 200px; height: 100%; background: green"></div>
  </div>
</div>

Note that in the “After” picture the horizontal scrollbar background is visible while in the “Before” the wrong height computation made the flex item overlap the scrollbar.

Image items with specific sizes

The flex layout algorithm needs the intrinsic sizes of the flex items to compute their sizes and the size of the flex container. Changes to those intrinsic sizes should trigger new layouts, and the code was not doing that.
<!-- Just to showcase how the img below is not properly sized -->
<div style="position: absolute; background-color: red; width: 50px; height: 50px; z-index: -1;"></div>
 
<div style="display: flex; flex-direction: column; width: 100px; height: 5px;">
  <img style="width: 100px; height: 100px;" src="https://wpt.live/css/css-flexbox/support/100x100-green.png">
</div>


Nested flexboxes with ‘min-height: auto’

Another tricky one, again related to the handling of nested column flexboxes: the problem was simply that we were not supporting this case. For those wanting a deeper understanding of the issue, this bug was about implementing section 4.5 of the spec. It was one of the more complicated ones to fix; Edward Lorenz would love that part of the layout code, as the slightest change in one of those source lines can trigger huge changes in the final rendering.
<div style='display:flex; flex-direction: column; overflow-y: scroll; width: 250px; height: 250px; border: 1px solid black'>
  <div style='display:flex;'>
    <div style="width: 100px; background: blue"></div>
    <div style='width: 120px; background: orange'></div>
    <div style='width: 10px; background: yellow; height: 300px'></div>
  </div>
</div>

As can be seen, in the “Before” picture the blue and orange blocks are sized differently from the yellow one. That’s fixed in the “After” picture.

Percentages in quirks mode

Another one affecting how percentages are computed in heights, but this one specific to quirks mode. We now match Firefox, Chrome and pre-Chromium Edge; i.e., flexbox should not care much about quirks mode, since it was invented many years after quirky browsers dominated the earth.
<!DOCTYPE html PUBLIC>
<div style="width: 100px; height: 50px;">
  <div style="display: flex; flex-direction: column; outline: 2px solid blue;">
    <div style="flex: 0 0 50%"></div>
  </div>
</div>

Percentages in ‘flex-basis’

Percentages generally worked fine inside flex-basis; however, there was one particularly problematic case. It arose whenever that percentage referred to (surprise!) an indefinite height. And again, we’re talking about nested flexboxes with column flows. Indeed, definite/indefinite sizes are one of the toughest things to get right from the layout point of view. In this particular case, the fix was to ignore the percentages and treat them as height: auto.
<div style="display: flex; flex-direction: column; width: 200px;">
  <div style="flex-basis: 0%; height: 100px; background: red;">
    <div style="background: lime">Here's some text.</div>
  </div>
</div>

Flex containers inside STF (shrink-to-fit) tables

Fixing a couple of test cases submitted by an anonymous Opera employee 8(!) years ago. This is another case of competing layout contexts trying to do things their own way.
<div style="display: table; background:red">
   <div style="display: flex; width: 0px">
      <p style="margin: 1em 1em;width: 50px">Text</p>
      <p style="margin: 1em 1em;width: 50px">Text</p>
      <p style="margin: 1em 1em;width: 50px">Text</p>
   </div>
</div>

After the fix the table is properly sized to 0px width and thus no red is seen.

Conclusions

These examples are just some interesting ones I’ve chosen to highlight. In the end, almost 50 new flexbox tests are passing in WebKit that weren’t back in May! I wouldn’t like to forget the great job done by my colleague Carlos Lopez, who imported tons of WPT flexbox tests into the WebKit source tree. He also performed awesome triage work, which made my life a lot easier.
Investing in interoperability is a huge deal for the web. It’s good for everyone, from spec authors to final users, including browser vendors, downstream ports and web authors. So if you care about the web, or your business orbits around web technologies, you should definitely promote and invest in interoperability.

Implementing standards and fixing bugs in web engines is the kind of work we happily do at Igalia on a daily basis. We are the second largest contributor to both WebKit and Chrome/Blink, so if you have an annoying bug in a particular web engine (Gecko and Servo as well) that you want fixed, don’t hesitate to contact us; we’d be glad to help. Also, should you want to be part of a worker-owned cooperative with an assembly-based decision-making mechanism and a strong focus on free software technologies, join us!

Acknowledgements

Many thanks to WebKit reviewers from Apple and Igalia, such as Darin Adler, Manuel Rego, Javier Fernández and Daniel Bates, who made the process really easy for me, always providing very nice feedback on the patches I submitted.
I’m also really thankful to Googlers like Christian Biesinger, David Grogan and Stephen McGruer who worked on the very same things in Blink and/or provided very nice guidance and support when porting patches.

September 30, 2020

GtkSourceView gets a JIT

I just merged a new regex implementation for GtkSourceView’s language specifications. Previously it used GRegex (based on the old PCRE), and now it uses PCRE2 directly, similar to what VTE did.

Not only does this get us on a more modern PCRE implementation, but it also allows us to use new features such as a JIT.

JITs are interesting in that you can trade a little bit of memory and time to generate executable code upfront for huge gains in execution time. Given that each regex in a language specification is compiled once but executed many, many times, it’s a worthwhile feature for GtkSourceView.

The minified HTML of google.com/ won’t even highlight with GRegex (due to timeouts). But with PCRE2 and the JIT, it can get by.

In many cases, I found that compiling a regex with the JIT costs about 4x as much as compiling it with plain PCRE2, while execution times are reduced by about 4x when using the JIT (and sometimes by many times more than that). When you run these regexes millions of times across an edited file, it can really cut down on the amount of energy consumed, as well as time taken away from things like rendering GTK’s scene graph.

I should note that the roughly 4x improvement is on a per-regex basis, so when you run potentially thousands of those in one main loop cycle, the overall improvement in what you can do can be much more drastic.
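To illustrate the compile-once, execute-many trade-off outside of GtkSourceView (this sketch uses the pcre2 Rust crate, not the C API the port actually uses):

use pcre2::bytes::RegexBuilder;

fn main() -> Result<(), pcre2::Error> {
    // Pay the extra JIT compilation cost once...
    let re = RegexBuilder::new()
        .jit_if_available(true) // quietly skip the JIT where PCRE2 lacks it
        .build(r"\b[A-Za-z_][A-Za-z0-9_]*\s*\(")?; // a function-call-ish pattern

    // ...then reap the faster matching on every execution.
    for line in &["gtk_init (", "int main(void)", "no call here"] {
        println!("{}: {}", line, re.is_match(line.as_bytes())?);
    }
    Ok(())
}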

If you have any issues with language specifications please let us know! It’s a very large change, so I wouldn’t be surprised if there is some fallout.

Regex Creation

Language        Min (msec)  Max (msec)  Average (msec)  # of Calls  Notes
C               .001        .104        .007            91          gtktreeview.c
C (with JIT)    .004        .383        .031            91
Difference      .003        .279        .024
CSS             .001        .711        .022            197         gtk-contained.css
CSS (with JIT)  .004        3.147       .101            198
Difference      .003        2.436       .079

Regex Execution (Loading File)

Language        Min (msec)  Max (msec)  Average (msec)  # of Calls  Notes
C               .000        .196        .003            17698       gtktreeview.c
C (with JIT)    .000        .061        .001            17698
Difference      -.000       -.135       -.002                       ~35 ms
CSS             .000        .211        .022            84347       gtk-contained.css
CSS (with JIT)  .000        .061        .001            74812
Difference      -.000       -.135       -.002                       ~150 ms

GTK 3.99.2

The GTK 3.99.2 release continues the topics from 3.99.1: API cleanup, new and polished demos, better documentation. You can see the details here.

One small note on the topic of documentation is that we are relying on some unreleased gtk-doc features. Therefore, we now include gtk-doc as a subproject in the gtk release tarball. If you are a distributor, don’t be surprised that building GTK installs gtk-doc tools now.

The big news in this snapshot is our work on exposing the power of the new GL-based rendering stack a bit more.

Warmup: Shadertoy

gtk4-demo includes a Shadertoy demo now.

The demo uses a GtkGLArea widget to run GLSL snippets that are compatible with the ones found on shadertoy.com. Many of the examples found there will work if you paste them into the demo’s editor.

This is fun, but somewhat limited. The GLSL is confined to its ‘sandbox’, the GtkGLArea widget, which uses the GL API to compile and use the shaders.

Shaders as first-class objects

This is not our first attempt to make a shadertoy lookalike. When we first looked at it, we thought that we would make a shader abstraction that applications could use. We put it to the side when it turned out that making it work across different renderers and backends would require us to write our own shader compiler—too much work.

But after our shadertoy success, we revisited the idea of shaders as first-class objects, with more modest goals: We use GLSL, and don’t attempt to make the shaders work with anything but the OpenGL renderer.

In 3.99.2, we now have GskGLShader, a first-class object for GLSL shaders, together with render node and snapshot API to put such shaders to use.

With these pieces in place, we made a demo that shows various uses of shaders. It is maybe a bit overloaded, and some of the effects are a bit over-the-top, but it gets the point across: you can use shaders in your widgets.

 

What we haven’t done yet is adding widgets that have shader support built-in. The demo showcases a few likely candidates:

A shader paintable. As you may recall, GdkPaintable is a very flexible interface for anything that can ‘paint’. Shaders certainly qualify. The GskShaderPaintable in gtk-demo uses a shader without input textures to just produce pixels, and we add it to a GtkPicture widget to make it appear in the widget tree.

A shader bin. This is a very simple container that can use shaders to draw effects on top of a child widget. It works with shaders that take a single input texture (for the child widget).

A shader stack. This is a stack-like container that shows one of many child widgets, and uses a shader for the transition whenever the visible child changes. It works with shaders that expect two input textures (for the old and new active child).

Thankfully, making custom widgets is a lot easier in GTK 4 than it used to be, so the render node API should be enough to get you started on some fun experiments. You can of course take the gtk4-demo code as a starting point.

You can debug it

Apart from widgets, the shader support is fully integrated. The GTK inspector can handle shader nodes like any other render nodes, you can serialize them and e.g. load the resulting file in gtk4-node-editor:

If you need to see the input that GTK sends to the shader compiler, setting the environment variable

GDK_DEBUG=shaders

can be helpful.

What’s next?

After this GL adventure we’ll now focus on landing more of the new accessibility infrastructure.

Friends of GNOME Update September 2020

Welcome to the September 2020 edition of Friends of GNOME Update!

A red maple leaf on a tree stump
“fall leaf” by JustyCinMD is licensed under CC BY 2.0

GNOME 3.38 Orbis is out!

We released GNOME 3.38 Orbis! The release, of course, includes an amazing release video we highly recommend checking out. Release notes are available online.

GNOME on the Road

Several Foundation staff presented at GNOME Africa Onboard Virtual. Kristi Progri helped kick off the event with Foundation vice-president Regina Nkemchor Adejo. M de Blanc and Rosanna Yuen talked about the GNOME code of conduct. Melissa Wu reprised her session on What it’s Like to Be New to GNOME.

Rosanna will also be presenting at All Things Open. On October 20 at 3:30pm ET, you can catch “GNOME Foundation Then and Now — 20 years of bringing free software to the desktop.”

Community Education Challenge

Exciting things are happening with the Community Education Challenge! While the phase one winners work on their projects, our fabulous judges have been hosting office hours to discuss the Challenge. To keep up with office hours and other Challenge news, sign up for the email list.

GNOME.Asia

We’ve been working with the local team on GNOME.Asia. In addition to other developments, the Call for Proposals is open. You can submit to the CfP until 18 October 2020. GNOME.Asia 2020 will be taking place online.

Grants Strategy

We want to help you fund your GNOME projects! While the Foundation is not giving out grants, we are helping with grant applications for specific parts of the project. If you have any ideas, please add them to the wiki.

Annual Report

Each year the GNOME Foundation produces an annual report. This report covers Foundation and community activities over the past year. This year’s report is now underway.

Thank you!

As always, thank you for supporting GNOME, the Foundation, and the community!

September 29, 2020

Porting EBU R128 audio loudness analysis from C to Rust – Porting Details

This blog post is part two of a four part series

  1. Overview, summary and motivation
  2. Porting approach with various details, examples and problems I ran into along the way
  3. Performance optimizations
  4. Building Rust code into a C library as drop-in replacement

In this part I’ll go through the actual process of porting the libebur128 C code to Rust: the approach I chose, with various examples, and a few problems I ran into along the way.

It will be rather technical. I won’t explain details about how the C code works but will only focus on the aspects that are relevant for porting to Rust, otherwise this blog post would become even longer than it already is.

Porting

With the warnings out of the way, let’s get started. As a reminder, the code can be found on GitHub, and you can also follow the actual chronological porting process by going through the git history there. It’s not very different from what follows here, but it’s available in case you prefer looking at diffs instead.

Approach

The approach I’ve taken is basically the same one that Federico took for librsvg and Joe Neeman took for nnnoiseless:

  1. Start with the C code and safe Rust bindings around the C API
  2. Look for a function or component very low in the call graph without dependencies on (too much) other C code
  3. Rewrite that code in Rust and add an internal C API for it
  4. Call the new internal C API for that new Rust code from the C code and get rid of the C implementation of the component
  5. Make sure the tests are still passing
  6. Go to 2. and repeat

Compared to what I did when porting the FFmpeg loudness normalization filter, this has the advantage that at every step there is a working version of the code, and you don’t discover only at the very end that you made a mistake somewhere along the way. At each step you can validate that what you did was correct, and the amount of code to debug if something goes wrong stays limited.

Thanks to Rust having a good FFI story for interoperability with C in either direction, writing the parts of the code that are called from C or calling into C is not that much of a headache and not worse than actually writing C.
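As a miniature of steps 3 and 4 (illustrative only, not the actual port, though the formula matches libebur128’s ebur128_energy_to_loudness()), a small helper rewritten in Rust can be exported with C linkage so the remaining C code calls straight into Rust:

// Step 3: rewrite the component in Rust and give it an internal C API.
#[no_mangle]
pub extern "C" fn energy_to_loudness(energy: f64) -> f64 {
    10.0 * energy.log10() - 0.691
}

// Step 4: the C code drops its own implementation and simply declares
//
//   double energy_to_loudness(double energy);
//
// after which the C implementation of the component can be deleted.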

Rust Bindings around C Library

This step could’ve been skipped if all I cared about was having a C API for the ported code later, or if I wanted to work with the tests of the C library for validation and worry about calling it from Rust at a later point. In this case I had already done safe Rust bindings around the C library before, and having a Rust API made it much easier to write tests that could be used during the porting and that could be automatically run at each step.

bindgen

As a first step for creating the Rust bindings, there needs to be a way to actually call into the C code. In C there are the header files with the type definitions and function declarations, but Rust can’t directly work from those. The solution in this case was bindgen, which basically converts the C header files into something that Rust can understand. The resulting API is still completely unsafe, but it can be used in a next step to write safe Rust bindings around it.

I would recommend using bindgen for any non-trivial C API for which there is no better translation tool available, or for which there is no machine-readable description of the API that could be used instead by another tool. Parsing C headers is no fun and there is very little information available in C for generating safe bindings. For example for GObject-based libraries, using gir would be a better idea as it works from a rich XML description of the API that contains information about e.g. ownership transfer and allows to autogenerate safe Rust bindings in many cases.

Also the dependency on clang makes it hard to run bindgen as part of every build, so instead I’ve made sure that the code generated by bindgen is platform independent and included it inside the repository. If you use bindgen, please try to do the same. Requiring clang for building your crate makes everything more complicated for your users, especially if they’re unfortunate enough to use Windows.
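As a sketch of how that generation step can look (hypothetical paths; per the above, this runs manually whenever the header changes rather than from build.rs, and the output is committed):

// A tiny generator binary using the bindgen crate; src/ffi.rs is vendored.
fn main() {
    let bindings = bindgen::Builder::default()
        .header("src/c/ebur128.h")
        .generate()
        .expect("failed to generate bindings");

    bindings
        .write_to_file("src/ffi.rs")
        .expect("failed to write bindings");
}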

But back to the topic. What bindgen generates is basically a translation of the C header into Rust: type definitions and function declarations. This looks for example as follows

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct ebur128_state {
    pub mode: ::std::os::raw::c_int,
    pub channels: ::std::os::raw::c_uint,
    pub samplerate: ::std::os::raw::c_ulong,
    pub d: *mut ebur128_state_internal,
}

extern "C" {
    pub fn ebur128_init(
        channels: ::std::os::raw::c_uint,
        samplerate: ::std::os::raw::c_ulong,
        mode: ::std::os::raw::c_int,
    ) -> *mut ebur128_state;

    pub fn ebur128_destroy(st: *mut *mut ebur128_state);

    pub fn ebur128_add_frames_int(
        st: *mut ebur128_state,
        src: *const ::std::os::raw::c_int,
        frames: usize,
    ) -> ::std::os::raw::c_int;
}

Based on this it is possible to call the C functions directly from unsafe Rust code and access the members of all the structs. It requires working with raw pointers and ensuring that everything is done correctly at any point to not cause memory corruption or worse. It’s just like using the API from C with a slightly different syntax.
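
As a rough illustration, and not code from the crate (the mode value below is a placeholder), using the raw bindings directly looks like this

unsafe {
    // Create a state for 2 channels at 48kHz; a NULL check is mandatory.
    let mut state = ffi::ebur128_init(2, 48_000, 0 /* mode flags */);
    assert!(!state.is_null());

    // ... call other ffi functions with the raw pointer ...

    // The destroy function takes a pointer to the pointer and sets it to NULL.
    ffi::ebur128_destroy(&mut state);
}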

Build System

To be able to call into the C API, its implementation somehow has to be linked into your crate. As the C code later also has to be modified to call into the already-ported Rust functions instead of the original C code, it makes the most sense to build it as part of the crate instead of linking to an external version of it.

This can be done with the cc crate. It is called from cargo's build.rs, which configures, for example, which C files to compile and how. Once that is done, any exported C function can be called from the Rust code. The build.rs is not really complicated in this case

fn main() {
    cc::Build::new()
        .file("src/c/ebur128.c")
        .compile("ebur128");
}

Safe Rust API

With all that in place, a safe Rust API around the unsafe C functions can now be written. How this looks in practice differs from API to API, and a more complex API might require some more thought to ensure everything is still safe and sound from a Rust point of view. In this case it was fortunately rather simple.

For example, the struct definition, the constructor and the destructor (Drop impl) look as follows, based on what bindgen generated above

pub struct EbuR128(ptr::NonNull<ffi::ebur128_state>);

The struct is a simple wrapper around std::ptr::NonNull, which itself is a zero-cost wrapper around raw pointers that additionally ensures that the stored pointer is never NULL and allows additional optimizations to take place based on that.

In other words: the Rust struct is just a raw pointer but with additional safety guarantees.
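
One consequence of this, documented for NonNull in the standard library, is that Option<NonNull<T>> has the same size as a plain raw pointer, so the wrapper really is zero-cost

use std::{mem, ptr};

// NonNull's niche means None can be represented by the NULL pointer value,
// so no extra space is needed for the discriminant.
assert_eq!(
    mem::size_of::<Option<ptr::NonNull<ffi::ebur128_state>>>(),
    mem::size_of::<*mut ffi::ebur128_state>()
);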

impl EbuR128 {
    pub fn new(channels: u32, samplerate: u32, mode: Mode) -> Result<Self, Error> {
        static ONCE: std::sync::Once = std::sync::Once::new();

        ONCE.call_once(|| unsafe { ffi::ebur128_libinit() });

        unsafe {
            let ptr = ffi::ebur128_init(channels, samplerate as _, mode.bits() as i32);
            let ptr = ptr::NonNull::new(ptr).ok_or(Error::NoMem)?;
            Ok(EbuR128(ptr))
        }
    }
}

The constructor is slightly more complicated, as it also has to ensure that the one-time initialization function is called exactly once. This requires using std::sync::Once as above.

After that it calls the C constructor with the given parameters. This can return NULL in various cases when not enough memory could be allocated, as described in the documentation of the C library. This needs to be handled gracefully here: instead of panicking, an error is returned to the caller. ptr::NonNull::new() returns an Option, and if NULL is passed it returns None. If that happens, it is transformed into an error together with an early return via the ? operator.

In the end the pointer only has to be wrapped in the struct and returned.

impl Drop for EbuR128 {
    fn drop(&mut self) {
        unsafe {
            let mut state = self.0.as_ptr();
            ffi::ebur128_destroy(&mut state);
        }
    }
}

The Drop trait defines what happens when a value of the struct goes out of scope and how to clean up after it. In this case that means calling the destroy function of the C library. It takes a pointer to a pointer to its state, which is then set to NULL. As such, it is necessary to store the raw pointer in a local variable and pass a mutable reference to that. Otherwise the ptr::NonNull would end up with a NULL pointer inside it, which would result in undefined behaviour.

The last function that I want to mention here is the one that takes a slice of audio samples for processing

    pub fn add_frames_i32(&mut self, frames: &[i32]) -> Result<(), Error> {
        unsafe {
            if frames.len() % self.0.as_ref().channels as usize != 0 {
                return Err(Error::NoMem);
            }

            let res = ffi::ebur128_add_frames_int(
                self.0.as_ptr(),
                frames.as_ptr(),
                frames.len() / self.0.as_ref().channels as usize,
            );
            Error::from_ffi(res as ffi::error, || ())
        }
    }

Apart from calling the C function, it is again necessary to check various preconditions before doing so. The C function will cause out-of-bounds reads if passed a slice that doesn't contain a sample for each channel, so this must be checked beforehand; otherwise the caller (safe Rust code!) could cause out-of-bounds memory accesses.

In the end, after calling the function, its return value is converted into a Result, mapping any errors to the crate's own Error enum.
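
The actual Error::from_ffi is not shown in this post; a minimal sketch of such a helper could look like the following (the constant names are what bindgen typically generates for the C error enum and are assumptions here, as is the Unknown variant)

fn from_ffi<T>(res: ffi::error, ok: impl FnOnce() -> T) -> Result<T, Error> {
    match res {
        ffi::error_EBUR128_SUCCESS => Ok(ok()),
        ffi::error_EBUR128_ERROR_NOMEM => Err(Error::NoMem),
        _ => Err(Error::Unknown),
    }
}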

As can be seen here, writing safe Rust bindings around a C API requires carefully reading the documentation of the C code and keeping all of Rust's safety guarantees in mind, to ensure that it is impossible to violate them no matter what the caller passes into the functions.

Not having to read the documentation and still being guaranteed that the code can't cause memory corruption is one of the big advantages of Rust. That is, assuming there even is documentation, and that it mentions such details.

Replacing first function: Resampler

Once the safe Rust API is done, it is possible to write safe Rust code that makes use of the C code. Among other things, that allows writing tests to ensure that the ported code still does the same as the previous C code. But that will be the topic of the next section. Writing tests is boring, porting code is more interesting, and that's what I will start with.

To find the first function to port, I first read the C code to get a general idea of how the different pieces fit together and what the overall structure of the code is. Based on this I selected the resampler that is used in the true peak measurement. It is one of the leaf functions of the call graph, that is, it does not call into any other C code and is relatively independent of everything else. Unlike many other parts of the code, it is also already factored out into separate structs and functions.

In the C code this can be found in the interp_create(), interp_destroy() and interp_process() functions. The resulting Rust code can be found in the interp module, which provides basically the same API in Rust except for the destroy() function which Rust provides for free, and corresponding extern "C" functions that can be called from C.

The create() function is not very interesting from a porting point of view: it simply allocates some memory and initializes it. The C and Rust versions of it look basically the same. The Rust version is only missing the checks for allocation failures, as those can't currently be handled in Rust, and handling allocation failures like the C code does is almost useless anyway with modern operating systems and overcommitting allocators.

The struct definition is also not too interesting: it is approximately the same as the one in C, except that pointers to arrays are replaced by Vecs and their lengths are taken directly from the Vec instead of being stored separately. In a later version the Vecs were replaced by boxed slices and SmallVec, but that's not a concern here for now.

The main interesting part here is the processing function and how to provide all the functions to the C code.

Processing Function: Using Iterators

The processing function, interp_process(), is basically 4 nested for loops over each frame in the input data, each channel in each frame, the interpolator factors and finally the filter coefficients.

unsigned int out_stride = interp->channels * interp->factor;

for (frame = 0; frame < frames; frame++) {
  for (chan = 0; chan < interp->channels; chan++) {
    interp->z[chan][interp->zi] = *in++;
    outp = out + chan;
    for (f = 0; f < interp->factor; f++) {
      acc = 0.0;
      for (t = 0; t < interp->filter[f].count; t++) {
        int i = (int) interp->zi - (int) interp->filter[f].index[t];
        if (i < 0) {
          i += (int) interp->delay;
        }
        c = interp->filter[f].coeff[t];
        acc += (double) interp->z[chan][i] * c;
      }
      *outp = (float) acc;
      outp += interp->channels;
    }
  }
  out += out_stride;
  interp->zi++;
  if (interp->zi == interp->delay) {
    interp->zi = 0;
  }
}

This could be written the same way in Rust, but slice indexing is not very idiomatic in Rust and using iterators is preferred, as it leads to more declarative code. Also, all the indexing would lead to suboptimal performance due to the required bounds checks. So the main task here is to translate the code to iterators as much as possible.

When looking at the C code, the outer loop iterates over chunks of channels samples from the input and chunks of channels * factor samples from the output. This is exactly what the chunks_exact iterator on slices does in Rust. Similarly, the second outer loop iterates over all samples of the input chunks and over the two-dimensional array z, which has delay items per channel. On the Rust side I represented z as a flat, one-dimensional array for simplicity, so instead of indexing, the chunks_exact() iterator is used again for iterating over it.

This leads to the following for the two outer loops

for (src, dst) in src
    .chunks_exact(self.channels)
    .zip(dst.chunks_exact_mut(self.channels * self.factor)) {
    for (channel_idx, (src, z)) in src
        .iter()
        .zip(self.z.chunks_exact_mut(delay))
        .enumerate() {
        // insert more code here
    }
}

Apart from making the data access patterns clearer, this is also less error-prone and gives more information to the compiler for optimizations.

Inside this loop, a ringbuffer z / zi is used to store the incoming samples for each channel, keeping the last delay samples for further processing. We'll keep this part as it is for now and use explicit indexing. While a VecDeque, or a similar data structure from another crate, could be used here, it would complicate the code and cause more allocations. I'll revisit this piece of code in part 3 of this blog post.

The first inner loop now iterates over all filters (of which there are factor many) and over chunks of size channels from the output, for which the same translation as before is used. The second inner loop iterates over all coefficients of the filter and the z ringbuffer, and sums up the product of both, which is then stored in the output.

So overall the body of the second outer loop above with the two inner loops would look as follows

z[self.zi] = *src;

for (filter, dst) in self
    .filter
    .iter()
    .zip(dst.chunks_exact_mut(self.channels))
{
    let mut acc = 0.0;
    for (c, index) in &filter.coeff {
        let mut i = self.zi as i32 - *index as i32;
        if i < 0 {
            i += self.delay as i32;
        }
        acc += z[i as usize] as f64 * c;
    }

    dst[channel_idx] = acc as f32;
}

self.zi += 1;
if self.zi == self.delay {
    self.zi = 0;
}

The full code after porting can be found here.

My general approach for porting C code with loops is what I did above: first try to understand the data access patterns and then find ways to express these with Rust iterators, and only fall back to explicit indexing if there is no obvious way. If the explicit indexing then turns out to be a performance problem due to bounds checks, first try to reorganize the data so that direct iteration is possible (which usually also improves performance due to cache locality), and otherwise use some well-targeted unsafe code. But more about that in part 3 of this blog post, in the context of optimizations.

Exporting C Functions

To be able to call the above code from C, a C-compatible function needs to be exported for it. This involves working with unsafe code and raw pointers again as that’s the only thing C understands. The unsafe Rust code needs to assert the implicit assumptions that can’t be expressed in C and that the calling C code has to follow.

For the interpolator, functions with the same API as the previous C code are exported to keep the amount of changes minimal: create(), destroy() and process() functions.

#[no_mangle]
pub unsafe extern "C" fn interp_create(taps: u32, factor: u32, channels: u32) -> *mut Interp {
    Box::into_raw(Box::new(Interp::new(
        taps,
        factor,
        channels,
    )))
}

#[no_mangle]
pub unsafe extern "C" fn interp_destroy(interp: *mut Interp) {
    drop(Box::from_raw(interp));
}

The #[no_mangle] attribute on functions makes sure that the symbols are exported as is instead of being mangled by the compiler. extern "C" makes sure that the function has the same calling convention as a corresponding C function.

For the create() function the interpolator struct is wrapped in a Box, which is the most basic mechanism to do heap allocations in Rust. This is needed because the C code shouldn’t have to know about the layout of the Interp struct and should just handle it as an opaque pointer. Box::into_raw() converts the Box into a raw pointer that can be returned directly to C. This also passes ownership of the memory to C.

The destroy() function does the inverse of the above and calls Box::from_raw() to get a Box back from the raw pointer. This requires that the raw pointer passed in was actually allocated via a Box and is of the correct type, which is something that can't really be checked. The function has to trust the C code to do this correctly. The standard approach to memory safety in C: trust that everybody is writing correct code.

After getting back the Box, it only has to be dropped and Rust will take care of deallocating any memory as needed.

#[no_mangle]
pub unsafe extern "C" fn interp_process(
    interp: *mut Interp,
    frames: usize,
    src: *const f32,
    dst: *mut f32,
) -> usize {
    use std::slice;

    let interp = &mut *interp;
    let src = slice::from_raw_parts(src, interp.channels * frames);
    let dst = slice::from_raw_parts_mut(dst, interp.channels * interp.factor * frames);

    interp.process(src, dst);

    interp.factor * frames
}

The main interesting part of the processing function is the usage of slice::from_raw_parts. The function again has to trust the C side that the pointers are valid and actually contain the given number of audio frames. In Rust a slice knows its size, so some conversion between the two is needed: a slice of the correct size has to be created around each pointer. This does not involve any copying of memory; it only stores the length information together with the raw pointer. This is also why no separate length has to be passed to the Rust version of the processing function anywhere.

With this the interpolator is fully ported and the C functions can be called directly from the C code. On the C side they are declared as follows and then called as before

typedef void * interpolator;

extern interpolator* interp_create(unsigned int taps, unsigned int factor, unsigned int channels);
extern void interp_destroy(interpolator* interp);
extern size_t interp_process(interpolator* interp, size_t frames, float* in, float* out);

The commit that did all this also adds some tests to ensure that everything still works correctly. It also contains some optimizations on top of the code above and is not 100% the same code.

Writing Tests

For testing that porting the code part by part to Rust doesn't introduce any problems, I went for the common two-layer approach: 1. integration tests that check whether the whole thing still works correctly, and 2. unit tests for the rewritten component alone.

The integration tests come in two variants inside the ebur128 module: one variant just testing via assertions that the results on a fixed input are the expected ones, and one variant comparing the C and Rust implementations. The unit tests only come in the second variant for now.

To test the C implementation in the integration tests, an old version of the crate that had no ported code yet is pulled in. For comparing the C implementations of the individual components, I extracted the C code into a separate C file that exported the same API as the corresponding Rust code, and called both from the tests.

Comparing Floating Point Numbers

The first variant is not very interesting apart from the complications involved when comparing floating point numbers. For this I used the float_eq crate, which provides different ways of comparing floating point numbers.

The first variant of tests uses it by checking whether the absolute difference between the expected and actual result is very small, which is less strict than the ULP check used for the second variant of tests. Unfortunately this was required because, depending on the CPU and toolchain used, the results differ from the expected static results (which were generated with one specific CPU and toolchain), while the comparison between the C and Rust implementations on the same CPU yields basically the same results.
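
With the float_eq crate, both styles look roughly like this (the variable names and tolerances here are made up for illustration)

use float_eq::assert_float_eq;

// First variant: compare against fixed expected results with an absolute
// tolerance, to allow for CPU/toolchain differences.
assert_float_eq!(loudness, expected, abs <= 0.000_001);

// Second variant: compare the C and Rust results with a stricter ULP check.
assert_float_eq!(loudness_rust, loudness_c, ulps <= 2);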

quickcheck

quickcheck is a crate that allows writing randomized property tests. This seemed like the perfect tool for writing tests that compare the two implementations: the property to check is equality of results, and it should hold for any possible input.

Using quickcheck is simple in principle. You write a test function that takes the inputs as parameters, does the processing, and then checks that the property you want to test holds, either via assertions or by returning a bool or TestResult.

#[quickcheck]
fn compare_c_impl_i16(signal: Signal<i16>) {
    let mut ebu = EbuR128::new(signal.channels, signal.rate, ebur128::Mode::all()).unwrap();
    ebu.add_frames_i16(&signal.data).unwrap();

    let mut ebu_c =
        ebur128_c::EbuR128::new(signal.channels, signal.rate, ebur128_c::Mode::all()).unwrap();
    ebu_c.add_frames_i16(&signal.data).unwrap();

    compare_results(&ebu, &ebu_c, signal.channels);
}

quickcheck will then generate random values for the function parameters via the Arbitrary impl of the given types and call the function many times. If one run fails, it tries to find a minimal failing testcase by “shrinking” the initial failure case, and then prints that shrunk testcase.

And this is the part of using quickcheck that involves some more effort: writing a reasonable Arbitrary impl for the inputs that can also be shrunk in a useful way on failures.

For the tests here I came up with a Signal type. Its Arbitrary implementation creates an audio signal with 1-16 channels and multiple sine waves of different amplitudes and frequencies. Shrinking first tries to reproduce the problem with a single channel, and then by halving the signal length.
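
A rough sketch of what such an Arbitrary impl can look like, assuming the quickcheck 0.9 API (the real Signal implementation generates multiple sine waves and is more elaborate)

use quickcheck::{Arbitrary, Gen};

#[derive(Clone, Debug)]
pub struct Signal<T> {
    pub channels: u32,
    pub rate: u32,
    pub data: Vec<T>,
}

impl Arbitrary for Signal<i16> {
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        let channels = u32::arbitrary(g) % 16 + 1;
        let rate = 44_100;
        let num_frames = 4800;

        // A single sine wave, duplicated over all channels.
        let mut data = vec![0i16; num_frames * channels as usize];
        for (i, frame) in data.chunks_exact_mut(channels as usize).enumerate() {
            let v = f32::sin(440.0 * 2.0 * std::f32::consts::PI * i as f32 / rate as f32);
            let v = (v * i16::MAX as f32) as i16;
            for s in frame {
                *s = v;
            }
        }

        Signal { channels, rate, data }
    }

    fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
        let mut shrunk = Vec::new();

        // First try to reproduce the failure with a single channel ...
        if self.channels > 1 {
            shrunk.push(Signal {
                channels: 1,
                rate: self.rate,
                data: self.data.iter().step_by(self.channels as usize).copied().collect(),
            });
        }

        // ... and then with half the signal length.
        let half = self.data.len() / 2 / self.channels as usize * self.channels as usize;
        if half >= self.channels as usize {
            shrunk.push(Signal {
                channels: self.channels,
                rate: self.rate,
                data: self.data[..half].to_vec(),
            });
        }

        Box::new(shrunk.into_iter())
    }
}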

This has worked well in practice so far. It doesn't cover all possible inputs, but it should cover anything that can fail, and the simple shrinking approach also helped to find smallish testcases when something failed. Of course it's not a perfect solution, only a practical one.

Based on these sets of tests I could be reasonably certain that the C and Rust implementations provide exactly the same results, so I could start porting the next part of the code.

True Peak: Macros vs. Traits

C doesn't really have any mechanism for writing code that is generic over different types (other than void *), or any more advanced means of abstraction than functions and structs. For that reason the C code uses macros via the C preprocessor in various places to write code once for the different input types (i16, i32, f32 and f64, or in C terms short, int, float and double). The C preprocessor is just a fancy mechanism for string concatenation, so this is rather unpleasant to write and read, the result might not even be valid C code, and the resulting compiler errors are often rather confusing.

In Rust, macros could also be used for this. While cleaner than C macros thanks to macro hygiene rules and operating on a typed token tree instead of plain strings, this would still end up with code that is hard to write and read, with possibly confusing compiler errors. For abstracting over different types, Rust instead provides traits. These allow writing code that is generic over different types with a well-defined interface; they allow doing much more, but I won't cover that here.

One example of macro usage in the C code is the input processing, of which the true peak measurement is one part. In C this basically looks as follows, with some parts left out because they’re not relevant for the true peak measurement itself

static void ebur128_check_true_peak(ebur128_state* st, size_t frames) {
  size_t c, i, frames_out;

  frames_out =
      interp_process(st->d->interp, frames, st->d->resampler_buffer_input,
                     st->d->resampler_buffer_output);

  for (i = 0; i < frames_out; ++i) {
    for (c = 0; c < st->channels; ++c) {
      double val =
          (double) st->d->resampler_buffer_output[i * st->channels + c];

      if (EBUR128_MAX(val, -val) > st->d->prev_true_peak[c]) {
        st->d->prev_true_peak[c] = EBUR128_MAX(val, -val);
      }
    }
  }
}

#define EBUR128_FILTER(type, min_scale, max_scale)                             \
  static void ebur128_filter_##type(ebur128_state* st, const type* src,        \
                                    size_t frames) {                           \
    static double scaling_factor =                                             \
        EBUR128_MAX(-((double) (min_scale)), (double) (max_scale));            \
    double* audio_data = st->d->audio_data + st->d->audio_data_index;          \
                                                                               \
    // some other code                                                         \
                                                                               \
    if ((st->mode & EBUR128_MODE_TRUE_PEAK) == EBUR128_MODE_TRUE_PEAK &&       \
        st->d->interp) {                                                       \
      for (i = 0; i < frames; ++i) {                                           \
        for (c = 0; c < st->channels; ++c) {                                   \
          st->d->resampler_buffer_input[i * st->channels + c] =                \
              (float) ((double) src[i * st->channels + c] / scaling_factor);   \
        }                                                                      \
      }                                                                        \
      ebur128_check_true_peak(st, frames);                                     \
    }                                                                          \
                                                                               \
    // some other code                                                         \
}

EBUR128_FILTER(short, SHRT_MIN, SHRT_MAX)
EBUR128_FILTER(int, INT_MIN, INT_MAX)
EBUR128_FILTER(float, -1.0f, 1.0f)
EBUR128_FILTER(double, -1.0, 1.0)

What the invocations of the macro at the bottom do is take the whole macro body and replace the usages of type, min_scale and max_scale accordingly. That is, one ends up with a function ebur128_filter_short() that works on a const short * and uses 32768.0 as scaling_factor, and correspondingly for the three other types.

To convert this to Rust, first a trait that provides all the required operations has to be defined and then implemented on the four numeric types that are supported as audio input. In this case the only required operation is to convert the input values to an f32 between -1.0 and +1.0.

pub(crate) trait AsF32: Copy {
    fn as_f32(self) -> f32;
}

impl AsF32 for i16 {
    fn as_f32(self) -> f32 {
        self as f32 / -(i16::MIN as f32)
    }
}

// And the same for i32, f32 and f64

Once this trait is defined and implemented on the needed types the Rust function can be written generically over the trait

pub(crate) fn check_true_peak<T: AsF32>(&mut self, src: &[T], peaks: &mut [f64]) {
    assert!(src.len() <= self.buffer_input.len());
    assert!(peaks.len() == self.channels);

    for (o, i) in self.buffer_input.iter_mut().zip(src.iter()) {
        *o = i.as_f32();
    }

    self.interp.process(
        &self.buffer_input[..(src.len())],
        &mut self.buffer_output[..(src.len() * self.interp_factor)],
    );

    for (channel_idx, peak) in peaks.iter_mut().enumerate() {
        for o in self.buffer_output[..(src.len() * self.interp_factor)]
            .chunks_exact(self.channels) {
            let v = o[channel_idx].abs() as f64;
            if v > *peak {
              *peak = v;
            }
        }
    }
}

This is not a direct translation of the C code though. As part of rewriting the C code I also factored out the true peak detection from the filter function into its own function. It is called from the filter function shown in the C code a bit further above. This way it was easy to switch only this part from a C implementation to a Rust implementation while keeping the other parts of the filter, and to also test it separately from the whole filter function.

All this can be found in this commit, together with tests and benchmarks. The overall code is a bit different from what is listed above, and the latest version in the repository looks a bit different still, but more on that in part 3 of this blog post.

One last thing worth mentioning here is that the AsF32 trait is not part of the crate's public API, and neither are the functions that are generic over the input type. Instead, the generic functions are only used internally, and the public API provides four functions that are specialized to the concrete input types. This keeps the API surface smaller and makes the API easier to understand for users.
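
For example, the public add_frames_i16() seen in the tests earlier simply forwards to the internal generic code, roughly like this (the internal function name here is made up)

pub fn add_frames_i16(&mut self, frames: &[i16]) -> Result<(), Error> {
    // AsF32 and the generic implementation stay crate-private.
    self.add_frames_generic(frames)
}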

Loudness History

The next component I ported to Rust was the loudness history data structure. This is used to keep a history of previous loudness measurements for giving longer-term results and it exists in two variants: a histogram-based one and a queue-based one with a maximum length.

As part of the history data structure there are also a couple of operations to calculate values from it, but I won’t go into details of them here. They were more or less direct translations of the C code.

In C this data structure and the operations on it were distributed all over the code and part of the main state struct, so the first step to port it over to Rust was to identify all those pieces and put them behind some kind of API.

This is also something that I noticed when porting the FFmpeg loudness normalization filter: because Rust makes it much less effort than C to define new structs, functions or even modules, it often seems to lead to more modular code with clear component boundaries, instead of everything being put together in the same place. Requirements from the borrow checker also often make it more natural to split components into separate structs and functions.

Check the commit for the full details, but in the end I ended up with the following functions that are called from the C code

typedef void * history;

extern history* history_create(int use_histogram, size_t max);
extern void history_add(history* hist, double energy);
extern void history_set_max_size(history* hist, size_t max);
extern double history_gated_loudness(const history* hist);
extern double history_relative_threshold(const history* hist);
extern double history_loudness_range(const history* hist);
extern void history_destroy(history *hist);
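
On the Rust side these are exported in the same way as the interpolator functions earlier, for example (a sketch, assuming a History::new() constructor)

#[no_mangle]
pub unsafe extern "C" fn history_create(use_histogram: i32, max: usize) -> *mut History {
    // History::new() is assumed here; the commit has the real constructor.
    Box::into_raw(Box::new(History::new(use_histogram != 0, max)))
}

#[no_mangle]
pub unsafe extern "C" fn history_destroy(hist: *mut History) {
    drop(Box::from_raw(hist));
}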

Enums

In the C code the two variants of the history were implemented by having both of them always present in the state struct, but only one of them initialized, and by having code like the following at every usage site

if (st->d->use_histogram) {
    // histogram code follows here
} else {
    // queue code follows here
}

Doing it like this is error-prone and easy to get wrong, and having fields in the struct that are unused in certain configurations seems wasteful. In Rust this situation is naturally expressed with enums

enum History {
    Histogram(Histogram),
    Queue(Queue),
}

struct Histogram { ... }
struct Queue { ... }

This allows both variants to share the same storage, and at each usage site the compiler enforces that both variants are explicitly handled

match history {
    History::Histogram(ref hist) => {
        // histogram code follows here
    }
    History::Queue(ref queue) => {
        // queue code follows here
    }
}

Splitting it up like this with an enum also leads to implementing the operations of the two variants directly on their structs, with only the common code implemented on the History enum itself. This also improves readability, because it's immediately clear what a piece of code applies to.
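
As a sketch (the method names here are assumptions), the common operations then dispatch once on the enum while the variant-specific logic lives on the variant structs

impl History {
    pub fn add(&mut self, energy: f64) {
        match self {
            History::Histogram(hist) => hist.add(energy),
            History::Queue(queue) => queue.add(energy),
        }
    }
}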

Logarithms

As mentioned in the overview already, there are some portability concerns in the C code. One of them showed up when porting the history and comparing the results of the ported code with the C code. This resulted in the following rather ugly code

fn energy_to_loudness(energy: f64) -> f64 {
    #[cfg(feature = "internal-tests")]
    {
        10.0 * (f64::ln(energy) / f64::ln(10.0)) - 0.691
    }
    #[cfg(not(feature = "internal-tests"))]
    {
        10.0 * f64::log10(energy) - 0.691
    }
}

In the C code, ln(x) / ln(10) is used everywhere for calculating the base-10 logarithm. Mathematically that's the same thing, but in practice it is unfortunately not, and the explicit log10() function is both faster and more accurate. Unfortunately it's not available everywhere in C (it's available since C99), so it was not used in the C code. In Rust it is always available, so I first used it unconditionally.

When running the tests later, they failed because the results of the C code were slightly different from the results of the Rust code. In the end I tracked it down to the usage of log10(), so for now the slower and less accurate version is used when comparing the two implementations.

Lazy Initialization

Another topic I already mentioned in the overview is one-time initialization. To use the histogram efficiently, it is necessary to calculate the values at the edges of each histogram bin as well as the center values. These values are always the same, so they could be calculated once up-front. The C code instead calculated them whenever a new instance was created.

In Rust, one could build something around std::sync::Once together with static mut variables for storing the data, but that would not be very convenient and would also require unsafe code: static mut variables are inherently unsafe. Instead, this can be simplified with the lazy_static or once_cell crates; the API of the latter is now also available as part of the Rust standard library in nightly versions.

Here I used lazy_static, which leads to the following code

lazy_static::lazy_static! {
    static ref HISTOGRAM_ENERGIES: [f64; 1000] = {
        let mut energies = [0.0; 1000];

        for (i, o) in energies.iter_mut().enumerate() {
            *o = f64::powf(10.0, (i as f64 / 10.0 - 69.95 + 0.691) / 10.0);
        }

        energies
    };

    static ref HISTOGRAM_ENERGY_BOUNDARIES: [f64; 1001] = {
        let mut boundaries = [0.0; 1001];

        for (i, o) in boundaries.iter_mut().enumerate() {
            *o = f64::powf(10.0, (i as f64 / 10.0 - 70.0 + 0.691) / 10.0);
        }

        boundaries
    };
}

On first access to e.g. HISTOGRAM_ENERGIES, the corresponding initialization code is executed, and from that point onwards the value is available as a read-only array with the calculated values. In practice this later turned out to cause performance problems, but more on that in part 3 of this blog post.
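
Access then looks like access to any other static. For example, finding the histogram bin for an energy value could look like this (a made-up example; the lazy_static value dereferences to the plain array)

let bin = HISTOGRAM_ENERGY_BOUNDARIES
    .iter()
    .position(|&boundary| energy < boundary);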

Another approach for calculating these constant numbers would be to calculate them at compile-time via const functions. This is almost possible with Rust now; the only missing part is a const variant of f64::powf(). It is not available as a constexpr function in C++ either, so there is probably a deeper reason behind this. Otherwise the code would look exactly like the code above, except that the variables would be plain static instead of static ref and all calculations would happen at compile-time.

In the latest version of the code, and until f64::powf() is available as a const function, I’ve decided to simply include a static array with the calculated values inside the code.

Data Structures

The last topic for the history is an implementation detail of the queue-based variant. As I also mentioned in the overview, the C code uses a linked-list-based queue, and this is exactly where it is used.

The queue stores one f64 value per entry, which means that in the end there is one heap allocation of 12 or 16 bytes per entry, depending on pointer size. That's a lot of very small allocations; each allocation is 50-100% bigger than the actual payload, and that's ignoring any overhead from the allocator itself. This wastes quite a lot of memory, and because each value lives in a separate allocation and a pointer has to be followed to get to the next one, any operation over all values is not going to be very cache-efficient.

As C doesn't have any data structures in its standard library, and this linked-list-based queue is readily available on at least the BSDs and Linux, it probably made sense to use it here instead of implementing a more efficient data structure inside the code. But it still seems quite suboptimal for this use case.

In Rust, the standard library provides the ringbuffer-based VecDeque, which offers exactly the API needed here, stores all values tightly packed (and thus wastes no memory per value), and provides better cache efficiency. And it is available everywhere the Rust standard library is, unlike the BSD queue used by the C implementation.
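
A minimal sketch of the queue-based history on top of VecDeque (the names are assumptions) could look like this

use std::collections::VecDeque;

struct Queue {
    max: usize,
    queue: VecDeque<f64>,
}

impl Queue {
    fn add(&mut self, energy: f64) {
        // Drop the oldest value once the maximum length is reached; all
        // values stay tightly packed in a single allocation.
        if self.queue.len() >= self.max {
            self.queue.pop_front();
        }
        self.queue.push_back(energy);
    }
}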

In practice, apart from the obvious savings in memory, this also caused the Rust implementation without any further optimizations to take only 50%-70% of the time that the C implementation took, depending on the operation.

Filter: Flushing Denormals to Zero

Overall porting the filter function from C was the same as everything mentioned before so I won’t go into details here. The whole commit porting it can be found here.

There is only one aspect I want to focus on here: if available on x86/x86-64, the MXCSR register temporarily gets the _MM_FLUSH_ZERO_ON bit set to flush denormal floating point numbers to zero. That is, denormals (i.e. very small numbers close to zero) resulting from any floating point operation are treated as zero. If hardware support is not available, the values kept for the next call are manually set to zero at the end of each filter call if they contain denormals.

This is done in the C code for performance reasons. Operations on denormals are generally much slower than on normalized floating point numbers and it has a measurable impact on the performance in this case.

In Rust this had to be replicated, not only for the performance reasons but also because otherwise the results of the two implementations would be slightly different and comparing them in the tests would be harder.

On the C side, this requires some build system integration and usage of the C preprocessor to decide whether the hardware support can be used or not, and then some conditional code that is used in the EBUR128_FILTER macro shown a few sections above. Specifically, this is the code

#if defined(__SSE2_MATH__) || defined(_M_X64) || _M_IX86_FP >= 2
#include <xmmintrin.h>
#define TURN_ON_FTZ                                                            \
  unsigned int mxcsr = _mm_getcsr();                                           \
  _mm_setcsr(mxcsr | _MM_FLUSH_ZERO_ON);
#define TURN_OFF_FTZ _mm_setcsr(mxcsr);
#define FLUSH_MANUALLY
#else
#warning "manual FTZ is being used, please enable SSE2 (-msse2 -mfpmath=sse)"
#define TURN_ON_FTZ
#define TURN_OFF_FTZ
#define FLUSH_MANUALLY                                                         \
  st->d->v[c][4] = fabs(st->d->v[c][4]) < DBL_MIN ? 0.0 : st->d->v[c][4];      \
  st->d->v[c][3] = fabs(st->d->v[c][3]) < DBL_MIN ? 0.0 : st->d->v[c][3];      \
  st->d->v[c][2] = fabs(st->d->v[c][2]) < DBL_MIN ? 0.0 : st->d->v[c][2];      \
  st->d->v[c][1] = fabs(st->d->v[c][1]) < DBL_MIN ? 0.0 : st->d->v[c][1];
#endif

This is not really that bad; my only concern here is that it's relatively easy to forget calling TURN_OFF_FTZ once the filter is done. That would then affect all future floating point operations outside the filter and potentially cause a lot of problems. This blog post gives a nice example of an interesting bug caused by exactly this and shows how hard it was to debug.

When porting this to more idiomatic Rust, this problem does not exist anymore.

This is the Rust implementation I ended up with

#[cfg(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "sse2"
))]
mod ftz {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{_mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON};

    pub struct Ftz(u32);

    impl Ftz {
        pub fn new() -> Option<Self> {
            unsafe {
                let csr = _mm_getcsr();
                _mm_setcsr(csr | _MM_FLUSH_ZERO_ON);
                Some(Ftz(csr))
            }
        }
    }

    impl Drop for Ftz {
        fn drop(&mut self) {
            unsafe {
                _mm_setcsr(self.0);
            }
        }
    }
}

#[cfg(not(all(
    any(target_arch = "x86", target_arch = "x86_64"),
    target_feature = "sse2"
)))]
mod ftz {
    pub struct Ftz;

    impl Ftz {
        pub fn new() -> Option<Self> {
            None
        }
    }
}

While a bit longer, it is also mostly whitespace. The important part to notice is that when the hardware support is used, a struct with a Drop impl is returned, and once this struct leaves the scope it resets the MXCSR register to its previous value. This way resetting can't be forgotten, and it also happens as part of stack unwinding in case of panics.

On the usage side this looks as follows

let ftz = ftz::Ftz::new();

// all the calculations

if ftz.is_none() {
    // manual flushing of denormals to zero
}

No macros are required in Rust for this, and all the platform-specific code is nicely abstracted away in a separate module. In the future, support for this on e.g. ARM could be added without requiring changes anywhere else; it would just need another implementation of the Ftz struct.

Making it bullet-proof

As Anthony Ramine quickly noticed and told me on Twitter, the above is not actually sufficient. For non-malicious code using the Ftz type everything is alright: on every return path, including panics, the register would be reset again.

However malicious (or simply very confused?) code could make use of e.g. mem::forget(), Box::leak() or some other function to “leak” the Ftz value and cause the Drop implementation to never actually run and reset the register’s value. It’s perfectly valid to leak memory in safe Rust, so it’s not a good idea to rely on Drop implementations too much.

The solution for this can be found in this commit, but the basic idea is to never actually give out a value of the Ftz type and instead only pass an immutable reference to safe Rust code. This then basically looks as follows

mod ftz {
    pub fn with_ftz<T, F: FnOnce(Option<&Ftz>) -> T>(func: F) -> T {
        // Ftz::new() already returns an Option: Some with hardware
        // support, None without.
        let ftz = Ftz::new();
        func(ftz.as_ref())
    }
}

ftz::with_ftz(|ftz| {
    // do things or check if `ftz` is None
});

This way it is impossible for any code outside the ftz module to leak the value and prevent resetting of the register.

Input Processing: Order of Operations Matters

The other parts of the processing code were relatively straightforward to port and not really different from anything I already mentioned above. However, as part of porting that code I ran into a problem that took quite a while to debug: once ported, the results of the C and Rust implementations were slightly different again.

I went through the affected code in detail and didn't notice anything obvious. Both the C code and the Rust code seemed to be doing the same thing, so why were the results different?

This is the relevant part of the C code

size_t i;
double channel_sum;

channel_sum = 0.0;
if (st->d->audio_data_index < frames_per_block * st->channels) {
  for (i = 0; i < st->d->audio_data_index / st->channels; ++i) {
    channel_sum += st->d->audio_data[i * st->channels + c] *
                   st->d->audio_data[i * st->channels + c];
  }
  for (i = st->d->audio_data_frames -
           (frames_per_block - st->d->audio_data_index / st->channels);
       i < st->d->audio_data_frames; ++i) {
    channel_sum += st->d->audio_data[i * st->channels + c] *
                   st->d->audio_data[i * st->channels + c];
  }
} else {
  for (i = st->d->audio_data_index / st->channels - frames_per_block;
       i < st->d->audio_data_index / st->channels; ++i) {
    channel_sum += st->d->audio_data[i * st->channels + c] *
                   st->d->audio_data[i * st->channels + c];
  }
}

and the first version of the Rust code

let mut channel_sum = 0.0;

if audio_data_index < frames_per_block * channels {
    channel_sum += audio_data[..audio_data_index]
        .chunks_exact(channels)
        .map(|f| f[c] * f[c])
        .sum();

    channel_sum += audio_data
        [(audio_data.len() - frames_per_block * channels + audio_data_index)..]
        .chunks_exact(channels)
        .map(|f| f[c] * f[c])
        .sum();
} else {
    channel_sum += audio_data
        [(audio_data_index - frames_per_block * channels)..audio_data_index]
        .chunks_exact(channels)
        .map(|f| f[c] * f[c])
        .sum();
}

The difference between the two variants is the order of the floating point operations in the if branch. The C code sums all values into the same accumulator, while the Rust code first sums each part into its own accumulator and then adds the two together. I changed the Rust code to do exactly the same as the C code, and that made the tests pass again.
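
One way to express the same order of operations while keeping the iterators is to chain the two parts into a single sum; this is a sketch of the idea, not necessarily the exact code from the commit

channel_sum = audio_data[..audio_data_index]
    .chunks_exact(channels)
    .chain(
        audio_data[(audio_data.len() - frames_per_block * channels + audio_data_index)..]
            .chunks_exact(channels),
    )
    .map(|f| f[c] * f[c])
    .sum();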

The order in which floating point operations are done matters, unfortunately, and in the example above the difference was big enough to cause the tests to fail. And the above is a nice practical example that shows that addition on floating point numbers is actually not associative.

Last C Code: Replacing API Layer

The last step was the most satisfying one: getting rid of all the C code. This can be seen in this commit. Note that the performance numbers in the commit message are wrong; at that point the two versions were already much closer performance-wise, and the Rust implementation was even faster in some configurations.

Up to that point all the internal code was already ported to Rust and the only remaining C code was the public C API, which did some minor tasks and then called into the Rust code. Technically this was not very interesting so I won’t get into any details here. It doesn’t add any new insights and this blog post is already long enough!

If you check the git history, all commits that followed after this one were cleanups, addition of some new features, adding more tests, performance optimizations (see the next part of this blog post) and adding a C-compatible API around the Rust implementation (see the last part of this blog post). The main part of the work was done.

Difficulty and Code Size

With the initial porting out of the way, I can now also answer the first two questions I wanted to get answered as part of this project.

Porting the C code to Rust did not cause any particular difficulties. The main challenges were

  • Understanding what the C code actually does and mapping the data access patterns to Rust iterators for more idiomatic and faster code. This also has the advantage of making the code clearer than with the C-style for loops. Porting was generally a pleasant experience and the language did not get in the way when implementing such code.
    Also, by being able to port it incrementally I could always do a little bit during breaks and didn’t have to invest longer, contiguous blocks of time into the porting.
  • Refactoring the C code to be able to replace parts of it step by step with Rust implementations. This was complicated a bit by the fact that the C code did not factor out the different logical components nicely but instead kept everything entangled.
    From having worked a lot with C for more than 15 years, I wouldn't say that this is because the code is bad (it is not!), but simply because C encourages writing code like this. Defining new structs or new functions feels like effort, and even more so if you try to move code into separate files, because then you also have to worry about a header file and keeping the code and the header in sync. Rust simplifies this noticeably, and the way the language behaves encourages splitting the code into separate components.

Now for the size of the code. This is a slightly more complicated question. Rust and the default code style of rustfmt cause code to be spread out over more lines and to have more whitespace than structurally identical C code using the common C code styles. In my experience, Rust code visually looks much less dense than C code for this reason.

Intuitively I would say that I have written much less Rust code for the actual implementation than there was C code, but lacking any other metrics, let's take a look at the lines of code while ignoring tests and comments. I used tokei for this.

  • 1211 lines for the Rust code
  • 1741 lines for the C code. Of this, 494 lines are headers and 367 lines of the headers are the queue implementation. That is, there are 1247 lines of non-header C code.

This makes the Rust implementation only slightly smaller, if we ignore the C headers. Rust allows writing more concise code, so I would have expected the difference to be bigger. At least partially this can probably be attributed to the different code formatting, which causes Rust code to be less visually dense and therefore spread out over more lines.

In any case, overall I’m happy with the results so far.

I will look at another metric of code size in the last part of this blog post for some further comparison: the size of the compiled code.

Next Part

In the next part of this blog post I will describe the performance optimizations I did to make the Rust implementation at least as fast as the C one, and the problems I ran into while doing so. The previous two parts of the blog post had nothing negative to say about Rust, but this will change in the third part. The Rust implementation without any optimizations was already almost as fast as the C implementation, thanks to how well idiomatic Rust code can be optimized by the compiler, but the last few percent were not as painless as one would hope. In practice, the performance difference probably wouldn't have mattered.

From looking at the git history and comparing the code, you will also notice that some of the performance optimizations already happened as part of the porting. The final code is not exactly what I presented above.

Tracker 3.0: It’s Here!

This is part 1 of a series. Come back next week to find out more about the changes in Tracker 3.0.

It’s too early to say “Job done”. But we’ve passed the biggest milestone on the project we announced last year: version 3.0 of Tracker is released and the rollout has begun!

We wanted to port all the core GNOME apps in a single release, and we almost achieved this ambitious goal. Nautilus, Boxes, Music, Rygel and Totem all now use Tracker 3. Photos will require 2.x until the next release. Outside of GNOME core, some apps are ported and some are not, so we are currently in a transitional period.

The important thing is that only Tracker Miner FS 3 needs to run by default. Tracker Miner FS is the filesystem indexer which allows apps to do instant search and content discovery.

Since Photos 3.38 still uses Tracker 2.x we have modified it to start Tracker Miner FS 2 along with the app. This means the filesystem index in the central Tracker 2 database is kept up-to-date while Photos is running. This will increase resource usage, but only while you are using Photos. Other apps which are not yet ported may want to use the same method while they finish porting to Tracker 3 — see Photos merge request 142 to see how it’s done.

Flatpak apps can safely use Tracker Miner FS 3 on the host, via Tracker’s new portal which guards access to your data based on the type of content. It’s up to the app developer whether they use the system Tracker Miner service, or whether they run another instance inside the sandbox. There are upsides and downsides to both approaches.

We published some guidance for distributors in this thread on discourse.gnome.org.

Gratitude

We all owe thanks to Carlos for his huge effort re-thinking and re-implementing the core of Tracker. We should also thank Red Hat for sponsoring some of this work.

I also want to thank all the maintainers who collaborated with us. Marinus and Jean were early adopters in GNOME Music and gave valuable feedback including coming to the regular meetings, along with Jens who also ported Rygel early in the cycle. Bastien dug into reviewing the tracker3 grilo plugin, and made some big improvements for building Tracker Miners inside a Flatpak. In Nautilus, Ondrej and Antonio did some heroic last minute review of my branch and together we reworked the Starred Files feature to fix some long standing issues.

The new GNOME VM images were really useful for testing and catching issues early. The chat room is very responsive and friendly, Abderrahim, Jordan and Valentin all helped me a lot to get a working VM with Tracker 3.

GNOME’s release team were also responsive and helpful, right up to the last minute freeze break request which was crucial to avoiding a “Tracker Miner FS 2 and 3 running in parallel” scenario.

Thanks also to GNOME’s translation teams for keeping up with all the string changes in the CLI tool, and to distro packagers who are now working to make Tracker 3 available to you.

Coming soon to your distro.

It takes time for a new GNOME release to reach users, because most distros have their own testing phase.

We can use Repology to see where Tracker 3 is available. Note that some distros package it in a new tracker3 package while others update the existing tracker package.

Let’s see both:

[Repology packaging status badges for the tracker and tracker3 packages]

Coming up…

I have a lot more to write about following the Tracker 3.0 release. I’ll be publishing a series of blog posts over the next month. Make sure you subscribe to my blog or to Planet GNOME to see them all!

Building GStreamer on Windows the Correct Way

For the past 4 years, Tim and I have spent thousands of hours on better Windows support for GStreamer. This started in May 2016, when I first wrote about it, and continued with the first draft of the work before it was revised, updated, and upstreamed.

Since then, we've worked tirelessly to improve Windows support in GStreamer with patches to many projects such as the Meson build system, GStreamer's Cerbero meta-build system, and writing build files for several non-GStreamer projects such as x264, openh264, ffmpeg, zlib, bzip2, libffi, glib, fontconfig, freetype, fribidi, harfbuzz, cairo, pango, gtk, libsrtp, opus, and many more that I've forgotten.

More recently, Seungha has also been working on new GStreamer elements for Windows such as d3d11, mediafoundation, wasapi2, etc. Sometimes we're able to find someone to sponsor all this work, but most of the time it's on our own dime.

Most of this has been happening in the background; noticed only by people who follow GStreamer development. I think more people should know about the work that's been happening upstream, and the official and supported ways to build GStreamer on Windows. Searching for this on Google can be a very confusing experience with the top results being outdated links or just plain clickbait.

So here's an overview of your options when you want to use GStreamer on Windows:

Installing GStreamer on Windows

GStreamer has released MinGW binary installers for Windows since the early 1.0 days, using the Cerbero meta-build system that was created by Andoni for the non-upstream "GStreamer SDK" project, which was based on GStreamer 0.10.

Today it supports building GStreamer with both MinGW and Visual Studio, and even supports outputting UWP packages. You can download all of these from the download page.

This is the easiest way to get started with GStreamer on Windows.
Building GStreamer yourself for Deployment

If you need to build GStreamer with a custom configuration for deployment, the easiest option is to use Cerbero, which is a meta-build system. It will download all the dependencies for you (including most of the build tools), build them with Autotools, CMake, or Meson (as appropriate), and output a neat little MSI installer. The README contains all the information you need, including screenshots for how to set things up.

As of a few days ago, after months of work, the native Cerbero Windows builds have also been integrated into our continuous integration pipeline that runs on every merge request, which further improves the quality of our Windows support. We already had native Windows CI using gst-build, but this increases our coverage.
Contributing to GStreamer on Windows

If you want to contribute to GStreamer from Windows, the best option is to use gst-build (created by Thibault), which is basically a Meson 'wrapper' project that aggregates all the GStreamer repositories as subprojects. Once again, the README file is easy to follow and has screenshots for how to set things up.

This is also the method used by all GStreamer developers to hack on GStreamer on all platforms, so it should work well out of the box, and it's tested on the CI. If it doesn't work, come poke us on #gstreamer on FreeNode IRC or on the gstreamer mailing list.

It's All Upstream.

You don't need any special steps, and you don't need to read complicated blog posts to build GStreamer on Windows. Everything is upstream.

This post previously contained examples of such articles and posts that are spreading misinformation, but I have removed those paragraphs after discussion with the people who were responsible for them, and to keep this post simple. All I can hope is that it doesn't happen again.

September 28, 2020

20 Million Downloads from the LVFS

A few hours ago the LVFS provided its 20 millionth firmware update, and although it's just another somewhat arbitrary base-10 number, it's an achievement I'm immensely proud of. As one of my friends said last week, “20 million of anything is a big deal”. Right from the start, the fwupd daemon and LVFS website data provider was a result of collaboration between many different companies and open source projects, and is now cemented as an integral part of the firmware ecosystem. People building open source projects, especially low level infrastructure like this, are not good at celebrating success, and it's no wonder so many talented maintainers burn out over long years of dedicated service. This post celebrates some of the things we've done.

Little known to most people, fwupd and the LVFS grew out of the frustration of distributing the ColorHug firmware. If you bought one of those devices all those years ago, you can know you were a tiny part in starting all this. I still use ColorHug devices for all kinds of automated firmware testing, perhaps even more so than for screen calibration. My experience building OpenHardware devices really pushed me to make the LVFS free-for-all, on the logic that I wouldn’t have been able to justify even a $100/year subscription. Certainly making the service free in all respects meant that it was almost risk-free for companies to test the service.

Now the LVFS analyses uploaded firmware for security problems, keeps millions of devices up to date, and also helps governments buy secure hardware using initiatives like the upcoming Host Security ID that I’ll talk more about in future blog posts. How many devices we’ve updated is impossible to know exactly as many large companies and departments mirror the entire LVFS; we just know it’s at least 20 million. In reality it’s probably a single digit multiple of that, although there’s no real way of knowing. We know 1.5 million people have sent the optional “it works for me” report we ask from CLI users, and given the CLI downloads account for ~1% of all downloads it could be a lot higher than 20 million.

A huge number of devices are supported on the LVFS now. There are currently 2393 different public firmwares uploaded by 1401 users from 106 different vendors, using 39 different protocols to update hardware. We’ve run 40,378 automated tests on those files, and extracted 1,170,543 file volume objects which can be scanned by YARA. All impressive numbers, I’m sure you’ll agree.

There is one specific person I would like to thank first, my co-maintainer for both projects: Mario Limonciello, a Senior Principal Software Development Engineer at Dell. Mario has reviewed thousands of my patches over the years, and contributed hundreds himself. I really appreciate his trust, and also his confidence to tell me when the half-baked and incomplete thing I’m proposing is actually insane. Together we’ve created an architecture that’s easy to maintain, with a clean and modern design. Dell were also the first major laptop OEM to tell their suppliers “you need to ship firmware updates using the LVFS”, and so did a lot of the initial plumbing work. We re-used most of the initial plugins when other OEMs decided they’d like to join the initiative later.

Peter Jones is another talented member of our team at Red Hat and wrote a lot of the low level UEFI code we’ve used millions of times. Peter has to understand all the crazy broken things that firmware vendors decide to do, and is responsible for most of the EFI code in fwupd. Over the years fwupd has absorbed two of his projects, fwupdate and most recently dbxtool. Without Peter there would have been no UpdateCapsule support, and that’s about half the updates on the LVFS.

Also notable to mention here is Logitech. They’ve shipped a ton of firmware (literally, millions) for their Unifying hardware and were also early adopters of the LVFS and fwupd. Nestor Lopez Casado, many thanks for all your help over the years and I’m glad MouseJack made all this a requirement :)

I’d also like to thank Lenovo; not a specific person, as Lenovo is split up into ThinkPad, ThinkStation and ThinkCentre groups, and in each the engineers would probably like to remain anonymous. Lenovo as a combined group is shipping a huge amount of firmware now via the LVFS, and most of the Lenovo supply chain is already wired up to supporting the LVFS. A lot of the ODMs for Lenovo have had to actually install Fedora and learn how to program with GLib C to create fwupd plugins for laptop models that are not even on the shelves yet. Training up dozens of people in “how to write a fwupd plugin and deal with a grumpy maintainer” took a lot of time, but now we have companies building custom silicon submitting ready-to-roll plugins, fixes and even enhancements to the GUI tools.

The LVFS isn’t just a way for OEMs to distribute firmware, like a shared FTP site. The LVFS plumbs itself into the ODM and ISV relationships, so we can get a pipeline right from the firmware author all the way to the end user. As ODMs such as Wistron and Foxconn use the LVFS for one OEM, it’s very simple for them to also support other OEMs. The feedback loop from vendors to users and back to vendors again has been invaluable when debugging problems with specific firmware releases.

More recently various groups at Google have also been pushing suppliers of the Chromebook ecosystem to use fwupd and the LVFS. I’ve been told there’s now a “fwupd group” inside Google and I know that there are more than a few different models that will only be updatable using fwupd. Google are also using some of the consulting companies familiar with LVFS and fwupd so that it’s not just me explaining how this works over and over to various different small ODMs. I think in the next year this consulting side will explode and help grow the ecosystem even further.

I’d love to list all the OEMs, ODMs, ISVs and consultants that have helped over the last 5 years, but this blog entry would be even larger than it already is. Just know that I appreciate your support, help and guidance.

Talking of growing the ecosystem, the Linux Foundation are taking over the actual site maintenance; I’m not a sysadmin, and Terraform and Docker still scare me. This Christmas we’ll move the little VM I have running in Amsterdam to a proper scalable architecture with 24/7 support. This will let us provide some kind of uptime guarantee, and it also means I don’t have to worry about applying a security update when I’m on holiday without internet access. The Linux Foundation have been paying the entire CDN cost for the last few years, and for that my wallet is truly grateful.

Finally, I must also thank Red Hat for letting me work on this stuff. Over the last few years fwupd has gone from my “20%” hobby to almost taking over all my developer time. Red Hat doesn’t get enough credit for all the essential plumbing they tirelessly do, and without them paying my salary every month there is no way this kind of free-to-upload and free-to-download service could exist. I firmly believe that fwupd is mutually beneficial to all the Red Hat partners like Intel, Dell and Lenovo – both so that more hardware gets purchased from OEMs, and also so that customers running Fedora and RHEL have up to date firmware.

At a conference last year, I presented a talk where the penultimate slide was “the LVFS is just a website that runs cron jobs”, and I had someone I respect come up to me afterwards and tell me something in a stern voice I’ll remember forever: “You didn’t just create a website – you changed an industry!”

Let’s look forward to the next 20 million updates.

September 26, 2020

Synced plaintext TODO and notes

Being a paper hater, I have kept all my work and private notes in digital form pretty much forever. The tools have changed over time, of course, but I’m really happy with my current system. I am a strong proponent of plain-text formats for just about everything: they are simple, efficient, universal, implementation/tool agnostic, and effectively trackable in revision control. I spend my entire work life as a software developer in vim and mutt, so it’s just straightforward to do the same for notes and TODO lists.

September 25, 2020

AppStore Reviews Should be Stricter

Since the AppStore launched, developers have complained that the review process is too strict. Applications are mostly rejected for not meeting requirements, not having enough functionality, or circumventing Apple’s business model.

Yet, the AppStore reviews are too lax and they should be much stricter.

Let me explain why I think so, what I believe some new rules need to be, and how the AppStore can be improved.

Prioritizing the Needs of the Many

Apple states that they have 28 million registered developers, but I believe that only a fraction of those are actively developing applications on a daily basis. That number is closer to 5 million developers.

I understand deeply why developers are frustrated with the AppStore review process - I have suffered my fair share of AppStore rejections: both by missing simple issues and by trying to push the limits of what was allowed. I founded Xamarin, a company that built tools for mobile developers, and had a chance to become intimately familiar with the rejections that our own customers got.

Yet, there are 1.5 billion active Apple devices, devices that people trust to keep their data secure and private. The overriding concern should be the 1.5 billion active users, and not the 0.33% (or 1.86% if you are feeling generous).

People have placed their trust in Apple and Google to keep their devices safe. I wrote about this previously. While it is an industry sport to make fun of Google, I respect the work that Google puts into securing and managing my data - so much so that I have trusted them with my email, photographs and documents for more than 15 years.

I trust both companies, both because of their public track record and because of conversations I have had with friends working at both companies about their processes, their practices and the principles they care about (keeping up with information security is a part-time hobby of mine).

Today’s AppStore Policies are Insufficient

AppStore policies, and their automated and human reviews, have helped nurture and curate the applications that are available. But with a target market as large and rich as iOS and Android, these ecosystems have become a juicy target for scammers, swindlers, gangsters, nation states and hackers.

While some developers are upset with the AppStore rejections, profiteers have figured out that they can make a fortune while abiding by the existing rules. These rules allow behaviors that are either in poor taste or explicitly manipulate the psyche of the user.

First, let me share my perspective as a parent.

I have kids aged 10, 7 and 4. My eldest has had access to an iPad since she was a year old, and I have experienced first hand how angering some applications on the AppStore can be to a small human.

It breaks my heart every time they burst out crying because something in these virtual worlds was designed to nag them, frustrate them, or confuse them. We sometimes teach them how to deal with those problems, but this is not always possible. Try explaining to a 3 year old why they have to watch a 30-second ad in the middle of a dinosaur game to continue playing, or teaching them that at arbitrary points during the game, tapping on the screen will not dismiss an ad, but will instead take them to download another app or direct them to another web site.

This is infuriating.

Another problem happens when they play games that are defective by design. By this I mean games that have had functionality or capabilities removed, to be restored only by purchasing virtual items (coins, bucks, costumes, pets and so on).

I get to watch my kids display a full spectrum of negative experiences when they deal with these games.

We now have a rule at home “No free games or games with In-App Purchases”. While this works for “Can I get a new game?”, it does not work for the existing games that they play, and those that they play with their friends.

Like any good rule, there are exceptions, and I have allowed the kids to buy a handful of games with in-app purchases from reputable sources. They have to pay for those from their allowance.

These dark patterns are not limited to applications for kids; read the end of this post for a list of negative scenarios my followers encountered that will ring familiar.

Closing the AppStore Loopholes

Applications using these practices should be banned:

  • Those that use Dark Patterns to get users to purchase applications or subscriptions: things like a “Free one week trial” that then starts charging a high fee per week. Even though this activity is forbidden, some apps that do this still get published.

  • Defective-by-design: there are too many games out there that cannot be enjoyed unless you spend money in the application. They get kids hooked, and then I have to deal with whiny 4, 7 and 10 year olds begging to spend their money on virtual currencies to level up.

  • Apps loaded with ads: I understand that using ads to monetize your application is one way of supporting development, but there needs to be a threshold on how many ads are shown on screen and how often, as these apps can be incredibly frustrating to use. And not all apps offer a “Pay to remove the ads” option, I suspect because pay-to-remove is not as profitable as showing ads non-stop.

  • Watch an ad to continue: another nasty problem is defective-by-design games and applications that, rather than requesting money directly, steer kids towards watching ads (sometimes “watch an ad for 30 seconds”) to get or achieve something. They are driving ad revenue by forcing kids to watch garbage.

  • Install chains: there are networks of ill-behaved applications that trick kids into installing other applications from the same network. It starts with an innocent-looking app, and before the day is over, you have 30 new scammy apps installed on your device.

  • Notification Abuse: these are applications that send advertisements or promotional offers to your device - discount offers, timed offers and product offerings. Apple used to ban these practices in their AppReview guidelines, but I never saw the ban enforced, and I resorted to turning off notifications. These days these promotions are allowed. I would like them to be banned, with the ability to report them as spam, and for infringers to have their notification rights suspended.

  • Ban on Selling your Data to Third Parties: ban applications that sell your data to third parties. Sometimes the data collection is explicit (for example, using the Facebook app), but sometimes an application unknowingly uses a third party SDK that does its dirty work behind the scenes. Third party SDKs should be registered with Apple, and applications should disclose which third party SDKs they use. If one of those SDKs is found to abuse the rules or steal data, all applications that rely on it could be remotely deactivated. While this was recently in the news, it is by no means a new practice; it has been happening for years.

One grayer area is applications that are designed to be addictive to increase engagement (some games, Facebook and Twitter), as they are a major problem for our psyches and for our society. Sadly, this is likely beyond the scope of what the AppStore review team can do. One option is to pass legislation that would cover it (Shutdown Laws are one example).

Changes in the AppStore UI

It is not just apps for children that have this problem. I find myself thinking twice before downloading applications with “In-App Purchases”. That label has become a red flag: one that sends the message “scammy behavior ahead”.

I would rather pay for an app up front than download a free app with In-App Purchases. This is unfair to the many creators that can only monetize their work via In-App Purchases.

This could be addressed either by offering a free trial period for the app (managed by the AppStore), or by listing explicitly that there is an “Unlock by paying” option, to distinguish these from “Offers In-App Purchases”, which is a catch-all expression for legitimate, scammy and nasty sales alike.

My list of wishes:

  • Offer Trial Periods for applications: this would send a clear message that this is a paid application that you can try before buying. And by offering this directly in the AppStore, developers would not have to deal with the In-App Purchase workflow, bringing joy to developers and users alike.

  • Explicit Labels: Rather than using the catch-all “Offers In-App Purchases”, show the nature of the purchase: “Unlock Features by Paying”, “Offers Subscriptions”, “Buy virtual services” and “Sells virtual coins/items”

  • Better Filtering: today it is not possible to filter searches to paid apps only (which tend to be less slimy than those with In-App Purchases).

  • Disclose the class of In-App Purchases available in each app up-front: I should not have to scroll and hunt for the information, and mentally parse the item descriptions, to understand what kind of purchase is on offer.

  • Report Abuse: human reviewers and automated reviews are not able to spot every violation of the existing rules or my proposed additional rules. Users should be able to report applications that break the rules, and developers should be aware that their application can be removed from circulation for breaking the rules or for scammy behavior.

Some Bad Practices

Check out some of the bad practices in this compilation.

September 24, 2020

Some Bad App Practices

Some bad app patterns, as some followers described them.

You can read more in the replies to my request from a few weeks ago.

September 23, 2020

GNOME Shell user research goings on

It’s been a while since we last blogged about the GNOME Shell design work that’s been happening. There’s been a lot going on behind the scenes, particularly on the research side, and it’s about time we told everyone what we’ve been up to.

As a side note: a great team has developed around this initiative. The existing design team of Jakub, Tobias and myself has been joined by Maria Komarova from System76. Maria has a particularly strong research background and was immensely helpful in running interviews. The development side has also been fully engaged with the process, particularly through Georges and Florian.

Research and goals

In addition to thinking about and discussing the design goals that we outlined previously (here and here), the design team has conducted two research exercises over the past six months. The main research exercise was a series of interviews with existing GNOME users. We also ran a small survey, to get some baseline numbers on user behaviour. I’ll mostly be talking about the interviews here.

The goals of the interviews were to a) get general information on how the shell gets used, b) evaluate how the existing shell design is performing, and c) identify ways in which the shell could be improved for our users. In addition to these fairly specific goals, we also wanted to develop a deeper understanding of our users: how they use their computers, their concerns, predispositions, interests and attitudes.

Interview-based research is a classic research approach for answering these kinds of questions, which are particularly appropriate to ask early on in a design process, when the emphasis is on exploration, insight and understanding. (Some readers might be more familiar with usability testing as the means through which data-driven design gets done, and that is a useful approach, but it typically happens later in the process.)

The survey

The survey element of our research was a smaller exercise than the interviews, so I won’t go into it in much detail. It was solely conducted within the confines of Red Hat’s Desktop Team, purely for convenience’s sake, and was able to provide some initial numbers for use by the design team.

The survey was an opportunity to ask users about their number of running apps, open windows, workspace usage, and a number of other behavioural factors. The key goal of the exercise was to identify typical workloads as well as the extremes, in order to inform our design requirements around the shell and the overview.

For example, through this exercise, we found out that the modal number of open windows was 8, and the highest number was 28. We also found that most people (63%) were using a single workspace, and that the largest number of workspaces in use was 10.

Clearly, such a limited exercise has to be treated with a high degree of caution, but some quick preliminary numbers were a useful frame of reference and are something that we hope to build upon in the future.

The interviews

The rest of this post will be devoted to the interview exercise that we conducted. Here’s what we did…

We conducted seven interviews in total, each conducted remotely. Interviewees were recruited through the team’s professional contacts. We avoided talking to anyone who is directly involved in desktop development, and we tried to recruit research participants with a range of roles and technical backgrounds. We had some developers, but we also had people who work in sales, marketing and support.

During each interview, we got general background information about the interviewee and their computing activities. Then we conducted a screen sharing exercise in which the interviewee showed us how their desktop was set up. Then we ran through some exercises to see how they performed basic tasks.

We also used the interviews to ask evaluative questions, to see what the interviewee thought about the current shell design, and how it compares to other desktops.

We took notes on each interview. We mapped out the responses we got. We identified themes and patterns.

Research results

Let’s start by talking about the user behaviours that we observed. A quick summary:

  • The number of apps in use was generally low, 3-5 being the common range, both in the interviews and the survey.
  • Search was prominent as a way of launching apps, but we also observed occasional pointer-driven app launching (curiously, this seemed common when launching some apps and not others).
  • Engagement with the app grid was very low.
  • About half the interviewees had customised the favourites in the dash.
  • Workspace usage was quite mixed: some used workspaces extensively, while others never touched them [1, 2].

These findings have direct relevance to the design areas that the team are investigating. For example, they imply that our designs need to cater both to users who use workspaces extensively, and those who don’t.

The results also posed some conundrums. For instance, is lack of engagement with the current app grid an indication that its design needs to be improved, or that the feature should be demoted or relegated, somehow?

Likes and dislikes

While much of the research findings related to observed behaviours, the interviews also allowed us to ask evaluative research questions. We wanted to know how the existing GNOME Shell design is performing, and how it can be improved.

The main thing that came across here is that, overall, the existing design appeared to be working well for the users we spoke to. With one exception, all the interviewees thought that the design compared favourably with other desktops they’d used. (Of course, this insight needs to be taken with a pinch of salt [3].)

Most of the interviewees identified things that they like about the current shell design. They seemed to really like the search feature, in particular its speed, but also that it’s just a single keystroke away. The overview also received regular praise (one interviewee described it as “fantastic”).

The other thing that repeatedly came up as a positive aspect of the existing design is its non-distracting nature. Interviewees paid the shell compliments by saying things like “it’s not cluttered” and “it gets out of my way”.

The main issues with the current design that came out of the research were the frequent lack of engagement with the dash and the app grid. We also got a hint that onboarding might be a struggle for some, and we got a clear message that touchpad gestures need to be improved (confirming one of our initial hypotheses).

What’s next?

We are already planning additional rounds of research (which we will endeavour to blog about in a more timely fashion!). Some of these will be open to wider participation, so if you are interested, we would love to have you involved!

The design team is also exploring various concepts for the shell, which we hope to present in the not too distant future. Indeed, one of the reasons for all this research work is to help us be confident about the designs before we present them to the world! Watch this space.

Footnotes

[1] There was also behavioural variation within the group of workspace users: some had developed systems for themselves, with certain workspaces dedicated to particular activities, while others allowed their workspace usage to evolve organically as they worked.

[2] Workspace usage did not appear to be positively related to technical expertise. If anything, the opposite was true, with the less technically expert making more use of workspaces, not less.

[3] Some notes on representativeness and generalisability. How can only seven users provide useful results? Won’t we have missed critical user types? Won’t the sample be unrepresentative?

The first thing to say here is that small-N studies (those with a small number of participants) are valuable and can deliver significant insights, and have important advantages over large-N surveys. In particular, they allow accurate observations to be made, without making assumptions about what’s going to be seen.

Second, small-N studies like this allow relationships to be identified, and these can be used to infer qualities of a wider population. For example, if you observe that technically expert users tend to use the keyboard more, and you know that you have a lot of technically expert users, you can make a guess that keyboard use is important for the population as a whole.

Third and finally, experience tells us that small research studies can and do pick up on general trends and behaviours with remarkable speed (such as with the rule of five in user testing).

That said, we certainly do need to acknowledge the non-representative nature of our interview sample. We did include some relatively non-technical users, but the sample as a whole was skewed towards the more technical/professional end of the spectrum, and was taken from organisations that are invested in the Linux/GNOME desktop to varying degrees.

This doesn’t mean that the results we got are invalid. We are confident that our research findings are accurate with regards to the people we spoke to, and that those people are indicative of a larger section of the GNOME user population. The only question is whether there are other types of user who we didn’t include.

September 22, 2020

GtkSourceView Next

Earlier this year I started a branch to track GTK 4 development, which is targeted for release by the end of the year. I have just merged it, which means that our recently released gtksourceview-4-8 branch is going to be our LTS for GTK 3. As you might remember from the previous maintainer, GtkSourceView 4.x is the continuation of the GtkSourceView 3.x API with all the deprecated API removed and a number of API improvements.

Currently, GtkSourceView.Next is 5.x targeting the GTK 4.x API. It’s a bit of an unfortunate number clash, but it’s been fine for WebKit so we’ll see how it goes.

It’s really important that we start getting solid testing because GtkSourceView is used all over the place and is one of those “must have” dependencies when moving to a new GTK major ABI.

Preparations in GTK 4

Since I also spend time contributing to GTK, I decided to help revamp GtkTextView for GTK 4. My goal was to move various moving parts into GtkTextView directly so that we could make them more resilient.

Undo Support

One feature was undo support. GTK 4 now has native support for undo by implementing text history in a compact form within GTK itself. You can now set the enable-undo property to TRUE on GtkTextView, on GtkEditable widgets like GtkText or GtkEntry, and others.
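
For example, a minimal sketch (assuming an existing GtkTextView named view and a GtkText named text):

/* Enable the new built-in undo support */
gtk_text_view_set_enable_undo (GTK_TEXT_VIEW (view), TRUE);

/* Or via the property, on any widget that has it */
g_object_set (text, "enable-undo", TRUE, NULL);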

GPU Rendered Text (sort of)

Matthias Clasen and I sat down one afternoon last year and wrote a new PangoRenderer for GSK using render nodes and the texture atlas provided by the OpenGL and Vulkan renderers. Since then, GtkTextView gained a GtkTextLineDisplay cache so that we can keep these immutable render nodes around across multiple snapshots.

Text is still rendered on the CPU into a texture atlas, which is uploaded to the GPU and re-used when possible. Maybe someday things like pathfinder will provide a suitable future.

GtkTextView and Widgets

Previously, the gutters for GtkTextView were simply a GdkWindow which could be rendered to with Cairo. This didn’t fit well into the “everything should be a widget” direction for GTK 4. So now you can pack a widget into each of the four gutters around the edges of a GtkTextView, as sketched below. This means you can handle input better too, using GtkGesture and GtkEventControllers. More importantly, it means you can improve the performance of gutter rendering using snapshots and cached render nodes when it makes sense to do so.
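
For instance, placing a custom widget in the left gutter might look like this (line_numbers is a hypothetical widget):

gtk_text_view_set_gutter (GTK_TEXT_VIEW (view),
                          GTK_TEXT_WINDOW_LEFT,
                          line_numbers);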

Changes in GtkSourceView Next

Moving to a new major ABI is a great time to do cleanups too, as it will cause the least amount of friction. So I took this opportunity to revamp much of the GtkSourceView code. We follow more modern GObject practices and have bumped our compiler requirements to closely match GTK 4 itself. This still means no g_autoptr() usage from within GtkSourceView, sadly, thanks to MSVC being … well, the worst C compiler still in wide use.

GtkSourceGutterRenderer is now a GtkWidget

Now that we have margins which can contain widgets and contribute to the render node tree, both GtkSourceGutter and GtkSourceGutterRenderer are GtkWidgets. This means you will need to change custom gutter renderers a bit, but in practice they end up with a lot less code than before. It also makes supporting HiDPI much easier.

GtkSourceCompletion Revamp

I spent a lot of time making completion a pleasing experience in GNOME Builder and that work has finally made it upstream. To improve performance and simplicity of implementation, this has changed the GtkSourceCompletionProvider and GtkSourceCompletionProposal interfaces in significant ways.

GtkSourceCompletionProposal is now a mostly superfluous type used to denote a specialized GObject. It currently doesn’t have any functions in its vtable nor any properties, and the goal is to avoid adding them. Simply use G_IMPLEMENT_INTERFACE (GTK_SOURCE_TYPE_COMPLETION_PROPOSAL, NULL) when defining your proposal object’s GType, as in the sketch below.
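
A minimal example of such a proposal type might look like this (MyProposal and its field are hypothetical):

G_DECLARE_FINAL_TYPE (MyProposal, my_proposal, MY, PROPOSAL, GObject)

struct _MyProposal
{
  GObject parent_instance;
  char *text; /* whatever data your provider needs */
};

G_DEFINE_TYPE_WITH_CODE (MyProposal, my_proposal, G_TYPE_OBJECT,
                         G_IMPLEMENT_INTERFACE (GTK_SOURCE_TYPE_COMPLETION_PROPOSAL, NULL))

static void
my_proposal_class_init (MyProposalClass *klass)
{
}

static void
my_proposal_init (MyProposal *self)
{
}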

This is because all of the completion provider implementation can now be performed from GtkSourceCompletionProvider. This interface focuses on using interfaces like GListModel (like the rest of GTK 4) and on asynchronously generating and refining results as additional keys are pressed.

The completion window has been revamped and now allows proposals to fill a number of columns, including an icon, a return type (left-hand side), the typed text, and supplementary text. It resizes with its content and ensures that we only inflate the number of GObjects necessary to view the current set. A fixed number of widgets are also created, to reduce CSS and measurement costs.

Further, proposals may now have “alternates”, which allows providers to keep the 20 overloaded forms of each DoSomething() proposal, for each base type in whatever language of the day is being used, from clogging up the suggestions.

The new GtkSourceCompletionCell widget is a generic container used throughout completion for everything from containing icons, text, or even custom widgetry for the completion details popover.

Completion Preview

GtkSourceGutterLines

A new abstraction, GtkSourceGutterLines, was added to help reduce the overhead of generating content in the gutter. The old design of the gutters led to an exorbitant amount of measurement work on every frame; this was actually the biggest hurdle in making GTK 3 applications scroll smoothly. The new design allows all the renderers to collect information about lines in one pass (along with row height measurements) and then snapshot in their second pass. Combined with the ability to cache render nodes, gutter renderers should have what they need to remain fast even in HiDPI environments.

The implementation of this also has a few nice details to further reduce overhead, but I’ll leave that to those interested in reading the code.

GtkSourceBuffer::cursor-moved

GtkSourceBuffer now has a cursor-moved signal. This seemed to be something implemented all over the place, so we might as well have it upstream. Connecting to it is the usual GObject affair, as shown below.
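
A small sketch (the callback name is illustrative):

static void
on_cursor_moved (GtkSourceBuffer *buffer,
                 gpointer         user_data)
{
  /* react to the insertion cursor having moved */
}

g_signal_connect (buffer, "cursor-moved",
                  G_CALLBACK (on_cursor_moved), NULL);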

Reduce signal emission overhead

A number of places have had their signal emission overhead reduced, especially in property notifications.

Spaces Drawing

The GtkSourceSpaceDrawer now caches render nodes for drawing spaces. This should improve performance in the vast majority of cases. However, one case could still be improved upon: tabs whose width changes (generally when used after text or spaces).

New Features

Snippets

A new snippet engine has landed based on a much improved version from GNOME Builder. You can provide bundles using an XML snippets file. You can also create them dynamically from your application and insert them into the GtkSourceView. In fact, many completion providers are expected to do this.

The snippet language is robust and shares many features and implementation details from GNOME Builder.

Assistants

A new subsystem, GtkSourceAssistant is used to provide accessory information in a GtkSourceView. Currently this type is private and an implementation detail. However, GtkSourceCompletion and GtkSourceSnippet build upon it to provide some of their features. In the long term, we expect hover providers to also take advantage of this subsystem.

Sysprof Support

GtkSourceView now uses the Sysprof collector API just like GTK 4 does (among many other GNOME projects). This means you can get profiling information about renderings right in the Sysprof visualizer, alongside other data.

Future Work

PCRE2

With GRegex on the chopping block for deprecation, it’s time to start moving to PCRE2, much like VTE did. Doing so will not only make us deprecation-safe, but also ensure that we can actually use the JIT feature of the regex engine. With how much regexes are used by the highlighting engine, this should be a fairly sizable improvement.

This has now been implemented.

Hover Providers

In GNOME Builder, we added an abstraction for “hover providers”. This is also a thing in the Language Server Protocol realm. Nothing exists upstream in GtkSourceView for this, and that should probably change. Otherwise all the trickiness of making transient popovers work is put on application authors.

Style Schemes

I would like to remove or revamp some of our default style schemes. They do not handle the world of dynamic GTK themes so well and have become a constant source of bug reports from applications that want a “one size fits all” style scheme. I’m not sure yet what the complete right answer is long term, but my expectation is that we’d want to move toward a default style scheme that mostly makes font changes rather than color changes, which eventually fall apart on the more … interesting themes.

Anyway, that’s all for now!

Porting EBU R128 audio loudness analysis from C to Rust

Over the last few weeks I ported the libebur128 C library to Rust, both with a proper Rust API as well as a 100% compatible C API.

This blog post will be split into 4 parts that will be published over the next few weeks:

  1. Overview and motivation
  2. Porting approach with various details, examples and problems I ran into along the way
  3. Performance optimizations
  4. Building Rust code into a C library as drop-in replacement

If you’re only interested in the code, that can be found on GitHub and in the ebur128 crate on crates.io.

The initial versions of the ebur128 crate were built around the libebur128 C library (and included its code for ease of building); version 0.1.2 and newer are the pure Rust implementation.

EBU R128

libebur128 implements the EBU R128 loudness standard. The Wikipedia page gives a good summary of the standard, but in short it describes how to measure loudness of an audio signal and how to use this for loudness normalization.

While this intuitively doesn’t sound very complicated, there are lots of little details (like how human ears actually work) that make this not as easy as one might expect. As a result there are many different ways of measuring loudness, which is one of the reasons why this standard was introduced. Of course it is also not the only standard for this.

libebur128 is also the library that I used in the GStreamer loudness normalization plugin, about which I wrote a few weeks ago already. By porting the underlying loudness measurement code to Rust, the only remaining C dependency of that plugin is GStreamer itself.

Apart from that it is used by FFmpeg, although they include their own modified copy, as well as by many other projects that need some kind of loudness measurement and don’t use ReplayGain, an older but widely used standard for the same problem.

Why?

Before going over the details of what I did, let me first explain why I did this work at all. libebur128 is a perfectly well-working library, in wide use for a long time and probably rather bug-free at this point, and it was already possible to use the C implementation from Rust just fine. That’s what the initial versions of the ebur128 crate were doing.

My main reason for doing this was simply that it seemed like a fun little project. It isn’t a lot of code that changes often, so once ported it should be more or less finished, and it shouldn’t be much work to stay in sync with the C version. I had been thinking about doing this since the initial C-based ebur128 release, but reading Joe Neeman’s blog post about porting another C audio library (RNNoise) to Rust gave me the final push to actually start porting the code and to follow through until it was done.

However, don’t go around and ask other people to rewrite their projects in Rust (don’t be rude) or think that your own rewrite is magically going to be much faster and less buggy than the existing implementation. While Rust saves you from a big class of possible bugs, it doesn’t save you from yourself and usually rewrites contain bugs that didn’t exist in the original implementation. Also getting good performance in Rust requires, like in every other language, some effort. Before rewriting any software, think about the goals of this rewrite realistically as well as the effort required to actually get it finished.

Apart from fun there were also a few technical and non-technical reasons for me to look into this. I’m going to just list two here (curiosity and portability). I will skip the usual Rust memory-safety argument, as that seems less important with this code: the C code has been widely used for a long time, is not changing a lot and has easy-to-follow memory access patterns. While it definitely had a memory safety bug (see above), it was rather difficult to trigger and it was fixed in the meantime.

Curiosity

Personally and at my company Centricular we try to do any new projects where it makes sense in Rust. While this worked very well in the past and we got great results, there were some questions for future projects that I wanted to get some answers, hard data and personal experience for

  • How difficult is it to port a C codebase function by function to Rust while keeping everything working along the way?
  • How difficult is it to get the same or better performance with idiomatic Rust code for low-level media processing code?
  • How much bigger or smaller is the resulting code and do Rust’s higher-level concepts like iterators help to keep code concise?
  • How difficult is it to create a C-compatible library in Rust with the same API and ABI?

I have some answers to all these questions already, but previous work on this was not well structured and the results were not documented, which I’m trying to change here now: both to have a reference for myself in the future and to convince other people that Rust is a reasonable technology choice for such projects.

As you can see, the general pattern behind these questions is introducing Rust into an existing codebase, replacing existing components with Rust and writing new components in Rust, which also relates to my work on the Rust GStreamer bindings.

Portability

C is a very old language and, while there is a standard, each compiler has its own quirks and each platform provides different APIs on top of the bare minimum that the C standard defines. C itself is very portable, but it is not easy to write portable C code, especially when not using a library like GLib that hides these differences and provides basic data structures and algorithms.

This seems to be something that is often forgotten when the portability of C is given as an argument against Rust, and that’s the reason why I wanted to mention this here specifically. While you can get a C compiler basically everywhere, writing C code that also runs well everywhere is another story and C doesn’t make this easy by design. Rust on the other hand makes writing portable code quite easy in my experience.

In practice there were three specific issues I ran into with this codebase. Most of the advantages of Rust here stem from it being a new language that doesn’t have to carry a lot of historical baggage.

Mathematical Constants and Functions

Mathematical constants are not actually part of any C standard. Most compilers nonetheless define M_PI (for π), M_E (for e) and others in math.h, as they’re defined by POSIX and UNIX98.

Microsoft’s MSVC doesn’t, but instead you have to #define _USE_MATH_DEFINES before including math.h.
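
A common portable workaround looks roughly like this (a sketch; the fallback value is the standard one):

/* Must come before math.h on MSVC to get M_PI and friends */
#define _USE_MATH_DEFINES
#include <math.h>

/* Define it ourselves if it is still missing */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif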

While not a big problem per se, it is annoying, and it indeed caused the initial version of the ebur128 Rust crate to not compile with MSVC because I forgot about it.

Similarly, which mathematical functions are available depends a lot on the target platform and on which version of the C standard is supported. An example of this is the log10 function for calculating the base-10 logarithm. It is only available in POSIX and since C99, so for portability reasons libebur128 didn’t use it and instead calculated it via the natural logarithm (ln(x) / ln(10) = log10(x)). While C99 is from 1999, there are still many compilers out there that don’t fully support it, again most prominently MSVC until very recently.

Using log10 directly instead of going via the natural logarithm is faster and more precise, due to floating point number reasons, which is why the Rust implementation uses it. In C it would be necessary to check at build time whether the function is available, which complicates the build process and can easily be forgotten. libebur128 decided not to bother with these complications and simply doesn’t use it. Because of that, some conditional code in the Rust implementation is necessary to ensure that both implementations return the same results in the tests.
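
In C the two variants look like this (a small illustrative sketch; the second only needs the natural logarithm, which is always available):

#include <math.h>

double direct_log10 (double x)
{
  return log10 (x);            /* needs C99/POSIX */
}

double portable_log10 (double x)
{
  return log (x) / log (10.0); /* ln(x) / ln(10) == log10(x) */
}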

Data Structures

libebur128 uses a linked-list-based queue data structure. As the C standard library is very minimal, no collection data structures are included. However, on the BSDs, and also on Linux with the GNU C library, one is available in sys/queue.h.

Of course MSVC does not have this, and other compilers/platforms probably won’t have it either, so libebur128 includes a local copy of that queue implementation. When building, one then has to decide whether a system implementation is available or the internal version should be used. Or simply always use the internal version.
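
For illustration, declaring a queue with those BSD macros might look like this (struct names are made up; this is not the exact structure libebur128 uses):

#include <sys/queue.h>  /* BSD/glibc; MSVC has no equivalent */

struct entry {
  double value;
  STAILQ_ENTRY (entry) entries;  /* linkage within the queue */
};

STAILQ_HEAD (entry_queue, entry);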

Copying implementations of basic data structures and algorithms into every single project is ugly and error-prone, so let’s maybe not do that. C not having a standardized mechanism for dependency handling doesn’t help, which is unfortunately why this is so common in C projects.

One-time Initialization

Thread-safe one-time initialization is another thing that is not defined by the C standard, and depending on your platform there are different APIs available for it or none at all. POSIX again defines one that is widely available, but you can’t really depend on it unconditionally.
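
The POSIX API in question is pthread_once, which might be used roughly like this (function names are illustrative):

#include <pthread.h>

static pthread_once_t init_once = PTHREAD_ONCE_INIT;

static void
init_global_tables (void)
{
  /* fill global lookup tables, exactly once per process */
}

void
ensure_initialized (void)
{
  pthread_once (&init_once, init_global_tables);
}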

This complicates the code and build procedure, so libebur128 simply did not do that and instead performed its one-time initialization of some global arrays every time a new instance was created. That is probably fine, but a bit wasteful and, strictly speaking according to the C standard, not actually thread-safe.

The initial version of the ebur128 Rust crate side-stepped this problem by simply doing this initialization once with the API provided by the Rust standard library. See part 2 and part 3 of this blog post for some more details about this.

Easier to Compile and Integrate

A Rust port only requires a Rust compiler, a mixed C/Rust codebase requires at least a C compiler in addition and some kind of build system for the C code.

libebur128 uses CMake, which would be an additional dependency, so in the initial version of the ebur128 crate I went via cargo’s build.rs build scripts and the cc crate, as building libebur128 is easy enough. This works, but build scripts are problematic for integrating the Rust code into build systems other than cargo.

The Rust port also makes use of conditional compilation in various places. Unlike C, with its preprocessor, its non-standardized and inconsistent platform #defines, and the need to integrate everything into the build system in a custom way, Rust has a principled and well-designed approach to this problem. This makes it easier to keep the code clean and maintainable, and more portable.

In addition to the build system related simplifications, not having any C code makes it much easier to compile the code for other targets like WebAssembly, which is natively supported by Rust. It is also possible to compile C to WebAssembly, but getting both toolchains to agree with each other and produce compatible code seems not very easy.

Overview

As mentioned above, the code can be found on GitHub and in the ebur128 crate on crates.io.

The current version of the code produces exactly the same results as the C version. This is enforced by quickcheck tests that run randomized inputs through both versions and check that the results are the same. The code also passes all the tests in the EBU loudness test set, so it should hopefully be standards-compliant as long as the test implementation is not wrong.

Performance-wise the Rust implementation is at least as fast as the C implementation. In some configurations it’s a few percent faster but probably not enough that it actually matters in practice. There are various benchmarks for both versions in different configurations available. The benchmarks are based on the criterion crate, which uses statistical methods to give as accurate as possible results. criterion also generates nice results with graphs for making analysis of the results more pleasant. See part 3 of this blog post for more details.

Writing tests and benchmarks for Rust is so much easier and feels more natural than doing it in C, so the Rust implementation now has quite good coverage of the different code paths. In particular, no struggling with build systems was necessary like it would have been in C, thanks to cargo and Rust’s built-in support. This alone seems to have the potential to give Rust code, on average, better quality than similar code written in C.

It is also possible to compile the Rust implementation into a C library with the great cargo-c tool. This easily builds the code as a static/dynamic C library and installs the library, a C header file and also a pkg-config file. With this, the Rust implementation is a 100% drop-in replacement for the C libebur128. It is not even necessary to recompile existing code. See part 4 of this blog post for more details.

Dependencies

Apart from the Rust standard library, the Rust implementation depends on two other small and widely used crates. Unlike with C, depending on external dependencies is rather simple with Rust and cargo. The two crates in question are

  • smallvec for dynamically sized vectors/arrays that can be stored on the stack up to a certain size, and only then fall back to heap allocations. This avoids a couple of heap allocations under normal usage.
  • bitflags, which provides a macro for implementing properly typed bitflags. This is used in the constructor of the main type for selecting the features and modes that should be enabled, which directly maps to how the C API works (just with less type-safety).

Unsafe Code

A common question when announcing a Rust port of some C library is how much unsafe code was necessary to reach the same performance as the C code. In this case there are two uses of unsafe code, leaving aside the FFI code that calls the C implementation in the tests/benchmarks and the C API.

Resampler

The True Peak measurement uses a resampler to upsample the audio signal to a higher sample rate. As part of the innermost loop of the resampler, a statically sized ringbuffer is used.

As part of that ringbuffer, explicit indexing of a slice is needed. While the indexes are already manually checked to wrap around when needed, the Rust compiler and LLVM can’t figure that out, so additional bounds checks plus panic handling are present in the compiled code. Apart from slowing down the loop with the additional condition, the panic code also causes the whole loop to be optimized less well.

So, to get around that, unsafe indexing into the slice is used for performance reasons. While this means a human now has to check the memory safety of the code instead of relying on the compiler, the code in question is simple and small enough that it shouldn’t be a problem in practice.

More on this in part 2 and part 3 of this blog post.

Flushing Denormals to Zero

The other use of unsafe code is in the filter that is applied to the incoming audio signal. On x86/x86-64 the MXCSR register temporarily gets the _MM_FLUSH_ZERO_ON bit set to flush denormal floating point numbers to zero. That is, denormals (i.e. very small numbers close to zero) resulting from any floating point operation are treated as zero.

This happens both for performance reasons as well as correctness reasons. Operations on denormals are generally much slower than on normalized floating point numbers. This has a measurable impact on the performance in this case.

The C library also does the same, and not flushing denormals to zero would lead to slightly different results. While this difference doesn’t matter in practice, as it’s very, very small, it would make it harder to compare the results of both implementations, as they wouldn’t be as close to each other anymore.

Doing this affects every floating point operation that happens while that bit is set, but because the only affected operations are the ones performed by this crate, and it’s guaranteed that the bit is unset again (even in case of panics) before leaving the filter, this shouldn’t cause any problems for other code.
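
On x86 with SSE, the save/set/restore pattern described above looks roughly like this in C (a sketch of the approach, not the exact library code):

#include <xmmintrin.h>

unsigned int old_mode = _MM_GET_FLUSH_ZERO_MODE ();
_MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
/* ... run the filter; denormal results are flushed to zero ... */
_MM_SET_FLUSH_ZERO_MODE (old_mode);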

Additional Features

Once the C library was ported and performance was comparable to the C implementation, I briefly checked the issues reported against the C library to see if there were any useful feature requests or bug reports that I could implement / fix in the Rust implementation. There were three, one of which I also wanted for a future project.

None of the new features are available via the C API at this point for compatibility reasons.

Resetting the State

For this one there was already a PR for the C library. Previously the only way to reset all measurements was to create a new instance, which involves new memory allocations, filter initialization, etc.

It’s easy enough to provide a reset method that does only the minimal work required to reset all measurements and restart with a fresh state, so I’ve added that to the Rust implementation.

Fix set_max_window() to actually work

This was a bug introduced in the C implementation a while ago in an attempt to prevent integer overflows when calculating the sizes of memory allocations, which would then cause memory safety bugs because less memory was allocated than expected. This fix accidentally restricted the allowed values for the maximum window size too much. There is a PR for fixing this in the C implementation.

On the Rust side this bug also existed because I simply ported over the checks. If I hadn’t ported over the checks, or had ported an earlier version without them, there fortunately wouldn’t have been any memory safety bug on the Rust side; instead, one of two situations would have happened

  1. In debug builds, integer overflows cause a panic, so instead of allocating less memory than expected while setting the parameters, there would have been a panic immediately, rather than invalid memory accesses later.
  2. In release builds, integer overflows simply wrap around for performance reasons. This would have caused less memory than expected to be allocated, but later there would have been a panic when trying to access memory outside the allocated area.

While a panic is also not nice, it at least leads to no undefined behaviour and prevents worse things from happening.

The proper fix in this case was to not restrict the maximum window size statically, but to instead check for overflows during the calculations. This is the same as what the PR for the C implementation does, but on the Rust side it is much easier because of built-in operations like checked_mul for overflow-checking multiplication. In C this requires some rather convoluted code (check the PR for details), along the lines of the sketch below.
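
An overflow-checking size multiplication in C typically looks something like this (a sketch, not the exact code from the PR):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool
checked_mul_size (size_t a, size_t b, size_t *out)
{
  if (a != 0 && b > SIZE_MAX / a)
    return false;  /* a * b would overflow */
  *out = a * b;
  return true;
}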

Support for Planar Audio Input

The last additional feature I implemented was support for planar audio input, for which a PR to the C implementation also exists already.

Most of the time audio signals have the samples of each channel interleaved with each other, so for example for stereo you have an array of samples with the first sample of the left channel, the first sample of the right channel, the second sample of the left channel, etc. While this representation has some advantages, in other situations it is easier or faster to work with planar audio: the samples of each channel are contiguous one after another, so you have e.g. first all the samples of the left channel one after another and only then all the samples of the right channel.
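
In terms of memory layout, fetching sample s of channel c differs as follows (a small illustrative sketch):

#include <stddef.h>

/* Interleaved: L0 R0 L1 R1 ... */
static float
interleaved_get (const float *buf, size_t n_channels, size_t s, size_t c)
{
  return buf[s * n_channels + c];
}

/* Planar: L0 L1 ... followed by R0 R1 ... */
static float
planar_get (const float *buf, size_t n_frames, size_t s, size_t c)
{
  return buf[c * n_frames + s];
}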

The PR for the C implementation does this with some duplication of existing macro code (which could be prevented by making the macros more complicated); on the Rust side I implemented it without any code duplication by adding an internal abstraction for interleaved/planar audio and for iterating over the samples, and then working with that in normal, generic Rust code. This required some minor refactoring and code reorganization, but in the end it was rather painless. Note that most of the change is the addition of new tests and moving some code around.

When looking at the Samples trait, the main part of this refactoring, one might wonder why I used closures instead of Rust iterators for iterating over the samples; the reason is, unfortunately, performance. More on this in part 3 of this blog post.

Next Part

In the next part of this blog post I will describe the porting approach in detail, give various examples of how to port C code to idiomatic Rust, and cover some of the problems I ran into.

September 17, 2020

Games 3.38

Games 3.38.0

I wanted to start this blog post with “It’s that time of year again”, but it looks like Michael beat me to it. So, let’s take a look at some of the changes in GNOME Games 3.38:

retro-gtk 1.0

The library Games uses to implement its Libretro frontend, retro-gtk, has been overhauled this cycle. I’ve already covered the major changes in a previous blog post, but to recap:

  • Cores now run in a separate process. This provides better isolation: a crashing core will display an error screen instead of crashing the whole app. It can also improve performance in case the window takes a long time to redraw, for example with fractional scaling.
  • Libretro cores that require OpenGL should now work correctly.
  • The core timing should now be more accurate.
  • Fast-forwarding should actually work, although we don’t make use of it in Games at the moment.

Finally, retro-gtk now has proper docs, published here to go along with the stable API.

Nintendo 64 support

The Legend of Zelda: Ocarina of Time running in Games 3.38, with controller pak switcher open

This was on the radar for a while, but wasn’t possible because all available Nintendo 64 cores use hardware acceleration. Now that retro-gtk supports OpenGL, we can ship the ParaLLEl N64 core, and it just works. There’s even a menu to switch between the Controller Pak and the Rumble Pak.

Unfortunately, retro-gtk still doesn’t support Vulkan rendering, so we can’t enable the fast and accurate ParaLLEl RDP renderer yet.

Collections

Collections in Games 3.38

Neville Antony has been working on implementing collections as part of his GSoC project. So far we have favorites, recently played games and user-created collections.

You can read more about the project in Neville’s blog.

Faster loading

In my previous blog post I mentioned some improvements to the app startup time, but noticeable progress has been made since then. Newly added aggressive caching allows the app to show the collection almost instantly after the first run, and then scan and update the collection in the background.

Search provider

Search Provider in Games 3.38

Having a complete cache of the collection also made it easy to implement a fast search provider, so the collection can now be searched directly from GNOME Shell search.

Nintendo DS screen gap

Nintendo DS has two screens. Most games use one of the screens to display the game itself, and the other one for status or a map. However, some games use both screens and rely on the fact that they are arranged vertically with a gap between them.

If the two screens are shown immediately adjacent to each other, these games look confusing, so in 3.38 a screen gap will be automatically added when using vertical mode.

Left: Sonic Colors in Games 3.36. Right: Sonic Colors in Games 3.38.

The size of the screen gap is optimized for a few known games, such as Contra 4, Sonic Colors and Yoshi’s Island DS; other games use a generic value of 80 pixels.
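
Purely as an illustration of the logic (this is not the actual Games code, and the tuned per-game values are not published here), the behavior amounts to a small lookup table with a generic fallback:

#include <stddef.h>
#include <string.h>

#define GENERIC_SCREEN_GAP_PX 80

typedef struct {
  const char *game_id;  /* hypothetical key; the real code may identify games differently */
  int gap_px;           /* tuned gap size; the actual values are not given in this post */
} KnownGap;

static int
screen_gap_for_game (const KnownGap *known, size_t n_known, const char *game_id)
{
  for (size_t i = 0; i < n_known; i++)
    if (strcmp (known[i].game_id, game_id) == 0)
      return known[i].gap_px;

  return GENERIC_SCREEN_GAP_PX;
}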

If you know another game that needs to use a specific gap size, please mention it in the comments or open an issue on GitLab.

Gamepad hot plugging in Flatpak

Previously, a major limitation of the Flatpak version was that gamepads had to be plugged in before starting the app. Thanks to a change in libmanette 0.2.4, this has been fixed, and gamepads can now be connected or disconnected at any time.

Technically, this isn’t a 3.38 addition, and in fact the 3.36.1 build on Flathub already supports it. Nevertheless, it was done since my last blog post and so here we are. :)

Miscellaneous changes

Games covers in Games 3.38

Game covers look prettier thanks to Neville: they are now rounded and use a blurred version of the cover as the background.

Preferences window in Games 3.38

The preferences window has been overhauled to be more similar to libhandy-style preferences; unfortunately, we can’t use HdyPreferencesWindow yet because of some missing functionality.

Controller testing in Games 3.38

When testing or re-mapping a controller, analog stick indicators now show the stick’s precise position instead of just highlighting one of the edges.

Search empty state in Games 3.38

Veerasamy Sevagen implemented an empty state screen for the collection search.

Swipe to go back in Games 3.38

Swipe gestures to go back are now supported throughout the app. The only place that still lacks the gesture is exiting from a running game, because the game may require pointer input, and having a swipe there could cause conflicts.

Getting Games


As always, the latest version of the app is available on Flathub.

September 16, 2020

The GNOME Extensions Rebooted Initiative

With the release of GNOME 3.38, we want to spend the next cycle focusing on improving the GNOME extensions experience.

I’m using my blog for now, but we will have a dedicated extensions blog where we can start discussing what’s going on in this important space.

What Extensions Rebooted Initiative is about

It is no surprise that a large number of extensions break after each release, which causes a lot of friction in the community.

Extensions Rebooted is a collaborative effort to address the issues around the GNOME Shell extension ecosystem. We want to start addressing this by making a number of policy changes and technological improvements while building a sustainable community.

Here are some highlights of how we plan to create a better experience for GNOME extensions:

  • Properly documenting how extensions work, what the reasonable expectations for an extension developer are, and how to participate in the GNOME extensions community.
  • Building a CI pipeline (a virtual machine) for extension writers to test their extensions prior to GNOME releases.
  • Centralizing extensions for breakage testing in the GNOME GitLab space.
  • Creating a forum for extension developers and writers to work together during the GNOME release cycle.

For a fuller picture of this project, check out the Extensions Rebooted BoF from the last GUADEC as well as my GUADEC talk.

The Extensions Rebooted initiative’s ultimate goal is to get the extensions community to work together, build closer ties with GNOME Shell developers, and provide documentation and tools.

Extension writers are encouraged to get involved and build this better experience. Consumers of extensions are requested to help spread the word and encourage extensions developers to participate so we can all benefit.

To get involved:

GNOME Discourse:

  • Use the “extensions” tag when submitting questions about extensions.

Chat:

GitLab:

GNOME extensions cannot succeed without participation and contributions from the community, so I hope that all of us who write extensions, who are interested in providing technical documentation, or who have experience with CI pipelines and devops can come together and make extensions a sustainable part of the GNOME ecosystem.

The next post will talk about using a pre-built VM image that extension developers can use to test their extensions and have them ready to be used prior to GNOME 3.38 appearing on distributions.

Epiphany 3.38 and WebKitGTK 2.30

It’s that time of year again: a new GNOME release, and with it, a new Epiphany. The pace of Epiphany development has increased significantly over the last few years thanks to an increase in the number of active contributors. Most notably, Jan-Michael Brummer has solved dozens of bugs and landed many new enhancements, Alexander Mikhaylenko has polished numerous rough edges throughout the browser, and Andrei Lisita has landed several significant improvements to various Epiphany dialogs. That doesn’t count the work that Igalia is doing to maintain WebKitGTK, the WPE graphics stack, and libsoup, all of which is essential to delivering quality Epiphany releases, nor the work of the GNOME localization teams to translate it to your native language. Even if Epiphany itself is only the topmost layer of this technology stack, having more developers working on Epiphany itself allows us to deliver increased polish throughout the user interface layer, and I’m pretty happy with the result. Let’s take a look at what’s new.

Intelligent Tracking Prevention

Intelligent Tracking Prevention (ITP) is the headline feature of this release. Safari has had ITP for several years now, so if you’re familiar with how ITP works to prevent cross-site tracking on macOS or iOS, then you already know what to expect here.  If you’re more familiar with Firefox’s Enhanced Tracking Protection, or Chrome’s nothing (crickets: chirp, chirp!), then WebKit’s ITP is a little different from what you’re used to. ITP relies on heuristics that apply the same to all domains, so there are no blocklists of naughty domains that should be targeted for content blocking like you see in Firefox. Instead, a set of innovative restrictions is applied globally to all web content, and a separate set of stricter restrictions is applied to domains classified as “prevalent” based on your browsing history. Domains are classified as prevalent if ITP decides the domain is capable of tracking your browsing across the web, or non-prevalent otherwise. (The public-friendly terminology for this is “Classification as Having Cross-Site Tracking Capabilities,” but that is a mouthful, so I’ll stick with “prevalent.” It makes sense: domains that are common across many websites can track you across many websites, and domains that are not common cannot.)

ITP is enabled by default in Epiphany 3.38, as it has been for several years now in Safari, because otherwise only a small minority of users would turn it on. ITP protections are designed to be effective without breaking too many websites, so it’s fairly safe to enable by default. (You may encounter a few broken websites that have not been updated to use the Storage Access API to store third-party cookies. If so, you can choose to turn off ITP in the preferences dialog.)

For a detailed discussion covering ITP’s tracking mitigations, see Tracking Prevention in WebKit. I’m not an expert myself, but the short version is this: full third-party cookie blocking across all websites (to store a third-party cookie, websites must use the Storage Access API to prompt the user for permission); cookie-blocking latch mode (“once a request is blocked from using cookies, all redirects of that request are also blocked from using cookies”); downgraded third-party referrers (“all third-party referrers are downgraded to their origins by default”) to avoid exposing the path component of the URL in the referrer; blocked third-party HSTS (“HSTS […] can only be set by the first-party website […]”) to stop abuse by tracker scripts; detection of cross-site tracking via link decoration and 24-hour expiration time for all cookies created by JavaScript on the landing page when detected; a 7-day expiration time for all other cookies created by JavaScript (yes, this applies to first-party cookies); and a 7-day extendable lifetime for all other script-writable storage, extended whenever the user interacts with the website (necessary because tracking companies began using first-party scripts to evade the above restrictions). Additionally, for prevalent domains only, domains engaging in bounce tracking may have cookies forced to SameSite=strict, and Verified Partitioned Cache is enabled (cached resources are re-downloaded after seven days and deleted if they fail certain privacy tests). Whew!

WebKit has many additional privacy protections not tied to the ITP setting and therefore not discussed here — did you know that cached resources are partitioned based on the first-party domain? — and there’s more that’s not very well documented which I don’t understand and haven’t mentioned (tracker collusion!), but that should give you the general idea of how sophisticated this is relative to, say, Chrome (chirp!). Thanks to John Wilander from Apple for his work developing and maintaining ITP, and to Carlos Garcia for getting it working on Linux. If you’re interested in the full history of how ITP has evolved over the years to respond to the changing threat landscape (e.g. tracking prevention tracking), see John’s WebKit blog posts. You might also be interested in WebKit’s Tracking Prevention Policy, which I believe is the strictest anti-tracking stance of any major web engine. TL;DR: “we treat circumvention of shipping anti-tracking measures with the same seriousness as exploitation of security vulnerabilities. If a party attempts to circumvent our tracking prevention methods, we may add additional restrictions without prior notice.” No exceptions.

Updated Website Data Preferences

As part of the work on ITP, you’ll notice that Epiphany’s cookie storage preferences have changed a bit. Since ITP enforces full third-party cookie blocking, it no longer makes sense to have a separate cookie storage preference for that, so I replaced the old tri-state cookie storage setting (always accept cookies, block third-party cookies, block all cookies) with two switches: one to toggle ITP, and one to toggle all website data storage.

Previously, it was only possible to block cookies, but this new setting will additionally block localStorage and IndexedDB, web features that allow websites to store arbitrary data in your browser, similar to cookies. It doesn’t really make much sense to block cookies but allow other types of data storage, so the new preferences should better enforce the user’s intent behind disabling cookies. (This preference does not yet block media keys, service workers, or legacy offline web application cache, but it probably should.) I don’t really recommend disabling website data storage, since it will cause compatibility issues on many websites, but this option is there for those who want it. Disabling ITP is also not something I want to recommend, but it might be necessary to access certain broken websites that have not yet been updated to use the Storage Access API.

Accordingly, Andrei has removed the old cookies dialog and moved cookie management into the Clear Personal Data dialog, which is a better place because anyone clearing cookies for a particular website is likely to also want to clear other personal data. (If you want to delete a website’s cookies, then you probably don’t want to leave its SQL databases intact, right?) He had to remove the ability to clear data from a particular point in time, because WebKit doesn’t support this operation for cookies, but that function is probably rarely used and I think the benefit of the change should outweigh the cost. (We could bring it back in the future if somebody wants to try implementing that feature in WebKit, but I suspect not many users will notice.) Treating cookies as separate and different from other forms of website data storage no longer makes sense in 2020, and it’s good to have finally moved on from that antiquated practice.

New HTML Theme

Carlos Garcia has added a new Adwaita-based HTML theme to WebKitGTK 2.30, and removed support for rendering HTML elements using the GTK theme (except for scrollbars). Trying to use the GTK theme to render web content was fragile and caused many web compatibility problems that nobody ever managed to solve. The GTK developers were never very fond of us doing this in the first place, and the foreign drawing API required to do so has been removed from GTK 4, so this was also good preparation for getting WebKitGTK ready for GTK 4. Carlos’s new theme is similar to Adwaita, but gradients have been toned down or removed in order to give a flatter, neutral look that should blend in nicely with all pages while still feeling modern.

This should be a fairly minor style change for Adwaita users, but a very large change for anyone using custom themes. I don’t expect everyone will be happy, but please trust that this will at least result in better web compatibility and fewer tricky theme-related bug reports.

Screenshot demonstrating new HTML theme vs. GTK theme
Left: Adwaita GTK theme controls rendered by WebKitGTK 2.28. Right: hardcoded Adwaita-based HTML theme with toned down gradients.

Although scrollbars will still use the GTK theme as of WebKitGTK 2.30, that will no longer be possible to do in GTK 4, so themed scrollbars are almost certain to be removed in the future. That will be a noticeable disappointment in every app that uses WebKitGTK, but I don’t see any likely solution to this.

Media Permissions

Jan-Michael added new API in WebKitGTK 2.30 to allow muting individual browser tabs, and hooked it up in Epiphany. This is good when you want to silence just one annoying tab without silencing everything.
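
From the application side this is a small amount of code; here is a minimal sketch, assuming the new WebKitGTK 2.30 muting calls work as I describe:

#include <webkit2/webkit2.h>

/* Toggle audio for a single web view, i.e. one browser tab. */
static void
toggle_tab_mute (WebKitWebView *web_view)
{
  gboolean muted = webkit_web_view_get_is_muted (web_view);

  webkit_web_view_set_is_muted (web_view, !muted);
}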

Meanwhile, Charlie Turner added WebKitGTK API for managing autoplay policies. Videos with sound are now blocked from autoplaying by default, while videos with no sound are still allowed. Charlie hooked this up to Epiphany’s existing permission manager popover, so you can change the behavior for websites you care about without affecting other websites.
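
Here is a rough sketch of what setting a default policy might look like, assuming the new WebKitWebsitePolicies API and the construct-only website-policies property behave as I describe:

#include <webkit2/webkit2.h>

/* Create a web view whose default autoplay policy permits videos
 * without sound and blocks videos with sound. */
static GtkWidget *
create_web_view_with_autoplay_policy (void)
{
  WebKitWebsitePolicies *policies =
    webkit_website_policies_new_with_policies ("autoplay",
                                               WEBKIT_AUTOPLAY_ALLOW_WITHOUT_SOUND,
                                               NULL);

  return g_object_new (WEBKIT_TYPE_WEB_VIEW,
                       "website-policies", policies,
                       NULL);
}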

Screenshot displaying new media autoplay permission settings
Configure your preferred media autoplay policy for a website near you today!

Improved Dialogs

In addition to his work on the Clear Data dialog, Andrei has also implemented many improvements and squashed bugs throughout each view of the preferences dialog, the passwords dialog, and the history dialog, and refactored the code to be much more maintainable. Head over to his blog to learn more about his accomplishments. (Thanks to Google for sponsoring Andrei’s work via Google Summer of Code, and to Alexander for help mentoring.)

Additionally, Adrien Plazas has ported the preferences dialog to use HdyPreferencesWindow, bringing a pretty major design change to the view switcher:

Screenshot showing changes to the preferences dialog
Left: Epiphany 3.36 preferences dialog. Right: Epiphany 3.38. Note the download settings are present in the left screenshot but missing from the right screenshot because the right window is using flatpak, and the download settings are unavailable in flatpak.

User Scripts

User scripts (like Greasemonkey) allow you to run custom JavaScript on websites. WebKit has long offered user script functionality alongside user CSS, but previous versions of Epiphany only exposed user CSS. Jan-Michael has added the ability to configure a user script as well. To enable it, visit the Appearance tab in the preferences dialog (a somewhat odd place, but it really needs to live next to user CSS given the tight relationship there). Besides allowing you to do, well, basically anything, this also significantly enhances the usability of user CSS, since you can now apply certain styles only to particular websites. The UI is a little primitive: your script (like your CSS) has to be one file that will run on every website, so don’t try to build a complex codebase in your user script. But you can use conditional statements to limit execution to specific websites as you please, so it should work fairly well for anyone who needs it. I fully expect 99.9% of users will never touch user scripts or user styles, but it’s nice for power users to have these features available if needed.
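
For example, a user script that should only act on a single site can simply check the location first (a hypothetical snippet, not anything shipped with Epiphany):

// The one user script runs on every page, so gate any site-specific
// behavior on the host name.
if (window.location.host === "example.com") {
    // Hide a hypothetical element only on example.com.
    const banner = document.querySelector("#annoying-banner");
    if (banner)
        banner.style.display = "none";
}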

HTTP Authentication Password Storage

Jan-Michael and Carlos Garcia have worked to ensure HTTP authentication passwords are now stored in Epiphany’s password manager rather than by WebKit, so they can now be viewed and deleted from Epiphany; this required some new WebKitGTK API to do properly. Unfortunately, WebKitGTK saves network passwords using the default network secret schema, meaning the passwords saved by older versions of Epiphany are effectively stranded: we have no way to know which application owns those passwords, so we cannot tell which were stored by WebKit and which can be safely managed by Epiphany going forward. Accordingly, all previously-stored HTTP authentication passwords are no longer accessible; you’ll have to use seahorse to look them up manually if you need to recover them. HTTP authentication is not very commonly used nowadays except on internal corporate domains, so hopefully this one-time migration snafu will not be a major inconvenience to most users.

New Tab Animation

Jan-Michael has added a new animation when you open a new tab. If the newly-created tab is not visible in the tab bar, then the right arrow will flash to indicate success, letting you know that you actually managed to open the page. Opening tabs out of view happens too often currently, but at least it’s a nice improvement over not knowing whether you actually managed to open the tab or not. This will be improved further next year, because Alexander is working on a completely new tab widget to replace GtkNotebook.

New View Source Theme

Jim Mason changed view source mode to use a highlight.js theme designed to mimic Firefox’s syntax highlighting, and added dark mode support.

Image showing dark mode support in view source mode
Embrace the dark.

And More…

  • WebKitGTK 2.30 now supports video formats in image elements, thanks to Philippe Normand. You’ll notice that short GIF-style videos will now work on several major websites where they previously didn’t.
  • I added new WebKitGTK 2.30 API to expose the paste-as-plaintext editor command, which was previously internal but fully functional. I’ve hooked it up in Epiphany’s context menu as “Paste Text Only” (see the sketch after this list). This is nice when you want to discard markup when pasting into a rich text editor (such as the WordPress editor I’m using to write this post).
  • Jan-Michael has implemented support for reordering pinned tabs. You can now drag to reorder pinned tabs any way you please, subject to the constraint that all pinned tabs stay left of all unpinned tabs.
  • Jan-Michael added a new import/export menu, and the bookmarks import/export features have moved there. He also added a new feature to import passwords from Chrome. Meanwhile, ignapk added support for importing bookmarks from HTML (compatible with Firefox).
  • Jan-Michael added a new preference to allow web apps to run in the background. When enabled, closing the window only hides it: everything continues running. This is useful for mail apps, music players, and similar applications.
  • Continuing Jan-Michael’s list of accomplishments, he removed Epiphany’s previous hidden setting for a mobile user agent header after discovering that it did not work properly, and replaced it by adding support in WebKitGTK 2.30 for automatically setting a mobile user agent header depending on the chassis type detected by logind. This results in a major user experience improvement when using Epiphany as a mobile browser. Beware: this functionality currently does not work in Flatpak, because it requires the creation of a new desktop portal.
  • Stephan Verbücheln has landed multiple fixes to improve display of favicons on hidpi displays.
  • Zach Harbort fixed a rounding error that caused the zoom level to display oddly when changing zoom levels.
  • Vanadiae landed some improvements to the search engine configuration dialog (with more to come) and helped investigate a crash that occurs when using the “Set as Wallpaper” function under Flatpak. The crash is pretty tricky, so we wound up disabling that function under Flatpak for now. He also updated screenshots throughout the user help.
  • Sabri Ünal continued his effort to document and standardize keyboard shortcuts throughout GNOME, adding a few missing shortcuts to the keyboard shortcuts dialog.

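As promised above, here is a minimal sketch of the paste-as-plaintext hookup, assuming the editing command added in WebKitGTK 2.30 works as I describe:

#include <webkit2/webkit2.h>

/* Implement a "Paste Text Only" action by executing the
 * plain-text paste editing command. */
static void
paste_text_only (WebKitWebView *web_view)
{
  webkit_web_view_execute_editing_command (web_view,
                                           WEBKIT_EDITING_COMMAND_PASTE_AS_PLAIN_TEXT);
}
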
Epiphany 3.38 will be the final Epiphany 3 release, concluding a decade of releases that started with 3. We will match GNOME in following a new versioning scheme going forward, dropping the leading 3 and the confusing even/odd versioning. Onward to Epiphany 40!