
February 22, 2018

Fedora Atomic Workstation: Works on the beach

My trip is getting really close, so I decided to upgrade my system to rawhide. Wait, what? That is usually what everybody would tell you not to do. Rawhide has a reputation for frequent breakage, and who knows if my apps will work on any given day. Not something you want to deal with while traveling.


With rpm-ostree, installing a newer OS is very similar to updating your current OS: a new image is downloaded in the background, and when the download is complete, you boot into the new image. The previous image is still available to boot back into, as a safety net. That is the reason I felt confident enough to try this a day before a major trip:

rpm-ostree rebase \
   fedora-ws-rawhide:fedora/rawhide/x86_64/workstation
systemctl reboot

I would love to say that things went perfectly and I was back to a working system in 10 minutes. But it was not quite as easy, and I did encounter a few (solvable) problems. It is worth pointing out that while I was solving these problems, rpm-ostree had already downloaded the entire rawhide image, but I was still safely running my F27 OS. At no point was there a mess of a half-upgraded system with a mix of old and new rpms. I was running my old system until I had solved all the problems and had an OS image that was ready, and then I booted into it. A safe, atomic switch.
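
(Had the new image turned out to be unusable, getting back would have been a single command, since rpm-ostree keeps the previous deployment around:)

rpm-ostree rollback
systemctl reboot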

Problem 1: The rpmfusion repo is not available for f28 yet. It is a common occurrence that 3rd party repositories lag a bit behind Fedora releases, so this is not surprising. It is a bit unfortunate that I had to remove the rpms I had layered from that repository to work around this.

Problem 2: buildah is now in the base image. This is a good thing, of course, but it caused rpm-ostree to complain about the conflict between the OS image and my layered package. In this case, I removed the layered rpm without any qualms.

Problem 3: Rawhide repositories had a bad day. For some reason, they were missing the repomd.xml file today.

This is a good reminder that as long as you are using package layering, you haven’t really left the world of yum repositories and out-of-sync mirrors behind. rpm-ostree has to check the yum repositories for updates to the layered packages, which means that it can be hit by the same issues as dnf on a traditional Fedora workstation.

For my rebase to proceed, I had to remove everything that was layered on top of the OS image. After I did that, rpm-ostree no longer needed to look at yum repositories, and switched my system to the already-downloaded rawhide image.
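
Removing a layered package is itself just another rpm-ostree operation; for example, something like this handles the buildah case above:

rpm-ostree uninstall buildah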

After the reboot, I’m now running rawhide… and all my applications are just the same as they were before. A nice aspect of the Atomic Workstation approach is that (flatpak) applications are decoupled from the OS. I can update the one without the other. We are not entirely there yet: as you can see in the screenshot below, a number of applications are still installed as part of the OS image.

More importantly, the screenshot shows that GNOME Software will support updating the OS on the Atomic Workstation in Fedora 28. It does so by talking to the rpm-ostree daemon.

Switching from one Fedora release to the next is already working pretty well as of the last few releases. With the Atomic Workstation, it can become as undramatic as installing the latest updates.

One could almost do it on the beach.

1+ year of Fedora and GNOME hardware enablement

A year and a couple of months ago, Christian Schaller asked me to pivot a little bit from working full time on Fleet Commander to manage a new team we were building to work on client hardware enablement for Fedora and GNOME, with an emphasis on upstream. The idea was to fill the gap in the organization where nobody really owned the problem of bringing up new client hardware features vertically across the stack (from the shell down to the kernel); in other words, to ensure Fedora and GNOME both work great on modern laptops. Part of that deal was to take over the bootloader and start working closer to customers and hardware manufacturing partners.

Sound Blaster 16 PnP, Danipuntocom @ Flickr, CC BY-NC 2.0

At first I hesitated, as I wasn’t sure I could do a good job; y’all know how impostor syndrome works, especially outside of the comfort zone. I also had very little engineering experience in kernel or hardware related fields beyond the hardware design I did at uni.

However, after some thinking, I decided this was a terribly exciting prospect, and I had some ideas as to how to go about it and do a decent job.

Fast forward 16 months and I’m loving it. In a relatively short period of time we’ve been able to build an amazing team that has delivered quite a few important highlights to make Fedora and GNOME work better on laptops:

  • Peter Jones and Javier Martinez are taking care of the bootloader stack for Fedora, from GRUB2 to the UEFI tooling, including secure boot and the low level bits of the firmware update mechanisms. Our current efforts revolve around HTTP Boot, enabling TPM2 in the bootloader for Trusted Boot, and implementing the Boot Loader Spec in Fedora across the supported architectures to improve reliability when updating kernels; BLS removes the need to regenerate a GRUB configuration file every time a kernel is installed or removed.
  • Hans de Goede has been working on some neglected areas of hardware support. When he transferred he was still working on improving Optimus support for NVIDIA hardware. You’ve probably seen two major highlights from Hans’ work lately. One is us spending some time to help VirtualBox upstream their guest drivers, since we wanted to make Fedora and the Linux ecosystem at large work out of the box (pun not intended); this is really important, as VirtualBox is the first approach many Windows and Mac users have to a Linux operating system and desktop, so we’ve decided to treat it as an important hardware platform for us, as we do with KVM/QEMU/GNOME Boxes. More recently he has hit the headlines with his amazing and thorough work on improving battery life in Fedora, gathering data on which power saving defaults we can enable safely on which devices.
  • Christian Kellner has been doing tons of vertical integration. First he revamped GNOME Battery Bench to improve battery testing and gather better data about consumption when we try new laptops. He has also been looking at fingerprint, bluetooth and pulseaudio issues. More recently he has taken up the torch to implement Thunderbolt 3 security levels. This is a pretty big deal and has required a ton of design work with jimmac and Bastien. For those unaware, security levels let the user authorize Thunderbolt 3 devices before they are allowed to connect, protecting the machine from malicious devices getting DMA access.
  • Last but not least, Benjamin Berg has been doing a lot of work behind the scenes to improve laptop testing, coming up with a testing suite people can use to test a laptop and bring standardized results back to Fedora to keep track of regressions and gaps on specific laptop models. We’re trying to automate as much as we can, but we still have to write a lot of manual tests for this. This is ongoing work, but I think it’s going to help Fedora and the larger Linux ecosystem be more thorough when it comes to testing hardware and preventing regressions.

Beyond the engineering efforts, we are working with OEMs and silicon vendors as well, to try to guide them through the difficult transition from the proprietary OS model to contributing upstream. Some of them are doing really great work, and others need improvement. While I can’t share any specific details of these conversations, I must say it’s an incredibly exciting moment for Linux on laptops/workstations, and if we are able to push enough silicon vendors to have a more fluid relationship with upstream, I think we really have a chance to reduce the problems people have with newer hardware, at least on the enterprise offerings at first.

I have to say, it is an incredibly humbling experience to work with this team. I’m learning a lot about the space, and I’m excited about the things we’re planning for the next couple of years and the opportunities those efforts could bring for the free software desktop.

February 21, 2018

Using groff to write papers

In 1993, I discovered Linux, when I was an undergraduate university student. Linux gave me the same power as the Big Unix systems in our campus computer labs, but on my personal computer. I was immediately hooked.

But in the early 1990s, Linux didn't have a lot of applications. When I needed a word processor to write a paper for class, I rebooted into MS-DOS and ran WordPerfect or the shareware word processor, Galaxy Write. I wanted to stay in Linux as much as possible, but I also needed to write papers for class.

I knew a bit about the nroff and troff text processing systems from our campus computer labs, and I was pleased to find that nroff and troff existed on Linux as GNU groff. So I taught myself how to use the groff macro sets to write my class papers. The first macros I learned were the "e" macros, also known as "groff -me" because that was how you invoked the macros from the command line.

I recently wrote an article for Opensource.com about How to format academic papers on Linux with “groff -me.” I cover the basics for writing most papers, and skip the really esoteric stuff like keeps and displays, nested lists, tables, and figures. It is just an introduction to using “groff -me” to write common documents, like papers for class.
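
To give a taste of what the article covers, here is roughly what the start of a minimal "groff -me" paper looks like (the file name and title are just placeholders):

.\" paper.me -- format with: groff -me -Tps paper.me > paper.ps
.ce 2
My Paper Title
My Name
.sp 2
.uh "Introduction"
.pp
The .pp macro starts an indented paragraph, .uh prints an
unnumbered section heading, and .ce centers the lines after it.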

Applicative Functors for Fun and Parsing

PSA: This post has a bunch of Haskell code, but I’m going to try to make it more broadly accessible. Let’s see how that goes.

I’ve been proceeding apace with my 3rd year in Abhinav‘s Haskell classes at Nilenso, and we just got done with the section on Applicative Functors. I’m at that point when I finally “get” it, so I thought I’d document the process, and maybe capture my a-ha moment of Applicatives.

I should point out that the ideas and approach in this post are all based on Abhinav’s class material (and I’ve found them really effective in understanding the underlying concepts). Many thanks are due to him, and any lack of clarity you find ahead is in my own understanding.

Functors and Applicatives

Functors represent a type or a context on which we can meaningfully apply (map) a function. The Functor typeclass is pretty straightforward:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

Easy enough. fmap takes a function that transforms something of type a to type b and a value of type a in a context f. It produces a value of type b in the same context.
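
For instance, with the Maybe functor (a value that may or may not be present), fmap applies the function only if there is a value:

ghci> fmap (+1) (Just 2)
Just 3
ghci> fmap (+1) Nothing
Nothing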

The Applicative typeclass adds two things to Functor. Firstly, it gives us a means of putting things inside a context (also called lifting). Secondly, it gives us a way to apply a function that is itself within a context.

class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

We can see pure lifts a given value into a context. The apply function (<*>) intuitively looks like fmap, with the difference that the function is within a context. This becomes key when we remember that Haskell functions are curried (and can thus be partially applied). This would then allow us to write something like:

maybeAdd :: Maybe Int -> Maybe Int -> Maybe Int
maybeAdd ma mb = pure (+) <*> ma <*> mb

This function takes two numbers in the Maybe context (that is, they either exist, or are Nothing), and adds them. The result will be the sum if both numbers exist, or Nothing if either or both do not.

Go ahead and convince yourself that it is not possible to express this with just fmap.
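
(A hint for that exercise: fmap (+) ma has type Maybe (Int -> Int), a function stuck inside the context, and fmap alone gives us no way to apply it to mb. With apply, everything works out:)

ghci> pure (+) <*> Just 2 <*> Just 3
Just 5
ghci> pure (+) <*> Just 2 <*> Nothing
Nothing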

Parsers

There are many ways of looking at what a parser is. Let’s work with one definition: A parser,

  • Takes some input
  • Converts some or all of it into something else if it can
  • Returns whatever input was not used in the conversion

How do we represent something that converts something to something else? It’s a function, of course. Let’s write that down as a type:

newtype Parser i o = Parser (i -> (Maybe o, i))

This more or less directly maps to what we just said. A Parser is a data type which has two type parameters — an input type and an output type. It contains a function that takes one argument of the input type, and produces a tuple of Maybe the output type (signifying if parsing succeeded) and the rest of the input.

We can name the field runParser, so it becomes easier to get a hold of the function inside our Parser type:

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

Parser combinators

The “rest” part is important for the reason that we would like to be able to chain small parsers together to make bigger parsers. We do this using “parser combinators” — functions that take one or more parsers and return a more complex parser formed by combining them in some way. We’ll see some of those ways as we go along.

Parser instances

Before we proceed, let’s define Functor and Applicative instances for our Parser type.

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

The intuition here is clear — if I have a parser that takes some input and provides some output, fmapping a function on that parser translates to applying that function on the output of the parser.

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)

  pf <*> po = Parser $ \input ->
    case runParser pf input of
         (Just f, rest) -> case runParser po rest of
                                (Just o, rest') -> (Just (f o), rest')
                                (Nothing, _)    -> (Nothing, input)
         (Nothing, _)   -> (Nothing, input)

The Applicative instance is a bit more involved than Functor. What we’re doing first is “running” the first parser which gives us the function we want to apply (remember that this is a curried function, so rather than parsing out a function, we are most likely parsing out a value and creating a function with that). If we succeed, then we run the second parser to get a value to apply the function to. If this is also successful, we apply the function to the value, and return the result within the parser context (i.e. the result, and the rest of the input).

Implementing some parsers

Now let’s take our new data type and instances for a spin. Before we write a real parser, let’s write a helper function. A common theme while parsing a string is to match a single character on a predicate — for example, “is this character a letter”, or “is this character a semicolon”. We write a function that takes a predicate and returns the corresponding parser:

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
       (c:cs) | p c -> (Just c, cs)
       _            -> (Nothing, input)

Now let’s try to make a parser that takes a string and, if it finds an ASCII digit character, provides the corresponding integer value. We have a function from the Data.Char module to match ASCII digit characters — isDigit. We also have a function to take a digit character and give us an integer — digitToInt. Let’s put this together with satisfy above:

import Data.Char (digitToInt, isDigit)

digit :: Parser String Int
digit = digitToInt <$> satisfy isDigit

And that’s it! Note how we used our higher-order satisfy function to match an ASCII digit character, and the Functor instance to apply digitToInt to the result of that parser (reminder: <$> is just the infix form of fmap — this is the same as fmap digitToInt (satisfy isDigit)).

Another example — a character parser, which succeeds if the next character in the input is a specific character we choose.

char :: Char -> Parser String Char
char x = satisfy (x ==)

Once again, the satisfy function makes this a breeze. I must say I’m pleased with the conciseness of this.
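
Trying these out in the REPL (using the definitions above) shows the “rest of the input” behaviour nicely:

ghci> runParser digit "42abc"
(Just 4,"2abc")
ghci> runParser (char 'a') "abc"
(Just 'a',"bc")
ghci> runParser (char 'a') "xyz"
(Nothing,"xyz")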

Finally, let’s combine character parsers to create a word parser — a parser that succeeds if the input is a given word.

word :: String -> Parser String String
word ""     = Parser $ \input -> (Just "", input)
word (c:cs) = (:) <$> char c <*> word cs

A match on an empty word always succeeds. For anything else, we just break down the parser to a character parser of the first character and a recursive call to the word parser for the rest. Again, note the use of the Functor and Applicative instance. Let’s look at the type signature of the (:) (list cons) function, which prepends an element to a list:

(:) :: a -> [a] -> [a]

The function takes two arguments — a single element of type a, and a list of elements of type a. If we expand the types some more, we’ll see that the first argument we give it is a Parser String Char and the second is a Parser String [Char] (String is just an alias for [Char]).

In this way we are able to take the basic list prepend function and use it to construct a list of characters within the Parser context. (a-ha!?)
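
A quick REPL check that word behaves as expected, consuming only what it matched:

ghci> runParser (word "null") "nullable"
(Just "null","able")
ghci> runParser (word "null") "nil"
(Nothing,"nil")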

JSON

JSON is a relatively simple format to parse, and makes for a good example for building a parser. The JSON website has a couple of good depictions of the JSON language grammar front and center.

So that defines our parser problem then — we want to read a string input, and convert it into some sort of in-memory representation of the JSON value. Let’s see what that would look like in Haskell.

data JsonValue = JsonString String
               | JsonNumber JsonNum
               | JsonObject [(String, JsonValue)]
               | JsonArray [JsonValue]
               | JsonBool Bool
               | JsonNull

-- We represent a number as an infinite precision
-- floating point number with a base 10 exponent
data JsonNum = JsonNum { negative :: Bool
                       , signif   :: Integer
                       , expo     :: Integer
                       }

The JSON specification does not really tell us what type to use for numbers. We could just use a Double, but to make things interesting, we represent it as an arbitrary precision floating point number.

Note that the JsonArray and JsonObject constructors are recursive, as they should be — a JSON array is an array of JSON values, and a JSON object is a mapping from string keys to JSON values.

Parsing JSON

We now have the pieces we need to start parsing JSON. Let’s start with the easy bits.

null

To parse a null we literally just look for the word “null”.

jsonNull :: Parser String JsonValue
jsonNull = word "null" $> JsonNull

The $> operator is the flipped version of <$ (which in turn is just fmap . const) — it evaluates the parser on the left, and then fmaps the value on the right onto the result. If the word "null" parser is successful (Just "null"), we’ll fmap the JsonValue representing null to replace the string "null" (i.e. we’ll get a (Just JsonNull, <rest of the input>)).

true and false

First a quick detour:

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
      case runParser p1 input of
           (Nothing, _) -> case runParser p2 input of
                                (Nothing, _) -> (Nothing, input)
                                justValue    -> justValue
           justValue    -> justValue

The Alternative instance is easy to follow once you understand Applicative. We define an empty parser that matches nothing. Then we define the alternative operator (<|>) as we might intuitively imagine.

We run the parser given as the first argument first, if it succeeds we are done. If it fails, we run the second parser on the whole input again, if it succeeds, we return that value. If both fail, we return Nothing.
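
For example, with the character parsers from earlier:

ghci> runParser (char 'a' <|> char 'b') "bcd"
(Just 'b',"cd")
ghci> runParser (char 'a' <|> char 'b') "xyz"
(Nothing,"xyz")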

Parsing true and false with this under our belt looks like:

jsonBool :: Parser String JsonValue
jsonBool =  (word "true" $> JsonBool True)
        <|> (word "false" $> JsonBool False)

We are easily able to express the idea of trying to parse the string “true”, and if that fails, trying again for the string “false”. If either matches, we have a boolean value; if not, Nothing. Again, nice and concise.

String

This is only slightly more complex. We need a couple of helper functions first:

hexDigit :: Parser String Int
hexDigit = digitToInt <$> satisfy isHexDigit

digitsToNumber :: Int -> [Int] -> Integer
digitsToNumber base digits = foldl (\num d -> num * fromIntegral base + fromIntegral d) 0 digits

hexDigit is easy to follow. It just matches anything from 0-9 and a-f or A-F.

digitsToNumber is a pure function that takes a list of digits, and interprets it as a number in the given base. We do some jumping through hoops with fromIntegral to take Int digits (mapping to a normal word-sized integer) and produce an Integer (arbitrary sized integer).

Now follow along one line at a time:

jsonString :: Parser String String
jsonString = (char '"' *> many jsonChar <* char '"')
  where
    jsonChar =  satisfy (\c -> not (c == '\"' || c == '\\' || isControl c))
            <|> word "\\\"" $> '"'
            <|> word "\\\\" $> '\\'
            <|> word "\\/"  $> '/'
            <|> word "\\b"  $> '\b'
            <|> word "\\f"  $> '\f'
            <|> word "\\n"  $> '\n'
            <|> word "\\r"  $> '\r'
            <|> word "\\t"  $> '\t'
            <|> chr . fromIntegral . digitsToNumber 16 <$> (word "\\u" *> replicateM 4 hexDigit)

A string is a sequence of valid JSON characters, surrounded by quotes. The *> and <* operators allow us to chain parsers whose output we wish to discard (since the quotes are not part of the actual string itself). The many function comes from the Alternative typeclass. It represents zero or more repetitions of a context. In our case, it tries to match the jsonChar parser zero or more times.

So what does jsonChar do? Following the definition of a character in the JSON spec, first we try to match something that is not a quote ("), a backslash (\) or a control character. If that doesn’t match, we try to match the various escape characters that the specification mentions.

Finally, if we get a \u followed by 4 hexadecimal characters, we put them in a list (replicateM 4 hexDigit chains 4 hexDigit parsers and provides the output as a list), convert that list into a base 16 integer (digitsToNumber), and then convert that to a Unicode character (chr).

The order of chaining these parsers does matter for performance. The first parser in our <|> chain is the one that is most likely (most characters are not escaped). This follows from our definition of the Alternative instance. We run the first parser, then the second, and so on. We want this to succeed as early as possible so we don’t run more parsers than necessary.

Arrays

Arrays and objects have something in common — they have items which are separated by some value (commas for array values, commas for each key-value pair in an object, and colons separating keys and values). Let’s just factor this commonality out:

sepBy :: Parser i v -> Parser i s -> Parser i [v]
sepBy v s = (:) <$> v <*> many (s *> v) 
         <|> pure []

We take a parser for our values (v), and a parser for our separator (s). We try to parse one or more v separated by s, or just return an empty list in the parser context if there are none.
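
For example, reusing our digit parser from earlier:

ghci> runParser (digit `sepBy` char ',') "1,2,3"
(Just [1,2,3],"")
ghci> runParser (digit `sepBy` char ',') "x"
(Just [],"x")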

Now we write our JSON array parser as:

jsonArray :: Parser String JsonValue
jsonArray = JsonArray <$> (char '[' *> (json `sepBy` char ',') <* char ']')

Nice, that’s really succinct. But wait! What is json?

Putting it all together

We know that arrays contain JSON values. And we know how to parse some JSON values. Let’s try to put those together for our recursive definition:

json :: Parser String JsonValue
json =  jsonNull
    <|> jsonBool
    <|> jsonString
    <|> jsonArray
--  <|> jsonNumber
--  <|> jsonObject

And that’s it!

The JSON object and number parsers follow the same pattern. So far we’ve ignored spaces in the input, but those can be consumed and ignored easily enough based on what we’ve learned.

You can find the complete code for this exercise on Github.

Some examples of what this looks like in the REPL:

*Json> runParser json "null"
(Just null,"")

*Json> runParser json "true"
(Just true,"")

*Json> runParser json "[null,true,\"hello!\"]"
(Just [null, true, "hello!" ],"")

Concluding thoughts

If you’ve made it this far, thank you! I realise this is long and somewhat dense, but I am very excited by how elegantly Haskell allows us to express these ideas, using fundamental aspects of its type(class) system.

A nice real world example of how you might use this is the optparse-applicative package which uses these ideas to greatly simplify the otherwise dreary task of parsing command line arguments.

I hope this post generates at least some of the excitement in you that it has in me. Feel free to leave your comments and thoughts below.

How to write GStreamer Elements in Rust Part 2: A raw audio sine wave source

A bit later than anticipated, this is now part two of the blog post series about writing GStreamer elements in Rust. Part one can be found here, and I’ll assume that everything written there is known already.

In this part, a raw audio sine wave source element is going to be written. It will be similar to the one Mathieu was writing in his blog post about writing such a GStreamer element in Python. Various details will be different though, but more about that later.

The final code can be found here.

Table of Contents

  1. Boilerplate
  2. Caps Negotiation
  3. Query Handling
  4. Buffer Creation
  5. (Pseudo) Live Mode
  6. Unlocking
  7. Seeking

Boilerplate

The first part here will be all the boilerplate required to set up the element. You can safely skip this if you remember all this from the previous blog post.

Our sine wave element is going to produce raw audio with any number of channels, any sample rate, and both 32 bit and 64 bit floating point samples. It will produce a simple sine wave with a configurable frequency, volume/mute and number of samples per audio buffer. In addition it will be possible to configure the element in (pseudo) live mode, meaning that it will only produce data in real-time according to the pipeline clock. And it will be possible to seek to any time/sample position on our source element. It will basically be a simpler version of the audiotestsrc element from gst-plugins-base.

So let’s get started with all the boilerplate. This time our element will be based on the BaseSrc base class instead of BaseTransform.

use glib;
use gst;
use gst::prelude::*;
use gst_base::prelude::*;
use gst_audio;

use byte_slice_cast::*;

use gst_plugin::properties::*;
use gst_plugin::object::*;
use gst_plugin::element::*;
use gst_plugin::base_src::*;

use std::{i32, u32};
use std::sync::Mutex;
use std::ops::Rem;

use num_traits::float::Float;
use num_traits::cast::NumCast;

// Default values of properties
const DEFAULT_SAMPLES_PER_BUFFER: u32 = 1024;
const DEFAULT_FREQ: u32 = 440;
const DEFAULT_VOLUME: f64 = 0.8;
const DEFAULT_MUTE: bool = false;
const DEFAULT_IS_LIVE: bool = false;

// Property value storage
#[derive(Debug, Clone, Copy)]
struct Settings {
    samples_per_buffer: u32,
    freq: u32,
    volume: f64,
    mute: bool,
    is_live: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            samples_per_buffer: DEFAULT_SAMPLES_PER_BUFFER,
            freq: DEFAULT_FREQ,
            volume: DEFAULT_VOLUME,
            mute: DEFAULT_MUTE,
            is_live: DEFAULT_IS_LIVE,
        }
    }
}

// Metadata for the properties
static PROPERTIES: [Property; 5] = [
    Property::UInt(
        "samples-per-buffer",
        "Samples Per Buffer",
        "Number of samples per output buffer",
        (1, u32::MAX),
        DEFAULT_SAMPLES_PER_BUFFER,
        PropertyMutability::ReadWrite,
    ),
    Property::UInt(
        "freq",
        "Frequency",
        "Frequency",
        (1, u32::MAX),
        DEFAULT_FREQ,
        PropertyMutability::ReadWrite,
    ),
    Property::Double(
        "volume",
        "Volume",
        "Output volume",
        (0.0, 10.0),
        DEFAULT_VOLUME,
        PropertyMutability::ReadWrite,
    ),
    Property::Boolean(
        "mute",
        "Mute",
        "Mute",
        DEFAULT_MUTE,
        PropertyMutability::ReadWrite,
    ),
    Property::Boolean(
        "is-live",
        "Is Live",
        "(Pseudo) live output",
        DEFAULT_IS_LIVE,
        PropertyMutability::ReadWrite,
    ),
];

// Stream-specific state, i.e. audio format configuration
// and sample offset
struct State {
    info: Option<gst_audio::AudioInfo>,
    sample_offset: u64,
    sample_stop: Option<u64>,
    accumulator: f64,
}

impl Default for State {
    fn default() -> State {
        State {
            info: None,
            sample_offset: 0,
            sample_stop: None,
            accumulator: 0.0,
        }
    }
}

// Struct containing all the element data
struct SineSrc {
    cat: gst::DebugCategory,
    settings: Mutex<Settings>,
    state: Mutex<State>,
}

impl SineSrc {
    // Called when a new instance is to be created
    fn new(element: &BaseSrc) -> Box<BaseSrcImpl<BaseSrc>> {
        // Initialize live-ness and notify the base class that
        // we'd like to operate in Time format
        element.set_live(DEFAULT_IS_LIVE);
        element.set_format(gst::Format::Time);

        Box::new(Self {
            cat: gst::DebugCategory::new(
                "rssinesrc",
                gst::DebugColorFlags::empty(),
                "Rust Sine Wave Source",
            ),
            settings: Mutex::new(Default::default()),
            state: Mutex::new(Default::default()),
        })
    }

    // Called exactly once when registering the type. Used for
    // setting up metadata for all instances, e.g. the name and
    // classification and the pad templates with their caps.
    //
    // Actual instances can create pads based on those pad templates
    // with a subset of the caps given here. In case of basesrc,
    // a "src" pad template is required here and the base class
    // will automatically instantiate a pad for it.
    //
    // Our element here can output f32 and f64
    fn class_init(klass: &mut BaseSrcClass) {
        klass.set_metadata(
            "Sine Wave Source",
            "Source/Audio",
            "Creates a sine wave",
            "Sebastian Dröge ",
        );

        // On the src pad, we can produce F32/F64 with any sample rate
        // and any number of channels
        let caps = gst::Caps::new_simple(
            "audio/x-raw",
            &[
                (
                    "format",
                    &gst::List::new(&[
                        &gst_audio::AUDIO_FORMAT_F32.to_string(),
                        &gst_audio::AUDIO_FORMAT_F64.to_string(),
                    ]),
                ),
                ("layout", &"interleaved"),
                ("rate", &gst::IntRange::::new(1, i32::MAX)),
                ("channels", &gst::IntRange::::new(1, i32::MAX)),
            ],
        );
        // The src pad template must be named "src" for basesrc
        // and specifies a pad that is always there
        let src_pad_template = gst::PadTemplate::new(
            "src",
            gst::PadDirection::Src,
            gst::PadPresence::Always,
            &caps,
        );
        klass.add_pad_template(src_pad_template);

        // Install all our properties
        klass.install_properties(&PROPERTIES);
    }
}

impl ObjectImpl<BaseSrc> for SineSrc {
    // Called whenever a value of a property is changed. It can be called
    // at any time from any thread.
    fn set_property(&self, obj: &glib::Object, id: u32, value: &glib::Value) {
        let prop = &PROPERTIES[id as usize];
        let element = obj.clone().downcast::<BaseSrc>().unwrap();

        match *prop {
            Property::UInt("samples-per-buffer", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let samples_per_buffer = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing samples-per-buffer from {} to {}",
                    settings.samples_per_buffer,
                    samples_per_buffer
                );
                settings.samples_per_buffer = samples_per_buffer;
                drop(settings);

                let _ =
                    element.post_message(&gst::Message::new_latency().src(Some(&element)).build());
            }
            Property::UInt("freq", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let freq = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing freq from {} to {}",
                    settings.freq,
                    freq
                );
                settings.freq = freq;
            }
            Property::Double("volume", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let volume = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing volume from {} to {}",
                    settings.volume,
                    volume
                );
                settings.volume = volume;
            }
            Property::Boolean("mute", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let mute = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing mute from {} to {}",
                    settings.mute,
                    mute
                );
                settings.mute = mute;
            }
            Property::Boolean("is-live", ..) => {
                let mut settings = self.settings.lock().unwrap();
                let is_live = value.get().unwrap();
                gst_info!(
                    self.cat,
                    obj: &element,
                    "Changing is-live from {} to {}",
                    settings.is_live,
                    is_live
                );
                settings.is_live = is_live;
            }
            _ => unimplemented!(),
        }
    }

    // Called whenever a value of a property is read. It can be called
    // at any time from any thread.
    fn get_property(&self, _obj: &glib::Object, id: u32) -> Result<glib::Value, ()> {
        let prop = &PROPERTIES[id as usize];

        match *prop {
            Property::UInt("samples-per-buffer", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.samples_per_buffer.to_value())
            }
            Property::UInt("freq", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.freq.to_value())
            }
            Property::Double("volume", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.volume.to_value())
            }
            Property::Boolean("mute", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.mute.to_value())
            }
            Property::Boolean("is-live", ..) => {
                let settings = self.settings.lock().unwrap();
                Ok(settings.is_live.to_value())
            }
            _ => unimplemented!(),
        }
    }
}

// Virtual methods of gst::Element. We override none
impl ElementImpl<BaseSrc> for SineSrc { }

impl BaseSrcImpl<BaseSrc> for SineSrc {
    // Called when starting, so we can initialize all stream-related state to its defaults
    fn start(&self, element: &BaseSrc) -> bool {
        // Reset state
        *self.state.lock().unwrap() = Default::default();

        gst_info!(self.cat, obj: element, "Started");

        true
    }

    // Called when shutting down the element so we can release all stream-related state
    fn stop(&self, element: &BaseSrc) -> bool {
        // Reset state
        *self.state.lock().unwrap() = Default::default();

        gst_info!(self.cat, obj: element, "Stopped");

        true
    }
}

struct SineSrcStatic;

// The basic trait for registering the type: This returns a name for the type and registers the
// instance and class initializations functions with the type system, thus hooking everything
// together.
impl ImplTypeStatic<BaseSrc> for SineSrcStatic {
    fn get_name(&self) -> &str {
        "SineSrc"
    }

    fn new(&self, element: &BaseSrc) -> Box<BaseSrcImpl<BaseSrc>> {
        SineSrc::new(element)
    }

    fn class_init(&self, klass: &mut BaseSrcClass) {
        SineSrc::class_init(klass);
    }
}

// Registers the type for our element, and then registers in GStreamer under
// the name "sinesrc" for being able to instantiate it via e.g.
// gst::ElementFactory::make().
pub fn register(plugin: &gst::Plugin) {
    let type_ = register_type(SineSrcStatic);
    gst::Element::register(plugin, "rssinesrc", 0, type_);
}

If any of this needs explanation, please see the previous blog post and the comments in the code. The explanation for all the structs fields and what they’re good for will follow in the next sections.

With all of the above and a small addition to src/lib.rs this should compile now.

mod sinesrc;
[...]

fn plugin_init(plugin: &gst::Plugin) -> bool {
    [...]
    sinesrc::register(plugin);
    true
}

Also, a couple of new crates have to be added to Cargo.toml and src/lib.rs, but you’d best check the code in the repository for details.

Caps Negotiation

The first part that we have to implement, just like last time, is caps negotiation. We already notified the base class about any caps that we can potentially handle via the caps in the pad template in class_init but there are still two more steps of behaviour left that we have to implement.

First of all, we need to get notified whenever the caps that our source is configured for are changing. This will happen once in the very beginning and then whenever the pipeline topology or state changes and new caps would be more optimal for the new situation. This notification happens via the BaseSrc::set_caps virtual method.

fn set_caps(&self, element: &BaseSrc, caps: &gst::CapsRef) -> bool {
        use std::f64::consts::PI;

        let info = match gst_audio::AudioInfo::from_caps(caps) {
            None => return false,
            Some(info) => info,
        };

        gst_debug!(self.cat, obj: element, "Configuring for caps {}", caps);

        element.set_blocksize(info.bpf() * (*self.settings.lock().unwrap()).samples_per_buffer);

        let settings = *self.settings.lock().unwrap();
        let mut state = self.state.lock().unwrap();

        // If we have no caps yet, any old sample_offset and sample_stop will be
        // in nanoseconds
        let old_rate = match state.info {
            Some(ref info) => info.rate() as u64,
            None => gst::SECOND_VAL,
        };

        // Update sample offset and accumulator based on the previous values and the
        // sample rate change, if any
        let old_sample_offset = state.sample_offset;
        let sample_offset = old_sample_offset
            .mul_div_floor(info.rate() as u64, old_rate)
            .unwrap();

        let old_sample_stop = state.sample_stop;
        let sample_stop =
            old_sample_stop.map(|v| v.mul_div_floor(info.rate() as u64, old_rate).unwrap());

        let accumulator =
            (sample_offset as f64).rem(2.0 * PI * (settings.freq as f64) / (info.rate() as f64));

        *state = State {
            info: Some(info),
            sample_offset: sample_offset,
            sample_stop: sample_stop,
            accumulator: accumulator,
        };

        drop(state);

        let _ = element.post_message(&gst::Message::new_latency().src(Some(element)).build());

        true
    }

In here we parse the caps into an AudioInfo and then store that in our internal state, while updating various fields. We tell the base class about the number of bytes each buffer is usually going to hold, and update our current sample position, the stop sample position (when a seek with stop position happens, we need to know when to stop) and our accumulator. This happens by scaling both positions by the old and new sample rate. If we don’t have an old sample rate, we assume nanoseconds (this will make more sense once seeking is implemented). The scaling is done with the help of the muldiv crate, which implements scaling of integer types by a fraction with protection against overflows by doing up to 128 bit integer arithmetic for intermediate values.

The accumulator is then updated based on the current phase of the sine wave at the current sample position.

As a last step we post a new LATENCY message on the bus whenever the sample rate has changed. Our latency (in live mode) is going to be the duration of a single buffer, but more about that later.

BaseSrc is by default already selecting possible caps for us, if there are multiple options. However these defaults might not be (and often are not) ideal and we should override the default behaviour slightly. This is done in the BaseSrc::fixate virtual method.

fn fixate(&self, element: &BaseSrc, caps: gst::Caps) -> gst::Caps {
        // Fixate the caps. BaseSrc will do some fixation for us, but
        // as we allow any rate between 1 and MAX it would fixate to 1. 1Hz
        // is generally not a useful sample rate.
        //
        // We fixate to the closest integer value to 48kHz that is possible
        // here, and for good measure also decide that the closest value to 1
        // channel is good.
        let mut caps = gst::Caps::truncate(caps);
        {
            let caps = caps.make_mut();
            let s = caps.get_mut_structure(0).unwrap();
            s.fixate_field_nearest_int("rate", 48_000);
            s.fixate_field_nearest_int("channels", 1);
        }

        // Let BaseSrc fixate anything else for us. We could alternatively have
        // called Caps::fixate() here
        element.parent_fixate(caps)
    }

Here we take the caps that are passed in, truncate them (i.e. remove all but the very first Structure) and then manually fixate the sample rate to the closest value to 48kHz. By default, caps fixation would result in the lowest possible sample rate but this is usually not desired.

For good measure, we also fixate the number of channels to the closest value to 1, but this would already be the default behaviour anyway. And then chain up to the parent class’ implementation of fixate, which for now basically does the same as Caps::fixate(). After this, the caps are fixated, i.e. there is only a single Structure left and all fields have concrete values (no ranges or sets).

Query Handling

As our source element will work by generating a new audio buffer from a specific offset, and especially works in Time format, we want to notify downstream elements that we don’t want to run in Pull mode, only in Push mode. In addition, we would prefer sequential reading. However, we still allow seeking later. For a source that does not know about Time, e.g. a file source, the format would be configured as Bytes. Values other than Time and Bytes generally don’t make any sense.

The main difference here is that otherwise the base class would ask us to produce data for arbitrary Byte offsets, and we would have to produce data for that. While possible in our case, it’s a bit annoying and for other audio sources it’s not easily possible at all.

Downstream elements will try to query this very information from us, so we now have to override the default query handling of BaseSrc and handle the SCHEDULING query differently. Later we will also handle other queries differently.

fn query(&self, element: &BaseSrc, query: &mut gst::QueryRef) -> bool {
        use gst::QueryView;

        match query.view_mut() {
            // We only work in Push mode. In Pull mode, create() could be called with
            // arbitrary offsets and we would have to produce for that specific offset
            QueryView::Scheduling(ref mut q) => {
                q.set(gst::SchedulingFlags::SEQUENTIAL, 1, -1, 0);
                q.add_scheduling_modes(&[gst::PadMode::Push]);
                return true;
            }
            _ => (),
        }
        BaseSrcBase::parent_query(element, query)
    }

To handle the SCHEDULING query specifically, we first have to match on a view (mutable because we want to modify the view) of the query and check the type of the query. If it indeed is a scheduling query, we can set the SEQUENTIAL flag and specify that we handle only Push mode, then return true directly as we handled the query already.

In all other cases we fall back to the parent class’ implementation of the query virtual method.

Buffer Creation

Now we have everything in place for a working element, apart from the virtual method to actually generate the raw audio buffers with the sine wave. From a high level, BaseSrc works by calling the create virtual method over and over again to let the subclass produce a buffer, until it returns an error or signals the end of the stream.

Let’s first talk about how to generate the sine wave samples themselves. As we want to operate on 32 bit and 64 bit floating point numbers, we implement a generic function for generating samples and storing them in a mutable byte slice. This is done with the help of the num_traits crate, which provides all kinds of useful traits for abstracting over numeric types. In our case we only need the Float and NumCast traits.

Instead of writing a generic implementation with those traits, it would also be possible to do the same with a simple macro that generates a function for both types. Which approach is nicer is a matter of taste in the end; the compiler output should be equivalent in both cases.

fn process<F: Float + FromByteSlice>(
        data: &mut [u8],
        accumulator_ref: &mut f64,
        freq: u32,
        rate: u32,
        channels: u32,
        vol: f64,
    ) {
        use std::f64::consts::PI;

        // Reinterpret our byte-slice as a slice containing elements of the type
        // we're interested in. GStreamer requires for raw audio that the alignment
        // of memory is correct, so this will never ever fail unless there is an
        // actual bug elsewhere.
        let data = data.as_mut_slice_of::<F>().unwrap();

        // Convert all our parameters to the target type for calculations
        let vol: F = NumCast::from(vol).unwrap();
        let freq = freq as f64;
        let rate = rate as f64;
        let two_pi = 2.0 * PI;

        // We're carrying an accumulator with up to 2pi around instead of working
        // on the sample offset. High sample offsets cause too much inaccuracy when
        // converted to floating point numbers and then iterated over in 1-steps
        let mut accumulator = *accumulator_ref;
        let step = two_pi * freq / rate;

        for chunk in data.chunks_mut(channels as usize) {
            let value = vol * F::sin(NumCast::from(accumulator).unwrap());
            for sample in chunk {
                *sample = value;
            }

            accumulator += step;
            if accumulator >= two_pi {
                accumulator -= two_pi;
            }
        }

        *accumulator_ref = accumulator;
    }

This function takes the mutable byte slice from our buffer as argument, as well as the current value of the accumulator and the relevant settings for generating the sine wave.

As a first step, we “cast” the byte slice to a slice of the target type (f32 or f64) with the help of the byte_slice_cast crate. This ensures that alignment and sizes all match, and returns a mutable slice of our target type if successful. In the case of GStreamer, the buffer alignment is guaranteed to be big enough for our types here, and we allocate the buffer with a correct size later.

Now we convert all the parameters to the types we will use later, and store them together with the current accumulator value in local variables. Then we iterate over the whole floating point number slice in chunks with all channels, and fill each channel with the current value of our sine wave.

The sine wave itself is calculated by val = volume * sin(2 * PI * frequency * (i + accumulator) / rate), but we actually calculate it by simply increasing the accumulator by 2 * PI * frequency / rate for every sample instead of doing the multiplication for each sample. We also make sure that the accumulator always stays between 0 and 2 * PI to prevent any inaccuracies from floating point numbers to affect our produced samples.

Now that this is done, we need to implement the BaseSrc::create virtual method for actually allocating the buffer, setting timestamps and other metadata on it, and calling our above function.

fn create(
        &self,
        element: &BaseSrc,
        _offset: u64,
        _length: u32,
    ) -> Result<gst::Buffer, gst::FlowReturn> {
        // Keep a local copy of the values of all our properties at this very moment. This
        // ensures that the mutex is never locked for long and the application wouldn't
        // have to block until this function returns when getting/setting property values
        let settings = *self.settings.lock().unwrap();

        // Get a locked reference to our state, i.e. the input and output AudioInfo
        let mut state = self.state.lock().unwrap();
        let info = match state.info {
            None => {
                gst_element_error!(element, gst::CoreError::Negotiation, ["Have no caps yet"]);
                return Err(gst::FlowReturn::NotNegotiated);
            }
            Some(ref info) => info.clone(),
        };

        // If a stop position is set (from a seek), only produce samples up to that
        // point but at most samples_per_buffer samples per buffer
        let n_samples = if let Some(sample_stop) = state.sample_stop {
            if sample_stop <= state.sample_offset {
                gst_log!(self.cat, obj: element, "At EOS");
                return Err(gst::FlowReturn::Eos);
            }

            sample_stop - state.sample_offset
        } else {
            settings.samples_per_buffer as u64
        };

        // Allocate a new buffer of the required size, update the metadata with the
        // current timestamp and duration and then fill it according to the current
        // caps
        let mut buffer =
            gst::Buffer::with_size((n_samples as usize) * (info.bpf() as usize)).unwrap();
        {
            let buffer = buffer.get_mut().unwrap();

            // Calculate the current timestamp (PTS) and the next one,
            // and calculate the duration from the difference instead of
            // simply the number of samples to prevent rounding errors
            let pts = state
                .sample_offset
                .mul_div_floor(gst::SECOND_VAL, info.rate() as u64)
                .unwrap()
                .into();
            let next_pts: gst::ClockTime = (state.sample_offset + n_samples)
                .mul_div_floor(gst::SECOND_VAL, info.rate() as u64)
                .unwrap()
                .into();
            buffer.set_pts(pts);
            buffer.set_duration(next_pts - pts);

            // Map the buffer writable and create the actual samples
            let mut map = buffer.map_writable().unwrap();
            let data = map.as_mut_slice();

            if info.format() == gst_audio::AUDIO_FORMAT_F32 {
                Self::process::<f32>(
                    data,
                    &mut state.accumulator,
                    settings.freq,
                    info.rate(),
                    info.channels(),
                    settings.volume,
                );
            } else {
                Self::process::<f64>(
                    data,
                    &mut state.accumulator,
                    settings.freq,
                    info.rate(),
                    info.channels(),
                    settings.volume,
                );
            }
        }
        state.sample_offset += n_samples;
        drop(state);

        gst_debug!(self.cat, obj: element, "Produced buffer {:?}", buffer);

        Ok(buffer)
    }

Just like last time, we start with creating a copy of our properties (settings) and keeping a mutex guard of the internal state around. If the internal state has no AudioInfo yet, we error out. This would mean that no caps were negotiated yet, which is something we can’t handle and is not really possible in our case.

Next we calculate how many samples we have to generate. If a sample stop position was set by a seek event, we have to generate samples up to at most that point. Otherwise we create at most the number of samples per buffer that were set via the property. Then we allocate a buffer of the corresponding size, with the help of the bpf field of the AudioInfo, and then set its metadata and fill the samples.

The metadata that is set is the timestamp (PTS) and the duration. The duration is calculated from the difference between the following buffer’s timestamp and the current buffer’s. By this we ensure that rounding errors do not cause the next buffer’s timestamp to differ from the sum of the current buffer’s timestamp and its duration. While this would not be much of a problem in GStreamer (inaccurate and jittery timestamps are handled just fine), we can prevent it here and do so.

Afterwards we call our previously defined function on the writably mapped buffer and fill it with the sample values.

With all this, the element should already work just fine in any GStreamer-based application, for example gst-launch-1.0. Don’t forget to set the GST_PLUGIN_PATH environment variable correctly like last time. Before running this, make sure to turn down the volume of your speakers/headphones a bit.

export GST_PLUGIN_PATH=`pwd`/target/debug
gst-launch-1.0 rssinesrc freq=440 volume=0.9 ! audioconvert ! autoaudiosink

You should hear a 440Hz sine wave now.
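
The other properties we defined can be exercised from the command line in the same way, e.g. (values chosen just for illustration):

gst-launch-1.0 rssinesrc freq=880 volume=0.5 samples-per-buffer=512 ! audioconvert ! autoaudiosink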

(Pseudo) Live Mode

Many audio (and video) sources can actually only produce data in real-time and data is produced according to some clock. So far our source element can produce data as fast as downstream is consuming data, but we optionally can change that. We simulate a live source here now by waiting on the pipeline clock, but with a real live source you would only ever be able to have the data in real-time without any need to wait on a clock. And usually that data is produced according to a different clock than the pipeline clock, in which case translation between the two clocks is needed but we ignore this aspect for now. For details check the GStreamer documentation.

For working in live mode, we have to add a few different parts in various places. First of all, we implement waiting on the clock in the create function.

fn create(...
        [...]
        state.sample_offset += n_samples;
        drop(state);

        // If we're live, we are waiting until the time of the last sample in our buffer has
        // arrived. This is the very reason why we have to report that much latency.
        // A real live-source would of course only allow us to have the data available after
        // that latency, e.g. when capturing from a microphone, and no waiting from our side
        // would be necessary.
        //
        // Waiting happens based on the pipeline clock, which means that a real live source
        // with its own clock would require various translations between the two clocks.
        // This is out of scope for the tutorial though.
        if element.is_live() {
            let clock = match element.get_clock() {
                None => return Ok(buffer),
                Some(clock) => clock,
            };

            let segment = element
                .get_segment()
                .downcast::<gst::format::Time>()
                .unwrap();
            let base_time = element.get_base_time();
            let running_time = segment.to_running_time(buffer.get_pts() + buffer.get_duration());

            // The last sample's clock time is the base time of the element plus the
            // running time of the last sample
            let wait_until = running_time + base_time;
            if wait_until.is_none() {
                return Ok(buffer);
            }

            let id = clock.new_single_shot_id(wait_until).unwrap();

            gst_log!(
                self.cat,
                obj: element,
                "Waiting until {}, now {}",
                wait_until,
                clock.get_time()
            );
            let (res, jitter) = id.wait();
            gst_log!(
                self.cat,
                obj: element,
                "Waited res {:?} jitter {}",
                res,
                jitter
            );
        }

        gst_debug!(self.cat, obj: element, "Produced buffer {:?}", buffer);

        Ok(buffer)
    }

To be able to wait on the clock, we first of all need to calculate the clock time until when we want to wait. In our case that will be the clock time right after the end of the last sample in the buffer we just produced. Simply because you can’t capture a sample before it was produced.

We calculate the running time from the PTS and duration of the buffer with the help of the currently configured segment, and then add the base time of the element to this to get the clock time as a result. Please check the GStreamer documentation for details, but in short: the running time of a pipeline is the time since the start of the pipeline (or the last reset of the running time), and the running time of a buffer can be calculated from its PTS and the segment, which provides the information to translate between the two. The base time is the clock time when the pipeline went to the Playing state, so it is just an offset.

Next we wait and then return the buffer as before.

Now we also have to tell the base class that we’re running in live mode. This is done by calling set_live(true) on the base class before changing the element state from Ready to Paused. For this we override the Element::change_state virtual method.

impl ElementImpl for SineSrc {
    fn change_state(
        &self,
        element: &BaseSrc,
        transition: gst::StateChange,
    ) -> gst::StateChangeReturn {
        // Configure live'ness once here just before starting the source
        match transition {
            gst::StateChange::ReadyToPaused => {
                element.set_live(self.settings.lock().unwrap().is_live);
            }
            _ => (),
        }

        element.parent_change_state(transition)
    }
}

And as a last step, we also need to report our latency to downstream elements. Live elements always have to report their latency so that synchronization can work correctly. As the clock time of each buffer is equal to the time when it was created, all buffers would otherwise arrive late in the sinks (they would appear as if they should’ve been played already at the time when they were created). So all the sinks have to compensate for the latency accumulated between capture and the sink, and they have to do so in a coordinated way (otherwise audio and video would be out of sync if the two branches have different latencies). For this, the pipeline queries each sink for the latency on its own branch, and then configures a global latency on all sinks according to that.

This querying is done with the LATENCY query, which we will now also have to handle.

fn query(&self, element: &BaseSrc, query: &mut gst::QueryRef) -> bool {
        use gst::QueryView;

        match query.view_mut() {
            // We only work in Push mode. In Pull mode, create() could be called with
            // arbitrary offsets and we would have to produce for that specific offset
            QueryView::Scheduling(ref mut q) => {
                [...]
            }
            // In Live mode we will have a latency equal to the number of samples in each buffer.
            // We can't output samples before they were produced, and the last sample of a buffer
            // is produced that much after the beginning, leading to this latency calculation
            QueryView::Latency(ref mut q) => {
                let settings = *self.settings.lock().unwrap();
                let state = self.state.lock().unwrap();

                if let Some(ref info) = state.info {
                    let latency = gst::SECOND
                        .mul_div_floor(settings.samples_per_buffer as u64, info.rate() as u64)
                        .unwrap();
                    gst_debug!(self.cat, obj: element, "Returning latency {}", latency);
                    q.set(settings.is_live, latency, gst::CLOCK_TIME_NONE);
                    return true;
                } else {
                    return false;
                }
            }
            _ => (),
        }
        BaseSrcBase::parent_query(element, query)
    }

The latency that we report is the duration of a single audio buffer, because we’re simulating a real live source here. A real live source won’t be able to output the buffer before the last sample of it is captured, and the difference between when the first and last sample were captured is exactly the latency that we add here. Other elements further downstream that introduce further latency would then add their own latency on top of this.

Inside the latency query we also signal that we are indeed a live source, and additionally how much buffering we can do (in our case, infinite) until data would be lost. The last part is important if e.g. the video branch has a higher latency, causing the audio sink to have to wait some additional time (so that audio and video stay in sync), which would then require the whole audio branch to buffer some data. As we have an artificial live source, we can always generate data on demand and can thus buffer indefinitely. A real live source would only have a limited buffer, and once that runs full, data would get lost if it is not read and forwarded quickly enough.

You can test this again with e.g. gst-launch-1.0 by setting the is-live property to true. The output should now state that the pipeline is live.

In Mathieu’s blog post this was implemented without explicit waiting, by using the get_times virtual method. But as that approach is only really useful for pseudo-live sources like this one, I decided to explain how waiting on the clock can be achieved correctly and, more importantly, how that relates to the next section.

Unlocking

With the addition of the live mode, the create function now blocks, waiting on the clock for some time. This is suboptimal: a (flushing) seek, for example, would have to wait until the clock waiting is done, and so would the application when shutting down.

To prevent this, all waiting/blocking in GStreamer streaming threads should be interruptible/cancellable when requested. For example, the ClockID that we got from the clock for waiting can be cancelled by calling unschedule() on it. We only have to do that from the right place and keep the ClockID accessible. The right place is the BaseSrc::unlock virtual method.

struct ClockWait {
    clock_id: Option<gst::SingleShotClockId>,
    flushing: bool,
}

struct SineSrc {
    cat: gst::DebugCategory,
    settings: Mutex<Settings>,
    state: Mutex<State>,
    clock_wait: Mutex<ClockWait>,
}

[...]

    fn unlock(&self, element: &BaseSrc) -> bool {
        // This should unblock the create() function ASAP, so we
        // just unschedule the clock ID here, if any.
        gst_debug!(self.cat, obj: element, "Unlocking");
        let mut clock_wait = self.clock_wait.lock().unwrap();
        if let Some(clock_id) = clock_wait.clock_id.take() {
            clock_id.unschedule();
        }
        clock_wait.flushing = true;

        true
    }

We store the clock ID in our struct, together with a boolean flag to signal whether we’re supposed to flush already or not. Inside unlock we then unschedule the clock ID and set this boolean flag to true.

Once everything is unlocked, we need to reset things again so that data flow can happen in the future. This is done in the unlock_stop virtual method.

fn unlock_stop(&self, element: &BaseSrc) -> bool {
        // This signals that unlocking is done, so we can reset
        // all values again.
        gst_debug!(self.cat, obj: element, "Unlock stop");
        let mut clock_wait = self.clock_wait.lock().unwrap();
        clock_wait.flushing = false;

        true
    }

To make sure that this struct is always initialized correctly, we also call unlock from stop, and unlock_stop from start.
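
A minimal sketch of that wiring, reusing the start and stop implementations from earlier in the tutorial:

fn start(&self, element: &BaseSrc) -> bool {
    // [...] reset the streaming state as before [...]

    // Clear the flushing flag so that data flow can start again.
    self.unlock_stop(element)
}

fn stop(&self, element: &BaseSrc) -> bool {
    // [...] release the streaming state as before [...]

    // Cancel any pending clock wait and mark ourselves as flushing.
    self.unlock(element)
}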

Now as a last step, we need to actually make use of the new struct we added around the code where we wait for the clock.

// Store the clock ID in our struct unless we're flushing anyway.
            // This allows to asynchronously cancel the waiting from unlock()
            // so that we immediately stop waiting on e.g. shutdown.
            let mut clock_wait = self.clock_wait.lock().unwrap();
            if clock_wait.flushing {
                gst_debug!(self.cat, obj: element, "Flushing");
                return Err(gst::FlowReturn::Flushing);
            }

            let id = clock.new_single_shot_id(wait_until).unwrap();
            clock_wait.clock_id = Some(id.clone());
            drop(clock_wait);

            gst_log!(
                self.cat,
                obj: element,
                "Waiting until {}, now {}",
                wait_until,
                clock.get_time()
            );
            let (res, jitter) = id.wait();
            gst_log!(
                self.cat,
                obj: element,
                "Waited res {:?} jitter {}",
                res,
                jitter
            );
            self.clock_wait.lock().unwrap().clock_id.take();

            // If the clock ID was unscheduled, unlock() was called
            // and we should return Flushing immediately.
            if res == gst::ClockReturn::Unscheduled {
                gst_debug!(self.cat, obj: element, "Flushing");
                return Err(gst::FlowReturn::Flushing);
            }

The important part in this code is that we first have to check whether we are already supposed to unlock before even starting to wait. Otherwise we would start waiting without anybody ever being able to unlock us. Then we need to store the clock ID in the struct and make sure to drop the mutex guard, so that the unlock function can take it again for unscheduling the clock ID. Once waiting is done, we remove the clock ID from the struct again, and in case of ClockReturn::Unscheduled we directly return FlowReturn::Flushing instead of an error.

Similarly, when using other blocking APIs, it is important that they can be woken up in a similar way when unlock is called. Otherwise the experience for application developers, and thus for users, will be far from ideal.

Seeking

As a last feature we implement seeking on our source element. In our case that only means that we have to update the sample_offset and sample_stop fields accordingly, other sources might have to do more work than that.

Seeking is implemented in the BaseSrc::do_seek virtual method, and signalling whether we can actually seek in the is_seekable virtual method.

fn is_seekable(&self, _element: &BaseSrc) -> bool {
        true
    }

    fn do_seek(&self, element: &BaseSrc, segment: &mut gst::Segment) -> bool {
        // Handle seeking here. For Time and Default (sample offset) seeks we can
        // do something and have to update our sample offset and accumulator accordingly.
        //
        // Also we should remember the stop time (so we can stop at that point), and if
        // reverse playback is requested. These values will all be used during buffer creation
        // and for calculating the timestamps, etc.

        if segment.get_rate() < 0.0 {
            gst_error!(self.cat, obj: element, "Reverse playback not supported");
            return false;
        }

        let settings = *self.settings.lock().unwrap();
        let mut state = self.state.lock().unwrap();

        // We store sample_offset and sample_stop in nanoseconds if we
        // don't know any sample rate yet. It will be converted correctly
        // once a sample rate is known.
        let rate = match state.info {
            None => gst::SECOND_VAL,
            Some(ref info) => info.rate() as u64,
        };

        if let Some(segment) = segment.downcast_ref::<gst::format::Time>() {
            use std::f64::consts::PI;

            let sample_offset = segment
                .get_start()
                .unwrap()
                .mul_div_floor(rate, gst::SECOND_VAL)
                .unwrap();

            let sample_stop = segment
                .get_stop()
                .map(|v| v.mul_div_floor(rate, gst::SECOND_VAL).unwrap());

            let accumulator =
                (sample_offset as f64).rem(2.0 * PI * (settings.freq as f64) / (rate as f64));

            gst_debug!(
                self.cat,
                obj: element,
                "Seeked to {}-{:?} (accum: {}) for segment {:?}",
                sample_offset,
                sample_stop,
                accumulator,
                segment
            );

            *state = State {
                info: state.info.clone(),
                sample_offset: sample_offset,
                sample_stop: sample_stop,
                accumulator: accumulator,
            };

            true
        } else if let Some(segment) = segment.downcast_ref::<gst::format::Default>() {
            use std::f64::consts::PI;

            if state.info.is_none() {
                gst_error!(
                    self.cat,
                    obj: element,
                    "Can only seek in Default format if sample rate is known"
                );
                return false;
            }

            let sample_offset = segment.get_start().unwrap();
            let sample_stop = segment.get_stop().0;

            let accumulator =
                (sample_offset as f64).rem(2.0 * PI * (settings.freq as f64) / (rate as f64));

            gst_debug!(
                self.cat,
                obj: element,
                "Seeked to {}-{:?} (accum: {}) for segment {:?}",
                sample_offset,
                sample_stop,
                accumulator,
                segment
            );

            *state = State {
                info: state.info.clone(),
                sample_offset: sample_offset,
                sample_stop: sample_stop,
                accumulator: accumulator,
            };

            true
        } else {
            gst_error!(
                self.cat,
                obj: element,
                "Can't seek in format {:?}",
                segment.get_format()
            );

            false
        }
    }

Currently no support for reverse playback is implemented here; that is left as an exercise for the reader. So as a first step we check if the segment has a negative rate, in which case we just fail and return false.

Afterwards we again take a copy of the settings, keep a mutable mutex guard of our state and then start handling the actual seek.

If no caps are known yet, i.e. the AudioInfo is None, we assume a rate of 1 billion. That is, we just store the time in nanoseconds for now and let the set_caps function take care of that (which we already implemented accordingly) once the sample rate is known.

Then, if a Time seek is performed, we convert the segment start and stop position from time to sample offsets and save them. And then update the accumulator in a similar way as in the set_caps function. If a seek is in Default format (i.e. sample offsets for raw audio), we just have to store the values and update the accumulator but only do so if the sample rate is known already. A sample offset seek does not make any sense until the sample rate is known, so we just fail here to prevent unexpected surprises later.

Try the following pipeline for testing seeking. You should see the current time drawn over the video, and you can seek with the left/right cursor keys. This also shows that we produce quite a nice sine wave.

gst-launch-1.0 rssinesrc ! audioconvert ! monoscope ! timeoverlay ! navseek ! glimagesink

And with that all features are implemented in our sine wave raw audio source.

February 20, 2018

Why I am excited about Unity in 2018

While I have been promising my friend Lucas for what seems like a decade that I would build a game in Unity, I still have not managed to build one.

Recently Aras shared his excitement for Unity in 2018. There is a ton on that blog post to unpack.

What I am personally excited about is that Unity now ships an up-to-date Mono in the core.

Jonathan Chambers and his team of amazing low-level VM hackers have been hard at work upgrading Unity’s VM and libraries to bring the latest and greatest Mono runtime to Unity. We have had the privilege of assisting in this migration and providing technical support along the way.

The work that the Unity team has done lays the foundation for ongoing updates to their .NET capabilities, so future innovation on the platform can be quickly adopted, bringing new and more joyful capabilities to developers on the platform.

With this new runtime, Unity developers will be able to access and consume a large portion of third party .NET libraries, including all those shiny .NET Standard Libraries - the new universal way of sharing code in the .NET world.

C# 7

The Unity team has also provided very valuable guidance to the C# team, which has directly influenced features in C# 7 like ref locals and returns. In our own tests using C# for an AR application, we doubled the speed of managed-code AR processing by using these new features.

When users use the new Mono support in Unity, they default to C# 6, as this is the version that Mono's C# compiler fully supports. One of the challenges is that Mono's C# compiler has not fully implemented support for C# 7, as Mono itself moved to Roslyn.

The team at Unity is now working with the Roslyn team to adopt the Roslyn C# compiler in Unity. Because Roslyn is a larger compiler, it is slower to start up, and Unity does many small incremental compilations. So the team is working towards adopting the server compilation mode of Roslyn. This runs the Roslyn C# compiler as a reusable service which can compile code very quickly, without paying the startup price every time.

Visual Studio

If you install the Unity beta today, you will also see that on Mac, it now defaults to Visual Studio for Mac as its default editor.

JB Evain leads our Unity support for Visual Studio, and he has brought the magic of his Unity plugin to Visual Studio for Mac.

As Unity upgrades its Mono runtime, they also benefit from the extended debugger protocol support in Mono, which brings years of improvements to the debugging experience.

Magic!

Mono has a pure C# implementation of the Windows.Forms stack which works on Mac, Linux and Windows. It emulates some of the core of the Win32 API to achieve this.

While Mono's Windows.Forms is not an actively developed UI stack, it is required by a number of third party libraries, some data types are consumed by other Mono libraries (part of the original design contract), so we have kept it around.

On Mac, Mono's Windows.Forms was built on top of Carbon, an old C-based API that was available on MacOS. This backend was written by Geoff Norton for Mono many years ago.

As Mono switched to 64 bits by default, this meant that Windows.Forms could not be used. We had a couple of options: try to upgrade the 32-bit Carbon code to 64-bit Carbon code, or build a new backend on top of Cocoa (using Xamarin.Mac).

For years, I had assumed that Carbon on 64 was not really supported, but a recent trip to the console shows that Apple has added a 64-bit port. Either my memory is failing me (common at this age), or Apple at some point changed their mind. I spent all of 20 minutes trying to do an upgrade, but the header files and documentation for the APIs we rely on are just not available, so at best, I would be doing some guess work as to which APIs have been upgraded to 64 bits, and which APIs are available (rumors on Google searches indicate that while Carbon is 64 bits, not all APIs might be there).

I figured that I could try to build a Cocoa backend with Xamarin.Mac, so I sent this pull request to let me do this outside of the Mono tree in my copious spare time, and this weekend I did some work on the Cocoa Driver.

But this morning, on Twitter, Filip Navarra noticed the above and contacted me.

He has been kind enough to upload his Cocoa-based backend to GitHub.

Going Native

There are a couple of interesting things about this Windows.Forms backend for Cocoa.

The first one, is that it is using sysdrawing-coregraphics, a custom version of System.Drawing that we had originally developed for iOS users that implements the API in terms of CoreGraphics instead of using Cairo, FontConfig, FreeType and Pango.

The second one, is that some controls are backed by native AppKit controls, those that implement the IMacNativeControl interface. Among those you can find Button, ComboBox, ProgressBar, ScrollBar and the UpDownStepper.

I will now abandon my weekend hack, and instead get this code drop integrated as the 64-bit Cocoa backend.

Stay tuned!

2018-02-20 Tuesday

  • Mail; old-friend / partner call, more partner poking.

CSS Grid

This would totally have been a tweet or a Facebook post, but I’ve decided to invest a little more energy and post these on my blog, accessible to everybody. Getting old, I guess. We’re all mortal, and the web doesn’t stay open on its own.

In the past few days I’ve been learning about CSS grid while redesigning Flatpak and Flathub sites (still coming). And with the knowledge of really grokking only a fraction of it, I’m in love. So far I really dig:

  • Graceful fallback
  • Layout fully controlled by the theme
  • Controlled whitespace (meaning the layout won’t fall apart when you add or remove some whitespace)
  • Reasonable code legibility
  • Responsive layouts even without media queries

The fact that things are sized and defined very differently, and getting to grips with implicit sizing, will take some time, but CSS grid seems to have all the answers to the problems I have run into so far. Do note that I never got super fluent with flexbox, either.

I love the few video bites that Jen Simmons publishes periodically. The only downside to all this is seeing the mess of legacy grid systems I still have on numerous websites, like this one.

February 19, 2018

2018-02-19 Monday

  • Babes back to school; quick Old Story New Bible study at breakfast. Mail chew, chat with Florian, sync with Kendy, status report, mail. Plugged away at admin.

Rust things I miss in C

Librsvg feels like it is reaching a tipping point, where suddenly it seems like it would be easier to port some major parts from C to Rust than to keep adding accessors for them. Also, more and more of the meat of the library is in Rust now.

I'm switching back and forth a lot between C and Rust these days, and C now feels very, very primitive.

A sort of elegy to C

I fell in love with the C language about 24 years ago. I learned the basics of it by reading a Spanish translation of the second edition of The C Programming Language by K&R. I had been using Turbo Pascal before, in a reasonably low-level fashion with pointers and manual memory allocation, and C felt refreshing and empowering.

K&R is a great book for its style of writing and its conciseness of programming. This little book even taught you how to implement a simple malloc()/free(), which was completely enlightening. Even low-level constructs that seemed part of the language could be implemented in the language itself!

I got good at C over the following years. It is a small language, with a small standard library. It was probably the perfect language to implement Unix kernels in 20,000 lines of code or so.

The GIMP and GTK+ taught me how to do fancy object orientation in C. GNOME taught me how to maintain large-scale software in C. 20,000 lines of C code started to seem like a project one could more or less fully understand in a few weeks.

But our code bases are not that small anymore. Our software now has huge expectations on the features that are available in the language's standard library.

Some good experiences with C

Reading the POV-Ray source code for the first time and learning how to do object orientation and inheritance in C.

Reading the GTK+ source code and learning a C style that was legible, maintainable, and clean.

Reading SIOD's source code, then the early Guile sources, and seeing how a Scheme interpreter can be written in C.

Writing the initial versions of Eye of Gnome and fine-tuning the microtile rendering.

Some bad experiences with C

In the Evolution team, when everything was crashing. We had to buy a Solaris machine just to be able to buy Purify; there was no Valgrind back then.

Debugging gnome-vfs threading deadlocks.

Debugging Mesa and getting nowhere.

Taking over the initial versions of Nautilus-share and seeing that it never free()d anything.

Trying to refactor code where I had no idea about the memory management strategy.

Trying to turn code into a library when it is full of global variables and no functions are static.

But anyway — let's get on with things in Rust I miss in C.

Automatic resource management

One of the first blog posts I read about Rust was "Rust means never having to close a socket". Rust borrows C++'s ideas about Resource Acquisition Is Initialization (RAII), Smart Pointers, adds in the single-ownership principle for values, and gives you automatic, deterministic resource management in a very neat package.

  • Automatic: you don't free() by hand. Memory gets deallocated, files get closed, mutexes get unlocked when they go out of scope. If you are wrapping an external resource, you just implement the Drop trait and that's basically it. The wrapped resource feels like part of the language since you don't have to babysit its lifetime by hand.

  • Deterministic: resources get created (memory allocated, initialized, files opened, etc.), and they get destroyed when they go out of scope. There is no garbage collection: things really get terminated when you close a brace. You start to see your program's data lifetimes as a tree of function calls.

After forgetting to free/close/destroy C objects all the time, or worse, figuring out where code that I didn't write forgot to do those things (or did them twice, incorrectly)... I don't want to do it again.
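
As a small illustration, wrapping an external resource looks roughly like this; RawHandle, acquire_handle() and release_handle() are hypothetical stand-ins for some C API:

struct Handle {
    raw: RawHandle,
}

impl Handle {
    fn new() -> Handle {
        // Acquire the underlying resource from the (hypothetical) C API.
        Handle { raw: unsafe { acquire_handle() } }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Runs deterministically when a Handle goes out of scope, so
        // nobody ever has to remember to release the resource by hand.
        unsafe { release_handle(self.raw) }
    }
}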

Generics

Vec<T> really is a vector whose elements are the size of T. It's not an array of pointers to individually allocated objects. It gets compiled specifically to code that can only handle objects of type T.
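
For instance, both of these vectors store their elements inline, and each gets its own specialized compiled code:

// Vec<u8> stores bytes contiguously; Vec<(f64, f64)> stores 16-byte
// pairs contiguously. Neither is an array of boxed pointers.
let bytes: Vec<u8> = vec![1, 2, 3];
let points: Vec<(f64, f64)> = vec![(0.0, 0.0), (1.0, 1.0)];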

After writing many janky macros in C to do similar things... I don't want to do it again.

Traits are not just interfaces

Rust is not a Java-like object-oriented language. Instead it has traits, which at first seem like Java interfaces — an easy way to do dynamic dispatch, so that if an object implements Drawable then you can assume it has a draw() method.

However, traits are more powerful than that.

Associated types

Traits can have associated types. As an example, Rust provides the Iterator trait which you can implement:

pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

This means that whenever you implement Iterator for some iterable object, you also have to specify an Item type for the things that will be produced. If you call next() and there are more elements, you'll get back a Some(YourElementType). When your iterator runs out of items, it will return None.
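
For example, a minimal iterator that counts up to five looks like this:

struct Counter {
    count: u32,
}

impl Iterator for Counter {
    // The associated type: this iterator produces u32 values.
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.count < 5 {
            self.count += 1;
            Some(self.count)
        } else {
            None
        }
    }
}

// All the standard iterator adapters now work for free:
let sum: u32 = Counter { count: 0 }.take(3).sum(); // 1 + 2 + 3 = 6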

Associated types can refer to other traits.

For example, in Rust, you can use for loops on anything that implements the IntoIterator trait:

pub trait IntoIterator {
    /// The type of the elements being iterated over.
    type Item;

    /// Which kind of iterator are we turning this into?
    type IntoIter: Iterator<Item=Self::Item>;

    fn into_iter(self) -> Self::IntoIter;
}

When implementing this trait, you must provide both the type of the Item which your iterator will produce, and IntoIter, the actual type that implements Iterator and that holds your iterator's state.

This way you can build webs of types that refer to each other. You can have a trait that says, "I can do foo and bar, but only if you give me a type that can do this and that".

Slices

I already posted about the lack of string slices in C and how this is a pain in the ass once you get used to having them.
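
For comparison, this is all it takes in Rust:

let s = String::from("hello world");
// A slice borrows a range of the existing buffer: a pointer plus a
// length, with no copying and no new allocation.
let world: &str = &s[6..11];
assert_eq!(world, "world");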

Modern tooling for dependency management

Instead of

  • Having to invoke pkg-config by hand or with Autotools macros
  • Wrangling include paths for header files...
  • ... and library files.
  • And basically depending on the user to ensure that the correct versions of libraries are installed,

you write a Cargo.toml file which lists the names and versions of your dependencies. These get downloaded from a well-known location, or from elsewhere if you specify.

You don't have to fight dependencies. It just works when you cargo build.

Tests

C makes it very hard to have unit tests for several reasons:

  • Internal functions are often static. This means they can't be called outside of the source file that defined them. A test program either has to #include the source file where the static functions live, or use #ifdefs to remove the statics only during testing.

  • You have to write Makefile-related hackery to link the test program to only part of your code's dependencies, or to only part of the rest of your code.

  • You have to pick a testing framework. You have to register tests against the testing framework. You have to learn the testing framework.

In Rust you write

#[test]
fn test_that_foo_works() {
    assert!(foo() == expected_result);
}

anywhere in your program or library, and when you type cargo test, IT JUST FUCKING WORKS. That code only gets linked into the test binary. You don't have to compile anything twice by hand, or write Makefile hackery, or figure out how to extract internal functions for testing.

This is a killer feature for me.

Documentation, with tests

Rust generates documentation from comments in Markdown syntax. Code in the docs gets run as tests. You can illustrate how a function is used and test it at the same time:

/// Multiplies the specified number by two
///
/// ```
/// assert_eq!(multiply_by_two(5), 10);
/// ```
fn multiply_by_two(x: i32) -> i32 {
    x * 2
}

Your example code gets run as tests to ensure that your documentation stays up to date with the actual code.

Hygienic macros

Rust has hygienic macros that avoid all of C's problems with things in macros that inadvertently shadow identifiers in the code. You don't need to write macros where every symbol has to be in parentheses for max(5 + 3, 4) to work correctly.
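
A sketch of such a macro:

// The temporaries a and b are hygienic: even if the caller also has
// variables named a or b, they cannot be captured by accident, and
// each argument is evaluated exactly once.
macro_rules! max {
    ($x:expr, $y:expr) => {{
        let a = $x;
        let b = $y;
        if a > b { a } else { b }
    }};
}

assert_eq!(max!(5 + 3, 4), 8);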

No automatic coercions

All the bugs in C that result from inadvertently converting an int to a short or char or whatever — Rust doesn't do them. You have to explicitly convert.

No integer overflow

Enough said.

Generally, no undefined behavior in safe Rust

In Rust, it is considered a bug in the language if something written in "safe Rust" (what you would be allowed to write outside unsafe {} blocks) results in undefined behavior. You can shift-right a negative integer and it will do exactly what you expect.
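
For example, shifting a negative integer right is well defined:

let x: i32 = -8;
// Rust defines >> on signed integers as an arithmetic shift.
assert_eq!(x >> 1, -4);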

Pattern matching

You know how gcc warns you if you switch() on an enum but don't handle all its values? That's like a little baby compared to what Rust can do.

Rust has pattern matching in various places. It can do that trick for enums inside a match() expression. It can do destructuring so you can return multiple values from a function:

impl f64 {
    pub fn sin_cos(self) -> (f64, f64);
}

let angle: f64 = 42.0;
let (sin_angle, cos_angle) = angle.sin_cos();

You can match() on strings. YOU CAN MATCH ON FUCKING STRINGS.

let color = "green";

match color {
    "red"   => println!("it's red"),
    "green" => println!("it's green"),
    _       => println!("it's something else"),
}

You know how this is illegible?

my_func(true, false, false)

How about this instead, with pattern matching on function arguments:

pub struct Fubarize(pub bool);
pub struct Frobnify(pub bool);
pub struct Bazificate(pub bool);

fn my_func(Fubarize(fub): Fubarize, 
           Frobnify(frob): Frobnify, 
           Bazificate(baz): Bazificate) {
    if fub {
        ...;
    }

    if frob && baz {
        ...;
    }
}

...

my_func(Fubarize(true), Frobnify(false), Bazificate(true));

Standard, useful error handling

I've talked at length about this. No more returning a boolean with no extra explanation for an error, no ignoring errors inadvertently, no exception handling with nonlocal jumps.

#[derive(Debug)]

If you write a new type (say, a struct with a ton of fields), you can #[derive(Debug)] and Rust will know how to automatically print that type's contents for debug output. You no longer have to write a special function that you must call in gdb by hand just to examine a custom type.
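
For example:

#[derive(Debug)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
}

// Prints: Color { r: 255, g: 0, b: 128 }
println!("{:?}", Color { r: 255, g: 0, b: 128 });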

Closures

No more passing function pointers and a user_data by hand.
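
For example, a closure simply captures what it needs from its environment:

let factor = 3;
// The closure captures factor directly; no void *user_data needs to be
// threaded through by hand.
let scaled: Vec<i32> = vec![1, 2, 3].iter().map(|x| x * factor).collect();
assert_eq!(scaled, vec![3, 6, 9]);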

Conclusion

I haven't done the "fearless concurrency" bit yet, where the compiler is able to prevent data races in threaded code. I imagine it being a game-changer for people who write concurrent code on an everyday basis.

C is an old language with primitive constructs and primitive tooling. It was a good language for small uniprocessor Unix kernels that ran in trusted, academic environments. It's no longer a good language for the software of today.

Rust is not easy to learn, but I think it is completely worth it. It's hard because it demands a lot from your understanding of the code you want to write. I think it's one of those languages that make you a better programmer and that let you tackle more ambitious problems.

February 18, 2018

Projects and features Meson could use help with

A question I was asked during my LCA2018 presentation was how people could help the Meson project. I could not come up with proper projects off the cuff, so here are a bunch of things that have come up since. Feel free to contact us via IRC, email or any other medium if you wish to contribute.

WrapDB wrangler

WrapDB provides a simple way to download source dependencies automatically. Basically it takes an upstream release tarball, adds Meson build files to it if needed and publishes the result on the web. The work consists mostly of reviewing and merging submissions from the community. Creating your own is also fine. This is a fairly lightweight task, only requiring actions every now and then (submissions come less than once a week, typically).

CI fixer upper

For CI we use the free tiers of Travis and AppVeyor. This works fairly well, but it is very slow because our testing matrix is huge. Running the full test suite through AppVeyor takes about an hour. This slows us down a fair bit, and in addition both CI providers have a nasty habit of breaking down fairly often. We also don't want to pay for priced tiers because they get ridiculously expensive for our usage pattern (as in, a few months of paid-for macOS would cost more than a brand new Mac Mini).

We don't have any good ideas on how to make this better. If you do, let us know.

Large scale regression tester

Meson is being used by a fairly large number of projects. This makes fixing bugs and refactoring code challenging, because there is the possibility of regressions. It would be nice if we could do something similar to what the Rust developers do, and rebuild all (or a large fraction) of the projects using Meson with the trunk version every now and then.

XCode backend improvements

The XCode backend is currently a bit crappy. The main reason for this is that the XCode project file format is awful in many ways, chiefly that it is completely undocumented, and that it is not really a file format as such but more of a memory dump of XCode's internal data structures. But if you are the sort of person who enjoys battling windmills, this might be for you.

Meson build file rewriter

Integration with IDEs and the like is important and we want to provide tools for operations such as "add source file X to target Y" so everyone does not have to write their own implementations. There is actually code for this in trunk but it is quite limited and has bitrotted a fair bit. Resurrecting and making the code actually work would be very welcome.

Introspect improvements

This one also aims to improve the IDE integration features of Meson. As an example you can only get information about build targets one by one. This means that getting the information from a project that has thousands of targets takes forever. We really need a batch exporter so IDEs can grab all necessary project information in one go. There are probably a bunch of other things to improve as well.

Could these be done as part of gsoc/outreachy/other?

Possibly. Meson is not really an "entity" in the GSoC sense, but we could potentially get something accepted under the GNOME umbrella. However, anyone is welcome to submit patches, obviously, and several of the topics listed above are not nicely self-contained projects that would fit the GSoC mold at all.

Introducing deviced

Over the past couple of weeks I’ve been heads down working on a new tool along with Patrick Griffis. The purpose of this tool is to make it easier to integrate IDEs and other tooling with GNU-based gadgets like phones, tablets, infotainment, and IoT devices.

Years ago I was working on a GNOME-integrated home router with davidz, which sadly we never finished. One thing that was obvious to me at the time was that I would not do another large-scale project until I had better tooling. That was Builder’s genesis, and device integration is what will make it truly useful to myself and others who love playing with GNU-friendly gadgets.

Now, building an IDE is a long process. There is a ton of code to write, trade-offs to work through, and persistence beyond what any reasonable programmer would voluntarily sign up for. But the ends justify the slog.

So what we’ve created is uninterestingly called “deviced”. It currently has three components. A deviced daemon lives on the target device that we’re interested in writing software for. A GObject-based libdeviced library provides access to discover and connect to devices and do interesting things on them. Lastly, devicectl is a readline-based command line tool that allows you to interact with these devices without having to write a program using libdeviced.

The APIs in libdeviced are appropriately abstracted so that we can provide different transports in the future. Currently we only have network-based communication, but we will implement a USB transport in the not-too-distant future. Other protocols such as SSH or custom micro-controllers can be added, although something like SSH is more complex because it would combine a protocol with knowledge of how to run commands to get the intended effect, which is non-portable. It will be possible to support devices that do not run deviced, but that is currently out of scope.

To allow devices to be discoverable, deviced broadcasts its presence using mDNS on the networks it is configured to listen on (based on network-manager connection UUID). Long term, my goal is that you can configure deviced access in Control Center, similar to “Sharing and Privacy”. The network protocol is rather simple: JSON-RPC over TLS with self-signed certificates. When a client connects to the daemon, a gnome-shell notification is presented allowing you to accept the connection. At that point, the client certificate is saved for future validation.

Our libdeviced library is GObject introspectable and should therefore work with a number of languages.

Right now, only Flatpak applications are supported, but we have abstractions to allow for contributions to support additional application layers like docker or plain old .desktop files. Currently you can push flatpak applications and runtimes to the device, install them, and run them. If you have a new enough Flatpak, you can do delta updates.

It can even bridge multiple PTY devices for a shell. This isn’t really meant to be an SSH replacement, but more of a single abstraction we can use to control a debugger and its inferior from the IDE tooling.

There are still lots of little bugs to shake out and more bits to implement, but this is a pretty sweet 2-week proof of concept.

https://gitlab.gnome.org/chergert/deviced/

Here is a 20 second demo running on a single machine. It’s the same when using multiple machines except you get the notification on the programmable device rather than on your workstation. Obviously for IoT devices we’d need to create some sort of freedesktop notification bridge or alternate notification mechanism.

Anytime you work on a new project people will inevitably ask “why not just use XYZ”. In this case, I would expect both SSH and ADB to fall into that category. Most importantly, libdeviced is going to be about providing a single “remote device” abstraction for us in Builder. So it’s reasonable that we could abstract both of those systems from libdeviced. But neither of those provide the work-flow I envision for out-of-box experience, hence the deviced daemon. In the ADB case, it will be very difficult to get code upstream and released to distributions as it is increasingly unlikely our use-case is interesting to upstream. There were experimental patches to ADB a couple years ago to support flatpak so we didn’t take on this effort without considering our options. Ultimately, this prototype was to see the feasibility of making something that solves our problems while not locking us out of supporting other systems in the future.

Fleet Commander is looking for a GSoC student to help us take over the world

Fleet Commander has seen quite a lot of progress recently, of which I should blog about soon. For those unaware, Fleet Commander is an effort to make GNOME great for IT administrators in large deployments, allowing them to deploy desktop and application configuration profiles across hundreds of machines with ease through a web administration UI based on Cockpit. It is mostly implemented in Python.

One important aspect of large deployments is their identity management systems, which handle large numbers of users, groups and hosts. On the free software end, FreeIPA is the project that we’ve integrated Fleet Commander with. FreeIPA is an administrative interface and set of APIs that integrates an LDAP directory, DNS and other related services together. Another way to describe FreeIPA is as Linux’s counterpart to Microsoft’s Active Directory.
And that’s precisely what the GSoC idea we want a student for is about. We think that the best way to encourage GNOME usage in large organizations is to have tools that ease the migration. Many organizations have an existing Microsoft Windows deployment managed by Active Directory, so we want Fleet Commander to be able to use Active Directory as well as FreeIPA as the identity management system and as the data store for the profile data (by using Group Policy Objects).

This project would be mostly implemented in Python, and it will require talking to AD’s LDAP server and CIFS/Samba storage. We are fairly confident that it can be achieved during the GSoC term.

If you are looking for a fun GSoC project, are skilled in Python, and are interested in becoming a GNOME contributor (helping it reach a larger user base, taking it one step closer to world domination, and making some money in the process), you should apply!

We’re hanging around in the #fleet-commander IRC channel in irc.freenode.net if you want to approach us to get a better understanding of the idea, look for ogutierrez, fidencio and aruiz if you have any questions.

February 17, 2018

On Compiling WebKit (now twice as fast!)

Are you tired of waiting for ages to build large C++ projects like WebKit? Slow headers are generally the problem. Your C++ source code file #includes a few headers, all those headers #include more, and those headers #include more, and more, and more, and since it’s C++ a bunch of these headers contain lots of complex templates to slow down things even more. Not fun.

It turns out that much of the time spent building large C++ projects is effectively spent parsing the same headers again and again, over, and over, and over, and over, and over….

There are three possible solutions to this problem:

  • Shred your CPU and buy a new one that’s twice as fast.
  • Use C++ modules: import instead of #include. This will soon become the best solution, but it’s not standardized yet. For WebKit’s purposes, we can’t use it until it works the same in MSVC, Clang, and three-year-old versions of GCC. So it’ll be quite a while before we’re able to take advantage of modules.
  • Use unified builds (sometimes called unity builds).

WebKit has adopted unified builds. This work was done by Keith Miller, from Apple. Thanks, Keith! (If you’ve built WebKit before, you’ll probably want to say that again: thanks, Keith!)

For a release build of WebKitGTK+, on my desktop, our build times used to look like this:

real 62m49.535s
user 407m56.558s
sys 62m17.166s

That was taken using WebKitGTK+ 2.17.90; build times with any 2.18 release would be similar. Now, with trunk (or WebKitGTK+ 2.20, which will be very similar), our build times look like this:

real 33m36.435s
user 214m9.971s
sys 29m55.811s

Twice as fast.

The approach is pretty simple: instead of telling the compiler to build the original C++ source code files that developers see, we instead tell the compiler to build unified source files that look like this:

// UnifiedSource1.cpp
#include "CSSValueKeywords.cpp"
#include "ColorData.cpp"
#include "HTMLElementFactory.cpp"
#include "HTMLEntityTable.cpp"
#include "JSANGLEInstancedArrays.cpp"
#include "JSAbortController.cpp"
#include "JSAbortSignal.cpp"
#include "JSAbstractWorker.cpp"

Since files are included only once per translation unit, we now have to parse the same headers only once for each unified source file, rather than for each individual original source file, and we get a dramatic build speedup. It’s pretty terrible, yet extremely effective.

Now, how many original C++ files should you #include in each unified source file? To get the fastest clean build time, you would want to #include all of your C++ source files in one, that way the compiler sees each header only once. (Meson can do this for you automatically!) But that causes two problems. First, you have to make sure none of the files throughout your entire codebase use conflicting variable names, since the static keyword and anonymous namespaces no longer work to restrict your definitions to a single file. That’s impractical in a large project like WebKit. Second, because there’s now only one file passed to the compiler, incremental builds now take as long as clean builds, which is not fun if you are a WebKit developer and actually need to make changes to it. Unifying more files together will always make incremental builds slower. After some experimentation, Apple determined that, for WebKit, the optimal number of files to include together is roughly eight. At this point, there’s not yet much negative impact on incremental builds, and past here there are diminishing returns in clean build improvement.

In WebKit’s implementation, the files to bundle together are computed automatically at build time using CMake black magic. Adding a new file to the build can change how the files are bundled together, potentially causing build errors in different files if there are symbol clashes. But this is usually easy to fix, because only files from the same directory are bundled together, so random unrelated files will never be built together. The bundles are always the same for everyone building the same version of WebKit, so you won’t see random build failures; only developers who are adding new files will ever have to deal with name conflicts.

To significantly reduce name conflicts, we now limit the scope of using statements. That is, stuff like this:

using namespace JavaScriptCore;
namespace WebCore {
//...
}

has been changed to this:

namespace WebCore {
using namespace JavaScriptCore;
// ...
}

Some files need to be excluded due to unsolvable name clashes. For example, files that include X11 headers, which contain lots of unnamespaced symbols that conflict with WebCore symbols, don’t really have any chance. But only a few files should need to be excluded, so this does not have much impact on build time. We’ve also opted to not unify most of the GLib API layer, so that we can continue to use conventional GObject names in our implementation, but again, the impact of not unifying a few files is minimal.

We still have some room for further performance improvement, because some significant parts of the build are still not unified, including most of the WebKit layer on top. But I suspect developers who have to regularly build WebKit will already be quite pleased.

Weekend Website Experiment

As you may know if you read this blog via Planet GNOME, the GNOME project is busy switching to GitLab for its code hosting and bug tracking. I like GitLab! It’s a large step up from Bugzilla, which was what GNOME used for the last 20 years. Compared to GitHub, GitLab is about equal, with a few nicer things and a few less nice things.

The one thing that I miss from Bugzilla is a dashboard showing the overall status of the bugs for your project. I thought it would not be too hard to use the GitLab API to do some simple queries and plop them on a web page. So, last weekend I gave it a try. The final result is here. Click the button to log into GitLab, and you’ll be redirected back to the page where you’ll get the results of the queries.

I’d like to write up what I did because I learned a new thing, and I think more writeups illustrating the trial and error of learning a new thing are always good.

My goals were to write something without being too meticulous, and write something that was not intended to scale. (Both are things that I normally disapprove of. It’s good to try the other side once in a while.) So, I was just going to mix the HTML, CSS, and JS all in one file. I used the base CSS and overall page structure from my personal website.

I decided to use GitHub Pages for hosting. My personal website is already hosted there. GitLab also offers GitLab Pages, which is very similar, but GNOME hasn’t enabled it yet on their GitLab instance. So, I created a fork of GNOME/gjs on GitHub, and created a gh-pages branch. Whatever you commit to that branch shows up as your project’s GitHub Pages site on the web.

The first thing I figured out is that you have to be authenticated to query issues in the API, even if the same information is publicly available on the web. That’s too bad! My little project suddenly went from “easy” to “figuring out something I’ve never done before.” But that’s also exciting.

First, I got the queries running on a local webpage, using a temporary personal GitLab access token. Each item looked kind of like this:

Number of crashers: 16
Number of bugs: 25
etc.

Next, I decided to tackle the authentication problem. I did some searches on variations of “gitlab authentication plugin”, “add gitlab authentication to webpage”, etc., to see if there was something ready-made I could drop in. No such luck. I did find NodeJS modules that I could have used, had I been writing the site using NodeJS. I weighed the unknown cost of implementing the authentication in plain old browser JS against the unknown cost of setting up tools that I was unfamiliar with in order to use the ready-made module (I wasn’t even sure what tools I would have to use — Webpack, I think?) and decided to keep looking.

I next searched for things like “gitlab oauth2 in browser”, “gitlab oauth2 example”, since I knew that the login used the OAuth2 protocol. Eventually I landed on this page and figured out that the magic words I wanted were “implicit flow” or “implicit grant”:

Implicit Flow – This flow is designed for user-agent only apps (e.g. single page web application running on GitLab Pages).

That sounded like exactly my use case, so I read further. You have to send the user to a certain page on the GitLab site, and send a redirect URL which the user will be sent back to with the authorization token in the URL hash. I managed to keep everything on the same webpage. In pseudocode, the flow looks like this:

if we have the authorization token:
    fetch the numbers with the API calls
else if there is a hash in the URL:
    token = get the token from the hash
    store the token
    fetch the numbers with the API calls
else:
    show a button that links to the authorization URL on GitLab

For storing the token so that you don’t have to log in every time, I used localStorage. I have no idea if that’s good practice or not, but from what I could read online it seems that it’s at least not bad practice. It’s quite easy to retrieve the token, but only if you have access to the browser where it is stored. I don’t think localStorage can leak out over the web, but with the recently discovered vulnerabilities who knows…

Last, I made it look nicer. I had a pretty good idea of what I wanted it to look like: I wanted the numbers to be large, in colored boxes with rounded corners and thick borders. I tried a few things with floating <div>s before giving up and using a CSS flex layout. This makes the page probably unviewable on older browsers, but I was seriously done with CSS positioning.

The code is here, or just “View Source” while you’re on the page.

What I would do differently

Writing HTML, CSS, and JS directly for the web is tedious and repetitive. I wish my younger self had used some sort of framework like Bootstrap to make my personal site. Failing that, I wish I had decided not to reuse components from my personal site to do this, and instead started with a fresh site using a framework. Bootstrap or Semantic UI are two that I know of, and maybe should have tried out. The code ended up being 263 lines of HTML, CSS, and JS, much of it just repeated items in order to do the boxes in different colors.

Reuse of this code

You may notice I did not release this code under an open source license. That’s because it’s probably full of bad practices, so I don’t want people to copy it. If you can convince me that it’s done right or tell me what I did wrong, then I’ll (fix it and) open-source it, and other GitLab maintainers might find it useful.

February 16, 2018

Fedora Atomic Workstation for development

I’m frequently building GTK+. Since I am using Fedora Atomic Workstation now, I have to figure out how to do GTK+ development in this new environment. GTK+ may be a good example for the big middle ground of things that are not desktop applications, but also not part of the OS itself.

Image result for project atomic logo

Last week I figured out how to use a buildah container to build release tarballs for GNOME modules, and I actually used that setup to produce a GTK+ release as well.

But for fixing bugs and other development, I generally need to run test cases and demo apps, like the venerable gtk-demo. Running these outside the container does not work, since the GTK+ libraries I built are linked against libraries that are installed inside the container and not present on the host, such as libvulkan. I could of course resort to package layering to install them on the host, but that would miss the point of using Atomic Workstation.

The alternative is running the demo apps inside the container, which should work – it’s the same filesystem that they were built in. But they can’t talk to the compositor, since the Wayland socket is on the outside: /run/user/1000/wayland-0. I tried to work around this by making the socket visible in the container, but my knowledge of container tools and buildah is too limited to make it work. My apps still complain about not being able to open a display connection.

What now? I decided that while GTK+ is not a desktop application, I can treat my test apps like one and write a flatpak manifest for them. This way, I can use GNOME Builder’s awesome flatpak support to build and run them, like I already did for GNOME Recipes.

Here is a minimal flatpak manifest that works:

{
  "id" : "org.gtk.gtk-demo",
  "runtime" : "org.gnome.Sdk",
  "runtime-version" : "master",
  "sdk" : "org.gnome.Sdk",
  "command" : "gtk4-demo",
  "finish-args" : [
    "--socket=wayland"
  ],
  "modules" : [
   {
     "name" : "graphene",
     "buildsystem" : "meson",
     "builddir" : true,
     "sources" : [
       {
         "type" : "git",
         "url" : "https://github.com/ebassi/graphene.git"
       }
     ]
   },
   {
     "name" : "gtk+",
     "buildsystem" : "meson",
     "builddir" : true,
     "sources" : [
       {
         "type" : "git",
         "url" : "https://gitlab.gnome.org/GNOME/gtk.git"
       }
     ]
   }
 ]
}

After placing this json file into the toplevel directory of my GTK+ checkout, it appears as a new build configuration in GNOME Builder:

If you look closely, you’ll notice that I added another manifest, for gtk4-widget-factory. You can have multiple manifests in your tree, and GNOME Builder will let you switch between them in the Build Preferences.

After all this preparation, I can now hit the play button and have my demo app run right from inside GNOME Builder. Note that the application is running inside a flatpak sandbox, using the runtime that was specified in the Build Preferences, so it is cleanly separated from the OS. And I can easily build and run against different runtimes, to test compatibility with older GNOME releases.

This may be the final push that makes me switch to GNOME Builder for day-to-day development on Fedora Atomic Workstation: It just works!

Automatically finding slow headers in C++ projects

A common problem in older C++ codebases is that sources compile slowly due to massive header includes. Headers include other headers, which include even more headers and then, somewhere in the guts of the system, someone includes a header that is very slow to parse. Now things are slow and nobody really knows why.

Trawling through the header soup manually is not feasible. Even if you were to inspect the headers by hand, it is difficult to know which are the slow ones. Educated guesses can be made, such as assuming that anything with the word "boost" in its name is slow, but this only gets you so far. Fortunately, it turns out to be fairly straightforward to write a tool that finds the slow ones automatically.

We need two things to be able to reliably measure the inclusion time breakdown of the headers of any source file.

  1. The transitive list of all header files it includes.
  2. The exact compiler flags used to compile the source.
The former can be obtained from a dependency file that the compiler can be told to generate during compilation (and which almost all modern build systems use by default). The latter can be obtained from the compilation command database (compile_commands.json), which is also generated by most build tools today. The actual algorithm is simple: for each dependency header, create a dummy cpp file that just #includes that header, compile it with the same flags as the original source, and measure the time it takes.
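As a rough illustration, here is a minimal sketch of that algorithm in Python. It is not the actual script from the repo mentioned next; the file names, the way flags are extracted, and the use of -fsyntax-only (parse only, matching the caveat later about code generation) are my own simplifications:

import json, subprocess, tempfile, time

def headers_from_depfile(depfile):
    # A .d file looks like "target: dep1 dep2 \" with line continuations;
    # drop the target and keep the header-like dependencies.
    text = open(depfile).read().replace("\\\n", " ")
    deps = text.split(":", 1)[1].split()
    return [d for d in deps if d.endswith((".h", ".hpp", ".hh")) or "include" in d]

def flags_for(source, compdb="compile_commands.json"):
    # Look up the compile command for the source file and keep only the
    # flags, dropping the compiler name, "-c", "-o <file>" and the input.
    for entry in json.load(open(compdb)):
        if entry["file"].endswith(source):
            args, flags, skip = entry["command"].split()[1:], [], False
            for a in args:
                if skip:
                    skip = False
                elif a == "-o":
                    skip = True
                elif a != "-c" and not entry["file"].endswith(a):
                    flags.append(a)
            return flags
    raise KeyError(source)

def time_header(header, flags):
    # Compile a dummy file that only includes this header, and time it.
    with tempfile.NamedTemporaryFile(suffix=".cpp", mode="w") as f:
        f.write('#include "%s"\n' % header)
        f.flush()
        start = time.time()
        subprocess.run(["c++"] + flags + ["-fsyntax-only", f.name], check=True)
        return time.time() - start

flags = flags_for("main.cpp")
for t, h in sorted(((time_header(h, flags), h)
                    for h in headers_from_depfile("main.d")), reverse=True):
    print("%.4f %s" % (t, h))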

I created a repo with the measurement script and a sample project to test it on. It has one source file and a few internal headers that include external headers. Here's the top part of its output:

0.5875 ../h1.h
0.5254 /usr/include/c++/7/regex
0.2779 /usr/include/c++/7/shared_mutex
0.2747 /usr/include/c++/7/condition_variable
0.2685 ../h2.h
0.2563 /usr/include/c++/7/locale
0.2445 /usr/include/c++/7/sstream
0.2337 ../h3.h
0.2330 /usr/include/c++/7/iostream
0.2329 /usr/include/c++/7/istream

Iostream has traditionally been considered big, bloated and slow to compile. However, in this simple example we find that shared_mutex is even slower.

There are, of course, many caveats with this method. The main one being that this does not measure the code generation time, only parsing time. These two are usually highly correlated, though.

On Python Shebangs

So, how do you write a shebang for a Python program? Let’s first set aside the python2/python3 issue and focus on whether to use env. Which of the following is correct?

#!/usr/bin/env python
#!/usr/bin/python

The first option seems to work in all environments, but it is banned in popular distros like Fedora (and I believe also Debian, but I can’t find a reference for this). Using env in shebangs is dangerous because it can result in system packages using non-system versions of python. python is used in so many places throughout modern systems, it’s not hard to see how using #!/usr/bin/env in an important package could badly bork users’ operating systems if they install a custom version of python in /usr/local. Don’t do this.

The second option is broken too, because it doesn’t work in BSD environments. E.g. in FreeBSD, python is installed in /usr/local/bin. So FreeBSD contributors have been upstreaming patches to convert #!/usr/bin/python shebangs to #!/usr/bin/env python. Meanwhile, Fedora has begun automatically rewriting #!/usr/bin/env python to #!/usr/bin/python, but with a warning that this is temporary and that use of #!/usr/bin/env python will eventually become a fatal error causing package builds to fail.

So obviously there’s no way to write a shebang that will work for both major Linux distros and major BSDs. #!/usr/bin/env python seems to work today, but it’s subtly very dangerous. Lovely. I don’t even know what to recommend to upstream projects.

Next problem: python2 versus python3. By now, we should all be well-aware of PEP 394. PEP 394 says you should never write a shebang like this:

#!/usr/bin/env python
#!/usr/bin/python

unless your python script is compatible with both python2 and python3, because you don’t know what version you’re getting. Your python script is almost certainly not compatible with both python2 and python3 (and if you think it is, it’s probably somehow broken, because I doubt you regularly test it with both). Instead, you should write the shebang like this:

#!/usr/bin/env python2
#!/usr/bin/python2
#!/usr/bin/env python3
#!/usr/bin/python3

This works as long as you only care about Linux and BSDs. It doesn’t work on macOS, which provides /usr/bin/python and /usr/bin/python2.7, but still no /usr/bin/python2 symlink, even though it’s now been six years since PEP 394. It’s hard to overstate how frustrating this is.

So let’s say you are WebKit, and need to write a python script that will be truly cross-platform. How do you do it? WebKit’s scripts are only needed (a) during the build process or (b) by developers, so we get a pass on the first problem: using /usr/bin/env should be OK, because the scripts should never be installed as part of the OS. Using #!/usr/bin/env python — which is actually what we currently do — is unacceptable, because our scripts are python2 and that’s broken on Arch, and some of our developers use that. Using #!/usr/bin/env python2 would be dead on arrival, because that doesn’t work on macOS. Seems like the option that works for everyone is #!/usr/bin/env python2.7. Then we just have to hope that the Python community sticks to its promise to never release a python2.8 (which seems likely).
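One defensive habit that helps regardless of which shebang wins: fail early and loudly if the wrong interpreter picks the script up. A minimal sketch (the message and the exact policy here are mine, not WebKit’s):

#!/usr/bin/env python2.7
import sys

# Guard against being launched by an unexpected interpreter,
# e.g. when "python" resolves to python3 on Arch.
if sys.version_info[:2] != (2, 7):
    sys.stderr.write("This script requires Python 2.7, got %d.%d\n"
                     % sys.version_info[:2])
    sys.exit(1)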

…yay?

LVFS will block old versions of fwupd for some firmware

The ability to restrict firmware to specific versions of fwupd, and to the firmware version currently installed, was added to fwupd in version 0.8.0. This functionality exists so that you can prevent the firmware being deployed if the upgrade is going to fail, either because:

  • The old version of fwupd did not support the new hardware quirks
  • The upgraded-from firmware had broken upgrade functionality

The former is solved by updating fwupd, the latter is solved by following the vendor procedure to manually flash the hardware, e.g. using a DediProg to flash the EEPROM directly. Requiring a specific fwupd version is used by the Logitech Unifying receiver update for example, and requiring a previous minimum firmware version is used by one (soon to be two…) laptop OEMs at the moment.

Although fwupd 0.8.0 was released over a year ago, it seems people are still downloading firmware with older fwupd versions. 98% of the downloads from the LVFS are initiated from gnome-software, and 2% of people use the fwupdmgr command line or manually download the .cab file from the LVFS using a browser.

At the moment, fwupd is being updated in Ubuntu xenial to 0.8.3, but it is still stuck at the long-obsolete 0.7.4 in Debian stable. Fedora, of course, is 100% up to date, with 1.0.5 in F27 and 0.9.6 in F26 and F25. Even RHEL 7.4 has 0.8.2, and RHEL 7.5 will have 1.0.1.

Detecting the fwupd version also gets slightly more complicated, as the user agent only gives us the ‘client version’ rather than the ‘fwupd version’ in most software. This means we have to use the minimum fwupd version required by the client when choosing if it is safe to provide the file. GNOME Software version 3.26.0 was the first version to depend on fwupd ≥ 0.8.0 and so anything newer than that would be safe. This gives a slight problem, as Ubuntu will be shipping an old gnome-software 3.20.x and a new-enough fwupd 0.8.x and so will be blacklisted for any firmware that requires a specific fwupd version. Which includes the Logitech security update…

The user agent we get from gnome-software is gnome-software/3.20.1, so we can’t do anything very clever. I’m obviously erring on the side of not bricking a tiny amount of laptop hardware rather than making a lot of Logitech hardware secure on Ubuntu 16.04, given the next LTS, 18.04, is out on April 26th anyway. This means people might start getting a “detected fwupd version too old” message on the console if they try updating on 16.04.

A workaround for xenial users might be for someone at Canonical to include this patch that changes the user agent in the gnome-software package to gnome-software/3.20.1 fwupd/0.8.3, and I can add a workaround in the LVFS download code to parse that. Comments welcome.
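For illustration, the server-side parsing of such a combined user agent might look like the sketch below. This is my own guess at the shape of the check, not the actual LVFS code:

import re

def fwupd_version_from_user_agent(ua):
    # Prefer an explicit "fwupd/X.Y.Z" token, as in the proposed
    # "gnome-software/3.20.1 fwupd/0.8.3" user agent.
    m = re.search(r"\bfwupd/(\d+(?:\.\d+)*)", ua)
    if m:
        return tuple(int(x) for x in m.group(1).split("."))
    # Otherwise infer from the client: gnome-software >= 3.26.0 is known
    # to depend on fwupd >= 0.8.0.
    m = re.search(r"\bgnome-software/(\d+(?:\.\d+)*)", ua)
    if m and tuple(int(x) for x in m.group(1).split(".")) >= (3, 26, 0):
        return (0, 8, 0)
    return None

print(fwupd_version_from_user_agent("gnome-software/3.20.1 fwupd/0.8.3"))  # (0, 8, 3)
print(fwupd_version_from_user_agent("gnome-software/3.20.1"))              # None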

February 14, 2018

What open source software programs I love

Earlier this week, someone asked me what Free software and open source software programs I really love. I thought I'd share that here, too.

As I started to go through my favorite programs, I realized it was quite long. So I'm trying to keep the list short here, just the programs I use the most:

I'll start with Linux. I first installed Linux in 1993, when I was still an undergraduate university student. When I heard about Linux, a free version of Unix that I could run on my 386 computer at home, I immediately wanted to try it out. My first Linux distribution was Softlanding Linux System (SLS) 1.03, with Linux kernel 0.99 alpha patch level 11. That required a whopping 2MB of RAM, or 4MB if you wanted to compile programs, and 8MB to run X windows.

I ran a dual-boot Linux and Windows system at home until about 1998, using Windows only to play games. Then I switched to Linux full-time, and haven't looked back. Today, I run Fedora Linux, with GNOME as the desktop.

My other favorite operating system is FreeDOS, but that's not a surprise because I am the founder and project coordinator for the FreeDOS Project. FreeDOS is a complete, free, DOS-compatible operating system that you can use to play classic DOS games, run legacy business software, or develop embedded systems. Any program that works on MS-DOS should also run on FreeDOS.

I usually boot FreeDOS inside a PC emulator called QEMU. I used to run DOSEmu, which was ideal for writing FreeDOS programs because DOSEmu boots its C: drive from a folder in my Linux home directory; that made it really easy to transfer files between FreeDOS and Linux. In QEMU, I set up a similar folder that is mapped as a D: drive.

I write a lot of articles, and now some books, and I use LibreOffice for all of my finishing work. In total honesty, I do my collaboration and initial drafts via Google Docs, but all my final drafts and formatting are done in LibreOffice.

Many of my articles are about writing programs, and I use GNU Emacs as my editor. I'll use vim to write shell scripts, and GNOME gedit to edit web pages, but I prefer GNU Emacs for all my programming work. Emacs was my first Unix editor, even before I learned vi, so I'll always have a fondness for it.

While I could compile and debug programs from inside GNU Emacs, I prefer to do that work at the command line using GNOME Terminal.

For any graphics work, I rely on GIMP. This works great for creating graphics for my websites, or enhancing a personal photo, or creating a new cover for my next book.

And finally, I like to listen to music while I'm working, so I usually have Rhythmbox running in the background. I like to listen to one of several streaming radio stations, or I'll listen to my own MP3 music collection.

Moving a portal


Portals are a fundamental concept in flatpak. They are the way a sandboxed application can access information and services from the host in a safe, controlled way.

Most of the portals in use are implemented by a module called xdg-desktop-portal, with backend implementations for GTK+ and KDE. Many of the portals in it, such as the important file chooser portal, rely on a low-level portal called the document portal. It is a combined D-Bus and FUSE service that controls access to files with fine-grained per-application permissions.

The snap developers are interested in using portals for snap packages, which is great for application developers as they only have to target a single API. However, the document portal was historically shipped as part of flatpak, which suddenly became a major problem.

To fix this we had to move the document portal from flatpak to xdg-desktop-portal, and I’m happy to announce that with today’s releases of xdg-desktop-portal and flatpak we have achieved this.

Packagers need to be careful about this when updating to the new versions so that only one copy of the document portal is installed. The stable version of flatpak (0.10.4) can be built with or without the document portal, depending on what version of the desktop portal you have. The unstable flatpak release doesn’t have the document portal at all, and requires you to use the new desktop portal.

Packaging is hard. Packager-friendly is harder.

Releasing software is no small feat, especially in 2018. You could just upload your source code somewhere (a Git, Subversion, CVS, etc, repo – or tarballs on Sourceforge, or whatever), but it matters what that source looks like and how easy it is to consume. What does the required build environment look like? Are there any dependencies on other software, and if so, which versions? What if the versions don’t match exactly?

Most languages feature solutions to the build environment dependency – Ruby has Gems, Perl has CPAN, Java has Maven. You distribute a manifest with your source, detailing the versions of the dependencies which work, and users who download your source can just use those.

Then, however, we have distributions. If openSUSE or Debian wants to include your software, then it’s not just a case of calling into CPAN during the packaging process – distribution builds need to be repeatable, and work offline. And it’s not feasible for packagers to look after 30 versions of every library – generally a distribution will contain 1-3 versions of a given library, and all software in the distribution will be altered one way or another to build against their version of things. It’s a long, slow, arduous process.

Life is easier for distribution packagers the more a software release adheres to their ideal model – no non-source files in the distribution, minimal or well-formed dependencies on third parties, swathes of #ifdefs to handle changes in dependency APIs between versions, etc.

Problem is, this can actively work against upstream development.

Developers love npm or NuGet because it’s so easy to consume – asking them to abandon those tools is a significant impediment to developer flow. And it doesn’t scale – maybe a friendly upstream can drop one or two dependencies. But 10? 100? If you’re consuming a LOT of packages via the language package manager, as a developer, being told “stop doing that” isn’t just going to slow you down – it’s going to require a monumental engineering effort. And there’s the other side effect – moving from Yarn or Pip to a series of separate download/build/install steps will slow down CI significantly – and if your project takes hours to build as-is, slowing it down is not going to improve the project.

Therein lies the rub. When a project has limited developer time allocated to it, spending that time on an effort which will literally make development harder and worse, for the benefit of distribution maintainers, is a hard sell.

So, a concrete example: MonoDevelop. MD in Debian is pretty old. Why isn’t it newer? Well, because the build system moved away from a packager ideal so far it’s basically impossible at current community & company staffing levels to claw it back. Build-time dependency downloads went from a half dozen in the 5.x era (somewhat easily patched away in distributions) to over 110 today. The underlying build system changed from XBuild (Mono’s reimplementation of Microsoft MSBuild, a build system for Visual Studio projects) to real MSBuild (now FOSS, but an enormous shipping container of worms of its own when it comes to distribution-shippable releases, for all the same reasons & worse). It’s significant work for the MonoDevelop team to spend time on ensuring all their project files work on XBuild with Mono’s compiler, in addition to MSBuild with Microsoft’s compiler (and any mix thereof). It’s significant work to strip out the use of NuGet and Paket packages – especially when their primary OS, macOS, doesn’t have “distribution packages” to depend on.

And then there’s the integration testing problem. When a distribution starts messing with your dependencies, all your QA goes out the window – users are getting a combination of literally hundreds of pieces of software which might carry your app’s label, but you have no idea what the end result of that combination is. My usual anecdote here is when Ubuntu shipped Banshee built against a new, not-regression-tested version of SQLite, which caused a huge performance regression in random playback. When a distribution ships a broken version of an app with your name on it – broken by their actions, because you invested significant engineering resources in enabling them to do so – users won’t blame the distribution, they’ll blame you.

Releasing software is hard.

February 13, 2018

Automatic bug report and stack traces for GIMP

While I was working on yet another crash without a backtrace, I realized that we could just generate automatic backtraces upon crashes and tell people about them. This is how I ended up writing a debug tool for GIMP, which pops up a dialog with a friendly text encouraging people to report bugs. You’ll notice that the main text is non-technical: the goal is not to display error messages which nobody will understand. All the technical detail is in the section below it, to be copied with a single button click and reported to us verbatim. 🙂

This technical part contains: GIMP version (and commit information if available), compiler, main dependency version, and finally the errors and backtraces of these errors.

Note: this doesn’t “report” the bug on your behalf. Someone still has to take the conscious action of going to our bug tracker. But we make things easier: it is just a few buttons and a copy-paste away.

Someone asked me if I could make a blog post about it, so here it is.

How does this work?

Used to be based on glib…

We already had some backtracing capability in GIMP, mostly using the GLib API g_on_error_stack_trace(). The main problems of this API:

  • this function outputs to stdout (which means that you needed to run GIMP from a terminal to get the trace; until now this was only used with specific command line options or on unstable builds);
  • sometimes it was not working, for weird reasons;
  • it works only on Unix-like operating systems (in particular not on Windows);
  • it is based on gdb only (as I soon discovered).

So I ended up looking at what this function was doing. As I said, the basics are that it simply uses gdb if it is installed on the machine. I am still unsure why, but it was doing so using the interactive mode, entering commands through the standard input with a pipe. Why is that weird? Because gdb has a batch mode made especially for such non-interactive calls. I actually suspect that some of the times g_on_error_stack_trace() failed to work it was because it got stuck (but I am not sure, I have not dug much deeper, so maybe I am talking nonsense). But the worst issue was that it simply prints to stdout. So if I wanted to get the output into a string in order to use it in the graphical interface (we should not expect people to run GIMP from a terminal!), I had to do more piping of the output. At some point, it was just ridiculous to stack processes one after another after another after…

… then based directly on GDB…

This is how I started to reimplement the feature. I simply run gdb in batch mode, and I keep the result in a string for later display in a dialog. This was actually very straightforward. See commit bb88a2d52f.

This also allowed me to get a slightly better stack trace, since I could customize the command. So I request “backtrace full”, which gets us the contents of local variables.
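For illustration, the batch-mode invocation can be sketched like this (in Python rather than GIMP’s C, with names of my own; the gdb options themselves are real):

import os
import subprocess

def gdb_backtrace(pid=None):
    # Batch mode exits after running the given commands: no interactive
    # prompt and no pipe juggling; stdout is captured directly.
    result = subprocess.run(
        ["gdb", "-batch",
         "-ex", "backtrace full",        # full trace, including local variables
         "-p", str(pid or os.getpid())], # attach to the target process
        capture_output=True, text=True)
    return result.stdout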

… and LLDB…

Then I remembered that a bug reporter on macOS was using lldb, the debugger from the LLVM project. Since LLVM is the default on macOS, I assumed that lldb is much more common there than gdb. So I added support for it. This was quite easy too; I just had to search for the equivalent commands. See commit 4ca31b0571.
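The lldb equivalent is close to this (again just a sketch; --batch, -o and -p are lldb’s real batch-mode options):

import subprocess

def lldb_backtrace(pid):
    # -o passes a single command; "bt all" prints every thread's backtrace.
    result = subprocess.run(
        ["lldb", "--batch", "-o", "bt all", "-p", str(pid)],
        capture_output=True, text=True)
    return result.stdout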

… and finally the GNU libc!

Finally I was told of the backtrace() and backtrace_symbols() API. This seems to be a GNU-only API (the man page says these are GNU extensions). Anyway, this should make them always present on common Linux distributions, which is very good news. It means that we will always get “something” on Linux (and the result comes much quicker than calling gdb or lldb). Unfortunately the output of backtrace() is not as exhaustive: basically you get function names, but neither file names nor line numbers (even if you built with debug info), nor variable and parameter contents. So it’s a bit less useful. Yet it’s better than nothing! See commit 4fd1c6c97c.

So in the end, my tool tries gdb, then if absent lldb, and finally falls back to backtrace() if available. This should hopefully give us traces of crashes and errors in most cases!
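Sketched with the helpers from above, the fallback order looks like this (shutil.which stands in for whatever detection the real C code does):

import shutil

def get_backtrace(pid):
    if shutil.which("gdb"):
        return gdb_backtrace(pid)
    if shutil.which("lldb"):
        return lldb_backtrace(pid)
    # The real code's last resort is glibc's backtrace(), which has no
    # direct Python equivalent; this sketch just gives up here.
    return None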

The difficulties

Issue 1: do not rely on memory allocation after a crash

There were still a few issues. One of them is that, as you may notice, I use this dialog for 2 kinds of errors: fatal errors (crashes) and non-fatal errors (WARNING, CRITICAL, etc.). I use the same code, but while testing, I realized that I often could not create the dialog from the main process when GIMP crashed. On Linux at least, once the program crashed, I was able to catch the terminating signal well enough to do last-minute actions, but it seems allocating more memory was not among the possible actions (that is my assumption based on tests; I may be wrong, so don’t take it as authoritative). Well, I guess it makes sense to forbid further memory management, especially if the crash is related to memory bugs. This means that even just creating a new dialog is not possible (it requires allocating a new GTK+ widget).

This is why when crashing, I run the dialog as a separate process, whereas I run it from within the main process for non-fatal bugs.

Issue 2: backtrace() needs to be run by the main process

When running as a separate process, should the back trace be generated by this other process or from the main process? At first it made sense to have it generated through the new process, but then this has 2 inconveniences:

  1. I am duplicating the backtrace generation code (since I sometimes need to run it from within GIMP, sometimes from outside), and code duplication is never good (maintenance-wise, you end up with diverging versions; this sucks). You can factor out common core code as an exception, but it’s just not ideal (it makes the build rules complicated).
  2. From the outside process, I can attach to the main process with gdb or lldb, but I cannot use backtrace() anymore. That would mean that a lot of people would not get the auto-generated traces (not everyone installs a debugger!).

This is why I decided that the backtrace is always generated by the main process, and in case of a crash it is passed along through a file instead of a parameter. I could have piped it, which would have been just as easy, but Dr. Mingw (see the Win32 section below) was already using a file. So I chose to do the same, to be as consistent across platforms as possible (a file also has some advantages: in the extreme case where the dialog breaks too, we can ask a bug reporter to check whether a file was still generated with the info).

Also, since — as I said in issue 1 — memory allocations are likely to fail during crash handling, you need to use backtrace_symbols_fd() instead of backtrace_symbols(). The _fd() variant is guaranteed to run without memory allocation (this is documented in the man page). And now we have traces on most systems, still with the GNU libc fallback!

Issue 3: error avalanches

Another issue is that, in the case of non-fatal errors, you may often get a few of them one after another. Sometimes they are generated as dominoes (you get the second as a consequence of the first error), sometimes it’s because of long-running operations which just reproduce the same errors many times.

Worst case scenario: a long-time contributor, Massimo, directed me to a bug which would output tens of thousands of errors in a few seconds. It actually depends on the size of a selection, and in some of my tests, I had hundreds of thousands of errors!
Obviously you don’t want to create a dialog each time (this example was not even a bug which crashes GIMP, but creating hundreds of thousands of dialogs may well do the killing job!). So you have to just update the current dialog with additional errors. But even doing that is very time consuming. Updating a dialog hundreds of thousands of times in a few seconds is likely to freeze the whole GUI for a dozen minutes (I know, I tried!).

So I decided to limit not just the backtracing, but the error handling itself. In a single dialog, I add up to 3 backtraces and 10 errors at most. Any further errors are just redirected to stderr.
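A sketch of that limiting logic; the numbers come from the post, the names are mine:

import sys

MAX_TRACES = 3    # backtraces shown per dialog
MAX_ERRORS = 10   # errors shown per dialog

class DebugDialog:
    def __init__(self):
        self.n_traces = 0
        self.n_errors = 0

    def add_error(self, message, trace=None):
        if self.n_errors >= MAX_ERRORS:
            # Overflow never touches the GUI, only stderr.
            sys.stderr.write(message + "\n")
            return
        self.n_errors += 1
        self.show_message(message)
        if trace is not None and self.n_traces < MAX_TRACES:
            self.n_traces += 1
            self.show_trace(trace)

    def show_message(self, message):
        pass  # update the dialog's error list here

    def show_trace(self, trace):
        pass  # append the backtrace to the dialog here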

Issue 4: debugging preferences

Moreover, do we want the dialog to appear for every kind of error? In particular, we have WARNINGs, CRITICALs, then all the fatal errors. CRITICALs are usually really bad, so we definitely want debug info there. But what about WARNINGs? I mean, they are bad too, and they are signs of a bug somewhere. But these are more minor bugs, sometimes also warnings about external data which we have no control over. Also, we often output warnings when we encounter bugs in other software (for instance, one of the recent bugs where my dialog proved useful was a bug in KDE’s API for color picking, and there is not much we can do about it in GIMP except report it upstream). So I added finer-grained settings, because you certainly don’t want to make creating with GIMP painful if it pops up errors every few hours!

Actually, it is even possible to disable all debugging through GIMP’s preferences, even during crashes, if someone is really not interested in reporting bugs and hence contributing to GIMP’s improvement.

Note: on Windows, the debugging preferences page doesn’t exist at all because the backend we use is not customizable anyway. See dedicated section below.

Issue 5: multi-threading

As explained, we don’t only handle crashes, but also runtime errors. Since GEGL is so close to the GIMP project, it made sense to handle its errors as well (actually long-term, it would make sense to handle errors from any dependency, but let’s do it step by step). So I also catch GEGL’s WARNINGs and CRITICALs. But then I realized that since GEGL uses a lot of multi-threading, getting a backtrace from the main thread when the error happened in another was completely useless.

This, combined with the fact that GTK+ code must be run in the main thread, means that to create or update my debugging dialog, I need to pass the information from the thread where the bug occurred to the main GTK+ thread. This can be done with gdk_threads_add_idle_full(). But that call obviously adds a delay, so if the trace were generated there, you’d end up with traces from the wrong code, after an unknown delay. That is doubly useless.

As a consequence, to handle multi-threaded debugging, I needed to make sure that the stack trace is generated from the thread where the error happened, without any delay, and only then can it be sent to the main thread with an idle function.
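With PyGObject, the pattern looks roughly like this (a sketch: capture the trace synchronously in the failing thread, then hand only the finished string to the main loop):

import traceback
from gi.repository import GLib

def on_error_in_worker_thread(message):
    # Capture the trace HERE, in the thread where the error happened...
    trace = "".join(traceback.format_stack())
    # ...and only then defer the GUI work to the main thread.
    GLib.idle_add(show_debug_dialog, message, trace)

def show_debug_dialog(message, trace):
    # Runs in the main loop, where touching GTK+ widgets is safe.
    print(message, trace)
    return GLib.SOURCE_REMOVE  # one-shot idle callback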

Issue 6: the tweaking

Then you have all these little details to make the experience not too terrible (at least I am not saying we should make it a good experience, a bug is never a good experience! ;P).

For instance, when handling a crash, I add a “Restart” button, allowing — as the name implies — GIMP to be restarted immediately.

When non-fatal bugs are reported, we advise people to save their images and restart GIMP (of course, after a crash, they won’t have the chance to save anything, so don’t make them sadder by reminding them of it).

Also, I have to be extra careful not to generate new WARNINGs or CRITICALs from within this code, because that could create cyclic calls. You don’t want to end up crashing the software because of the debugger which initially fired up only for a minor bug.

Well, you get the idea! These are the kinds of tweaks you just discover as you implement such a system, and you have to take care of them as you go.

Future work

Something we have been discussing would be to save the opened images in backup files upon crashing. Of course with some kind of crashes, it may not be possible, but that is worth trying at least!

I’ve actually started working on it (with commit d916fedf92 from yesterday). As expected, it’s working most of the time, but while testing various crash conditions, I had some cases where the last-second backup failed. I have not dived into the code yet to understand why, and whether there is a solution.

GIMP is quite stable now (at least on GNU/Linux) and quite rarely crashes (well, I say this, but we have had some instability these last few days because of core changes in selection and channels, so the auto-debug dialog was very useful). But for the one time it does happen, handling it as gracefully as possible implies saving the current state of work. The obvious next step will be to propose recovery on the next GIMP start.

More on this later as I will continue working on it…

What about Windows?

Now the last remaining issue is Win32! Having GDB or LLDB there might be possible (I have not checked) but probably not the best path. It turns out a contributor, Mukund Sivaraman, had already added support for backtrace generation on Windows upon a crash, back in 2015. This uses the ExcHndl library from the Dr. Mingw project. Basically, it is extremely easy to use, since there are only 2 functions in the API: one to init the library, one to choose a file where the backtrace will be written.

void ExcHndlInit(void);
bool ExcHndlSetLogFileNameA(const char *szLogFileName);

So yes, since 2015, backtraces were simply written to a file somewhere, and people just never knew where and how to find them. What I did was simply piggy-back on this feature, grab the backtrace from the generated file, and display it in our GUI. And that’s it!

Since I needed my own code to run after Dr. Mingw, I had a look at how this tool actually does its job. In its code, I saw it was using SetUnhandledExceptionFilter() to run its action just before the crash. What I did was add another exception handler with the same function, registering my handler first, before initializing Dr. Mingw. This way Dr. Mingw calls my handler immediately after its own, because it keeps track of any handler previously set and calls it after itself.
See commits ae3cd00fbd and 4e5a5dbb87.

Now this has a few limitations: the backtrace generated by Dr. Mingw is not as complete as a good gdb backtrace. Also, I sometimes had crashes which this tool would not catch. I am no Win32 expert and did not spend much time on it, so I don’t know why.
Finally, this works only on crashes; in particular, I cannot generate backtraces on a whim as I can on other platforms, where I can generate backtraces even on WARNING or CRITICAL messages for easier debugging, without a crash.

Well, in the end, Win32 always ends up less featured and more annoying to debug. I guess there is nothing to be done; as a reminder, we are still looking for Win32 developers for GIMP. We have had very few contributions from Windows developers in all the years I’ve been around, quite sadly! If you are interested in contributing to this cool piece of software, be very welcome!

We got our first reports with automatic traces!

Even though the tool is still only present in the development version, some people build GIMP from master, and we have already got a few bug reports with traces included directly! This is very cool.
Actually even Aryeom got such dialogs, which resulted in some bug fixes already (and more to come)! 🙂

So yeah, when I fixed my first bugs thanks to these automatically generated backtraces, that made me happy, because I felt this new tool would make life a lot simpler and I knew my time had been well spent. 😉

You’d think a developer of GIMP would not be happy to get a backtrace. And yeah, I’d prefer GIMP to be perfectly bug-free. But there is no such thing, and as long as we get bugs, we may as well get well-illustrated reports to easily fix them. This is why I am happy! We are constantly on our way to a much more stable GIMP.

Yeah!

Reminder: my Free Software coding can be funded on:
Liberapay, Patreon or Tipeee through ZeMarmot project.

fwupd now tells you about known issues

After a week of being sick and not doing much, I’m showing the results of a day-or-so of hacking:

So, most of that being familiar to anyone that’s followed my previous blog posts. But wait, what’s that about a known issue?

That one little URL for the user to click on is the result of a rule engine being added to the LVFS. Of course, firmware updates shouldn’t ever fail, but in the real world they do, because distros don’t create /boot/efi correctly (cough, Arch Linux) or just because some people are running old versions of efivar, a broken git snapshot of libfwupdate or because a vendor firmware updater doesn’t work with secure boot turned on (urgh). Of all the failures logged on the LVFS, 95% fall into about 3 or 4 different failure causes, and if we know hundreds of people are hitting an issue we already understand we can provide them with some help.

So, how does this work? If you’re a user you don’t see any of this, you just download the metadata and firmware semi-automatically and get on with your life. If you’re a blessed hardware vendor on the LVFS (i.e. you can QA the updates into the stable branch) you can also create and view the rules for firmware owned by just your vendor group:

This new functionality will be deployed to the LVFS during the next downtime window. Comments welcome.

February 12, 2018

Python for GNOME Mobile?

As you may already know, Python is one of the hottest programming languages out there, with thousands of job offerings, so it makes sense, at least to me, to push this language as an official one for GNOME Mobile applications.

elementary OS is doing a good job of engaging new developers, while using Vala as its official language. For me, Vala is a good candidate for advanced or performance-constrained mobile applications.

Both languages use GNOME’s technologies through GObject Introspection. So any new widget designed for responsive mobile applications will be available to both Python and Vala.

An old license issue with GLib’s static linking on Android can be tackled by Purism, in the form of tools to allow a dynamically loaded version. For free software this is not an issue, but it is for proprietary software.

Providing a high-level programming language, potentially distributed in binary form, could incentivize app development.

On the Vala side, allowing developers to work in this highly productive, GObject-focused programming language could boost the development of games and other performance-constrained applications, while keeping all the goodness of the GObject and C world. Thanks to C, GNOME technologies are available to many other languages; Rust and C++ could find their own way too.

This is just a proposal for discussion, aimed at mobile OSs using GObject-based software.

Shelved Wallpapers

GNOME 3.28 will ship with another batch of new wallpapers that only a fraction of you will ever see. Apart from those, I also made a few for different purposes that didn’t end up being used, but that it would be a shame to keep shelved.

So here’s a bit of isometric goodness I quite enjoy on my desktop; you might too.

GNOME Tweaks 3.28 Progress Report 2

GNOME 3.28 has reached its 3.27.90 milestone. This milestone is important because it means that GNOME is now at API Freeze, Feature Freeze, and UI Freeze. From this point on, GNOME shouldn’t change much, which is good because it allows distros, translators, and documentation writers to prepare for the 3.28 release. It also gives time to ensure that new features are working correctly and that as many important bugs as possible are fixed. GNOME 3.28 will be released in approximately one month.

If you haven’t read my last 3.28 post, please read it now. So what else has changed in Tweaks this release cycle?

Desktop

As has been widely discussed, Nautilus itself will no longer manage desktop icons in GNOME 3.28. The intention is for this to be handled in a GNOME Shell extension. Therefore, I had to drop the desktop-related tweaks from GNOME Tweaks since the old methods don’t work.

If your Linux distro will be keeping Nautilus 3.26 a bit longer (like Ubuntu), it’s pretty easy for distro maintainers to re-enable the desktop panel so you’ll still get all the other 3.28 features without losing the convenient desktop tweaks.

As part of this change, the Background tweaks have been moved from the Desktop panel to the Appearance panel.

Touchpad

Historically, laptop touchpads had two or three physical hardware buttons, just like mice. Nowadays, it’s common for touchpads to have no buttons. At least on Windows, the historical convention was that a click in the bottom left would be treated as a left mouse button click, and a click in the bottom right as a right mouse button click.

Macs are a bit different in handling right click (or secondary click, as it’s also called). To get a right click on a Mac, just click with two fingers simultaneously. You don’t have to worry about whether you are clicking in the bottom right of the touchpad, so things should work a bit better once you get used to it. This approach is now even used on some Windows computers.

My understanding is that GNOME used the Windows-style “area” mouse-click emulation on most computers, but there was a manually updated list of computers where the Mac-style “fingers” mouse-click emulation was used.

In GNOME 3.28, the default is now the Mac style for everyone. For the past few years, you could change the default behavior in the GNOME Tweaks app, but I’ve redesigned the section now to make it easier to use and understand. I assume there will be some people who prefer the old behavior so we want to make it easy for them!

GNOME Tweaks 3.27.90 Mouse Click Emulation

For more screenshots (before and after), see the GitLab issue.

Other

There is one more feature pending for Tweaks 3.28, but it’s incomplete so I’m not going to discuss it here yet. I’ll be sure to link to a blog post about it when it’s ready though.

For more details about what’s changed, see the NEWS file or the commit log.

Long-term distribution support?

A question: how long is it reasonable for an ISV to keep releasing software for an older distribution? When is it fair for them to say “look, we can’t feasibly support this old thing any more”?

For example, Debian 7 is still considered supported, via the Debian LTS project. Should ISV app vendors keep producing builds built for Debian 7, with its ancient versions of GCC or CMake, rudimentary C++11 support, ARM64 bugs, etc? How long is it fair to expect an ISV to keep spitting out builds on top of obsolete toolchains?

Let’s take Mono as an example, since, well, that’s what I’m paid to care about. Right now, we do builds for:

  • Debian 7 (oldoldstable, supported until May 2018)
  • Debian 8 (oldstable, supported until April 2020)
  • Debian 9 (stable, supported until June 2022)
  • Raspbian 8 (oldstable, supported until June 2018)
  • Raspbian 9 (stable, supported until June 2020)
  • Ubuntu 12.04 (EOL unless you pay big bucks to Canonical – but was used by TravisCI long after it was EOL)
  • Ubuntu 14.04 (LTS, supported until April 2019)
  • Ubuntu 16.04 (LTS, supported until April 2021)
  • CentOS 6 (LTS, supported until November 2020)
  • CentOS 7 (LTS, supported until June 2024)

Supporting just these is a problem already. CentOS 6 builds lack support for TLS 1.2+, as that requires GCC 4.7+ – but I can’t just drop it, since Amazon Linux (used by a surprising number of people on AWS) is based on CentOS 6. Ubuntu 12.04 support requires build-dependencies on a secret Mozilla-team maintained copy of GCC 4.7 in the archive, used to keep building Firefox releases.

Why not just use the CDN analytics to form my opinion? Well, it seems most people didn’t update their sources.list after we switched to producing per-distribution binaries some time around May 2017 – so they’re still hardcoding wheezy in their sources. And I can’t go by user agent to determine their OS, as Azure CDN helpfully aggregates all of them into “Debian APT-HTTP/1.x” rather than giving me the exact version numbers I’d need to cross-reference to determine OS release.

So, with the next set of releases coming on the horizon (e.g. Ubuntu 18.04), at what point is it okay to say “no more, sorry” to an old version?

Answers on a postcard. Or the blog comments. Or Twitter. Or Gitter.

February 11, 2018

FOSDEM 2018

The GNOME Foundation advisory board meeting was happening on Friday the 2nd, so I travelled to Brussels on Thursday. Years ago, there were two train routes from Strasbourg to Brussels: the direct one used slow trains through a large part of Belgium and Luxembourg and took a bit more than 5 hours; the other one meant taking a TGV from Strasbourg to Paris (~2 hours), changing stations (a 5 minute walk from Gare de l’Est to Gare du Nord) and taking a Thalys to Brussels (~2 hours). I was pleased to learn that there is now a direct TGV route. Even if the announced time of 3 hours and 50 minutes is only a tiny bit shorter than the indirect one, the comfort of a journey with no connection adds real value. Of course I wasn’t expecting a direct route to go through the Charles de Gaulle airport train station, but well… still better than the alternative! This nice journey was made possible thanks to the financial support of the Foundation.

Then on Saturday I went to attend my 11th FOSDEM (I did ten in a row and skipped last year). The first day was dedicated to the hallway track, spending my time with people I knew and had not met for a while. I also was behind the GNOME booth for a bit, but nothing compared to the likes of Bastian, Benjamin, Carlos, Florian, or Luis. After failing to get in that Matrix talk and that Rust one, as well as that one, I went across the street to watch Shaun’s talk, from which I want to share that nugget of wisdom:

The problem with XML is that it’s XML. -- Shaun McCance

I stayed in the Tool the Docs room for the next talk, by Jessica Parsons: “Finding a home for docs”. I liked her approach: she doesn’t come up with a single solution that is supposed to solve all cases, but instead studies a few choices with their pros and cons. After lunch I joined a group of friends to cheer for Marc while he presented the best practices he’s pushing into Sympa. We were then conveniently on the right floor to head to Tobias’s talk.

My return trip was in the afternoon on Monday, so I joined a group of friends to visit the Atomium in the morning. Quite surprising that it took me so many trips to Brussels before I got to see this place. Built in 1958 for the World’s Fair — in a way it’s the Belgian Eiffel Tower — as a representation of a magnified iron crystal, the cubic structure doesn’t look one bit dated. To contrast with that, the exhibition it hosts about its creation and the historical context gives out a classic vibe reminiscent of Stark Expo in the Marvel movies.

Buying a ticket to the Atomium also grants you access to the Art and Design Atomium Museum. The exhibit there was focused on the use of plastic since its creation. While most of the items we saw qualified as stuff we wouldn’t want to have at home because of their style, it was fun to see pieces from another era, some of which we may have had when we were children.

The way home was the same “direct” trip, concluding the journey uneventfully.

February 10, 2018

Math Tricks for Kilograms and Pounds

I’m going to share some math tricks for converting between kilograms and pounds, something I often deal with when weightlifting. This post is long, but if you stick around to the end, I’ll share the super secret divisibility rule for 11. (Originally posted to Facebook. Reposted on my blog at Jim Campbell’s suggestion.)

If you find a mathematician or engineer who’s good at doing math in their head and ask them how they do it, you’ll find they have a handful of techniques they apply in different situations. Many of these techniques involve turning problems into things we’re already good at. What are we good at? Well, multiplying and dividing by 10 is trivial. It’s just moving a decimal point. And most of us are pretty good at doubling and halving things. So, multiplying and dividing by 10 and 2 are sweet spots.

1kg is approximately 2.2lb. (If you need better precision, use a computer.) So to convert kilograms to pounds, we have to multiply by 2.2. Let’s pull out that distributive law. 2.2x = (2 + 0.2)x = 2x + 0.2x = 2x + (2x)/10. We’ve reduced the problem to multiplication by 2 and division by 10.

“Woah Shaun, stop with the algebra!” OK. Take the kilograms. Double it. Take that double, shift the decimal place. Add the double and the decimal-shifted double. For example, take 150kg, a respectable squat weight. Double it = 300. Divide by 10 = 30. Add these two together = 330lb. (Google will tell you the answer is 330.693lb.)

But what about converting pounds back to kilograms? Do we have to divide by 2.2? That sounds awful. But division by 2.2 is multiplication by 5/11. Does that make it any better? Yes! Really? YES! Division by 11 is awesome, and for some reason, nobody learns it in school.

To divide by 11, first divide by 10. That’s your current total. Divide that by 10 and subtract the result from the current total. That’s your new current total. Divide that by 10 and add it to the current total. That’s your new current total. Continue dividing by 10 and alternating addition and subtraction. Do this until you die of exhaustion, you see the two-digit repeating pattern, or you’re happy with the precision. Recall that 2.2 was only an approximate conversion to begin with, so I stop when all the action is after the decimal point. Round whole numbers are good enough for me.

But we needed to multiply by 5/11, not just 1/11. No worries. To multiply by x/11, instead of starting with 1/10, start with x/10. Luckily, 5/10 is just 1/2. We like dividing by 2.

My body weight is about 190lb. What is that in kilograms? Half of 190 is 95. Divide by 10 for 9.5. Subtract 9.5 from 95 for 85.5. Divide 9.5 by 10 for almost one. Addition is next, so let’s call it 86kg. (Google will tell you the answer is 86.1826kg.)

So there you go. Quick tricks to help you get approximate conversions between kilograms and pounds in your head.
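If you’d rather check the tricks than do them in your head, here’s a small sketch of them in Python (the numbers are from the examples above):

def kg_to_lb(kg):
    # 2.2*kg = 2*kg + (2*kg)/10: double, shift the decimal, add.
    doubled = 2.0 * kg
    return doubled + doubled / 10

def divide_by_11(n, steps=6):
    # Alternating series: n/11 = n/10 - n/100 + n/1000 - ...
    term, total, sign = n / 10.0, 0.0, 1
    for _ in range(steps):
        total += sign * term
        term /= 10
        sign = -sign
    return total

def lb_to_kg(lb):
    # lb * 5/11: run the series starting from half (5/10) of lb.
    return divide_by_11(5.0 * lb)

print(kg_to_lb(150))   # 330.0
print(lb_to_kg(190))   # ~86.4 (2.2 was approximate; the exact answer is 86.1826)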

But what about the super secret divisibility rule for 11 I promised? It follows the same pattern as the technique I gave for dividing by 11. Just do alternating addition and subtraction on the digits of a number. If the result is divisible by 11, so is the original number. Is 1936 divisible by 11? 1-9+3-6 = -11. It sure is.

Eleven is awesome.

*tap* *tap* *tap* testing testing *tap* *tap* *tap*

Still here? Cool. Here’s a bonus tip that wasn’t in my original post. All that stuff about 11? It works the same way for whatever number “11” happens to represent in any radix. Need to divide by 11_8 (decimal 9) in octal? Same division algorithm. Need to check divisibility by 11_16 (decimal 17) in hexadecimal? Same division rule. Looking for some fun weekend math? Take a look at the divisibility rules you know, figure out why they work, and use that to figure out what divisibility rules you have in other bases. Hint: every divisibility rule I can name stems from three basic kinds of tests.
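For the curious, a quick sketch of that generalization: in base b, the alternating digit sum tests divisibility by b+1 (11 in decimal, 9 in octal, 17 in hexadecimal), because b ≡ -1 (mod b+1) and so b**k ≡ (-1)**k:

def divisible_by_base_plus_one(digits, base):
    # digits are given most-significant first
    total, sign = 0, 1
    for d in digits:
        total += sign * d
        sign = -sign
    return total % (base + 1) == 0

print(divisible_by_base_plus_one([1, 9, 3, 6], 10))  # 1936 and 11: True
print(divisible_by_base_plus_one([2, 2], 16))        # 0x22 = 34 = 2*17: True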

February 09, 2018

Nextcloud Talk: video conferencing the open way

For instant messaging I’ve been primarily using Telegram. I think it’s a good compromise between openness, features, and mass adoption. It can also do encrypted audio calls, but it can’t do video calls or audio/video conferences with multiple people.

nextcloud

That’s why I was looking for some tool for video calling and conferencing. I didn’t want something completely closed (Skype, Hangouts,…) and ideally something I can run on my server.

I’ve been a big fan of Nextcloud and have been running it on my Fedora VPS for 1.5 years. In my opinion it’s a great open platform for online services. They used to offer the SpreedMe service, which was pretty clumsy and difficult to install, and I never went for it. Fortunately they recently announced Nextcloud Talk, a complete rewrite, open source and based on WebRTC. Is it what I was looking for?

It requires Nextcloud 13, so I had to wait until this version was out this week. (I actually find it quite strange to announce and do a big PR push for an app that requires a version of Nextcloud that hasn’t been released yet.) The installation is super simple now. You just go to the application store, click “Enable”, and that’s it.

I’ve been using it for several days, so what is my experience with it? You can make calls with other users on your Nextcloud instance (it also supports federation, so you can extend it to users of other connected instances), and you can create a conference room to which you can invite other people via a link (which can be protected by a password).

Besides basic audio and video calls, it allows you to share a screen, and there is a text chat available to participants, which is handy e.g. for sharing links. It just works in modern browsers: you send someone a link, they open it, and you can start talking. Nextcloud Talk also has apps for Android and iOS, so you can join calls from your phone. But those can only do video and audio; they don’t support text chat yet, and you can’t create a new call room in them.

Feature-wise Nextcloud Talk is already fairly close to Bluejeans, the enterprise solution we use for video conferencing in Red Hat.

call-in-action

Are there any problems? It’s the first release, so there definitely are. One-to-one calls between registered users work reliably. I can’t say the same about conference calls with unregistered users. I tested it with two colleagues of mine whom I invited via a link. I could only see the video of one of them; he could see me, but couldn’t see the other person… Also, connecting all participants is not always reliable.

Nextcloud offers its own STUN server. In settings you can add more STUN servers or even a TURN server (though that is not very desirable, because all traffic then goes through your TURN server). I wonder if that would help.

There are also some problems in the UI. You can close the panel with the chat, but the icon for getting it back is black and is placed in the black corner of the other person’s video output, so it’s invisible. The UI of the mobile app sometimes sort of freezes, making it impossible to hang up.

But overall, Nextcloud Talk looks very promising as a solution for those who want to easily deploy a video conferencing system on their own premises. As I said, one-to-one calls already work well for me, and I hope video conferences with multiple people will improve with future releases, or that I will find settings that fix the problems I’m having.

Nautilus team grows – It’s my pleasure to welcome Ernestas Kulik as co-maintainer

Becoming a maintainer of a FOSS project is not easy. It requires much more than just code skills. It’s about responsibility, product management, vision, community, and hard work over the long term.

Becoming a maintainer of a FOSS project like Nautilus is even harder; it requires a sense of what being used by millions of people and delivering to businesses entails. It also requires understanding the complexity of a file manager, and the old code that lies behind it.

Now, becoming maintainer of a project that already has a maintainer working full time on it… that’s a different level.

Ernestas started contributing more than 2 years ago as a community member in his free time. He did major code work like porting Nautilus to Meson, making Nautilus work on Flatpak, improving search, and improving operations; he took the lead on fixing all deprecations (we had many!), worked on a prototype for a new cache/operations backend, dug into other libraries deeper in the stack to fix things that were visible in Nautilus, and many other things. What is most noticeable is the quality Ernestas strives for in all these contributions.

However, the above would make him “just” a good software programmer; the important part that makes him a good maintainer were his other actions. Ernestas took the lead on newcomer bug review/assignment, took the lead on legal matters like the GPL3 vs GPL2 issue we had with extensions, reviewed code from other contributors (including me), worked without delay on critical issues in a timely manner, worked on tasks important to our vision, engaged in bug reports with good communication, helped with project direction, considered all sides and the big picture when taking decisions, and, last of all, did all of this with excellence. If you ever wondered what someone has to do to become a good maintainer/co-maintainer, here’s the answer.

So without further ado, welcome Ernestas to his new role as co-maintainer; go on IRC and congratulate him 🎉

February 08, 2018

Entering the “home stretch” for GNOME 3.28

Earlier this week I released GNOME Maps 3.27.90 (even though, just after uploading the tarball, I read an e-mail saying the deadline for release tarballs had been postponed by one week).

This weekend I (like some 8000 others) participated in an exciting FOSDEM with lots of interesting talks, and the week before that I gave a presentation of GNOME Maps, and in particular its public transit functionality, for TrafikLab (the sort of “developer community” driven by the Swedish organization Samtrafiken AB, which coordinates and aggregates data from all public transit operators, both commercial/private and regional/public ones).

One of the larger features landed in 3.27.90, and one which isn’t visible on the surface, is that Maps now uses some new language features introduced to JS in the ES6 standard, namely classes and “arrow functions”.

So, when it comes to classes, as known from traditional OO languages such as Java or C++: earlier one would typically use prototypes to model object classes, but as of ES6 the language has support for more traditional classes with a method syntax. GJS also gained a new way to declare GObject classes.

So, where earlier declaring an extending class would look something like this:

var MyListBoxRow = new Lang.Class({
    Name: 'MyListBoxRow',
    Extends: Gtk.ListBoxRow,
    Template: 'resource:///<app-id>/ui/my-list-box-row.ui',

    ...
    myMethod: function(args) {

    }
});


this now becomes:

var MyListBoxRow = GObject.registerClass({
   Template: 'resource:///<app-id>/ui/my-list-box-row.ui'
}, class MyListBoxRow extends Gtk.ListBoxRow {

  ...
  myMethod(args) {

  }
});

and in cases where we don’t need to inherit from GObject (such as when not declaring any signals, i.e. for utility data-bearing classes) we can skip the registering part and it becomes just a simple ES6 class:

var SomeClass = class {
   ...
   someMethod(args) {

   }
}

We still need to assign using “var” to export these outside the module in question, but when we gain ES7 support in GJS we should be able to use the “export” keyword here instead. Another simplification that should arrive with ES7 is that we’d be able to use a decorator pattern in place of GObject.registerClass, so that it would become something like:

@GObject.registerClass 
class MyListRow extends Gtk.ListBoxRow

Technically this could be done today using a transpiler step (something like Babel) in the build system; these decorators are pretty much higher-order functions. But I chose not to do this at this point, since we still use GNU Autotools as our build system and eventually we should switch to Meson.

The second change involves using the “arrow operator” to bind this in anonymous functions (in async callbacks). So instead of something like:

asyncFunctionCall(onObject, (function(arg) {
     doStuffWith(arg);
}).bind(this));

this becomes:

asyncFunctionCall(onObject, (arg) => doStuffWith(arg));

These changes result in a reduction of 284 lines of code, which isn’t too bad for a change that doesn’t actually involve removing or really rewriting any code.

Thanks go to Philip Chimento (and Endless) for bringing these improvements to GJS!

Some other changes since the last release are some visual fixes and tooltips for some of the buttons in the routing sidebar, contributed by Vibhanshu Vaibhav, and a fix for a keyboard navigation bug (which I introduced when changing the behavior of the search entry to always activate when starting to type with no other entry active), contributed by Tomasz Miąsko. Thank you!

February 07, 2018

Calculator, System Monitor and games happenings

We are getting close to the 3.28 release and we are in the freeze, so it's time for a quick summary of what happened this cycle with the projects I occasionally contribute to.

Calculator was the major player this cycle (well, lately, to be more precise), having a quick bug cleanup (both on GNOME Bugzilla and Ubuntu Launchpad), merging older patches, and creating new bugs by merging old patches. Here are a few of the most relevant changes:
* The Meson port got into the calculator repository; thanks to Robert Ancell and Niels de Graef for the patches sent, and to the people reporting bugs since that happened. I am trying to keep up with the bugs as they come in. Thankfully, Meson is not only faster, but a lot easier for me to understand and use (and with a better reference), so the fixes do not (always) start with me shouting out for help. Please go ahead and try the Meson build, and if you find anything to complain about, just do it on the issue tracker (Bugzilla, or hopefully GitLab soon).
* If you used the ans variable, be aware that it was replaced with the _ variable (do you know Python?), to avoid it being confused with a time unit in some languages ("ans" means "years" in French, for instance). This was quite a big failure in my handling of the issue: it popped up late in the cycle, and I didn't know how to deal with it, as fixing it properly would have required a freeze exception, translation updates, etc., something which I wasn't ready for (the first one is the hardest). Instead, I chose the way that was easy for me, which meant lots of headache for some people (not being able to use results of previous calculations in the given locale) and more headache for maintainers in various distributions dealing with the bugs and patches related to that. Well, that is something I need to get better at, namely not choosing the easy way out and postponing to the next cycle in case of bugs found late in the cycle.
* Calculator is resizable again. This is a somewhat controversial move (just like the one that made it non-resizable); hopefully people will forgive me for it. For now it is freely resizable, with the history view (the view showing previous calculations) expanding vertically while the buttons remain fixed-height. The problem is that both the history and the buttons area expand horizontally, and the buttons expanding horizontally can result in very wide buttons, which is not ideal. Thankfully, Allan Day already has mockups ready for how the calculator should resize, and Emmanuele Bassi has already built emeus, a constraint-based GTK+ container, as I haven't found a way of describing Allan's mockups in current GTK+ CSS terms.

System Monitor is not dead yet, and will not die anytime soon. That is a statement which was not clear until now, with Usage being in development. We had several discussions with the designers about how to make one application out of the two by merging them, but we agreed that the target audiences are probably different: Usage is for simple use-cases, easy to use (and fairly beautiful, I have to admit), while System Monitor is for monitoring your system and your processes. Usage handles applications, with as few details as possible, e.g. network traffic, disk usage, CPU usage, while System Monitor monitors processes, their statuses, uptimes, cgroups, open files, etc. for advanced users.
On the development front there are only a couple of changes worth mentioning:
* Dark theme support for charts - if you use a dark theme, I'm sure you've already been blinded by the hardcoded white background of the Resources charts. Thanks to Voldemar Khramtsov this is fixed; please check the implementation with your themes and report bugs. I have experimented with ~15 different themes and made sure to have theme-based colors for the background and the grid of the charts, but it is really hard to make the non-theme-dependent colors of the charts visible on a theme-dependent background, and we might need some more tweaking. Ideas are welcome.
* Multiple-term search in the process list - you can filter the process list by multiple words, separated with ' ' or '|', e.g. "foo bar" or "foo|bar" for showing only processes matching foo or bar (see the sketch below). There was a discussion about making this search regexp-based, but I didn't see a use-case for it. Let me know if you would use regexp filtering in the process tree, explain why you would use it (a real use-case), and I will reconsider the decision taken in the bug.
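
In JavaScript for brevity (System Monitor itself is written in C++), and assuming simple substring matching, the filter logic boils down to something like this sketch:

// terms separated by ' ' or '|' are ORed together: a process matches
// if any term occurs in its name
function matchesFilter(processName, filter) {
    const terms = filter.split(/[ |]+/).filter((t) => t.length > 0);
    return terms.some((t) => processName.includes(t));
}

matchesFilter('foo-daemon', 'foo|bar');  // true
matchesFilter('bash', 'foo bar');        // false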

Other recent work:
* My first Meson port was swell-foop. The same as with everything I do: please check it, and if you see anything wrong, just let me know.
* I did some gettext migrations, mostly on games; most of them have already been merged and have already received some fixes (e.g. appdata was not installed, as I thought that removing the APPSTREAM_XML automake rules was part of the migration).
* I finally got to play a bit with flatpak and flatpak-builder, which greatly simplify building and distributing apps. I intend to do some more exercises on the games I maintain, as the games from the old GNOME Games suite (not to be confused with the new GNOME Games application for playing emulator games) are not present on Flathub.
* I rediscovered my old project eMines, written as an elementary minesweeper using PyGTK (almost 7 years ago); I just pulled it and it is still working.

Uh-oh, and I almost forgot: I proposed a GSoC idea for modernizing Five or More, aka GNOME Lines, a bit, as that game didn't get the quick makeover most other games received a couple of years ago. Let's see whether it will happen or not.

Behind the GNOME Booth, FOSDEM 2018

I did catch a cold, but I had a great time at FOSDEM this year! Friday was spent reviewing a branch with Florian which adds a disconnect entry to the context popover in Polari. It has now landed.

Saturday was spent selling lots and lots of socks. I chose this year not to go to any talks and instead hang out with fellow GNOMEies in the booth and chat with passing users. I’m accumulating many advertising arguments for buying socks, including that they allow you to have feet on your feet, and that once you own a pair you have an excuse to say “GNOME Socks!” as much as you want. ;-) Kat brought the awesome hoodies and then we had a big load of leftover t-shirts from GUADEC 2017 which we more or less sold (I think there are still some 20 left in small). In the end we sold some 160 pairs of socks, which is almost half the enormous stock of socks I purchased. When the evening came and the booth had to close, we went to the GNOME Beer Event in La Bécasse, where I had my annual taste of Lambic Blanc, one of the few beers I really enjoy drinking.


420 pairs of lovely GNOME socks ready to warm your feet. (CC-BY-SA 4.0)

Sunday went by with more booth-standing and then a GNOME Newcomer Workshop. We tried a new format which involved me matchmaking newcomers with existing GNOME developers from the projects each newcomer was interested in. Instead of going big-classroom style, the idea is to get more 1-on-1 and pair programming going during workshops. Thanks to Elias, Xaviju, Gwan and Florian for attending the workshop! I hope I’ll get to chat with you in the chatrooms, or, who knows, maybe meet again at GUADEC 2018?

In the evening Tobias, David, Julian and I hung out in the apartment I had arranged, where I cooked an oriental lentil soup with flatbread. Coming to GNOME Recipes soon™!


Photos by Julian Sparber, food by me.

Speaking at FOSDEM 2018 in Brussels, Belgium

As in the last ten (or so) years, I attended FOSDEM, the biggest European Free Software event. This year, though, I went a day earlier to attend one of the fringe events, the CHAOSSCon.

I hadn’t taken notice of the Linux Foundation announcing CHAOSS, an attempt to bundle various efforts regarding measuring and creating metrics of Open Source projects. The CHAOSS community is thus a bunch of formerly separate projects now having one umbrella.

OpenStack’s Ildiko Vancsa opened the conference by saying that metrics are what drive our understanding of communities and that we’re all interested in numbers. That helps us understand how projects work and make a more educated guess at how healthy a project currently is and, more importantly, what needs to be done in order to make it more sustainable. She also said that two communities exist within the CHAOSS project: the Metrics team and the Software team. The Metrics team cares about what information should be extracted and how it can be presented in an informative manner. The Software team implements the extraction parts and the analytics. She pointed the audience to the Wiki, which hosts more information.

Georg Link from the Metrics team then continued, saying that health cannot be universally determined, as every project is different and needs a different perspective. The Metrics team does not work on answering the health question for each and every project, but rather enables such conclusions to be drawn by providing the necessary infrastructure. They want to provide facts, not opinions.

Jesus from Bitergia and Harish from Red Hat were talking on behalf of the technical team. Their idea is to build a platform to understand how software is developed. The core projects are prospector, cregit, ghdata, and grimoire, they said.

I think that we in the GNOME community can use data to make more informed decisions. For example, right now we’re fading out our Bugzilla instance and we don’t really have any way to measure how successful we are. In fact, we don’t even know what it would mean to be successful. But by looking at data we might get a better feeling of what we are interested in and what metric we need to refine to express better what we want to know. Then we can evaluate measures by looking at the development of the metrics over time. Spontaneously, I can think of these relatively simple questions: How much review do our patches get? How many stale wiki links do we have? How soon are security issues being dealt with? Do people contribute to the wiki, documentation, or translations before creating code? Where do people contribute when coding stalls?

Bitergia’s Daniel reported on diversity and inclusion in CHAOSS, and said he is building a bridge between the Metrics and the Software teams. He has tried to produce data on how many women were contributing what, and especially whether they do any technical work. Questions they want to answer include whether minorities take more time to contribute, or what impact programs like the GNOME Outreach Program for Women have. They still need to code up the relevant metrics, but intend to be ready for the next OpenStack gender diversity report.

Bitergia’s CEO talked about the state of the GrimoireLab suite.
It’s a software development analysis toolkit written largely in Python, ElasticSearch, and Kibana. One year ago it was still complicated to run the stack, he said. Now it’s easy, and organisations like the Document Foundation run a public instance, also because they want to be as transparent as possible, he said.

Yousef from Mozilla’s Open Innovation team then showed how they make use of Grimoire to investigate the state of their community. They ingest data from GitHub, Bugzilla, newsgroups, meetups, Discourse, IRC, Stack Overflow, their wiki, Rust crates, and a few other things, reaching back as far as 20 years. Quite impressive. One of the graphs he found interesting was one showing commits by time zone. He commented that it was not as diverse as he had hoped, as there were still many US time zones and much fewer Asian ones.

Raymond from the Linux Foundation talked about metrics in Open Source communities: what are they measuring and what do they do with the data? Measuring things is not too complicated, he said, but then you actually need to do stuff with the results. Certain things are simply hard to measure; as an example he gave the level of user or community support people give. Another interesting aspect he mentioned is that it may be a very good thing when numbers go down, because projects may follow a hype cycle, too, and if your numbers drop, the project may just be getting to a more mature phase, he said. He closed with a quote he liked, noting that he’s not necessarily making fun of senior management: not everything that can be counted counts, and not everything that counts can be counted.

Boris then talked about Crossminer, a European-funded research project. They aim to improve the management of software projects by providing in-context recommendations and analytics. It’s a continuation of the Ossmeter project. He said that such projects usually die after the funding runs out, but the Crossminer project wants to be sustainable and survive the post-funding state by building an actual community around the software the project is developing. He presented a rather high-level overview of what they are doing and what their software tries to achieve. Essentially, it’s an Eclipse plugin which gives you recommendations. The time was too short to go into the details of how they actually do it, I suppose.

Eleni talked about merging identities. When tapping various data sources, you have to deal with people having different identity domains, and you may want to merge the identities belonging to the same person, she said. She gave a few examples of what can go wrong when trying to merge identities. One of them is that some identities do not represent humans but rather bots. Commonly used labels are another problem, she said, referring to email address prefixes which may very well be the same for different people; think j.wright@apple.com, j.wright@gmail.com, j.wright@amazon.com. They have at least 13 different problems, she said, and the impact of wrongly merging identities can be to either underestimate or overestimate the number of community members. Manual inspection is required, at least so far, she said.

The next two days were then dedicated to FOSDEM, which had a Privacy Devroom where I gave a talk on PrivacyScore.org (slides). I had 25 minutes, which I overused a little bit; I’m not used to these rather short slots. You just warm up talking and then the time is already up. Anyway, we had very interesting discussions afterwards, with a few suggestions regarding new tests. For example, someone mentioned that detecting a CDN might be worthwhile, given that CloudFlare allegedly terminates 10% of today’s Web traffic.

When sitting with friends we noticed that FOSDEM felt a bit like Christmas for us: Nobody really cares a lot about Christmas itself, but rather about the people coming together to spend time with each other. The younger people are excited about the presents (or the talks, in this case), but it’s just a matter of time for that to change.

It’s been an intense yet refreshing weekend and I’m looking very much forward to coming back next time. For some reason it feels really good to see so many people caring about Free Software.

design notes on inline caches in guile

Ahoy, programming-language tinkerfolk! Today's rambling missive chews the gnarly bones of "inline caches", in general but also with particular respect to the Guile implementation of Scheme. First, a little intro.

inline what?

Inline caches are a language implementation technique used to accelerate polymorphic dispatch. Let's dive in to that.

By implementation technique, I mean that the technique applies to the language compiler and runtime, rather than to the semantics of the language itself. The effects on the language do exist though in an indirect way, in the sense that inline caches can make some operations faster and therefore more common. Eventually inline caches can affect what users expect out of a language and what kinds of programs they write.

But I'm getting ahead of myself. Polymorphic dispatch literally means "choosing based on multiple forms". Let's say your language has immutable strings -- like Java, Python, or Javascript. Let's say your language also has operator overloading, and that it uses + to concatenate strings. Well at that point you have a problem -- while you can specify a terse semantics of some core set of operations on strings (win!), you can't choose one representation of strings that will work well for all cases (lose!). If the user has a workload where they regularly build up strings by concatenating them, you will want to store strings as trees of substrings. On the other hand if they want to access codepoints by index, then you want an array. But if the codepoints are all below 256, maybe you should represent them as bytes to save space, and as 4-byte codepoints otherwise? Or maybe even UTF-8 with a codepoint index side table.

The right representation (form) of a string depends on the myriad ways that the string might be used. The string-append operation is polymorphic, in the sense that the precise code for the operator depends on the representation of the operands -- despite the fact that the meaning of string-append is monomorphic!

Anyway, that's the problem. Before inline caches came along, there were two solutions: callouts and open-coding. Both were bad in similar ways. A callout is where the compiler generates a call to a generic runtime routine. The runtime routine will be able to handle all the myriad forms and combination of forms of the operands. This works fine but can be a bit slow, as all callouts for a given operator (e.g. string-append) dispatch to a single routine for the whole program, so they don't get to optimize for any particular call site.

One tempting thing for compiler writers to do is to effectively inline the string-append operation into each of its call sites. This is "open-coding" (in the terminology of the early Lisp implementations like MACLISP). The advantage here is that maybe the compiler knows something about one or more of the operands, so it can eliminate some cases, effectively performing some compile-time specialization. But this is a limited technique; one could argue that the whole point of polymorphism is to allow for generic operations on generic data, so you rarely have compile-time invariants that can allow you to specialize. Open-coding of polymorphic operations instead leads to code bloat, as the string-append operation is just so many copies of the same thing.

Inline caches emerged to solve this problem. They trace their lineage back to Smalltalk 80, gained in complexity and power with Self and finally reached mass consciousness through Javascript. These languages all share the characteristic of being dynamically typed and object-oriented. When a user evaluates a statement like x = y.z, the language implementation needs to figure out where y.z is actually located. This location depends on the representation of y, which is rarely known at compile-time.

However for any given reference y.z in the source code, there is a finite set of concrete representations of y that will actually flow to that call site at run-time. Inline caches allow the language implementation to specialize the y.z access for its particular call site. For example, at some point in the evaluation of a program, y may be seen to have representation R1 or R2. For R1, the z property may be stored at offset 3 within the object's storage, and for R2 it might be at offset 4. The inline cache is a bit of specialized code that compares the representation of the object being accessed against R1, in that case returning the value at offset 3; failing that, it compares against R2 and returns the value at offset 4; and otherwise it falls back to a generic routine. If this isn't clear to you, Vyacheslav Egorov wrote a fine article describing and implementing the object representation optimizations enabled by inline caches.
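
To make the mechanism concrete, here is that logic sketched in JavaScript (representations, offsets, and names are invented for illustration; a real inline cache is generated code, not a closure):

// a monomorphic cache for a property access like y.z: remember the
// last representation ("shape") seen and its offset, and fall back to
// a generic lookup on a miss
function makePropertyIC(lookupOffset) {
    let cachedShape = null;
    let cachedOffset = -1;

    return function cachedGet(obj) {
        if (obj.shape === cachedShape) {
            // fast path: same representation as last time
            return obj.storage[cachedOffset];
        }
        // slow path: generic lookup, then remember the shape/offset
        // pair so the next access with this representation is fast
        cachedOffset = lookupOffset(obj.shape);
        cachedShape = obj.shape;
        return obj.storage[cachedOffset];
    };
}

// R1 stores z at offset 3, R2 at offset 4
const getZ = makePropertyIC((shape) => ({ R1: 3, R2: 4 })[shape]);
getZ({ shape: 'R1', storage: [0, 0, 0, 'z of R1'] });     // 'z of R1'
getZ({ shape: 'R2', storage: [0, 0, 0, 0, 'z of R2'] });  // 'z of R2'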

Inline caches also serve as input data to later stages of an adaptive compiler, allowing the compiler to selectively inline (open-code) only those cases that are appropriate to values actually seen at any given call site.

but how?

The classic formulation of inline caches from Self and early V8 actually patched the code being executed. An inline cache might be allocated at address 0xcabba9e5 and the code emitted for its call-site would be jmp 0xcabba9e5. If the inline cache ended up bottoming out to the generic routine, a new inline cache would be generated that added an implementation appropriate to the newly seen "form" of the operands and the call-site. Let's say that new IC (inline cache) would have the address 0x900db334. Early versions of V8 would then patch the machine code at the call-site to be jmp 0x900db334 instead of jmp 0xcabba9e5.

Patching machine code has a number of disadvantages, though. It's inherently target-specific: you need different strategies to patch x86-64 and armv7 machine code. It's also expensive: you have to flush the instruction cache after the patch, which slows you down. That is, of course, if you are allowed to patch executable code at all; on many systems that's impossible, and writable machine code is a potential vulnerability on any system that may be exposed to remote code execution.

Perhaps worst of all, though, patching machine code is not thread-safe. In the case of early Javascript, this perhaps wasn't so important; but as JS implementations gained parallel garbage collectors and JS-level parallelism via "service workers", this becomes less acceptable.

For all of these reasons, the modern take on inline caches is to implement them as a memory location that can be atomically modified. The call site is just jmp *loc, as if it were a virtual method call. Modern CPUs have "branch target buffers" that predict the target of these indirect branches with very high accuracy so that the indirect jump does not become a pipeline stall. (What does this mean in the face of the Spectre v2 vulnerabilities? Sadly, God only knows at this point. Saddest panda.)
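
Sketching that "call through a mutable location" formulation in JavaScript, with the jump slot modeled as a variable that the runtime retargets instead of patching the caller:

// the call site always dispatches through `loc`; the runtime swaps
// the slot's target atomically instead of rewriting the caller's code
let loc = function generic(x) {
    // first call: learn about the operand and install a specialized
    // handler for subsequent calls
    loc = function specialized(y) { return 'fast: ' + y; };
    return 'slow: ' + x;
};

loc(1);  // 'slow: 1' -- generic path, retargets the slot
loc(2);  // 'fast: 2' -- specialized path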

cry, the beloved country

I am interested in ICs in the context of the Guile implementation of Scheme, but first I will make a digression. Scheme is a very monomorphic language. Yet, this monomorphism is entirely cultural. It is in no way essential. Lack of ICs in implementations has actually fed back and encouraged this monomorphism.

Let us take as an example the case of property access. If you have a pair in Scheme and you want its first field, you do (car x). But if you have a vector, you do (vector-ref x 0).

What's the reason for this nonuniformity? You could have a generic ref procedure, which when invoked as (ref x 0) would return the field in x associated with 0. Or (ref x 'foo) to return the foo property of x. It would be more orthogonal in some ways, and it's completely valid Scheme.
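
In JavaScript terms (Guile is Scheme, so this is only an illustration), such a generic ref is a single entry point that has to dispatch on the representation of x at every call:

// one generic accessor dispatching on representation, as opposed to
// monomorphic primitives like car and vector-ref
function ref(x, key) {
    if (Array.isArray(x)) return x[key];       // vector-like: index
    if (x instanceof Map) return x.get(key);   // table-like: lookup
    return x[key];                             // record-like: property
}

ref(['a', 'b', 'c'], 0);             // 'a'
ref(new Map([['foo', 42]]), 'foo');  // 42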

We don't write Scheme programs this way, though. From what I can tell, it's for two reasons: one good, and one bad.

The good reason is that saying vector-ref means more to the reader. You know more about the complexity of the operation and what side effects it might have. When you call ref, who knows? Using concrete primitives allows for better program analysis and understanding.

The bad reason is that Scheme implementations, Guile included, tend to compile (car x) to much better code than (ref x 0). Scheme implementations in practice aren't well-equipped for polymorphic data access. In fact it is standard Scheme practice to abuse the "macro" facility to manually inline code so that certain performance-sensitive operations get inlined into a closed graph of monomorphic operators with no callouts. To the extent that this is true, Scheme programmers, Scheme programs, and the Scheme language as a whole are all victims of their implementations. JavaScript, for example, does not have this problem -- to a small extent, maybe, yes, performance tweaks and tuning are always a thing, but JavaScript implementations' ability to burn away polymorphism and abstraction results in an entirely different character in JS programs versus Scheme programs.

it gets worse

On the most basic level, Scheme is the call-by-value lambda calculus. It's well-studied, well-understood, and eminently flexible. However the way that the syntax maps to the semantics hides a constrictive monomorphism: that the "callee" of a call refers to a lambda expression.

Concretely, in an expression like (a b), in which a is not a macro, a must evaluate to the result of a lambda expression. Perhaps by reference (e.g. (define a (lambda (x) x))), perhaps directly; but a lambda nonetheless. But what if a is actually a vector? At that point the Scheme language standard would declare that to be an error.

The semantics of Clojure, though, would allow for ((vector 'a 'b 'c) 1) to evaluate to b. Why not in Scheme? There are the same good and bad reasons as with ref. Usually, the concerns of the language implementation dominate, regardless of those of the users who generally want to write terse code. Of course in some cases the implementation concerns should dominate, but not always. Here, Scheme could be more flexible if it wanted to.

what have you done for me lately

Although inline caches are not a miracle cure for performance overheads of polymorphic dispatch, they are a tool in the box. But what, precisely, can they do, both in general and for Scheme?

To my mind, they have five uses. If you can think of more, please let me know in the comments.

Firstly, they have the classic named property access optimizations as in JavaScript. These apply less to Scheme, as we don't have generic property access. Perhaps this is a deficiency of Scheme, but it's not exactly low-hanging fruit. Perhaps this would be more interesting if Guile had more generic protocols such as Racket's iteration.

Next, there are the arithmetic operators: addition, multiplication, and so on. Scheme's arithmetic is indeed polymorphic; the addition operator + can add any number of complex numbers, with a distinction between exact and inexact values. On a representation level, Guile has fixnums (small exact integers, no heap allocation), bignums (arbitrary-precision heap-allocated exact integers), fractions (exact ratios between integers), flonums (heap-allocated double-precision floating point numbers), and compnums (inexact complex numbers, internally a pair of doubles). Also in Guile, arithmetic operators are "primitive generics", meaning that they can be extended to operate on new types at runtime via GOOPS.

The usual situation though is that any particular instance of an addition operator only sees fixnums. In that case, it makes sense to only emit code for fixnums, instead of the product of all possible numeric representations. This is a clear application where inline caches can be interesting to Guile.
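
Sketched in JavaScript (Guile's ICs would of course be generated native code over its own numeric representations, so take this only as an illustration of the shape of the fast path):

// an addition IC specialized to the small-integer case, falling back
// to the full polymorphic routine for everything else
function makeAddIC(genericAdd) {
    return function add(a, b) {
        if (Number.isSafeInteger(a) && Number.isSafeInteger(b)) {
            return a + b;            // fast path: fixnum + fixnum
        }
        return genericAdd(a, b);     // fallback: the full numeric tower
    };
}

// stand-in for the generic routine handling bignums, fractions, etc.
const add = makeAddIC((a, b) => Number(a) + Number(b));
add(1, 2);    // fast path
add(1.5, 2);  // falls back to the generic routine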

Third, there is a very specific case related to dynamic linking. Did you know that most programs compiled for GNU/Linux and related systems have inline caches in them? It's a bit weird but the "Procedure Linkage Table" (PLT) segment in ELF binaries on Linux systems is set up in a way that when e.g. libfoo.so is loaded, the dynamic linker usually doesn't eagerly resolve all of the external routines that libfoo.so uses. The first time that libfoo.so calls frobulate, it ends up calling a procedure that looks up the location of the frobulate procedure, then patches the binary code in the PLT so that the next time frobulate is called, it dispatches directly. To dynamic language people it's the weirdest thing in the world that the C/C++/everything-static universe has at its cold, cold heart a hash table and a dynamic dispatch system that it doesn't expose to any kind of user for instrumenting or introspection -- any user that's not a malware author, of course.

But I digress! Guile can use ICs to lazily resolve runtime routines used by compiled Scheme code. But perhaps this isn't optimal, as the set of primitive runtime calls that Guile will embed in its output is finite, and so resolving these routines eagerly would probably be sufficient. Guile could use ICs for inter-module references as well, and these should indeed be resolved lazily; but I don't know, perhaps the current strategy of using a call-site cache for inter-module references is sufficient.

Fourthly (are you counting?), there is a general case of the former: when you see a call (a b) and you don't know what a is. If you put an inline cache in the call, instead of having to emit checks that a is a heap object and a procedure and then emit an indirect call to the procedure's code, you might be able to emit simply a check that a is the same as x, the only callee you ever saw at that site, and in that case you can emit a direct branch to the function's code instead of an indirect branch.

Here I think the argument is less strong. Modern CPUs are already very good at indirect jumps and well-predicted branches. The value of a devirtualization pass in compilers is that it makes the side effects of a virtual method call concrete, allowing for more optimizations; avoiding indirect branches is good but not necessary. On the other hand, Guile does have polymorphic callees (generic functions), and call ICs could help there. Ideally though we would need to extend the language to allow generic functions to feed back to their inline cache handlers.

Finally, ICs could allow for cheap tracepoints and breakpoints. If at every breakable location you included a jmp *loc, and the initial value of *loc was the next instruction, then you could patch individual locations with code to run there. The patched code would be responsible for saving and restoring machine state around the instrumentation.

Honestly I struggle a lot with the idea of debugging native code. GDB does the least-overhead, most-generic thing, which is patching code directly; but it runs from a separate process, and in Guile we need in-process portable debugging. The debugging use case is a clear area where you want adaptive optimization, so that you can omit debugging ceremony from the hottest code, knowing that you can fall back on some earlier tier. Perhaps Guile should bite the bullet and go this way too.

implementation plan

In Guile, monomorphic as it is in most things, probably only arithmetic is worth the trouble of inline caches, at least in the short term.

Another question is how much to specialize the inline caches to their call site. On the extreme side, each call site could have a custom calling convention: if the first operand is in register A and the second is in register B and they are expected to be fixnums, and the result goes in register C, and the continuation is the code at L, well then you generate an inline cache that specializes to all of that. No need to shuffle operands or results, no need to save the continuation (return location) on the stack.

The opposite would be to call ICs as if they were normal procedures: shuffle arguments into fixed operand registers, push a stack frame, and when the IC returns, shuffle the result into place.

Honestly I am looking mostly to the simple solution. I am concerned about code and heap bloat if I specialize to every last detail of a call site. Also, maximum speed comes with an adaptive optimizer, and in that case simple lower tiers are best.

sanity check

To compare these impressions, I took a look at V8's current source code to see where they use ICs in practice. When I worked on V8, the compiler was entirely different -- there were two tiers, and both of them generated native code. Inline caches were everywhere, and they were gnarly; every architecture had its own implementation. Now in V8 there are two tiers, not the same as the old ones, and the lowest one is a bytecode interpreter.

As an adaptive optimizer, V8 doesn't need breakpoint ICs. It can always deoptimize back to the interpreter. In actual practice, to debug at a source location, V8 will patch the bytecode to insert a "DebugBreak" instruction, which has its own support in the interpreter. V8 also supports optimized compilation of this operation. So, no ICs needed here.

Likewise for generic type feedback, V8 records types as data rather than in the classic formulation of inline caches as in Self. I think WebKit's JavaScriptCore uses a similar strategy.

V8 does use inline caches for property access (loads and stores). Besides that there is an inline cache used in calls which is just used to record callee counts, and not used for direct call optimization.

Surprisingly, V8 doesn't even seem to use inline caches for arithmetic (any more?). Fair enough, I guess, given that JavaScript's numbers aren't very polymorphic, and even with a system with fixnums and heap floats like V8, floating-point numbers are rare in cold code.

The dynamic linking and relocation points don't apply to V8 either, as it doesn't receive binary code from the internet; it always starts from source.

twilight of the inline cache

There was a time when inline caches were recommended to solve all your VM problems, but it would seem now that their heyday is past.

ICs are still a win if you have named property access on objects whose shape you don't know at compile-time. But improvements in CPU branch target buffers mean that it's no longer imperative to use ICs to avoid indirect branches (modulo Spectre v2), and creating direct branches via code-patching has gotten more expensive and tricky on today's targets with concurrency and deep cache hierarchies.

Besides that, the type feedback component of inline caches seems to be taken over by explicit data-driven call-site caches, rather than executable inline caches, and the highest-throughput tiers of an adaptive optimizer burn away inline caches anyway. The pressure on an inline cache infrastructure now is towards simplicity and ease of type and call-count profiling, leaving the speed component to those higher tiers.

In Guile the bounded polymorphism on arithmetic combined with the need for ahead-of-time compilation means that ICs are probably a code size and execution time win, but it will take some engineering to prevent the calling convention overhead from dominating cost.

Time to experiment, then -- I'll let y'all know how it goes. Thoughts and feedback welcome from the compilerati. Until then, happy hacking :)

February 06, 2018

Updates on the Endless App Center / GNOME Software

The great majority of my work at Endless is to (try to) tame GNOME Software and apply the changes that make it what we simply call “the App Center” (repo here) in the Endless OS.
This is a lot of work, and usually I’d love to share more often what I am doing, but I end up neglecting the blog due to lack of time. So here’s a summary of what I have done over the past few months.

New App Tiles

From the times when it was called the App Store (and was not based on GNOME Software like now), the Endless OS’s App Center used to have what we internally called “app thumbnails”. These were images carefully produced for each app that Endless distributed, which worked as a way to provide some visual hint and attractiveness that is many times not achieved by the apps’ own icons. Here’s a screenshot of that version:

Old version of App Center in the Endless OS: displays colorful images in squares representing the apps available

Old version of the App Center


There were a couple of problems with “app thumbnails” like that: 1) we started shipping Flathub as a remote by default, and it’s simply not scalable to go and create an image for every app that is available in the repository; and 2) even if visually appealing, the app tiles make it a bit difficult to correctly display text on them, and depending on how apps appear next to each other, the result can become visually quite bloated.

Thus the solution we came up with for the second problem was to dim the effect of the thumbnails a little by placing a translucent layer on top of them, and to have a dedicated area for textual information. This means we still use app thumbnails for the apps that have them, but they will all seem a bit less intense and more alike.
That still leaves us with the first problem of not having thumbnails for most apps. To fix that we create a background from the main colors present in the app’s logo. The background is composed of 4 gradients, each with one of the icon’s main colors. Using the logos’ colors ensures there’s some harmony between the logos and their backgrounds, and I am very happy with the result:

Generated backgrounds for apps: shows squares where the background is a mix of gradients with colors picked from the app's icon

Automatically generated backgrounds for apps

Combination of app tiles that have a thumbnail image, and some that have automatically generated backgrounds
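
The actual implementation lives in the App Center’s code, but the idea can be sketched in a few lines of JavaScript (the extraction of the dominant colors from the icon is assumed to happen elsewhere; all names here are made up):

// build a CSS background out of four gradients, one per dominant icon
// color, each fading to transparent from a different angle so that
// the colors blend across the tile
function gradientBackground(colors) {
    const angles = [45, 135, 225, 315];
    return colors
        .map((color, i) => `linear-gradient(${angles[i]}deg, ${color}, transparent)`)
        .join(', ');
}

// four colors picked from an app's icon:
gradientBackground(['#3a86ff', '#ffbe0b', '#fb5607', '#8338ec']);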

Updates from USB and LAN

As you know, Endless’ mission is to give access to computers (and all that comes with them: knowledge, entertainment, productivity) to those who often live in remote areas, with very weak or nonexistent connectivity. Maybe you’ve already heard that at Endless we’re developing an “asynchronous internet” and optimizing the use of the little data connectivity some of our users have. So it’s only logical that we give users the possibility to share the data among themselves without an internet connection. To do that, we (more specifically Philip Withnall, kudos to him!) have implemented a way in ostree for local repositories to be found, particularly repositories on removable drives (e.g. USB keys) and on the LAN. This means that e.g. a teacher’s computer in a classroom can download app updates from the interwebs and the students’ computers will just get the updates through the LAN (without the need for an external connection). In the case of USB, as you probably guessed, a user can just set up a repository on a USB key and share the drive with friends.
For the USB case, in which the set of available apps may differ a lot from the user machine’s catalog, we need to show which of those apps are available, so when the USB drive is inserted in the machine, the App Center just pops up and shows a new “USB” category with the apps that are contained in it. See the following screenshot:

App Center showing the one app in an inserted USB key, as a category

App Center showing the one app in an inserted USB key

Performance Constraints

One of our constant concerns is that our OS and apps run smoothly even on less powerful machines, since that’s what many of our users have. GNOME Software spawns a thread for every main operation that the user does, like installing or updating an app, and we’ve noticed that when a few of these operations are running in parallel, some machines will just freeze. Moreover, downloading a bunch of data in parallel may easily occupy the whole bandwidth without actually completing any of the downloads.
One can think of a lot of smart approaches for dealing with this, but for now we just implemented a restriction on the number of operations that can run in parallel. This was implemented together with upstream, and the number of possible parallel operations is one per GB of RAM. You may argue this is as good a heuristic as any other, but it gives a low number of operations for slower machines, while still allowing powerful machines to have multiple parallel operations.
For the Endless OS we just opted to limit this number to 1, but we may revisit that later. Google Play also performs just one update/install at a time, so this is not such a crazy thing.
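
As a sketch of the heuristic (hypothetical names; the real logic lives in GNOME Software’s C code):

const GB = 1024 ** 3;

// one parallel install/update operation per GB of RAM, never fewer
// than one; Endless OS currently just pins the limit to 1
function maxParallelOps(totalRamBytes) {
    return Math.max(1, Math.floor(totalRamBytes / GB));
}

maxParallelOps(2 * GB);             // 2 on a 2 GB machine
maxParallelOps(512 * 1024 * 1024);  // 1 on a 512 MB machine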

We also need to inform the user that an app is waiting for its turn to install/update, so in such cases an empty progress bar with a message is shown:

An app waiting for its turn to be updated: has an empty progress bar with the message "Pending update..." on top.

An app waiting for its turn to be updated

For consistency, we’ve also changed how we show the “queued-for-install” state (a state that happens when the user clicks install and there’s no internet connection) to match the UI shown above.

Auto Updates

Another push we’re making together with upstream is auto-updates. GNOME Software has had something similar to auto-updates for a while, though it was more like auto-downloads. This was heavily based on the fact that it’s not very safe to just go and install new versions of packages right away (apps may be running…), so it’s up to the user to choose when to (reboot and) install them.
Flatpak, though, has no such problems. Apps can be updated even while running, as the new update replaces the existing version atomically, without touching the running app’s files.

So the ideal thing to do for Flatpak is to have real auto-updates (that is, download and deploy them right away). But since GNOME Software still has to support other sorts of app distribution, it required a bit of creativity when designing the new UI for this, which Allan Day kindly did, and very patiently with me and all my opinionated views of it 🙂

I have implemented auto-updates downstream first, without some of the niceties of the new mockups, since we needed them for our very-soon-to-come new version. But the idea is to do the real implementation upstream soon.
Surely enough, even when turned on, auto-updates currently only happen on unmetered connections; Philip Withnall is working on a creative solution for metered connections (soon to be announced).

Misc Fixes

This post is long enough, so it’s not really sensible to enumerate all the fixes done in the last months. I will just mention that recently we have fixed important issues (upstream as well) like installing new runtime extensions when an app update needs them; cancelling auto updates/downloads when they’re running and the connection is switched to a metered one; and marking an app as updatable if one of its runtime extensions has an update (otherwise it was not possible to install those extensions); etc.

If you’re still reading this, thank you! But especially thanks to Richard and Allan for their patience and leadership upstream!
Hope you liked this. I will try to keep the updates more frequent!

February 05, 2018

summing up 95

summing up is a recurring series on topics & insights that compose a large part of my thinking and work. drop your email in the box below to get it – and much more – straight in your inbox.

Legends of the Ancient Web, by Maciej Cegłowski

Radio brought music into hospitals and nursing homes, it eased the profound isolation of rural life, it let people hear directly from their elected representatives. It brought laughter and entertainment into every parlor, saved lives at sea, gave people weather forecasts for the first time.

But radio waves are just oscillating electromagnetic fields. They really don't care how we use them. All they want is to go places at the speed of light. It is hard to accept that good people, working on technology that benefits so many, with nothing but good intentions, could end up building a powerful tool for the wicked. But we can't afford to re-learn this lesson every time.

Technology interacts with human nature in complicated ways, and part of human nature is to seek power over others, and manipulate them. Technology concentrates power. We have to assume the new technologies we invent will concentrate power, too. There is always a gap between mass adoption and the first skillful political use of a medium. With the Internet, we are crossing that gap right now.

only those who know nothing about technological history believe that technology is entirely neutral. it always has a bias towards being used in certain ways and not others. a great comparison to what we're facing now with the internet.

Silicon Valley Is Turning Into Its Own Worst Fear, by Ted Chiang

In psychology, the term “insight” is used to describe a recognition of one’s own condition, such as when a person with mental illness is aware of their illness. More broadly, it describes the ability to recognize patterns in one’s own behavior. It’s an example of metacognition, or thinking about one’s own thinking, and it’s something most humans are capable of but animals are not. And I believe the best test of whether an AI is really engaging in human-level cognition would be for it to demonstrate insight of this kind.

I used to find it odd that these hypothetical AIs were supposed to be smart enough to solve problems that no human could, yet they were incapable of doing something most every adult has done: taking a step back and asking whether their current course of action is really a good idea. Then I realized that we are already surrounded by machines that demonstrate a complete lack of insight, we just call them corporations. Corporations don’t operate autonomously, of course, and the humans in charge of them are presumably capable of insight, but capitalism doesn’t reward them for using it. On the contrary, capitalism actively erodes this capacity in people by demanding that they replace their own judgment of what “good” means with “whatever the market decides.”

the problem is this: if you're never exposed to new ideas and contexts, if you grow up only being shown one way of thinking about businesses & technology and being told that there are no other ways to think about this, you grow up thinking you know what you're doing.

The resource leak bug of our civilization, by Ville-Matias Heikkilä

When people try to explain the wastefulness of today's computing, they commonly offer something I call the "tradeoff hypothesis". According to this hypothesis, the wastefulness of software would be compensated by flexibility, reliability, maintainability, and perhaps most importantly, cheap programming work.

I used to believe in the tradeoff hypothesis as well. However, during recent years, I have become increasingly convinced that the portion of true tradeoff is quite marginal. An ever-increasing portion of the waste comes from abstraction clutter that serves no purpose in final runtime code. Most of this clutter could be eliminated with more thoughtful tools and methods without any sacrifices.

we too often seem to adjust to the limitations of technology, instead of creating solutions for a problem with the help of technology.
