McEs, A Hacker Life
Thursday, October 30, 2008
Dutch coin designed using cairo

This is fantastic. Just read it.

/me wants one

Labels: cairo

¶ 2:43 AM
Wednesday, October 29, 2008
Improving Login Time, Part 1: gnome-settings-daemon

Any true GNOME hacker has to take a shot (multiple shots, mind you) at improving login time. Federico did it a couple years ago, and since I want to be like Federico when I grow up, that's just what I've been doing for the past week, and expect to keep doing for the weeks that come.

How does login work anyway? In short, gnome-session is started and it then in turn reads the list of tasks to start from .desktop files. Each task is marked as belonging to one of a few login phases, in chronological order:
startup
initialization
window-manager
panel
desktop
application
The startup phase belongs to gnome-session itself. Initialization, then, is the first phase applications can sign up to be run in. To transition to the next phase, the current phase should either complete or time out. A phase is complete when all apps associated with the phase signal completion. An app can signal completion in a variety of ways, the simplest of which is that the process terminates. A phase times out after ten seconds. Descriptions of the phases, as well as a longer version of this condensed overview, are available in gnome-session/gnome-session/README.
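To make the "complete or time out" rule concrete, here is a minimal sketch in Python. It is a toy model, not gnome-session code: `run_phases` and its input format are made up for illustration, and completion times are given in seconds relative to the start of each phase.

```python
# Toy model of the phase rule described above (not gnome-session code).
PHASES = ["startup", "initialization", "window-manager",
          "panel", "desktop", "application"]
PHASE_TIMEOUT = 10.0  # seconds; the ten-second timeout mentioned above


def run_phases(apps, timeout=PHASE_TIMEOUT):
    """Return a timeline of (phase, start, end, timed_out) tuples.

    `apps` maps a phase name to a list of completion times, measured
    from the start of that phase. None means the app never signals
    completion (hypothetical input format).
    """
    clock = 0.0
    timeline = []
    for phase in PHASES:
        times = apps.get(phase, [])
        if any(t is None or t > timeout for t in times):
            # At least one app is too slow: the phase times out.
            duration, timed_out = timeout, True
        else:
            # The phase ends when its slowest app signals completion.
            duration, timed_out = max(times, default=0.0), False
        timeline.append((phase, clock, clock + duration, timed_out))
        clock += duration
    return timeline
```

In this model a single app that never signals completion holds its whole phase for the full ten seconds, no matter how quickly everything else in that phase finishes.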

The initialization phase is where actual doing stuff begins, and for this blog entry we will focus on that (to be honest, only part of it). The mentioned README has this to add about the initialization phase:
GSM_SESSION_PHASE_INITIALIZATION is the first phase of "normal" startup (ie, startup controlled by .desktop files rather than hardcoding). It covers low-level stuff like gnome-settings-daemon and at-spi-registryd, that need to be running very early (before any windows are displayed).
Before leaving the quotation, notice the emphasis: "before any windows are displayed". We'll get back to that.

The most prominent process started as part of the initialization phase is gnome-settings-daemon. No wonder, as that single module consists of some fifteen plugins that do all kinds of startup. In other words, initialization is gnome-settings-daemon. Let's dive in.

When Jon McCann and co refactored gnome-settings-daemon into plugins, they also added profiling annotation hooks that can be used for Federico-style timeline plotting. So what I did was rebuild g-s-d, hook up strace, and draw the results. All this on my home machine that is rather beefy and otherwise underutilized.

I did one thing wrong: I forgot to send the strace to the background, so my mock g-s-d was not terminating and hence kept gnome-session waiting for it for ten seconds before finally giving up on the phase and moving on. The net result was that g-s-d was run solo with no other process racing with it for the CPU. That, and doing warm logins, meant my timings were very predictable and consistently reproducible. In this scenario g-s-d becomes idle in just short of one second. In a more real scenario of sending the strace to the background, it takes more like 2.5s. But that's beside the point. By letting g-s-d run as fast as it can, it's easier to spot what's slow, as those are sure to stand out.

Before getting any further, I stopped and asked myself "is this worth optimizing?" A good question to ask before starting any optimization work. My immediate response was: "sure, if I can cut that 1s in half, that's about 10% saving on a 5s login time". After five seconds of thinking I wondered: "but isn't g-s-d forking and returning in the parent immediately? A .5s saving in g-s-d idle time means little for the login time as a whole." And that's mostly true, but hey, aren't the initialization phase processes supposed to set things that need to be done before any windows are shown? And by forking and returning early, g-s-d is actually not doing that. I can already see how, for example, the xsettings plugin in g-s-d comes up and sets the font rendering settings of the display, causing a redraw in any Gtk+ applications already started. It would be much better to make g-s-d actually do essential initialization as fast as it can, return in the parent, then take its time doing other work that does not have to be done before windows are shown. That's what I'm planning to make it do. In light of that plan, let's see how tight or loose things currently are.

Without any further rambling, here is the plot of gnome-settings-daemon as shipped in Fedora rawhide starting up. The only modification I have made is adding more annotations for the plot.

After studying the plot and the underlying log and code I identified ten hotspots and plans to fix them. The hotspots are named in the plot. The names are not readable in the small version included here (click for full version), so I have also numbered them:

1. linking: This is pretty much the cost of loading the 67 shared libraries that g-s-d currently links to, directly or indirectly. There's not much we can do immediately to make the linker faster. We can try to link to fewer libraries however. Seems like we don't really need libgnome. Chopping that spares some 20 of those 67.

Resolution: mccann already filed Bug 557808 – don't use libgnome.

2. gtk_init: There's not much to do here right away, except that I want to profile gtk+ initialization sometime to see if we can improve it. It's not long, but any saving benefits every application, so it's worth pursuing. No resolution.

3. fontconfig_monitor: This one is so embarrassing. It's my single commit to g-s-d, and it takes the longest time in the plot. What this code does is to add gio/inotify monitors on all font directories and configuration files known by fontconfig for change notification so it can 1) rebuild the fontconfig cache and 2) signal applications to reload their font configurations.

Now, it's a good idea to make sure the fontconfig cache is current before other applications start and each try to rebuild the cache, but installing the monitors can wait. It's one of those things that is just as good if done 10s into the login.

Resolution: Only check that fontconfig cache is current. Defer installing file monitors to idle time.
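The "do the essential part now, defer the rest to idle time" split can be sketched like this in Python. It is an illustration of the idea only, not g-s-d code (which is C and would hook into the main loop's idle callbacks); the `Startup` class and the task names are hypothetical.

```python
from collections import deque


class Startup:
    """Toy runner: essential tasks run right away, the rest wait for idle."""

    def __init__(self):
        self._idle = deque()
        self.log = []  # (when, task name) pairs, in execution order

    def run_now(self, name, func):
        # Work that must be finished before any window is shown.
        func()
        self.log.append(("startup", name))

    def defer(self, name, func):
        # Work that is just as good if done later; only queue it.
        self._idle.append((name, func))

    def run_idle(self):
        # Called once the session is otherwise idle.
        while self._idle:
            name, func = self._idle.popleft()
            func()
            self.log.append(("idle", name))


def demo():
    s = Startup()
    s.run_now("check fontconfig cache", lambda: None)
    s.defer("install file monitors", lambda: None)
    done_before_ready = list(s.log)  # state when the daemon reports ready
    s.run_idle()
    return done_before_ready, s.log
```

Only the cache check is on the critical path here; the monitor installation still happens, just after the point where the rest of the login would be allowed to proceed.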

4. mkfontdir: This one's so bogus. We scan two directories and clean up symlinks for "cursor fonts" we may have created before. Then if there's any cursor font set in gconf, we create a symlink for it, then call mkfontdir (that in turn calls mkfontscale, which does a bunch of stats on nonexisting files and directories, etc) and get and set the X server font path. All that work even if there is no cursor font set.

Resolution: Skip spawning mkfontdir and setting the server font path if there is no cursor font set.

5. mousetweaks: This one's my favorite. According to the man page, "mousetweaks is a daemon that provides various mouse features for the GNOME desktop. It depends on the Assistive Technology Service Provider Interface (AT-SPI)." What the g-s-d plugin does is to monitor the relevant gconf keys, and start/stop the mousetweaks daemon on demand.

On a typical desktop with no tweaks configured (99%+), it spawns "mousetweaks -s", which means "stop the running daemon, if any". The mousetweaks process then starts up, initializes a bunch of stuff, including the a11y stuff (not the fastest stuff it seems), tries to find a running daemon, fails, and silently exits. So much for so little.

Resolution: Don't spawn mousetweaks if no tweaks configured. In other words, don't spawn "mousetweaks -s" unless we know a daemon is running.
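The decision boils down to a small truth table. A sketch in Python, assuming the plugin remembers whether it started a daemon itself; the function name and the "start"/"stop" return values are made up for illustration, and only "mousetweaks -s" comes from the man page quoted above.

```python
def plan_mousetweaks(tweaks_enabled, daemon_running):
    """Return "start", "stop", or None when there is nothing to spawn.

    `daemon_running` is assumed to be tracked by the plugin itself
    (hypothetical bookkeeping, e.g. "we spawned it earlier").
    """
    if tweaks_enabled and not daemon_running:
        return "start"
    if not tweaks_enabled and daemon_running:
        return "stop"  # the only case that needs "mousetweaks -s"
    return None  # typical desktop: no tweaks, no daemon, no process spawned
```

On the typical desktop both inputs are false, so nothing gets spawned at login at all.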

6. init_kbd: This one was harder to figure out. There was no big fat thing going on. Instead, there is a loop over some 20 different media keys, and for each of them some gconf reading and a grab_key operation.

The gconf stuff as my plot agrees is not the bottleneck as the code already does a one-level-deep preloading on the gconf directory. The grab_key invocations however each take real time, and they add up. Looking into what grab_key does is revealing: for each combination of the ignored modifiers, for each screen, it does the usual "push error handler on display; do something with display; flush the display; pop and see if any errors happen". Multiplied by the number of keys, that's a bunch of X display flushes while we're not really interested in pass/fail status of individual operations.

Resolution: Do one "push; do; flush; pop" instead of many.
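To see why this matters, here is a sketch in Python that counts flushes with a fake display. Nothing here talks to X: `FakeDisplay` and both grab functions are hypothetical stand-ins, and the count of eight modifier combinations in the usage note is an illustrative assumption.

```python
from itertools import product


class FakeDisplay:
    """Stand-in for an X display that only counts operations."""

    def __init__(self):
        self.grabs = 0
        self.flushes = 0
        self.trap_depth = 0

    def push_error_trap(self):
        self.trap_depth += 1

    def grab(self, key, modifiers, screen):
        self.grabs += 1

    def flush(self):
        self.flushes += 1  # each flush is a wait on the X server

    def pop_error_trap(self):
        self.trap_depth -= 1
        return 0  # pretend no X error happened


def grab_each(display, keys, modifier_masks, screens):
    # Current pattern: one push/do/flush/pop per individual grab.
    for key, mods, screen in product(keys, modifier_masks, screens):
        display.push_error_trap()
        display.grab(key, mods, screen)
        display.flush()
        display.pop_error_trap()


def grab_batched(display, keys, modifier_masks, screens):
    # Proposed pattern: one push, all the grabs, one flush, one pop.
    display.push_error_trap()
    for key, mods, screen in product(keys, modifier_masks, screens):
        display.grab(key, mods, screen)
    display.flush()
    return display.pop_error_trap()
```

With 20 keys, 8 modifier combinations and one screen, `grab_each` flushes 160 times and `grab_batched` flushes once, for the same 160 grabs. The price is that a failure can no longer be pinned on an individual key, which is fine when nobody looks at the per-key status anyway.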

7. acme_volume_new: What's happening here is that to be able to control volume and other mixer properties, we end up initializing gstreamer. Which in turn wants to ensure that its binary cache of plugins is up to date, so it forks and stats all the plugins. Ouch!

Now, making sure the gstreamer cache is up to date before every other application starts using it is a good idea, but it doesn't have to be so painful!

Whether it's that no one has got to fix it yet, or if there are good reasons for the binary cache not to simply store the timestamps of the folders and compare those instead of doing a stat on every plugin on every startup of every gstreamer-using application, I don't know. I won't judge. Love to hear the issues. But experience with Pango and fontconfig tells me that a more decent cache can be done and indeed should be done.

The fontconfig cache, admittedly, becomes really hairy at times (time skews, anyone?), but it is much, much, much, much better than if fontconfig stat'ed any and every font on startup!

I also have no idea why gstreamer forks for the cache check. I could think of not polluting the current process or risking crashing it. BUT! If the forked process fails validating the cache, it then retries in process! Oh well...

Resolution: Awesome gstreamer hackers, please fix your cache! In the meantime, forcing gstreamer to not fork may help (can be done by setting an env var).
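The "store the timestamps of the folders and compare" idea can be sketched in a few lines of Python. This is not how gstreamer or fontconfig actually implement their caches; the function names are made up, and the approach has a known blind spot: a directory's mtime changes when entries are added, removed or renamed, not when an existing file is rewritten in place.

```python
import os


def snapshot_dirs(dirs):
    """Record each directory's mtime (ns); a missing directory maps to None."""
    snap = {}
    for d in dirs:
        try:
            snap[d] = os.stat(d).st_mtime_ns
        except FileNotFoundError:
            snap[d] = None
    return snap


def cache_is_current(saved, dirs):
    # One stat per directory instead of one stat per plugin file.
    return saved == snapshot_dirs(dirs)
```

A cache built this way would save the snapshot next to the cached data and only rescan the plugins when `cache_is_current` returns False.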

8. gnome-screensaver: Not sure why gnome-screensaver is so heavy to start up. That's for another session. But, who cares if gnome-screensaver starts 10s into the session? Right, you got it, it does not have to be started before all other windows.

Resolution: Start at idle time.

9. clipboard_manager: This one also baffles me. It's a bunch of X roundtrips. Shouldn't take that long. Anyway, given how clipboard managers work (they are useful when the app holding the clipboard content is exiting), no one would really notice if we started it 10s into the session.

Resolution: Start at idle time.

10. xrdb: The xrdb brokenness (it calls gcc!) is a well-known and well-studied issue. According to mclasen the only reason we kept doing it was xemacs, but allegedly that uses Gtk+ these days. Is there any other reason we should be doing xrdb in g-s-d in 2009?

Resolution: mccann filed Bug 557807 – disable xrdb plugin by default. Isn't he awesome?

That's tonight's ten commandments, err, resolutions. I have already started hacking the new architecture in g-s-d and patching the plugins as described above. There are more plugins, and each can use a quick review. In general any plugin that does "set XYZ and hook up for change notifications on it" can be split up to do the change-notification hookup at idle time if that part consumes