Implicit cast quirks

April 15, 2011

I thought I let you know why I just had to throw away three days worth of computation data.

double importantPrefactor(bool freeEvolution);

void multistep(bool freeEvolution, /* ... */)
{
    /* ... */
    const double pre = importantPrefactor(free);
    /* ... */
}

Yes, I used the wrong identifier inside the multistep function. Still it compiles and runs without error (except for the garbage data that comes out), because there’s a goddamn implicit cast from void(*)(void*) to bool. If you don’t believe this, try this:

qDebug() << "Is there a goddamn cast?" << ((bool) ::free);

Of course the conversion gives true because the function pointer is not null. Argh!

Advertisements

[@Plasma developers: Please find my short wish list for the Plasma tablet shell near the end of this article.]

My followers on Twitter already had the news yesterday: I’m a proud owner of a brand-new WeTab. WeTabs seem to be going out of stock at most shops, so one can get very good deals here in Germany. It has 32 GB of internal storage, a 16:9 HD display (1366×768, 12″), 3G, GPS, and an 1.6 GHz Atom processor.

Of course I wanted to put Plasma Active on it. The preinstalled WeTab OS is based on Meego 1.0 (and includes shell access for experts), but Plasma Active seems to require Meego 1.1. The simplest way to go for me was to install openSUSE 11.4. Here’s how I got it working:

  1. Some early WeTabs have their BIOS locked such that they cannot boot from USB. In this case, one must flash the BIOS. Since my WeTab is new, this step was not required here.
  2. The preinstalled bootloader does not have a menu and boots straight into the first hard disk partition (i.e. WeTab OS). I followed steps 4 through 8 from this installation guide from the Meego Wiki to put a custom bootloader inside the existing bootloader (as a chainloader). This custom bootloader allows to boot from a USB stick, as described in steps 10 and 11 of the Meego installation guide.
  3. On to actually booting the installation media: I chose the KDE4 Live Image of openSUSE 11.4, and followed the advise of this (as of now unfinished) German installation guide for Plasma Active on openSUSE 11.4 on WeTab. Especially, there are important hints in there about how to get the touchscreen working properly. It also has instructions for which repos to add and which packages to install to get Plasma Active.
  4. The touchscreen is not calibrated properly by default. I haven’t yet found a calibration tool, but the WeTab Wiki offers some help in their installation instructions for Ubuntu Maverick. Grep for “calibration” on that page, and save the file there on the openSUSE partition as “/etc/X11/xorg.conf/99-calibration.conf”. After a reboot, the touchscreen is calibrated fairly well.
  5. Of course, some adjustment in KDE System Settings is necessary. For example, I found a font size of 12pt to be just right to enlarge buttons etc. to a sensible size.
  6. If you installed besides WeTab OS as the openSUSE install guide suggested, you might want to boot into WeTab OS. To do so, edit the file extlinux/extlinux.conf on the sda1 partition (i.e. the WeTab OS’s boot partition), and move the line “menu default” from the “plop” chainloader entry to the WeTab OS entry. The WeTab can then be booted from openSUSE’s Grub by selecting the “Linux other” option with the soft button in the same way the selection worked in plop (tap = next entry, hold = select entry).

So, how well is Plasma Active working? Very well for a project that young! Here are the main problems I’ve found as of now (partly plus workarounds), in no particular order:

  • The acceleration sensor is not recognized, so the screen is not rotated when I rotate the device. Screen rotation itself works, using the script from this wetab-community forum entry. (It’s basically xrandr -o, plus some calls to evdev to keep touchscreen and screen coordinates in sync.) To make it a bit more convenient, I created a directory with four scripts in them which invoke the above script with one of the four possible orientation arguments. Then I pointed a folderview applet to this directory, so I have screen rotation accessible my desktop.
  • Plasma’s screen keyboard does not vanish automatically in some situations, and does not work at all for non-standard or non-Qt widgets. For example, it does not appear when I tab into Konsole.
  • KScreenLocker does not know about the form factor, either: It wants me to enter my password, so I would have to carry a USB keyboard with me if I locked my screen or sent the tablet to sleep or hibernate. (Same problem with KDM. The openSUSE install guide recommends enabling autologin for this reason.)
  • The Plasma shell seems to have some screen-locking functionality which is active when it comes up the first time after boot, but I haven’t yet found a way to lock the screen at my wish.
  • The Plasma shell won’t react to any xrandr changes, esp. when I change the screen orientation.
  • When I restart the plasma-tablet process by hand, the root window comes up with the right size, but the font size in applets is absurdly big. It seems that it calculates the font size from the screen height only, not taking into account the relatively small screen width on the then 9:16 display.

It may very well be that the packaged version which I installed is just much too old. Anyhow, here’s a short wishlist to the Plasma Active developers. These are my most urgent wishes:

  • Please make KScreenLocker’s interface touchscreen-compatible, for example by using a PIN entry or, as a bare minimum, by giving it a screen keyboard.
  • Please let the Plasma shell react to xrandr changes.
  • Please include an alternative vertical applet layout for screens which are higher than they are wide.
  • Please, as a temporary measure, add a button to toggle the virtual keyboard permanently, for situations where it does not yet come up automatically (like in Konsole).

And another thing that bugged me: The “Present Windows” effect feels very good for window switching, but it is a bit unnatural that the Plasma shell, i.e. the “desktop”, is another window in this interface. I would find it more intuitive if:

  • the window overview showed only actual application windows
  • the top-left button would not disappear in this overview
  • clicking this button from the overview would bring up the desktop instead

The desktop is then accessed by pressing the top-left button twice. (Maemo 5 does something similar.) If there is only one active window, the window overview does not make sense, so the desktop might be displayed after the first click already.

These thoughts conclude the collection of my first impressions with Plasma’s new tablet shell. It may sound quite negative, but these are only small problems compared to how much works so great. I’m looking forward to these small problems being ironed out by an awesome team of developers. Or, to pun it another way (ha!): Gnome may, by their own claim, be “made of easy”, but KDE is made of awesome!

[Before I get to the boring (hah!) numerics stuff, some news: In order to test KDE Games on a touchscreen, and because I’ve wanted a tablet for a long time now, and because I want a 3G device, I’ve now ordered a WeTab 2. More news on this coming up soon when I have it in my hands.]

I’ve been optimizing my favorite jigsaw puzzle game Palapeli lately. Today, I looked at the bevelmap generation algorithm which renders the nice three-dimensional shadows atop the jigsaw pieces. There are a lot of norm computations in there, i.e. sqrt(x^2+y^2). All bells are ringing: There must be a way to optimize this.

Now nearly everyone who has to do with numerics and programming probably knows about the fast inverse square root which approximates 1/sqrt(x) very efficiently and sufficiently precise (about 1% maximum error). Using the inverse of this fast inverse square root does not really give an improvement, though. After all, the formula 1/InvSqrt(x*x+y*y) contains at least four floating-point operation plus everything that happens in InvSqrt.

This approach is especially inefficient because, in my special case, x and y are integers, and the result is also needed as an integer. Looking around the interwebs, I’ve found this lovely page on square root calculation, which also has a special algorithm for norms.

It all starts with two approximations to sqrt(x^2+y^2). The first one is max(abs(x), abs(y)), the second one is 1/sqrt(2)*(abs(x) + abs(y)). We need both because they are good for different values of x and y. To understand this, consider the unit circles of these approximations, that is: the set of x/y points where the approximation is 1. (You will probably make a sketch as you follow the discussion. I would have done one if it wasn’t so late already.) The unit circle for the first approximation is an upright square, while the second approximation’s unit circle is diamond-shaped (i.e. turned around by 45° compared to the first one).

An approximation is good when the unit circle comes nearby or touches the actual unit circle. For the first approximation, this happens when either x or y vanishes. The second approximation, on the other hand, is good when x and y are approximately equal. Now one can show that, at these conditions, the better approximation is always bigger than the other one. This means that we can define a third approximation by taking the maximum of both approximations. The unit circle for this approximation is a regular octagon, whose inscribed circle is the actual unit circle.

The approximation is thus always too big, except for the eight touching points of octagon and real unit circle, where approximation and exact result are identical. However, one can find a factor z such that the octagon, scaled by 1/z, is inside the circle. Scaling the approximation by (1+1/z)/2 thus distributes the approximational error to both the negative and positive side. Taking all these considerations into account, I’ve arrived at the following implementation:

template<typename T> struct fastnorm_helper { typedef T fp; };
template<> struct fastnorm_helper<int> { typedef qreal fp; };

//Approximates sqrt(x^2 + y^2) with a maximum error of 4%.
//Speedup on my machine is 30% for unoptimized and up to 50% for optimized builds.
template<typename T> inline T fastnorm(T x, T y)
{
    //need absolute values for distance measurement
    x = qAbs<T>(x); y = qAbs<T>(y);
    //There are two simple approximations to sqrt(x^2 + y^2):
    //  max(x, y) -> good for x >> y or x << y          (square-shaped unit circle)
    //  1/sqrt(2) * (x + y) -> good for x = y (approx.) (diamond-shaped unit circle)
    //The worse approximation is always bigger. By taking the maximum of both,
    //we get an octagon-shaped upper approximation of the unit circle. The
    //approximation can be refined by a prefactor (called a) below.
    typedef typename fastnorm_helper<T>::fp fp;
    static const fp a = (1 + sqrt(4 - 2 * qSqrt(2))) / 2;
    static const fp b = qSqrt(0.5);
    const T metricOne = qMax<T>(x, y);
    const T metricTwo = b * (x + y);
    return a * qMax(metricOne, metricTwo);
    //idea for this function from http://www.azillionmonkeys.com/qed/sqroot.html
    //(though the equations there have some errors at the time of this writing)
}

[License: I grant any entity the right to use the above code snippet for any purpose, without any conditions, unless such conditions are required by law. (This is the best approximation to placing the code in the public domain, which is not possible in my jurisdiction.)]

I have tested this function successfully on random input with T = double, float, int32_t. For integer types, this function needs only two floating-point multiplications, and thus runs much much faster than the trivial implementation and the InvSqrt-based solution. So please take this as my contribution towards Green IT, and reduce the time your computer spends calculating norms.

P.S. The extension of the above function for three-dimensional coordinates remains as an exercise to the reader. The only hurdle is to calculate the new prefactor z.

Update: As correctly pointed out by the commenters, if all you want is to compare the norm to some radius, e.g. sqrt(x*x + y*y) > r*r, the easiest solution is to avoid the square root and test for x*x + y*y > r*r instead.

The locate command line tool from findutils is great when you forgot where you dropped that file you worked on a week ago, but don’t want to run Strigi (plus Strigi does not index the system files). However, its output is quite convoluted when you’re looking by topic instead of exact file name.

$ locate tagaro | wc -l
4977

Looking at the output of locate without the wc, there’s quite some garbage in there. For example, my backup and files in build directories, which I am certainly not interested in. Of course there is a way to exclude these from the listing, by editing “/etc/updatedb.conf”. By default, this contains the following on my Arch system:

# directories to exclude from the slocate database:
PRUNEPATHS="/media /mnt /tmp /var/tmp /var/cache /var/lock /var/run /var/spool"

# filesystems to exclude from the slocate database:
PRUNEFS="afs auto autofs binfmt_misc cifs coda configfs cramfs debugfs devpts devtmpfs ftpfs iso9660 mqueue ncpfs nfs nfs4 proc ramfs securityfs shfs smbfs sshfs sysfs tmpfs udf usbfs vboxsf"

As you see, quite some stuff is already excluded from locate’s database, like removable devices under /media, temporary data and virtual filesystems. Apart from these defaults, I’ve also added my global build directory /home/tmp/build and my backup drive to the list. Let’s apply the changes and see if this helps:

$ sudo updatedb
$ locate tagaro | wc -l
1656

An impressive improvement! But we’re still not there: Nearly a third of the output comes from the Git source control system which Tagaro uses. Paths like “/home/stefan/Code/kde/tagaro/.git/objects/b4/3cc4cc0bdc6c92b94655b8352c3073e8d3842d” are also useless, but how can we purge these? PRUNEPATHS only filters directory paths, but `man updatedb.conf` reveals there’s another configuration parameter which specifies directory names to be ignored. So let’s add this to /etc/updatedb.conf:

PRUNENAMES=".bzr .hg .git .svn"

This filters the most important types of VCS data directories. Again, let’s check if it helps:

$ sudo updatedb
$ locate tagaro | wc -l
1080

A reduction of over 75%! Now locate shows only output which is relevant. Also, the locate database has shrunk by 60%, as has the execution speed of locate. By the way, results are even more on the spot when you give the “-b” switch to locate. locate will then print only those files and directories whose name (instead of path) contains the given key. “locate -b tagaro” gives only 25 results here.

Now on Twitter

April 2, 2011

My last post was quite big, so I’ve split out the news part to ensure that it gets some attention. I’m now on Twitter as @stefanmajewsky. Follow me for updates on my KDE development, esp. Palapeli and Tagaro. My blog posts should also be syndicated there automatically; this post is the test balloon.

The Blue Wonder Meeting last week was an excellent opportunity to do some work on Palapeli. One thing that I wanted to do for at least a year was to rewrite the multithreading code around the (fairly central) Palapeli::Puzzle class. The new class has an aspect-oriented design, where single components of the puzzle (like the metadata or the contents) are always made available by worker threads, by casting existing components. For example, a puzzle file can be casted into metadata by reading the latter from the puzzle file. The possible casting paths are actually so convoluted that I documented them in an image inside the source code directory:

Yet there are still areas where Palapeli’s concurrency needs to improve. Much of the puzzle loading is done in the GUI thread, which causes the loading animation to freeze frequently. But that’s for another day. Moving back from what I should do to what I actually did, I noticed that Palapeli’s memory usage is awful. During the sprint, I wanted to demo Palapeli’s performance with big piece counts, and created a puzzle with the maximum of 2000 pieces from a 5000×5000 px image. It actually rendered fluently, but only because I upgraded my notebook’s RAM to 2 GB last year. Palapeli used an enormous 1.6 gigabytes of memory.

I figured that’s a bit much, and looked yesterday evening into how this can be optimized. I was testing with a 30 pieces puzzle from a 4000x4000px image. After having started Palapeli and having loaded the puzzle, the application used about 800 MB memory. The first, and simple, optimization was to remove some intermediary images which are only used while loading the image. That saved something like 100 MB of memory. But wait! Shouldn’t a 4000×4000 pixel image need only 61 MB (four bytes per pixel)? Yes, that’s true, but actually we’re not talking about one big image, but about 30 images, one for each piece. While pieces might have an arbitrary shape, their image must internally be rectangular all the time, even if big parts (e.g. around the plugs) are fully transparent. This means that the memory usage for the 30 pieces is bigger than what one might expect from the complete image.

A randomly chosen piece was 855×1098 pixels big, so by a rough approximation, all 30 pieces take 30*855*1098*4 bytes = 107 MB in memory. That still does not explain why the process still occupies 700 MB. But what’s that?

There’s a shadow around the pieces! This shadow is not added by the slicer when the puzzle is created, but by the engine when the puzzle is loaded. And because we need to control how shadows cover other pieces, we cannot render the shadows into the piece image, but must store them as separate images! And to make matters even worse, we need different shadow images for selected and unselected pieces (note the difference between blue and black shadows). The shadow size for the test image is 0.15 * (width + height) = 300 pixels on each side. Putting everything together, the memory usage of piece images + normal shadows + selected shadows is about 700 MB, i.e. everything can be explained by the pure size of the displayed images. The second optimization therefore was to restrict the shadow size to 50 pixels. This reduces the memory consumption to 380 MB, and allows to play even 5000×5000 px (i.e. 25 megapixel) images with below 1 GB of RAM.

More sophisticated optimizations are certainly possible, but let’s stop at this 50% improvement for now. There’s another thing I want to show: Palapeli’s mainwindow has been cleaned up. I really liked the old tab interface. It had its time and justification, but I want to move forward and need a cleaner, more standard interface for that. So now Palapeli has only a menubar and toolbar, just like all the other apps:

This change was possible after I realized that all toolbar actions for the collection also make sense on the puzzle table: Delete and Export work on the current puzzle. Newly created or imported puzzled will be loaded into the puzzle table afterwards. The “Back to collection” button replaces the “My collection” and “Puzzle table” tabs.

No time to blog yesterday! The Gluon guys were at my flat and we spent the evening cooking some tasty vegetarian meals. (No idea where the photos from this ground-breaking event ended up, though.)

Of course, we had some productive sprint session before that, which Josef covered yesterday already. The only missing part for now is the obligatory group photo below, which is (along with other photos) available from the git repository at kde:scratch/majewsky/blue-wonder-meeting.

From left to right: Josef, Laszlo, Felix, Shantanu, Arjen, Dan, me, Friedrich, Julian and Jeffrey.