Is it just a feeling, or has the number of technical posts decreased on the Planet? I definitely need to change that! (**Update:** The post got quite long, so I’ve shortened the syndicated version.)

These days, I’m usually working on my Diplomarbeit at the Institute of Theoretical Physics at the Technical University here in Dresden. (A Diplomarbeit is roughly comparable to a master’s thesis, though the preparation time is one year since there is no bachelor’s thesis before it.) Our working group has an in-house application for inspecting iterated functions, which is quite intuitively called Iterator. Like many scientific applications, it has a very sharply defined functional objective on the one hand, but contains tons of different tools and plugins on the other.

As a part-time project alongside my usual work, I’m working on a Qt port of the Iterator, which is currently based on wxWidgets. Of course, a port offers the opportunity to refactor badly engineered parts of the application. It clearly shows that all involved developers are first and foremost physicists who have not received any formal training in software design. Many antipatterns can be observed in the code. The most interesting is the existence of a god class called IteratorApp.

## Implicit cast quirks

### April 15, 2011

I thought I’d let you know why I just had to throw away three days’ worth of computation data.

    double importantPrefactor(bool freeEvolution);
    void multistep(bool freeEvolution, /* ... */)
    {
        /* ... */
        const double pre = importantPrefactor(free);
        /* ... */
    }

Yes, I used the wrong identifier inside the multistep function. Still, it compiles and runs without error (except for the garbage data that comes out), because there’s a goddamn implicit conversion from `void(*)(void*)` to `bool`. If you don’t believe this, try this:

    qDebug() << "Is there a goddamn cast?" << ((bool) ::free);

Of course the conversion gives true because the function pointer is not null. Argh!
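One possible guard against this kind of mistake is a deleted catch-all overload, so that only a genuine `bool` is accepted as argument. The following is a minimal sketch in standard C++11, not the code from the simulation: the body of `importantPrefactor` is a placeholder, and `functionPointerIsTrue` is a hypothetical helper that only spells out the conversion.

```cpp
#include <cstdlib>

// Placeholder body: the real function is not shown in the post.
double importantPrefactor(bool freeEvolution)
{
    return freeEvolution ? 1.0 : 0.5;
}

// Deleted catch-all overload: any argument that is not exactly a bool
// (for example the function ::free) selects this template, and the call
// is rejected at compile time.
template<typename T>
double importantPrefactor(T) = delete;

// Hypothetical helper that makes the conversion from the post explicit.
bool functionPointerIsTrue()
{
    void (*fp)(void*) = &std::free;
    return fp != nullptr; // a non-null function pointer converts to true
}

// importantPrefactor(::free); // no longer compiles with the overload above
```

With the deleted overload in place, `importantPrefactor(true)` still resolves to the `bool` version, while passing a pointer, an integer or a function name becomes a compile error instead of silently producing garbage.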

## Optimization tricks: Fast sqrt(x^2 + y^2)

### April 10, 2011

[Before I get to the boring (hah!) numerics stuff, some news: In order to test KDE Games on a touchscreen, and because I’ve wanted a tablet for a long time now, and because I want a 3G device, I’ve now ordered a WeTab 2. More news on this coming up soon when I have it in my hands.]

I’ve been optimizing my favorite jigsaw puzzle game Palapeli lately. Today, I looked at the bevelmap generation algorithm which renders the nice three-dimensional shadows atop the jigsaw pieces. There are a lot of norm computations in there, i.e. `sqrt(x^2+y^2)`. All bells are ringing: There must be a way to optimize this.

Now nearly everyone who has to do with numerics and programming probably knows about the fast inverse square root, which approximates `1/sqrt(x)` very efficiently and with sufficient precision (about 1% maximum error). Using the inverse of this fast inverse square root does not really give an improvement, though. After all, the formula `1/InvSqrt(x*x+y*y)` contains at least four floating-point operations plus everything that happens in InvSqrt.

This approach is especially inefficient because, in my special case, x and y are integers, and the result is also needed as an integer. Looking around the interwebs, I’ve found this lovely page on square root calculation, which also has a special algorithm for norms.

It all starts with two approximations to `sqrt(x^2+y^2)`. The first one is `max(abs(x), abs(y))`, the second one is `1/sqrt(2)*(abs(x) + abs(y))`. We need both because they are good for different values of x and y. To understand this, consider the unit circles of these approximations, that is: the set of x/y points where the approximation is 1. (You will probably make a sketch as you follow the discussion. I would have done one if it wasn’t so late already.) The unit circle for the first approximation is an upright square, while the second approximation’s unit circle is diamond-shaped (i.e. turned around by 45° compared to the first one).

An approximation is good where its unit circle comes close to or touches the actual unit circle. For the first approximation, this happens when either x or y vanishes. The second approximation, on the other hand, is good when x and y are approximately equal. Now one can show that, under these conditions, the better approximation is always bigger than the other one. This means that we can define a third approximation by taking the maximum of both approximations. The unit circle for this approximation is a regular octagon, whose inscribed circle is the actual unit circle.
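The three approximations discussed so far can be written down in a few lines. This is only an illustrative sketch in standard C++ for `double` arguments; the function names `squareMetric`, `diamondMetric` and `octagonMetric` are made up here and do not appear in Palapeli.

```cpp
#include <algorithm>
#include <cmath>

// First approximation: its unit circle is an upright square.
double squareMetric(double x, double y)
{
    return std::max(std::abs(x), std::abs(y));
}

// Second approximation: its unit circle is a diamond.
double diamondMetric(double x, double y)
{
    return (std::abs(x) + std::abs(y)) / std::sqrt(2.0);
}

// Third approximation: the maximum of both, whose unit circle
// is a regular octagon.
double octagonMetric(double x, double y)
{
    return std::max(squareMetric(x, y), diamondMetric(x, y));
}
```

On the axes, `squareMetric` is the bigger one and matches the exact norm. On the diagonals the same holds for `diamondMetric`, up to rounding.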

The octagon is thus always too big, which means that the approximation is always too small, except for the eight touching points of octagon and real unit circle, where approximation and exact result are identical. However, one can find a factor `z` such that the octagon, scaled by `1/z`, is inside the circle; the approximation scaled by `z` is then always too big. Scaling the approximation by `(1+z)/2` thus distributes the approximation error to both the negative and positive side. Taking all these considerations into account, I’ve arrived at the following implementation:

    #include <QtGlobal>
    #include <qmath.h>

    template<typename T> struct fastnorm_helper { typedef T fp; };
    template<> struct fastnorm_helper<int> { typedef qreal fp; };

    //Approximates sqrt(x^2 + y^2) with a maximum error of 4%.
    //Speedup on my machine is 30% for unoptimized and up to 50% for optimized builds.
    template<typename T> inline T fastnorm(T x, T y)
    {
        //need absolute values for distance measurement
        x = qAbs<T>(x); y = qAbs<T>(y);
        //There are two simple approximations to sqrt(x^2 + y^2):
        //  max(x, y) -> good for x >> y or x << y          (square-shaped unit circle)
        //  1/sqrt(2) * (x + y) -> good for x = y (approx.) (diamond-shaped unit circle)
        //The better approximation is always bigger. By taking the maximum of both,
        //we get an octagon-shaped outer approximation of the unit circle. The
        //approximation can be refined by a prefactor (called a below).
        typedef typename fastnorm_helper<T>::fp fp;
        static const fp a = (1 + qSqrt(4 - 2 * qSqrt(2))) / 2;
        static const fp b = qSqrt(0.5);
        const T metricOne = qMax<T>(x, y);
        const T metricTwo = b * (x + y);
        return a * qMax(metricOne, metricTwo);
        //idea for this function from http://www.azillionmonkeys.com/qed/sqroot.html
        //(though the equations there have some errors at the time of this writing)
    }
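The value of the prefactor `a` in the code can be checked with a short calculation. In this sketch, a point on the exact unit circle is written as (cos θ, sin θ), and by symmetry only 0 ≤ θ ≤ π/4 is considered. The symbol m(θ) for the unscaled octagon approximation is introduced here for illustration only.

$$
m(\theta) = \max\left(\cos\theta,\ \tfrac{1}{\sqrt{2}}\,(\cos\theta + \sin\theta)\right)
          = \max\left(\cos\theta,\ \cos\left(\theta - \tfrac{\pi}{4}\right)\right),
\qquad \cos\tfrac{\pi}{8} \le m(\theta) \le 1
$$

$$
z = \frac{1}{\cos(\pi/8)} = \sqrt{4 - 2\sqrt{2}} \approx 1.0824,
\qquad
a = \frac{1 + z}{2} \approx 1.0412
$$

The scaled approximation a·m(θ) therefore ranges from a·cos(π/8) ≈ 0.962 at θ = π/8 up to a ≈ 1.041 at θ = 0 and θ = π/4, which is roughly 4% off in either direction.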

[License: I grant any entity the right to use the above code snippet for any purpose, without any conditions, unless such conditions are required by law. (This is the best approximation to placing the code in the public domain, which is not possible in my jurisdiction.)]

I have tested this function successfully on random input with T = double, float, int32_t. For integer types, this function needs only two floating-point multiplications, and thus runs much much faster than the trivial implementation and the InvSqrt-based solution. So please take this as my contribution towards Green IT, and reduce the time your computer spends calculating norms.
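For readers who want to repeat such a test without Qt, here is a minimal sketch in standard C++. It assumes `double` arguments and scans an integer grid instead of random input. The names `fastnormSketch` and `maxRelativeError` are hypothetical and not part of the snippet above.

```cpp
#include <algorithm>
#include <cmath>

// Qt-free variant of the approximation for double arguments.
double fastnormSketch(double x, double y)
{
    static const double a = (1.0 + std::sqrt(4.0 - 2.0 * std::sqrt(2.0))) / 2.0;
    static const double b = std::sqrt(0.5);
    x = std::abs(x);
    y = std::abs(y);
    return a * std::max(std::max(x, y), b * (x + y));
}

// Largest relative deviation from the exact norm on the integer grid
// with |x|, |y| <= range (the origin is skipped to avoid dividing by zero).
double maxRelativeError(int range)
{
    double worst = 0.0;
    for (int x = -range; x <= range; ++x) {
        for (int y = -range; y <= range; ++y) {
            if (x == 0 && y == 0)
                continue;
            const double exact = std::sqrt(double(x) * x + double(y) * y);
            const double error = std::abs(fastnormSketch(x, y) - exact) / exact;
            worst = std::max(worst, error);
        }
    }
    return worst;
}
```

Calling `maxRelativeError(100)` should report a worst case of about 4.1%, reached on the axes and diagonals where the unscaled approximation is exact and the prefactor alone causes the deviation.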

P.S. The extension of the above function for three-dimensional coordinates remains as an exercise to the reader. The only hurdle is to calculate the new prefactor `z`.

**Update:** As correctly pointed out by the commenters, if all you want is to compare the norm to some radius, e.g. `sqrt(x*x + y*y) > r`, the easiest solution is to avoid the square root and test for `x*x + y*y > r*r` instead.
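A minimal sketch of this comparison for integer coordinates might look as follows. The function name `isOutsideRadius` is made up for illustration, and the code assumes a non-negative radius, because squaring both sides of `sqrt(x*x + y*y) > r` is only valid for `r >= 0`.

```cpp
// Returns true if the point (x, y) lies outside the circle of radius r
// around the origin, without computing a square root.
// Assumption: r >= 0. The 64-bit intermediates avoid overflow for
// 32-bit inputs.
bool isOutsideRadius(int x, int y, int r)
{
    const long long xx = static_cast<long long>(x) * x;
    const long long yy = static_cast<long long>(y) * y;
    const long long rr = static_cast<long long>(r) * r;
    return xx + yy > rr;
}
```

Since everything stays in integer arithmetic, this test is exact, unlike any of the approximations discussed above.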

## Tip of the day: Speed up `locate`

### April 4, 2011

The locate command line tool from findutils is great when you have forgotten where you dropped that file you worked on a week ago, but don’t want to run Strigi (plus Strigi does not index the system files). However, its output gets quite cluttered when you’re searching by topic instead of exact file name.

    $ locate tagaro | wc -l
    4977

Looking at the output of locate without the wc, there’s quite some garbage in there. For example, my backup and files in build directories, which I am certainly not interested in. Of course there is a way to exclude these from the listing, by editing “/etc/updatedb.conf”. By default, this contains the following on my Arch system:

    # directories to exclude from the slocate database:
    PRUNEPATHS="/media /mnt /tmp /var/tmp /var/cache /var/lock /var/run /var/spool"
    # filesystems to exclude from the slocate database:
    PRUNEFS="afs auto autofs binfmt_misc cifs coda configfs cramfs debugfs devpts devtmpfs ftpfs iso9660 mqueue ncpfs nfs nfs4 proc ramfs securityfs shfs smbfs sshfs sysfs tmpfs udf usbfs vboxsf"

As you see, quite some stuff is already excluded from locate’s database, like removable devices under /media, temporary data and virtual filesystems. Apart from these defaults, I’ve also added my global build directory /home/tmp/build and my backup drive to the list. Let’s apply the changes and see if this helps:

    $ sudo updatedb
    $ locate tagaro | wc -l
    1656

An impressive improvement! But we’re still not there: Nearly a third of the output comes from the Git source control system which Tagaro uses. Paths like “/home/stefan/Code/kde/tagaro/.git/objects/b4/3cc4cc0bdc6c92b94655b8352c3073e8d3842d” are also useless, but how can we purge these? PRUNEPATHS only filters directory paths, but `man updatedb.conf` reveals there’s another configuration parameter which specifies directory names to be ignored. So let’s add this to /etc/updatedb.conf:

`PRUNENAMES=".bzr .hg .git .svn"`

This filters the most important types of VCS data directories. Again, let’s check if it helps:

    $ sudo updatedb
    $ locate tagaro | wc -l
    1080

A reduction of over 75%! Now locate shows only output which is relevant. Also, the locate database has shrunk by 60%, as has the execution time of locate. By the way, results are even more to the point when you give the “-b” switch to locate. locate will then print only those files and directories whose name (instead of path) contains the given key. “locate -b tagaro” gives only 25 results here.

## Neo and ten-finger typing, day 2

### January 28, 2011

So I’m one day into learning both the Neo keyboard layout and ten-finger typing. I’m already pretty confident in the alphabet characters’ positions, although I regularly confuse A and I with each other for no apparent reason. Typing speed is still awfully slow, but it is starting to improve. Next are the common symbols on layer 3, since I need to write some C++ code.

## KTouch and Neo

### January 27, 2011

I have decided to try a non-standard ergonomic keyboard layout, namely Neo. They recommend KTouch for training, and it works quite well. I will write more once I’m a bit more comfortable with it. Writing just these few lines took me twelve minutes…

## Windows 7 FAIL

### January 16, 2011

[tl;dr: I try to not hate Windows, and it undermines my plan… yet again]

Most of us have a Windows installation lying around somewhere because we need it every now and then for some quirky application that won’t run in Wine. So I also have a Windows 7, which I got for free via MSDNAA, sitting around on a separate partition for these cases. As I noted on this blog in 2010, Windows 7 is not as awful as the previous Windows releases. At times, it’s even fun to use.

But then, they show you the meaning of the phrase “over-simplified user interface” again: Windows 7 removes the ability to set different wallpapers per screen. Okay, that wouldn’t be so awful for an installation used so seldom, but then this: Even when you set it to “Scale to screen size”, it will scale the image to the size of the smallest screen, and reuse this (small) image for all screens.

So I’m sitting here, at a Full HD monitor with a wallpaper as small as 1440×900. Yes, that’s awfully small when surrounded by black borders.