Tip of the day: Speed up `locate`

April 4, 2011

The locate command line tool (here the mlocate implementation, which reads /etc/updatedb.conf) is great when you have forgotten where you dropped that file you worked on a week ago, but don’t want to run Strigi (which does not index system files anyway). However, its output gets quite cluttered when you’re searching by topic instead of by exact file name.

$ locate tagaro | wc -l
4977

Looking at the output of locate without the wc, there’s quite a lot of garbage in there: for example, my backups and files in build directories, which I am certainly not interested in. Of course there is a way to exclude these from the listing: editing “/etc/updatedb.conf”. By default, this file contains the following on my Arch system:

# directories to exclude from the slocate database:
PRUNEPATHS="/media /mnt /tmp /var/tmp /var/cache /var/lock /var/run /var/spool"

# filesystems to exclude from the slocate database:
PRUNEFS="afs auto autofs binfmt_misc cifs coda configfs cramfs debugfs devpts devtmpfs ftpfs iso9660 mqueue ncpfs nfs nfs4 proc ramfs securityfs shfs smbfs sshfs sysfs tmpfs udf usbfs vboxsf"

As you can see, quite a lot is already excluded from locate’s database, such as removable devices under /media, temporary data and virtual filesystems. In addition to these defaults, I’ve added my global build directory /home/tmp/build and my backup drive to PRUNEPATHS. Let’s rebuild the database and see if this helps:

$ sudo updatedb
$ locate tagaro | wc -l
1656

An impressive improvement! But we’re still not there: about a third of the remaining output comes from the data of the Git source control system, which Tagaro uses. Paths like “/home/stefan/Code/kde/tagaro/.git/objects/b4/3cc4cc0bdc6c92b94655b8352c3073e8d3842d” are also useless, but how can we purge these? PRUNEPATHS only matches complete directory paths, but `man updatedb.conf` reveals another configuration parameter, PRUNENAMES, which specifies directory names to be ignored wherever they occur. So let’s add this to /etc/updatedb.conf:

PRUNENAMES=".bzr .hg .git .svn"

This filters the most important types of VCS data directories. Again, let’s check if it helps:

$ sudo updatedb
$ locate tagaro | wc -l
1080
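
To put numbers on the three counts above, here is a quick sketch of the arithmetic. The `reduction` helper is just a name I made up for this post; it only wraps awk.

```shell
# Percentage by which a count dropped. Usage: reduction BEFORE AFTER
# ("reduction" is a hypothetical helper name, not part of locate.)
reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

reduction 4977 1656   # effect of the extra PRUNEPATHS entries
reduction 1656 1080   # additional effect of PRUNENAMES
reduction 4977 1080   # both changes combined
```

This prints 66.7, 34.8 and 78.3, so the VCS directories alone accounted for roughly a third of what was left after the first step.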

A reduction of over 75% compared to the initial count! Now locate shows only relevant output. Also, the locate database has shrunk by 60%, and so has the execution time of locate. By the way, results are even more on target when you pass the “-b” switch to locate: it will then print only those files and directories whose name (rather than whole path) contains the given key. “locate -b tagaro” gives only 25 results here.
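
If you want to get a feeling for the two filters without touching the real database, the following sketch imitates them in miniature with plain find and awk on a throwaway directory. It does not use locate or updatedb at all, so it only illustrates the idea: `-name .git -prune` stands in for PRUNENAMES, and the awk filter on the last path component stands in for “-b”. The file names and the `count` helper are made up for the example.

```shell
# Throwaway tree with hypothetical file names; removed again at the end.
demo=$(mktemp -d)
mkdir -p "$demo/tagaro/.git/objects/b4" "$demo/tagaro/src"
touch "$demo/tagaro/.git/objects/b4/3cc4cc" \
      "$demo/tagaro/.git/config" \
      "$demo/tagaro/src/board.cpp" \
      "$demo/tagaro/libtagaro.so"

# Count input lines; tr strips the padding some wc implementations add.
count() { wc -l | tr -d ' '; }

# Everything below the tree, as an unpruned listing would show it.
all=$(cd "$demo" && find . | count)

# Skip every directory named .git, wherever it sits (the PRUNENAMES idea).
pruned=$(cd "$demo" && find . -name .git -prune -o -print | count)

# From the pruned listing, keep only entries whose last path component
# contains the key (the idea behind "locate -b").
by_name=$(cd "$demo" && find . -name .git -prune -o -print |
          awk -F/ 'index($NF, "tagaro")' | count)

echo "all=$all pruned=$pruned by_name=$by_name"
rm -rf "$demo"
```

On this tiny tree the script reports all=10 pruned=5 by_name=2: half of the entries live under .git, and only the tagaro directory and libtagaro.so carry the key in their own name.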

8 Responses to “Tip of the day: Speed up `locate`”

  1. Diederik van der Boor Says:

    Nice improvement! I’m curious though how this translates to other distributions.

    At openSUSE 11.4 (findutils v4.4.2) I haven’t yet found something like PRUNENAMES in /etc/sysconfig/locate, or a similar option in /usr/bin/updatedb

  2. tom Says:

    Note for opensuse users, ‘locate’ package in standard repo is GNU version. To use the PRUNENAMES option you want the ‘mlocate’ package instead. Search for it on software.opensuse.org/114/en (include home projects option), I found package by pascal bleser so should be ok.

  3. [Po]lentino Says:

    I love the “locate” command, and I’ve always combined its output with “grep” to better filter its result, but your solution is waaayyy appropriate xD
    Nice tip, man 😉

  4. Ivan Čukić Says:

    Another one: PRUNE_BIND_MOUNTS="yes"

  5. jaggedsoft Says:

    Just wanted to thank you for this. I have a directory with 3.6 million files in it and growing. I tried the ‘find’ command and it took 2 days. I set up a cron to updatedb at midnight and that took several minutes to get the contents of the folder.
    Using your method, (and excluding all my other unnecessary folders) now looping through 3.6 million files takes not 3 minutes, but 2 seconds!

