Tip of the day: Speed up `locate`
April 4, 2011
The locate command-line tool from findutils is great when you have forgotten where you dropped that file you worked on a week ago, but don’t want to run Strigi (besides, Strigi does not index system files). However, its output gets quite cluttered when you’re searching by topic instead of by exact file name.
$ locate tagaro | wc -l
4977
Looking at the output of locate without the wc, there’s quite a lot of garbage in there: for example, my backups and the files in build directories, which I am certainly not interested in. Of course there is a way to exclude these from the listing, by editing “/etc/updatedb.conf”. By default, this contains the following on my Arch system:
# directories to exclude from the slocate database:
PRUNEPATHS="/media /mnt /tmp /var/tmp /var/cache /var/lock /var/run /var/spool"
# filesystems to exclude from the slocate database:
PRUNEFS="afs auto autofs binfmt_misc cifs coda configfs cramfs debugfs devpts devtmpfs ftpfs iso9660 mqueue ncpfs nfs nfs4 proc ramfs securityfs shfs smbfs sshfs sysfs tmpfs udf usbfs vboxsf"
As you can see, quite a lot is already excluded from locate’s database, like removable devices under /media, temporary data and virtual filesystems. In addition to these defaults, I’ve added my global build directory /home/tmp/build and my backup drive to the PRUNEPATHS list. Let’s apply the changes and see if this helps:
$ sudo updatedb
$ locate tagaro | wc -l
1656
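For reference, the edited PRUNEPATHS line behind this count might look like the sketch below. /home/tmp/build is the build directory mentioned above; /backup is only a hypothetical placeholder, since the real mount point of the backup drive is not given here.

```shell
# /etc/updatedb.conf (excerpt)
# /home/tmp/build is the global build directory from the text.
# /backup is a hypothetical placeholder for the backup drive's mount point.
PRUNEPATHS="/media /mnt /tmp /var/tmp /var/cache /var/lock /var/run /var/spool /home/tmp/build /backup"
```

Entries are separated by spaces, and each one excludes that directory together with everything below it.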
An impressive improvement! But we’re still not there: about a third of the output comes from the Git source control system which Tagaro uses. Paths like “/home/stefan/Code/kde/tagaro/.git/objects/b4/3cc4cc0bdc6c92b94655b8352c3073e8d3842d” are also useless, but how can we purge these? PRUNEPATHS only filters complete directory paths, but `man updatedb.conf` reveals that there’s another configuration parameter which specifies directory names to be ignored wherever they occur. So let’s add this to /etc/updatedb.conf:
PRUNENAMES=".bzr .hg .git .svn"
This filters the most important types of VCS data directories. Again, let’s check if it helps:
$ sudo updatedb
$ locate tagaro | wc -l
1080
A reduction of over 75% compared to the original 4977 results! Now locate shows only output which is relevant. Also, the locate database has shrunk by 60%, and so has locate’s execution time. By the way, results are even more to the point when you give the “-b” switch to locate. locate will then print only those files and directories whose name (instead of path) contains the given search term. “locate -b tagaro” gives only 25 results here.
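The figures quoted in this post can be double-checked with a few lines of shell. This is only a sketch: pct_drop and count_vcs are helper names made up for this example, not part of locate or updatedb.

```shell
# Percentage drop from an old count ($1) to a new count ($2), one decimal place.
pct_drop() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) * 100 / old }'
}

# Count the paths on stdin that pass through a VCS metadata directory,
# using the same directory names as the PRUNENAMES line above.
count_vcs() {
    grep -cE '/\.(bzr|hg|git|svn)(/|$)'
}

pct_drop 4977 1656   # after extending PRUNEPATHS: prints 66.7
pct_drop 4977 1080   # after also adding PRUNENAMES: prints 78.3
```

On a database built without PRUNENAMES, `locate tagaro | count_vcs` would show how many of the results come from VCS directories.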
April 4, 2011 at 20:54
Nice improvement! I’m curious though how this translates to other distributions.
On openSUSE 11.4 (findutils v4.4.2) I haven’t yet found anything like PRUNENAMES in /etc/sysconfig/locate, or a similar option in /usr/bin/updatedb.
April 4, 2011 at 22:01
Well then, fork the package on OBS. 😀
April 4, 2011 at 23:06
On opensuse I find prune mentioned in /etc/sysconfig/locate and /etc/cron.daily/suse-updatedb
April 5, 2011 at 07:31
Interestingly on Ubuntu the PRUNENAMES line was there, but it was commented out.
Thanks Stefan for this nice tip!
April 4, 2011 at 23:35
Note for openSUSE users: the ‘locate’ package in the standard repo is the GNU version. To use the PRUNENAMES option you want the ‘mlocate’ package instead. Search for it on software.opensuse.org/114/en (with the home projects option included); I found a package by Pascal Bleser, so it should be OK.
April 5, 2011 at 06:48
I love the “locate” command, and I’ve always combined its output with “grep” to better filter its result, but your solution is waaayyy appropriate xD
Nice tip, man 😉
April 5, 2011 at 07:44
Another one: PRUNE_BIND_MOUNTS="yes"
August 15, 2012 at 06:41
Just wanted to thank you for this. I have a directory with 3.6 million files in it and growing. I tried the ‘find’ command and it took 2 days. I set up a cron job to run updatedb at midnight, and that took several minutes to get the contents of the folder.
Using your method (and excluding all my other unnecessary folders), looping through 3.6 million files now takes not 3 minutes, but 2 seconds!