Desktop search - a comparison of tools that crawl, extract and index your personal data on Linux

by Martin Monperrus
I have needed desktop search software for a long time. There are many tools for that, but I could not figure out which one is the best. That's why I made this comparison. My main finding is that beagle is the best one.

Results | A better beagle? | Other tools | Other comparisons

Results

I tested the indexing of my home directory: ~60000 files, 5 GB, various file types, files with accents, LANG=fr_FR.ISO-8859-1
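
For readers who want to report comparable figures for their own test corpus, here is a minimal sketch in Python. The function name corpus_stats is hypothetical and not part of any tool compared here. It only counts regular files and sums their sizes, without following symbolic links.

```python
import os
import stat


def corpus_stats(root):
    """Return (file_count, total_bytes) for the regular files under root."""
    count, total = 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat does not follow symbolic links
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is not readable
            if stat.S_ISREG(info.st_mode):
                count += 1
                total += info.st_size
    return count, total
```

Calling corpus_stats(os.path.expanduser("~")) returns the number of files and the total size in bytes of a home directory.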

Beagle is the best desktop search tool for Linux:
Feature                                 beagle v0.3.8  recoll v1.10.2  Google Desktop v1.2.0.0088  tracker v0.6.6
change listening                        YES            YES             YES                         YES
in doc                                  YES            YES             YES                         YES
in sxw                                  YES            YES             YES                         YES
in txt                                  YES            YES             YES                         YES
in odt                                  YES            YES             YES                         YES
dehyphenation                           YES            YES             YES                         YES
in pdf                                  YES            YES             no(9)                       no
stemming                                YES            YES             YES                         YES
in file name                            YES            YES             YES                         YES
glob expression                         YES            YES             no                          no(7)
in ppt                                  YES            YES             YES                         no
full text                               YES            YES             YES                         no
full text with line breaks              YES            YES             no                          no
in mailbox files                        YES(1)         YES             YES                         no
in extensionless files                  YES            no              YES                         YES
per extension (e.g. ext:doc)            YES            no              YES                         no
dash-insensitive (2)                    YES            no              YES                         no
accent-sensitive (3)                    YES            no              YES                         no
remove operator minus - (4)             YES            no              YES                         no
encoding insensitive (5)                no(6)          no              no(8)                       YES
regular expression (e.g. dumou[lp].n)   no             no              no                          no

(1) if they are associated with Evolution or Thunderbird accounts
(2) "domain-specific" matches "domain specific" and vice versa
(3) this can be considered an advantage (more precise search) or a disadvantage (sensitive to details)
(4) foo -bar matches documents containing foo but not bar
(5) able to index text files that are in different encodings (UTF-8 and ISO-8859-1). The test is to retrieve two files whose content is identical, but encoded differently
(6) beagle ignores the LANG environment variable; it uses a default UTF-8 encoding for text files
(7) tracker does not support glob expressions, but suggests some matching keywords
(8) Google Desktop does not pass this test (cf. note 5), but for some reason it indexed some other text files in both UTF-8 and ISO-8859-1
(9) actually, Google Desktop indexes some pdf files, but it did not pass my test ('linearization' in my pdfs ;-))
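
The encoding test of note 5 can be reproduced with a small Python sketch. It writes the same text once in UTF-8 and once in ISO-8859-1, then decodes both with a "UTF-8 first, ISO-8859-1 as fallback" rule. The sample text, the file names and the function names are hypothetical, and the fallback rule is only an illustration of what an encoding-insensitive indexer has to achieve. It is not how any of the tested tools actually works.

```python
from pathlib import Path

# Sample content with accented characters (hypothetical test text)
TEXT = "un mot accentué : énergie"


def write_test_files(directory):
    """Write TEXT twice: once in UTF-8, once in ISO-8859-1."""
    base = Path(directory)
    utf8_file = base / "sample-utf8.txt"
    latin1_file = base / "sample-latin1.txt"
    utf8_file.write_bytes(TEXT.encode("utf-8"))
    latin1_file.write_bytes(TEXT.encode("iso-8859-1"))
    return utf8_file, latin1_file


def decode_with_fallback(raw):
    """Try UTF-8 first; fall back to ISO-8859-1, which accepts any byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")
```

The two files differ on disk, yet both decode to the same text, so a query for "énergie" should return both. A tool that assumes a single encoding sees garbage, or a decoding error, in one of them. The heuristic is not perfect: ISO-8859-1 text whose bytes happen to form valid UTF-8 would be decoded wrongly, although this is rare for real text.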
Note that:

Why are strigi and pinot missing?

I also tested strigi and pinot.

Strigi v0.5.11 really looks like an alpha version. I had a hard time finding out how to search (strigi:/ in konqueror). The results were really unsatisfying.

Pinot v0.85 looks really good, but for some reason it does not completely index my home directory (only half of the 60000 files), even after several attempts. Furthermore, the indexing process is really slow.

A better beagle?

Some ways to improve beagle:

Other tools

Swish-e does not work out of the box: you have to create a big config file with filters for every file type :-(. Swish++ is not usable for similar reasons.

Since they look neither active nor mature, I did not test:

Other comparisons

Some points of this comparison are inspired by this very good comparison at wikinfo.

Ben Martin compared Beagle and Tracker.

Michal Pryc and Xusheng Hou made a comprehensive similar comparison.

http://www.goebelgroup.com/desktopmatrix.htm and http://www.kalio.info/Desktop_Search_Comparison/ are Windows-oriented.