Ask Jeeves Blog: Mozilla’s On Fire: “We discussed Ask Jeeves desktop search and the notion of open-sourcing it. We’re open at two levels. Contributing just the core desktop indexing technology or possibly the entire desktop search application. They discussed how/what they would evaluate before accepting a major piece of code/product contribution: code size, internationalization, etc. Whether or not we partner with Mozilla on this effort, Chris and team thought it was a good idea for us to pursue overall.”
Lots of folks think good open-source desktop search can already be easily implemented with tools like Lucene. But desktop search has exacting requirements.
- Download size should be small, which rules out Java and C#, since you can’t afford to require a large runtime environment.
- Lots of document formats must be supported. Yet many document format conversion tools are quite large, too large for inclusion. So, a good desktop search application might need to implement its own format converters, no small burden.
- The performance requirements of desktop search are not too demanding, since the number of documents is unlikely to exceed a few million, but indexing must be unobtrusive. It needs to run in the background when the user is idle. Ideally it shouldn’t greatly disrupt the virtual memory working set, or else the system will be sluggish when the user returns. This probably requires platform-dependent code.
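To make the "unobtrusive" requirement concrete, here is a minimal sketch of an indexing loop that only does work while the user is away. Everything in it is an assumption for illustration: `index_when_idle`, `index_one`, and `is_idle` are hypothetical names, and the idle probe is exactly the platform-dependent piece mentioned above, so it is passed in rather than implemented.

```python
import time
from typing import Callable, Iterable


def index_when_idle(
    paths: Iterable[str],
    index_one: Callable[[str], None],
    is_idle: Callable[[], bool],
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Index one document at a time, pausing whenever the user is active.

    `is_idle` is a hypothetical, platform-specific probe (for example,
    "no keyboard or mouse input for a while"); it is injected because
    there is no portable way to ask that question.
    """
    indexed = 0
    for path in paths:
        # Back off until the probe reports that the user is away.
        while not is_idle():
            sleep(pause_seconds)
        # Do one small unit of work, then re-check before the next one,
        # so the indexer yields quickly when the user returns.
        index_one(path)
        indexed += 1
    return indexed
```

Indexing a single document between idle checks keeps the indexer responsive to the user coming back. It does nothing about the working-set problem, though, which would still need help from the operating system.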
In the end, the core search and indexing code (the part Lucene provides) is only a small part of the application. And Java, while cross-platform, requires a runtime that’s too big for convenient download and doesn’t give easy access to platform-specific scheduling features.
The Beagle folks have defied these odds, albeit for a not-yet-mainstream platform.
There’s still hope for mass-market Lucene-based desktop search: GCJ is cross-platform, makes it easy to invoke platform-specific code, and may soon have a tiny runtime. A C++ port exists and a C port of Lucene is underway. Machines and networks keep getting faster, so scheduling and download-size issues will diminish. In the meantime, perhaps Ask Jeeves will fill this gap.