Open Source Desktop Search

It seems Ask Jeeves may release their desktop search application as open source. On meeting with folks, Ask says:

Ask Jeeves Blog: Mozilla’s On Fire: “We discussed Ask Jeeves desktop search and the notion of open-sourcing it. We’re open at two levels. Contributing just the core desktop indexing technology or possibly the entire desktop search application. They discussed how/what they would evaluate before accepting a major piece of code/product contribution: code size, internationalization, etc. Whether or not we partner with Mozilla on this effort, Chris and team thought it was a good idea for us to pursue overall.”

Lots of folks think good open-source desktop search can already be easily implemented with tools like Lucene. But desktop search has exacting requirements.

  • Download size should be small, which rules out Java and C#, since you can’t afford to require a large runtime environment.
  • Lots of document formats must be supported. Yet many document format conversion tools are quite large, too large for inclusion. So, a good desktop search application might need to implement its own format converters, no small burden.
  • The performance requirements of desktop search are not too demanding, since the number of documents is unlikely to exceed a few million, but indexing must be unobtrusive. It needs to run in the background when the user is idle. Ideally it shouldn’t greatly disrupt the virtual memory working set, or else the system will be sluggish when the user returns. This probably requires platform-dependent code.
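
To make the "unobtrusive indexing" point concrete, here is a rough Python sketch of a background indexing loop that only works while the user is idle and then rests so that indexing takes up a bounded share of wall-clock time. Everything in it is an assumption for illustration: `index_in_background`, `duty_cycle`, and the `is_user_idle` hook are made-up names, and the hook is exactly where the platform-dependent code would have to go. It is not how Ask Jeeves, Beagle, or Lucene do it.

```python
import time
from typing import Callable, Iterable


def index_in_background(
    paths: Iterable[str],
    index_one: Callable[[str], None],
    duty_cycle: float = 0.25,
    is_user_idle: Callable[[], bool] = lambda: True,  # hypothetical platform hook
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Index paths one at a time, yielding the machine back to the user.

    duty_cycle is the rough fraction of wall-clock time spent indexing:
    0.25 means one unit of work is followed by three units of rest.
    Returns the number of paths indexed.
    """
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")

    indexed = 0
    for path in paths:
        # Stay out of the way while the user is active.
        while not is_user_idle():
            sleep(poll_seconds)

        started = clock()
        index_one(path)
        elapsed = clock() - started

        # Rest in proportion to the work just done.
        sleep(elapsed * (1.0 - duty_cycle) / duty_cycle)
        indexed += 1
    return indexed
```

A caller would pass its own `index_one` (say, a function that extracts text and adds it to an index) and an `is_user_idle` that asks the operating system about recent keyboard and mouse activity. The sketch throttles CPU time only; it does nothing about the working-set problem, which needs OS-level help.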

In the end, the core search and indexing code (like Lucene provides) is only a small part of the application, and Java, while cross-platform, requires a runtime that’s too big for convenient download, and doesn’t give easy access to platform-specific scheduling features.

The Beagle folks have defied these odds, albeit for a not-yet-mainstream platform.

There’s still hope for mass-market Lucene-based desktop search: GCJ is cross-platform, makes it easy to invoke platform-specifics, and may soon have a tiny runtime. A C++ port exists and a C port of Lucene is underway. Machines and networks keep getting faster; scheduling and download-size issues will diminish. In the meantime, perhaps Ask Jeeves will fill this gap.

13 Responses to “Open Source Desktop Search”

  1. Kevin Says:

    My comment was that GCJ could do this. I’ve been trying to get some time to work on a native app written in GCJ but just haven’t had the time.

    I have a trivial implementation of a desktop search based on Lucene but haven’t had time to release it! I’m such a bad OSS developer!

  2. Anonymous Says:

    I don’t understand the preference for a small download size. I’m sure it’s not because one actually cares about how long it takes to download the app.

    I’m searching 10 GB of content, of which 90% is downloaded or received via email. I really don’t care about tens of megabytes of one-time download needed to get a good search tool running (assuming one needs to get .NET, for example).

    There is probably no correlation between download size and the amount of memory required to perform the indexing. As a matter of fact, a resource friendly indexer takes more code than a memory hog.

    A desktop search tool with a full featured GUI might have a large download size, but could still have a lean and friendly indexer in terms of memory and CPU cycle usage.

    Desktop search tools with small download size tend to require a web browser as the front end, which causes other problems such as security issues.

  3. Search Engine Information Says:

    I prefer the small download size in the long run because I believe the browser requirements will disappear pretty soon.


  4. fatcrab Says:

    Where can I download the source of desktop Lucene? myemail

  5. miguel Says:

    One of the considerations has always been to keep Beagle portable to let it move to other platforms.

    The problem is that there is little demand for such an engine today on Windows or MacOS X, considering that the space is fairly well served today and is likely going to improve. Anyways, Firefox is a perfect example that I might be wrong.

    Now, regarding large runtime downloads: Mono can be cut in pieces, this is routinely done by folks distributing Mono-based applications on MacOS X: they only ship the libraries that they need, which usually amounts to four to six megabytes uncompressed.

    There are two other bits of good news: we have been working on a “linker” for .NET libraries which will help people in shipping only the bits they actually need: today the granularity is at the library level, in the future we will make this happen at the function level.

    The last good news is that Mono provides a mechanism to bundle the runtime, the libraries and the application into a single binary if they want to.

    Anyways, I am a big fan of all your work.

  6. Techknight Says:

    We do have a few open source desktop search applications which I find are on their way to becoming stable and providing robust search features. Though they are nascent, we should soon be seeing some action here. I have found two of them and mentioned my experience with them here.

  7. Anyms Says:

    I fully agree with Anonymous’ comments. I really don’t mind a one-time huge download. All that matters to me is the runtime performance: the leaner it is on resources, the better. Considering the fact that it is going to run continuously in the background, it definitely cannot be a resource hog!
