www 2004

Met lots of folks at WWW 2004, including:

  • Torsten Suel, who has done some great work on search optimization that Nutch/Lucene should adopt, among other things;
  • Giuseppe Attardi, who showed me some impressive benchmarks of a fetcher that uses async i/o;
  • Marc Najork, who wrote Mercator, a very extensible crawler that Nutch can learn from;
  • … and lots of other folks whose names I cannot recall.

Someone suggested that Nutch should look at Lustre for our robust, distributed filesystem needs. Does anyone have any experience with Lustre?

Thanks to Rohit Khare for inviting me!

2 Responses to “www 2004”

  1. Anonymous Says:

    Are your talk slides available for download now?

  2. Doug Cutting Says:

    OpenOffice crashed halfway through the presentation, and now refuses to open the file! I filed a bug report, and perhaps I can repair it. If I manage to salvage it, I’ll post it on the Nutch wiki.

    In the meantime you can look at my first talk from TheServerSide Java Symposium. The WWW 2004 presentation was mostly an extended version of that.
