Scale-up versus Scale-out

July 30, 2007

I just ran across a paper from IBM comparing scaling-up (using bigger boxes) to scaling-out (using more boxes). They use Nutch search as their workload, and conclude “… that scale-out solutions have an indisputable performance and price/performance advantage over scale-up for search workloads.” Not exactly a big surprise, but it’s good to have objective data. They also conclude that “Scale-out systems are still in a significant disadvantage with respect to scale-up when it comes to systems management.” Hmm. With frameworks like Hadoop, folks shouldn’t be bothered as much by the more frequent host failures that a scale-out system is prone to.

siren song

December 18, 2006

Nutch developer Sami Siren seems to be diving into Hadoop: his second post examines the underutilized record facility. I’m hoping that, once we get a particular bug fixed, we’ll start using records for lots of Hadoop’s internals. One fun case would be replacing the source for something like IntWritable with a definition as simple as:

class IntWritable { int value; }
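
For a sense of what a one-line definition like that could replace, here is a minimal sketch of a hand-written integer wrapper that carries its own serialization code. This is not Hadoop's actual IntWritable source: the class name IntBox and the toBytes/fromBytes helpers are hypothetical, and the sketch uses only java.io so it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a hand-written, Writable-style class.
// Every method below is boilerplate that a record definition could generate.
class IntBox implements Comparable<IntBox> {
    private int value;

    IntBox() {}

    IntBox(int value) { this.value = value; }

    int get() { return value; }

    void set(int value) { this.value = value; }

    // Serialize the single field as a 4-byte big-endian int.
    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Overwrite this object's state from the stream.
    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    @Override
    public int compareTo(IntBox other) {
        return Integer.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IntBox && ((IntBox) o).value == value;
    }

    @Override
    public int hashCode() { return value; }

    @Override
    public String toString() { return Integer.toString(value); }

    // Convenience helper (an assumption of this sketch): serialize to a byte array.
    static byte[] toBytes(IntBox box) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            box.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Convenience helper (an assumption of this sketch): rebuild from a byte array.
    static IntBox fromBytes(byte[] bytes) {
        IntBox box = new IntBox();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            box.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return box;
    }
}
```

Accessors, serialization, comparison, equality, hashing and printing all follow mechanically from the single field, which is what makes this kind of class a natural candidate for generation from a record definition.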

Hadoop’s made the news!

November 22, 2006

I just spotted a complimentary article about Hadoop, Lucene & Nutch.

objectivity, again

July 3, 2006

Battelle’s blog has elicited a good discussion of search engine objectivity. I discussed this issue a while ago. One comment led to a good article (pdf) on the topic.

travel plans

April 24, 2006

Next Thursday, I’ll be in San Francisco for the Nutch Meeting.

I’ll be in Helsinki for most of July, hosted by Wray Buntine, attending the International Workshop on Intelligent Information Access there July 6-8, among other things.

I’ll probably also attend the Open Source Information Retrieval workshop at SIGIR in August.