Archive for the ‘Uncategorized’ Category
Scale-up versus Scale-out
July 30, 2007
I just ran across a paper from IBM comparing scaling-up (using bigger boxes) to scaling-out (using more boxes). They use Nutch search as their workload, and conclude “… that scale-out solutions have an indisputable performance and price/performance advantage over scale-up for search workloads.” Not exactly a big surprise, but it’s good to have objective data. They also conclude that “Scale-out systems are still in a significant disadvantage with respect to scale-up when it comes to systems management.” Hmm. With frameworks like Hadoop, folks shouldn’t be bothered as much by the more frequent host failures that a scale-out system is prone to.
siren song
December 18, 2006
Nutch developer Sami Siren seems to be diving into Hadoop, with his second post, this time examining the underutilized record facility. I’m hoping that, once we get a particular bug fixed, we’ll start using records for lots of Hadoop’s internals. Some fun cases will be replacing things like the source for IntWritable with something as simple as:
class IntWritable { int value; }
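For contrast, here is a rough sketch of the kind of hand-written boilerplate that a one-line record definition could stand in for. It is not Hadoop's actual IntWritable source: the Writable interface below is a local stand-in that assumes the same two methods as Hadoop's, and IntWritableSketch and roundTrip are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in for Hadoop's Writable interface (assumption: same two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical hand-written wrapper around a single int field.
class IntWritableSketch implements Writable {
    private int value;

    IntWritableSketch() {}

    IntWritableSketch(int value) { this.value = value; }

    int get() { return value; }

    void set(int value) { this.value = value; }

    // Serialize the field as a 4-byte big-endian int.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Read the field back in the same format.
    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    // Hypothetical helper: serialize to bytes, then deserialize into a fresh instance.
    static IntWritableSketch roundTrip(IntWritableSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            IntWritableSketch copy = new IntWritableSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Everything here except the field declaration is mechanical, which is why generating it from a record definition is appealing.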
Hadoop’s made the news!
November 22, 2006
I just spotted a complimentary article about Hadoop, Lucene & Nutch.
objectivity, again
July 3, 2006
Battelle’s blog has elicited a good discussion of search engine objectivity. I discussed this issue a while ago. One comment led to a good article (pdf) on the topic.
travel plans
April 24, 2006
Next Thursday, I’ll be in San Francisco for the Nutch Meeting.
I’ll be in Helsinki for most of July, hosted by Wray Buntine, attending the International Workshop on Intelligent Information Access there July 6-8, among other things.
I’ll probably also attend the Open Source Information Retrieval workshop at SIGIR in August.