Would you like to be paid to work full time on Nutch?
I know of a few companies that are gearing up to build full-scale, commercial Nutch-based web search engines. They’re not ready to make public announcements about this, but if you think you might like to work on this, please send your resumé to jobs @ lucene.com, and I’ll forward it on. If I know you from Nutch or Lucene contributions, then I’ll add my recommendation.
July 1, 2004 at 10:19 am
Wow, you guys are behind Lucene? We work a little with the .NET Lucene team and are creating interfaces to leverage our technologies.
I have a very interesting challenge for your team. Nata1 Unified is trying to find worthy opponents for our search trials.
The task: each team gets a 4-week development window, 3 days of indexing time, and 3 equal computers, with the teams in different geographic areas. The teams must then pass trials set by a panel of judges covering depth (did we index the valuable pages?), free-text search, and speed.
I’m 90% sure I can get Microsoft money to help sponsor this. I KNOW that we can raise a lot of awareness for both our platforms and create an incredible competition: not just ONE trial, but many trials over time, not just with Nutch but with others, and hopefully with others using Nata1 Unified.
So should we let Microsoft and Google run the entire search business, or should we do something to instigate the greatest open source search initiative that ever existed?
Let me know if you’re interested :-)… Don’t worry, I’ll get the money!
Sedge
http://www.Nata1.com
July 1, 2004 at 11:53 pm
Nutch uses Lucene, so why bother?
You’d better test against http://www.search.msn.com
But aren’t they using Lucene too?
July 2, 2004 at 8:12 am
Hi,
I was looking on the net for help resolving problems with read/write concurrency on the Lucene index and lock files, and came across this site. I’ve solved it now, and I’d just like to say how impressive and easy to use the Lucene engine is.
A website I am helping to develop (http://www.fantasyfooty.org or http://www.mycgiserver.com/~intro/ff/FFindex.html) uses Lucene as the search engine for its backbone news service.
I notice you are involved in the web-crawling side of search engine development. I have also built a web crawler, which uses multiple virtual machines (I’ve tried using one VM for the whole program, but it can’t cope). The crawler is currently specific to football news (I think you Yanks call it soccer!), which is retrieved using regular expressions, dropped into the database and indexed into Lucene. How much have the people at Lucene looked at web crawling?
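The regular-expression extraction step mentioned above might look roughly like the following Java sketch. It is not the commenter’s actual crawler: it assumes, purely for illustration, that each story headline on a fetched page sits inside a hypothetical `<h2 class="headline">` tag, and the class name `HeadlineExtractor` is likewise hypothetical. Storing the results in a database and indexing them with Lucene are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: pulls headline text out of fetched HTML, assuming
 * each headline is wrapped in <h2 class="headline">...</h2>.
 */
final class HeadlineExtractor {

    // DOTALL lets a headline span line breaks; CASE_INSENSITIVE tolerates <H2>.
    // The reluctant (.*?) stops at the first closing tag rather than the last.
    private static final Pattern HEADLINE = Pattern.compile(
            "<h2\\s+class=\"headline\">(.*?)</h2>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches any nested tag (e.g. <a href="...">) left inside a headline.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the text of every headline found in the HTML, in page order. */
    static List<String> extract(String html) {
        List<String> headlines = new ArrayList<>();
        Matcher m = HEADLINE.matcher(html);
        while (m.find()) {
            // Drop nested tags, collapse runs of whitespace, trim the ends.
            String text = TAG.matcher(m.group(1)).replaceAll("")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (!text.isEmpty()) {
                headlines.add(text);
            }
        }
        return headlines;
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<h2 class=\"headline\"><a href=\"/1\">Striker signs\n new deal</a></h2>"
                + "<p>Body text...</p>"
                + "<h2 class=\"headline\">Cup draw announced</h2>"
                + "</body></html>";
        // Each extracted headline would then go into the database and be
        // added to the index (not shown here).
        for (String h : extract(page)) {
            System.out.println(h);
        }
    }
}
```

Run as-is, this prints "Striker signs new deal" and "Cup draw announced". Site-specific patterns like this are brittle when page markup changes, so a real crawler would probably need one pattern per news source, or an HTML parser instead.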
One of the problems I’m having is removing documents from the index: how do you get the original ID of a document once you have added it to the index? Also, how do you run the engine entirely in memory, as Google does?
Keep up the good work!
July 2, 2004 at 11:23 am
If you have Lucene questions, please send them to the Lucene Users mailing list, lucene-user@jakarta.apache.org.
Doug
January 10, 2005 at 1:29 pm
Hi!
We are using Lucene as the core component of our search infrastructure for a commercial project and, at least we think so ;-), have made some significant improvements to it, which I may be able to get open-sourced as soon as the project is released (so that we have time to sort out any possibly “secret” code integrated, patents and so on).
_If_ I can get this to work, I’d really be interested in working further on Lucene, specifically on our new code, but I have to make a living, right :-) So I wonder: are any jobs in this area still vacant?
regards,
Bernhard
January 10, 2005 at 1:34 pm
Some Lucene jobs can be found (using Lucene) at:
http://www.indeed.com/search?q=lucene
September 7, 2005 at 4:29 pm
OK, I actually have several questions.
1. Firstly, my friends and I are working on another search-related project. My question here is: how do you go about getting sponsorship, whom should we approach, and which companies do this? We would be interested in money or hardware.