Build a "status" Page for the Nutch Search Engine
$100-500 USD
Completed
Posted more than 14 years ago
Paid on delivery
Nutch is a Java-based web search engine. While it can run on clusters of hundreds of machines, it can also run on a single host and serve search results through a few JSP pages that ship with Nutch.
Crawling is done with something like `./bin/nutch crawl [login to view URL] -dir crawl -depth 2 -topN 30000`, and the HTML interface is deployed by dropping `[login to view URL]` into your favorite servlet container (I use Jetty).
Your task is to build a single JSP page that shows statistics about the current search index. For that you need to use the Lucene API. Studying the source code of the tool "Luke" will probably show you exactly how to query the index (see [login to view URL]).
The page should display
* number of documents
* number of terms
* index last-modified date, in [login to view URL] format
* any statistics you can get on the crawldb. [login to view URL], [login to view URL] and [login to view URL] might provide pointers
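For the crawldb item, one possible starting point is the text printed by `bin/nutch readdb <crawldb> -stats`. The sketch below is only an illustration: the class name `CrawlDbStatsParser` is hypothetical, and it assumes each statistic appears on its own line as `name: value` (for example `TOTAL urls:` followed by a number). Check this against the output of your Nutch version before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of Nutch: extracts numeric statistics
 * from text assumed to look like the output of "readdb -stats".
 */
class CrawlDbStatsParser {

    /**
     * Splits each line at its last colon and keeps the pair only when the
     * right-hand side parses as a number. Banner lines such as
     * "CrawlDb statistics: done" are therefore skipped.
     */
    static Map<String, Double> parse(String statsOutput) {
        Map<String, Double> stats = new LinkedHashMap<String, Double>();
        for (String line : statsOutput.split("\\r?\\n")) {
            int colon = line.lastIndexOf(':');
            if (colon <= 0) {
                continue; // no "name:" part on this line
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (name.isEmpty() || value.isEmpty()) {
                continue;
            }
            try {
                stats.put(name, Double.parseDouble(value));
            } catch (NumberFormatException e) {
                // Not a number (e.g. a path or "done"): ignore the line.
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Sample text in the assumed format, not captured from a real run.
        String sample = "TOTAL urls:\t1234\n"
                + "status 2 (db_fetched):\t1100\n"
                + "CrawlDb statistics: done\n";
        for (Map.Entry<String, Double> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A JSP could render the resulting map as a simple two-column table. Keeping the parser free of Nutch and Lucene dependencies makes it easy to test on its own.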
We will use this page to monitor whether the Nutch instance is "healthy", i.e. still adding pages, etc. Nutch runs on an intranet, spidering about two dozen hosts.
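One way to turn the displayed numbers into a "healthy" verdict is sketched below. It is an assumption about what the monitoring could look like, not a stated requirement: the class `IndexHealthCheck`, the status strings, and the idea of comparing against a previously recorded document count are all hypothetical.

```java
/**
 * Hypothetical health rule: the index counts as healthy when it was
 * modified recently and its document count has grown since the last check.
 */
class IndexHealthCheck {

    /**
     * @param lastModifiedMillis index last-modified time (epoch millis)
     * @param nowMillis          current time (epoch millis)
     * @param maxAgeMillis       how old the index may be before it is stale
     * @param previousDocs       document count recorded at the last check
     * @param currentDocs        document count now
     * @return "OK", "NOT_GROWING" or "STALE"
     */
    static String status(long lastModifiedMillis, long nowMillis,
                         long maxAgeMillis, long previousDocs,
                         long currentDocs) {
        boolean fresh = nowMillis - lastModifiedMillis <= maxAgeMillis;
        boolean growing = currentDocs > previousDocs;
        if (!fresh) {
            return "STALE";
        }
        return growing ? "OK" : "NOT_GROWING";
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDay = 24L * 60 * 60 * 1000;
        // Index touched an hour ago, 50 new documents since the last check.
        System.out.println(status(now - 3600000L, now, oneDay, 1000, 1050));
    }
}
```

On a small intranet the document count may legitimately stop growing once everything has been crawled, so "NOT_GROWING" is better read as a hint than as an error.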
## Deliverables
* JSP page displaying statistics.
* If you need a version of Nutch newer than 1.1, please provide us with the whole Nutch installation.
* Use open-source libraries where they are available. If you copy open-source code, please mark it clearly and mention the license of the included code.
* Copyright of the code written by you for the project will be assigned to us. We might open-source the code if we consider it of general interest.
* During development you will not get access to our servers, accounts or resources. Installation will be handled by us according to the documentation you provide.
## Platform
FreeBSD 7, JDK 1.6, Nutch 1.0