Tuesday, April 06, 2004

Lucene search engine


My article on the Lucene search engine was published in the April issue of the JavaRanch Journal. I became interested in Lucene because it is used in the open-source forum software that I am enhancing. I had seen the Lucene page on Jakarta, but I was never sure exactly what it did or how it was used. It is actually a nice piece of software that is fairly easy to use once you get a handle on it. Unfortunately, the documentation is not very good, but that is not an uncommon problem in open-source projects.

The article covers the basics of using Lucene, although I simplified the task by indexing only plain text files. Lucene is not limited to text files, though: the contributions page lists several converters that turn other document formats, such as XML and PDF, into text so that Lucene can index them. Since I wrote the article, version 1.4 has come closer to completion. The one feature that I am looking forward to trying out is sorting of hits. The prior release only allowed sorting by score, but the new release allows sorting by any indexed field.
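
To make the difference between the two orderings concrete, here is a minimal sketch in plain Java. It does not use Lucene's API at all; the Hit class, its fields, and the sample data are made up purely for illustration. It only shows that the same set of hits can come back in a different order depending on whether you sort by relevance score or by a field such as a date.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HitSortingSketch {

    /** Hypothetical search hit (not a Lucene class): a title, a date field, and a score. */
    static final class Hit {
        final String title;
        final String date;   // ISO-style date, so plain string order is also date order
        final double score;  // relevance score, higher means a better match

        Hit(String title, String date, double score) {
            this.title = title;
            this.date = date;
            this.score = score;
        }
    }

    /** Made-up sample hits for the demonstration. */
    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("intro.txt", "2004-01-15", 0.42));
        hits.add(new Hit("faq.txt", "2004-03-01", 0.91));
        hits.add(new Hit("notes.txt", "2004-02-20", 0.67));
        return hits;
    }

    /** Orders hits by score, best match first. */
    static List<String> titlesByScore(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return titles(sorted);
    }

    /** Orders hits by the date field, oldest first, ignoring the score. */
    static List<String> titlesByDate(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.date));
        return titles(sorted);
    }

    private static List<String> titles(List<Hit> hits) {
        List<String> out = new ArrayList<>();
        for (Hit h : hits) {
            out.add(h.title);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = sampleHits();
        System.out.println("By score: " + titlesByScore(hits));
        System.out.println("By date:  " + titlesByDate(hits));
    }
}
```

Sorting by score puts faq.txt first because it is the best match, while sorting by the date field puts intro.txt first because it is the oldest. How Lucene 1.4 exposes this is covered by its own documentation, not by this sketch.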

So check out the article and take Lucene for a spin. I think you will find it very easy to use. And since I already found the cause of the infamous java.io.IOException: Bad file descriptor, you don't have to worry about searching all over the web to diagnose your bug!
