Creating Lucene Index in a Database - Apache Lucene - lucene

I am using grails searchable plugin. It creates index files on a given location. Is there any way in searchable plugin to create Lucene index in a database?

Generally, no.
You can probably attempt to implement your own format but this would require a lot of effort.

I am no expert in Lucene, but I know that it is optimized to offer fast search over the filesystem. So it would be theoretically possible to build a Lucene index over the database, but the main feature of lucene (being a VERY fast search engine) would be lost.

As a point of interest, Compass supported storage of a Lucene index in a database, using a JdbcDirectory. This was, as far as I can figure, just a bad idea.
Compass, by the way, is now defunct, having been replaced by ElasticSearch.

Related

Can we use lucene query to have fts_alfresco search?

I want to upgrade my Alfresco server to 5.2 and in all my custom webscripts am using lucene queries. Since from Alfresco 5.x lucene indexing has been removed and solr indexing is not instantaneous, am planing to use fts_alfresco search. While testing i found that few lucene queries can be used for fts_alfresco search without modifying. So my concern is will i be able to do fts_alfresco search using lucene query? If no, is there any better way to migrate all my lucene queries to fts_alfresco?
Thanks in advance.
You will need to test/check your queries since there are small differences (for instance, date range query is not the same), but in general there's no reason why you would not be able to use FTS.
I'm not sure a comprehensive documentation exists where you would see all those small differences, though. If you find it, please share.
"Alfresco FTS is compatible with most, if not all of the examples here.."
https://community.alfresco.com/docs/DOC-4673-search

Can a Lucene index be stored in an RDBMS

If I don't have access to the file system, but do have access to a MySQL instance, can I store the lucene index in a mysql database. I was able to find the DbDirectory and thought that it might do the trick. However, it looks like it works with a Berkeley DB rather than an RDBMS.
There are some contribution that stores lucene index in more simple datastores (from perspective of data model). For example, BerkleyDB and Cassandra. So technically it is possible to write implementation of Directory which would store index in Jdbc. There is one in Compass framework.
I dont believe you can, it would defeat the purpose of Lucene. If your indexing does not take to long you could consider a RAMDirectory which I believe stores it in memory.

Is lucene index created with 2.3.1 compatible with Lucene 3.0.3

We have indexed our documents with Lucene 2.3.1 and now want to move to Lucene 3.0.3 for better features. I want to know whether the index will work as is and will I be able to add more documents, with 3.0.3, to the existing index without any hassles or do I have to re-index the whole thing.
Thanks a lot in advance.
I am quite sure that the indexes will be incompatible with Lucene 3 if they were built under Lucene 2 (in fact I'm 99% positive of this).
However, you may be able to convert them rather than rebuild them. Have a look here for some high-level guidance in this area.

Index verification tools for Lucene

How can we know the index in Lucene is correct?
Detail
I created a simple program that created Lucene indexes and stored it in a folder. Using the diagnostic tool, Luke I could look inside an index and view the content.
I realise Lucene is a standard framework for building a search engine but I wanted to be sure that Lucene indexes every term that existed in a file.
Can I verify that the Lucene index creation is dependable? That not even a single term went missing?
You could always build a small program that will perform the same analysis you use when indexing your content. Then, for all the terms, query your index to make sure that the document is among the results. Repeat for all the content. But personally, I would not waste time on this. If you can open your index in Luke and if you can make a couple of queries, everything is most probably fine.
Often, the real question is whether or not the analysis you did on the content will be appropriate for the queries that will be made against your index. You have to make sure that your index will have a good balance between recall and precision.

Is there a set of best practices for building a Lucene index from a relational DB?

I'm looking into using Lucene and/or Solr to provide search in an RDBMS-powered web application. Unfortunately for me, all the documentation I've skimmed deals with how to get the data out of the index; I'm more concerned with how to build a useful index. Are there any "best practices" for doing this?
Will multiple applications be writing to the database? If so, it's a bit tricky; you have to have some mechanism to identify new records to feed to the Lucene indexer.
Another point to consider is do you want one index that covers all of your tables, or one index per table. In general, I recommend one index, with a field in that index to indicate which table the record came from.
Hibernate has support for full text search, if you want to search persistent objects rather than unstructured documents.
There's an OpenSymphony project called Compass of which you should be aware. I have stayed away from it myself, primarily because it seems to be way more complicated than search needs to be. Also, as I can tell from the documentation (I confess I haven't found the time necessary to read it all), it stores Lucene segments as blobs in the database. If you're familiar with the Lucene architecture, Compass implements a Lucene Directory on top of the database. I think this is the wrong approach. I would leverage the database's built-in support for indexing and implement a Lucene IndexReader instead. The same criticism applies to distributed cache implementations, etc.
I haven't explored this at all, but take a look at LuSql.
Using Solr would be straightforward as well but there'll be some DRY-violations with the Solr schema.xml and your actual database schema. (FYI, Solr does support wildcards, though.)
We are rolling out our first application that uses Solr tonight. With Solr 1.3, they've included the DataImportHandler that allows you to specify your database tables (they call them entities) along with their relationships. Once defined, a simple HTTP request will tirgger an import of your data.
Take a look at the Solr wiki page for DataImportHandler for details.
As introduction:
Brian McCallister wrote a nice blog post: Using Lucene with OJB.