Multiple instances of an application using Lucene.Net

I'm developing a WPF application that uses Lucene.Net to index data from files generated by a third-party process. It's low volume, with new files being created no more than once a minute.
My application uses a singleton IndexWriter instance that is created at startup. Similarly, an IndexSearcher is created at startup, but it is recreated whenever IndexWriter.Commit() is called, so that newly added documents appear in search results.
Anyway, some users need to run two instances of the application, and the problem is that newly added documents don't show up when searching within the second instance. I guess that's because the first instance is doing the commits, and the second instance needs some way of being told to recreate its IndexSearcher.
One way would be to signal this with a file create/update in conjunction with a FileSystemWatcher, but first I wondered whether there is anything in Lucene.Net that I could utilise.
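A minimal sketch of the file-signal idea from the question. It is written in Java and polls a file instead of using a FileSystemWatcher. The file name `commit.signal` and the classes `CommitSignal` and `SignalWatcher` are hypothetical. The instance that owns the IndexWriter bumps a counter after each commit. The other instances reopen their searcher when the counter they last saw has changed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: the writing instance bumps a counter stored in a small
// signal file after every commit.
final class CommitSignal {
    private final Path file;

    CommitSignal(Path file) {
        this.file = file.toAbsolutePath();
    }

    /** Current commit generation, or 0 if nothing has been signalled yet. */
    long read() {
        try {
            if (!Files.exists(file)) {
                return 0L;
            }
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return text.isEmpty() ? 0L : Long.parseLong(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Call from the writing instance right after IndexWriter.Commit(). */
    long bump() {
        long next = read() + 1;
        try {
            // Write a temp file, then move it into place, so readers never see a partial value.
            Path tmp = Files.createTempFile(file.getParent(), "signal", ".tmp");
            Files.write(tmp, Long.toString(next).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return next;
    }
}

// Hypothetical helper for the searching instances: poll this on a timer.
final class SignalWatcher {
    private final CommitSignal signal;
    private long lastSeen;

    SignalWatcher(CommitSignal signal) {
        this.signal = signal;
        this.lastSeen = signal.read();
    }

    /** True (once per change) if a commit was signalled since the last check. */
    boolean consumeChange() {
        long current = signal.read();
        if (current == lastSeen) {
            return false;
        }
        lastSeen = current;
        return true;
    }
}
```

The sketch compares the file's content rather than its modification time. This avoids coarse timestamp resolution on some file systems.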

The only thing I can think of that might help you is IndexReader.Reopen(). It refreshes the IndexReader, but only if the index has changed since the reader was originally opened. It should cause minimal disk access when the index hasn't been updated, and when it has, it tries to load only the segments that were changed or added.
One thing to note about the API: Reopen returns an IndexReader. In the case where the index hasn't changed, it returns the same instance; otherwise it returns a new one. The original index reader is not disposed, so you'll need to do it manually:
IndexReader reader = /* ... */;
IndexReader newReader = reader.Reopen();
if (newReader != reader)
{
    // Only close the old reader if we got a new one
    reader.Dispose();
}
reader = newReader;
I can't find the .NET docs right now, but the Java docs for Lucene 3.0.3 (IndexReader.reopen()) explain the same API.
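The swap-and-dispose pattern above can be written once as a generic helper. The sketch is in Java with hypothetical names: `Reopener`, `reopenAndSwap`, and `FakeReader`, a stand-in for an IndexReader that only tracks an index generation. As the answer describes, it assumes that reopening returns the same instance when nothing has changed.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper for the reopen pattern: `reopen` may return the same
// instance (index unchanged) or a new one; only in the latter case is the old one closed.
final class Reopener {
    private Reopener() {}

    static <T extends AutoCloseable> T reopenAndSwap(T current, UnaryOperator<T> reopen) {
        T fresh = reopen.apply(current);
        if (fresh != current) {
            try {
                current.close(); // we got a new instance, so release the old one
            } catch (Exception e) {
                throw new IllegalStateException("could not close the old reader", e);
            }
        }
        return fresh;
    }
}

// Stand-in for an IndexReader: it only remembers which index generation it saw.
final class FakeReader implements AutoCloseable {
    final long generation;
    boolean closed;

    FakeReader(long generation) {
        this.generation = generation;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

Because the comparison is by reference, calling the helper on a timer is cheap when the index is unchanged: nothing is closed and nothing is allocated.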

If both instances have their own IndexWriter open on the same directory, you're in for a world of pain and intermittent bad behaviour.
An IndexWriter (IW) expects and requires exclusive control of the index directory. This is the reason for the lock file.
If the second instance can detect that there is an existing instance, you might be able to just open an IndexReader/IndexSearcher on the folder and reopen it when the directory changes.
But then what happens if the first instance closes? The index will no longer be updated, so the second instance would need to reinitialise, this time with an IW. Perhaps it could do this when the lock file is removed as the first instance closes.
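One way for the second instance to make this decision, sketched in Java. It assumes all instances run on the same machine. Each instance tries to take an OS-level lock on a separate, hypothetical role file (`writer.role`, not Lucene's own `write.lock`). The winner opens the IW; the others only open a reader and reopen it. The operating system releases file locks when a process exits, even after a crash. A reader-only instance can therefore retry `tryBecomeWriter` periodically and take over once the first instance closes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

// Hypothetical role election: the holder of an OS-level lock on a separate role file
// (not Lucene's write.lock) opens the IndexWriter; everyone else only searches.
final class WriterRole {
    private WriterRole() {}

    /** The held lock if this process became the writer, or empty if someone else holds it. */
    static Optional<FileLock> tryBecomeWriter(Path roleFile) {
        try {
            FileChannel channel = FileChannel.open(roleFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            boolean keepOpen = false;
            try {
                FileLock lock = channel.tryLock(); // null if another process holds it
                if (lock == null) {
                    return Optional.empty();
                }
                keepOpen = true; // closing the channel would release the lock
                return Optional.of(lock);
            } catch (OverlappingFileLockException e) {
                return Optional.empty(); // already held inside this same JVM
            } finally {
                if (!keepOpen) {
                    channel.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

To give up the writer role, close the lock's channel (`lock.channel().close()`), which also releases the lock.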
The "better" approach would be to spin up a "service" (just a background process, maybe in the system tray). All instances of the app would then query this service. If the app is started and the service is not detected then spin it up.
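A minimal Java sketch of the "detect the service, otherwise spin it up" step. It assumes a fixed loopback port that all instances agree on; the port is hypothetical. Binding the port is the detection. If the bind fails, the sketch assumes another instance is already serving, although an unrelated program holding the port would look the same. `IndexServiceLocator` and `tryStartService` are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.Optional;

// Hypothetical single-service detection: all app instances agree on one loopback port.
final class IndexServiceLocator {
    private IndexServiceLocator() {}

    /**
     * Tries to become the index service by binding the port. Empty means the port is
     * taken, which this sketch assumes means the service is already running.
     */
    static Optional<ServerSocket> tryStartService(int port) {
        ServerSocket server = null;
        try {
            server = new ServerSocket();
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return Optional.of(server);
        } catch (BindException e) {
            closeQuietly(server);
            return Optional.empty(); // connect to the existing service as a client instead
        } catch (IOException e) {
            closeQuietly(server);
            throw new UncheckedIOException(e);
        }
    }

    private static void closeQuietly(ServerSocket s) {
        if (s == null) {
            return;
        }
        try {
            s.close();
        } catch (IOException ignored) {
            // nothing useful to do here
        }
    }
}
```

The instance that wins the bind would own the single IndexWriter and answer search requests over the socket. The request protocol itself is left out here.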

Related

Using IndexWriter with SearcherManager

I have a few basic questions regarding the usage of SearcherManager with IndexWriter.
I need to periodically rebuild the Lucene index in the application, and currently this happens on a thread other than the one that serves search requests.
Can I use the same IndexWriter instance throughout the lifetime of the application to rebuild the index periodically? Currently, I create/open it once during startup and just call IndexWriter#commit whenever a new index is built.
I'm using SearcherManager to acquire and release IndexSearcher instances for each search request. After the index is periodically rebuilt, I'm planning to use the SearcherManager#maybeRefresh method to get refreshed IndexSearcher instances. The SearcherManager instance is also created once during startup, and I intend to keep it throughout.
I do not close the IndexWriter or SearcherManager throughout the app's lifetime.
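The acquire/release contract the question relies on can be modelled without Lucene. `Counted` and `ToySearcherManager` in the Java toy below are hypothetical names, loosely modelled on how a SearcherManager reference-counts its readers. The toy shows why every search must release what it acquired. After a refresh, the old searcher is closed only when its last user releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Toy model (not Lucene code): a value plus a reference count. The manager itself
// holds one reference; each acquire() adds one and each release() removes one.
final class Counted<T> {
    final T value;
    volatile boolean closed;
    private final AtomicInteger refs = new AtomicInteger(1);

    Counted(T value) {
        this.value = value;
    }

    boolean tryIncRef() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false; // already closed; caller must retry with the newer instance
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // stand-in for closing the underlying reader
        }
    }
}

// Toy model of the acquire/release/refresh contract, loosely modelled on SearcherManager.
final class ToySearcherManager<T> {
    private volatile Counted<T> current;

    ToySearcherManager(T initial) {
        current = new Counted<>(initial);
    }

    Counted<T> acquire() {
        while (true) {
            Counted<T> c = current;
            if (c.tryIncRef()) {
                return c;
            }
            // Lost a race with refresh(): loop and pick up the newer instance.
        }
    }

    void release(Counted<T> c) {
        c.decRef();
    }

    /** Swaps in a newer value; the old one closes once its last user releases it. */
    synchronized void refresh(Supplier<T> newer) {
        Counted<T> old = current;
        current = new Counted<>(newer.get());
        old.decRef(); // drop the manager's own reference
    }
}
```

In real code the call pattern is: acquire, use the searcher inside `try`, and release it in `finally`, so an exception during a search never leaks a reference.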
Now for the questions:
If I create a new IndexWriter every time I need to rebuild the index, will SearcherManager#maybeRefresh be able to detect that it's a new IndexWriter instance? Or do I need to create a new SearcherManager using the newly created IndexWriter?
What's the difference between creating a SearcherManager instance using an IndexWriter, using a DirectoryReader, or using a Directory?
The answers depend on how you construct your SearcherManager:
If you construct it with a Directory, or with a DirectoryReader opened on a Directory, SearcherManager.maybeRefresh() reopens the reader only when a new commit has been written to that directory. Documents you add therefore become searchable only after IndexWriter.commit(); until then, all IndexSearchers acquired from the SearcherManager show the index as of the last commit the reader saw.
If you construct the SearcherManager with an IndexWriter, SearcherManager.maybeRefresh() obtains a near-real-time reader from that writer. It therefore picks up documents the writer has added even before they are committed. All newly acquired IndexSearchers then reflect the new state of the underlying index.
Despite my limited experience, I recommend the latter approach. It provides a very simple way to implement near-real-time searching. At application start, create an IndexWriter and construct a SearcherManager with it. Then start a background thread that periodically commits the IndexWriter's changes and refreshes the SearcherManager. For the lifetime of your application you can keep using the initial IndexWriter and SearcherManager without having to close or reopen them.
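The background loop described above, sketched in Java. `Committer` and `Refresher` are hypothetical stand-ins for `IndexWriter.commit()` and `SearcherManager.maybeRefresh()`, so the sketch runs without Lucene. Each cycle commits first, which makes the changes durable, and then refreshes, so newly acquired searchers see them.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins for IndexWriter.commit() and SearcherManager.maybeRefresh().
interface Committer {
    void commit() throws Exception;
}

interface Refresher {
    boolean maybeRefresh() throws Exception;
}

final class BackgroundRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "index-refresher");
                t.setDaemon(true); // don't keep the application alive just for this
                return t;
            });
    private final Committer writer;
    private final Refresher searchers;

    BackgroundRefresher(Committer writer, Refresher searchers) {
        this.writer = writer;
        this.searchers = searchers;
    }

    void start(long period, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::tick, period, period, unit);
    }

    /** One cycle: make pending changes durable, then let new searches see them. */
    void tick() {
        try {
            writer.commit();
            searchers.maybeRefresh();
        } catch (Exception e) {
            // An exception escaping a scheduled task would cancel every later run.
            System.err.println("index refresh cycle failed: " + e);
        }
    }

    @Override
    public void close() {
        scheduler.shutdown();
    }
}
```

The catch block in `tick` matters. With `scheduleWithFixedDelay`, a single exception escaping the task suppresses all subsequent runs without any other sign.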
PS: I have only started working with Lucene a few days ago, so don't take everything I wrote here as 100% certain.

Couchbase Lite - Is it possible not to create revision documents for a standalone application?

I am building a "standalone" mobile app with ReactNative and CouchbaseLite using the library react-native-couchbase-lite.
Is it possible to have only one document (i.e. only the original document) without any revision documents, even if I update the document multiple times? For example, if I make multiple updates to a ToDo task, only the original document should be updated and no extra revision document should be created.
Yes. You can tune the maxRevTreeDepth parameter, which is set via a Database object instance. It defaults to 20.
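A toy Java model of what a revision-depth limit means; this is not Couchbase Lite code. In this model, every update still creates a new revision, but history deeper than the limit is pruned. The class `BoundedHistoryDoc` and its "generation-body" revision IDs are hypothetical. The real database may prune at a different point.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model (not Couchbase Lite code): a document whose revision history is
// capped at maxDepth entries, newest first.
final class BoundedHistoryDoc {
    private final int maxDepth;
    private final Deque<String> revisions = new ArrayDeque<>();
    private int generation;

    BoundedHistoryDoc(int maxDepth) {
        if (maxDepth < 1) {
            throw new IllegalArgumentException("maxDepth must be at least 1");
        }
        this.maxDepth = maxDepth;
    }

    /** Records an update as a new revision and prunes history beyond maxDepth. */
    String update(String body) {
        generation++;
        String revId = generation + "-" + body; // toy ID: generation number plus body
        revisions.addFirst(revId);
        while (revisions.size() > maxDepth) {
            revisions.removeLast(); // drop the oldest revision
        }
        return revId;
    }

    /** Remaining revision IDs, newest first. */
    List<String> history() {
        return new ArrayList<>(revisions);
    }
}
```

With a depth of 1, only the latest revision survives. That is close to what the question asks for, even though each update still goes through a new revision.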
Edit: An alternative approach might be to create a new document every time, and delete the old one. This would be appropriate in a case where one wants to save only a single revision of some documents. It would require creating a new document ID each time, too.

Can you prevent RavenDB from overwriting a document that has been manually inserted?

I had a document in my Beta RavenDB instance with an id of:
document-65
I created a new RavenDB instance (Live) and copied the document from Beta to Live - opened the RavenDB Management, clicked on New and pasted the contents of the document into the 'Data' bit. I gave it the ID: document-65 as in the Beta.
All was working well until someone recently created a new document, which overwrote the existing one. I did the copy this way because I had only one document to copy, so time-wise this seemed quickest and most effective.
I presume it's Raven auto-generating an ID for me, and that's something I'll have to live with now, but what I want to know is:
Can I prevent this from happening?
Can I tell HiLo (or whatever) to use IDs > 65 from now on in? (If I did this again)
You can set optimistic concurrency = true to get RavenDB to check it for you.
see: http://ravendb.net/kb/16/using-optimistic-concurrency-in-real-world-scenarios
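A toy Java model of the check that optimistic concurrency adds; this is not the RavenDB client API. Each stored document carries a version (etag), and every write states which version it expects. In this model, a write that expects no existing document is rejected rather than silently overwriting. An automatically generated ID that collides with `document-65` is one such write. `VersionedStore` and `ConcurrencyException` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model, not the RavenDB client: thrown when a write's expected version is stale.
final class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String message) {
        super(message);
    }
}

// Toy document store where every document carries a version number (its "etag").
final class VersionedStore {
    private static final class Entry {
        final long etag;
        final String json;

        Entry(long etag, String json) {
            this.etag = etag;
            this.json = json;
        }
    }

    private final Map<String, Entry> docs = new HashMap<>();
    private long nextEtag = 1;

    /**
     * Stores a document only if its current etag equals expectedEtag.
     * expectedEtag == null means "this ID must not exist yet".
     */
    synchronized long put(String id, String json, Long expectedEtag) {
        Long actual = etagOf(id);
        if (!Objects.equals(actual, expectedEtag)) {
            throw new ConcurrencyException("document " + id + ": expected etag "
                    + expectedEtag + " but found " + actual);
        }
        long etag = nextEtag++;
        docs.put(id, new Entry(etag, json));
        return etag;
    }

    /** Current etag of a document, or null if it does not exist. */
    synchronized Long etagOf(String id) {
        Entry e = docs.get(id);
        return e == null ? null : e.etag;
    }
}
```

The trade-off is that every legitimate update must carry the etag it read, so a client that loads a document and saves it later can fail if someone else wrote in between.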

How to stop running indexing before starting another?

I'm making a web app that uses Lucene as its search engine. First, the user has to select a file/directory to index, and after that they can search it (duh!). My problem happens when the user tries to index a huge amount of data. For example, if it's taking too long and the user refreshes the page and tries to index another directory, an exception is thrown because the first indexing is still running (write.lock shows up). Given that, how can I stop the first indexing? I tried closing the IndexWriter with no success.
Thanks in advance.
Why do you want to interrupt the first indexing operation and restart it again?
In my opinion you should display a simple image which shows that the system is working (as Nielsen says: "The system should always keep users informed about what is going on, through appropriate feedback within reasonable time."). When the user presses refresh, you should intercept the event and prevent another indexing process from starting.
You are probably trying to open an IndexWriter instance on an index directory that already has an IndexWriter open on it. If you open IndexWriters on two different index directories, the write.lock exception won't happen. Please check that the new IndexWriter instance is not writing to a previously opened index directory that already has an IndexWriter open on it.
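The advice to prevent a second indexing process can be sketched in Java with a single atomic flag; `IndexingGate` is a hypothetical name. A request that arrives while indexing is already running is refused. It therefore never tries to open a second IndexWriter on the locked directory.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard: at most one indexing job runs at a time; a second request
// (e.g. after a page refresh) is refused instead of opening a second IndexWriter.
final class IndexingGate {
    private final AtomicBoolean running = new AtomicBoolean(false);

    /** Runs the job if nothing else is indexing; returns false if a job is already running. */
    boolean tryRun(Runnable job) {
        if (!running.compareAndSet(false, true)) {
            return false; // tell the user indexing is still in progress
        }
        try {
            job.run();
            return true;
        } finally {
            running.set(false); // reopen the gate even if the job throws
        }
    }

    /** Lets the page show "the system is working" feedback. */
    boolean isRunning() {
        return running.get();
    }
}
```

In a web app, `tryRun` would typically be called from a background worker thread rather than the request thread. The page can then poll `isRunning()` to keep the user informed.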

How do I release all lucene .net file handles?

I want to run a process that completely destroys and then rebuilds my Lucene.Net search index from scratch.
I'm stuck on the destroying part.
I've called:
IndexWriter.Commit();
IndexWriter.Close();
Analyzer.Close();
foreach (var name in Directory.ListAll())
{
    Directory.ClearLock(name);
    Directory.DeleteFile(name);
}
Directory.Close();
but the process is failing because there is still a file handle open on the file '_0.cfs'.
Any ideas?
Are you hosted in IIS? Try an iisreset (sometimes IIS itself is holding onto the files).
Just call IndexWriter.DeleteAll() followed by IndexWriter.Commit(). This removes the index content and lets you start off with an empty index, while already open readers can still read data until they are closed. The old files are removed automatically once they are no longer in use.