Using IndexWriter with SearcherManager - lucene

I have a few basic questions regarding the usage of SearcherManager with IndexWriter.
I need to periodically rebuild the Lucene index in the application, and currently this happens on a different thread from the one that serves search requests.
Can I use the same IndexWriter instance throughout the lifetime of the application to rebuild the index periodically? Currently, I create/open it once during startup and just call IndexWriter#commit whenever a new index is built.
I'm using SearcherManager to acquire and release IndexSearcher instances for each search request. After the index is periodically rebuilt, I'm planning to use the SearcherManager#maybeRefresh method to get refreshed IndexSearcher instances. The SearcherManager instance is also created once during startup, and I intend to keep it throughout.
I do not close the IndexWriter or the SearcherManager during the app's lifetime.
Now for the questions:
If I create a new IndexWriter every time I need to rebuild the index, will SearcherManager#maybeRefresh be able to detect that it's a new IndexWriter instance? Or do I need to create a new SearcherManager using the newly created IndexWriter?
What's the difference between creating a SearcherManager instance using an IndexWriter, using a DirectoryReader, or using a Directory?

The answers depend on how you construct your SearcherManager:
If you construct it with a Directory, or with a DirectoryReader opened on that directory, SearcherManager.maybeRefresh() reopens the reader via DirectoryReader.openIfChanged(), so newly acquired IndexSearchers only see changes that have been committed. Data written to the index but not yet committed will not show up, and between refreshes all searches reflect the point in time at which the current reader was opened.
If you construct the SearcherManager with an IndexWriter, it uses near-real-time readers: SearcherManager.maybeRefresh() picks up data written by that writer even before it has been committed. All newly acquired IndexSearchers then reflect the new state of the underlying index; committing is still needed to make the changes durable.
Despite having limited experience, I recommend using the latter approach. It provides a very simple way to implement near-real-time searching: At application start you create an IndexWriter and construct a SearcherManager with it. Afterwards you start a background thread that periodically commits all changes in the IndexWriter and refreshes the SearcherManager. For the lifetime of your application you can keep using the initial IndexWriter and SearcherManager without having to close/reopen them.
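The setup described above can be sketched as follows. This is a minimal sketch, not the answerer's code: it assumes Lucene 9.x with lucene-core on the classpath, uses an in-memory ByteBuffersDirectory only so the demo is self-contained (a real application would pass something like FSDirectory.open(Paths.get(...))), and the class NrtIndex and its methods are hypothetical names. A ScheduledExecutorService plays the role of the background thread.

```java
// Assumes Lucene 9.x (lucene-core); NrtIndex and its methods are hypothetical names.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NrtIndex implements Closeable {
    private final IndexWriter writer;       // one writer for the whole application lifetime
    private final SearcherManager manager;  // built from the writer, so refreshes are near-real-time
    private final ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();

    NrtIndex(Directory dir, long refreshMillis) throws IOException {
        writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
        manager = new SearcherManager(writer, null); // null = default SearcherFactory
        refresher.scheduleWithFixedDelay(this::refreshQuietly, refreshMillis, refreshMillis, TimeUnit.MILLISECONDS);
    }

    void add(String id) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        writer.addDocument(doc);
    }

    // What the background task does on each tick.
    void commitAndRefresh() throws IOException {
        writer.commit();        // durability: changes survive a restart
        manager.maybeRefresh(); // visibility: searchers acquired afterwards see the changes
    }

    private void refreshQuietly() {
        try {
            commitAndRefresh();
        } catch (IOException e) {
            // An exception escaping a scheduled task cancels all future runs, so IOExceptions are logged instead.
            e.printStackTrace();
        }
    }

    int count() throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.count(new MatchAllDocsQuery());
        } finally {
            manager.release(searcher); // every acquire() needs a matching release()
        }
    }

    @Override
    public void close() throws IOException {
        refresher.shutdown(); // cancels future periodic runs; a run in progress is allowed to finish
        try {
            refresher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        manager.close();
        writer.close(); // commits any remaining changes by default
    }

    // Small in-memory demo: returns {count before refresh, count after refresh}.
    static int[] demo() {
        try (Directory dir = new ByteBuffersDirectory();
             NrtIndex index = new NrtIndex(dir, 60_000)) {
            index.add("a");
            index.add("b");
            int before = index.count();   // the current searcher predates the adds
            index.commitAndRefresh();     // run one tick by hand instead of waiting for the scheduler
            return new int[] {before, index.count()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping commit() and maybeRefresh() as separate calls is a deliberate choice: with an IndexWriter-based SearcherManager, maybeRefresh() alone is enough for visibility, while commit() is what makes changes durable, so the two could also run on different schedules (commits involve fsync and are usually more expensive than a near-real-time refresh).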
PS: I have only started working with Lucene a few days ago, so don't take everything I wrote here as 100% certain.
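To see the difference between the constructors concretely, the sketch below builds one SearcherManager from the Directory and one from the IndexWriter, adds a document without committing, and counts what each manager's searchers can see. It assumes Lucene 9.x (lucene-core); the class name is hypothetical, and an in-memory ByteBuffersDirectory stands in for a real index.

```java
// Assumes Lucene 9.x (lucene-core); SearcherManagerVisibility is a hypothetical name.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;

final class SearcherManagerVisibility {

    // Counts all documents visible to a freshly acquired searcher.
    static int visibleDocs(SearcherManager manager) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.count(new MatchAllDocsQuery());
        } finally {
            manager.release(searcher);
        }
    }

    // Returns {Directory-based before commit, IndexWriter-based before commit, Directory-based after commit}.
    static int[] run() {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.commit(); // a Directory-based manager can only open an index that has a commit
            try (SearcherManager fromDir = new SearcherManager(dir, null);
                 SearcherManager fromWriter = new SearcherManager(writer, null)) {
                Document doc = new Document();
                doc.add(new StringField("id", "1", Field.Store.YES));
                writer.addDocument(doc); // written, but not committed

                fromDir.maybeRefresh();
                fromWriter.maybeRefresh();
                int dirBefore = visibleDocs(fromDir);       // only sees commits
                int writerBefore = visibleDocs(fromWriter); // sees the writer's uncommitted changes

                writer.commit();
                fromDir.maybeRefresh();                     // openIfChanged() now finds the new commit
                return new int[] {dirBefore, writerBefore, visibleDocs(fromDir)};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```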

Related

Lucene IndexWriter.Close() vs IndexWriter.Commit()

What is the difference between IndexWriter.Close() and IndexWriter.Commit() when I have just a single instance of IndexWriter?
Note: The data I am going to index is very big, so I can't close the IndexWriter at runtime.
Note: I want to search documents while data is being indexed at the same time.
Commit() commits pending, buffered changes to the index (which can then be found with an IndexReader). The IndexWriter can then continue to be used for more changes. Close() also performs a Commit(), but additionally closes the IndexWriter. Note that IndexWriter implements IDisposable, and I recommend using it through a using statement.
By your first note, if you mean there are lots of documents to index, that's fine. You can use the same IndexWriter for many documents without closing it. Just loop through however many documents you want to index inside the same IndexWriter using statement.
With regard to your second note, you must perform a Commit() (or Close()) before your IndexWriter's changes will be seen by an IndexReader. You can always search with an IndexReader, but it only sees the index as of the last IndexWriter.Commit() before the reader was opened (or reopened).
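A Java counterpart of the Commit()/Close() behaviour described above, as a minimal sketch assuming Lucene 9.x (the question uses Lucene.Net, where the equivalents are Commit(), Dispose() and using). Java's try-with-resources plays the role of the using statement; the class and helper names are hypothetical, and an in-memory ByteBuffersDirectory keeps the demo self-contained.

```java
// Assumes Lucene 9.x (lucene-core); CommitVsClose and its helpers are hypothetical names.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;

final class CommitVsClose {

    private static Document doc(String id) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        return doc;
    }

    // Opens a reader on the latest commit and counts its documents.
    private static int committedDocs(Directory dir) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    // Returns {after commit(), with an uncommitted add pending, after close()}.
    static int[] run() {
        try (Directory dir = new ByteBuffersDirectory()) {
            int afterCommit = -1;
            int pending = -1;
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                writer.addDocument(doc("1"));
                writer.addDocument(doc("2"));
                writer.commit();                  // publish to new readers; the writer stays open
                afterCommit = committedDocs(dir);
                writer.addDocument(doc("3"));     // buffered only
                pending = committedDocs(dir);     // a new reader still sees the last commit
            }                                     // close() commits doc 3 (commitOnClose defaults to true)
            return new int[] {afterCommit, pending, committedDocs(dir)};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```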
I recommend Lucene in Action for these important details. It helped me a great deal.

Multiple instances of application using Lucene.Net

I'm developing a WPF application that uses Lucene.Net to index data from files being generated by a third-party process. It's low volume with new files being created no more than once a minute.
My application uses a singleton IndexWriter instance that is created at startup. Similarly an IndexSearcher is also created at startup, but is recreated whenever an IndexWriter.Commit() occurs, ensuring that the newly added documents will appear in search results.
Anyway, some users need to run two instances of the application, but the problem is that newly added documents don't show up when searching within the second instance. I guess it's because the first instance is doing the commits, and there needs to be a way to tell the second instance to recreate its IndexSearcher.
One way would be to signal this by creating/updating a file in conjunction with a FileSystemWatcher, but first I wondered whether there was anything in Lucene.Net I could use.
The only thing I can think of that might be helpful for you is IndexReader.Reopen(). This will refresh the IndexReader, but only if the index has changed since the reader was originally opened. It should cause minimal disk access in the case where the index hasn't been updated, and in the case where it has, it tries to only load segments that were changed or added.
One thing to note about the API: Reopen returns an IndexReader. In the case where the index hasn't changed, it returns the same instance; otherwise it returns a new one. The original index reader is not disposed, so you'll need to do it manually:
IndexReader reader = /* ... */;
IndexReader newReader = reader.Reopen();
if (newReader != reader)
{
    // Only close the old reader if we got a new one
    reader.Dispose();
}
reader = newReader;
I can't find the .NET docs right now, but here are the Java docs for Lucene 3.0.3 that explain the API.
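In current Java Lucene, Reopen() has been replaced by DirectoryReader.openIfChanged(), which returns null (rather than the same instance) when nothing has changed. Below is a minimal sketch of the same pattern, assuming Lucene 9.x; refresh() and the other names are hypothetical. In a multi-threaded server, closing the old reader while other threads may still be searching it is unsafe; that bookkeeping is what SearcherManager's reference counting takes care of.

```java
// Assumes Lucene 9.x (lucene-core); ReaderRefresh and its helpers are hypothetical names.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;

final class ReaderRefresh {

    // Java counterpart of the Reopen() pattern above.
    static DirectoryReader refresh(DirectoryReader reader) throws IOException {
        DirectoryReader newReader = DirectoryReader.openIfChanged(reader);
        if (newReader == null) {
            return reader;   // index unchanged: keep using the same reader
        }
        reader.close();      // only close the old reader if we got a new one
        return newReader;
    }

    private static void addAndCommit(IndexWriter writer, String id) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        writer.addDocument(doc);
        writer.commit();
    }

    // Returns {1 if the no-change refresh kept the same instance, docs seen after the second refresh}.
    static int[] run() {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            addAndCommit(writer, "1");
            DirectoryReader reader = DirectoryReader.open(dir);
            try {
                DirectoryReader before = reader;
                reader = refresh(reader);                  // nothing committed since open
                int keptInstance = (reader == before) ? 1 : 0;
                addAndCommit(writer, "2");                 // e.g. the other application instance commits
                reader = refresh(reader);                  // new reader; the old one is closed
                return new int[] {keptInstance, reader.numDocs()};
            } finally {
                reader.close();                            // closes whichever reader is current
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```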
If both instances have their own IndexWriter opened on the same directory, you're in for a world of pain and intermittent bad behaviour.
An IndexWriter (IW) expects and requires exclusive control of the index directory. This is the reason for the lock file.
If the second instance can detect that there is an existing instance, then it might be able to just open an IndexReader/IndexSearcher on the folder and reopen it when the directory changes.
But then what happens if the first instance closes? The index will no longer be updated, so the second instance would need to reinitialise, this time with an IW. Perhaps it could do this when the lock file is removed as the first instance closes.
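The detect-then-fall-back idea can be sketched as follows, assuming Lucene 9.x: opening a second IndexWriter on a locked directory fails with LockObtainFailedException, which an instance can treat as "someone else is the writer" and then open only a reader (readers need no lock). ByteBuffersDirectory stands in for the shared folder here; with FSDirectory across two processes, the default NativeFSLockFactory relies on an OS-level lock on write.lock. tryOpenWriter() is a hypothetical helper.

```java
// Assumes Lucene 9.x (lucene-core); WriterOrReader and tryOpenWriter() are hypothetical names.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;

import java.io.IOException;
import java.io.UncheckedIOException;

final class WriterOrReader {

    // Returns a writer if this instance can take write.lock, or null if another writer already holds it.
    static IndexWriter tryOpenWriter(Directory dir) throws IOException {
        try {
            return new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
        } catch (LockObtainFailedException e) {
            return null; // someone else is the writer: open a reader instead and reopen it on changes
        }
    }

    // Returns {1 if the second instance was refused a writer, docs it sees read-only, 1 if it can take over later}.
    static int[] run() {
        try (Directory dir = new ByteBuffersDirectory()) {
            IndexWriter first = tryOpenWriter(dir); // nobody holds the lock yet, so this succeeds
            Document doc = new Document();
            doc.add(new StringField("id", "1", Field.Store.YES));
            first.addDocument(doc);
            first.commit();

            IndexWriter second = tryOpenWriter(dir); // null: the first instance holds write.lock
            int readOnlyDocs = -1;
            try (DirectoryReader reader = DirectoryReader.open(dir)) { // readers do not need the lock
                readOnlyDocs = reader.numDocs();
            }

            first.close(); // e.g. the first instance shuts down and releases write.lock
            IndexWriter takeover = tryOpenWriter(dir); // the second instance can now reinitialise as writer
            int tookOver = (takeover != null) ? 1 : 0;
            if (takeover != null) {
                takeover.close();
            }
            return new int[] {(second == null) ? 1 : 0, readOnlyDocs, tookOver};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```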
The "better" approach would be to spin up a "service" (just a background process, maybe in the system tray). All instances of the app would then query this service. If the app is started and the service is not detected then spin it up.

How to stop running indexing before starting another?

I'm making a web app that uses Lucene as its search engine. First, the user selects a file/directory to index, and after that they can search it (duh!). My problem happens when the user tries to index a huge amount of data: for example, if it's taking too long and the user refreshes the page and tries to index another directory, an exception is thrown because the first indexing run is still going (write.lock shows up). Given that, how can I stop the first indexing run? I tried closing the IndexWriter with no success.
Thanks in advance.
Why do you want to interrupt the first indexing operation and restart it again?
In my opinion you should display a simple image that shows the system is working (as Nielsen says: "The system should always keep users informed about what is going on, through appropriate feedback within reasonable time."), and when the user presses refresh you should intercept the event and prevent another indexing process from starting.
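The "prevent another indexing run" suggestion can be sketched in plain Java with a shared flag. This is a minimal sketch assuming that one IndexingGuard instance (a hypothetical name) is shared application-wide per index directory, e.g. as an application-scoped object; it does not stop the first run, it only rejects a second one so that it never collides with write.lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: allows at most one indexing job at a time for one index directory.
final class IndexingGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the job unless another one is in progress; returns false if the request was rejected.
    boolean tryRun(Runnable indexingJob) {
        if (!running.compareAndSet(false, true)) {
            return false; // e.g. the user refreshed the page: show "indexing in progress" instead
        }
        try {
            indexingJob.run();
            return true;
        } finally {
            running.set(false); // release the flag even if the job throws
        }
    }

    boolean isRunning() {
        return running.get();
    }
}
```

In a web app, tryRun() would be called from the background worker that performs the indexing, and a rejected request can be answered with the "system is working" feedback described above.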
You are probably trying to open an IndexWriter on an index directory that already has an IndexWriter open on it. If you open IndexWriters on two different index directories, the write.lock exception won't happen. Please check that the new IndexWriter instance is not writing to a previously opened index directory that already has an IndexWriter open on it.

Apache Lucene Index Writer

I'm new to Apache Lucene and have run into an issue. I indexed all the files in a directory, but I didn't close the IndexWriter, and when I tried to open the index in Luke it showed the error "Index not closed". The problem is that the code has already finished executing. How do I unlock the index? If I instantiate a new IndexWriter on the same directory, will it overwrite the existing index?
I am not an expert either.
If I were you, I'd do the following:
1) Add the following line at the end of your code; it must run no matter what:
myIndexWriter.close();
2) Delete the existing directory manually, and rerun the whole code.
If you instantiate a new IndexWriter without deleting the directory, it will add documents to the existing index. So yes, re-indexing the same files will result in duplicate index entries.
However, from Lucene's perspective, all those entries are still unique, i.e. every addDocument() call creates a new entry in the index with a new, unique Lucene-internal doc id.
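On "will it overwrite the existing index?": with IndexWriterConfig's default OpenMode.CREATE_OR_APPEND, a new writer appends to the existing index, while OpenMode.CREATE replaces it, which avoids deleting the directory by hand. A minimal sketch assuming Lucene 9.x (names hypothetical, in-memory directory for the demo); try-with-resources guarantees that close() runs, which also releases write.lock.

```java
// Assumes Lucene 9.x (lucene-core); OpenModeDemo and indexOnce() are hypothetical names.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;

final class OpenModeDemo {

    // Opens a writer, adds one document, closes it (releasing write.lock) and returns the committed doc count.
    static int indexOnce(Directory dir, IndexWriterConfig.OpenMode mode) throws IOException {
        IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer()).setOpenMode(mode);
        try (IndexWriter writer = new IndexWriter(dir, config)) { // close() runs even if indexing throws
            Document doc = new Document();
            doc.add(new TextField("body", "same file contents", Field.Store.NO));
            writer.addDocument(doc);
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return reader.numDocs();
        }
    }

    static int[] run() {
        try (Directory dir = new ByteBuffersDirectory()) {
            int first = indexOnce(dir, IndexWriterConfig.OpenMode.CREATE_OR_APPEND);  // the default mode
            int second = indexOnce(dir, IndexWriterConfig.OpenMode.CREATE_OR_APPEND); // appends a duplicate
            int third = indexOnce(dir, IndexWriterConfig.OpenMode.CREATE);            // replaces the old index
            return new int[] {first, second, third};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```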

Lucene index updation and performance

I am working on a job portal site and have been using Lucene for job search functionality.
Users will be posting a number of jobs on our site on a daily basis. We need to make sure that a newly posted job is searchable on the site as soon as possible.
In this context, how do I update Lucene index when a new job is posted or when an existing job is edited?
Can lucene index updating and search work in parallel?
Also, could you share any tips/best practices regarding Lucene indexing, optimizing, performance, etc.?
I'd appreciate your help!
Thanks!
Yes, Lucene can search and write to an index at the same time, as long as no more than one IndexWriter writes to it. If you want new records to be visible ASAP, have the IndexWriter call commit() often (see IndexWriter's JavaDoc for details).
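For the "new job posted / existing job edited" case, one common pattern is to give every job a unique id field and use IndexWriter.updateDocument(), which deletes the previous version matching the id term and adds the new one. Combined with an IndexWriter-based SearcherManager, the change becomes searchable after maybeRefresh() without waiting for a commit. A minimal sketch assuming Lucene 9.x; the field names jobId/title and the class JobIndex are hypothetical.

```java
// Assumes Lucene 9.x (lucene-core); JobIndex and the jobId/title fields are hypothetical.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.io.UncheckedIOException;

final class JobIndex {

    // Adds a new job or replaces the existing version with the same id.
    static void upsertJob(IndexWriter writer, String jobId, String title) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("jobId", jobId, Field.Store.YES)); // exact-match key, not tokenized
        doc.add(new TextField("title", title, Field.Store.YES));   // analyzed for full-text search
        writer.updateDocument(new Term("jobId", jobId), doc);     // delete old version, then add new one
    }

    static int hits(SearcherManager manager, Query query) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            return searcher.count(query);
        } finally {
            manager.release(searcher);
        }
    }

    // Returns {total jobs, hits for the old title word, hits for the new title word}.
    static int[] run() {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
             SearcherManager manager = new SearcherManager(writer, null)) {
            upsertJob(writer, "42", "Junior Java Developer"); // job posted
            upsertJob(writer, "42", "Senior Java Developer"); // same job edited
            manager.maybeRefresh();                           // make both changes searchable
            return new int[] {
                hits(manager, new MatchAllDocsQuery()),
                hits(manager, new TermQuery(new Term("title", "junior"))), // StandardAnalyzer lowercases
                hits(manager, new TermQuery(new Term("title", "senior")))
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```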
These Wiki pages might also help:
ImproveIndexingSpeed
ImproveSearchingSpeed
I have used Lucene.Net on a web site similar to what you are doing. Yes, you can update the index live to keep everything up to date. What platform are you using Lucene on, .NET or Java?
Make sure you create a new IndexSearcher as any additions after an IndexSearcher has been created are not visible to that instance.
A better approach may be to reopen the IndexReader if you want to reuse the same index searcher.