My question relates to the following code snippet:
static void Main(string[] args)
{
    Lucene.Net.Store.Directory d = FSDirectory.Open(new DirectoryInfo(/*my index path*/));
    IndexWriter writer = new IndexWriter(d, new WhitespaceAnalyzer());
    // Exiting without closing the index writer...
}
In this test, I opened an IndexWriter without closing it, so the write.lock file still exists in the index directory after the test exits. I therefore expected a LockObtainFailedException to be thrown the next time I opened an IndexWriter on that index.
Can someone please explain why I am wrong? Is the write.lock file only meant to prevent two IndexWriters from being created in the same process? That doesn't seem like the right answer to me.
It looks like there is a bug in that IndexWriter constructor. If you change your code to:
IndexWriter writer = new IndexWriter("Path to index here", new WhitespaceAnalyzer());
you will get the exception.
The lock file is used to prevent two IndexWriters from being opened on the same index, whether they are in the same process or not. You are right to expect an exception there.
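To make the expectation concrete, here is a minimal sketch of a lock that works purely by the existence of a file. It is not Lucene's implementation; the class and method names are hypothetical and only the file name write.lock comes from the question. With this kind of lock, a writer that exits without releasing leaves a stale file behind and the next writer is refused.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of an existence-based lock file (not Lucene code).
class LockFileSketch {

    /** Tries to take the lock; returns false if the lock file already exists. */
    static boolean tryObtain(Path lockFile) throws IOException {
        try {
            Files.createFile(lockFile); // atomic: fails if the file is already there
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    /** Releases the lock by deleting the lock file. */
    static void release(Path lockFile) throws IOException {
        Files.deleteIfExists(lockFile);
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("index");
        Path lock = indexDir.resolve("write.lock");

        System.out.println(tryObtain(lock)); // first writer obtains the lock
        // The first writer "exits" here without calling release(lock).
        System.out.println(tryObtain(lock)); // second writer is refused: stale file
        release(lock);
        System.out.println(tryObtain(lock)); // succeeds again after an explicit release
    }
}
```

A lock held by the operating system rather than by the mere presence of a file can behave differently, because the OS may drop it when the process exits even though the file stays on disk. Whether that applies to the constructor in the question is an assumption, not something this thread confirms.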
I am trying to create an IndexWriter and write to a Lucene Index. Here is my code:
public class Indexer {
    public static Analyzer _analyzer = new StandardAnalyzer(Lucene.Net.Util.LuceneVersion.LUCENE_48);

    private void WriteToIndex() {
        var config = new IndexWriterConfig(Lucene.Net.Util.LuceneVersion.LUCENE_48, _analyzer).SetUseCompoundFile(false);
        using (IndexWriter indexWriter = new IndexWriter(LuceneDirectory, config)) // <-- This throws an error!
        {
            // ....
        }
    }
}
But I keep getting an exception when trying to create the IndexWriter:
Exception thrown: 'Lucene.Net.Util.SetOnce`1.AlreadySetException' in Lucene.Net.dll
Additional information: The object cannot be set twice!
What am I doing wrong? The code compiles perfectly. I'm using Lucene.NET, but I'm guessing it should apply to Java too.
You are getting this exception because you are reusing an IndexWriterConfig, which is not intended to be shared between IndexWriter instances. Generate a new IndexWriterConfig instead, and it should work fine.
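The failure described in this answer can be reproduced without Lucene. The sketch below uses a hypothetical SetOnce holder, a simplified stand-in rather than Lucene's actual class, together with made-up Config and Writer types. Each writer claims its config on construction, so a second writer built from the same config fails.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins for a set-once holder, a writer config and a writer.
class SetOnceSketch {

    /** Holds a value that may be assigned exactly once. */
    static final class SetOnce<T> {
        private final AtomicReference<T> ref = new AtomicReference<>();

        void set(T value) {
            Objects.requireNonNull(value);
            // compareAndSet succeeds only while the reference is still null
            if (!ref.compareAndSet(null, value)) {
                throw new IllegalStateException("The object cannot be set twice!");
            }
        }

        T get() {
            return ref.get();
        }
    }

    /** A config remembers the single writer that claimed it. */
    static final class Config {
        final SetOnce<Writer> owner = new SetOnce<>();
    }

    /** A writer claims its config when it is constructed. */
    static final class Writer {
        Writer(Config config) {
            config.owner.set(this);
        }
    }

    public static void main(String[] args) {
        Config shared = new Config();
        new Writer(shared);          // fine: first claim
        try {
            new Writer(shared);      // fails: the config is already claimed
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        new Writer(new Config());    // fine: a fresh config per writer
    }
}
```

The last line of main mirrors the advice above: create a new config for every writer instead of sharing one.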
How can I remove stop words in Lucene from a given string such as "This is the chemical orientation"?
I think that Lucene's StopFilter is what you are looking for.
You should use StandardAnalyzer, which recognizes certain token types, lowercases tokens, and removes stop words.
Here is an example of creating an IndexWriter with StandardAnalyzer:
public IndexWriter Indexer(String dir) throws IOException {
    IndexWriter writer;
    Directory indexDir = FSDirectory.open(new File(dir).toPath());
    Analyzer analyzer = new StandardAnalyzer();
    IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
    cfg.setOpenMode(OpenMode.CREATE);
    writer = new IndexWriter(indexDir, cfg);
    return writer;
}
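What stop-word removal does to the string from the question can be sketched in plain Java, without Lucene. The stop list below is a hypothetical, deliberately tiny one chosen for this example, and the tokenization is a simple whitespace split, which is cruder than what a real analyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of lowercasing plus stop-word filtering (not Lucene code).
class StopWordSketch {

    // Hypothetical stop list for the example; real analyzers ship their own.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "is", "the", "this"));

    /** Splits on whitespace, lowercases each token and drops the stop words. */
    static List<String> removeStopWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [chemical, orientation]
        System.out.println(removeStopWords("This is the chemical orientation"));
    }
}
```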
I have an HttpModule that logs every visit to the site into a Lucene index.
The site is hosted on GoDaddy, and even though the page I test on has almost nothing on it (about 3 KB including CSS), it is slow.
If I refresh a few times, after the second or third refresh I get a "Lock obtain timed out: SimpleFSLock" error.
My question is: am I doing something wrong, or is this normal behavior?
Is there any way to overcome this problem?
My code:
//state the file location of the index
string indexFileLocation = System.IO.Path.Combine(HttpContext.Current.ApplicationInstance.Server.MapPath("~/App_Data"), "Analytics");
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, false);
//create an analyzer to process the text
Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
//create the index writer with the directory and analyzer defined.
Lucene.Net.Index.IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer, false);
//create a document, add in a single field
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
doc.Add(new Lucene.Net.Documents.Field("TimeStamp", DateTime.Now.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED, Lucene.Net.Documents.Field.TermVector.NO));
doc.Add(new Lucene.Net.Documents.Field("IP", request.UserHostAddress.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED, Lucene.Net.Documents.Field.TermVector.NO));
//write the document to the index
indexWriter.AddDocument(doc);
//optimize and close the writer
//indexWriter.Optimize();
indexWriter.Close();
I have started working my way through the second edition of 'Lucene in Action', which uses the 3.0 API. The author creates a basic IndexWriter with the following method:
private IndexWriter getIndexWriter() throws CorruptIndexException, LockObtainFailedException, IOException {
    return new IndexWriter(directory, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
}
In the code below I've made changes to match the current API, except that I cannot figure out how to set the writer's max field length to unlimited, like the constant in the book example. I've just inserted the int 1000 below. Is this unlimited constant gone completely in the current API?
private IndexWriter getIndexWriter() throws CorruptIndexException, LockObtainFailedException, IOException {
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36,
            new LimitTokenCountAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_36), 1000));
    return new IndexWriter(directory, iwc);
}
Thanks, this is just for curiosity.
The IndexWriter javadoc says:
@deprecated use LimitTokenCountAnalyzer instead. Note that the
behavior slightly changed - the analyzer limits the number of
tokens per token stream created, while this setting limits the
total number of tokens to index. This only matters if you index
many multi-valued fields though.
So, in other words, a hard-wired method has been replaced with a nice adapter/delegate pattern.
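The difference the javadoc describes, a limit per token stream versus a limit on the total, only shows up with multi-valued fields. The following plain-Java sketch illustrates it under simplifying assumptions: each field value is treated as one token stream, tokens are split on whitespace, and the method names are made up for this example rather than taken from Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration of "limit per token stream" vs "limit on the total" (not Lucene code).
class TokenLimitSketch {

    /** Keeps at most max tokens from EACH value (one value = one token stream). */
    static List<String> limitPerStream(List<String> values, int max) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            int count = 0;
            for (String token : value.trim().split("\\s+")) {
                if (count >= max) {
                    break; // this stream is exhausted, the next value starts fresh
                }
                kept.add(token);
                count++;
            }
        }
        return kept;
    }

    /** Keeps at most max tokens in TOTAL across all values of the field. */
    static List<String> limitTotal(List<String> values, int max) {
        List<String> kept = new ArrayList<>();
        for (String value : values) {
            for (String token : value.trim().split("\\s+")) {
                if (kept.size() >= max) {
                    return kept; // the whole field is cut off here
                }
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> multiValued = Arrays.asList("a b c", "d e f");
        System.out.println(limitPerStream(multiValued, 2)); // prints [a, b, d, e]
        System.out.println(limitTotal(multiValued, 2));     // prints [a, b]
    }
}
```

In this sketch, passing Integer.MAX_VALUE as max makes either limit practically unreachable.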
When should I use Lucene's RAMDirectory? What are its advantages over other storage mechanisms? Finally, where can I find a simple code example?
Use it when you don't want to store your index data permanently. I use it for testing: add data to a RAMDirectory and run your unit tests against it.
For example:
public static void main(String[] args) {
    try {
        Directory directory = new RAMDirectory();
        Analyzer analyzer = new SimpleAnalyzer();
        IndexWriter writer = new IndexWriter(directory, analyzer, true);
        // ... add documents here ...
        writer.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
OR
public void testRAMDirectory() throws IOException {
    Directory dir = FSDirectory.getDirectory(indexDir);
    MockRAMDirectory ramDir = new MockRAMDirectory(dir);
    // close the underlying directory
    dir.close();
    // check size
    assertEquals(ramDir.sizeInBytes(), ramDir.getRecomputedSizeInBytes());
    // open reader to test document count
    IndexReader reader = IndexReader.open(ramDir);
    assertEquals(docsToAdd, reader.numDocs());
    // open searcher to check if all docs are there
    IndexSearcher searcher = new IndexSearcher(reader);
    // search for all documents
    for (int i = 0; i < docsToAdd; i++) {
        Document doc = searcher.doc(i);
        assertTrue(doc.getField("content") != null);
    }
    // cleanup
    reader.close();
    searcher.close();
}
Usually, if things work with RAMDirectory, they will work with the other Directory implementations too, i.e. the ones that store your index permanently.
The alternative is FSDirectory. In that case you have to take care of filesystem permissions (which do not apply to RAMDirectory).
Functionally, RAMDirectory has no distinct advantage over FSDirectory, other than being noticeably faster. They serve two different needs.
RAMDirectory -> Primary memory
FSDirectory -> Secondary memory
This is much like the difference between RAM and a hard disk.
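The point that code tested against an in-memory directory will usually work against a disk-backed one follows from both sitting behind the same abstraction. Here is a minimal sketch of that idea in plain Java. The Store interface and both implementations are hypothetical and far simpler than Lucene's Directory classes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical storage abstraction with a RAM-backed and a disk-backed variant.
class StoreSketch {

    interface Store {
        void write(String name, byte[] data) throws IOException;
        byte[] read(String name) throws IOException;
    }

    /** Keeps everything in a map: fast, but gone when the process ends. */
    static final class RamStore implements Store {
        private final Map<String, byte[]> files = new HashMap<>();

        public void write(String name, byte[] data) {
            files.put(name, data.clone());
        }

        public byte[] read(String name) throws IOException {
            byte[] data = files.get(name);
            if (data == null) {
                throw new IOException("no such file: " + name);
            }
            return data.clone();
        }
    }

    /** Writes files under a directory: persistent, subject to file permissions. */
    static final class FsStore implements Store {
        private final Path dir;

        FsStore(Path dir) {
            this.dir = dir;
        }

        public void write(String name, byte[] data) throws IOException {
            Files.write(dir.resolve(name), data);
        }

        public byte[] read(String name) throws IOException {
            return Files.readAllBytes(dir.resolve(name));
        }
    }

    /** The same calling code runs unchanged against either store. */
    static String roundTrip(Store store, String text) throws IOException {
        store.write("segment", text.getBytes(StandardCharsets.UTF_8));
        return new String(store.read("segment"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(new RamStore(), "hello"));
        System.out.println(roundTrip(new FsStore(Files.createTempDirectory("idx")), "hello"));
    }
}
```

A unit test can call roundTrip with the RAM-backed store, while production code passes the disk-backed one.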
I am not sure what happens to a RAMDirectory if it exceeds the memory limit. I'd expect an OutOfMemoryException (which derives from System.SystemException) to be thrown.