Lucene - Lock obtain timed out: SimpleFSLock on fast requests

I have an HttpModule that logs every visit to the site into a Lucene index.
The site is hosted on GoDaddy, and even though there is almost nothing on the page I run the tests on (about 3 KB including CSS), it works slowly.
If I refresh a few times, after the second or third refresh I get a "Lock obtain timed out: SimpleFSLock" error.
My question is: am I doing something wrong, or is this normal behavior?
Is there any way to overcome this problem?
My code:
//state the file location of the index
string indexFileLocation = System.IO.Path.Combine(HttpContext.Current.ApplicationInstance.Server.MapPath("~/App_Data"), "Analytics");
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, false);
//create an analyzer to process the text
Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
//create the index writer with the directory and analyzer defined.
Lucene.Net.Index.IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer, false);
//create a document, add in a single field
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
doc.Add(new Lucene.Net.Documents.Field("TimeStamp", DateTime.Now.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED, Lucene.Net.Documents.Field.TermVector.NO));
doc.Add(new Lucene.Net.Documents.Field("IP", request.UserHostAddress.ToString(), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.NOT_ANALYZED, Lucene.Net.Documents.Field.TermVector.NO));
//write the document to the index
indexWriter.AddDocument(doc);
//optimize and close the writer
//indexWriter.Optimize();
indexWriter.Close();
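Since the module opens a new IndexWriter on every request, two overlapping requests will contend for the same index lock, which is one plausible cause of the timeout. A common remedy is to keep a single writer per process, or at least serialize access to the index through one process-wide lock. Below is a minimal, stdlib-only sketch of that serialization pattern (written in Java for brevity; the Lucene write is replaced by a placeholder, and the same shape works in C# with `lock` on a `static readonly object`):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SerializedWrites {
    // One process-wide monitor guards the index, mirroring a single shared IndexWriter.
    private static final Object INDEX_LOCK = new Object();
    private static final AtomicInteger active = new AtomicInteger();
    static volatile int maxConcurrent = 0;

    // Stand-in for "open writer, add document, close writer".
    static void logVisit(String ip) throws InterruptedException {
        synchronized (INDEX_LOCK) {
            int now = active.incrementAndGet();
            maxConcurrent = Math.max(maxConcurrent, now);
            Thread.sleep(5); // simulate the slow index write
            active.decrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                try { logVisit("127.0.0.1"); } catch (InterruptedException ignored) { }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // With the shared monitor, no two "writers" are ever open at once.
        System.out.println("max concurrent writers: " + maxConcurrent);
    }
}
```

Because every request funnels through the same monitor, only one "writer" is active at a time, so the lock timeout cannot occur between requests of the same process.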

Related

how to use lucene-gosen analyser with lucene.net?

Please guide me on how to use the Japanese analyzer (lucene-gosen) with Lucene.Net, and also suggest a good analyzer for Lucene.Net that supports Japanese.
The lucene-gosen analyzer does not appear to have been ported to Lucene.Net. You can make a request on its GitHub page, or you could help out by porting it and submitting a pull request.
Once that analyzer exists, you can follow the article here; using its basic code, just change the analyzer:
string strIndexDir = @"D:\Index";
Lucene.Net.Store.Directory indexDir = Lucene.Net.Store.FSDirectory.Open(new System.IO.DirectoryInfo(strIndexDir));
Analyzer std = new JapaneseAnalyzer(Lucene.Net.Util.Version.LUCENE_29); //The Version parameter is used for backward compatibility. Stop words can also be passed to avoid indexing certain words.
//Create an IndexWriter object.
IndexWriter idxw = new IndexWriter(indexDir, std, true, IndexWriter.MaxFieldLength.UNLIMITED);
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
Lucene.Net.Documents.Field fldText = new Lucene.Net.Documents.Field("text", System.IO.File.ReadAllText(@"d:\test.txt"), Lucene.Net.Documents.Field.Store.YES, Lucene.Net.Documents.Field.Index.ANALYZED, Lucene.Net.Documents.Field.TermVector.YES);
doc.Add(fldText);
//Write the document to the index.
idxw.AddDocument(doc);
//Optimize and close the writer.
idxw.Optimize();
idxw.Close();
Response.Write("Indexing Done");

Why I can't get the doc that added recently by IndexWriter in the search result in Lucene 4.0?

As the title says, I have encountered a puzzling problem.
I built an index for my test program, then I used IndexWriter to add a document to the index. The code is:
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc1 = new Document();
doc1.add(new Field("name", "张三", Field.Store.YES, Field.Index.ANALYZED));
doc1.add(new IntField("year", 2013, Field.Store.YES));
doc1.add(new TextField("content", "123456789", Field.Store.YES));
iwriter.addDocument(doc1);
iwriter.commit();
iwriter.close();
When I try to search this index, I can't get this doc. I do get a correct result count (it is one more than before), but when I try to print doc.get("name"), the output is wrong.
The code in search part is:
DirectoryReader ireader = DirectoryReader.open(directory);
System.out.println(ireader.numDeletedDocs());
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "name", analyzer);
Query query = parser.parse("张");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
System.out.println(hits.length);
In results, there is a "Name: 李四".
I'm sure that I use the StandardAnalyzer during both indexing and searching, and StandardAnalyzer treats each Chinese character as a single token. Why, when I search for "张", do I get "李四"? Is there anything wrong with how I add a doc, or is the docid mismatched?
Did you (re)open the index after adding the doc? Lucene searches only return the documents that existed as of the time the index was opened for searching.
[edit...]
Use DirectoryReader.open() again, or DirectoryReader.openIfChanged(), to reopen the index. openIfChanged() has the advantage that it returns null if you can still use the old IndexReader instance (because the index has not changed).
(An already-open reader is a point-in-time snapshot of the index; it will not see documents committed after it was opened until you reopen it.)
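The point-in-time behavior can be illustrated without Lucene at all. In this stdlib-only analogy (not Lucene's actual implementation), a "reader" is an immutable snapshot taken at open time, and a "reopen" is needed to see later commits:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SnapshotStore {
    private final List<String> committed = new ArrayList<>();

    // "Commit" a document to the store.
    public synchronized void addAndCommit(String doc) {
        committed.add(doc);
    }

    // "Open a reader": an immutable point-in-time copy of the committed docs.
    public synchronized List<String> openReader() {
        return Collections.unmodifiableList(new ArrayList<>(committed));
    }

    public static void main(String[] args) {
        SnapshotStore store = new SnapshotStore();
        store.addAndCommit("doc1");
        List<String> reader = store.openReader();   // snapshot taken now
        store.addAndCommit("doc2");                 // committed after the open
        System.out.println("old reader sees: " + reader.size());     // still 1
        List<String> reopened = store.openReader(); // reopen to see the new doc
        System.out.println("reopened sees: " + reopened.size());     // now 2
    }
}
```

The old "reader" keeps returning the state from when it was opened, which is exactly why a search after iwriter.commit() needs a fresh (or reopened) DirectoryReader.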

Lucene .NET IndexWriter lock

My question related to the next code snippet:
static void Main(string[] args)
{
Lucene.Net.Store.Directory d = FSDirectory.Open(new DirectoryInfo(/*my index path*/));
IndexWriter writer = new IndexWriter(d, new WhitespaceAnalyzer());
//Exiting without closing the indexd writer...
}
In this test, I opened an IndexWriter without closing it, so even after the test exits, the write.lock file still exists in the index directory. I therefore expected that the next time I open an IndexWriter on that index, a LockObtainFailedException would be thrown.
Can someone please explain why I am wrong? I mean, is the write.lock file only meant to prevent the creation of two IndexWriters in the same process? That doesn't seem like the right answer to me...
It looks like there is a bug with that IndexWriter constructor overload; if you change your code to:
IndexWriter writer = new IndexWriter("Path to index here", new WhitespaceAnalyzer());
You will get the Exception.
The lock file is used to prevent two IndexWriters from being opened on the same index, whether they are in the same process or not. You are right to expect an exception there.
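As background on why a leftover write.lock file does not by itself always block the next writer: Lucene's SimpleFSLock treats the mere existence of the file as the lock (which is what makes stale locks possible), while the NativeFSLock strategy uses an OS-level file lock that the operating system releases automatically when the holding process exits, even though the lock file may remain on disk. A stdlib-only sketch of the native-lock behavior (an analogy, not Lucene code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NativeLockDemo {
    public static void main(String[] args) throws IOException {
        Path lockFile = Files.createTempFile("write", ".lock");
        try (FileChannel first = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock();      // first "writer" acquires the OS lock
            System.out.println("first holds lock: " + (held != null));
            try (FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
                second.tryLock();                 // second "writer" in the same JVM
                System.out.println("second acquired the lock too");
            } catch (OverlappingFileLockException e) {
                System.out.println("second writer blocked while the first is alive");
            }
            held.release();
        }
        // After the holder releases the lock (or its process dies), the lock file
        // may still exist on disk, but the OS lock itself is gone, so a new
        // writer can acquire it without error.
        Files.deleteIfExists(lockFile);
    }
}
```

With this kind of lock, a crashed process leaves only a harmless file behind, which matches the observed behavior in the test above.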

Set Lucene IndexWriter max fields

I started working my way through the second edition of 'Lucene in Action', which uses the 3.0 API. The author creates a basic IndexWriter with the following method:
private IndexWriter getIndexWriter() throws CorruptIndexException, LockObtainFailedException, IOException {
return new IndexWriter(directory, new WhitespaceAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
}
In the code below I've made the changes according to the current API, except that I cannot figure out how to set the writer's max field length to unlimited like the constant in the book example; I've just inserted the int 1000 instead. Is the unlimited constant gone completely from the current API?
private IndexWriter getIndexWriter() throws CorruptIndexException, LockObtainFailedException, IOException {
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_36,
new LimitTokenCountAnalyzer(new WhitespaceAnalyzer(Version.LUCENE_36), 1000));
return new IndexWriter(directory, iwc);
}
Thanks, this is just for curiosity.
The IndexWriter javadoc says:
@deprecated use LimitTokenCountAnalyzer instead. Note that the
behavior slightly changed - the analyzer limits the number of
tokens per token stream created, while this setting limits the
total number of tokens to index. This only matters if you index
many multi-valued fields though.
So, in other words, a hard-wired method has been replaced with a nice adapter/delegate pattern.
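The shape of that adapter is easy to see outside Lucene. Below is a stdlib-only sketch of the pattern (these are not Lucene's classes, just the same delegate idea): a wrapper takes an existing token source and caps how many tokens it yields, the way LimitTokenCountAnalyzer wraps another Analyzer:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// A stand-in for the delegate pattern used by LimitTokenCountAnalyzer:
// wrap an existing token source and cap how many tokens it emits.
public class LimitingTokenSource implements Iterator<String> {
    private final Iterator<String> delegate;
    private final int maxTokens;
    private int emitted = 0;

    public LimitingTokenSource(Iterator<String> delegate, int maxTokens) {
        this.delegate = delegate;
        this.maxTokens = maxTokens;
    }

    @Override
    public boolean hasNext() {
        // Stop either when the cap is reached or the delegate runs dry.
        return emitted < maxTokens && delegate.hasNext();
    }

    @Override
    public String next() {
        emitted++;
        return delegate.next();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("quick", "brown", "fox", "jumps", "over");
        Iterator<String> limited = new LimitingTokenSource(tokens.iterator(), 3);
        while (limited.hasNext()) {
            System.out.println(limited.next()); // emits only the first three tokens
        }
    }
}
```

The wrapped source never knows it is being limited, which is the advantage over the old hard-wired maxFieldLength setting: the cap composes with any underlying analyzer.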

Using RAMDirectory

When should I use Lucene's RAMDirectory? What are its advantages over other storage mechanisms? Finally, where can I find a simple code example?
When you don't want to permanently store your index data. I use it for testing purposes: add data to a RAMDirectory and run your unit tests against it.
e.g.
public static void main(String[] args) throws IOException {
    Directory directory = new RAMDirectory();
    Analyzer analyzer = new SimpleAnalyzer();
    IndexWriter writer = new IndexWriter(directory, analyzer, true);
    // ... add documents, then close the writer
    writer.close();
}
or:
public void testRAMDirectory() throws IOException {
    Directory dir = FSDirectory.getDirectory(indexDir);
    MockRAMDirectory ramDir = new MockRAMDirectory(dir);
    // close the underlying directory
    dir.close();
    // check the size
    assertEquals(ramDir.sizeInBytes(), ramDir.getRecomputedSizeInBytes());
    // open a reader to test the document count
    IndexReader reader = IndexReader.open(ramDir);
    assertEquals(docsToAdd, reader.numDocs());
    // open a searcher to check that all docs are there
    IndexSearcher searcher = new IndexSearcher(reader);
    // fetch all documents
    for (int i = 0; i < docsToAdd; i++) {
        Document doc = searcher.doc(i);
        assertTrue(doc.getField("content") != null);
    }
    // cleanup
    reader.close();
    searcher.close();
}
Usually, if things work with a RAMDirectory, they will work with the other directory implementations too, e.g. when you want to permanently store your index.
The alternative is FSDirectory; with it you also have to take care of filesystem permissions (which is not an issue with RAMDirectory).
Functionally, there is no distinct advantage of RAMDirectory over FSDirectory, other than that RAMDirectory is visibly faster. They serve two different needs:
RAMDirectory -> primary memory
FSDirectory -> secondary memory
pretty much like RAM versus a hard disk.
I am not sure what will happen to a RAMDirectory if it exceeds the memory limit; I'd expect an OutOfMemoryException (which derives from System.SystemException in .NET) to be thrown.