Sitecore Lucene indexing - FileNotFoundException when using the Advanced Database Crawler

I'm having a problem with Sitecore/Lucene on our Content Management environment; we have two Content Delivery environments where this isn't a problem. I'm using the Advanced Database Crawler to index a number of items of defined templates. The index points to the master database.
The index will remain 'stable' for a few hours or so, and then this error starts appearing in the logs, and also whenever I try to open a Searcher.
ManagedPoolThread #17 16:18:47 ERROR Could not update index entry. Action: 'Saved', Item: '{9D5C2EAC-AAA0-43E1-9F8D-885B16451D1A}'
Exception: System.IO.FileNotFoundException
Message: Could not find file 'C:\website\www\data\indexes\__customSearch\_f7.cfs'.
Source: Lucene.Net
at Lucene.Net.Index.SegmentInfos.FindSegmentsFile.Run()
at Sitecore.Search.Index.CreateReader()
at Sitecore.Search.Index.CreateSearcher(Boolean close)
at Sitecore.Search.IndexSearchContext.Initialize(ILuceneIndex index, Boolean close)
at Sitecore.Search.IndexDeleteContext..ctor(ILuceneIndex index)
at Sitecore.Search.Crawlers.DatabaseCrawler.DeleteItem(Item item)
at Sitecore.Search.Crawlers.DatabaseCrawler.UpdateItem(Item item)
at System.EventHandler.Invoke(Object sender, EventArgs e)
at Sitecore.Data.Managers.IndexingProvider.UpdateItem(HistoryEntry entry, Database database)
at Sitecore.Data.Managers.IndexingProvider.UpdateIndex(HistoryEntry entry, Database database)
From what I've read, this can be due to an update on the index while there is an open reader: when a merge operation happens, the reader still holds a reference to the deleted segment, or something to that effect (I'm not an expert on Lucene).
I have tried a few things with no success, including subclassing the Sitecore.Search.Index class and overriding CreateWriter(bool recreate) to change the merge scheduler/policy and tweak the merge factor. See below.
protected override IndexWriter CreateWriter(bool recreate)
{
    IndexWriter writer = base.CreateWriter(recreate);

    // Make merges less aggressive and run them serially, in the hope that the
    // reader no longer sees segments disappear mid-merge.
    LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
    policy.SetMergeFactor(20);
    policy.SetMaxMergeMB(10);
    writer.SetMergePolicy(policy);
    writer.SetMergeScheduler(new SerialMergeScheduler());

    return writer;
}
When I'm reading the index I call SearchManager.GetIndex(Index).CreateSearchContext().Searcher, and when I'm done getting the documents I need I call .Close(), which I thought would have been sufficient.
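For reference, the read path looks roughly like this (a minimal sketch; the index name, query and field access are placeholders rather than my exact production code):

// Rough shape of the read pattern described above; "customSearch" and the query are placeholders.
Lucene.Net.Search.IndexSearcher searcher =
    Sitecore.Search.SearchManager.GetIndex("customSearch").CreateSearchContext().Searcher;
try
{
    Lucene.Net.Search.Hits hits = searcher.Search(
        new Lucene.Net.Search.TermQuery(new Lucene.Net.Index.Term("_name", "home")));
    for (int i = 0; i < hits.Length(); i++)
    {
        Lucene.Net.Documents.Document doc = hits.Doc(i);
        // ... read the stored fields I need ...
    }
}
finally
{
    // The Close() call mentioned above - it closes the Lucene searcher,
    // though possibly not the IndexSearchContext that created it.
    searcher.Close();
}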
I was thinking I could perhaps try overriding CreateSearcher(bool close) as well, to ensure I'm opening a new reader each time, which I will give a go after this. I don't really know enough about how Sitecore handles Lucene and its readers/writers.
I also tried playing around with the UpdateInterval value in the web.config to see if that would help; alas, it didn't.
I would greatly appreciate it if anyone a) knows of any kind of situation in which this could occur, or b) has any potential advice/solutions, as I'm starting to bang my head against a rather large wall :)
We're running Sitecore 6.5 rev111123 with Lucene 2.3.
Thanks,
James.

It seems like Lucene freaks out when you try to re-index something that is in the process of being indexed already. To verify that, try the following:
Set the updateinterval of your index to a really high value (8 hours); see the config sketch below.
Then, stop the w3wp.exe and delete the index.
After deleting the index try to rebuild the index in Sitecore and wait for this to finish.
Test again and see if this occurs.
If the error no longer occurs, the cause is an updateinterval that was set too low, which causes your index (probably still being constructed) to be overwritten by a new one (which won't be finished either), leaving your segments.gen file with the wrong index information.
This .gen file tells your IndexReader which segments are part of your index, and it is recreated when the index is rebuilt.
That's why I suggest disabling the updates for a long period of time and rebuilding the index manually.
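For reference, in Sitecore 6.x the automatic update interval is normally controlled from web.config; the change suggested above would look something like this (check the exact setting name against your version - 08:00:00 is just the "really high" value used for the test):

<!-- Push automatic index updates out to 8 hours so the manual rebuild cannot be interrupted. -->
<setting name="Indexing.UpdateInterval" value="08:00:00" />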

Related

Infinispan clustered lock performance does not improve with more nodes?

I have a piece of code that is essentially executing the following with Infinispan in embedded mode, using version 13.0.0 of the -core and -clustered-lock modules:
@Inject
lateinit var lockManager: ClusteredLockManager

private fun getLock(lockName: String): ClusteredLock {
    lockManager.defineLock(lockName)
    return lockManager.get(lockName)
}

fun createSession(sessionId: String) {
    tryLockCounter.increment()
    logger.debugf("Trying to start session %s. trying to acquire lock", sessionId)
    Future.fromCompletionStage(getLock(sessionId).lock()).map {
        acquiredLockCounter.increment()
        logger.debugf("Starting session %s. Got lock", sessionId)
    }.onFailure {
        logger.errorf(it, "Failed to start session %s", sessionId)
    }
}
I take this piece of code and deploy it to Kubernetes. I then run it in six pods distributed over six nodes in the same region. The code exposes createSession with random GUIDs through an API. This API is called and creates sessions in chunks of 500, using a k8s service in front of the pods, which means the load gets balanced over the pods. I notice that the time to acquire a lock grows linearly with the number of sessions. In the beginning it's around 10 ms, at about 20,000 sessions it takes about 100 ms, and the trend continues in a stable fashion.
I then take the same code and run it, but this time with twelve pods on twelve nodes. To my surprise, the performance characteristics are almost identical to when I had six pods. I've been digging into the code but still haven't figured out why this is; I'm wondering if there's a good reason why Infinispan doesn't seem to perform better with more nodes here?
For completeness, the configuration of the locks is as follows:
val global = GlobalConfigurationBuilder.defaultClusteredBuilder()
global.addModule(ClusteredLockManagerConfigurationBuilder::class.java)
    .reliability(Reliability.AVAILABLE)
    .numOwner(1)
Looking at the code, the clustered locks use a DIST_SYNC cache, which should spread the load of the cache out across the different nodes.
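For reference, the lock manager itself is obtained from this configuration roughly like so (a sketch that assumes the standard EmbeddedClusteredLockManagerFactory entry point and omits the rest of our bootstrap code):

// Sketch of how the ClusteredLockManager injected above is wired up from this configuration.
val cacheManager = DefaultCacheManager(global.build())
val lockManager: ClusteredLockManager = EmbeddedClusteredLockManagerFactory.from(cacheManager)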
UPDATE:
The two counters in the code above are simply Micrometer counters. It is through them and Prometheus that I can see how the lock creation starts to slow down.
It's correctly observed that there's one lock created per session id; this is by design and what we'd like. Our use case is that we want to ensure that a session is running in at least one place. Without going too deep into detail, this can be achieved by ensuring that we have at least two pods trying to acquire the same lock. The Infinispan library is great in that it tells us directly when the lock holder dies without any additional chattiness between pods, which means we have a "cheap" way of ensuring that execution of the session continues when one pod is removed.
After digging deeper into the code I found the following in CacheNotifierImpl in the core library:
private CompletionStage<Void> doNotifyModified(K key, V value, Metadata metadata, V previousValue,
      Metadata previousMetadata, boolean pre, InvocationContext ctx, FlagAffectedCommand command) {
   if (clusteringDependentLogic.running().commitType(command, ctx, extractSegment(command, key), false).isLocal()
         && (command == null || !command.hasAnyFlag(FlagBitSets.PUT_FOR_STATE_TRANSFER))) {
      EventImpl<K, V> e = EventImpl.createEvent(cache.wired(), CACHE_ENTRY_MODIFIED);
      boolean isLocalNodePrimaryOwner = isLocalNodePrimaryOwner(key);
      Object batchIdentifier = ctx.isInTxScope() ? null : Thread.currentThread();
      try {
         AggregateCompletionStage<Void> aggregateCompletionStage = null;
         for (CacheEntryListenerInvocation<K, V> listener : cacheEntryModifiedListeners) {
            // Need a wrapper per invocation since converter could modify the entry in it
            configureEvent(listener, e, key, value, metadata, pre, ctx, command, previousValue, previousMetadata);
            aggregateCompletionStage = composeStageIfNeeded(aggregateCompletionStage,
                  listener.invoke(new EventWrapper<>(key, e), isLocalNodePrimaryOwner));
         }
The lock library uses a clustered listener on the entry-modified event, and this listener uses a filter to notify only when the key for the lock is modified. It seems to me the core library still has to check this condition on every registered listener, which of course becomes a very long list as the number of sessions grows. I suspect this is the reason, and if it is, it would be really awesome if the core library supported a kind of key filter so that it could use a hashmap for these listeners instead of going through the whole list of listeners.
I believe you are creating a clustered lock per session id. Is this what you need? What is the acquiredLockCounter? We are about to deprecate the "lock" method in favour of "tryLock" with a timeout, since the lock method will block forever if the clustered lock is never acquired. Do you ever unlock the clustered lock in another piece of code? If you could share a complete reproducer of the code, it would be very helpful for us. Thanks!
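A sketch of what the suggested tryLock-with-timeout variant might look like in the Kotlin snippet above (the 5-second timeout is arbitrary, and it assumes ClusteredLock.tryLock(time, unit) completing with a Boolean):

// Sketch only: acquire with a timeout instead of lock(), so a never-released lock cannot block forever.
fun createSessionWithTimeout(sessionId: String) {
    tryLockCounter.increment()
    Future.fromCompletionStage(getLock(sessionId).tryLock(5, TimeUnit.SECONDS)).map { acquired ->
        if (acquired) {
            acquiredLockCounter.increment()
            logger.debugf("Starting session %s. Got lock", sessionId)
        } else {
            logger.debugf("Timed out waiting for lock for session %s", sessionId)
        }
    }.onFailure {
        logger.errorf(it, "Failed to start session %s", sessionId)
    }
}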

Ontotext GraphDB Repository cannot be used for queries

I am getting an error message while trying to run a SPARQL query against a particular repository.
Error:
The currently selected repository cannot be used for queries due to an error:
Page [id=7, ref=1,private=false,deprecated=false] from pso has size of 206 != 820 which is written in the index: PageIndex#244 [OPENED] ref:3 (parent=null freePages=1 privatePages=0 deprecatedPages=0 unusedPages=0)
So I tried to recreate the repository by uploading a new RDF file, but the issue still persists. Any solution? Thanks in advance.
The error indicates an inconsistency between what is written in the index (pso.index) and the actual page (pso). Is there any chance that the binary files were modified, overwritten, or partially merged? Under normal operation, you should never get this error.
The only way to hide this error is to start GraphDB with ./graphdb -Dthrow.exception.on.index.inconsistency=false. I recommend doing this only to dump the repository content into an RDF file, then drop the repository and recreate it.
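If it helps, dumping the data while running in that mode can be done through the RDF4J-style REST endpoint that GraphDB exposes; a hedged example (host, port, repository id and output format here are placeholders):

# Export all statements from the affected repository before dropping and recreating it.
curl -H "Accept: application/trig" \
     "http://localhost:7200/repositories/myrepo/statements" -o dump.trig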

How to corrupt a Raven Index

I am building a script that checks for corrupted indexes and resets them, but I am having trouble producing corrupted indexes locally.
Does anyone know how to force an index corruption for RavenDB?
To cause a corruption you can delete one of the header files (headers.one or headers.two or both) or delete one of the journal files (when the database is offline).
The files are located under the relevant index folder.
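If you want to script that, a minimal sketch (the index folder path is a placeholder, and this should only ever run against an offline test database):

// Hypothetical helper: corrupt a local Raven index by removing one of its header files.
static void CorruptIndex(string indexFolder)
{
    string header = System.IO.Path.Combine(indexFolder, "headers.one");
    if (System.IO.File.Exists(header))
    {
        // Deleting headers.two or one of the journal files would work just as well.
        System.IO.File.Delete(header);
    }
}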
You can simply divide by 0 and you will get index errors.
For example, define an index with:
from order in docs.Orders
select new
{
    order.Company,
    Total = order.Lines.Sum(l => (l.Quantity / 0))
}
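If you prefer to deploy that erroring index from client code rather than through the Studio, a rough sketch using the 4.x client API (the index name and the store variable are assumptions on my part):

// Hypothetical deployment of the divide-by-zero map shown above.
var definition = new IndexDefinition
{
    Name = "Orders/Broken",
    Maps = new HashSet<string>
    {
        @"from order in docs.Orders
          select new
          {
              order.Company,
              Total = order.Lines.Sum(l => (l.Quantity / 0))
          }"
    }
};
store.Maintenance.Send(new PutIndexesOperation(definition));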
Update:
See the Debugging Index Errors article to learn how you can generate:
Index Compilation Errors -and/or-
Index Execution Errors
https://ravendb.net/docs/article-page/4.1/Csharp/indexes/troubleshooting/debugging-index-errors

Exceeded max configured index size while indexing document while running LS Agent

In our project we have a LotusScript agent which is supposed to delete and recreate the full-text index. We haven't figured out yet why updating the index cannot be accomplished automatically (although in our database options this is set to update the index daily). So we decided to do roughly the same thing with our own very simple agent. The agent looks like this and runs nightly:
Option Public
Option Declare

Dim s As NotesSession
Dim ndb As NotesDatabase

Sub Initialize
    Set s = New NotesSession
    Set ndb = s.CurrentDatabase

    Print("BEFORE REMOVING INDEXES")
    Call ndb.RemoveFTIndex()
    Print("INDEXES HAVE BEEN REMOVED SUCCESSFULLY")

    Call ndb.CreateFTIndex(FTINDEX_ALL_BREAKS, True)
    Print("INDEXES HAVE BEEN CREATED SUCCESSFULLY")
End Sub
In most cases it works very well, but sometimes, when somebody creates a document which exceeds 12 MB (we really don't know how this is possible), the agent fails to create the index (and the old index has already been deleted at that point).
Error message is:
31.05.2018 03:01:25 Full Text Error (FTG): Exceeded max configured index
size while indexing document NT000BD992 in database index *path to FT file*.ft
My question is how to avoid this problem. We've already raised the 6 MB limit with the following command: SET CONFIG FTG_INDEX_LIMIT=12582912. Can we raise it even more? And in general, how can we solve the problem? Thanks in advance.
Using FTG_INDEX_LIMIT is an option to avoid this error, yes. But it will impact server performance in two ways: FT index update processes will take more time and more memory.
There's no maximum for this limit (in theory), but since the update processes eat memory from the common heap, raising it can lead to out-of-memory/overheaping errors and a server crash.
You can try to exclude attachments from the index - I don't think anyone can put more than 1 MB of text into one document, but users can attach big text files, and that will produce the error you are writing about.
P.S. And yeah, I agree with Scott - why do you need such an agent anyway? The built-in FT indexing usually works fine.

Compass/Lucene in clustered environment

I get the following error in a clustered environment where one node is indexing the objects and the other node is confused about the segments that are in its cache. The node never recovers by itself, even after a server restart. The node that's indexing might be merging and deleting segments which the other node is not aware of. I did not touch the invalidateCacheInterval setting, and I added the compass.engine.globalCacheIntervalInvalidation property set to 500 ms. It didn't help.
This happens while one node is searching and the other node is indexing.
Can someone help me resolve this issue? Maybe by asking Compass to reload the cache or start from scratch without having to reindex all the objects?
org.compass.core.engine.SearchEngineException: Failed to search with query [+type:...)]; nested exception is org.apache.lucene.store.jdbc.JdbcStoreException: No entry for [_6ge.tis] table index_objects
org.apache.lucene.store.jdbc.JdbcStoreException: No entry for [_6ge.tis] table index_objects
at org.apache.lucene.store.jdbc.index.FetchOnBufferReadJdbcIndexInput$1.execute(FetchOnBufferReadJdbcIndexInput.java:68)
at org.apache.lucene.store.jdbc.support.JdbcTemplate.executeSelect(JdbcTemplate.java:112)
at org.apache.lucene.store.jdbc.index.FetchOnBufferReadJdbcIndexInput.refill(FetchOnBufferReadJdbcIndexInput.java:58)
at org.apache.lucene.store.ConfigurableBufferedIndexInput.readByte(ConfigurableBufferedIndexInput.java:27)
at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:127)
at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:250)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218)
at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:752)
at org.apache.lucene.index.MultiSegmentReader.docFreq(MultiSegmentReader.java:377)
at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:86)
at org.apache.lucene.search.Similarity.idf(Similarity.java:457)
at org.apache.lucene.search.TermQuery$TermWeight.&lt;init&gt;(TermQuery.java:44)
at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.&lt;init&gt;(BooleanQuery.java:185)
at org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:360)
at org.apache.lucene.search.Query.weight(Query.java:95)
at org.apache.lucene.search.Hits.&lt;init&gt;(Hits.java:85)
at org.apache.lucene.search.Searcher.search(Searcher.java:61)
at org.compass.core.lucene.engine.transaction.support.AbstractTransactionProcessor.findByQuery(AbstractTransactionProcessor.java:146)
at org.compass.core.lucene.engine.transaction.support.AbstractSearchTransactionProcessor.performFind(AbstractSearchTransactionProcessor.java:59)
at org.compass.core.lucene.engine.transaction.search.SearchTransactionProcessor.find(SearchTransactionProcessor.java:50)
at org.compass.core.lucene.engine.LuceneSearchEngine.find(LuceneSearchEngine.java:352)
at org.compass.core.lucene.engine.LuceneSearchEngineQuery.hits(LuceneSearchEngineQuery.java:188)
at org.compass.core.impl.DefaultCompassQuery.hits(DefaultCompassQuery.java:199)