I seem to be having some issues with the IntervalAsynchronousStrategy for updating content items.
Sometimes, the indexes will not be automatically updated with this strategy, and a manual index rebuild is required.
These are the corresponding log file entries:
8404 09:20:24 INFO [Index=artscentre_web_index] IntervalAsynchronousUpdateStrategy executing.
8404 09:20:24 INFO [Index=artscentre_web_index] History engine is empty. Incremental rebuild returns
8032 09:20:21 WARN [Index=artscentre_web_index] IntervalAsynchronousUpdateStrategy triggered but muted. Indexing is paused.
And I see these same entries every time the index update runs, even though content is being edited and published during that period.
I previously swapped from the OnPublishEnd rebuild strategy to the interval strategy because I found that publishing content would not trigger an index update either.
Our environment is a single-instance setup, so one IIS website handles both CM and CD. Therefore I can rule out anything to do with remote events, I think?
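For what it's worth, as far as I understand it the interval strategy's history engine reads from the History table in the source (master) database, so a quick sanity check is to confirm whether edits are being recorded there at all. A rough sketch (run against the master database; exact columns may vary by Sitecore version):

SELECT TOP (20) ItemId, Action, Created
FROM [History]
ORDER BY Created DESC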
Has anyone else had this much trouble getting Sitecore to maintain index updates?
Cheers,
Justin
Using the GraphDB Workbench, I have built a few similarity indexes: the first uses just the default query, and there are two custom ones. They worked fine for the build. Now they have a status of Outdated, but the refresh feature is disabled; the UI will not allow a click and instead presents a tooltip indicating I can't do it. Only the delete feature is allowed. Has anyone determined why this happens and how to fix it? This is the third time it has happened. Yes, I can drop and rebuild them, but I would prefer to find out why it's happening. The logs do not appear to contain anything related to this.
Thanks
The similarity queries needed for an index rebuild are stored in the Workbench settings file at <GraphDB Home>/work/workbench/settings.js. This can happen if you change your GraphDB home. Please also check whether you have any errors on initialization.
I have a rendering component that runs a search using the Lucene index to populate itself.
We have two indexes defined: Master and Web. When in the experience editor it uses the Master index, and the Web index for the actual site.
We've configured the Web index strategy as onPublishEndAsync, and we've configured the Master index strategy as syncMaster, the idea being that CMS users can add/edit Sitecore items that power this component, and see them straight away in the experience editor.
However, it seems that the master index is not being updated as we change data in Sitecore. The experience editor only shows the data once I've manually run an index rebuild.
<strategies hint="list:AddStrategy">
<strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/syncMaster" />
</strategies>
Why doesn't the index update itself upon data changes?
UPDATE
So I've compared the files suggested to a clean install and they are the same.
I should add that I'm not using the standard sitecore_master_index. We have multiple sites running off the same instance of Sitecore, so we have added a config include for websitename_master_index. I have compared its config within the <index> node against sitecore_master_index in Sitecore.ContentSearch.Lucene.Index.Master.config, and the only differences are the crawler's <root> element, which points to the particular site's content node, plus some custom fields we've added. I assume these fields aren't causing a problem, since we can manually rebuild the index fine?
One other interesting thing I found when looking at the showconfig.aspx was this:
<agent type="Sitecore.ContentSearch.Tasks.Optimize" method="Run" interval="12:00:00" patch:source="Sitecore.ContentSearch.config">
<indexes hint="list">
<index>sitecore_master_index</index>
</indexes>
</agent>
I'm not sure if this has any significance, but there was not a matching entry for our custom websitename_master_index?
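I don't think the Optimize agent is related to the update problem, but for completeness, I believe an include patch along these lines would cover the custom index as well (untested sketch; depending on how Sitecore matches the agent element, this either merges into the existing agent's list or registers a second agent):

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <scheduling>
      <agent type="Sitecore.ContentSearch.Tasks.Optimize" method="Run" interval="12:00:00">
        <indexes hint="list">
          <index>websitename_master_index</index>
        </indexes>
      </agent>
    </scheduling>
  </sitecore>
</configuration>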
UPDATE
I've also added debug-level logging to the crawler.
In the crawling.log I only see the following:
14416 08:55:10 INFO [Index=website_master_index] Initializing SitecoreItemCrawler. DB:master / Root:/sitecore/Content/Website/Home
14416 08:55:10 INFO [Index=website_master_index] Initializing SynchronousStrategy.
Upon editing and saving items, there is no further mention of the index in the log, and this is actually true of the standard sitecore_master_index too, which we haven't altered the config for?
To guarantee that Lucene index files are not modified concurrently, Lucene uses a lock file: whichever process is about to write must first create the file.
If one already exists, the writer waits for it to be removed.
If a writer process is terminated unexpectedly, the file never gets removed, hence the index never gets updated.
The solution was to clean up the index folder (remove the stale lock file) manually.
To make a better diagnosis, a memory snapshot of the process would be needed to see what is happening inside (i.e. what each thread is doing).
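For anyone hitting the same thing, here is a minimal sketch of detecting and clearing a stale lock with Lucene.NET's static helpers (assuming Lucene.Net 3.0.3 as shipped with Sitecore 7; the index path is a placeholder, and only do this when you are certain no writer process is still alive):

using System;
using System.IO;
using Lucene.Net.Index;
using Lucene.Net.Store;

class StaleLockCheck
{
    static void Main()
    {
        // Placeholder path - point this at the affected index folder.
        var indexPath = @"C:\inetpub\wwwroot\Data\indexes\websitename_master_index";

        using (var directory = FSDirectory.Open(new DirectoryInfo(indexPath)))
        {
            if (IndexWriter.IsLocked(directory))
            {
                // Remove the stale write.lock so the next writer can open the index.
                IndexWriter.Unlock(directory);
                Console.WriteLine("Released stale lock on " + indexPath);
            }
            else
            {
                Console.WriteLine("No lock present - index is writable.");
            }
        }
    }
}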
I have two full text catalogs, DogNameCatalog and CatNameCatalog. I have two tables, DogName and CatName to which they apply. Each of those tables has a field called Name, and that is the field within the catalogs.
I have disabled all scheduling because I want to rebuild the catalogs myself.
The CatNameCatalog works fine. I run this command:
ALTER FULLTEXT CATALOG CatNameCatalog REBUILD WITH ACCENT_SENSITIVITY = OFF
And when I query it it initially says building, then after a few minutes, the LastPopulated property updates to the current time and it goes to Idle.
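For reference, the population state can be read with FULLTEXTCATALOGPROPERTY; for example (sketch - PopulateCompletionAge is seconds since 1990-01-01, which is also why an unpopulated catalog shows that date):

SELECT
    FULLTEXTCATALOGPROPERTY('CatNameCatalog', 'PopulateStatus') AS PopulateStatus, -- 0 = idle, 1 = full population in progress
    FULLTEXTCATALOGPROPERTY('CatNameCatalog', 'ItemCount') AS ItemCount,
    FULLTEXTCATALOGPROPERTY('CatNameCatalog', 'IndexSize') AS IndexSizeMB,
    DATEADD(SECOND, FULLTEXTCATALOGPROPERTY('CatNameCatalog', 'PopulateCompletionAge'), '1990-01-01') AS LastPopulated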
However, the DogNameCatalog doesn't rebuild when I run the same command:
ALTER FULLTEXT CATALOG DogNameCatalog REBUILD WITH ACCENT_SENSITIVITY = OFF
It runs successfully, but nothing happens: the LastPopulatedDate resets to 1990-01-01, the catalog remains Idle, and in its properties it shows as 0 MB. Rebuilding it via SSMS (right-click, Rebuild) prompts "Do you want to delete and recreate?"; I press Yes, it immediately reports success, but it hasn't done anything - the LastPopulatedDate is still 1990 and the size is still 0 MB.
This issue occurs on both my local dev machine and my test server. However, it has worked a single time on each: I tested it, deployed it, and it ran overnight (updated the database, then rebuilt the indexes). The second time it ran, it hung because it was waiting for the index to rebuild, and the rebuild never happened.
Any ideas at all on how I can debug this?
I've fixed this (typical to find the solution as soon as you post a question).
Turns out the DogNameCatalog was set to Do not track changes when it should have been set to Manual.
Do not track changes means that if you attempt a rebuild, it will do nothing.
Manual is the one you want when doing things manually.
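In T-SQL terms the fix looks roughly like this (sketch; assuming the table sits in the default dbo schema):

-- Switch the full-text index from "Do not track changes" to manual change tracking
ALTER FULLTEXT INDEX ON dbo.DogName SET CHANGE_TRACKING MANUAL;

-- Now the catalog rebuild (or a manual population) actually populates
ALTER FULLTEXT CATALOG DogNameCatalog REBUILD WITH ACCENT_SENSITIVITY = OFF;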
A coworker asked this question, and I wasn't immediately finding a solution, so I'm posting it here. He is programmatically inserting a Sitecore item in the master DB, and then subsequently has to insert another item that has a dependency on the first item being present in the index. He originally was having that second item insert fail every time or two, but has since inserted a manual pause in his code to try to allow the index time to catch up, and it's now failing only about every tenth time. Better, but not perfect.
He is looking for whether there's a Sitecore way to check for if the index has been updated before he proceeds with inserting the dependent item.
I did find this blog post by Alex Shyba (http://sitecoreblog.alexshyba.com/2011/04/search-index-troubleshooting.html), which looks like it might have some applicability, but my coworker is strictly working in the master DB (no publishing involved), and we already have the first several steps in Alex's article implemented in our solution (I didn't go through the whole thing).
If you are dependent on an index add, in the end the only way to ensure the item is in the index is to take the action following the asynchronous index update. And in Sitecore 6, the only way to do that which I am aware of is the database:propertychanged event. Alex Shyba describes this event in another article, with regard to HTML cache clearing.
Your challenge will likely be knowing in the event handler what item was inserted, and what to do with it. You'll need some sort of global data structure to communicate this state information, since the index update runs as an asynch job.
Other options (which may be easier) would be to remove the dependency on the index update (use Sitecore query or fast query), or poll the index until the item is there (which is a bit ugly).
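To make the global-data-structure idea a bit more concrete, here is a rough, untested skeleton (class and queue names are mine; the handler would be registered under <events><event name="database:propertychanged"> in web.config, and in practice you would also filter on which property changed, as Alex's article describes):

using System;
using System.Collections.Generic;
using Sitecore.Data;
using Sitecore.Diagnostics;

public class PendingIndexDependencies
{
    // Shared state between the code that inserts the first item
    // and the handler that fires once the async index job has caught up.
    private static readonly Queue<ID> Pending = new Queue<ID>();
    private static readonly object Sync = new object();

    // Call this right after inserting the first item.
    public static void Register(ID itemId)
    {
        lock (Sync) { Pending.Enqueue(itemId); }
    }

    // Wired to database:propertychanged in web.config.
    public void OnPropertyChanged(object sender, EventArgs args)
    {
        lock (Sync)
        {
            while (Pending.Count > 0)
            {
                ID waitingId = Pending.Dequeue();
                Log.Info("Index caught up; inserting item dependent on " + waitingId, this);
                // Create the dependent item here (or kick off the job that does).
            }
        }
    }
}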
Why not just add the item to the index yourself? That way the UI will be blocked until it's done.
You could do it by hooking into the item:saved event. I'm thinking the event handler would be based on the code from the database crawler.
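A rough skeleton of that approach (names are mine, and the actual index add is left as a placeholder to be filled in from the database crawler code):

using System;
using Sitecore.Data.Items;
using Sitecore.Events;

// Registered in web.config:
//   <event name="item:saved">
//     <handler type="MyNamespace.AddToIndexOnSave, MyAssembly" method="OnItemSaved" />
//   </event>
public class AddToIndexOnSave
{
    public void OnItemSaved(object sender, EventArgs args)
    {
        var item = Event.ExtractParameter(args, 0) as Item;
        if (item == null || item.Database == null || item.Database.Name != "master")
            return;

        // Placeholder: push the item into the index synchronously here,
        // reusing the relevant parts of the database crawler.
    }
}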
Have you thought about queuing the second task as a "timed task", with some wrapper to check the dependency and requeue if necessary? See http://www.sitecore.net/Community/Technical-Blogs/John-West-Sitecore-Blog/Posts/2010/11/All-About-Sitecore-Scheduling-Agents-and-Tasks.aspx.
I just started playing with the Azure Library for Lucene.NET (http://code.msdn.microsoft.com/AzureDirectory). Until now, I was using my own custom code for writing Lucene indexes to the Azure blob. So I was copying the blob to the local storage of the Azure web/worker role and reading/writing docs to the index there, using my own custom locking mechanism to make sure we don't have clashes between reads and writes to the blob. I am hoping the Azure Library will take care of these issues for me.
However, while trying out the test app, I tweaked the code to use the compound-file option, and that created a new file every time I wrote to the index. Now, my question is: if I have to maintain the index - i.e. keep a snapshot of the index file and use it if the main index gets corrupted - how do I go about doing this? Should I keep a backup of all the .cfs files that are created, or is handling only the latest one fine? Are there API calls to clean up the blob to keep only the latest file after each write to the index?
Thanks
Kapil
After I answered this, we ended up changing our search infrastructure and used Windows Azure Drive. We had a worker role which would mount a VHD stored in blob storage and host the Lucene.NET index on it. The code checked that the VHD was mounted first and that the index directory existed. If the worker role fell over, the VHD would automatically dismount after 60 seconds, and a second worker role could pick it up.
We have since changed our infrastructure again and moved to Amazon with a Solr instance for search, but the VHD option worked well during development. It could have worked well in test and production, but requirements meant we needed to move to EC2.
I am using AzureDirectory for full-text indexing on Azure, and I am getting some odd results also... but hopefully this answer will be of some use to you.
Firstly, the compound-file option: from what I am reading and figuring out, the compound file is a single large file with all the index data inside. The alternative is having lots of smaller files (configured using the SetMaxMergeDocs(int) function of IndexWriter) written to storage. The problem with this is that once you get to lots of files (I foolishly set this to about 5000), it takes an age to download the indexes (on the Azure server it takes about a minute; on my dev box... well, it's been running for 20 minutes now and still not finished...).
As for backing up indexes, I have not come up against this yet, but given we have about 5 million records currently, and that will grow, I am wondering about this also. If you are using a single compound file, maybe downloading the files to a worker role, zipping them, and uploading them with today's date would work. If you have a smaller set of documents, you might get away with re-indexing the data if something goes wrong... but again, it depends on the number.
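To illustrate the zip-with-today's-date idea for the local copy of the index, a trivial sketch (plain .NET; ZipFile needs .NET 4.5+, the paths are placeholders, and uploading the archive back to blob storage is left out):

using System;
using System.IO;
using System.IO.Compression;

class IndexBackup
{
    static void Main()
    {
        // Local copy of the Lucene index files (placeholder paths).
        var indexPath = @"C:\LocalStorage\LuceneIndex";
        var backupDir = @"C:\Backups";
        var backupFile = Path.Combine(backupDir,
            "lucene-index-" + DateTime.UtcNow.ToString("yyyy-MM-dd") + ".zip");

        Directory.CreateDirectory(backupDir);
        ZipFile.CreateFromDirectory(indexPath, backupFile);
        Console.WriteLine("Backed up index to " + backupFile);
    }
}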