Sitecore: Programmatically trigger reindexing of related content - lucene

In my Sitecore instance, I have content based on two templates, Product and Product Category. Each Product has a multilist field that links to Product Categories as lookups. The Products also have a computed index field that precomputes some data based on the selected Product Categories. So when a user changes a Product, Sitecore's indexing strategy indexes the Product with the computed field.
My issue is that when a user changes the data in a Product Category, I want Sitecore to reindex all of the related Products. I'm not sure how to do this. I don't see any hook where I could detect that a Product Category is being indexed, so that I could programmatically trigger reindexing of the Products.

You could achieve this using the indexing.getDependencies pipeline. Add a processor to it: your custom class should derive from Sitecore.ContentSearch.Pipelines.GetDependencies.BaseProcessor and override Process(GetDependenciesArgs context).
In this method you can inspect the IndexedItem and, based on that information, add other items to the Dependencies collection. These dependencies will be indexed as well. The benefit of this approach is that the dependent items are indexed in the same job, instead of new jobs being spawned to update them.
Just be aware of the performance hit this can cause if written badly, as this code is called for every item on every index. Get out of the method as soon as you can if it is not applicable.
Some known issues with this pipeline are documented in the Sitecore Knowledge Base.
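A minimal sketch of such a processor might look like the following. The template names ("Product", "Product Category") are assumptions based on the question, and the early exit keeps the per-item cost low:

```csharp
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Pipelines.GetDependencies;
using Sitecore.Data.Items;
using Sitecore.Links;

namespace MyProject.ContentSearch.Pipelines
{
    public class AddProductDependencies : BaseProcessor
    {
        public override void Process(GetDependenciesArgs context)
        {
            // Bail out as early as possible: this runs for every item on every index.
            Item item = context.IndexedItem as SitecoreIndexableItem;
            if (item == null || item.TemplateName != "Product Category")
                return;

            // Add every Product that references this category as a dependency,
            // so it is reindexed in the same job.
            foreach (ItemLink link in Sitecore.Globals.LinkDatabase.GetReferrers(item))
            {
                Item source = link.GetSourceItem();
                if (source != null && source.TemplateName == "Product")
                    context.Dependencies.Add(new SitecoreItemUniqueId(source.Uri));
            }
        }
    }
}
```

The processor would then be patched into the indexing.getDependencies pipeline via a config include file.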

One way would be to add a custom OnItemSave handler which checks whether the changed item is based on the Product Category template, and programmatically triggers the index update.
To reindex only the changed items, you can pick up the related Products and register them in the HistoryEngine by using the HistoryEngine.RegisterItemSaved method:
Sitecore.Context.Database.Engines.HistoryEngine.RegisterItemSaved(myItem, new ItemChanges(myItem));
Some useful instructions how to create OnItemSave handler can be found here: https://naveedahmad.co.uk/2011/11/02/sitecore-extending-onitemsave-handler/
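Wired up to the item:saved event, such a handler could be sketched as follows. The template names are assumptions from the question, and the event wiring itself goes in a config patch:

```csharp
using System;
using Sitecore.Data.Items;
using Sitecore.Events;
using Sitecore.Links;

namespace MyProject.Events
{
    // Registered via <event name="item:saved"> in a config patch.
    public class ProductCategorySaveHandler
    {
        public void OnItemSaved(object sender, EventArgs args)
        {
            Item item = Event.ExtractParameter(args, 0) as Item;
            if (item == null || item.TemplateName != "Product Category")
                return;

            // Register each related Product so the incremental
            // index update strategy picks it up.
            foreach (ItemLink link in Sitecore.Globals.LinkDatabase.GetReferrers(item))
            {
                Item product = link.GetSourceItem();
                if (product != null && product.TemplateName == "Product")
                    product.Database.Engines.HistoryEngine.RegisterItemSaved(
                        product, new ItemChanges(product));
            }
        }
    }
}
```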

You could add a new index update strategy or change one of the existing ones (see the configuration\sitecore\contentSearch\indexConfigurations\indexUpdateStrategies configuration node).
As an example you could take
Sitecore.ContentSearch.Maintenance.Strategies.SynchronousStrategy
The one thing you need to change is the
public void Run(EventArgs args, bool rebuildDescendants)
method. args contains the changed item reference. All you need to do is trigger an index update for the related items.
Once you have the custom update strategy, add it to your index's strategies configuration node.
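The relevant addition inside a copy of SynchronousStrategy could look roughly like this. This is a sketch only: the resolution of the changed item from args follows the original strategy's code (represented here by an illustrative helper), and the template names are assumptions:

```csharp
public void Run(EventArgs args, bool rebuildDescendants)
{
    // ... original SynchronousStrategy logic that resolves the changed
    // item from args and updates it in the index ...
    Item changed = GetItemFromEventArgs(args); // helper name illustrative

    if (changed != null && changed.TemplateName == "Product Category")
    {
        // Additionally queue an index update for every referring Product.
        foreach (ItemLink link in Sitecore.Globals.LinkDatabase.GetReferrers(changed))
        {
            Item product = link.GetSourceItem();
            if (product != null && product.TemplateName == "Product")
                IndexCustodian.UpdateItem(this.index,
                    new SitecoreItemUniqueId(product.Uri));
        }
    }
}
```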

Related

How to manage additional processed data in MarkLogic

MarkLogic 9.0.8.2
We have around 20M records in MarkLogic.
For one business requirement, we need to generate additional data for each XML document, which users will then search.
Since we can't change the original documents, we need input on the best way to manage this additional data. These are the options we have thought of:
1) Create a separate collection and store the additional data in a separate XML document with the same unique number as the original XML. When a user searches, search in this collection, then retrieve the original documents and send the response back.
2) Store the additional data in the original document's properties.
We also need to create element range indexes to make sure this works when the end user searches with range operators.
<abc>
  <xyz>
    <quan>qty1</quan>
    <value1>1.01325E+05</value1>
    <unit>Pa</unit>
  </xyz>
  <xyz>
    <quan>qty2</quan>
    <value1>9.73E+02</value1>
    <value2>1.373E+03</value2>
    <unit>K</unit>
  </xyz>
  <xyz>
    <quan>qty3</quan>
    <value1>1.8E+03</value1>
    <unit>s</unit>
  </xyz>
  <xyz>
    <quan>qty4</quan>
    <value1>3.6E+03</value1>
    <unit>s</unit>
  </xyz>
</abc>
We need to process data from the value1 element. A user will then search for something like:
qty1 >= minvalue AND qty1 <= maxvalue
qty2 >= minvalue AND qty2 <= maxvalue
qty3 >= minvalue AND qty3 <= maxvalue
So when a user searches on qty1, only the elements whose quan is qty1 should match, and so on.
So I would like to know:
1) What is the best approach to store data like this?
2) What kind of index should I create to implement this?
I would recommend wrapping the original data in an envelope, which allows adding extra data in the header. It also allows creating a canonical view of the relevant pieces of the data: either store that view as the instance and the original as an 'attachment' (a sub-property, not an attached binary), or keep the instance as-is and put the canonical values for indexing in the header.
There is a lengthy blog article about the topic that discusses the pros and cons in detail: https://www.marklogic.com/blog/envelope-design-pattern/
HTH!
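As a sketch, an envelope for the sample above could hoist canonical per-quantity values into the header (the element and namespace names here are illustrative, not required by MarkLogic):

```xml
<env:envelope xmlns:env="http://example.com/envelope">
  <env:headers>
    <canonical>
      <q name="qty1" unit="Pa" min="1.01325E+05" max="1.01325E+05"/>
      <q name="qty2" unit="K"  min="9.73E+02"    max="1.373E+03"/>
      <q name="qty3" unit="s"  min="1.8E+03"     max="1.8E+03"/>
    </canonical>
  </env:headers>
  <env:instance>
    <!-- the original <abc> document, stored unchanged -->
  </env:instance>
</env:envelope>
```

With this shape you could define path range indexes (of a numeric scalar type) on expressions such as /env:envelope/env:headers/canonical/q[@name='qty1']/@min and query them with cts:path-range-query, so a search on qty1 only ever touches qty1 values.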
Grtjn's answer would be the recommended solution, as it is more performant to keep all the information inside the document itself, versus having to query across both the document and its properties, but it does require changes to the documents.
Options 1 and 2 could both work.
Properties documents already exist, so that option doesn't add fragments, but the properties must conform to the properties schema.
Creating a sidecar document provides more flexibility, but because you are creating new documents, it will increase the number of fragments.

Search Items without presentation details in sitecore

Improve search performance.
We are currently on Sitecore 8.1.3 in production and use Lucene search. We will be moving to Solr or Coveo in the near future. That said, we are trying to improve the search functionality on our site.
In the current scenario, when a user searches on our site, Lucene provides appropriate results from Sitecore content items. The results are a list of items, some of which have presentation details while others don't (the latter being datasource items, or items pulled in via multilist fields). We display the results that have presentation details directly to the user. The datasource items have no presentation details attached, so for those we instead display the items that reference them: as datasource items in presentation details, via a Sitecore link, or through a multilist field.
We are using the Globals.LinkDatabase.GetItemReferrers(item, false) method to fetch the items that refer to a result item. We know this is a heavy method, so to improve performance we filter the items it returns: we select only the latest version of each item, only items that have presentation details, and only items in the same language as the context language. If the current item doesn't have presentation details, we search for its related items with presentation details by calling the same function recursively. This logic improves performance to some extent and yields the required results.
However, performance degrades when the number of search results is high. Say I search for a term for which Lucene returns 10 items; our custom code may then yield around 100 related items (since the datasource items of the result items can be reused across different pages). When Lucene returns a large result set, say 500 items, we end up running this code recursively on all 500 items and their related items. For better performance we have tried using LINQ queries instead of foreach iterations wherever possible. The code works fine and returns appropriate results, but the search slows down when the result count is high. I want to know whether there are any other areas where we can improve performance.
The best way to improve the performance is to have a custom index that has the results you want to search and does not contain items that you do not want to return. In this way, your filtering is 'pre-done' during indexing.
The common way of doing this is to use a computed field that contains all the 'text' of the page (collating together content from its datasources) so that the page's full contents are in a single field in the index. This way, even if the text match would have been on a datasource, the page still comes back as a valid search result.
There is a blog from Kam Figy on this topic: https://kamsar.net/index.php/2014/05/indexing-subcontent/
Note that in addition to the computed field, you will also need to patch in the field to the index using a Sitecore config patch file. Kam's blog shows an example of that as well.
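A minimal sketch of such a computed field is shown below. This is simplified for illustration (field filtering and device handling are naive; see Kam's post for a production-ready version), and the class name is my own:

```csharp
using System.Text;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Layouts;

namespace MyProject.ContentSearch.ComputedFields
{
    public class AggregatedContentField : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            Item page = indexable as SitecoreIndexableItem;
            if (page == null || page.Visualization.Layout == null)
                return null;

            var text = new StringBuilder();
            // Collect text from every datasource referenced by the page's renderings.
            foreach (DeviceItem device in page.Database.Resources.Devices.GetAll())
            {
                foreach (RenderingReference reference in page.Visualization.GetRenderings(device, false))
                {
                    Item datasource = page.Database.GetItem(reference.Settings.DataSource);
                    if (datasource == null)
                        continue;

                    datasource.Fields.ReadAll();
                    foreach (Field field in datasource.Fields)
                        if (!field.Name.StartsWith("__")) // skip system fields
                            text.AppendLine(field.Value);
                }
            }
            return text.ToString();
        }
    }
}
```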
You need to index this data together to begin with, rather than trying to piece it together at runtime. You should also try to keep your indexes lean or use queries to restrict the results that are returned to only provide the relevant results.
I agree with the answer from Jason that a separate index is one of the best solutions, combined with a computed field that includes content from all referenced datasources.
Further, I would create a custom crawler which excludes items without any layout from the index. For an index which is only used to provide results for site search, you only care about items with layout, since only those have a navigable URL.
namespace MyProject.CMS.Custom.ContentSearch.Crawlers
{
    public class CustomItemCrawler : Sitecore.ContentSearch.SitecoreItemCrawler
    {
        protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
        {
            bool isExcluded = base.IsExcludedFromIndex(indexable, checkLocation);
            if (isExcluded)
                return true;

            // Exclude any item that does not have a layout assigned
            Item obj = (Item)indexable;
            return obj.Visualization == null || obj.Visualization.Layout == null;
        }

        protected override bool IndexUpdateNeedDelete(SitecoreIndexableItem indexable)
        {
            // Remove the item from the index if it no longer has a layout
            Item obj = indexable;
            return obj.Visualization == null || obj.Visualization.Layout == null;
        }
    }
}
If for some reason you do not wish to create a separate index, or you only want to keep a single index (because you are using the Content Search API and require a full index for your component queries, or simply to minimise indexing time across multiple indexes), then I would consider creating a custom computed field in the index which stores a true/false value. The logic is the same as above. You can then filter your search to only return results which have layout.
The combination of including/combining the content of the datasource items during indexing and only returning items with layout should result in much better performance of your search queries.
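The true/false computed field could be sketched like this (the field and class names are my own):

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

namespace MyProject.ContentSearch.ComputedFields
{
    public class HasLayout : IComputedIndexField
    {
        public string FieldName { get; set; }
        public string ReturnType { get; set; }

        public object ComputeFieldValue(IIndexable indexable)
        {
            // True only for items with a layout assigned, i.e. navigable pages.
            Item item = indexable as SitecoreIndexableItem;
            return item != null && item.Visualization.Layout != null;
        }
    }
}
```

After patching the field into the index configuration, the site search query can filter on it, e.g. via a result class property mapped with [IndexField("has_layout")].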

Keeping dValIds For auto-generated dimensions consistent

I am working with Endeca 6.4.1 and have many auto-generated dimensions in my pipeline (mapped using Developer Studio); the application's indexing is CAS-less, so only FCM is creating dimensions and assigning dVal IDs. I am using Endeca SEO, so the dVal ID is reflected directly in my URL, and if an auto-generated dimension value's ID changes, any link to that navigation state is lost.
I have a flat file as the dimension's source, for example
product.feature|neon finish
What I want is that, if the value some day changes to "Neon-finish" or "Neon color", the dVal ID that was assigned to "neon finish" is transferred to the new value. I can keep a custom mapping to track that "neon finish" has been changed to a new value.
Is there any way to achieve this, maybe by using some manipulators?
Please share your thoughts.
There are two basic ways to do this:
1) Update the state files when you change a dimension value (APPDIR/data/state/autogen_dimensions.xml). This would most likely be a manual process.
2) A more robust but complex solution is to change the dimension values to be some ID number and use a synonym for the display name. Then the display name can change without a change to the id number. This may require some serious changes to your pipeline.
Good luck

Adding lookups to given view

I'm just getting into Yii. I have a simple relational DB with a "client" table related to an "orders" table (client.id to orders.client_id). If I build my CRUD for orders I naturally see client_id; however, I'd rather show a lookup of the client name somehow.
I have found how to do this on the new and update _forms by adding a dropDownList. The list-based views seem a little more complex. I can see that actionIndex() in the controller gathers data and passes it to index.php, and finally through to _view, but I can't find any help on where and how I should break into this with my lookup returning the client name.
I'd appreciate any help.
Thanks.
Check the Yii documentation about relations. You will need to create a relation in your Order model; let's call it client. Then in your list view, wherever it outputs client_id, you can use client.name instead. You will also need to make sure you have the appropriate label: the generated model only has a label for client_id, so you need one for client.name. Yii will guess it, or you can add it, or you can modify the label for client_id and, instead of using client.name in the view, use
array(
    'name'=>'client_id',
    'value'=>$model->client->name,
)
I tend to gravitate towards more explicit definitions rather than shortcuts, but to each his own.
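For reference, a minimal Yii 1.x sketch of the relation and label described above (model and attribute names are taken from the question):

```php
<?php
// Sketch for Yii 1.x; assumes a Client model generated for the client table.
class Order extends CActiveRecord
{
    public function relations()
    {
        return array(
            // BELONGS_TO: orders.client_id points at the client table's PK.
            'client' => array(self::BELONGS_TO, 'Client', 'client_id'),
        );
    }

    public function attributeLabels()
    {
        return array(
            'client_id'   => 'Client',
            'client.name' => 'Client', // label used when the view shows client.name
        );
    }
}
```

In the grid or list view generated by Gii, you can then replace the 'client_id' column with 'client.name', or use the array('name'=>..., 'value'=>...) form shown above.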

Magento API: Rebuild Indexes after adding new products

I am currently writing a script that lets me import multiple products in magento.
$product = Mage::getModel('catalog/product');
$product->setSku($data['sku']);
//etc etc
$product->save();
The product gets created perfectly but it won't show up in my frontend until I either save it in the backend (without changing anything!) OR I rebuild the indexes in the backend.
I did a diff on the relevant database tables to see what changes when I save the product, and added those fields to my import script, but it did not have any effect. The imported product must be OK, since it shows up when I rebuild the indexes via the backend manually.
Caching is completely disabled.
Now my question is: How can I rebuild the indexes after importing my products?
You can use the indexer model from the Index module:
$processes = Mage::getSingleton('index/indexer')->getProcessesCollection();
$processes->walk('reindexAll');
Since you need to rebuild all the indexes, no filters are applied to the collection. But you can filter the index process list by a set of parameters (code, last time reindexed, etc.) via the addFieldToFilter($field, $condition) method.
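For example, to reindex only a single process, you can filter the collection by its indexer code ('catalog_url' here is one of the standard Magento 1 indexer codes):

```php
<?php
// Reindex only the catalog URL rewrites process.
$processes = Mage::getSingleton('index/indexer')->getProcessesCollection();
$processes->addFieldToFilter('indexer_code', 'catalog_url');
$processes->walk('reindexAll');
```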
Small suggestion
It would be a good idea to set the indexes to manual mode while you import the products; this will speed up the import, because some of the indexers observe the product save event, which takes extra time on every save. You can do it in the following way:
$processes = Mage::getSingleton('index/indexer')->getProcessesCollection();
$processes->walk('setMode', array(Mage_Index_Model_Process::MODE_MANUAL));
$processes->walk('save');
// Here goes your
// Importing process
// ................
$processes->walk('reindexAll');
$processes->walk('setMode', array(Mage_Index_Model_Process::MODE_REAL_TIME));
$processes->walk('save');
There are at least two settings that can prevent the indexer from reindexing a product on save.
One: the "Manual Update" setting in the index properties, found under System > Index Management. You should set it to "Update on Save" if you want a product to be indexed upon save.
Two: the setIsMassupdate product flag, which is used, for example, in Dataflow batch import procedures to prevent the indexer from being triggered on each call to the product's save method.
Hope this helps.
Regards, Alessandro