Sitecore 8 XP ContentSearch: Exclude path from indexing

I'm having trouble with Sitecore indexing of the general indexes "sitecore_master_index" and "sitecore_web_index": rebuilds take forever because the crawler/indexer checks every item in the database.
I imported thousands of products with a large number of specifications, and now have literally hundreds of thousands of items in the product repository.
If I could exclude that path from indexing, the crawler wouldn't have to check a million items for template exclusion.
FOLLOWUP
I implemented a custom crawler that excludes a list of paths from being indexed:
<index id="sitecore_web_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
<param desc="name">$(id)</param>
<param desc="core">sitecore_web_index</param>
<param desc="rebuildcore">sitecore_web_index_sec</param>
<param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
<configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
<strategies hint="list:AddStrategy">
<strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.Utilities.Crawler.ExcludePathsItemCrawler, Sitecore.ContentSearch.Utilities">
<Database>web</Database>
<Root>/sitecore</Root>
<ExcludeItemsList hint="list">
<ProductRepository>/sitecore/content/Product Repository</ProductRepository>
</ExcludeItemsList>
</crawler>
</locations>
</index>
In addition, I activated SwitchOnRebuildSolrSearchIndex, as it's awesome out-of-the-box functionality. Cheers, SC.
The crawler class itself:
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.Diagnostics;
namespace Sitecore.ContentSearch.Utilities.Crawler
{
public class ExcludePathsItemCrawler : SitecoreItemCrawler
{
private readonly List<string> excludeItemsList = new List<string>();
public List<string> ExcludeItemsList
{
get
{
return excludeItemsList;
}
}
protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
{
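Assert.ArgumentNotNull(indexable, "indexable");
// Sitecore item paths are case-insensitive, so compare ignoring case.
if (ExcludeItemsList.Any(path => indexable.AbsolutePath.StartsWith(path, global::System.StringComparison.OrdinalIgnoreCase)))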
{
return true;
}
return base.IsExcludedFromIndex(indexable, checkLocation);
}
}
}

You can subclass the SitecoreItemCrawler class used by the index you want to change:
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
<Database>master</Database>
<Root>/sitecore</Root>
</crawler>
</locations>
You can then add your own parameters, e.g. ExcludeTree or even a list of ExcludedBranches.
In your subclass, override the method (signature as used in the follow-up code in the question)
protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
and check whether the item is under an excluded node.

When importing large amounts of data, you should temporarily disable indexing; otherwise you'll run into issues with a crawler that can't keep up.
There's a great post on disabling the index while importing data. It's written for Lucene, but I'm sure you can do the same with Solr:
http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Another option could be to store your products in a separate Sitecore database rather than in the master db.
Another post from into the core:
http://intothecore.cassidy.dk/2009/05/working-with-multiple-content-databases.html

Related

Sitecore Pipeline (indexing.filterIndex.inbound) not being called

I am trying to create a Lucene index in Sitecore 8.x of items that are visible to unauthenticated users (extranet\Anonymous). In order to do this I am trying to use the indexing.filterIndex.inbound pipeline.
I have tried writing a custom pipeline that returns false if the item cannot be read as extranet\Anonymous:
public class ApplyInboundIndexAccessFilter : InboundIndexFilterProcessor
{
public override void Process(InboundIndexFilterArgs args)
{
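var item = args.IndexableToIndex as SitecoreIndexableItem;
if (item == null)
{
// The indexable is not a Sitecore item (the cast failed), so there is nothing to check.
return;
}
var anonymousUser = Sitecore.Security.Accounts.User.FromName("extranet\\anonymous", false);
if (!item.Item.Security.CanRead(anonymousUser))
{
args.IsExcluded = true;
}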
}
}
but the processor is never invoked.
I have added my config (I tried placing it before the default processor, after it, and with the default removed):
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
<sitecore>
<pipelines>
<indexing.filterIndex.inbound>
<processor type="MyApplication.Site.Features.ContentSearch.IndexFilters.ApplyInboundIndexAccessFilter, MyApplication.Site">
<includedIndexNames hint="list">
<indexName>siteSearchIndex_web</indexName>
</includedIndexNames>
<excludedIndexNames hint="list">
<indexName>siteSearchIndex_master</indexName>
</excludedIndexNames>
</processor>
</indexing.filterIndex.inbound>
</pipelines>
</sitecore>
</configuration>
Am I right in assuming that this should be called on indexing, if not, when?
Any suggestions would be gratefully received.

How to convert these two log4j lines to log4j2?

How can I write the following for my properties file using log4j2?
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.null=org.apache.log4j.varia.NullAppender
You can implement NullAppender as a plugin.
Plugin implementation is like this:
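package myPlugins;

import java.io.Serializable;
import org.apache.logging.log4j.core.Filter;
import org.apache.logging.log4j.core.Layout;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.appender.AbstractAppender;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.config.plugins.PluginAttribute;
import org.apache.logging.log4j.core.config.plugins.PluginElement;
import org.apache.logging.log4j.core.config.plugins.PluginFactory;

// The plugin name is the element name used in the configuration (<NullAppender .../>).
@Plugin(name = "NullAppender", category = "Core", elementType = "appender", printObject = true)
public class NullAppenderDemo extends AbstractAppender {
    private static final long serialVersionUID = 1L;

    protected NullAppenderDemo(String name, Filter filter, Layout<? extends Serializable> layout, boolean ignoreExceptions) {
        super(name, filter, layout, ignoreExceptions);
    }

    @Override
    public void append(LogEvent event) {
        // Intentionally empty: every event is discarded.
    }

    @PluginFactory
    public static NullAppenderDemo createAppender(
            @PluginAttribute("name") String name,
            @PluginAttribute("ignoreExceptions") boolean ignoreExceptions,
            @PluginElement("Layout") Layout<? extends Serializable> layout,
            @PluginElement("Filters") Filter filter) {
        if (name == null) {
            LOGGER.error("No name provided for NullAppender");
            return null;
        }
        return new NullAppenderDemo(name, filter, layout, ignoreExceptions);
    }
}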
Specify the package of the plugin class in the log4j2 configuration:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration packages="myPlugins">
Use the appenders (I prefer the XML format to properties, but if you prefer properties you can do the mapping according to the manual):
<Appenders>
<NullAppender name="null">
</NullAppender>
<Console name="console">
<PatternLayout>
<pattern>
%d %level{length=2} (%c{1.}.%M:%L) - %m%n
</pattern>
</PatternLayout>
</Console>
</Appenders>
<Loggers>
<root level="info">
<appenderRef ref="console" />
</root>
<logger name="nullAppenderPackage" additivity="false">
<appenderRef ref="null" />
</logger>
</Loggers>
But actually, you can have the same effect with level="off" without NullAppender at all:
<logger name="nullAppenderPackage" level="off">
</logger>
You can find more details here.
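Since the question asks for the properties format, here is a rough sketch of how the two Log4j 1.x lines might map to a log4j2.properties file. It assumes a Log4j 2 release that ships the built-in NullAppender under the plugin name "Null" (worth verifying for your version; otherwise the custom plugin above would be referenced by its own plugin name). The keys "stdout", "discard" and "silenced" are arbitrary identifiers chosen for this example, and the logger name is taken from the XML above.

```properties
# Console appender, replacing org.apache.log4j.ConsoleAppender
appender.stdout.type = Console
appender.stdout.name = stdout
appender.stdout.layout.type = PatternLayout
appender.stdout.layout.pattern = %d %p %c{1.} - %m%n

# Null appender, replacing org.apache.log4j.varia.NullAppender
# (assumption: the built-in "Null" plugin exists in your Log4j 2 version)
appender.discard.type = Null
appender.discard.name = null

rootLogger.level = info
rootLogger.appenderRef.stdout.ref = stdout

# Route one package to the null appender only
logger.silenced.name = nullAppenderPackage
logger.silenced.additivity = false
logger.silenced.appenderRef.discard.ref = null
```

Older Log4j 2 releases may additionally require explicit list entries such as "appenders = stdout, discard" and "loggers = silenced".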
Just curious, but what is the need for a NullAppender when you can just configure any appender to filter out everything?

sitecore search synonyms file location

I've changed my DefaultIndexConfiguration config file to search based on synonyms (http://firebreaksice.com/sitecore-synonym-search-with-lucene/) and it works fine. However, this relies on an XML file on the filesystem:
<param hint="engine" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.XmlSynonymEngine, Sitecore.ContentSearch.LuceneProvider">
<param hint="xmlSynonymFilePath">C:\inetpub\wwwroot\website\Data\synonyms.xml</param>
</param>
What I'd like to do is make this data manageable in the CMS.
Does anyone know how I can set this xmlSynonymFilePath parameter to achieve that? Or am I missing something?
The simplest solution would be to create an item in Sitecore (e.g. /sitecore/system/synonyms) from a template with a single multi-line field called Synonyms, and keep the XML in this field instead of reading it from a file.
Then create your custom implementation of ISynonymEngine like this (this is just the simplest example; it's NOT production-ready code):
public class CustomSynonymEngine : Sitecore.ContentSearch.LuceneProvider.Analyzers.ISynonymEngine
{
private readonly List<ReadOnlyCollection<string>> _synonymGroups = new List<ReadOnlyCollection<string>>();
public CustomSynonymEngine()
{
Database database = Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database ?? Database.GetDatabase("web");
Item item = database.GetItem("/sitecore/system/synonyms"); // or whatever is the path
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(item["synonyms"]);
XmlNodeList xmlNodeList = xmlDocument.SelectNodes("/synonyms/group");
if (xmlNodeList == null)
throw new InvalidOperationException("There are no synonym groups in the file.");
foreach (IEnumerable source in xmlNodeList)
_synonymGroups.Add(
new ReadOnlyCollection<string>(
source.Cast<XmlNode>().Select(synNode => synNode.InnerText.Trim().ToLower()).ToList()));
}
public IEnumerable<string> GetSynonyms(string word)
{
Assert.ArgumentNotNull(word, "word");
foreach (ReadOnlyCollection<string> readOnlyCollection in _synonymGroups)
{
if (readOnlyCollection.Contains(word))
return readOnlyCollection;
}
return null;
}
}
And register your engine in Sitecore configuration instead of default engine:
<analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.PerExecutionContextAnalyzer, Sitecore.ContentSearch.LuceneProvider">
<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.DefaultPerFieldAnalyzer, Sitecore.ContentSearch.LuceneProvider">
<param desc="defaultAnalyzer" type="Sitecore.ContentSearch.LuceneProvider.Analyzers.SynonymAnalyzer, Sitecore.ContentSearch.LuceneProvider">
<param hint="engine" type="My.Assembly.Namespace.CustomSynonymEngine, My.Assembly">
</param>
</param>
</param>
</analyzer>
This is NOT production-ready code: it reads the list of synonyms only once, when the CustomSynonymEngine class is instantiated (I don't know whether Sitecore keeps the instance or creates a new instance multiple times).
You should extend this code to cache the synonyms and clear the cache every time the synonyms list is changed.
You should also think about building a proper synonyms structure in the Sitecore tree instead of a single item with an XML blob, which will be really hard to maintain.

AutoStart/Pre-warm features not working in IIS 7.5 / WCF service

For testing the many headaches of IIS/WCF implementation from scratch, I built the HelloWorld service and client walked through (very nicely) here. I added endpoints for net.tcp, and the service is working properly end-to-end for both bindings under IIS 7.5 (on Windows 7) in its own ApplicationPool called HW.
What I'm trying to get working is the announced AutoStart and Preload (or "pre-warm caching") features. I've followed the instructions laid out here and here (quite similar to one another, but always good to have a second opinion) very closely. Which means I
1) Set the application pool startMode...
<applicationPools>
<!-- ... -->
<add name="HW" managedRuntimeVersion="v4.0" startMode="AlwaysRunning" />
</applicationPools>
2) ...enabled serviceAutoStart and set a pointer to my serviceAutoStartProvider
<site name="HW" id="2">
<application path="/" applicationPool="HW" serviceAutoStartEnabled="true" serviceAutoStartProvider="PreWarmMyCache" />
<!-- ... -->
</site>
3) ...and named said provider, with the GetType().AssemblyQualifiedName of the class listed in its entirety below
<serviceAutoStartProviders>
<add name="PreWarmMyCache" type="MyWCFServices.Preloader, HelloWorldServer, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null" />
</serviceAutoStartProviders>
using System;
namespace MyWCFServices
{
public class Preloader : System.Web.Hosting.IProcessHostPreloadClient
{
public void Preload(string[] parameters)
{
System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\temp\PreloadTest.txt");
sw.WriteLine("Preload executed {0:G}", DateTime.Now);
sw.Close();
}
}
}
Alas, all this manual configuration, plus a couple iisreset calls, and I get nothing. No w3wp.exe process firing up in Task Manager (though I get it if I launch the HelloWorldClient), no text file, and above all, no satisfaction.
There is a frustratingly scant amount of discussion about this feature, either on SO or the wider web, and the few similar questions here got little attention, all of which rings an alarm bell or two. Perhaps needlessly though--any experts out there who have been down this very road a time or two care to chime in? (Happy to offer up the entire solution if you can suggest a good place to host it.)
EDIT: I tried resetting that path in the Preload method to the relative App_Data folder (another SO answer suggested that), didn't matter. Also, I learned the w3wp.exe process fires on a simple browse to the localhost. The process consumes an impressive 17MB of memory to serve up its single tiny OperationContract, while for the price offering zero Preload value. 17MB of ColdDeadCache.
This is a slightly different approach to your problem:
1) Use Windows Server AppFabric for service auto-start
2) Use WCF infrastructure to execute custom startup code
Re 1: The AppFabric AutoStart feature should just work out of the box, provided you're not using MVC's ServiceRoute to register your services; they MUST be specified either in the Web.config's serviceActivations section or as physical *.svc files.
Re 2: To inject custom startup code into the WCF pipeline you could use an attribute like this:
using System;
using System.ServiceModel;
using System.ServiceModel.Description;
namespace WCF.Extensions
{
/// <summary>
/// Specifies a static activation method to be called once the ServiceHost for this service has been opened.
/// </summary>
[AttributeUsage(AttributeTargets.Class, AllowMultiple = true, Inherited = false)]
public class ServiceActivatorAttribute : Attribute, IServiceBehavior
{
/// <summary>
/// Initializes a new instance of the ServiceActivatorAttribute class.
/// </summary>
public ServiceActivatorAttribute(Type activatorType, string methodToCall)
{
if (activatorType == null) throw new ArgumentNullException("activatorType");
if (String.IsNullOrEmpty(methodToCall)) throw new ArgumentNullException("methodToCall");
ActivatorType = activatorType;
MethodToCall = methodToCall;
}
/// <summary>
/// The class containing the activation method.
/// </summary>
public Type ActivatorType { get; private set; }
/// <summary>
/// The name of the activation method. Must be 'public static void' and with no parameters.
/// </summary>
public string MethodToCall { get; private set; }
private System.Reflection.MethodInfo activationMethod;
#region IServiceBehavior
void IServiceBehavior.AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase, System.Collections.ObjectModel.Collection<ServiceEndpoint> endpoints, System.ServiceModel.Channels.BindingParameterCollection bindingParameters)
{
}
void IServiceBehavior.ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
{
serviceHostBase.Opened += (sender, e) =>
{
this.activationMethod.Invoke(null, null);
};
}
void IServiceBehavior.Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
{
// Validation: can get method
var method = ActivatorType.GetMethod(name: MethodToCall,
bindingAttr: System.Reflection.BindingFlags.Static | System.Reflection.BindingFlags.Public,
callConvention: System.Reflection.CallingConventions.Standard,
types: Type.EmptyTypes,
binder: null,
modifiers: null);
if (method == null)
throw new ServiceActivationException("The specified activation method does not exist or does not have a valid signature (must be public static).");
this.activationMethod = method;
}
#endregion
}
}
..which can be used like this:
public static class ServiceActivation
{
public static void OnServiceActivated()
{
// Your startup code here
}
}
[ServiceActivator(typeof(ServiceActivation), "OnServiceActivated")]
public class YourService : IYourServiceContract
{
}
That's the exact approach we've been using for quite a while and on a large number of services. The extra benefit of using a WCF ServiceBehavior for custom startup code (as opposed to relying on the IIS infrastructure) is that it works in any hosting environment (incl. self-hosted) and can be more easily tested.
I know this sounds absurd but I faced the same issue (w3wp.exe not firing automatically after making the config changes) and it was because I hadn't run the text editor in Admin mode when I was editing the applicationHost.config file. Stupid mistake on my part.
In my defense I was using Notepad++ which told me it was saving when it actually wasn't.
I've done the same, and it works.
In the Preload method I have some code copied from a nice white paper available here.
The Preload method looks like this:
public void Preload(string[] parameters)
{
bool isServiceActivated = false;
int attempts = 0;
while (!isServiceActivated && (attempts < 10))
{
Thread.Sleep(1 * 1000);
try
{
string virtualPath = "/Test1/Service1.svc";
ServiceHostingEnvironment.EnsureServiceAvailable(virtualPath);
isServiceActivated = true;
}
catch (Exception exception)
{
attempts++;
//continue on these exceptions, otherwise fail fast
if (exception is EndpointNotFoundException ||
exception is ServiceActivationException ||
exception is ArgumentException)
{
//log
}
else
{
throw;
}
}
}
}
Maybe you are on a 64-bit system? There is a known "feature" in Windows where the save gets redirected to the 32-bit folder, so the changes are never picked up.
(I have converted my comment to an answer as answers might be easier to find)

How to do Excel File upload in struts2?

I am trying to upload Excel files only in a Struts2 web application, using the following code in the JSP page:
<s:label value="File Name : *" />
<s:file name="fileUpload" label="Select a File to upload"/>
<br>
<br>
<s:submit value="Add" name="add" tabindex="8" />
</s:form>
The Response.jsp page displays the file content and its type as follows:
<s:form action="saveBulkStores.action" method="get" >
<h4>
File Name : <s:property value="fileUploadFileName"/>
</h4>
<h4>
Content Type : <s:property value="fileUploadContentType"/>
</h4>
<h4>
File : <s:property value="fileUpload"/>
</h4>
<br>
</s:form>
In Struts.xml:
<action name="bulkStores" class="com.action.FilesUploadAction"
method="loadBulkStoresPage">
<result name="input">/viewfile.jsp</result>
<result name="success">/uploadfile.jsp</result>
</action>
<action name="saveBulkStores" class="com.action.FilesUploadAction"
method="saveBulkStores">
<interceptor-ref name="exception"/>
<interceptor-ref name="i18n"/>
<interceptor-ref name="fileUpload">
<param name="allowedTypes">text/plain</param>
<param name="maximumSize">10240</param>
</interceptor-ref>
<interceptor-ref name="params">
<param name="excludeParams">dojo\..*,^struts\..*</param>
</interceptor-ref>
<interceptor-ref name="validation">
<param name="excludeMethods">input,back,cancel,browse</param>
</interceptor-ref>
<interceptor-ref name="workflow">
<param name="excludeMethods">input,back,cancel,browse</param>
</interceptor-ref>
<result name="input">/uploadfile.jsp</result>
<result name="success">/success.jsp</result>
</action>
In Action Classes:
public String loadBulkStoresPage(){
System.out.println("FILES BULK UPLOADS.........");
return SUCCESS;
}
private File fileUpload;
private String fileUploadContentType;
private String fileUploadFileName;
//Getters and Setters for above Fields.
Displaying filename,content type as follows:
public String saveBulkStores(){
System.out.println("check Bulk upload file");
System.out.println("fileName:"+fileUploadFileName);
System.out.println("content type:"+fileUploadContentType);
System.out.println("fileupload:"+fileUpload);
return SUCCESS;
}
Output:
All of my print statements display null. Can anyone help me fix this issue? Thanks in advance.
I am new to this kind of task.
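One detail worth checking in the configuration above: the fileUpload interceptor only allows text/plain, whereas Excel files are normally sent as application/vnd.ms-excel (.xls) or application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (.xlsx). Both types can be listed, comma-separated, in allowedTypes. As a minimal sketch of an additional check inside the action, assuming only the JDK (the class and method names are hypothetical and not part of Struts):

```java
import java.util.Locale;
import java.util.Set;

final class ExcelUploadCheck {
    // MIME types commonly reported for .xls and .xlsx uploads.
    private static final Set<String> EXCEL_TYPES = Set.of(
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

    /** Returns true when both the reported content type and the file extension look like Excel. */
    static boolean looksLikeExcel(String contentType, String fileName) {
        if (contentType == null || fileName == null) {
            return false;
        }
        String name = fileName.toLowerCase(Locale.ROOT);
        boolean extensionOk = name.endsWith(".xls") || name.endsWith(".xlsx");
        return extensionOk && EXCEL_TYPES.contains(contentType.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(looksLikeExcel("application/vnd.ms-excel", "stores.xls")); // true
        System.out.println(looksLikeExcel("text/plain", "stores.txt"));               // false
    }
}
```

In the action this could be called as looksLikeExcel(fileUploadContentType, fileUploadFileName) before processing the file. The content type is supplied by the browser, so treat it as a hint rather than proof of the file's format.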
FileUtils is not part of the standard JDK; it is a class in the Apache Commons IO library. It contains methods such as
FileUtils.readFileToString(file);
FileUtils.copyDirectoryToDirectory(source, destination);
FileUtils.deleteDirectory(source);
which make copying files from one directory to another much easier and greatly reduce the amount of code you have to write yourself.
For reference, see http://www.koders.com/java/fid002DB6BD79CABA7B6AA0F2669061424E3B9776D3.aspx
<s:file id="myFile" name="myFile"></s:file> is the tag used to upload a file; put it in the JSP. (Do not set disabled="true" on it, because a disabled field is not submitted with the form.)
Then add the following properties to the action class:
private File myFile;
private String myFileContentType;
private String myFileFileName;
/**
* @return the myFile
*/
public File getMyFile() {
return myFile;
}
/**
* @return the myFileContentType
*/
public String getMyFileContentType() {
return myFileContentType;
}
/**
* @return the myFileFileName
*/
public String getMyFileFileName() {
return myFileFileName;
}
/**
* @param myFile the myFile to set
*/
public void setMyFile(File myFile) {
this.myFile = myFile;
}
/**
* @param myFileContentType the myFileContentType to set
*/
public void setMyFileContentType(String myFileContentType) {
this.myFileContentType = myFileContentType;
}
/**
* @param myFileFileName the myFileFileName to set
*/
public void setMyFileFileName(String myFileFileName) {
this.myFileFileName = myFileFileName;
}
and insert this code in the execute method:
String destaddress=getText("ipaddress");
if(myFile!=null){
File destDir= new File(destaddress+myFileFileName);
destDir.createNewFile();
FileUtils.copyFile(myFile, destDir);
}
Add the destination address to the application properties file as shown below:
ipaddress=\\\\192.168.105.68\\Shared2\\Files\\
and add the following interceptors to your struts.xml file:
<interceptor-ref name="fileUpload">
<param name="maximumSize">52428800</param>
</interceptor-ref>
<interceptor-ref name="basicStack"/>
This will solve your problem.
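If adding Commons IO is not an option, the copy step in the execute method above can be approximated with java.nio.file from the JDK. This is only a sketch under assumptions: UploadCopy and copyUpload are hypothetical names, and the target directory stands in for the value read from the "ipaddress" property. It also rejects file names that would resolve outside the target directory, which the original snippet does not do.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class UploadCopy {
    /** Copies the uploaded temp file into targetDir under the client-supplied file name. */
    static Path copyUpload(Path uploaded, Path targetDir, String originalName) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(base);
        Path target = base.resolve(originalName).normalize();
        // Reject empty names and names that would resolve outside the target directory.
        if (target.equals(base) || !target.startsWith(base)) {
            throw new IOException("Rejected file name: " + originalName);
        }
        return Files.copy(uploaded, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the Struts temp file and the configured destination directory.
        Path uploaded = Files.createTempFile("upload", ".tmp");
        Files.writeString(uploaded, "demo");
        Path stored = copyUpload(uploaded, Files.createTempDirectory("stores"), "stores.xls");
        System.out.println("Stored at " + stored);
    }
}
```

In the action, the call would look roughly like copyUpload(myFile.toPath(), Path.of(destaddress), myFileFileName).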