Sitecore document field is null after building the index - indexing

I am facing a problem: after publishing to the web database in Sitecore and then rebuilding sitecore_web_index, the items are indexed, but the value of the document field is null!
I am working on Sitecore 8.1 on Windows 10.
I am using Luke to see what's going on.
Any suggestions on how to solve this problem?

Check the storage type on the index for those fields. By default a lot of the fields in the Lucene indexes are set to storageType="NO" - this will index the field content but not store the data in the index, so the fields will always appear empty in the results.
Example config from Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config:
<fieldTypes hint="raw:AddFieldByFieldTypeName">
<fieldType fieldTypeName="attachment" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
<!-- omitted for brevity -->
</fieldTypes>
If you need to see the field contents in the results, set storageType="YES" for the required field types in the config. Note, though, that this will increase the size of your index.
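A sketch of doing that without editing the stock file: the include patch below switches storageType to YES for the attachment field type only. It assumes the stock Sitecore 8.1 layout, where the field map lives under contentSearch/indexConfigurations/defaultLuceneIndexConfiguration; check /sitecore/admin/showconfig.aspx to confirm the path in your instance. The file name is up to you, but it has to be applied after Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config (include files are applied in alphabetical order, with subfolders of App_Config/Include typically processed after the root files). Existing documents only pick up stored values after the index is rebuilt.

```xml
<!-- App_Config/Include/<your file>.config (name is up to you) -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>
          <fieldMap>
            <fieldTypes hint="raw:AddFieldByFieldTypeName">
              <!-- Matches the stock "attachment" entry and overrides one attribute:
                   the value is now stored in the index, not just indexed -->
              <fieldType fieldTypeName="attachment">
                <patch:attribute name="storageType">YES</patch:attribute>
              </fieldType>
            </fieldTypes>
          </fieldMap>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```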

Related

How do I get empty fields in SOLR indexed for a schemaless collection?

How do I get empty fields indexed in Solr? I am using Solr 7.2.0.
I am using schemaless Solr to try to index everything as strings, but for files with empty fields, those fields do not get indexed. Is there a way to get them to show up?
col1,col2,col3
a,,1
d,e,
g,h,3
For example, the first row shows up as
{
"col1":"a",
"col3":"1"
}
I'm trying to also get col2 to show up.
In my solrconfig.xml I have this:
<dynamicField name="*" type="text_general" indexed="true" stored="true" required="true" default="" />
and I have removed any traces of the remove-blank processor from my config. I've reloaded and deleted/recreated my collection multiple times. Is there a solution for this?
The CSV import module has its own option to keep empty fields: f.<field name>.keepEmpty=true.
If you don't pass that option, the CSV handler will never hand the empty field value to the next step in your indexing process.
Passing f.col2.keepEmpty=true as a URL parameter should at least give you a better starting point.
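If you would rather not add the parameter to every request (for example /solr/<collection>/update/csv?f.col2.keepEmpty=true), a sketch of setting it as a default in solrconfig.xml follows. It assumes you post to the implicit /update/csv handler; initParams defaults are merged into that handler's request parameters, so the CSV loader should see the value as if it were on the URL. Posting to plain /update with a CSV content type would not match this path. Using the global keepEmpty=true instead of the per-field form would keep empty values for every column. Reload the collection and re-post the file afterwards.

```xml
<!-- solrconfig.xml, inside the <config> element -->
<initParams path="/update/csv">
  <lst name="defaults">
    <!-- Keep zero-length values in col2 instead of dropping them -->
    <str name="f.col2.keepEmpty">true</str>
  </lst>
</initParams>
```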
Maybe preprocess your CSV file like this:
s/,,/, ,/g
That is, add a space between the two commas (you will have to deal with an empty last value differently, though; there is a regex for that).
Then try again. Right now Solr is reading the value as nonexistent; making it a space gives it a better chance of making it through, and would not change search results (unless you have some crazy analysis chains).

Sitecore 8.1: Custom Search Index not searching through PDF

I have a custom search index that I want to index PDF file content. The master index seems to be indexing PDF files fine, and Sitecore's built-in search functionality searches through the PDF files perfectly well. I seem to be having an issue trying to index the PDF field and then search its contents.
In my index configuration I add the field by name:
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="publication pdf" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
...
</fieldNames>
My result item class contains the index field definition:
[IndexField("publication pdf")]
public virtual string PDF { get; set; }
However, when I create a search context and try to find something inside the PDF, I get 0 results.
var query = context.GetQueryable<ResultItem>();
query = query.Where(p => p.PDF.Equals(SearchString));
Any help is greatly appreciated.
I'm guessing your "Publication PDF" field is some kind of reference field to a media library item. The content of the PDF is in fact not content of your current item. This means that you would need to write a custom computed field that retrieves that media library item and crawls its content.
If you want to crawl the content of a media item, you might want to use a decompiler such as .NET Reflector to check the code of the Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor class. Sitecore uses it to get the content of media items, as defined in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config:
<field fieldName="_content" type="Sitecore.ContentSearch.ComputedFields.MediaItemContentExtractor,Sitecore.ContentSearch">
<mediaIndexing ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/mediaIndexing"/>
</field>
You would need to first get the media item and then use code copied from this class to get the content of the PDF.
BUT
Yeah, there is always a but. If the media library item has changed and your item has not, your item will not be reindexed automatically. So if you plan to change PDFs (uploading a new item and selecting it should be fine), you would need to either write custom code that reindexes the item holding the reference to your PDF file, or reindex your item manually.
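Registering such a computed field in the custom index might look like the fragment below. Both the field name publication_pdf_content and the class OurApp.ComputedFields.PublicationPdfContent are hypothetical; the class would implement IComputedIndexField, resolve the media item referenced by the "publication pdf" field and reuse the extraction logic from MediaItemContentExtractor. Where the <fields> element sits differs between Sitecore versions, so place it next to the stock _content computed field in your own index configuration. The result item would then map the new field with [IndexField("publication_pdf_content")].

```xml
<!-- Hypothetical names; place alongside the stock _content computed field -->
<fields hint="raw:AddComputedIndexField">
  <!-- Resolves the media item referenced by "publication pdf"
       and emits its extracted text as a separate index field -->
  <field fieldName="publication_pdf_content">OurApp.ComputedFields.PublicationPdfContent, OurApp</field>
</fields>
```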

Search Predicate Builder

I am using Lucene search with Sitecore 7.2 and using PredicateBuilder to search for data. I have included a computed field in the index, which is a string. When I search on that field using .Contains(mystring), it fails when 'and' is present in mystring. If there is no 'and' in mystring, it works.
Can you please suggest anything?
By default, when the field and query are processed, Lucene strips out what are called "stop words", such as "and", "the", etc.
If you don't want this behaviour, you can add an entry to the fieldMap section of your configuration to tell Sitecore how to process the field:
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="YOURFIELDNAME" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
<analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
</field>
...
</fieldNames>
This example tells Sitecore, for that field, not to tokenize and to put everything into lowercase. You can switch to different analyzers to get the results you want.
As another combination, you can try setting indexType to TOKENIZED while still using the LowerCaseKeywordAnalyzer. UN_TOKENIZED means your string will be processed as a single token, which may not be what you want.
I have solved it, taking a hint from @Stephen Pope's reply. To make your computed field untokenized, you have to add it to both raw:AddFieldByFieldName and raw:AddComputedIndexField.
See the link below:
http://www.sitecore.net/Community/Technical-Blogs/Martina-Welander-Sitecore-Blog/Posts/2013/09/Sitecore-7-Search-Tips-Computed-Fields.aspx
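Put together, the two registrations described above might look like this. The field name mycomputedfield and the class MyApp.ComputedFields.MyComputedField are placeholders, and the exact parent elements vary between Sitecore versions, so mirror where your existing computed fields and fieldMap entries live. The fieldName value has to be identical in both places so that the field map settings are applied to the computed value.

```xml
<!-- 1) Register the computed field (class name is a placeholder) -->
<fields hint="raw:AddComputedIndexField">
  <field fieldName="mycomputedfield">MyApp.ComputedFields.MyComputedField, MyApp</field>
</fields>

<!-- 2) Tell the field map how to index it: one untokenized, lower-cased token,
        so stop words such as "and" are not stripped -->
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
  <fieldNames hint="raw:AddFieldByFieldName">
    <field fieldName="mycomputedfield" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
    </field>
  </fieldNames>
</fieldMap>
```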

Upgrading sitecore 6.6 index configuration to sitecore 7 (using ComputedFields)

Sitecore CMS+DMS 6.6.0 rev.130404 => 7.0 rev.130424
In our project we have been using AdvancedDatabaseCrawler (ADC) for our indexes (especially because of its dynamic fields feature). Here's a sample index configuration:
<index id="GeoIndex" type="Sitecore.Search.Index, Sitecore.Kernel">
<param desc="name">$(id)</param>
<param desc="folder">$(id)</param>
<analyzer ref="search/analyzer" />
<locations hint="list:AddCrawler">
<web type="scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler, scSearchContrib.Crawler">
<database>web</database>
<root>/sitecore/content/Globals/Locations</root>
<IndexAllFields>true</IndexAllFields>
<include hint="list:IncludeTemplate">
<!--Suburb Template-->
<suburb>{FF0D64AA-DCB4-467A-A310-FF905F9393C0}</suburb>
</include>
<dynamicFields hint="raw:AddDynamicFields">
<dynamicField type="OurApp.CustomSearchFields.SearchTextField,OurApp" name="search text" storageType="NO" indexType="TOKENIZED" vectorType="NO" />
<dynamicField type="OurApp.CustomSearchFields.LongNameField,OurApp" name="display name" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO" />
</dynamicFields>
</web>
</locations>
</index>
As you can see, we use scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler as the crawler and it uses the fields defined inside <dynamicFields hint="raw:AddDynamicFields"> section to inject custom fields into the index.
Now we are upgrading our project to Sitecore 7. In Sitecore 7, they have ported the DynamicFields functionality from ADC into Sitecore. I found some articles on this and converted our custom search field classes to implement the Sitecore 7 IComputedIndexField interface instead of inheriting from the BaseDynamicField class in ADC. Now my problem is how to change the index configuration to match the new Sitecore 7 APIs. There were bits and pieces on the web, but I couldn't find all the examples I needed to convert my configuration. Can anybody help me with this?
While I'm doing this I'm under the impression that we won't have to rebuild our indexes since it still uses Lucene internally. I don't want to change the index structure. Just want to upgrade the code and configuration from AdvancedDatabaseCrawler to Sitecore 7. Should I be worried about breaking our existing indexes? Please shed some light on this as well.
Thanks
A few quick clarifications :)
We have not merged ADC into Sitecore 7, the ContentSearch layer is a complete rewrite of the old search layer for Sitecore. We have taken some of the core concepts from ADC, such as dynamic fields, and put them in the new implementation (as ComputedFields). They are not 1:1 compatible and you will have to do some work on your indexes.
The version of Lucene has also been upgraded from 2.* to 3.0.3 so all indexes will need to be re-created anyway as they are a very different version of Lucene.
There are two options here. The old Lucene search (Sitecore.Search namespace), which ADC was built upon, has not been touched and will still work in the same way, although I am not sure about ADC compatibility with Sitecore 7, as in theory it has now been superseded.
The next option is to update your index to take advantage of the new search features of Sitecore 7. The configuration you have will not be directly compatible but, while you will need to rework your index into the new configuration, the basic concepts should be familiar to you. The dynamic fields you have should map logically to ComputedFields (fields that are calculated when an item is indexed) and everything else is straightforward.
While it looks like a lot of extra config for ContentSearch, much of it is default config that you will not need to touch; you will just need to override the configuration parts for the computed fields you want to add and the templates you want to include.
An example of creating your own configuration override can be found here : http://www.mikkelhm.dk/post/2013/10/12/Defining-a-custom-index-in-Sitecore-7-and-utilizing-it.aspx
I would also recommend making sure you upgrade to 7.0 rev. 131127 (7.0 Update-3) as this fixes a bug in the IncludeTemplates logic in the version you currently have.
I managed to convert the index configuration for the Sitecore ContentSearch API. Looking at Sitecore's default index configurations was a great help for this.
Note: As also mentioned by Stephen, <include hint="list:IncludeTemplate"> does not work in the Sitecore 7.0 initial release. It's fixed in Sitecore 7.0 rev. 131127 (7.0 Update-3) and I'm planning to upgrade to it.
Here's a good article on Sitecore 7 index update strategies by John West. It'll help you configure your indexes the way you want.
Converted configuration:
<sitecore>
<contentSearch>
<configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
<DefaultIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
<IndexAllFields>true</IndexAllFields>
<include hint="list:IncludeTemplate">
<!--Suburb Template-->
<suburb>{FF0D64AA-DCB4-467A-A310-FF905F9393C0}</suburb>
</include>
<fields hint="raw:AddComputedIndexField">
<field fieldName="search text" storageType="NO" indexType="TOKENIZED" vectorType="NO">OurApp.CustomSearchFields.SearchTextField,OurApp</field>
<field fieldName="display name" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO">OurApp.CustomSearchFields.LongNameField,OurApp</field>
</fields>
</DefaultIndexConfiguration>
<indexes hint="list:AddIndex">
<index id="GeoIndex" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
<param desc="name">$(id)</param>
<param desc="folder">$(id)</param>
<!-- This initializes index property store. Id has to be set to the index id -->
<param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
<strategies hint="list:AddStrategy">
<!-- NOTE: the order of these controls the execution order -->
<strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
<commitPolicy hint="raw:SetCommitPolicy">
<policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
</commitPolicy>
<commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
<policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
</commitPolicyExecutor>
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
<Database>web</Database>
<Root>/sitecore/content/Globals/Locations</Root>
</crawler>
</locations>
</index>
</indexes>
</configuration>
</contentSearch>
</sitecore>

Sitecore Lucene Index not including all fields

I created a new index which uses default database crawler. I can't get it to index all the fields on 5 templates that I specified.
I am using the IndexViewer module to check for the above fields. In the available fields, it lists all the fields that I want indexed but is only indexing the following fields - _url, _group, _name, and _tags.
I also wrote some code to test against the index fields and I am getting the desired results. I just need my index to include all the fields on the specified templates. Below is my configuration for the index.
<index id="Articles" type="Sitecore.Search.Index, Sitecore.Kernel">
<param desc="name">$(id)</param>
<param desc="folder">__articles</param>
<Analyzer ref="search/analyzer"/>
<locations hint="list:AddCrawler">
<customindex type="Sitecore.Search.Crawlers.DatabaseCrawler, Sitecore.Kernel">
<Database>web</Database>
<Root>/sitecore/content/[websitehome]</Root>
<Tags>articles</Tags>
<IndexAllFields>true</IndexAllFields>
<include hint="list:IncludeTemplate">
<template1>{C4663677-909E-4C4D-AB3E-78AADBB36CF7}</template1>
<template2>{444D1797-1EA9-46F2-988D-2211CF926501}</template2>
<template3>{1A859C38-FFFA-4102-BF7F-9E670495C3AF}</template3>
<template4>{6EA89465-C6C4-4643-9589-188FBB180883}</template4>
<template5>{52F0AB89-E9C3-4D10-9242-ACB669841C41}</template5>
</include>
</customindex>
</locations>
</index>
Try using the Luke (lukeall) tool to inspect the index; IndexViewer may not show unstored fields. To use Luke, just select the C:\inetpub\wwwroot\Sitecore\Data\indexes\__articles folder, check "read-only" and "force unlock", and click OK.
To make Lucene store the field values, set storageType="YES" on the field definition:
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
<fieldNames hint="raw:AddFieldByFieldName">
<field fieldName="_uniqueid" storageType="YES" ... />
...
</fieldNames>
</fieldMap>
I figured this out. The index was including fields just not their values. Not sure if this is the desired functionality of the index but when I query against it, I get results back.