How to use Sitecore Solr Custom Index - indexing

Could someone please help me understand the following?
Do we need to specify the name of the index in code when using Sitecore Solr search?
If we create a new custom index called 'sitecore_web-index_custom', how do we make sure we are using this index in code?
Thank you.

To get a Sitecore index, use the GetIndex method of the ContentSearchManager class:
Sitecore.ContentSearch.ContentSearchManager.GetIndex(...)
You can either pass the index name:
// get Sitecore built in index for current database:
string dbName = (Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database).Name;
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_" + dbName + "_index");
// get custom index
var customIndex = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_web-index_custom");
or a Sitecore item:
// get index by Sitecore item
Sitecore.ContentSearch.ContentSearchManager.GetIndex((SitecoreIndexableItem)item);
In the second scenario, Sitecore will try to find the index in which the item is indexed.
There is no difference between getting Solr and Lucene indexes: the Sitecore API is transparent here.
More information about Sitecore search and indexing can be found in:
Sitecore Search and Indexing Guide
Developer's Guide to Item Buckets and Search

Related

How to query Ravendb document size

I have a scenario in which I need to find the largest documents in my Ravendb database.
When I select any given document in RavenDB Studio, the size is displayed in the Properties section, as circled in red in this screenshot:
Is there a query I can run that will order documents by this Size property so that I can identify the largest documents?
One option is to write a method that calculates your object size, probably using reflection.
Then create a static Map index with a 'size' field,
and set that field using your method, which you provide in the 'Additional Sources' of the index.
See https://ravendb.net/docs/article-page/4.2/Csharp/studio/database/indexes/create-map-index#additional-sources
You can then query this index and order by the 'size' field.
FYI: you can get the size of a specific document using the following endpoint:
{yourServerUrl}/databases/{yourDatabaseName}/docs/size?id={yourDocumentId}
Learn about the RavenDB REST API at:
https://ravendb.net/docs/article-page/4.2/csharp/client-api/rest-api/rest-api-intro
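As a rough illustration of the endpoint above, here is a minimal Java sketch that only builds the request URL for a given document ID. The class and method names (DocSizeUrl, sizeUrl) are made up for this example, the server URL and database name in main are placeholders, and the database name is assumed to need no URL encoding. It assumes Java 10 or later for the URLEncoder overload that takes a Charset.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the URL of the document-size endpoint quoted above.
class DocSizeUrl {
    static String sizeUrl(String serverUrl, String database, String documentId) {
        // Drop a trailing slash so the path is not doubled up.
        String base = serverUrl.endsWith("/")
                ? serverUrl.substring(0, serverUrl.length() - 1)
                : serverUrl;
        // Document IDs often contain '/', so the id query value is URL-encoded.
        return base + "/databases/" + database + "/docs/size?id="
                + URLEncoder.encode(documentId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder values, not taken from the original post.
        System.out.println(sizeUrl("http://localhost:8080", "Northwind", "orders/1-A"));
    }
}
```

Calling this once per document ID and sorting the responses would work for a small database, but the index-based approach is likely a better fit when there are many documents.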
Index (Map) definition:
from doc in docs
select new {
doc.BlittableJson.Size
}

Modify existing Solr 7.6.0 / Lucene index (add another field 'URL' to an already indexed file (.pdf, .docx etc.))

I have a Solr 7.6.0 Lucene index (lots of .pdf, .docx and .xlsx files).
The index was created using the post command in a command window, pointing to a directory share (mapped filepath) where the files exist.
There is also a web URL for the document which I have in a database and Lucene currently knows nothing about. I would like to 'enrich' the existing index with this URL data.
Can I extract the id of the currently indexed files and then use the Solr web interface to modify the existing index, injecting the URL?
I am looking at the following tutorial for advice:
https://www.tutorialspoint.com/apache_solr/apache_solr_indexing_data.htm
The tutorial shows an example of adding a document but not modifying one.
Thanks @MatsLindh, I managed to get it to work:
I used the Solr GUI to run the JSON add-field update:
{
"add-field" : {
"name":"URL",
"type":"string",
"stored":true,
"indexed":true
}
}
I then inserted/set the property:
{"id":"S:\\Docs\\forIndexing\\indexThisFile_001.pdf",
"URL":{"set":"https://localhost/urlToFiles/indexThisFile_001.pdf"}
}
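If there are many files to enrich, the same "set" update can be generated for all of them at once. The following is a minimal Java sketch, assuming the id-to-URL pairs have already been read from the database into a map. The class and method names (AtomicUrlUpdate, buildPayload, jsonEscape) are made up for this example, and the escaping covers only backslashes and double quotes, which is enough for Windows paths like the one above but is not a full JSON encoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a JSON array of atomic "set" updates for the URL field.
class AtomicUrlUpdate {
    // Escape backslashes and double quotes so the value is a valid JSON string.
    // Control characters are not handled in this sketch.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // One {"id": ..., "URL": {"set": ...}} object per map entry.
    static String buildPayload(Map<String, String> idToUrl) {
        StringJoiner docs = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, String> e : idToUrl.entrySet()) {
            docs.add("{\"id\":\"" + jsonEscape(e.getKey()) + "\","
                    + "\"URL\":{\"set\":\"" + jsonEscape(e.getValue()) + "\"}}");
        }
        return docs.toString();
    }

    public static void main(String[] args) {
        Map<String, String> idToUrl = new LinkedHashMap<>();
        idToUrl.put("S:\\Docs\\forIndexing\\indexThisFile_001.pdf",
                "https://localhost/urlToFiles/indexThisFile_001.pdf");
        System.out.println(buildPayload(idToUrl));
    }
}
```

The resulting array can be posted to the core's update handler in the same way as the single document above. Note that Solr atomic updates rebuild the document from its stored (or docValues) fields, so it is worth checking that the other fields of these documents are stored before running this against the whole index.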

Indexing a document with content using solrj in EmbeddedSolrServer

I want to query an EmbeddedSolrServer instance with a filter query, like we normally do in the admin panel (as in the picture), but programmatically in Java. I know we can call query.setQuery("*:*");, but that is not what I want when someone needs to search for a specific word in a document's content. I also found solrParams.add(CommonParams.QT, "*:*");, but it is not working. I think the problem may come from parsing the PDF document when I try to index it. So, does anyone know how to index a document using EmbeddedSolrServer exactly the same way we index it with post.jar on the command line?
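Indexing a file is as easy as:
// solrHome is the Solr home directory; defaultCoreName is the core to index into
EmbeddedSolrServer server = new EmbeddedSolrServer(solrHome, defaultCoreName);
// send the file to the extracting request handler
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(fileToIndex, "application/octet-stream");
req.setParam("commit", "true");
// literal.id sets the id field of the indexed document
req.setParam("literal.id", id);
NamedList<Object> namedList = server.request(req);
server.close();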

Sitecore Lucene search - skip html tags

I create the Lucene query this way:
BooleanQuery innerQuery = new BooleanQuery();
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields.ToArray<string>(), this.SearchIndex.Analyzer);
queryParser.SetDefaultOperator(QueryParser.Operator.AND);
Query query = queryParser.Parse(QueryParser.Escape(searchExpression.ToLowerInvariant()));
if (boost.HasValue)
{
query.SetBoost(boost.Value);
}
innerQuery.Add(query, BooleanClause.Occur.SHOULD);
The problem is that when a field contains an HTML tag, for example <a href.../>, and the search expression is "href", the query returns this item. Can I somehow make it skip text inside "<>" tags?
This is actually an issue with the crawling process (i.e. what gets stored in the index) rather than the search query.
I see you're using Sitecore 6. Take a look at this pdf:
Sitecore 6.6 Search and Indexing
It has a section explaining how to make a crawler. This should allow you to parse the content however you like, so you can omit anything that's part of an HTML tag.
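The tag-stripping step itself is small, whatever the crawler around it looks like. Below is a minimal sketch in Java for illustration only; a Sitecore crawler would be written in C#, where the same pattern can be used with Regex.Replace. The class and method names (HtmlTagStripper, stripTags) are made up for this example, and the regex is deliberately naive: it does not handle a ">" inside attribute values, HTML comments, or script blocks.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes anything between '<' and '>' before text is indexed.
class HtmlTagStripper {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        // Replace each tag with a space so adjacent words do not get glued together,
        // then collapse runs of whitespace.
        return TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("Read <a href=\"/docs\">the docs</a> now"));
    }
}
```

With the field value cleaned like this before it is written to the index, a search for "href" would no longer match the item, while the visible link text stays searchable.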

How to reuse an index that was already created using Apache Lucene?

I have a program using Lucene that creates an index in a directory (the index directory) every time it runs. Since creating the index on every execution is time consuming, I want to reuse the index created during the initial execution.
Is this possible in Lucene? Does Lucene have this feature?
It is absolutely possible. Assuming indexDirPath is the location of your Lucene index, you can use the following code:
Directory dir = FSDirectory.open(new File(indexDirPath));
IndexReader ir = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(ir);
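When you then parse queries, use the same Analyzer that you used while creating the index. Note that in Lucene 5 and later FSDirectory.open takes a java.nio.file.Path (for example Paths.get(indexDirPath)) instead of a File.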
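To skip the indexing step on later runs, the program needs to know whether an index is already present. Lucene offers DirectoryReader.indexExists(Directory) for this, and that is the call to prefer in real code. Purely as a dependency-free illustration of the idea, the sketch below checks for a segments_N commit file, which Lucene writes into every committed index directory. The class and method names (IndexDirCheck, looksLikeLuceneIndex) are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: a rough stand-in for Lucene's own index-existence check.
class IndexDirCheck {
    static boolean looksLikeLuceneIndex(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        // A committed Lucene index contains at least one "segments_N" file.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "segments_*")) {
            return stream.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        if (looksLikeLuceneIndex(indexDir)) {
            System.out.println("Index found, open it for searching");
        } else {
            System.out.println("No index found, build it first");
        }
    }
}
```

The program would build the index only in the second branch and open the existing one with DirectoryReader.open otherwise.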