Could someone please help me understand the following?
Do we need to specify the name of the index in code when using Sitecore Solr search?
If we create a new custom index called 'sitecore_web-index_custom', how do we make sure we are using this index in code?
Thank you.
To get a Sitecore index, use the GetIndex method of the ContentSearchManager class:
Sitecore.ContentSearch.ContentSearchManager.GetIndex(...)
You can pass either the index name:
// get the built-in Sitecore index for the current database:
string dbName = (Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database).Name;
var index = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_" + dbName + "_index");
// get the custom index by its configured name
var customIndex = Sitecore.ContentSearch.ContentSearchManager.GetIndex("sitecore_web-index_custom");
or a Sitecore item:
// get index by Sitecore item
Sitecore.ContentSearch.ContentSearchManager.GetIndex((SitecoreIndexableItem)item);
In the second scenario, Sitecore will try to find the index in which the item is indexed.
Getting a Solr index works exactly like getting a Lucene index; the Sitecore API is the same for both providers.
More information about Sitecore search and indexing can be found in:
Sitecore Search and Indexing Guide
Developer's Guide to Item Buckets and Search
Related
I have a scenario in which I need to find the largest documents in my RavenDB database.
When I select any given document in RavenDB Studio, its size is displayed in the Properties section.
Is there a query I can run that will order documents by this Size property so that I can identify the largest documents?
One option is to write a method that calculates your object's size, for example using reflection.
Then create a static Map index with a 'size' field, and populate that field by calling your method, which you provide in the index's 'Additional Sources'.
See https://ravendb.net/docs/article-page/4.2/Csharp/studio/database/indexes/create-map-index#additional-sources
You can then query this index and order by the 'size' field.
FYI, you can get the size of a specific document using the following endpoint:
{yourServerUrl}/databases/{yourDatabaseName}/docs/size?id={yourDocumentId}
You can learn more about the RavenDB REST API here:
https://ravendb.net/docs/article-page/4.2/csharp/client-api/rest-api/rest-api-intro
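Because the size endpoint reports one document at a time, a client would have to call it once per id and rank the results itself. A minimal sketch in Java 10+ (standard library only) of those two steps; the server URL, database name and document ids are hypothetical, fetching and parsing the HTTP responses is left out, and the sizesById map stands in for whatever sizes you collect:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class DocSizeRanking {

    // Builds {serverUrl}/databases/{databaseName}/docs/size?id={documentId}.
    // The id is URL-encoded so characters such as '&', '#' or spaces cannot break the query string.
    static URI sizeEndpoint(String serverUrl, String databaseName, String documentId) {
        String base = serverUrl.endsWith("/")
                ? serverUrl.substring(0, serverUrl.length() - 1)
                : serverUrl;
        String encodedId = URLEncoder.encode(documentId, StandardCharsets.UTF_8);
        return URI.create(base + "/databases/" + databaseName + "/docs/size?id=" + encodedId);
    }

    // Given sizes already collected per document id (hypothetical input),
    // returns the ids of the n largest documents, largest first.
    static List<String> largest(Map<String, Long> sizesById, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(sizesById.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sizeEndpoint("http://localhost:8080", "Northwind", "orders/1-A"));
        System.out.println(largest(Map.of("a", 10L, "b", 300L, "c", 42L), 2));
    }
}
```

For a large database, the index-based approaches in this thread avoid one HTTP round trip per document; the sketch is only meant for spot checks.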
Index (Map) definition:
from doc in docs
select new {
doc.BlittableJson.Size
}
I have a Solr 7.6.0 (Lucene) index containing many .pdf, .docx and .xlsx files.
I created the index by running the post command from a command window, pointing it at a directory share (mapped file path) that contains the files.
Each document also has a web URL, which is stored in a database and which Lucene currently knows nothing about. I would like to 'enrich' the existing index with this URL data.
Can I extract the id of the currently indexed files and then use the Solr web interface to modify the existing index, injecting the URL?
I am looking at the following tutorial for advice:
https://www.tutorialspoint.com/apache_solr/apache_solr_indexing_data.htm
The tutorial shows an example of adding a document but not modifying one.
Thanks @MatsLindh, I managed to get it to work.
I used the Solr GUI to run the Schema API add-field command:
{
  "add-field" : {
    "name":"URL",
    "type":"string",
    "stored":true,
    "indexed":true
  }
}
I then set the URL on the document with an atomic update:
{"id":"S:\\Docs\\forIndexing\\indexThisFile_001.pdf",
 "URL":{"set":"https://localhost/urlToFiles/indexThisFile_001.pdf"}
}
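For many files, typing each update into the GUI does not scale. A minimal Java 11 sketch (standard library only) that builds the same atomic-update payload, doubling the backslashes in the Windows-path id as JSON requires, and wraps it in a POST to a core's /update handler. The core URL http://localhost:8983/solr/mycore is a hypothetical placeholder, and the id/URL pairs would come from your database. One thing worth checking first: an atomic update rebuilds the document from its stored (or docValues) fields, so a field that is indexed but not stored, and is not a copyField target, may lose its content.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SolrAtomicUpdate {

    // Minimal JSON string escaping: backslash, double quote and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Builds a one-document atomic update: [{"id":"...","<field>":{"set":"..."}}]
    static String setFieldPayload(String id, String field, String value) {
        return "[{\"id\":" + jsonString(id) + "," + jsonString(field)
                + ":{\"set\":" + jsonString(value) + "}}]";
    }

    // Wraps the payload in a POST to <coreUrl>/update?commit=true (coreUrl is an assumption).
    static HttpRequest updateRequest(String coreUrl, String payload) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }

    public static void main(String[] args) {
        String payload = setFieldPayload("S:\\Docs\\forIndexing\\indexThisFile_001.pdf",
                "URL", "https://localhost/urlToFiles/indexThisFile_001.pdf");
        System.out.println(payload);
        HttpRequest request = updateRequest("http://localhost:8983/solr/mycore", payload);
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In practice you would batch many documents into one JSON array and commit once at the end, rather than committing per document.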
I want to query an EmbeddedSolrServer instance with a filter query, as we normally do in the admin panel, but programmatically in Java. I know I can use query.setQuery("*:*");, but that is not what I want if someone wants to search for a specific word in a document's content. I also found solrParams.add(CommonParams.QT, "*:*");, but it does not work. I think the problem may come from how the PDF document is parsed when I index it. So, does anyone know how to index a document using EmbeddedSolrServer exactly the same way it is indexed with post.jar on the command line?
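Indexing a file through the /update/extract (Solr Cell) handler is as easy as:
import java.io.File;
import java.nio.file.Path;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;

// solrHome is a java.nio.file.Path, fileToIndex a java.io.File, id the document's unique key
EmbeddedSolrServer server = new EmbeddedSolrServer(solrHome, defaultCoreName);
ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(fileToIndex, "application/octet-stream");
req.setParam("commit", "true");
req.setParam("literal.id", id);
NamedList<Object> namedList = server.request(req);
server.close();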
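On the querying side of the question: the admin panel's q and fq boxes map to the q and fq request parameters, whereas qt names a request handler, which is why passing "*:*" as QT does not act as a query. In SolrJ the equivalent is roughly new SolrQuery("content:someword") followed by addFilterQuery(...) and server.query(query). The stdlib-only Java 10+ sketch below only shows how those two parameters are encoded; the field names content and content_type are hypothetical and depend on your schema:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

class SolrQueryParams {

    // Encodes a main query (q) plus zero or more filter queries (fq, a repeatable parameter),
    // mirroring what the admin panel's Query screen sends to /select.
    static String encode(String q, List<String> filterQueries) {
        StringJoiner joiner = new StringJoiner("&");
        joiner.add("q=" + URLEncoder.encode(q, StandardCharsets.UTF_8));
        for (String fq : filterQueries) {
            joiner.add("fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // "content" and "content_type" are hypothetical field names
        System.out.println(encode("content:invoice", List.of("content_type:application/pdf")));
    }
}
```

Filter queries restrict the result set without affecting scoring, so the searched word belongs in q and structural constraints such as document type belong in fq.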
I create a Lucene query this way:
BooleanQuery innerQuery = new BooleanQuery();
MultiFieldQueryParser queryParser = new MultiFieldQueryParser(fields.ToArray<string>(), this.SearchIndex.Analyzer);
queryParser.SetDefaultOperator(QueryParser.Operator.AND);
Query query = queryParser.Parse(QueryParser.Escape(searchExpression.ToLowerInvariant()));
if (boost.HasValue)
{
query.SetBoost(boost.Value);
}
innerQuery.Add(query, BooleanClause.Occur.SHOULD);
The problem is that when a field contains an HTML tag, for example <a href.../>, and the search expression is "href", the item is returned. Can I somehow make the search skip text inside "<>" tags?
This is actually an issue with the crawling process (i.e. what gets stored in the index) rather than the search query.
I see you're using Sitecore 6. Take a look at this PDF:
Sitecore 6.6 Search and Indexing
It has a section explaining how to make a crawler. This should allow you to parse the content however you like, so you can omit anything that's part of an HTML tag.
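Inside a custom crawler, the field value could be cleaned before it is added to the index. A minimal Java sketch of that cleaning step using a plain regular expression; the Sitecore crawler hook itself is not shown, and for real-world HTML a proper parser (or, in Java Lucene, the HTMLStripCharFilter from the analysis module) is more robust than this naive pattern:

```java
import java.util.regex.Pattern;

class HtmlTagStripper {

    // Matches anything from '<' to the next '>'. Naive: it does not understand comments,
    // <script>/<style> bodies, or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Replaces each tag with a space (so "a<br/>b" does not become "ab"),
    // then collapses runs of whitespace and trims the result.
    static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("See <a href=\"/docs\">the docs</a> for details"));
    }
}
```

Because the markup never reaches the index, a search for "href" no longer matches, while the visible link text ("the docs") stays searchable.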
I have a program using Lucene that creates an index in a directory (the index directory) on every run. Since creating the index on every execution is time-consuming, I want to reuse the index created in the initial execution.
Is this possible in Lucene? Does Lucene have this feature?
It is absolutely possible. Assuming indexDirPath is the location of your Lucene index, you can use the following code:
// Lucene 4.x API; in Lucene 5+ use FSDirectory.open(java.nio.file.Paths.get(indexDirPath))
Directory dir = FSDirectory.open(new File(indexDirPath));
IndexReader ir = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(ir);
When you parse queries against this searcher, use the same Analyzer that you used while creating the index.
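To avoid rebuilding on every run, a program can check whether an index is already present and only index when it is not. Lucene itself offers DirectoryReader.indexExists(Directory) for this, and opening the IndexWriter with OpenMode.CREATE_OR_APPEND (rather than CREATE) keeps an existing index when you do write. The stdlib-only Java 11 sketch below approximates the existence check by looking for a segments_N commit file; the directory name "index" is a hypothetical default:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class IndexReuse {

    // Approximates Lucene's DirectoryReader.indexExists(Directory):
    // a committed Lucene index contains at least one "segments_N" file.
    static boolean looksLikeLuceneIndex(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.anyMatch(p -> p.getFileName().toString().startsWith("segments_"));
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Path.of(args.length > 0 ? args[0] : "index");
        if (looksLikeLuceneIndex(indexDir)) {
            System.out.println("Reusing existing index in " + indexDir);
            // open the DirectoryReader / IndexSearcher here
        } else {
            System.out.println("No index found in " + indexDir + "; building it once");
            // run the expensive indexing code here
        }
    }
}
```

Inside a Lucene program, prefer DirectoryReader.indexExists, which checks the same thing through Lucene's own Directory abstraction.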