How do I get accurate search results in Lucene using query syntax?

So far I have been testing keywords that I entered in Sitecore using the query syntax, but the search results do not rank the expected page first.
For example, if I use query syntax on the word book: (title:book)^1
I want the indexed page that is named book to appear first in the search results, not bookmark.
Also, every time I publish a new page in Sitecore, the results for the keyword Book get pushed down to the end of the results or don't appear on the search page at all.
How do I get accurate results in Lucene for the search engine page?
I've also been following http://www.lucenetutorial.com/lucene-query-syntax.html on how to boost search results, but it doesn't work.
Can someone explain how boosting a search term works?

I recommend you use the Advanced Database Crawler to get the most out of Lucene.NET with Sitecore. It comes with a config file for the indexes that has a section called <dynamicFields ... >. In that section, you can specify an individual Sitecore field and adjust its boost attribute. The default boost for every field is 1f, that is, the float value 1.0.
More reading:
Sitecore Searcher and Advanced Database Crawler
Source code for the ADC

Related

Sitecore: Full text search using lucene

I'm using Sitecore 8 and I'm looking for a way to run a full-text search across all my Sitecore content. I have a solution in place, but I feel there has to be a better way to do this.
My approach:
I have a computed field that merges all text fields into a single field. Before I execute a search, I tokenize my search text and build an ORed predicate to match on that field.
I do not like this approach because it gets really complicated if I need to boost items that match on the title over items that match on the body, i.e. I lose field-level boosting.
FYI: my code is very similar to this SO post.
Thanks
Sitecore already maintains a full text field, _content, that contains all the text fields. You can run your search against that. You can even create computed fields that add to _content (such as the datasource content example here).
So assuming you are building a LINQ query for your full text search, and have already filtered on templates, latest version, location, etc., adding your search terms to the query would look something like this:
var terms = SearchTerm.Split();
var currentExpression = PredicateBuilder.True<SiteSearchResultItem>();
foreach (var term in terms)
{
    // Content is mapped to _content
    currentExpression = PredicateBuilder.And(currentExpression, x => x.Content.Contains(term));
}
query = query.Where(currentExpression);
Typically you would want to AND search terms rather than ORing them.
You are right that field level boosting is lost in this. In the end, Lucene is not a great solution for creating a quality full-text site search. If this is an important requirement, you may want to look at Coveo or even something like a Google Site Search.

Lucene Field.Store.YES versus Field.Store.NO

Will someone please explain under what circumstances I might use Field.Store.NO instead of Field.Store.YES? I am extremely new to Lucene, and I am trying to create a document. Based on my basic knowledge, I am doing:
doc.add(new StringField(fieldNameA,fieldValueA,Field.Store.YES));
doc.add(new TextField(fieldNameB,fieldValueB,Field.Store.YES));
There are two basic ways a field can be written into Lucene.
Indexed - The field is analyzed and indexed, and can be searched.
Stored - The field's full text is stored and will be returned with search results.
If a field is indexed but not stored, you can search on it, but its value won't be returned with search results.
One reasonably common pattern is to use Lucene for search, but store only an ID field, which can then be used to retrieve the full contents of the document/record from, for instance, a SQL database, a file system, or a web resource.
You might also opt not to store a field when it is just a search tool that you wouldn't display to the user, such as a soundex/metaphone encoding or an alternate analysis of a content field.
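The indexed/stored distinction above can be illustrated with a toy in-memory model. This is only a sketch: ToyIndex and its methods are hypothetical names invented for illustration, not Lucene's API, and the tokenization is a plain split on non-word characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these names are hypothetical and are not Lucene's API.
class ToyIndex {
    // "field:word" -> ids of documents whose indexed field contains that word
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> original values of the fields that were added with store = true
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    // Starts a new document and returns its id.
    int newDocument() {
        storedFields.add(new HashMap<>());
        return storedFields.size() - 1;
    }

    // Always indexes the field's words; keeps the original value only if store is true.
    void addField(int docId, String name, String value, boolean store) {
        for (String word : value.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(name + ":" + word, k -> new TreeSet<>()).add(docId);
            }
        }
        if (store) {
            storedFields.get(docId).put(name, value);
        }
    }

    // Returns the stored fields of every document matching field:word.
    List<Map<String, String>> search(String field, String word) {
        List<Map<String, String>> hits = new ArrayList<>();
        Set<Integer> ids = postings.getOrDefault(field + ":" + word.toLowerCase(),
                Collections.emptySet());
        for (int docId : ids) {
            hits.add(storedFields.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int doc = index.newDocument();
        index.addField(doc, "id", "42", true);                      // stored
        index.addField(doc, "body", "full text lives in SQL", false); // searchable only
        // The hit carries the id but not the body text.
        System.out.println(index.search("body", "sql"));
    }
}
```

In this model a search on body finds the document, but the hit only carries the stored id, which the application would then use to load the full record from wherever it actually lives.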
Use Field.Store.YES when you need the field's value returned with the Lucene document. Use Field.Store.NO when you only need to search on the field. Here is a link that explains this with a scenario:
https://handyopinion.com/java-lucene-saving-fields-or-not/

Sitecore Search Ranking

Does Sitecore/Lucene support filtering/ranking of content?
I cannot find any related documentation.
Lucene returns ranked results, and you can structure queries to filter results using the QueryOccurance.MustNot clause, or to boost results using the QueryOccurance.Should clause.
From Sitecore's documentation of the QueryOccurance class:
Lucene uses the following operators for the search terms in complex queries:
Must – the search term must occur in the document to be included into the search results.
Should – the search term may occur in the document but is not necessary, and the document may be included in search results based on other criteria. However, the documents containing the search term are ranked higher than equivalent documents that do not contain the search term.
Must not – the search term must not occur in the document in order to be included in the search results. Documents with the search term will be excluded from the results.
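To make the three operators concrete, here is a toy model of how they combine. It is a sketch under simplifying assumptions: OccurrenceDemo is a hypothetical class, not the Sitecore or Lucene API, and the "score" is just a count of matching Should terms, whereas real Lucene scoring uses term statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not the Sitecore or Lucene API.
class OccurrenceDemo {
    // Returns the names of matching documents, best first. docs maps name -> text.
    static List<String> search(Map<String, String> docs, Set<String> must,
                               Set<String> should, Set<String> mustNot) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Set<String> words = new HashSet<>(
                    Arrays.asList(doc.getValue().toLowerCase().split("\\W+")));
            // Must: every term is required. Must not: any term excludes the document.
            if (!words.containsAll(must) || mustNot.stream().anyMatch(words::contains)) {
                continue;
            }
            // Should: optional, but each term that is present raises the rank.
            int score = 0;
            for (String term : should) {
                if (words.contains(term)) {
                    score++;
                }
            }
            scores.put(doc.getKey(), score);
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // List.sort is stable, so ties keep their original order.
        ranked.sort((a, b) -> scores.get(b) - scores.get(a));
        return ranked;
    }
}
```

With Must = book, Should = lucene and Must not = draft, a document containing only bookmark is skipped because matching is per whole word, a document containing draft is excluded, and a document containing both book and lucene ranks above one containing only book.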
Some additional resources for Lucene in Sitecore:
Syntax of Lucene Queries: http://sitecoregadgets.blogspot.com/2009/11/working-with-lucene-search-index-in_25.html
Lucene Walkthrough: http://learnsitecore.cmsuniverse.net/en/Developers/Articles/2009/06/LuceneQuery1.aspx
Alex Shyba's Lucene posts: http://sitecoreblog.alexshyba.com/search/label/lucene
This question may also be useful: Sitecore + Lucene + QueryOccurance.Should not returning desired results
Sitecore has built-in sitecore_master_content, sitecore_web_content, and sitecore_core_content indexes, which index all the content in Sitecore, and there is already an API for searching them. You can specify a boost value in the "Indexing" section of a Sitecore item (it is empty by default).
You can also set boosting for individual fields in your search query.

How to find href=blah but not href=/blah with Full-text search

I'm currently using the query
SELECT Url FROM Link WHERE CONTAINS(Url, 'href=blah')
It is including results with href=/blah. Any way I can tell the query to act more like WHERE Url LIKE '%href=blah%' and still use the full-text catalog?
Your problem is that = and / are both word breakers. In other words, SQL full-text search is actually searching for the words href and blah.
There are a couple of options you could try. First you could filter down the search domain using the fulltext engine, then search the subset of data using LIKE. You'll need to experiment to see how to squeeze out the best performance.
The other option is, if href=blah is a consistent term you could add that to a custom dictionary. A good article on this is here.
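The first option (narrow the candidates with full-text search, then apply an exact substring check) can be modelled in memory as below. This is a sketch only: TwoStageFilter is a hypothetical class that does not talk to SQL Server, and it assumes every non-word character acts as a word breaker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: mimics the two-stage idea in memory; it does not call SQL Server.
class TwoStageFilter {
    // Stage 1, standing in for CONTAINS: break the text on non-word characters
    // (so = and / act as word breakers) and require every query word to be present.
    static boolean fullTextMatch(String url, String query) {
        Set<String> words = new HashSet<>(Arrays.asList(url.toLowerCase().split("\\W+")));
        return words.containsAll(Arrays.asList(query.toLowerCase().split("\\W+")));
    }

    // Stage 2, standing in for LIKE '%literal%': an exact substring check that
    // only runs on rows which survived stage 1. (Case-sensitive here; in SQL
    // Server that depends on the collation.)
    static List<String> search(List<String> urls, String literal) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (fullTextMatch(url, literal) && url.contains(literal)) {
                result.add(url);
            }
        }
        return result;
    }
}
```

Stage 1 alone accepts both href=blah and href=/blah, because both break into the words href and blah. Stage 2 then drops href=/blah while only having to scan the rows that stage 1 let through.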

Retrieving per keyword/field match position in Lucene Solr -- possible?

Is there any way to retrieve the match field/position for each keyword for each matching document from solr?
For example, if the document has title "Retrieving per keyword/field match position in Lucene Solr -- possible?" and the query is "solr keyword", I'd like to get, in addition to the doc-id (I normally only want the doc-id, not the full document), something that can tell me the matches are at:
solr:
  title: 9
keyword:
  title: 3
I'm pretty sure such info is computed during query execution (for phrase queries), but is it possible to return it to the application?
Thanks!
Debugging Relevance Issues in Search suggests using Solr analysis, which you can get to from the admin URL, using something like http://localhost:8983/solr/admin/analysis.jsp?highlight=on .
This highlights matching terms and gives their position.
AFAIK there is no way to do that directly, but you can use hit highlighting to implement it.
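As an illustration of the kind of output being asked for, the sketch below computes per-keyword token positions on the application side. It is not Solr's highlighter or analysis chain: MatchPositions is a hypothetical helper that assumes simple lowercasing and splitting on non-word characters, so a real analyzer (stop words, stemming) could assign different positions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch only: plain string tokenization, not Solr's analyzers or highlighter.
class MatchPositions {
    // Maps each query word to the 1-based token positions where it occurs in the text.
    static Map<String, List<Integer>> find(String text, String query) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        for (String word : query.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                positions.put(word, new ArrayList<>());
            }
        }
        int position = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by leading punctuation
            }
            position++;
            List<Integer> hits = positions.get(token);
            if (hits != null) {
                hits.add(position);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String title = "Retrieving per keyword/field match position in Lucene Solr -- possible?";
        // prints {solr=[9], keyword=[3]}
        System.out.println(find(title, "solr keyword"));
    }
}
```

Run against the stored title of each returned document, this reproduces the positions from the question (solr at 9, keyword at 3), at the cost of re-tokenizing the field outside Solr.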