Sitecore Search Ranking - lucene

Does Sitecore/Lucene support filtering/ranking of content?
I cannot find any related documentation.

Lucene returns ranked results, and you can structure queries to filter results using the QueryOccurance.MustNot clause, or to boost results using the QueryOccurance.Should clause.
From Sitecore's documentation of the QueryOccurance class:
Lucene uses the following operators for the search terms in complex queries:
 Must – the search term must occur in the document to be included into the search results.
 Should – the search term may occur in the document but is not necessary, and the document may be included in search results based on other criteria. However, the documents containing the search term are ranked higher than equivalent documents that do not contain the search term.
 Must not – the search term must not occur in the document in order to be included in the search results. Documents with the search term will be excluded from the results.
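To make the three operators concrete, here is a minimal sketch in plain Python. It is a toy model of the behaviour quoted above, not Sitecore's or Lucene's API: the `search` function, its parameters, and the sample documents are all made up for illustration, and the "ranking" is just a count of matching Should terms.

```python
# Toy model of Must / Should / Must not. Not Sitecore's or Lucene's API.
def search(docs, must=(), should=(), must_not=()):
    """Return doc ids that satisfy the clauses, best match first."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if any(term in words for term in must_not):
            continue  # Must not: exclude the document outright
        if not all(term in words for term in must):
            continue  # Must: every such term is required
        bonus = sum(term in words for term in should)
        if not must and should and bonus == 0:
            continue  # with only Should clauses, at least one has to match
        hits.append((bonus, doc_id))
    # Should: optional, but each matching term ranks the document higher
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


docs = {
    "a": "sitecore search with lucene",
    "b": "sitecore search ranking with lucene",
    "c": "sitecore search draft",
}
print(search(docs, must=["sitecore"], should=["ranking"], must_not=["draft"]))
# ['b', 'a']
```

Document "c" is filtered out by the Must not clause, and "b" is ranked above "a" only because it also matches the optional Should term.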
Some additional resources for Lucene in Sitecore:
Syntax of Lucene Queries: http://sitecoregadgets.blogspot.com/2009/11/working-with-lucene-search-index-in_25.html
Lucene Walkthrough: http://learnsitecore.cmsuniverse.net/en/Developers/Articles/2009/06/LuceneQuery1.aspx
Alex Shyba's Lucene posts: http://sitecoreblog.alexshyba.com/search/label/lucene
This question may also be useful: Sitecore + Lucene + QueryOccurance.Should not returning desired results

Sitecore has built-in sitecore_master_content, sitecore_web_content, and sitecore_core_content indexes, which index all the content in Sitecore and already have an API for searching them. You can specify a boost value in the "Indexing" section of a Sitecore item (it is empty by default).
You can also set boosts for individual fields in your search query.

Related

lucene multiple documents created for large and unique number of resources?

I am a beginner in Lucene search. Suppose I have a collection of resources like:
id, name, {list of products}, {list of keywords}. I want to search based on name, products, or keywords. I have some doubts related to Lucene and its usage:
1) For document creation, I create a document that has the structure id, name, products (multiple values), keywords (multiple values). If I have a thousand unique resources, will it create 1000 unique documents?
2) Also, if I make the name and products fields searchable (as StringField), then after searching, will the result (the ScoreDocs) contain exactly the set of documents that have the text I searched for?
Q> <..> will it create 1000 unique documents?
A> Lucene doesn't have the concept of "uniqueness" - it is only in your head. Alternatively, think of this as if all documents are unique for Lucene: adding 1000 documents gives you 1000 documents in the index, whether or not their contents differ. If you search by these fields, relevant documents will be returned.
Q> <..> will the result also contains(ScoreDocs contains) exactly the same set of documents that has the text I searched?
A> Strange/unclear question. If you search for all documents, you will get all documents. If your search query only matches some documents, only those documents will be returned. The internals are more complex - it all depends on how you analyze the text. Maybe you can give a more concrete example with use cases?
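The remark that "it all depends on how you analyze the text" matters for StringField in particular, because Lucene indexes a StringField value as a single untokenized term. The sketch below is a plain-Python illustration of that difference, not Lucene code; `index_exact`, `index_analyzed`, and the sample names are hypothetical.

```python
def index_exact(value):
    # Like an untokenized field: the whole value is the only searchable term
    return {value}


def index_analyzed(value):
    # Like an analyzed field: lower-cased and split on whitespace
    return set(value.lower().split())


names = {1: "Red Widget", 2: "Blue Widget"}


def search(indexer, term):
    """Return ids of the names whose indexed terms contain `term`."""
    return sorted(doc_id for doc_id, name in names.items() if term in indexer(name))


print(search(index_exact, "widget"))      # []
print(search(index_exact, "Red Widget"))  # [1]
print(search(index_analyzed, "widget"))   # [1, 2]
```

With the untokenized variant only the full, exact value matches, so whether a search "returns the documents that have the text" depends on which kind of field the text was indexed into.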

In accordance with the user name query in lucene

I want to provide a search function on a blog website. I want to search not only across all documents, but also within just one author's documents.
Since I want to use Lucene to provide the full-text index, how can I do this when creating the index?
Indexing the author's name as a separate field would let you search for all documents containing "Lucene" with an author of "fisher", for example ("lucene author:fisher" in QueryParser syntax).
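As a rough illustration of why a separate author field helps, the following plain-Python sketch mimics a `field:term` query over a few made-up posts. It is not Lucene's QueryParser: the `search` helper and the sample data are hypothetical, and it assumes every term is required, which corresponds to writing "+lucene +author:fisher" or using AND as the default operator.

```python
posts = [
    {"id": 1, "author": "fisher", "body": "Getting started with Lucene"},
    {"id": 2, "author": "fisher", "body": "Notes on gardening"},
    {"id": 3, "author": "smith", "body": "Lucene scoring explained"},
]


def search(query, default_field="body"):
    """Match posts containing every term; 'field:term' targets a named field."""
    clauses = []
    for token in query.lower().split():
        field, sep, term = token.partition(":")
        # A token without a colon is looked up in the default field
        clauses.append((field, term) if sep else (default_field, token))
    return [
        post["id"]
        for post in posts
        if all(term in post[field].lower().split() for field, term in clauses)
    ]


print(search("lucene author:fisher"))  # [1]
print(search("lucene"))                # [1, 3]
```

The same index serves both use cases: leave out the author clause to search the whole blog, or add it to restrict the search to one author.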

How do I get accurate search result in Lucene using Query syntax

So far I have been testing the keywords that I entered in Sitecore using the query syntax, but the search results do not rank the page first.
For example, if I use query syntax on the word book: (title:book)^1
I want the index page that is named book to appear first in the search results, not bookmark.
Also, every time I publish a new page in Sitecore, the results for the keyword Book get pushed down to the end or don't appear on the search page.
How do I get accurate results in Lucene for the search page?
I've also been following http://www.lucenetutorial.com/lucene-query-syntax.html on how to boost search results, but it doesn't work.
Can someone explain how boosting a search term works?
I recommend you leverage the Advanced Database Crawler to get the best use of Lucene.NET with Sitecore. It comes with a config file for the indexes that has a section called <dynamicFields ... >. In that section, you can specify an individual Sitecore field and adjust its boost attribute. The default boost for every field is 1f, i.e. the floating-point value 1.0. A boost of 1 is neutral, so a query such as (title:book)^1 ranks results exactly as it would without the boost.
More reading:
Sitecore Searcher and Advanced Database Crawler
Source code for the ADC
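To show what raising a field's boost does to the ordering, here is a deliberately simplified sketch in plain Python. The scoring (boost multiplied by the number of exact term matches per field) is an assumption made for illustration and is not Lucene's actual scoring formula; `rank` and the two sample items are hypothetical.

```python
def rank(items, term, boosts):
    """Order item names by a boost-weighted count of exact term matches."""
    def score(fields):
        return sum(
            boosts.get(field, 1.0) * text.lower().split().count(term)
            for field, text in fields.items()
        )

    scored = [(score(fields), name) for name, fields in items.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for s, name in ordered if s > 0]


items = {
    "Book": {"title": "book", "body": "index page"},
    "Bookmark": {"title": "bookmark", "body": "mark a book share a book find a book"},
}

print(rank(items, "book", {"title": 1.0}))  # ['Bookmark', 'Book']
print(rank(items, "book", {"title": 5.0}))  # ['Book', 'Bookmark']
```

With the neutral boost of 1 the page that merely mentions the term most often wins; only a title boost greater than 1 moves the page actually titled "book" to the top.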

Will Lucene ALWAYS return ALL the documents that match my query same way as the SQL select query does?

I'm using Lucene to index the values that I'm storing in an object database. I'm storing a reference (UUID) to the object along with the field names and their corresponding values (Lucene Fields) in the Lucene Document.
My question is will Lucene ALWAYS return ALL the documents that match my query?
Thanks.
It depends on the analyzer you are using, and you can also limit the number of results when searching.
For better searching you can also use Solr, Apache's open-source search platform.
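The point about limiting the number of results is the main difference from a SQL SELECT: in Lucene's Java API you normally ask for the top n hits, for example with IndexSearcher.search(query, n), rather than for every match. The plain-Python sketch below only models that idea; `top_hits`, the scores, and the UUID-like ids are made up.

```python
import heapq


def top_hits(scores, n=None):
    """Return (doc_id, score) pairs by descending score; n caps the result size."""
    matches = [(doc, s) for doc, s in scores.items() if s > 0]
    if n is None:
        return sorted(matches, key=lambda m: -m[1])
    return heapq.nlargest(n, matches, key=lambda m: m[1])


# Hypothetical relevance scores; 0.0 means the document did not match
scores = {"uuid-1": 0.9, "uuid-2": 0.0, "uuid-3": 0.4, "uuid-4": 0.7}

print(top_hits(scores))       # [('uuid-1', 0.9), ('uuid-4', 0.7), ('uuid-3', 0.4)]
print(top_hits(scores, n=2))  # [('uuid-1', 0.9), ('uuid-4', 0.7)]
```

If you need every match, make sure n is at least as large as the number of matching documents, or page through the results.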

All of these words feature

I have a "description" field indexed in Lucene. This field contains a book's description.
How do I achieve "All of these words" functionality on this field using the BooleanQuery class?
For example, if a user types in "top selling book", it should return books that have all of these words in their description.
Thanks!
There are two pieces to get this to work:
1. You need the incoming documents to be analysed properly, so that individual words are tokenised and indexed separately.
2. The user query needs to be tokenised, and the tokens combined with the AND operator.
For #1, there are a number of Analyzers and Tokenizers that come with Lucene - have a look in the org.apache.lucene.analysis package. There are options for many different languages, stemming, stopwords and so on.
For #2, there are again a lot of query parsers that come with Lucene, mainly in the org.apache.lucene.queryParser package. MultiFieldQueryParser might be good for you: to require every term to be present, just call this on the parser:
QueryParser.setDefaultOperator(QueryParser.AND_OPERATOR)
Lucene in Action, although a few versions old, is still accurate and extremely useful for more information on analysis and query parsing.
I believe if you add all query parts (one per term) via
BooleanQuery.add(Query, BooleanClause.Occur)
and set that second parameter to the constant BooleanClause.Occur.MUST, then you should get what you want. The equivalent query syntax would be "+term1 +term2 +term3 ...".
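Both answers come down to the same two steps: tokenise the user's input, then require every token. A minimal plain-Python sketch of that logic follows; it does not use Lucene, and `all_words` and the sample descriptions are made up for illustration.

```python
def all_words(query, descriptions):
    """Return ids of descriptions that contain every word of the query."""
    required = set(query.lower().split())
    return [
        book_id
        for book_id, text in descriptions.items()
        # Subset test: every required word must appear among the tokens
        if required <= set(text.lower().split())
    ]


descriptions = {
    1: "The top selling book of the year",
    2: "A selling guide for book shops",
    3: "Top tips for gardeners",
}

print(all_words("top selling book", descriptions))  # [1]
```

Only the first description contains all three words. Note that the query and the descriptions are lower-cased and split the same way, which mirrors the advice to analyse documents and queries consistently.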