Score Lucene document based on a field value - lucene

I'm using Lucene to find documents, and the score for each document is already stored in a field called score.
Note: This should be basic, but I can't find any recent article that covers it.
How do I use the field value to score the document?
I'm using Lucene 8.6.1.

Related

Configure SOLR query to find the Plurals word along with Singular word while forming Query String

I'm using a Solr query to sort search results based on the entered search text. Currently my query only works for singular words such as filter, car, or floor: if I search for the word filter, it only returns results for filter, but I want the query to return results for the plurals (filters, cars, floors) as well. Right now it returns all results that contain filter, car, or floor, but not their plurals.
This is the Solr query I'm using to sort the results:
https://searchg2.crownpeak.net/NEI-Blogs-Dev/select/?q=custom_s_brand:nei&fl=custom_s_heading,custom_s_article_summary_Image_url,custom_t_content_summary_Image_url_alt,custom_t_content_summary_Desc,custom_s_local_url,custom_s_local_dba,custom_t_heading,termfreq(custom_t_heading,filter*),sum(termfreq(custom_t_heading,filter*)),termfreq(custom_t_content,filter*),sum(termfreq(custom_t_content,filter*))&qf=custom_t_heading&fl=custom_t_content,termfreq(custom_t_content,filter*),sum(termfreq(custom_t_content,filter*))&qf=custom_t_content&sort=sum(termfreq(custom_t_heading,filter*))%20desc,sum(termfreq(custom_t_content,filter*))%20desc&defType=edismax&fq=custom_s_status:Active
Solr performs different kinds of searches depending on the type of the field that you are searching on.
For string fields the search performs an exact match, so if the stored value is "car" it will find "car" but not "cars", and not even "CAR".
If the field you search on is a tokenized text field, the search will match certain variations depending on how the value was tokenized and which filters were applied to it. For example, if you use the built-in text_en field type, it applies transformations that are typical for English text, and a search for "cars" or "CAR" will then match a stored value of "car", because the text_en field stores the stem of the word (e.g. "car" for "cars"). This seems to be what you are after.
It looks like the field that you are searching on (custom_s_brand) is a string field. Perhaps you want to create additional tokenized fields for the brand so that your searches capture a wider range of matches rather than only identical ones.
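To make the difference concrete, here is a toy sketch in plain Java. It is not Solr code: the class and the normalize helper are hypothetical, and lowercasing plus stripping one trailing "s" is only a crude stand-in for the analysis that a field type such as text_en performs.

```java
import java.util.Locale;

final class FieldMatchDemo {
    // Behaves like a string field: the stored value and the query must be identical.
    public static boolean stringFieldMatches(String stored, String query) {
        return stored.equals(query);
    }

    // Hypothetical stand-in for an analyzer chain: lowercase, then drop one trailing "s".
    // Real stemmers handle far more cases than this.
    public static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        if (lower.length() > 3 && lower.endsWith("s")) {
            return lower.substring(0, lower.length() - 1);
        }
        return lower;
    }

    // Behaves like an analyzed text field: both sides are normalized before comparing.
    public static boolean textFieldMatches(String stored, String query) {
        return normalize(stored).equals(normalize(query));
    }

    public static void main(String[] args) {
        System.out.println(stringFieldMatches("car", "cars")); // false
        System.out.println(stringFieldMatches("car", "CAR"));  // false
        System.out.println(textFieldMatches("car", "cars"));   // true
        System.out.println(textFieldMatches("car", "CAR"));    // true
    }
}
```

The point is only that the same normalization is applied to the stored value and to the query, which is why the plural and the singular end up matching each other.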

Is Lucene capable of finding the location of matches within a document?

Say I have 100 documents indexed in Lucene. I want to search for the term "American Airlines". Lucene runs the search and gives me back 10 documents that contain the term "American Airlines". I now want to be able to go through each of these 10 documents in my UI, and highlight/scroll to each of the matches automatically. These are all html documents with uniquely id-ed paragraph tags, so I can scroll using something like http://docurl#p_120 to scroll to <p id="p_120">American Airlines is a big company.</p>. But how do I get Lucene to tell me what paragraph the term is in, and exactly where it is so I can highlight it?
Your question is about highlighting: how to index a text with subdocuments so that you know the id of the subdocument when highlighting.
IMHO you have three possibilities. First, though, let me remind you that Lucene can use offsets (the position in the original text) for highlighting
https://lucene.apache.org/core/6_4_0/highlighter/org/apache/lucene/search/highlight/package-summary.html
and that Lucene knows the concept of sub-documents as "blocked child documents", "nested documents", or "embedded documents".
The three possibilities:
1. Use payloads to store the id of the corresponding subdocument for each occurrence of a term.
2. Store the offset of each occurrence of a term and keep track of the offset at which each new subdocument begins. Store the ids together with the corresponding offsets in an extra field and use this to look up the id for each hit.
3. Index the document together with all its subdocuments as extra child documents in one block. Search with http://lucene.apache.org/core/6_4_0/join/index.html?org/apache/lucene/search/join/ToParentBlockJoinCollector.html
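The lookup step of the second possibility (stored offsets plus a table of subdocument start offsets) can be sketched in plain Java. This is a minimal sketch, not Lucene API: the ParagraphLocator class, its method names, and the sample offsets are all hypothetical, and it assumes you have already obtained the start offset of each match from the highlighter or from stored offsets.

```java
import java.util.Map;
import java.util.TreeMap;

final class ParagraphLocator {
    // Start offset of each paragraph in the original text -> paragraph id.
    private final TreeMap<Integer, String> startOffsetToId = new TreeMap<>();

    // Register a paragraph while indexing (the offsets here are assumed to be known).
    public void addParagraph(int startOffset, String paragraphId) {
        startOffsetToId.put(startOffset, paragraphId);
    }

    // Returns the id of the paragraph that contains the given match offset,
    // or null if the offset lies before the first registered paragraph.
    public String paragraphIdAt(int matchStartOffset) {
        Map.Entry<Integer, String> entry = startOffsetToId.floorEntry(matchStartOffset);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        ParagraphLocator locator = new ParagraphLocator();
        locator.addParagraph(0, "p_118");
        locator.addParagraph(240, "p_119");
        locator.addParagraph(512, "p_120");
        // A match starting at offset 530 falls into the paragraph that starts at 512.
        System.out.println(locator.paragraphIdAt(530)); // p_120
    }
}
```

The returned id can then be appended to the document URL as a fragment (for example #p_120), while the match offsets themselves drive the highlighting.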

Is it possible to update Lucene document field from custom token filter?

This is what I want to do:
1. I have some calculation for each token/term/word in the document.
2. After calculating a number for each token/term/word, I would like to sum (for example) these numbers (described in #1) and store the result as an attribute of the document.
As I understand it, for #1 I can use a custom token filter, which will let me perform the calculation on each word. After all the words in a document are processed, I will sum up those numbers for the given document.
My question is: how can I save this aggregate number I calculated? I would like to save it as an attribute or field of the document. As I understand it, I cannot access fields in a token filter. Is there any other strategy to achieve this?
Please guide.

Lucene - exclude fields from being searched

I have a search index and require a lucene query which will conditionally search specified fields. The end result will be that if you're logged into the website, all fields will be searched, or if you're logged out, specified fields will be skipped by modifying the lucene query.
The closest I have at the moment is:
+(term1~ term2~) +_culture:([en-gb TO en-gb] [invariantifieldivaluei TO invariantifieldivaluei]) **-FieldToIgnore1:(term1 term2) -FieldToIgnore2:(term1 term2)**
The problem with this, however, is that if one of the search terms exists in one of the fields to ignore (FieldToIgnore1 or FieldToIgnore2), then the whole document is excluded from the results, because the negative clause for that field matched.
How can this be modified so lucene doesn't even match against the fields to ignore?
Instead of qualifying your search via Lucene and the Smart Search Results webpart, have you tried modifying the searchability of the document fields themselves? You can set search parameters on the Page Type or on the index itself.
Go to Page Types --> [your doc type] --> Search fields, and set which fields are and aren't exposed to searching.
Version 9 gives you these settings in the Smart Search app. See these docs for details.
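If the query itself has to change instead, one option is to list the fields that may be searched rather than negating the ones that may not, since a negative clause removes matching documents instead of narrowing where the positive terms are looked up. Below is a minimal sketch in plain Java that only builds the query string. The class, the method, and the field names (Title, Body, MembersOnlyNotes) are hypothetical, it assumes those fields are indexed individually, and it does not escape special characters in the terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class FieldScopedQuery {
    // Builds "+(field1:(t1~ t2~) field2:(t1~ t2~))" for the fields that may be searched.
    // Terms are used as-is: escaping of query syntax characters is not handled here.
    public static String build(List<String> searchableFields, List<String> terms) {
        if (searchableFields.isEmpty() || terms.isEmpty()) {
            throw new IllegalArgumentException("need at least one field and one term");
        }
        StringJoiner fuzzyTerms = new StringJoiner(" ");
        for (String term : terms) {
            fuzzyTerms.add(term + "~");
        }
        StringJoiner clauses = new StringJoiner(" ", "+(", ")");
        for (String field : searchableFields) {
            clauses.add(field + ":(" + fuzzyTerms + ")");
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        boolean loggedIn = false;
        // Hypothetical field names; logged-out visitors get the shorter list.
        List<String> fields = loggedIn
                ? Arrays.asList("Title", "Body", "MembersOnlyNotes")
                : Arrays.asList("Title", "Body");
        System.out.println(build(fields, Arrays.asList("term1", "term2")));
    }
}
```

The resulting clause would replace the unqualified +(term1~ term2~) part of the query, and the -FieldToIgnore clauses would no longer be needed.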

How to allow only one find per document searched on Lucene

I only want my Lucene search to give the highest scoring highlighted fragment per document. So say I have 5 documents with the word "performance" on each one three times, I still only want 5 results to be printed and highlighted to the results page. How can I go about doing that? Thanks!
You get only one fragment per document returned from the search by calling getBestFragment, rather than getBestFragments.
If your call to search is returning the same documents more than once, you very likely have more than one copy of the same document in your index. If you intend to create a new index, make sure that you open your IndexWriter with its OpenMode set to IndexWriterConfig.OpenMode.CREATE.