How to boost search based on index type in elasticsearch or lucene? - lucene

I have three food type indices "Italian", "Spanish", "American".
When the user searches for "Cheese", documents from "Italian" appear at the top. Is it possible to "boost" the results so that preference goes to, say, "Spanish"? (I should still get results for Italian, but a numeric boost value for the "Spanish" index should make the ordering of the returned documents favor that index.) Is this possible in a user-input Lucene and/or ES query? If so, how?

Add a term query with a boost for either the _type field or the _index (or both).
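A minimal sketch of that idea, written as a Python dict that would be sent as the search body. The helper name, the `food_type` field, and the boost value are illustrative assumptions, and whether `_index` can be queried this way depends on your Elasticsearch version. The user's query stays in `must`, while a boosted `term` clause in `should` raises the score of documents from the preferred index without excluding the others.

```python
import json

def boost_index_query(user_query, preferred_index, boost):
    """Wrap a query so hits from preferred_index score higher (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # Every hit must still match the user's original query.
                "must": [user_query],
                # Optional clause: adds to the score only for the preferred index.
                "should": [
                    {"term": {"_index": {"value": preferred_index, "boost": boost}}}
                ],
            }
        }
    }

body = boost_index_query({"term": {"food_type": "cheese"}}, "spanish", 2.0)
print(json.dumps(body, indent=2))
```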

Use a script_score as part of the function score query:
function_score: {
script_score: {
script: "doc['_type'].value == '<your _type>' ? _score * <boost_factor> : _score"
}
}

If querying several indices at once, it is possible to specify indices boost at the top level of object passed to Search API:
curl -XGET localhost:9200/italian,spanish,american/_search -d '
{
  "query": {"term": {"food_type": "cheese"}},
  "indices_boost": {
    "italian": 1.4,
    "spanish": 1.3,
    "american": 1.1
  }
}'
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-index-boost.html#search-request-index-boost
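Because a boost keyed on an index name that is not being searched has nothing to apply to, it can help to build the request programmatically and check the names first. The following is a sketch under that assumption; `build_boosted_search` is a hypothetical helper, not part of any Elasticsearch client. Newer Elasticsearch releases document `indices_boost` as a list of single-key objects rather than one object, so check the reference for your version.

```python
import json

def build_boosted_search(indices, query, boosts):
    """Return (path, body) for a multi-index search with per-index boosts."""
    unknown = set(boosts) - set(indices)
    if unknown:
        # Catches misspelled index names before the request is sent.
        raise ValueError(f"boost given for indices not being searched: {sorted(unknown)}")
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": query, "indices_boost": boosts}
    return path, body

path, body = build_boosted_search(
    ["italian", "spanish", "american"],
    {"term": {"food_type": "cheese"}},
    {"italian": 1.4, "spanish": 1.3, "american": 1.1},
)
print(path)
print(json.dumps(body))
```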

For query-time boosting, queries (ex. query string) generally have a boost attribute you can set. Alternatively, you can wrap queries in a custom boost factor. I would probably prefer the former, usually.

Related

How to search in a key/value PostgreSQL json field?

I have a field in my PSQL database with this data:
{
"1":{
"wp_post_id":137840,
"sync_at":{
"date":"2021-02-23 22:02:35.958325",
"timezone_type":3,
"timezone":"Europe\/Berlin"
}
},
"3":{
"wp_post_id":773,
"sync_at":{
"date":"2021-05-25 16:17:14.322988",
"timezone_type":3,
"timezone":"Europe\/Berlin"
}
}
}
I try to search records with sync_at lower than another date without success…
Maybe my field format is not good?!
The easiest way to do that would be a JSONPATH query (available since v12):
WHERE jsoncol @@ '$.*.sync_at.date < "2020-06-01 00:00:00"'
The query is simple, but there is no way to make it fast with an index.
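To see what that path expression tests, here is a plain-Python sketch of the same check applied to the sample document from the question. It only mimics the logic (any top-level entry whose `sync_at.date` string sorts before the cutoff) and is not how PostgreSQL evaluates it; the function name is an assumption. The comparison is a string comparison, which works here because the dates are zero-padded in year-month-day order.

```python
def any_sync_before(doc, cutoff):
    """True if any top-level entry has sync_at.date earlier than cutoff."""
    dates = [
        entry["sync_at"]["date"]
        for entry in doc.values()
        if isinstance(entry, dict) and "date" in entry.get("sync_at", {})
    ]
    # Lexicographic comparison is enough for 'YYYY-MM-DD hh:mm:ss' strings.
    return any(d < cutoff for d in dates)

doc = {
    "1": {"wp_post_id": 137840,
          "sync_at": {"date": "2021-02-23 22:02:35.958325",
                      "timezone_type": 3, "timezone": "Europe/Berlin"}},
    "3": {"wp_post_id": 773,
          "sync_at": {"date": "2021-05-25 16:17:14.322988",
                      "timezone_type": 3, "timezone": "Europe/Berlin"}},
}

print(any_sync_before(doc, "2020-06-01 00:00:00"))
print(any_sync_before(doc, "2021-03-01 00:00:00"))
```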

Apache Solr sort based on score and field values

I used the following request
http://localhost:8983/solr/test6/select?q=*:*&sort=product(score,hits)%20desc
to sort results based on their relevancy score as determined by Apache Solr multiplied by a field called hits (integers).
However, I receive the following error message:
{ "responseHeader":{
"status":400,
"QTime":0,
"params":{
"q":"*:*",
"sort":"product(score,hits) desc"}}, "error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"sort param could not be parsed as a query, and is not a field that exists in the index: product(score,hits)",
"code":400}}
Why is it that sort cannot correctly input the function value when:
http://localhost:8983/solr/test6/select?q=*:*&sort=score%20desc
http://localhost:8983/solr/test6/select?q=*:*&sort=hits%20desc
work when a function isn't applied?
NOTE: http://localhost:8983/solr/test6/select?q=*:*&sort=product(hits,2)%20desc where I added the product() function also returns the same error message.
The score value isn't really a field - so you can't use a function to manipulate it in the sort clause.
Instead you can use a multiplicative boost through boost (if you're using edismax) to achieve what you want: &boost=hits. You might want to use log(hits) or something similar (recip for example) instead to avoid large differences in score for just small changes in the number of hits.
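As a sketch of what such a request could look like, the snippet below builds the select URL with `defType=edismax` and a `boost` function. The core name `test6` and the field `hits` come from the question; the helper name is an assumption.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def boosted_select_url(base, q, boost_fn):
    """Build a Solr select URL that multiplies the score by boost_fn."""
    params = {"q": q, "defType": "edismax", "boost": boost_fn}
    return f"{base}/select?{urlencode(params)}"

# log(hits) as suggested above; documents with hits <= 0 need separate handling.
url = boosted_select_url("http://localhost:8983/solr/test6", "*:*", "log(hits)")
print(url)
# Decode the query string again to confirm what Solr would receive.
print(parse_qs(urlsplit(url).query))
```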

Boosting individual elasticsearch indices to have preference in results

I am trying to boost certain indices in my elastic search query. Right now, my query is looking like this.
var query = {
"query": {
"query_string": {
"fields": ["FirstName", "LastName"],
"query": "Hank Hill",
"default_operator": "AND"
}
}
};
var boosted_indices = {
"index_A" : 1.0,
"index_B" : 1.0,
"index_C" : 10.0
};
if (boosted_indices) {
query["indices_boost"] = boosted_indices;
}
// stringify and send query in an http.get request
I know that my query without boosting any indices works as I expect. However, I am still getting a lot of results from "index_A" in my query results, rather than the heavily boosted index_C. I know that there should be a similar number of matching results in A and C, so the issue must be that I am not boosting the query correctly.
Did I set up my query JSON incorrectly? On the tutorial I linked, it did not give much context.
One other thing I noticed.. the "_score" field for the returned documents... all of them are set to null. Might this have something to do with my documents not being boosted according to the index they came from?
I hope you are not using the sort parameter in query. This could be the reason that _score is null and you are not getting expected results.
Does this help?
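If a sort clause does turn out to be in the request, there are two ways to adjust the body, sketched here with hypothetical helper names: drop `sort` so hits are ordered by `_score` (and the index boost can affect the ordering), or keep `sort` and set `track_scores` so that `_score` is at least computed and returned.

```python
def relevance_ordered(body):
    """Copy of body without 'sort', so hits are ordered by _score."""
    return {k: v for k, v in body.items() if k != "sort"}

def sorted_with_scores(body):
    """Copy of body that keeps 'sort' but asks for scores to be computed."""
    return {**body, "track_scores": True}

body = {
    "query": {"query_string": {"fields": ["FirstName", "LastName"],
                               "query": "Hank Hill",
                               "default_operator": "AND"}},
    "indices_boost": {"index_A": 1.0, "index_B": 1.0, "index_C": 10.0},
    "sort": [{"LastName": "asc"}],  # assumed sort clause, not shown in the question
}

print(relevance_ordered(body))
print(sorted_with_scores(body))
```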

Lucene query with filter "without property"

I need to write lucene query/filter to get objects without specific property.
I tried with ... ISNULL:"cm:param_name" but it didn't work.
Edit: I have added new property in aspect but objects that haven't been updated yet don't have it amongst their listed properties (checked with node browser).
With a query like "cm:*", you should only receive documents that have the field "cm" plus content. Note that you have to allow leading wildcard queries by the query parser with setAllowLeadingWildcard(true).
Also check out this post, which deals with a reversed version of your problem:
Find all Lucene documents having a certain field
Can you please be more clear as to what "without property" means? Do you mean that you do not want to specify the field like so, "field:value", and instead set the filter to "value"?
EDIT
Are you generating these field names dynamically, or is this the only field name that can have its value missing? If there is only one field that may or may not appear in your document, you could populate it with a default value when it's missing and then search for that. Otherwise, you could try a negated range query like so: NOT foo:[* TO *]. This should match all documents without a value in the foo field. For performance, in the second case the field should be indexed as a string field (not analyzed).
I managed to get this done with .. AND NOT (@namespace\:property:"")
In Java with Lucene 3.6.2, a FieldValueFilter with negation enabled can be used (although this was not the question):
import org.apache.lucene.search.FieldValueFilter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
final IndexSearcher indexSearcher = getIndexSearcher(); // wherever that comes from
final TopDocs topdocs = indexSearcher.search(new MatchAllDocsQuery(), new FieldValueFilter("cm", true), Integer.MAX_VALUE);
You can use ISUNSET and/or ISNULL for this scenario.
ISUNSET:"cm:title"
ISNULL:"cm:title"

Do Lucene and Sphinx support prefix matching?

If not, how do you make this work with them, and which is better?
E.g. when searching for "mi", I would like results with "microsoft" to potentially show up even though there is no "keyword" like "mi" specifically.
Yes and Yes.
Lucene has PrefixQuery:
BooleanQuery query = new BooleanQuery();
for (String token : tokenize(queryString)) {
query.add(new PrefixQuery(new Term(LABEL_FIELD_NAME, token)), Occur.MUST);
}
return query;
You can also use the Lucene query parser syntax and define the prefix search with a wildcard, e.g. exam*. The query parser syntax also works if you deploy a separate Lucene search server, Solr, which is called through an HTTP API.
In Sphinx it seems you have to do the following:
Set minimum prefix length to a value larger than 0
Enable wildcard syntax
Generate a query string with a wildcard, e.g. exam*
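Conceptually, a prefix search expands the prefix into all indexed terms that start with it. The toy sketch below illustrates that idea on a sorted term list; it is not Lucene or Sphinx internals, and the term list and function name are made up for illustration.

```python
from bisect import bisect_left

def terms_with_prefix(sorted_terms, prefix):
    """Return all terms in a sorted list that start with prefix."""
    # Binary search finds the first term that could carry the prefix.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match either
        matches.append(term)
    return matches

terms = sorted(["apple", "micro", "microsoft", "mint", "oracle"])
print(terms_with_prefix(terms, "mi"))
```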