CustomScore and biggest value from all docs - lucene

Hi, I currently have this custom_score script: "_score + ((parseInt(doc.ad_when.value) - oldestAd) / doc.ad_since.value) * 2". Is it possible to use the largest value across all docs inside a custom score? I want oldestAd to be computed from all of the searched data. MySQL has the MAX function, so this would be easy there.
Example:
All docs have a popularity field, and I want to use the largest value across all docs in the custom score. Is that possible?

Unfortunately, this can only be done in two steps. First, retrieve the first record from the list sorted by ad_when.value, or find the oldestAd value using facets. Then use this value in the custom score. I would suggest passing oldestAd as a script parameter, so that the script text stays the same and Elasticsearch does not have to parse it again on every request.
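The two steps could be sketched roughly as follows. This only builds the two request bodies as strings and does not talk to a cluster. The class and method names are made up for illustration, the ascending sort direction is an assumption (flip it if "oldest" should mean the largest ad_when), and the body layout follows the legacy custom_score query with its query, params and script keys.

```java
class TwoStepCustomScore {
    // The script text never changes between requests; only the parameter does.
    static final String SCRIPT =
        "_score + ((parseInt(doc.ad_when.value) - oldestAd) / doc.ad_since.value) * 2";

    /** Step 1: request the single document that sorts first on ad_when (assumed ascending). */
    public static String oldestAdRequest() {
        return "{\"size\": 1, "
             + "\"query\": {\"match_all\": {}}, "
             + "\"sort\": [{\"ad_when\": {\"order\": \"asc\"}}]}";
    }

    /** Step 2: wrap the real query in custom_score and pass oldestAd as a script parameter. */
    public static String customScoreRequest(String innerQueryJson, long oldestAd) {
        return "{\"query\": {\"custom_score\": {"
             + "\"query\": " + innerQueryJson + ", "
             + "\"params\": {\"oldestAd\": " + oldestAd + "}, "
             + "\"script\": \"" + SCRIPT + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(oldestAdRequest());
        // 1357000000 is a placeholder for the value read from the step 1 response.
        System.out.println(customScoreRequest("{\"match_all\": {}}", 1357000000L));
    }
}
```

The application reads ad_when from the single hit returned by the first request and feeds it into the second one.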

Related

How to keep SQL data and Elasticsearch in-sync, and which to search from?

I've seen two solutions mentioned, and was wondering what most people do.
Use Logstash
Code your application to make writes to Elasticsearch alongside SQL. For example,
public void saveRecord() {
    saveToElasticsearch(); // index the record for search
    saveToSQL();           // persist the record in the SQL database
}
Another question is how to handle actually searching the entity? Do you ONLY use Elasticsearch?
If not, I would assume you fetch from Elasticsearch based on keywords and use the IDs returned to filter your SQL query. My question then, is how do you handle pagination? For example let's say you only want results 50 to 100. First you query Elasticsearch which returns 50-100. Then the SQL query reduces that to 20 results - the other 30 results are in what would've been the next Elasticsearch query (100 - 150 for example). Do you keep going back and forth?
As for your first question check here
As for the second question, if you plan to use Elasticsearch as your search layer, it is better to use it for all the searchable/filterable fields. As you've described, the alternative will get very messy very soon. Use Elasticsearch for all your searches/filters, and even aggregations if it suits your needs. Use the SQL database as your source of truth and just fetch the full payload from there.
In general, if you need to paginate, your search should live in one place; otherwise it will get ugly.
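A minimal sketch of that split, assuming the search layer has already returned one page of IDs in relevance order and the SQL layer has returned the matching rows keyed by ID (for example from a WHERE id IN (...) query). The class and method names are hypothetical. Because pagination happens only on the search side, the SQL step never changes the page size except for rows that are missing from the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SearchThenHydrate {
    /**
     * Returns the SQL rows in the order given by the search hits.
     * IDs with no matching row (e.g. deleted since indexing) are skipped.
     */
    public static <T> List<T> inSearchOrder(List<Long> pageOfIds, Map<Long, T> rowsById) {
        List<T> ordered = new ArrayList<>();
        for (Long id : pageOfIds) {
            T row = rowsById.get(id);
            if (row != null) {
                ordered.add(row);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // IDs as returned by the search layer for one page, in relevance order.
        List<Long> page = Arrays.asList(7L, 3L, 9L);

        // Rows as loaded from SQL; an IN (...) query does not preserve order.
        Map<Long, String> rows = new HashMap<>();
        rows.put(3L, "record 3");
        rows.put(9L, "record 9");
        rows.put(7L, "record 7");

        System.out.println(inSearchOrder(page, rows));
    }
}
```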

Ldap search for objects where attribute X contains multiple values

I would like to know if it is possible to do a search like this:
"give me all objects where description has more than 1 value"
The short answer is no. At least not from a single LDAP Query without somehow parsing the results.
I know of a tool that provides those results. It has not been updated in a while, but it worked the last time I used it.
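If parsing the results yourself is acceptable, one possible approach is to search with a presence filter such as (description=*) and then keep only the entries whose attribute holds more than one value. The sketch below shows just that client-side check using the JDK's javax.naming.directory classes. The class and method names are made up, and the entry is built by hand here instead of coming from a real DirContext.search call.

```java
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.BasicAttribute;
import javax.naming.directory.BasicAttributes;

class MultiValueCheck {
    /** True when the entry carries more than one value for the given attribute. */
    public static boolean hasMultipleValues(Attributes entry, String attrId) {
        Attribute attr = entry.get(attrId);
        return attr != null && attr.size() > 1;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for SearchResult.getAttributes() of one entry.
        Attribute description = new BasicAttribute("description");
        description.add("first value");
        description.add("second value");

        Attributes entry = new BasicAttributes(true); // true = ignore attribute ID case
        entry.put(description);

        System.out.println(hasMultipleValues(entry, "description"));
    }
}
```

In a real search loop, the same check would be applied to the attributes of each returned entry.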

Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting

I am going to switch to the newest version of Lucene (4.10.2) and I'd like to make some optimizations in my index and code.
I would like to use DocValuesField to get values but also for filtering and sorting.
So here I have some questions:
If I'd like to use a range filter (FieldCacheRangeFilter), I need to store the value in an XxxDocValuesField,
but if I want to use a terms filter (FieldCacheTermsFilter), I need to store the value in a SortedDocValuesField.
So it looks like I need two different fields if I want to use both range and terms filters. Am I right? Am I using it correctly?
Another thing is sorting. I can choose between SortedNumericSortField and SortField. The first requires SortedNumericDocValues, the other a NumericDocValuesField. Is there any (big) difference in performance?
Should I use SortedNumericSortField (adding another field to the index)?
And the last one: am I right that all corresponding DocValuesFields will be removed from the index when a doc is removed? I saw an IndexWriter method for updating a doc value, but no method for deleting one.
Regards
Piotr

Default value for missing field in document for elasticsearch statistical facet

I am using a statistical facet (see http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-statistical-facet.html) to perform an aggregation on a few fields across the documents in my Elasticsearch index.
I was wondering if anyone knew whether the API offers a way to supply a default value when a particular field does not exist. For example, if a field does not exist, use 0 (zero) as that field's value. By default it seems to throw a null pointer exception while the aggregation is taking place.
My initial thoughts are to utilize a script field to test if the aggregation field is null and perform the default 0 logic there.
As you stated in your question, you could try a script field as defined here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-statistical-facet.html#_script_field_2
For example: "script" : "_source.place == null ? 0 : _source.place"
I'll admit that I have not tried this on a statistical facet, but I have used a similar script on a terms stats facet and it worked fine.
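For context, a full request body using such a script might look like the sketch below. It only assembles the JSON string, and like the answer it is untested against a real cluster. The class name, method name and facet name are made up for illustration. The script falls back to the default when the field is absent from _source.

```java
class StatisticalFacetRequest {
    /** Builds a statistical facet whose script substitutes defaultValue for a missing field. */
    public static String withDefault(String facetName, String field, long defaultValue) {
        String script = "_source." + field + " == null ? " + defaultValue
                      + " : _source." + field;
        return "{\"query\": {\"match_all\": {}}, "
             + "\"facets\": {\"" + facetName + "\": {"
             + "\"statistical\": {\"script\": \"" + script + "\"}}}}";
    }

    public static void main(String[] args) {
        // "place_stats" is an arbitrary facet name chosen for this example.
        System.out.println(withDefault("place_stats", "place", 0));
    }
}
```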

determine which value produced a hit in SOLR multivalued field type

Suppose I have a multiValued field of type text and I put the values [cat, dog, green, blue] in it. When I execute a query against that field for dog, is there a way to tell that it matched the element at position 1 (counting from 0) of that multiValued field?
Assumption: client does not have any pre-knowledge of what the field type of the field being queried is. (i.e. Solr must provide the answer and the client can't post process the return doc to figure it out because it would not know how SOLR matched the query to the result).
Disclosure: I posted to solr-user list and am getting no traction so I post here now.
Currently, there's no out-of-the-box functionality in Solr that tells you the position of a value in a multiValued field.
Hopefully I understand your question correctly.
If you want to get field index or value there is an ugly workaround:
You could add the index directly in the value e.g. store "1; car", "2; test" and so on. Then use highlighting. When reading the returned fields simply skip the text before the semicolon.
But if you want to query only one type:
You can avoid the multiValued approach and simply store each value in its own field item_i, then query via item_1. To query against all items regardless of the type, you need to use the copyField directive in schema.xml.
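The index-prefix workaround mentioned above could be sketched like this on the client side. The helper names are hypothetical, and the "<position>; <value>" layout simply follows the example in the answer. Encoding happens before indexing, and decoding is applied to the stored or highlighted value that comes back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PositionPrefix {
    /** Prefixes each value with its 1-based position, e.g. "2; dog". */
    public static List<String> encode(List<String> values) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add((i + 1) + "; " + values.get(i));
        }
        return out;
    }

    /** Reads the position from the text before the first semicolon. */
    public static int positionOf(String stored) {
        return Integer.parseInt(stored.substring(0, stored.indexOf(';')).trim());
    }

    /** Returns the text after the first semicolon, i.e. the original value. */
    public static String originalValue(String stored) {
        return stored.substring(stored.indexOf(';') + 1).trim();
    }

    public static void main(String[] args) {
        List<String> indexed = encode(Arrays.asList("cat", "dog", "green", "blue"));
        System.out.println(indexed);

        // A highlighted fragment still carries the prefix in front of the markup.
        String fragment = "2; <em>dog</em>";
        System.out.println(positionOf(fragment) + " -> " + originalValue(fragment));
    }
}
```

One caveat with this layout: a query for a bare number could match the prefix itself, so the prefix format may need adjusting for numeric data.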
The Lucene API allows for this, but I'm not sure if Solr does out of the box. In Lucene you can use the IndexReader.termPositions(Term term) method.