Sorted Pagination on Composite Aggregation - elasticsearch-aggregation

I have Elasticsearch 7.1 documents with the following mapping:
{
"event" : {
"mappings" : {
"properties" : {
"Code1" : {
"type" : "keyword"
},
"Code2" : {
"type" : "keyword"
},
"Date1" : {
"type" : "date"
},
"Date2" : {
"type" : "date"
},
"Value" : {
"type" : "long"
}
}
}
}
}
I want to group the documents by Code1, Code2, Date1 and Date2 into buckets, together with
TotalValue, which is the sum of the Value field of all documents in a bucket, and
Count, which is the number of documents in a bucket.
The final output I want looks like this:
{
{
"Code1": "ABC",
"Code2": "XYZ",
"Date1": "01/01/2022",
"Date2": "31/01/2022",
"TotalValue": "100",
"Count": "3"
},
...
}
I also want paginated output that can be sorted on any of the output fields of a bucket, namely Code1, Code2, Date1, Date2, TotalValue and Count.
Using a composite aggregation, I came up with the query below. It performs the aggregation as required, with a paginated response and sorting on Code1, Code2, Date1 and Date2,
but it cannot do proper sorted pagination on the TotalValue and Count (doc_count) fields.
GET event/_search
{
"size":0,
"aggs": {
"AggregatedBucket": {
"composite": {
"size":"10",
"sources": [
{
"Code1": {
"terms": {
"field": "Code1",
"order": "desc"
}
}
},
{
"Code2": {
"terms": {
"field": "Code2",
"order": "desc"
}
}
},
{
"Date1": {
"terms": {
"field": "Date1",
"order": "desc"
}
}
},
{
"Date2": {
"terms": {
"field": "Date2",
"order": "desc"
}
}
}
]
},
"aggs":{
"TotalValue":{
"sum": {
"field": "Value"
}
}
}
}
}
}
Here is the truncated response I am getting
"aggregations" : {
"AggregatedBucket" : {
"after_key" : {
"Code1" : "ABC2",
"Code2" : "XYZ2",
"Date1" : "02/01/2022",
"Date2" : "02/02/2022"
},
"buckets" : [
{
"key" : {
"Code1" : "ABC1",
"Code2" : "XYZ1",
"Date1" : "01/01/2022",
"Date2" : "01/02/2022"
},
"doc_count" : 1,
"TotalValue" : {
"value" : 4.0
}
},
{
"key" : {
"Code1" : "ABC2",
"Code2" : "XYZ2",
"Date1" : "02/01/2022",
"Date2" : "02/02/2022"
},
"doc_count" : 1,
"TotalValue" : {
"value" : 3.0
}
}
]
}
}
Any alternate way to return my expected response would also be helpful.

Sorry to say this, but you cannot paginate a composite aggregation by an arbitrary sort order. The composite aggregation is already "sorted" on the source keys that you specified, and that ordering is what the after_key pagination relies on.
In your case it will sort
By Code1, in the direction given by "order" (descending in your query; ascending is the default)
If two Code1 values are the same, then by Code2
If two Code2 values are the same, then by Date1
If two Date1 values are the same, then by Date2.
The sub-aggregation that you have created (TotalValue) cannot be used to sort a composite aggregation, and neither can doc_count.
This is and always has been a major drawback of composite aggregations.
If you want to make this less complicated, a simpler way would be to build a concatenated field out of the four fields,
"Code1-Code2-Date1-Date2", and index it in every document. Then perform a terms aggregation on the concatenated field. A terms aggregation sorts its buckets by doc_count in descending order by default, and it can instead be ordered by a single-value metric sub-aggregation such as your sum. This still does not allow you to paginate, but you can set the size of the returned aggregation response to something that is large enough to meet your requirement.
Aggregations have very poor support for pagination. They are intended to take ALL the matching data in the index and produce a single response; they were not designed around the concept of pagination.
HTH.
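One possible workaround, sketched here under the assumption that the full set of buckets is small enough to hold in memory: walk the composite aggregation page by page with after_key, flatten each bucket into a row, then sort and slice on the client. The helper names (collect_all_buckets, sorted_page, fake_fetch) and the stub data are hypothetical. In real use, fetch_page would run the search from the question with "after" set inside the composite block and return the AggregatedBucket object.

```python
from typing import Callable, Optional


def collect_all_buckets(fetch_page: Callable[[Optional[dict]], dict]) -> list:
    """Follow after_key until the composite aggregation returns no more buckets."""
    rows, after = [], None
    while True:
        agg = fetch_page(after)
        buckets = agg.get("buckets", [])
        for bucket in buckets:
            row = dict(bucket["key"])                 # Code1, Code2, Date1, Date2
            row["TotalValue"] = bucket["TotalValue"]["value"]
            row["Count"] = bucket["doc_count"]
            rows.append(row)
        after = agg.get("after_key")
        if not buckets or after is None:
            return rows


def sorted_page(rows, sort_field, descending=False, page=0, size=10):
    """Sort the flattened rows on any output field and return one page."""
    ordered = sorted(rows, key=lambda r: r[sort_field], reverse=descending)
    return ordered[page * size:(page + 1) * size]


# Stub standing in for a real search call; the data is made up for the demo.
_BUCKETS = [
    {"key": {"Code1": "ABC2", "Code2": "XYZ2", "Date1": "02/01/2022", "Date2": "02/02/2022"},
     "doc_count": 1, "TotalValue": {"value": 3.0}},
    {"key": {"Code1": "ABC1", "Code2": "XYZ1", "Date1": "01/01/2022", "Date2": "01/02/2022"},
     "doc_count": 1, "TotalValue": {"value": 4.0}},
]


def fake_fetch(after, page_size=1):
    """Return one page of buckets, starting just past the given after key."""
    start = 0
    if after is not None:
        start = next(i for i, b in enumerate(_BUCKETS) if b["key"] == after) + 1
    chunk = _BUCKETS[start:start + page_size]
    result = {"buckets": chunk}
    if chunk:
        result["after_key"] = chunk[-1]["key"]
    return result


rows = collect_all_buckets(fake_fetch)
top = sorted_page(rows, "TotalValue", descending=True, page=0, size=1)
print(top)
```

The cost is that every bucket has to be fetched before the first sorted page can be shown, so this only makes sense when the number of distinct key combinations is bounded.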

Related

Querying data from Elasticsearch

I'm using Elasticsearch 7.* and trying to run the following script_fields query on the index 'com-prod':
GET /com-prod/_search
{
"script_fields": {
"test1": {
"script": {
"lang": "painless",
"source": "params._source.ElapsedTime"
}
}
}
}
It runs successfully, and this is one of the hits in the output:
"hits" : [
{
"_index" : "com-prod",
"_type" : "_doc",
"_id" : "abcd",
"_score" : 1.0,
"fields" : {
"test1" : [
"29958"
]
}
}
Now I am trying to increment ElapsedTime by 2, as below:
GET /com-prod/_search
{
"script_fields": {
"test2": {
"script": {
"lang": "painless",
"source": "params._source.ElapsedTime + 2"
}
}
}
}
But it actually appends the digit 2 to the output, as below:
"hits" : [
{
"_index" : "com-prod",
"_type" : "_doc",
"_id" : "abcd",
"_score" : 1.0,
"fields" : {
"test2" : [
"299582"
]
}
}
Please advise what could be wrong here, and how to get 29960 as the output.
You are getting 299582 instead of 29960 because the ElapsedTime value in _source is a string ("29958"), so when the script adds 2 to it, the 2 is appended at the end (string concatenation).
To solve this, you can do either of the following:
Create a new index whose mapping declares the ElapsedTime field as an integer, then reindex the data. Note that params._source still returns the value exactly as it was originally sent, so after reindexing read the field from doc values (doc['ElapsedTime'].value + 2), or convert the value to a number during the reindex.
Convert the string to an int in the script, using Integer.parseInt():
GET /com-prod/_search
{
"script_fields": {
"test2": {
"script": {
"lang": "painless",
"source": "Integer.parseInt(params._source.ElapsedTime) + 2"
}
}
}
}
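As an analogy only: Painless follows the Java rule that `+` concatenates when either operand is a string. Python would raise a TypeError for the same expression, so the small helper below (a hypothetical name, not part of any Elasticsearch API) imitates the Java behaviour to show the difference between the two queries.

```python
def java_style_plus(left, right):
    """Imitate Java/Painless `+`: concatenate if either operand is a string."""
    if isinstance(left, str) or isinstance(right, str):
        return str(left) + str(right)
    return left + right


elapsed = "29958"  # the value as it is stored in _source

as_string = java_style_plus(elapsed, 2)        # what the first script does
as_number = java_style_plus(int(elapsed), 2)   # what Integer.parseInt() enables

print(as_string)  # 299582
print(as_number)  # 29960
```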

How to realize sum(field) group by multi field in elasticsearch?

I'm using Logstash to copy row data from MySQL to Elasticsearch. How do I calculate the sum of one field grouped by two other fields?
For example, a table named "Students" has the columns id, class_id, name, gender, age,
and here is a SQL query:
select class_id, gender, sum(age) from Students group by class_id, gender;
How do I translate this SQL into an Elasticsearch high-level REST client API call?
Below is my attempt, but it is not correct:
public TermsAggregationBuilder constructAggregation() {
TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_classid")
.field("classId.keyword");
aggregation = aggregation.subAggregation(AggregationBuilders.terms("by_gender").field("gender.keyword"));
aggregation = aggregation.subAggregation(AggregationBuilders.sum("sum_age")
.field("age"));
return aggregation;
}
The following is the raw query for your SQL statement:
POST aggregation_index_2/_search
{
"size": 0,
"aggs": {
"gender_agg": {
"terms": {
"field": "gender"
},
"aggs": {
"class_id_aggregator": {
"terms": {
"field": "class_id"
},
"aggs": {
"age_sum_aggregator": {
"sum": {
"field": "age"
}
}
}
}
}
}
}
}
Mappings
PUT aggregation_index_2
{
"mappings": {
"properties": {
"gender": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"class_id": {
"type": "integer"
}
}
}
}
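The nesting rule behind that query (one terms aggregation per GROUP BY column, with the sum at the innermost level) can be sketched as a small request-body builder. This is only an illustration: group_by_sum and the by_/sum_ aggregation names are made up here, and the resulting dict is what you would send as the search body.

```python
import json


def group_by_sum(group_fields, sum_field):
    """Nest one terms aggregation per group field, innermost level holds the sum."""
    aggs = {f"sum_{sum_field}": {"sum": {"field": sum_field}}}
    # Wrap from the innermost group field outwards.
    for field in reversed(group_fields):
        aggs = {f"by_{field}": {"terms": {"field": field}, "aggs": aggs}}
    return {"size": 0, "aggs": aggs}


body = group_by_sum(["class_id", "gender"], "age")
print(json.dumps(body, indent=2))
```

The same shape applies in the Java client: the sum has to be added as a sub-aggregation of the inner terms aggregation, not of the outer one.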

Elastic search aggregation

Is there a way to return only one product when it exists in several colors?
For example, suppose I have products with the following properties:
brand,color,title
nike, red, air max
nike, blue, air max
Now I want to create an Elasticsearch query that returns only one product in the aggregation but counts it as two belonging to the brand nike.
{
"query" : {
"match_all" : {}
},
"aggs" : {
"brand" : {
"terms" : {
"field" : "brand"
},
"aggs" : {
"size" : {
"terms" : {
"field" : "title"
}
}
}
}
}
}
I am not able to get the desired results. I want something like select name,color,title, count(*) title from product group by name,title
I think you want to get documents aggregated by name and title.
This can be done using the top_hits aggregation.
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"brand": {
"terms": {
"field": "name"
},
"aggs": {
"size": {
"terms": {
"field": "title"
}
},
"aggs":{
"top_hits" :{
"_source" :[ "name","color","title"],
"size":1
}
}
}
}
}
}
For the count, there is always doc_count in the returned buckets.
Hope this helps!! If I am missing something, do mention.
Thanks

Sorting Elasticsearch results based on size of array type field

I have an Elasticsearch cluster on which I have to run a query sorted by the size of the object array field 'contents'.
So far I have tried:
{
"size": 10,
"from": 0,
"fields" : ['name'],
"query": {
"match_all": {}
},
"sort" : {
"script" : {
"script" : "doc['contents'].values.length",
"order": "desc"
}
}
}
The above query gives me a SearchPhaseExecutionException. The ES query is made from the client side using elasticsearch.angular.js.
Any kind of help will be appreciated.
Script security changed in versions 1.2.x: dynamic scripting is disabled by default. In ES_HOME/config/scripts, create a file called script_score.mvel and add the script:
doc.containsKey('contents') == false ? 0 : doc['contents'].values.size()
Restart Elasticsearch and change your query to:
{
"size": 10,
"from": 0,
"query": {
"match_all": {}
},
"sort": {
"_script": {
"script": "script_score",
"order": "desc",
"type" : "number"
}
}
}
For more information take a look here:
http://www.elasticsearch.org/blog/scripting-security/
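To make the intent of that script concrete, here is a plain Python sketch of the same rule applied to in-memory documents: a document without the field counts as size 0, otherwise the number of values is used, and results are ordered by that number descending. The sample documents are invented, and this only mirrors the logic; it does not call Elasticsearch.

```python
def contents_size(doc):
    """Mirror the script: 0 when the field is missing, else the number of values."""
    return len(doc["contents"]) if "contents" in doc else 0


docs = [
    {"name": "a", "contents": [{"id": 1}]},
    {"name": "b"},                                   # no 'contents' field at all
    {"name": "c", "contents": [{"id": 1}, {"id": 2}, {"id": 3}]},
]

# Equivalent of "order": "desc" on the computed size.
ranked = sorted(docs, key=contents_size, reverse=True)
names = [d["name"] for d in ranked]
print(names)  # ['c', 'a', 'b']
```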

hierarchical faceting with Elasticsearch

I'm using Elasticsearch and need to implement faceted search for hierarchical objects, as follows:
category 1 (10)
subcategory 1 (4)
subcategory 2 (6)
category 2 (X)
...
So I need to get facets for two related objects. The documentation says that this kind of facet is possible for numeric values, but I need it for strings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-stats-facet.html
Here is another interesting topic, unfortunately it's old: http://elasticsearch-users.115913.n3.nabble.com/Pivot-facets-td2981519.html
Is this possible with Elasticsearch?
If so, how can I do that?
The previous solution works really well as long as each document carries no more than one multi-level tag. When a single document has several, a simple aggregation doesn't work, because the flat structure of the Lucene fields mixes the values of different tags in the inner aggregation.
See the example below:
DELETE /test_category
POST /test_category
# Insert a doc with 2 hierarchical tags
POST /test_category/test/1
{
"categories": [
{
"cat_1": "1",
"cat_2": "1.1"
},
{
"cat_1": "2",
"cat_2": "2.2"
}
]
}
# Simple two-levels aggregations query
GET /test_category/test/_search?search_type=count
{
"aggs": {
"main_category": {
"terms": {
"field": "categories.cat_1"
},
"aggs": {
"sub_category": {
"terms": {
"field": "categories.cat_2"
}
}
}
}
}
}
This is the WRONG response that I got on ES 1.4, where the values in the inner aggregation are mixed at the document level:
{
...
"aggregations": {
"main_category": {
"buckets": [
{
"key": "1",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1",
"doc_count": 1
},
{
"key": "2.2", <= WRONG
"doc_count": 1
}
]
}
},
{
"key": "2",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1", <= WRONG
"doc_count": 1
},
{
"key": "2.2",
"doc_count": 1
}
]
}
}
]
}
}
}
A solution is to use nested objects. These are the steps:
1) Define a new type in the mapping with nested objects
POST /test_category/test2/_mapping
{
"test2": {
"properties": {
"categories": {
"type": "nested",
"properties": {
"cat_1": {
"type": "string"
},
"cat_2": {
"type": "string"
}
}
}
}
}
}
# Insert a single document
POST /test_category/test2/1
{"categories":[{"cat_1":"1","cat_2":"1.1"},{"cat_1":"2","cat_2":"2.2"}]}
2) Run a nested aggregation query:
GET /test_category/test2/_search?search_type=count
{
"aggs": {
"categories": {
"nested": {
"path": "categories"
},
"aggs": {
"main_category": {
"terms": {
"field": "categories.cat_1"
},
"aggs": {
"sub_category": {
"terms": {
"field": "categories.cat_2"
}
}
}
}
}
}
}
}
This is the response I got, which is now correct:
{
...
"aggregations": {
"categories": {
"doc_count": 2,
"main_category": {
"buckets": [
{
"key": "1",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1",
"doc_count": 1
}
]
}
},
{
"key": "2",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "2.2",
"doc_count": 1
}
]
}
}
]
}
}
}
}
The same solution can be extended to facets with more than two hierarchy levels.
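Why the two mappings behave differently can be illustrated with a small Python simulation. It is a simplified model, not Elasticsearch code: the "flat" version keeps only the set of values per field for each document, which loses the pairing between cat_1 and cat_2, while the "nested" version treats each inner object as its own unit. The function names are hypothetical.

```python
from collections import defaultdict

doc = {"categories": [{"cat_1": "1", "cat_2": "1.1"},
                      {"cat_1": "2", "cat_2": "2.2"}]}


def flat_buckets(docs):
    """Plain object arrays: each field holds all values, so every cat_1 sees every cat_2."""
    out = defaultdict(set)
    for d in docs:
        cat1_values = {c["cat_1"] for c in d["categories"]}
        cat2_values = {c["cat_2"] for c in d["categories"]}
        for value in cat1_values:
            out[value] |= cat2_values
    return {k: sorted(out[k]) for k in sorted(out)}


def nested_buckets(docs):
    """Nested objects: each inner object keeps its own cat_1/cat_2 pair."""
    out = defaultdict(set)
    for d in docs:
        for c in d["categories"]:
            out[c["cat_1"]].add(c["cat_2"])
    return {k: sorted(out[k]) for k in sorted(out)}


flat = flat_buckets([doc])
nested = nested_buckets([doc])
print(flat)    # {'1': ['1.1', '2.2'], '2': ['1.1', '2.2']}
print(nested)  # {'1': ['1.1'], '2': ['2.2']}
```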
Currently, Elasticsearch does not support hierarchical faceting out of the box. But the upcoming 1.0 release features a new aggregations module that can be used to get these kinds of facets (which are more like pivot facets than hierarchical facets). Version 1.0 is currently in beta; you can download the second beta and test aggregations yourself. Your example might look like
curl -XPOST 'localhost:9200/_search?pretty' -d '
{
"aggregations": {
"main category": {
"terms": {
"field": "cat_1",
"order": {"_term": "asc"}
},
"aggregations": {
"sub category": {
"terms": {
"field": "cat_2",
"order": {"_term": "asc"}
}
}
}
}
}
}'
The idea is to have a different field for each level of faceting and to bucket your facets based on the terms of the first level (cat_1). These aggregations then have sub-buckets based on the terms of the second level (cat_2). The result may look like
{
"aggregations" : {
"main category" : {
"buckets" : [ {
"key" : "category 1",
"doc_count" : 10,
"sub category" : {
"buckets" : [ {
"key" : "subcategory 1",
"doc_count" : 4
}, {
"key" : "subcategory 2",
"doc_count" : 6
} ]
}
}, {
"key" : "category 2",
"doc_count" : 7,
"sub category" : {
"buckets" : [ {
"key" : "subcategory 1",
"doc_count" : 3
}, {
"key" : "subcategory 2",
"doc_count" : 4
} ]
}
} ]
}
}
}
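To turn a response of that shape into the indented "category (count)" listing from the question, a short recursive walk over the buckets is enough. The sketch below assumes the aggregation names "main category" and "sub category" from the example above; render_facets is a hypothetical helper name.

```python
def render_facets(aggregations, levels, indent=0):
    """Walk nested terms buckets and return 'key (doc_count)' lines, indented per level."""
    lines = []
    if not levels:
        return lines
    for bucket in aggregations.get(levels[0], {}).get("buckets", []):
        lines.append("  " * indent + f'{bucket["key"]} ({bucket["doc_count"]})')
        # Sub-aggregation results live directly inside the parent bucket.
        lines.extend(render_facets(bucket, levels[1:], indent + 1))
    return lines


response = {
    "aggregations": {
        "main category": {"buckets": [
            {"key": "category 1", "doc_count": 10,
             "sub category": {"buckets": [
                 {"key": "subcategory 1", "doc_count": 4},
                 {"key": "subcategory 2", "doc_count": 6}]}},
            {"key": "category 2", "doc_count": 7,
             "sub category": {"buckets": [
                 {"key": "subcategory 1", "doc_count": 3},
                 {"key": "subcategory 2", "doc_count": 4}]}},
        ]}
    }
}

lines = render_facets(response["aggregations"], ["main category", "sub category"])
print("\n".join(lines))
```

Adding a third level only requires another nested terms aggregation in the request and another name in the levels list.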