Elasticsearch aggregation - Lucene

Is there a way to return only one product if it comes in different colors?
For example, suppose I have products with the following properties:
brand,color,title
nike, red, air max
nike, blue, air max
Now I want to create an Elasticsearch query that returns only one of these products in the aggregation, but still counts two products for the brand nike.
{
"query" : {
"match_all" : {}
},
"aggs" : {
"brand" : {
"terms" : {
"field" : "brand"
},
"aggs" : {
"size" : {
"terms" : {
"field" : "title"
}
}
}
}
}
}
I am not able to get the desired results. I want something like: select name, color, title, count(*) from product group by name, title

I think you want to get documents aggregated by brand (name in your SQL) and title.
This can be done with a top_hits aggregation nested under the title terms aggregation:
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"brand": {
"terms": {
"field": "brand"
},
"aggs": {
"title": {
"terms": {
"field": "title"
},
"aggs": {
"one_product": {
"top_hits": {
"_source": ["brand", "color", "title"],
"size": 1
}
}
}
}
}
}
}
}
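For the count, every returned bucket includes doc_count: in your example both the nike bucket and its "air max" sub-bucket report doc_count 2, while top_hits returns a single product. Note that brand and title need to be not_analyzed (keyword) fields; otherwise the terms aggregation buckets individual tokens such as "air" and "max".
Hope this helps! If I am missing something, do mention it.
Thanks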
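To make the bucket semantics concrete, here is a small pure-Python model (not the Elasticsearch API) of what the terms(brand), terms(title), top_hits(size 1) nesting computes for the two sample products. The function brand_title_agg and its result layout are hypothetical; Elasticsearch picks top hits by score, whereas this model simply keeps the first product it sees.

```python
from typing import Any, Dict, List

Product = Dict[str, str]


def brand_title_agg(products: List[Product], top_hits_size: int = 1) -> Dict[str, Any]:
    """Model of terms(brand) > terms(title) > top_hits(size=top_hits_size)."""
    brands: Dict[str, Any] = {}
    for p in products:
        # Outer terms bucket: one per brand; doc_count counts every product.
        brand = brands.setdefault(p["brand"], {"doc_count": 0, "titles": {}})
        brand["doc_count"] += 1
        # Inner terms bucket: one per title within the brand.
        title = brand["titles"].setdefault(p["title"], {"doc_count": 0, "top_hits": []})
        title["doc_count"] += 1
        # top_hits keeps at most `top_hits_size` documents per bucket.
        if len(title["top_hits"]) < top_hits_size:
            title["top_hits"].append(p)
    return brands


PRODUCTS = [
    {"brand": "nike", "color": "red", "title": "air max"},
    {"brand": "nike", "color": "blue", "title": "air max"},
]

result = brand_title_agg(PRODUCTS)
air_max = result["nike"]["titles"]["air max"]
print(result["nike"]["doc_count"], air_max["doc_count"], len(air_max["top_hits"]))  # 2 2 1
```

The counts come from the buckets, and only the single returned hit is shown per title, which is exactly the "one product, counted twice" behaviour asked for.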

Related

Sorted Pagination on Composite Aggregation

I have Elasticsearch 7.1 documents with the following mappings:
{
"event" : {
"mappings" : {
"properties" : {
"Code1" : {
"type" : "keyword"
},
"Code2" : {
"type" : "keyword"
},
"Date1" : {
"type" : "date"
},
"Date2" : {
"type" : "date"
},
"Value" : {
"type" : "long"
}
}
}
}
}
I want to group the documents by Code1, Code2, Date1 and Date2 into buckets, together with:
TotalValue, the sum of the Value field over all documents in a bucket, and
Count, the number of documents in a bucket.
The final output I want looks like this:
[
{
"Code1": "ABC",
"Code2": "XYZ",
"Date1": "01/01/2022",
"Date2": "31/01/2022",
"TotalValue": "100",
"Count": "3"
},
...
]
I also want paginated output with sorting on any of the bucket's output fields, viz. Code1, Code2, Date1, Date2, TotalValue and Count.
Using a composite aggregation, I came up with the query below. It does the aggregation as required, with a paginated response and sorting on Code1, Code2, Date1 and Date2,
but I am not able to do proper sorted pagination on the TotalValue and Count (doc_count) fields.
GET event/_search
{
"size":0,
"aggs": {
"AggregatedBucket": {
"composite": {
"size":"10",
"sources": [
{
"Code1": {
"terms": {
"field": "Code1",
"order": "desc"
}
}
},
{
"Code2": {
"terms": {
"field": "Code2",
"order": "desc"
}
}
},
{
"Date1": {
"terms": {
"field": "Date1",
"order": "desc"
}
}
},
{
"Date2": {
"terms": {
"field": "Date2",
"order": "desc"
}
}
}
]
},
"aggs":{
"TotalValue":{
"sum": {
"field": "Value"
}
}
}
}
}
}
Here is the truncated response I am getting
"aggregations" : {
"AggregatedBucket" : {
"after_key" : {
"Code1" : "ABC2",
"Code2" : "XYZ2",
"Date1" : "02/01/2022",
"Date2" : "02/02/2022"
},
"buckets" : [
{
"key" : {
"Code1" : "ABC1",
"Code2" : "XYZ1",
"Date1" : "01/01/2022",
"Date2" : "01/02/2022"
},
"doc_count" : 1,
"TotalValue" : {
"value" : 4.0
}
},
{
"key" : {
"Code1" : "ABC2",
"Code2" : "XYZ2",
"Date1" : "02/01/2022",
"Date2" : "02/02/2022"
},
"doc_count" : 1,
"TotalValue" : {
"value" : 3.0
}
}
]
}
}
Any alternate way to return my expected response would also be helpful.
Sorry to say this, but you cannot paginate a composite aggregation by an arbitrary sort order. A composite aggregation is always ordered by the values of the sources you list, and after_key pagination follows that same order.
In your case it will sort:
By Code1, in the order set for that source (desc in your query; asc is the default)
If two Code1 values are the same, by Code2
If two Code2 values are the same, by Date1
If two Date1 values are the same, by Date2.
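A pure-Python model of composite pagination may help here: buckets are sorted by the tuple of source values and each page starts after after_key, so a metric such as TotalValue plays no part in the order. It assumes every source uses the same direction (as in the question) and uses made-up sample documents with ISO dates; composite_page is a hypothetical helper, not an Elasticsearch call.

```python
from typing import Any, Dict, List, Optional, Tuple

# Source names follow the question; the documents are made-up sample data.
SOURCES = ("Code1", "Code2", "Date1", "Date2")

SAMPLE_DOCS = [
    {"Code1": "ABC1", "Code2": "XYZ1", "Date1": "2022-01-01", "Date2": "2022-02-01", "Value": 4},
    {"Code1": "ABC2", "Code2": "XYZ2", "Date1": "2022-01-02", "Date2": "2022-02-02", "Value": 3},
    {"Code1": "ABC3", "Code2": "XYZ1", "Date1": "2022-01-03", "Date2": "2022-02-03", "Value": 10},
    {"Code1": "ABC3", "Code2": "XYZ1", "Date1": "2022-01-03", "Date2": "2022-02-03", "Value": 5},
]


def composite_page(
    docs: List[Dict[str, Any]],
    size: int,
    after_key: Optional[Dict[str, Any]] = None,
    descending: bool = True,
) -> Tuple[List[Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Return one page of buckets and the after_key for the next request."""
    buckets: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for doc in docs:
        key = tuple(doc[s] for s in SOURCES)
        bucket = buckets.setdefault(key, {"doc_count": 0, "TotalValue": 0})
        bucket["doc_count"] += 1
        bucket["TotalValue"] += doc["Value"]

    # Order comes only from the composite key; TotalValue is never consulted.
    keys = sorted(buckets, reverse=descending)
    if after_key is not None:
        after = tuple(after_key[s] for s in SOURCES)
        keys = [k for k in keys if (k < after if descending else k > after)]

    page = [{"key": dict(zip(SOURCES, k)), **buckets[k]} for k in keys[:size]]
    next_after = page[-1]["key"] if page else None
    return page, next_after


page1, after1 = composite_page(SAMPLE_DOCS, size=2)
page2, after2 = composite_page(SAMPLE_DOCS, size=2, after_key=after1)
print([(b["key"]["Code1"], b["TotalValue"]) for b in page1])  # [('ABC3', 15), ('ABC2', 3)]
print([(b["key"]["Code1"], b["TotalValue"]) for b in page2])  # [('ABC1', 4)]
```

The bucket with TotalValue 4 lands on the second page after a bucket with TotalValue 3, which is why sorting pages by a metric is not possible with this mechanism.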
The sub-aggregation you have created (TotalValue) cannot be used to sort a composite aggregation, and neither can doc_count.
This is, and always has been, a major drawback of composite aggregations.
If you want to make this less complicated, a simpler way is to build a concatenated field out of the four fields, "Code1-Code2-Date1-Date2", and store it in every document. Then run a terms aggregation on the concatenated field. By default its buckets are sorted by doc_count in descending order, which gives you your Count; to sort by TotalValue instead, order the terms aggregation by the sum sub-aggregation. This still does not let you paginate, but you can set the size of the aggregation response to something large enough to meet your requirement.
Aggregations have very poor support for pagination. They are actually intended to take ALL the data in the index and produce a response. The concept of pagination is not designed around aggregations.
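A sketch of the concatenated-field approach, assuming a hypothetical keyword field named Group that holds the combined value (built here with "|" rather than "-", since dates formatted like 2022-01-01 would contain the separator). It orders the terms aggregation by TotalValue or _count and adds a bucket_sort pipeline aggregation (available in 7.x) to cut out one page. Keep in mind that bucket_sort only pages through the buckets the terms aggregation already returned, so the terms size still has to cover every group.

```python
import json
from typing import Any, Dict

# Hypothetical separator; it must never occur inside the four values.
SEPARATOR = "|"
FIELDS = ("Code1", "Code2", "Date1", "Date2")


def combined_key(doc: Dict[str, Any]) -> str:
    """Value to store in a hypothetical keyword field named "Group"."""
    return SEPARATOR.join(str(doc[f]) for f in FIELDS)


def grouped_request(order_by: str = "TotalValue", page: int = 0,
                    page_size: int = 10, max_groups: int = 1000) -> Dict[str, Any]:
    """Terms on "Group" ordered by a metric, then one page cut out by bucket_sort."""
    return {
        "size": 0,
        "aggs": {
            "AggregatedBucket": {
                "terms": {
                    "field": "Group",
                    "size": max_groups,           # must be large enough for every group
                    "order": {order_by: "desc"},  # "TotalValue" or "_count"
                },
                "aggs": {
                    "TotalValue": {"sum": {"field": "Value"}},
                    # bucket_sort without "sort" only truncates: plain from/size paging
                    "page": {"bucket_sort": {"from": page * page_size, "size": page_size}},
                },
            }
        },
    }


if __name__ == "__main__":
    doc = {"Code1": "ABC", "Code2": "XYZ", "Date1": "01/01/2022", "Date2": "31/01/2022"}
    print(combined_key(doc))  # ABC|XYZ|01/01/2022|31/01/2022
    print(json.dumps(grouped_request(order_by="_count", page=1), indent=2))
```

Each bucket key can be split back on the separator to recover Code1, Code2, Date1 and Date2 for the output rows.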
HTH.

How to know if a geo coordinate lies within a geo polygon in elasticsearch?

I am using Elasticsearch 1.4.1 - 1.4.4. I'm trying to index a geo polygon shape (document) into my index, and once the shape is indexed I want to know whether a given geo coordinate lies within the boundaries of that indexed polygon.
GET /city/_search
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[72.776491, 19.259634],
[72.955705, 19.268060],
[72.945406, 19.189611],
[72.987291, 19.169507],
[72.963945, 19.069596],
[72.914506, 18.994300],
[72.873994, 19.007933],
[72.817689, 18.896882],
[72.816316, 18.941052],
[72.816316, 19.113720],
[72.816316, 19.113720],
[72.790224, 19.192205],
[72.776491, 19.259634]
]
}
}
}
}
}
}
With the above geo_polygon filter I'm able to get all indexed geo coordinates that lie within the polygon, but I also need to know whether a non-indexed geo coordinate lies within this polygon. Is that possible in Elasticsearch 1.4.1?
Yes, the percolator can be used to solve this problem.
In the normal use case of Elasticsearch, we index our docs and then run queries on the indexed data to retrieve the matching documents.
Percolators work the other way around.
With percolators you register your queries, then percolate your documents through the registered queries and get back the queries that match each document.
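The reversal can be pictured with a toy in-memory model: queries are stored under ids, and a document is checked against all of them instead of being stored. ToyPercolator and the two predicate queries below are hypothetical illustrations, not the Elasticsearch percolator API.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Query = Callable[[Doc], bool]


class ToyPercolator:
    """Reverse search: queries are stored, documents are matched against them."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        # Plays the role of indexing a query under an id (e.g. "mumbai").
        self._queries[query_id] = query

    def percolate(self, doc: Doc) -> List[str]:
        # The document itself is never stored; only matching query ids come back.
        return [qid for qid, query in self._queries.items() if query(doc)]


perc = ToyPercolator()
perc.register("indian_users", lambda d: d.get("country") == "IN")
perc.register("adults", lambda d: d.get("age", 0) >= 18)

print(perc.percolate({"country": "IN", "age": 30}))  # ['indian_users', 'adults']
print(perc.percolate({"country": "FR", "age": 12}))  # []
```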
After going through countless Google results and many blogs, I wasn't able to find anything that explains how to use percolators to solve this problem.
So I'm explaining it with an example, so that other people facing the same problem can take a hint from the solution I found. I would welcome it if someone can improve my answer or share a better approach.
For example:
First of all we need to create an index.
PUT /city/
Then we need to add a mapping for the user document, which holds a user's
latitude/longitude, so that it can be percolated against the registered queries.
PUT /city/user/_mapping
{
"user" : {
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
}
Now we can register our geo polygon queries as percolators, using the city name (or any other identifier you want) as the id.
PUT /city/.percolator/mumbai
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[72.776491, 19.259634],
[72.955705, 19.268060],
[72.945406, 19.189611],
[72.987291, 19.169507],
[72.963945, 19.069596],
[72.914506, 18.994300],
[72.873994, 19.007933],
[72.817689, 18.896882],
[72.816316, 18.941052],
[72.816316, 19.113720],
[72.816316, 19.113720],
[72.790224, 19.192205],
[72.776491, 19.259634]
]
}
}
}
}
}
}
Let's register another geo polygon filter for another city
PUT /city/.percolator/delhi
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[76.846998, 28.865160],
[77.274092, 28.841104],
[77.282331, 28.753252],
[77.482832, 28.596619],
[77.131269, 28.395064],
[76.846998, 28.865160]
]
}
}
}
}
}
}
We have now registered two queries as percolators, which we can confirm with this API call:
GET /city/.percolator/_count
To find out whether a geo point lies within any of the registered cities, we can percolate a user document with the query below:
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 19.088415,
"lon" : 72.871248
}
}
}
This returns "mumbai" as the matching _id:
{
"took": 25,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 1,
"matches": [
{
"_index": "city",
"_id": "mumbai"
}
]
}
Trying another query with a different lat-lon:
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 28.539933,
"lon" : 77.331770
}
}
}
This returns "delhi" as the matching _id:
{
"took": 25,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 1,
"matches": [
{
"_index": "city",
"_id": "delhi"
}
]
}
Let's run another query with random lat-lon
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 18.539933,
"lon" : 45.331770
}
}
}
This query returns no matches:
{
"took": 5,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 0,
"matches": []
}
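To sanity-check these expected results offline, here is a plain even-odd ray-casting point-in-polygon test in Python, run against the same two polygons. This is a swapped-in technique, not how Elasticsearch evaluates geo_polygon: it treats longitude/latitude as flat x/y coordinates, which is a reasonable approximation only for small, city-sized polygons. Points are (lon, lat), the same order as the arrays in the queries above; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), same order as the [lon, lat] arrays above

MUMBAI: List[Point] = [
    (72.776491, 19.259634), (72.955705, 19.268060), (72.945406, 19.189611),
    (72.987291, 19.169507), (72.963945, 19.069596), (72.914506, 18.994300),
    (72.873994, 19.007933), (72.817689, 18.896882), (72.816316, 18.941052),
    (72.816316, 19.113720), (72.816316, 19.113720), (72.790224, 19.192205),
    (72.776491, 19.259634),
]
DELHI: List[Point] = [
    (76.846998, 28.865160), (77.274092, 28.841104), (77.282331, 28.753252),
    (77.482832, 28.596619), (77.131269, 28.395064), (76.846998, 28.865160),
]


def point_in_polygon(lon: float, lat: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting on flat lon/lat coordinates."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Edge straddles the horizontal line through the point (flat edges are skipped).
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:  # crossing lies to the right of the point
                inside = not inside
        j = i
    return inside


def cities_containing(lat: float, lon: float) -> List[str]:
    cities = {"mumbai": MUMBAI, "delhi": DELHI}
    return [name for name, poly in cities.items() if point_in_polygon(lon, lat, poly)]


print(cities_containing(19.088415, 72.871248))  # ['mumbai']
print(cities_containing(28.539933, 77.331770))  # ['delhi']
print(cities_containing(18.539933, 45.331770))  # []
```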

SQL Where clause equivalent for Elastic Search

I am trying to create aggregated results in Elasticsearch, but the filter option is not working for me.
I can aggregate data without a filter, e.g.:
select name , material ,sum(price)
from products group by name , material
curl -XGET 'http://localhost:9200/products/_search?pretty=true' -d'
{
"aggs" : {
"product" : {
"terms" : {
"field" : "name"
},
"aggs" : {
"material" : {
"terms" : {
"field" : "material"
},
"aggs" : {
"sum_price" : {
"sum" : {
"field" : "price"
}
}
}
}
}
}
},
"size" : 0
}'
but I am having trouble writing the equivalent DSL query for:
select name , material ,sum(price)
from products
where material = "wood"
group by name , material
Should be something like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"material": "wood"
}
}
}
},
"aggs" : {
"product" : {
"terms" : {
"field" : "name"
},
"aggs" : {
"material" : {
"terms" : {
"field" : "material"
},
"aggs" : {
"sum_price" : {
"sum" : {
"field" : "price"
}
}
}
}
}
}
},
"size" : 0
}
Use a filter if you know the exact value and do not need full-text matching; otherwise, use a match query instead of the filtered query.
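The filtered query above is ES 1.x syntax; it was deprecated in 2.0 and removed in 5.0. On newer versions the same WHERE clause goes into the filter clause of a bool query. The sketch below builds that request body as a Python dict (to send with any HTTP client); where_group_by_body is a hypothetical helper, and it assumes material is a not_analyzed/keyword field so the term filter matches "wood" exactly.

```python
import json
from typing import Any, Dict


def where_group_by_body(material: str) -> Dict[str, Any]:
    """select name, material, sum(price) from products
    where material = <material> group by name, material"""
    return {
        "size": 0,
        # WHERE: a non-scoring exact-match filter inside a bool query
        "query": {"bool": {"filter": [{"term": {"material": material}}]}},
        # GROUP BY name, material with SUM(price) per group
        "aggs": {
            "product": {
                "terms": {"field": "name"},
                "aggs": {
                    "material": {
                        "terms": {"field": "material"},
                        "aggs": {"sum_price": {"sum": {"field": "price"}}},
                    }
                },
            }
        },
    }


print(json.dumps(where_group_by_body("wood"), indent=2))
```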
Alternatively, you can use a match query:
{
"query": {
"bool": {
"must": [
{
"match": {
"material": "wood"
}
}
],
"filter": [
{
"match_all": {}
}
]
}
},
"aggs" : {
"product" : {
"terms" : {
"field" : "name"
},
"aggs" : {
"material" : {
"terms" : {
"field" : "material"
},
"aggs" : {
"sum_price" : {
"sum" : {
"field" : "price"
}
}
}
}
}
}
},
"size" : 0
}

ElasticSearch facet: how do I do this?

I have an ElasticSearch index that contains several million products with over a thousand brands.
What query would I have to use to get a list of all the brands in the index?
Sample product entry:
{
_index: main
_type: one
_id: LA37dcdc7D70QygoV4KjfRU0hqUDhPs=
_version: 4
_score: 1
_source: {
pid: S2dcdcd528950_C243
mid: 6540
url: http://being.successfultogether.co.uk/
price: 4
currency: GBP
brand: Reebok
store: Matalan
}
}
Here is an example of generating facets against a selected field within your documents -
curl -XPOST '<host>:9200/indices/type/_search' -d '
{
"query": {
"match": {
"store": "Matalan"
}
},
"facets": {
"brand": {
"terms": {
"field": "brand"
}
}
}
}'
I think the all terms facet will get every term for a field:
POST /_all/_search
{
"query" : {
"match_all" : { }
},
"facets" : {
"tag" : {
"terms" : {
"field" : "stub",
"all_terms" : true
}
}
}
}
A terms aggregation (ES 1.0 style, shown below) with a very high size will probably return every term and its count, but it is not efficient, and it is not guaranteed to return all of them (or to count them exactly), because each shard only sends back its own top terms.
You can read more about the size and shard_size parameters for aggregations/faceting here:
Elasticsearch Doco 1.0
POST /_all/_search
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "stub",
"size":1000
}
}
},
"size":0
}
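Why a large size is still not a guarantee: each shard returns only its own top terms, and the coordinating node merges those partial lists. The simplified Python model below (made-up brand counts for two shards, hypothetical terms_agg helper, not Elasticsearch's exact algorithm or defaults) shows a brand that is missed entirely, and counts that come out too low, when shard_size is small.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    # Highest count first; ties broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def terms_agg(shards: List[Dict[str, int]], size: int, shard_size: int) -> List[Tuple[str, int]]:
    """Each shard sends its top `shard_size` terms; the merged list is cut to `size`."""
    merged: Counter = Counter()
    for shard in shards:
        for term, count in top(shard, shard_size):
            merged[term] += count
    return top(merged, size)


# Made-up per-shard brand counts.
SHARDS = [
    {"Nike": 5, "Adidas": 4, "Puma": 3},
    {"Reebok": 6, "Puma": 3, "Adidas": 1},
]

print(terms_agg(SHARDS, size=2, shard_size=2))   # [('Reebok', 6), ('Nike', 5)]
print(terms_agg(SHARDS, size=2, shard_size=10))  # [('Puma', 6), ('Reebok', 6)]
```

With shard_size 2, Puma (6 in total) never reaches the top 2, because neither shard ranks it high enough on its own.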
Also, there are faceting plugins that return every term for a field as a list; see here:
Approx Plugin

hierarchical faceting with Elasticsearch

I'm using elasticsearch and need to implement facet search for hierarchical object as follow:
category 1 (10)
subcategory 1 (4)
subcategory 2 (6)
category 2 (X)
...
So I need to get facets for two related objects. The documentation says this kind of facet is possible for numeric values, but I need it for strings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-stats-facet.html
Here is another interesting topic, unfortunately it's old: http://elasticsearch-users.115913.n3.nabble.com/Pivot-facets-td2981519.html
Is this possible with Elasticsearch?
If so, how can I do it?
The solution from the other answer (a terms aggregation with a terms sub-aggregation) works really well as long as each document carries only one multi-level tag. When a single document has more than one, a simple aggregation doesn't work, because Lucene's flat field structure mixes the values of the different objects in the inner aggregation.
See the example below:
DELETE /test_category
POST /test_category
# Insert a doc with 2 hierarchical tags
POST /test_category/test/1
{
"categories": [
{
"cat_1": "1",
"cat_2": "1.1"
},
{
"cat_1": "2",
"cat_2": "2.2"
}
]
}
# Simple two-levels aggregations query
GET /test_category/test/_search?search_type=count
{
"aggs": {
"main_category": {
"terms": {
"field": "categories.cat_1"
},
"aggs": {
"sub_category": {
"terms": {
"field": "categories.cat_2"
}
}
}
}
}
}
This is the WRONG response I got on ES 1.4, where the values in the inner aggregation are mixed at the document level:
{
...
"aggregations": {
"main_category": {
"buckets": [
{
"key": "1",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1",
"doc_count": 1
},
{
"key": "2.2", <= WRONG
"doc_count": 1
}
]
}
},
{
"key": "2",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1", <= WRONG
"doc_count": 1
},
{
"key": "2.2",
"doc_count": 1
}
]
}
}
]
}
}
}
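The mixing can be reproduced with a small Python model: an array of objects is indexed as one multi-valued field per path, so a cat_1 bucket sees every cat_2 value of the document, while nested objects keep each pair separate. flatten and the pairs_seen_* helpers are hypothetical names used only for this illustration.

```python
from itertools import product
from typing import Any, Dict, List, Set, Tuple

# The document indexed as test_category/test/1 above.
DOC: Dict[str, Any] = {
    "categories": [
        {"cat_1": "1", "cat_2": "1.1"},
        {"cat_1": "2", "cat_2": "2.2"},
    ]
}


def flatten(doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Object arrays end up as one multi-valued field per path."""
    flat: Dict[str, List[str]] = {}
    for obj in doc["categories"]:
        for field, value in obj.items():
            flat.setdefault(f"categories.{field}", []).append(value)
    return flat


def pairs_seen_flat(doc: Dict[str, Any]) -> Set[Tuple[str, str]]:
    # Every cat_1 value is combined with every cat_2 value of the same document.
    flat = flatten(doc)
    return set(product(flat["categories.cat_1"], flat["categories.cat_2"]))


def pairs_seen_nested(doc: Dict[str, Any]) -> Set[Tuple[str, str]]:
    # Nested objects keep each (cat_1, cat_2) pair together.
    return {(obj["cat_1"], obj["cat_2"]) for obj in doc["categories"]}


print(sorted(pairs_seen_flat(DOC)))    # [('1', '1.1'), ('1', '2.2'), ('2', '1.1'), ('2', '2.2')]
print(sorted(pairs_seen_nested(DOC)))  # [('1', '1.1'), ('2', '2.2')]
```

The two extra pairs in the flat case are exactly the buckets marked WRONG in the response above.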
One solution is to use nested objects. These are the steps:
1) Define a new type in the schema with nested objects
POST /test_category/test2/_mapping
{
"test2": {
"properties": {
"categories": {
"type": "nested",
"properties": {
"cat_1": {
"type": "string"
},
"cat_2": {
"type": "string"
}
}
}
}
}
}
# Insert a single document
POST /test_category/test2/1
{"categories":[{"cat_1":"1","cat_2":"1.1"},{"cat_1":"2","cat_2":"2.2"}]}
2) Run a nested aggregation query:
GET /test_category/test2/_search?search_type=count
{
"aggs": {
"categories": {
"nested": {
"path": "categories"
},
"aggs": {
"main_category": {
"terms": {
"field": "categories.cat_1"
},
"aggs": {
"sub_category": {
"terms": {
"field": "categories.cat_2"
}
}
}
}
}
}
}
}
This is the response I got, which is now correct:
{
...
"aggregations": {
"categories": {
"doc_count": 2,
"main_category": {
"buckets": [
{
"key": "1",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1",
"doc_count": 1
}
]
}
},
{
"key": "2",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "2.2",
"doc_count": 1
}
]
}
}
]
}
}
}
}
The same solution can be extended to a more than two-levels hierarchy facet.
Currently, Elasticsearch does not support hierarchical faceting out of the box. But the upcoming 1.0 release features a new aggregations module that can be used to get this kind of facet (which is more like a pivot facet than a hierarchical facet). Version 1.0 is currently in beta; you can download the second beta and try out aggregations yourself. Your example might look like:
curl -XPOST 'localhost:9200/_search?pretty' -d '
{
"aggregations": {
"main category": {
"terms": {
"field": "cat_1",
"order": {"_term": "asc"}
},
"aggregations": {
"sub category": {
"terms": {
"field": "cat_2",
"order": {"_term": "asc"}
}
}
}
}
}
}'
The idea is to have a separate field for each level of faceting and to bucket your documents based on the terms of the first level (cat_1). Each of these buckets then has sub-buckets based on the terms of the second level (cat_2). The result may look like:
{
"aggregations" : {
"main category" : {
"buckets" : [ {
"key" : "category 1",
"doc_count" : 10,
"sub category" : {
"buckets" : [ {
"key" : "subcategory 1",
"doc_count" : 4
}, {
"key" : "subcategory 2",
"doc_count" : 6
} ]
}
}, {
"key" : "category 2",
"doc_count" : 7,
"sub category" : {
"buckets" : [ {
"key" : "subcategory 1",
"doc_count" : 3
}, {
"key" : "subcategory 2",
"doc_count" : 4
} ]
}
} ]
}
}
}
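A pure-Python model of what this two-level terms aggregation computes may help when checking results: documents are counted per cat_1 value, and within each of those buckets per cat_2 value, both ordered by term ascending as in the request above. The sample documents are made up to match the counts in the example response, and pivot is a hypothetical helper. Like the aggregation itself, it assumes one cat_1/cat_2 pair per document.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List

# Made-up documents, one cat_1/cat_2 pair each, matching the sample counts above.
DOCS: List[Dict[str, str]] = (
    [{"cat_1": "category 1", "cat_2": "subcategory 1"}] * 4
    + [{"cat_1": "category 1", "cat_2": "subcategory 2"}] * 6
    + [{"cat_1": "category 2", "cat_2": "subcategory 1"}] * 3
    + [{"cat_1": "category 2", "cat_2": "subcategory 2"}] * 4
)


def pivot(docs: List[Dict[str, str]]) -> Dict[str, Dict[str, Any]]:
    """terms(cat_1) with a terms(cat_2) sub-aggregation, both ordered by term asc."""
    main = Counter(d["cat_1"] for d in docs)
    sub: Dict[str, Counter] = defaultdict(Counter)
    for d in docs:
        sub[d["cat_1"]][d["cat_2"]] += 1
    return {
        c1: {"doc_count": main[c1], "sub_category": dict(sorted(sub[c1].items()))}
        for c1 in sorted(main)
    }


for category, bucket in pivot(DOCS).items():
    print(category, bucket["doc_count"], bucket["sub_category"])
```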