ElasticSearch facet: how do I do this? - lucene

I have an ElasticSearch index that contains several million products with over a thousand brands.
What query would I have to use to get a list of all the brands in the index?
Sample product entry:
{
_index: main
_type: one
_id: LA37dcdc7D70QygoV4KjfRU0hqUDhPs=
_version: 4
_score: 1
_source: {
pid: S2dcdcd528950_C243
mid: 6540
url: http://being.successfultogether.co.uk/
price: 4
currency: GBP
brand: Reebok
store: Matalan
}
}

Here is an example of generating facets against a selected field within your documents -
curl -XPUT <host>:9200/indices/type/_search?
{
"query": {
"match": {
"store": "Matalan"
}
},
"facets": {
"brand": {
"terms": {
"field": "brand"
}
}
}
}'

I think the all terms facet will get every term for a field:
POST /_all/_search
{
"query" : {
"match_all" : { }
},
"facets" : {
"tag" : {
"terms" : {
"field" : "stub",
"all_terms" : true
}
}
}
}
Terms aggregation as seen below ES 1.0 style, with a very high size count will probably return you every term and its count, it is not efficient nor is it for sure going to get them all.
You can read more about size and shard size params with aggregations/faceting here:
Elasticsearch Doco 1.0
POST /_all/_search
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "stub",
"size":1000
}
}
},
"size":0
}
ALSO, There are faceting plugins to get every term for a field as a list, see here:
Approx Plugin

Related

Counting $lookup and $unwind documents filtered with $match without getting rid of parent document when all results match

I have a collection "Owners" and I want to return a list of "Owner" matching a filter (any filter), plus the count of "Pet" from the "Pets" collection for that owner, except I don't want the dead pets. (made up example)
I need the returned documents to look exactly like an "Owner" document with the addition of the "petCount" field because I'm using Java Pojos with the Mongo Java driver.
I'm using AWS DocumentDB that does not support $lookup with filters yet. If it did I would use this and I'd be done:
db.Owners.aggregate( [
{ $match: {_id: UUID("b13e733d-2686-4266-a686-d3dae6501887")} },
{ $lookup: { from: 'Pets', as: 'pets', 'let': { ownerId: '$_id' }, pipeline: [ { $match: { $expr: { $ne: ['$state', 'DEAD'] } } } ] } },
{ $addFields: { petCount: { $size: '$pets' } } },
{ $project: { pets: 0 } }
]).pretty()
But since it doesn't this is what I got so far:
db.Owners.aggregate( [
{ $match: {_id: { $in: [ UUID("cbb921f6-50f8-4b0c-833f-934998e5fbff") ] } } },
{ $lookup: { from: 'Pets', localField: '_id', foreignField: 'ownerId', as: 'pets' } },
{ $unwind: { path: '$pets', preserveNullAndEmptyArrays: true } },
{ $match: { 'pets.state': { $ne: 'DEAD' } } },
{ "$group": {
"_id": "$_id",
"doc": { "$first": "$$ROOT" },
"pets": { "$push": "$pets" }
}
},
{ $addFields: { "doc.petCount": { $size: '$pets' } } },
{ $replaceRoot: { "newRoot": "$doc" } },
{ $project: { pets: 0 } }
]).pretty()
This works perfectly, except if an Owner only has "DEAD" pets, then the owner doesn't get returned because all the "document copies" got filtered out by the $match. I'd need the parent document to be returned with petCount = 0 when ALL of them are "DEAD". I cannot figure out how to do this.
Any ideas?
These are the supported operations for DocDB 4.0 https://docs.amazonaws.cn/en_us/documentdb/latest/developerguide/mongo-apis.html
EDIT: update to use $filter as $reduce not supported by aws document DB
You can use $filter to keep only not DEAD pets in the lookup array, then count the size of the remaining array.
Here is the Mongo playground for your reference.
$reduce version
You can use $reduce in your aggregation pipeline to to a conditional sum for the state.
Here is Mongo playground for your reference.
As of January 2022, Amazon DocumentDB added support for $reduce, the solution posted above should work for you.
Reference.

how to count number of keys in embedded mongodb document

I have a mongodb query: (Give me the settings where account='test')
db.collection_name.find({"account" : "test1"}, {settings : 1}).pretty();
where I get the following sample output:
{
"_id" : ObjectId("49830ede4bz08bc0b495f123"),
"settings" : {
"clusterData" : {
"us-south-1" : "cluster1",
"us-east-1" : "cluster2"
},
},
What I'm looking for now, is to give me the account where the clusterData has more than 1 key.
I'm only interested in listing those accounts with (2) or more keys.
I've tried this: (but this doesn't work)
db.collection_name.find({'settings.clusterData.1': {$exists: true}}, {account : 1}).pretty();
Is this possible to do with the current data structure? I don't have the option to redesign this schema.
Your clusterData field is not an array which is why you cannot just filter the number of elements it has. There is a way, though, to get what you want via the aggregation framework. Try this:
db.collection_name.aggregate({
$match: {
"account" : "test1"
}
}, {
$project: {
"settingsAsArraySize": { $size: { $objectToArray: "$settings.clusterData" } },
"settings.clusterData": 1
}
}, {
$match: {
"settingsAsArraySize": { $gt: 1 }
}
}, {
$project: {
"_id": 0,
"settings.clusterData": 1
}
}).pretty();

Elasticsearch sum aggregration

there is documents with category(number), and piece(long) fields. I need some kind of aggregration that group these docs by category and sum all the pieces in it
here how documents looks:
{
"category":"1",
"piece":"10.000"
},
{
"category":"2",
"piece":"15.000"
} ,
{
"category":"2",
"piece":"10.000"
},
{
"category":"3",
"piece":"5.000"
}
...
The query result must be like:
{
"category":"1",
"total_amount":"10.000"
}
{
"category":"2",
"total_amount":"25.000"
}
{
"category":"3",
"total_amount":"5.000"
}
..
any advice ?
You want to do a terms aggregation on the categories, from the records above I see that you are sending them as strings, so they will be categorical variables. Then, as a metric, pass on the sum.
This is how it might look:
"aggs" : {
"categories" : {
"terms" : {
"field" : "category"
},
"aggs" : {
"sum_category" : {
"sum": { "field": "piece" }
}
}
}
}

How to know if a geo coordinate lies within a geo polygon in elasticsearch?

I am using elastic search 1.4.1 - 1.4.4. I'm trying to index a geo polygon shape (document) into my index and now when the shape is indexed i want to know if a geo coordinate lies within the boundaries of that particular indexed geo-polygon shape.
GET /city/_search
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[72.776491, 19.259634],
[72.955705, 19.268060],
[72.945406, 19.189611],
[72.987291, 19.169507],
[72.963945, 19.069596],
[72.914506, 18.994300],
[72.873994, 19.007933],
[72.817689, 18.896882],
[72.816316, 18.941052],
[72.816316, 19.113720],
[72.816316, 19.113720],
[72.790224, 19.192205],
[72.776491, 19.259634]
]
}
}
}
}
}
}
With above geo polygon filter i'm able get all indexed geo-coordinates lies within described polygon but i also need to know if a non-indexed geo-coordinate lies with in this geo polygon or not. My doubt is that if that is possible in the elastic search 1.4.1.
Yes, Percolator can be used to solve this problem.
As in normal use case of Elasticsearch, we index our docs into elasticsearch and then we run queries on indexed data to retrieve matched/ required documents.
But percolators works in a different way of it.
In percolators you register your queries and then you percolate your documents through registered queries and gets back the queries which matches your documents.
After going through infinite number of google results and many of blogs i wasn't able to find any thing which could explain how i can use percolators to solve this problem.
So i'm explaining this with an example so that other people facing same problem can take a hint from my problem and the solution i found. I would like if someone can improve my answer or can share a better approach of doing it.
e.g:-
First of all we need to create an index.
PUT /city/
then, we need to add a mapping for user document which consist a user's
latitude-longitude for percolating against registered queries.
PUT /city/user/_mapping
{
"user" : {
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
}
Now, we can register our geo polygon queries as percolators with id as city name or any other identifier you want to.
PUT /city/.percolator/mumbai
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[72.776491, 19.259634],
[72.955705, 19.268060],
[72.945406, 19.189611],
[72.987291, 19.169507],
[72.963945, 19.069596],
[72.914506, 18.994300],
[72.873994, 19.007933],
[72.817689, 18.896882],
[72.816316, 18.941052],
[72.816316, 19.113720],
[72.816316, 19.113720],
[72.790224, 19.192205],
[72.776491, 19.259634]
]
}
}
}
}
}
}
Let's register another geo polygon filter for another city
PUT /city/.percolator/delhi
{
"query":{
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"location" : {
"points" : [
[76.846998, 28.865160],
[77.274092, 28.841104],
[77.282331, 28.753252],
[77.482832, 28.596619],
[77.131269, 28.395064],
[76.846998, 28.865160]
]
}
}
}
}
}
}
Now we have registered 2 queries as percolators and we can make sure by making this API call.
GET /city/.percolator/_count
Now to know if a geo point exist with any of registered cities we can percolate a user document using below query.
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 19.088415,
"lon" : 72.871248
}
}
}
This will return : _id as "mumbai"
{
"took": 25,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 1,
"matches": [
{
"_index": "city",
"_id": "mumbai"
}
]
}
trying another query with different lat-lon
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 28.539933,
"lon" : 77.331770
}
}
}
This will return : _id as "delhi"
{
"took": 25,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 1,
"matches": [
{
"_index": "city",
"_id": "delhi"
}
]
}
Let's run another query with random lat-lon
GET /city/user/_percolate
{
"doc": {
"location" : {
"lat" : 18.539933,
"lon" : 45.331770
}
}
}
and this query will return no matched results.
{
"took": 5,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"total": 0,
"matches": []
}

Elastic search aggregation

Is there a way to return only one product if it has different color.
e.g. suppose I have a product with following properties:
brand,color,title
nike, red, air max
nike, blue, air max
now I want to create elastic search query to return only one product while aggregation but count as two belonging to brand nike.
{
"query" : {
"match_all" : {}
},
"aggs" : {
"brand" : {
"terms" : {
"field" : "brand"
},
"aggs" : {
"size" : {
"terms" : {
"field" : "title"
}
}
}
}
}
}
I am not able to get desired results. I want like select name,color,title, count(*) title from product group by name,title
I think you want to get document, aggregated by name,title
This can be done using topHits aggregation.
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"brand": {
"terms": {
"field": "name"
},
"aggs": {
"size": {
"terms": {
"field": "title"
}
},
"aggs":{
"top_hits" :{
"_source" :[ "name","color","band"],
"size":1
}
}
}
}
}
}
For count, there is always doc_count in returned buckets.
Hope this helps!! If I am missing something, do mention.
Thanks