I need to convert the following SQL query to DSL and execute it on ES. I could not find an out-of-the-box aggregation that can filter over the results of another aggregation. As a workaround, I am fetching the 'group by count' for each id from ES and filtering the result in my application logic, which is inefficient. Can you suggest a more suitable solution?
select distinct id from index where colA = "something" group by id having count(*) > 10;
index mapping
id : (string)
colA: (string)
Terms aggregation: to get the distinct ids.
Bucket selector: to return the ids with a doc count of more than 10.
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "colA.keyword": "something"   --> where clause
          }
        }
      ]
    }
  },
  "aggs": {
    "distinct_id": {
      "terms": {                          --> group by
        "field": "id.keyword",
        "size": 10
      },
      "aggs": {
        "ids_having_count_morethan_10": {
          "bucket_selector": {            --> having
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count > 10"
          }
        }
      }
    }
  }
}
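A minimal Python sketch of the round trip, assuming a hypothetical local cluster and index name `my_index`: post the query above and read the surviving bucket keys from the response. Note that the terms `size` caps how many groups are returned, so raise it (or switch to a composite aggregation) if there can be more than 10 distinct ids.

```python
# Query from the answer above, as a Python dict.
# "size": 0 is added here to suppress hits, since only the aggregation matters.
QUERY = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"colA.keyword": "something"}}]}},
    "aggs": {
        "distinct_id": {
            "terms": {"field": "id.keyword", "size": 10},
            "aggs": {
                "ids_having_count_morethan_10": {
                    "bucket_selector": {
                        "buckets_path": {"count": "_count"},
                        "script": "params.count > 10",
                    }
                }
            },
        }
    },
}

def extract_ids(resp):
    """The bucket_selector has already dropped buckets with count <= 10,
    so every bucket key that remains in the response qualifies."""
    return [b["key"] for b in resp["aggregations"]["distinct_id"]["buckets"]]

# Usage (hypothetical endpoint and index name):
# import requests, json
# resp = requests.post("http://localhost:9200/my_index/_search",
#                      headers={"Content-Type": "application/json"},
#                      data=json.dumps(QUERY)).json()
# ids = extract_ids(resp)
```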
SELECT prop_type , count(prop_type) As no_of_properties
from prop_type
JOIN prop_for_rent USING (prop_type)
GROUP BY prop_type;
prop_type and prop_for_rent are both tables, and the "prop_type" used in the JOIN and GROUP BY is an attribute name present in both.
The expected result is for the query to count the number of properties in each property type.
This is the result of the SQL version of it:
[Screenshot of SQL query result]
So I need MongoDB to display the same thing: each prop_type and the number of properties of that type.
db.prop_type.aggregate([
  {
    // Equivalent of the SQL JOIN: pull the matching prop_for_rent rows into each prop_type doc
    "$lookup": {
      "from": "prop_for_rent",
      "localField": "prop_type",
      "foreignField": "prop_type",
      "as": "prop_for_rent_docs"
    }
  },
  {
    // One output document per joined row; preserveNullAndEmptyArrays keeps
    // property types that have no matching rentals
    "$unwind": {
      "path": "$prop_for_rent_docs",
      "preserveNullAndEmptyArrays": true
    }
  },
  {
    // Equivalent of GROUP BY prop_type with count(*)
    "$group": {
      "_id": "$prop_type",
      "no_of_properties": { "$sum": 1 }
    }
  }
])
mongoplayground
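If you are running this from Python rather than the shell, the same pipeline can be expressed as plain data and passed to pymongo; a minimal sketch (connection string and database name are hypothetical):

```python
def build_pipeline():
    """The $lookup / $unwind / $group pipeline from the answer above,
    as a Python list ready for collection.aggregate()."""
    return [
        {"$lookup": {
            "from": "prop_for_rent",
            "localField": "prop_type",
            "foreignField": "prop_type",
            "as": "prop_for_rent_docs",
        }},
        {"$unwind": {
            "path": "$prop_for_rent_docs",
            "preserveNullAndEmptyArrays": True,
        }},
        {"$group": {
            "_id": "$prop_type",
            "no_of_properties": {"$sum": 1},
        }},
    ]

# Usage (hypothetical connection details):
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["mydb"]
# for row in db.prop_type.aggregate(build_pipeline()):
#     print(row["_id"], row["no_of_properties"])
```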
I'm using Logstash to load row data from MySQL into Elasticsearch. How can I calculate the sum of one field grouped by two fields?
For example, here is one table named "Students", it has several columns: id, class_id, name, gender, age;
and here is one SQL query:
select class_id, gender, sum(age) from Students group by class_id, gender;
How do I translate this SQL into an Elasticsearch high-level REST client API call?
Below is my attempt, but it is not correct:
public TermsAggregationBuilder constructAggregation() {
TermsAggregationBuilder aggregation = AggregationBuilders.terms("by_classid")
.field("classId.keyword");
aggregation = aggregation.subAggregation(AggregationBuilders.terms("by_gender").field("gender.keyword"));
aggregation = aggregation.subAggregation(AggregationBuilders.sum("sum_age")
.field("age"));
return aggregation;
}
Following is the raw query for your SQL statement:
POST aggregation_index_2/_search
{
  "size": 0,
  "aggs": {
    "gender_agg": {
      "terms": {
        "field": "gender"
      },
      "aggs": {
        "class_id_aggregator": {
          "terms": {
            "field": "class_id"
          },
          "aggs": {
            "age_sum_aggregator": {
              "sum": {
                "field": "age"
              }
            }
          }
        }
      }
    }
  }
}
Mappings
PUT aggregation_index_2
{
  "mappings": {
    "properties": {
      "gender": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "class_id": {
        "type": "integer"
      }
    }
  }
}
I have a simple SQL query in Elasticsearch which I know returns fewer than 100 rows of results. How can I get all these results at once (i.e., without using scroll)? I tried the LIMIT n clause; it works when n is less than or equal to 10 but doesn't work when n is greater than 10.
The Python code for calling the Elasticsearch SQL API is as below.
import requests
import json

url = 'http://10.204.61.127:9200/_xpack/sql'
headers = {'Content-Type': 'application/json'}
query = {
    'query': '''
        select
            date_start,
            sum(spend) as spend
        from
            some_index
        where
            campaign_id = 790
            or campaign_id = 490
        group by
            date_start
    '''
}
response = requests.post(url, headers=headers, data=json.dumps(query))
The above query returns a cursor ID. I tried feeding the cursor ID back into the same SQL API, but it didn't give me more results.
I also tried translating the above SQL query into a native Elasticsearch query using the SQL translate API and wrapped it in the following Python code, but that doesn't work either. I still got only 10 rows of results.
import requests
import json

url = 'http://10.204.61.127:9200/some_index/some_doc/_search'
headers = {'Content-Type': 'application/json'}
query = {
    "size": 0,
    "query": {
        "bool": {
            "should": [
                {"term": {"campaign_id.keyword": {"value": 790, "boost": 1.0}}},
                {"term": {"campaign_id.keyword": {"value": 490, "boost": 1.0}}}
            ],
            "adjust_pure_negative": True,
            "boost": 1.0
        }
    },
    "_source": False,
    "stored_fields": "_none_",
    "aggregations": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {"2735": {"terms": {
                        "field": "date_start",
                        "missing_bucket": False,
                        "order": "asc"
                    }}}
                ]
            },
            "aggregations": {
                "2768": {"sum": {"field": "spend"}}
            }
        }
    }
}
response = requests.post(url, headers=headers, data=json.dumps(query)).json()
POST _sql?format=json
{
  "query": "SELECT field1, field2 FROM indexTableName ORDER BY field1",
  "fetch_size": 10000
}
The above query will return a cursor in the response, which needs to be passed in the next call:
POST _sql?format=json
{
  "cursor": "g/W******lAAABBwA="
}
This resembles the normal scroll method in Elasticsearch.
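The cursor loop can be sketched in Python as below; the endpoint URL is a placeholder, and the `post_sql` callable is injected so the paging logic stays separate from the HTTP call:

```python
def fetch_all_rows(post_sql, sql, fetch_size=10000):
    """Page through the Elasticsearch SQL API until the cursor runs out.

    post_sql: callable taking a request body (dict) and returning the
    parsed JSON response (dict) -- e.g. a thin wrapper over requests.post.
    """
    rows = []
    body = {"query": sql, "fetch_size": fetch_size}
    while True:
        resp = post_sql(body)
        rows.extend(resp.get("rows", []))
        cursor = resp.get("cursor")
        if not cursor:          # no cursor field => last page
            break
        body = {"cursor": cursor}
    return rows

# Usage (hypothetical endpoint):
# import requests
# post_sql = lambda body: requests.post(
#     "http://localhost:9200/_sql?format=json", json=body).json()
# rows = fetch_all_rows(post_sql, "SELECT field1 FROM indexTableName")
```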
Elasticsearch limits the number of hits returned by default, but if you are using Python you can use elasticsearch-dsl:
from elasticsearch_dsl import Search, Q

q = Q('term', Frequency=self._frequency)
q = q & Q("range", **{'@timestamp': {"from": self._start, "to": self._end}})
# scan() iterates over all matching documents, bypassing the size limit
Search().query(q).scan()
With elasticsearch-sql, LIMIT 100 should translate to "size": 100 in traditional query DSL. This will return up to 100 matching results.
Given this request:
POST _xpack/sql/translate
{
"query":"SELECT FlightNum FROM flights LIMIT 100"
}
The translated query is:
{
  "size": 100,
  "_source": {
    "includes": [
      "FlightNum"
    ],
    "excludes": []
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}
So syntax-wise, LIMIT N should do what you expect it to. As to why you're not seeing more results, this is likely something specific to your index, your query, or your data.
There is a setting index.max_result_window which can cap the size of a query, but it defaults to 10K and also should return an error rather than just limiting the results.
For example, if I have a SQL query:
SELECT distinct emp_id, salary FROM TABLE_EMPLOYEE
what would be its Elasticsearch equivalent?
This is what I have come up with until now:
{
  "aggs": {
    "Employee": {
      "terms": {
        "field": ["emp_id", "salary"],
        "size": 1000
      }
    }
  }
}
Instead of sending a list of fields to perform distinct upon, send them as separate aggregations.
{
  "aggs": {
    "Employee": {
      "terms": {
        "field": "emp_id",
        "size": 10
      }
    },
    "Salary": {
      "terms": {
        "field": "salary",
        "size": 10
      }
    }
  },
  "size": 0
}
To answer from our conversation: you would issue the following HTTP command using curl.
curl -XGET localhost:9200/<your index>/<type>/_search?pretty
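Note that two separate terms aggregations give the distinct values of each field independently, not the distinct (emp_id, salary) pairs that the SQL DISTINCT over two columns produces. If you need the pairs, a composite aggregation with two sources is closer; a sketch in Python (the agg names and the injected `search` callable are illustrative), paginating with `after_key`:

```python
def build_distinct_pairs_query(after_key=None, page_size=1000):
    """Composite aggregation over (emp_id, salary) -- the analogue of
    SELECT DISTINCT emp_id, salary."""
    composite = {
        "size": page_size,
        "sources": [
            {"emp_id": {"terms": {"field": "emp_id"}}},
            {"salary": {"terms": {"field": "salary"}}},
        ],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume from the previous page
    return {"size": 0, "aggs": {"distinct_pairs": {"composite": composite}}}

def iter_distinct_pairs(search):
    """search: callable taking a request body and returning the parsed
    JSON response. Yields one dict per distinct (emp_id, salary) pair."""
    after_key = None
    while True:
        resp = search(build_distinct_pairs_query(after_key))
        agg = resp["aggregations"]["distinct_pairs"]
        for bucket in agg["buckets"]:
            yield bucket["key"]
        after_key = agg.get("after_key")
        if after_key is None:   # no after_key => last page
            return
```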
Is there a way to return only one product if it exists in different colors?
For example, suppose I have a product with the following properties:
brand, color, title
nike, red, air max
nike, blue, air max
Now I want to create an Elasticsearch query that returns only one product in the aggregation, but counts it as two belonging to the brand nike.
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "brand": {
      "terms": {
        "field": "brand"
      },
      "aggs": {
        "size": {
          "terms": {
            "field": "title"
          }
        }
      }
    }
  }
}
I am not able to get the desired results. I want something like: select name, color, title, count(*) from product group by name, title
I think you want to get the documents aggregated by name and title.
This can be done using a top_hits aggregation.
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "brand": {
      "terms": {
        "field": "name"
      },
      "aggs": {
        "by_title": {
          "terms": {
            "field": "title"
          },
          "aggs": {
            "top_doc": {
              "top_hits": {
                "_source": ["name", "color", "brand"],
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
For the counts, each returned bucket includes a doc_count.
Hope this helps! If I am missing something, do mention it.
Thanks