Term aggregation with shard_size - elasticsearch-aggregation

I have a terms aggregation written like this, and my index has 18 shards. I have a question about the significance of the shard_size parameter: the higher the shard_size, the more accurate the response.
{
  "terms": {
    "field": "@timestamp",
    "size": 1,
    "shard_size": 1000,
    "order": {
      "_count": "desc"
    }
  }
}
Even though I have only 18 shards, why does the query need a shard_size this large, and why does it then return an accurate response?

Related

There's a count difference between Druid Native Query and Druid SQL

I have a problem with a Druid query.
I wanted to get the row count with hourly granularity, so I used Druid SQL like this:
SELECT TIME_FLOOR(__time, 'PT1H') AS t, count(1) AS cnt FROM mydatasource GROUP BY 1
Then I got a response like this:
[
  {
    "t": "2022-08-31T09:00:00.000Z",
    "cnt": 12427
  },
  {
    "t": "2022-08-31T10:00:00.000Z",
    "cnt": 16693
  },
  {
    "t": "2022-08-31T11:00:00.000Z",
    "cnt": 16694
  },
  ...
But when using a native query like this,
{
  "queryType": "timeseries",
  "dataSource": "mydatasource",
  "intervals": "2022-08-31T07:01Z/2022-09-01T07:01Z",
  "granularity": {
    "type": "period",
    "period": "PT1H",
    "timeZone": "Etc/UTC"
  },
  "aggregations": [
    {
      "name": "count",
      "type": "longSum",
      "fieldName": "count"
    }
  ],
  "context": {
    "skipEmptyBuckets": "true"
  }
}
I get a different result:
[
  {
    "timestamp": "2022-08-31T09:00:00.000Z",
    "result": {
      "count": 1288965
    }
  },
  {
    "timestamp": "2022-08-31T10:00:00.000Z",
    "result": {
      "count": 1431215
    }
  },
  {
    "timestamp": "2022-08-31T11:00:00.000Z",
    "result": {
      "count": 1545258
    }
  },
  ...
I want the native query's result.
What's the problem with my Druid SQL query?
How do I write a SQL query that returns the native query's result?
I found the difference: when using a longSum aggregation, I get the same result as the native query.
So I want to know how to express the aggregation below in SQL.
"aggregations": [
  {
    "type": "longSum",
    "name": "count",
    "fieldName": "count"
  }
]
I found the solution. Query like this:
SELECT TIME_FLOOR(__time, 'PT1H') AS t, sum("count") AS cnt FROM mydatasource GROUP BY 1
Given that your datasource has a "count" column, I'm assuming it comes from an ingestion that uses rollup. This means the original raw rows have been aggregated, and the "count" column holds the number of raw rows that were summarized into each aggregate row.
The native query uses the longSum aggregator over the "count" column.
The original SQL you used just counts the aggregate rows.
So yes, the correct way to get the count of raw rows is SUM("count").
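For completeness, the fixed SQL can also be issued over HTTP. This is only a sketch: the router URL and port below are assumptions for a default Druid setup, not values from the question.

```python
import json

# Assumed default Druid router SQL endpoint; adjust for your cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

def hourly_raw_count_payload(datasource):
    """Build the SQL payload that sums the rollup "count" column,
    matching the native longSum timeseries result."""
    sql = ("SELECT TIME_FLOOR(__time, 'PT1H') AS t, "
           'SUM("count") AS cnt '
           f"FROM {datasource} GROUP BY 1")
    return {"query": sql}

payload = hourly_raw_count_payload("mydatasource")
# requests.post(DRUID_SQL_URL, json=payload)  # with a live cluster
print(json.dumps(payload))
```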

Only structured queries are supported - Firestore REST API

I am trying to run a query with the REST API but I can't get it to work. I am sending this body as JSON:
"structuredQuery": {
  "where": {
    "fieldFilter": {
      "field": {
        "fieldPath": "total"
      },
      "op": "EQUAL",
      "value": {
        "integerValue": "10"
      }
    }
  },
  "from": [{
    "collectionId": "Total"
  }]
}
I am just testing whether queries work. I have a collection called Total with documents, and these documents have a field called total that is an integer value. I am using a POST request with the following URL:
"https://firestore.googleapis.com/v1/projects/MY_PROJECT_NAME/databases/(default)/documents:runQuery"
and I am getting this error:
[{
  "error": {
    "code": 400,
    "message": "only structured queries are supported",
    "status": "INVALID_ARGUMENT"
  }
}]
The CollectionSelector in your structured query is missing the allDescendants field.
If you check this documentation you can see that this field is a flag which can be either true or false but not null, so you have to set it in the query, otherwise it will not work.
You also need to add the select clause to list the fields you want returned, or keep it empty to return all fields.
Finally, the same documentation shows there is a proper order to respect, in which from must be declared before where. So if you change your structured query to the following:
"structuredQuery": {
  "from": [{
    "collectionId": "Total",
    "allDescendants": false
  }],
  "select": { "fields": [] },
  "where": {
    "fieldFilter": {
      "field": {
        "fieldPath": "total"
      },
      "op": "EQUAL",
      "value": {
        "integerValue": "10"
      }
    }
  }
}
It will work as expected.
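Put together as a Python sketch (PROJECT_ID is a placeholder from the question, and a real request also needs an OAuth bearer token, omitted here):

```python
import json

PROJECT_ID = "MY_PROJECT_NAME"  # placeholder
URL = (f"https://firestore.googleapis.com/v1/projects/{PROJECT_ID}"
       "/databases/(default)/documents:runQuery")

body = {
    "structuredQuery": {
        # from comes before where, and allDescendants is set explicitly.
        "from": [{"collectionId": "Total", "allDescendants": False}],
        "select": {"fields": []},
        "where": {
            "fieldFilter": {
                "field": {"fieldPath": "total"},
                "op": "EQUAL",
                "value": {"integerValue": "10"},
            }
        },
    }
}
# requests.post(URL, json=body, headers={"Authorization": f"Bearer {token}"})
print(json.dumps(body))
```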

Return all rows in a Elasticsearch SQL query

I have a simple SQL query in Elasticsearch which I know returns fewer than 100 rows of results. How can I get all these results at once (i.e., without using scroll)? I tried the LIMIT n clause, but it works when n is less than or equal to 10 and doesn't work when n is greater than 10.
The Python code for calling the Elasticsearch SQL API is below.
import requests
import json

url = 'http://10.204.61.127:9200/_xpack/sql'
headers = {
    'Content-Type': 'application/json',
}
query = {
    'query': '''
        select
            date_start,
            sum(spend) as spend
        from
            some_index
        where
            campaign_id = 790
            or
            campaign_id = 490
        group by
            date_start
    '''
}
response = requests.post(url, headers=headers, data=json.dumps(query))
The above query returns a cursor ID. I tried to feed the cursor ID back into the same SQL API, but it didn't give me more results.
I also tried translating the SQL query to a native Elasticsearch query using the SQL translate API and wrapped it in the following Python code, but that doesn't work either: I still get only 10 rows of results.
import requests
import json

url = 'http://10.204.61.127:9200/some_index/some_doc/_search'
headers = {
    'Content-Type': 'application/json',
}
query = {
    "size": 0,
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "campaign_id.keyword": {
                            "value": 790,
                            "boost": 1.0
                        }
                    }
                },
                {
                    "term": {
                        "campaign_id.keyword": {
                            "value": 490,
                            "boost": 1.0
                        }
                    }
                }
            ],
            "adjust_pure_negative": True,
            "boost": 1.0
        }
    },
    "_source": False,
    "stored_fields": "_none_",
    "aggregations": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "2735": {
                            "terms": {
                                "field": "date_start",
                                "missing_bucket": False,
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "2768": {
                    "sum": {
                        "field": "spend"
                    }
                }
            }
        }
    }
}
response = requests.post(url, headers=headers, data=json.dumps(query)).json()
POST _sql?format=json
{
  "query": "SELECT field1, field2 FROM indexTableName ORDER BY field1",
  "fetch_size": 10000
}
The above query returns a cursor in the response, which needs to be passed in the next call:
POST _sql?format=json
{
  "cursor": "g/W******lAAABBwA="
}
This resembles Elasticsearch's normal scroll method.
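The fetch_size/cursor loop can be sketched in Python. The HTTP call is abstracted here as a `post` callable so the paging logic stands on its own; with `requests` it would be something like `lambda body: requests.post(sql_url, json=body).json()` against the `_sql` endpoint.

```python
def fetch_all_sql_rows(post, query, fetch_size=10000):
    """Drain an Elasticsearch SQL query by following cursors.

    `post` is any callable that sends a JSON body to the _sql endpoint
    and returns the decoded response."""
    body = {"query": query, "fetch_size": fetch_size}
    rows = []
    while True:
        resp = post(body)
        rows.extend(resp.get("rows", []))
        cursor = resp.get("cursor")
        if not cursor:
            break
        body = {"cursor": cursor}  # later pages need only the cursor
    return rows

# Stub transport serving two pages, mimicking the cursor handshake.
pages = [{"rows": [[1], [2]], "cursor": "abc"}, {"rows": [[3]]}]
rows = fetch_all_sql_rows(lambda body: pages.pop(0), "SELECT f FROM t")
print(rows)  # [[1], [2], [3]]
```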
Elasticsearch has a limit by default, but if you are using Python you can use elasticsearch-dsl:
from elasticsearch_dsl import Search, Q

q = Q('term', Frequency=self._frequency)
q = q & Q('range', **{'@timestamp': {'from': self._start, 'to': self._end}})
results = Search().query(q).scan()
With elasticsearch-sql, LIMIT 100 should translate to "size": 100 in traditional query DSL. This will return up to 100 matching results.
Given this request:
POST _xpack/sql/translate
{
  "query": "SELECT FlightNum FROM flights LIMIT 100"
}
The translated query is:
{
  "size": 100,
  "_source": {
    "includes": [
      "FlightNum"
    ],
    "excludes": []
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}
So syntax-wise, LIMIT N should do what you expect it to. As to why you're not seeing more results, this is likely something specific to your index, your query, or your data.
There is a setting index.max_result_window which can cap the size of a query, but it defaults to 10K and also should return an error rather than just limiting the results.

Elasticsearch -- get count of log type in last 24 hours

So I have 3 types of logs in my Elasticsearch index:
CA, CT, and Acc.
I am trying to query Elasticsearch to get a count of each for the 24 hours before the call, but I'm not having much luck combining them.
Calling
10.10.23.45:9200/filebeat-*/_count
with
{
  "query": {
    "term": { "type": "ct" }
  }
}
gets me the count, but adding the time range has proved fruitless: when I add a range to the same query, it doesn't work.
I tried using:
{
  "query": {
    "term": { "type": "ct" },
    "range": {
      "date": {
        "gte": "now-1d/d",
        "lt": "now"
      }
    }
  }
}
But this was returned:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[term] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 5,
        "col": 3
      }
    ],
    "type": "parsing_exception",
    "reason": "[term] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 5,
    "col": 3
  },
  "status": 400
}
You need to use a bool query to combine the two queries into one. Try this instead:
POST _search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "type": "ct" } },
        {
          "range": {
            "date": {
              "gte": "now-1d/d",
              "lt": "now"
            }
          }
        }
      ]
    }
  }
}
The following worked for me (note: this is a POST sent to elasticsearch:9200/index/_search):
{
  "query": {
    "bool": {
      "must": [
        { "query_string": { "analyze_wildcard": true, "query": "type:\"acc\"" } },
        { "range": { "@timestamp": { "gte": "now-1h", "lte": "now", "format": "epoch_millis" } } }
      ]
    }
  }
}
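A Python sketch of the same idea, issuing one _count request per log type (the host, index pattern, and the "type"/"date" field names are taken from the question; treat them as assumptions about your mapping):

```python
import json

ES_COUNT_URL = "http://10.10.23.45:9200/filebeat-*/_count"  # from the question

def count_body(log_type):
    """bool query combining the term filter with a 24-hour range."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"type": log_type}},
                    {"range": {"date": {"gte": "now-1d/d", "lt": "now"}}},
                ]
            }
        }
    }

bodies = {t: count_body(t) for t in ("ca", "ct", "acc")}
# for t, b in bodies.items():
#     total = requests.post(ES_COUNT_URL, json=b).json()["count"]
print(json.dumps(bodies["ct"]))
```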

Is it possible to extend graphql response other than just data for pagination?

A GraphQL response normally looks like the following:
{
  "data": [{
    "id": 1,
    "username": "Jon Snow",
    "email": "crow@northofthew.all",
    "age": 20
  }, {
    "id": 2,
    "username": "Tyrion Lannister",
    "email": "drunk@i.mp",
    "age": 34
  }, {
    "id": 3,
    "username": "Sansa Stark",
    "email": "redhead@why.me",
    "age": 17
  }]
}
Is it possible to add metadata to the response, such as pagination, like this?
{
  "pagination": {
    "total": 14,
    "count": 230
  },
  "data": [{
    "id": 1,
    "username": "Jon Snow",
    "email": "crow@northofthew.all",
    "age": 20
  }, {
    "id": 2,
    "username": "Tyrion Lannister",
    "email": "drunk@i.mp",
    "age": 34
  }]
}
I'm using express-graphql and currently put the pagination info in a custom response header, which works but could be better. Since the GraphQL response is already wrapped in "data", it would not be very strange to add more metadata to the response.
Reinforcing what @CommonsWare already stated: according to the specification, that would be an invalid GraphQL response. Regarding pagination, Relay has its own pagination approach called connections, but indeed, several other approaches are possible and even more suitable in some situations (connections aren't a silver bullet).
I want to augment what was already said by adding that the hierarchical nature of GraphQL encourages related data to live at the same level. An example is worth a thousand words, so here it goes:
query Q {
  pagination_info { # what is this info related to? completely unclear
    total
    count
  }
  user {
    friends {
      id
    }
  }
}
Instead...
query Q {
  user {
    friends {
      pagination_info { # fairly obvious that this is related to friends
        total
        count
      }
      friend {
        id
      }
    }
  }
}
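The nested shape can also be produced server-side. Below is a plain-Python sketch of a hypothetical resolver (no GraphQL library involved, and resolve_friends is an invented name) that returns the friends list together with its own pagination_info, matching the second query above:

```python
def resolve_friends(all_friend_ids, first=2):
    """Hypothetical resolver: return a page of friends plus the
    pagination_info that describes that same list."""
    page = all_friend_ids[:first]
    return {
        "pagination_info": {"total": len(all_friend_ids), "count": len(page)},
        "friend": [{"id": fid} for fid in page],
    }

result = resolve_friends([1, 2, 3, 4])
print(result["pagination_info"])  # {'total': 4, 'count': 2}
```

Because the metadata travels inside the field it describes, clients never have to guess which list a given pagination_info belongs to.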