Return all rows in an Elasticsearch SQL query - sql

I have a simple SQL query in Elasticsearch which I know returns fewer than 100 rows of results. How can I get all these results at once (i.e., without using scroll)? I tried the LIMIT n clause, but it only works when n is less than or equal to 10; it doesn't work when n is greater than 10.
The Python code for calling the Elasticsearch SQL API is below.
import requests
import json

url = 'http://10.204.61.127:9200/_xpack/sql'
headers = {
    'Content-Type': 'application/json',
}
query = {
    'query': '''
        select
            date_start,
            sum(spend) as spend
        from
            some_index
        where
            campaign_id = 790
            or
            campaign_id = 490
        group by
            date_start
    '''
}
response = requests.post(url, headers=headers, data=json.dumps(query))
The above query returns a cursor ID. I tried feeding the cursor ID back into the same SQL API, but it didn't give me more results.
I also tried translating the above SQL query to a native Elasticsearch query using the SQL translate API and wrapping it in the following Python code, but that doesn't work either. I still got only 10 rows of results.
import requests
import json

url = 'http://10.204.61.127:9200/some_index/some_doc/_search'
headers = {
    'Content-Type': 'application/json',
}
query = {
    "size": 0,
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "campaign_id.keyword": {
                            "value": 790,
                            "boost": 1.0
                        }
                    }
                },
                {
                    "term": {
                        "campaign_id.keyword": {
                            "value": 490,
                            "boost": 1.0
                        }
                    }
                }
            ],
            "adjust_pure_negative": True,
            "boost": 1.0
        }
    },
    "_source": False,
    "stored_fields": "_none_",
    "aggregations": {
        "groupby": {
            "composite": {
                "size": 1000,
                "sources": [
                    {
                        "2735": {
                            "terms": {
                                "field": "date_start",
                                "missing_bucket": False,
                                "order": "asc"
                            }
                        }
                    }
                ]
            },
            "aggregations": {
                "2768": {
                    "sum": {
                        "field": "spend"
                    }
                }
            }
        }
    }
}
response = requests.post(url, headers=headers, data=json.dumps(query)).json()

POST _sql?format=json
{
  "query": "SELECT field1, field2 FROM indexTableName ORDER BY field1",
  "fetch_size": 10000
}
The above query will return a cursor in the response, which needs to be passed in the next call.
POST _sql?format=json
{
  "cursor": "g/W******lAAABBwA="
}
This resembles the normal scroll method in Elasticsearch.
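For illustration, here is a minimal Python sketch of that cursor loop, reusing the endpoint and field names from the snippets above (both are placeholders):

import requests

# Endpoint and query are placeholders taken from the example above.
url = 'http://10.204.61.127:9200/_sql?format=json'
body = {
    'query': 'SELECT field1, field2 FROM indexTableName ORDER BY field1',
    'fetch_size': 10000
}
rows = []
while True:
    page = requests.post(url, json=body).json()
    rows.extend(page.get('rows', []))
    cursor = page.get('cursor')
    if not cursor:
        break  # no cursor in the response means the result set is exhausted
    body = {'cursor': cursor}  # subsequent pages are fetched by cursor only

print(len(rows))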

Elasticsearch has a result-size limit, but if you are using Python you can use elasticsearch-dsl:
from elasticsearch_dsl import Q, Search

q = Q('term', Frequency=self._frequency)
q = q & Q("range", **{'@timestamp': {"from": self._start, "to": self._end}})
# scan() pages through every matching document instead of stopping at 10
Search().query(q).scan()
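For completeness, a minimal sketch of wiring up a client and consuming the generator that scan() returns (host and index name are assumptions):

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

# Hypothetical host and index name; adjust to your cluster.
client = Elasticsearch(['http://localhost:9200'])
s = Search(using=client, index='some_index')

# scan() streams every matching hit (it pages via the scroll API under
# the hood), so the usual 10-hit default does not apply.
for hit in s.scan():
    print(hit.meta.id)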

With elasticsearch-sql, LIMIT 100 should translate to "size": 100 in traditional query DSL. This will return up to 100 matching results.
Given this request:
POST _xpack/sql/translate
{
  "query": "SELECT FlightNum FROM flights LIMIT 100"
}
The translated query is:
{
  "size": 100,
  "_source": {
    "includes": [
      "FlightNum"
    ],
    "excludes": []
  },
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ]
}
So syntax-wise, LIMIT N should do what you expect it to. As to why you're not seeing more results, this is likely something specific to your index, your query, or your data.
There is a setting index.max_result_window which can cap the size of a query, but it defaults to 10K and also should return an error rather than just limiting the results.
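If you want to rule that out, one way to inspect the setting is a sketch like this (the index name is a placeholder; the setting is only reported if it was explicitly set):

import requests

# Hypothetical index name; returns empty settings unless the value was set.
url = 'http://localhost:9200/some_index/_settings/index.max_result_window'
print(requests.get(url).json())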

Related

Cloudflare GraphQL Analytic API does not have access to the path

When I tried to run this query
query ($zoneID: String!) {
  viewer {
    zones(filter: {zoneTag: $zoneID}) {
      httpRequestsAdaptiveGroups(filter: {date_gt: "2022-05-29"}, limit: 100) {
        count
        dimensions {
          requestSource
        }
        sum {
          visits
          edgeResponseBytes
        }
      }
    }
  }
}
and it gave me this error
{
  "data": null,
  "errors": [
    {
      "message": "zone '0ab45c20ea56c46d2db5999b19221234' does not have access to the path",
      "path": [
        "viewer",
        "zones",
        "0",
        "httpRequestsAdaptiveGroups"
      ],
      "extensions": {
        "code": "authz",
        "timestamp": "2022-06-29T06:14:55.82422442Z"
      }
    }
  ]
}
How do I get access to view httpRequestsAdaptiveGroups? Do I have to upgrade the project plan, given that it is currently on the free tier?
What I've tried so far was giving all the zones read permission, and it still happens.

SQL having equivalent keyword in ES query DSL

I have the following query to be converted to DSL and executed on ES. I could not find a suitable aggregation, or a filter over the results of an aggregation, available out of the box in ES. As an alternative, I am fetching the 'group by count' for each id from ES and filtering the result as part of my application logic, which is not efficient. Can you suggest a more suitable solution?
select distinct id from index where colA = "something" group by id having count(*) > 10;
Index mapping:
id: (string)
colA: (string)
Terms aggregation: to get distinct ids.
Bucket selector: to return ids with a doc count of more than 10.
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "colA.keyword": "something" --> where clause
          }
        }
      ]
    }
  },
  "aggs": {
    "distinct_id": {
      "terms": { --> group by
        "field": "id.keyword",
        "size": 10
      },
      "aggs": {
        "ids_having_count_morethan_10": {
          "bucket_selector": { --> having
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count>10"
          }
        }
      }
    }
  }
}
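As a sketch of running this from Python and collecting the surviving ids (host and index name are assumptions mirroring the question):

import requests

# Hypothetical endpoint; 'index' is the index name from the question.
url = 'http://localhost:9200/index/_search'
body = {
    "size": 0,
    "query": {"bool": {"filter": [{"term": {"colA.keyword": "something"}}]}},
    "aggs": {
        "distinct_id": {
            "terms": {"field": "id.keyword", "size": 10},
            "aggs": {
                "ids_having_count_morethan_10": {
                    "bucket_selector": {
                        "buckets_path": {"count": "_count"},
                        "script": "params.count>10"
                    }
                }
            }
        }
    }
}
resp = requests.post(url, json=body).json()

# bucket_selector prunes buckets that fail the condition, so each remaining
# bucket key is an id with more than 10 matching documents.
ids = [b['key'] for b in resp['aggregations']['distinct_id']['buckets']]
print(ids)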

Karate - filter a specific json key from response based on another static array

I have the following JSON response (reference name: "list"):
[
  {
    "key": "101",
    "val": {
      "portCall": {
        "id": 12664978
      },
      "status": "in-port"
    }
  },
  {
    "key": "102",
    "val": {
      "portCall": {
        "id": 12415798
      },
      "status": "in-port"
    }
  },
  {
    "key": "103",
    "val": {
      "status": "on-voyage",
      "voyage": {
        "id": "7kynv-7lq85"
      }
    }
  },
  {
    "key": "104",
    "val": {
      "status": "on-voyage",
      "voyage": {
        "id": "7kynv-2385"
      }
    }
  }
]
Also, I have an array of a few key values: evImos = [101,102,104]
From that, I have to identify the first key in the "list" response whose status is "on-voyage". So the result should be "104".
I have tried the following and I need some help to make it work. Any help would be appreciated.
* def function getFirst = function(evImos) { for (let num of evImos) { let results = list.filter(d => d["key"] === num && d["val"]["status"] === "on-voyage"); if(results.length === 1) { karate.log(num); return num; } } }
* list.forEach(getFirst(evImos))
I'll just give you one hint. This one line will convert the whole thing into a form that is much easier for you to validate:
* def temp = {}
* list.forEach(x => temp[x.key] = x.val.status)
Which gives you:
{
  "101": "in-port",
  "102": "in-port",
  "103": "on-voyage",
  "104": "on-voyage"
}
Now you can do:
* def isOnVoyage = function(key){ return temp[key] == 'on-voyage' }
Also read this: https://stackoverflow.com/a/59162760/143475
Thanks to @Peter.
Based on his hint, I just tweaked it a little bit to match my requirement and it worked for me.
Here is the working copy for anyone to refer to in the future.
* def temp = {}
* list.forEach(x => temp[x.key] = x.val.status)
* def isOnVoyage = function(keys){ for (let key of keys) { if(temp[key] == 'on-voyage'){ karate.log(key); karate.set('num', key); break; }}}
* isOnVoyage(evImos)

How do I write an ElasticSearch query to find unique elements in columns?

For example, if I have a SQL query:
SELECT distinct emp_id, salary FROM TABLE_EMPLOYEE
what would be its ElasticSearch equivalent?
This is what I have come up with until now:
{
  "aggs": {
    "Employee": {
      "terms": {
        "field": ["emp_id", "salary"],
        "size": 1000
      }
    }
  }
}
Instead of sending a list of fields to perform distinct upon, send them as separate aggregations.
{
  "aggs": {
    "Employee": {
      "terms": {
        "field": "emp_id",
        "size": 10
      }
    },
    "Salary": {
      "terms": {
        "field": "salary",
        "size": 10
      }
    }
  },
  "size": 0
}
To answer from our conversation, you would issue the following HTTP command using curl:
curl -XGET localhost:9200/<your index>/<type>/_search?pretty
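If you are querying from Python instead, a small sketch of reading the distinct values out of the aggregation response (host and index are placeholders):

import requests

# Hypothetical endpoint; substitute your own index name.
url = 'http://localhost:9200/your_index/_search'
body = {
    "size": 0,
    "aggs": {
        "Employee": {"terms": {"field": "emp_id", "size": 10}},
        "Salary": {"terms": {"field": "salary", "size": 10}}
    }
}
resp = requests.post(url, json=body).json()

# Each terms aggregation returns its distinct values as bucket keys.
emp_ids = [b['key'] for b in resp['aggregations']['Employee']['buckets']]
salaries = [b['key'] for b in resp['aggregations']['Salary']['buckets']]
print(emp_ids, salaries)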

Elasticsearch -- get count of log type in last 24 hours

So I have 3 types of logs in my Elasticsearch index:
CA, CT, and Acc
I am trying to query Elasticsearch to get a count of each for the 24 hours before the call, but I'm not having much luck combining them.
Calling
10.10.23.45:9200/filebeat-*/_count
With
{
  "query": {
    "term": { "type": "ct" }
  }
}
Gets me the count, but trying to add the time range has proved fruitless. When I try to add a range to the same query, it doesn't work.
I tried using:
{
  "query": {
    "term": { "type": "ct" },
    "range": {
      "date": {
        "gte": "now-1d/d",
        "lt": "now"
      }
    }
  }
}
But this was returned:
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[term] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 5,
        "col": 3
      }
    ],
    "type": "parsing_exception",
    "reason": "[term] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 5,
    "col": 3
  },
  "status": 400
}
You need to use a bool query to combine the two queries into one. Try this instead.
POST _search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "type": "ct" } },
        {
          "range": {
            "date": {
              "gte": "now-1d/d",
              "lt": "now"
            }
          }
        }
      ]
    }
  }
}
The following worked for me (note: this is a POST sent to elasticsearch:9200/index/_search):
{"query":{"bool":{"must":[{"query_string":{"analyze_wildcard":true,"query":"type:\"acc\""}},{"range":{"#timestamp":{"gte":"now-1h","lte":"now","format":"epoch_millis"}}}]}}}