Elasticsearch 6.4.3: group a date field by hour of day across several days - elasticsearch-aggregation

I want to group by the hour of start_time for documents between 20190701 and 20190710, but I do not want one bucket per hour per day. I want the data divided into 24 buckets, for example: 20190701, 20190801, 20190901... fall into the 01 bucket, 20190702, 20190802, 20190902... fall into the 02 bucket, and so on.
My current query produces one bucket per hour per day, which is not the result I want. How can I solve this problem?
start_time field type as follows:
"start_time":
{
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||epoch_second"
}
My code as followed:
GET qd_analysis/kw/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"uin": {
"value": "111"
}
}
},
{
"range": {
"imp_date": {
"gte": "20190701",
"lte": "20190710"
}
}
}
]
}
},
"aggs": {
"result": {
"date_histogram": {
"field": "start_time",
"time_zone": "+08:00",
"interval": "hour",
"format": "HH",
"order": {
"_count": "desc"
}
}
}
}
}

You will need to use a terms aggregation with a script that extracts the hour of day:
{
"aggs": {
"hour_of_day": {
"terms": {
"script": "doc['start_time'].date.hourOfDay"
}
}
}
}
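To illustrate what the script-based terms aggregation computes, here is a minimal client-side sketch in Python. It is not the Elasticsearch implementation; the sample values and the helper name `hour_of_day_buckets` are made up for illustration. It also assumes the stored values are UTC and that the script reads the hour in UTC, in which case the +08:00 offset from the question would have to be applied separately.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical sample values in the mapping's "yyyy-MM-dd HH:mm:ss" format,
# assumed here to be stored as UTC.
SAMPLE_START_TIMES = [
    "2019-07-01 01:15:00",
    "2019-07-02 01:40:00",
    "2019-07-02 02:05:00",
    "2019-07-10 17:30:00",
]


def hour_of_day_buckets(start_times, utc_offset_hours=0):
    """Count values per hour of day (0-23), ignoring the calendar date."""
    target_zone = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter()
    for raw in start_times:
        utc_time = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        # Shift to the requested offset before reading the hour.
        counts[utc_time.astimezone(target_zone).hour] += 1
    return dict(counts)


buckets_utc = hour_of_day_buckets(SAMPLE_START_TIMES)
buckets_utc8 = hour_of_day_buckets(SAMPLE_START_TIMES, utc_offset_hours=8)
print(buckets_utc)
print(buckets_utc8)
```

Values from different days that share the same hour land in the same bucket, so there can never be more than 24 buckets.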

Related

aggregations merged in hits in elasticsearch

Just for an example, let's say I have a database, or an elastic index, holding sales persons and also all their customer visits past and into the future.
Let's also say I want to produce a list of these salespersons and show how many customer visits each of them has scheduled.
In SQL I would do something like this:
(Note: the SQL is probably not entirely correct; I wrote it here only to show what I intend to do.)
select foo, bar, sum(baz) from table_barbaz
where appointment_date > now()
group by bar
Is it possible to get the same result in Elasticsearch? For example, a list of documents looking something like this:
{
"foo": "Salesmen John",
"bar": "Client visit this week",
"sum_baz": 99
}
Not sure if this is related to nested aggregations or something else.
Below is a mapping that could have been used in this example. As the real mapping is internal IP, I don't really want to share it publicly.
{
"mappings": {
"properties": {
"salesman_id": {
"type": "integer"
},
"salesman_name": {
"type": "keyword"
},
"customer_visit": {
"type": "integer"
},
"customer_visit_start_date": {
"type": "date",
"format": "yyyy-MM-dd||strict_date"
},
"customer_visit_end_date": {
"type": "date",
"format": "yyyy-MM-dd||strict_date"
}
}
}
}
Then, an aggregation query like the following one would give you the number of customer visits for each salesman, for each day:
{
"size": 0,
"aggs": {
"salesmen": {
"terms": {
"field": "salesman_name",
"size": 20
},
"aggs": {
"days": {
"date_histogram": {
"field": "customer_visit_start_date",
"interval": "day"
},
"aggs": {
"visits": {
"sum": {
"field": "customer_visit"
}
}
}
}
}
}
}
}
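The nested buckets that come back from such a query can be flattened into one row per salesman and day, which is close to the list of documents asked for in the question. The following Python sketch assumes the usual bucket layout of a terms > date_histogram > sum response; the sample response and the helper name `flatten_visits` are hypothetical.

```python
# Hypothetical response fragment shaped like the result of the aggregation above;
# the names and numbers are made up for illustration.
SAMPLE_RESPONSE = {
    "aggregations": {
        "salesmen": {
            "buckets": [
                {
                    "key": "Salesman John",
                    "doc_count": 3,
                    "days": {
                        "buckets": [
                            {"key_as_string": "2019-07-01", "doc_count": 2, "visits": {"value": 5.0}},
                            {"key_as_string": "2019-07-02", "doc_count": 1, "visits": {"value": 2.0}},
                        ]
                    },
                }
            ]
        }
    }
}


def flatten_visits(response):
    """Turn the nested salesmen > days > visits buckets into flat rows."""
    rows = []
    for salesman in response["aggregations"]["salesmen"]["buckets"]:
        for day in salesman["days"]["buckets"]:
            rows.append(
                {
                    "salesman_name": salesman["key"],
                    "day": day["key_as_string"],
                    "visits": day["visits"]["value"],
                }
            )
    return rows


rows = flatten_visits(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```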

ES6: Joining of subqueries to two different rows through the AND operator

I have following index:
+-----+-----+-------+
| oid | tag | value |
+-----+-----+-------+
| 1 | t1 | aaa |
| 1 | t2 | bbb |
| 2 | t1 | aaa |
| 2 | t2 | ddd |
| 2 | t3 | eee |
+-----+-----+-------+
where: oid - object ID, tag - property name, value - property value.
Mappings:
"mappings": {
"document": {
"_all": { "enabled": false },
"properties": {
"oid": { "type": "integer" },
"tag": { "type": "text" },
"value": { "type": "text" }
}
}
}
This simple structure allows storing any number of object properties, and it is quite simple to search by one property, or by several using the OR logical operator.
E.g. get object oid's where:
(tag='t1' AND value='aaa') OR (tag='t2' AND value='ddd')
ES query:
{
"_source": { "includes":["oid"] },
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "tag": "t1" } },
{ "term": { "value": "aaa" } }
]
}
},
{
"bool": {
"must": [
{ "term": { "tag": "t2" } },
{ "term": { "value": "ddd" } }
]
}
}
],
"minimum_should_match": "1"
}
}
}
But it is hard to search by two or more properties using the AND logical operator. So the question is how to join two sub-queries that match two different records with the AND operator. E.g. get object oid's where:
(tag='t1' AND value='aaa') AND (tag='t2' AND value='ddd')
In this case the result must be: { "oid": "2" }
Because the data being searched lives in two different records, applying MUST instead of SHOULD in the previous example returns nothing.
I have two equivalents in SQL of what I need:
SELECT i1.[oid]
FROM [index] i1 INNER JOIN [index] i2 ON i1.oid = i2.oid
WHERE
(i1.tag='t1' AND i1.value='aaa')
AND
(i2.tag='t2' AND i2.value='ddd')
---------
SELECT [oid] FROM [index] WHERE tag='t1' AND value='aaa'
INTERSECT
SELECT [oid] FROM [index] WHERE tag='t2' AND value='ddd'
Running the two requests and merging them on the client is not an option.
The Elasticsearch version is 6.1.1.
In order to achieve what you want, you need to use the nested type, i.e. your mapping should look like this:
PUT my-index
{
"mappings": {
"doc": {
"properties": {
"oid": {
"type": "keyword"
},
"data": {
"type": "nested",
"properties": {
"tag": {
"type": "keyword"
},
"value": {
"type": "text"
}
}
}
}
}
}
}
The documents would be indexed like this:
PUT /my-index/doc/_bulk
{ "index": {"_id": 1}}
{ "oid": 1, "data": [ {"tag": "t1", "value": "aaa"}, {"tag": "t2", "value": "bbb"}] }
{ "index": {"_id": 2}}
{ "oid": 2, "data": [ {"tag": "t1", "value": "aaa"}, {"tag": "t2", "value": "ddd"}, {"tag": "t3", "value": "eee"}] }
Then you can make your query work like this:
POST my-index/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"filter": [
{
"term": {
"data.tag": "t1"
}
},
{
"term": {
"data.value": "aaa"
}
}
]
}
}
}
},
{
"nested": {
"path": "data",
"query": {
"bool": {
"filter": [
{
"term": {
"data.tag": "t2"
}
},
{
"term": {
"data.value": "ddd"
}
}
]
}
}
}
}
]
}
}
}
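Because the query repeats the same nested clause once per tag/value pair, the body can be generated for any number of pairs. The Python sketch below only builds the request body as a dictionary; the helper names `nested_pair_filter` and `all_pairs_query` are hypothetical, and the `data` path is taken from the mapping above.

```python
import json


def nested_pair_filter(tag, value, path="data"):
    """One nested clause: tag and value must match inside the same nested object."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "filter": [
                        {"term": {f"{path}.tag": tag}},
                        {"term": {f"{path}.value": value}},
                    ]
                }
            },
        }
    }


def all_pairs_query(pairs, path="data"):
    """AND together one nested clause per (tag, value) pair."""
    return {"query": {"bool": {"filter": [nested_pair_filter(t, v, path) for t, v in pairs]}}}


body = all_pairs_query([("t1", "aaa"), ("t2", "ddd")])
print(json.dumps(body, indent=2))
```

For the two pairs used here, the printed body has the same structure as the query shown above.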
There might be one way, which is a little ugly: adding terms aggregations to your query body.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "tag": "t1" } },
{ "term": { "value": "aaa" } }
]
}
},
{
"bool": {
"must": [
{ "term": { "tag": "t2" } },
{ "term": { "value": "ddd" } }
]
}
}
],
"minimum_should_match": "1"
}
},
"size": 0,
"aggs": {
"find_joined_oid": {
"terms": {
"field": "oid.keyword"
}
}
}
}
If everything goes right, this will output something like
{
"took": 123,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 123,
"max_score": 0,
"hits": []
},
"aggregations": {
"find_joined_oid": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1
},
{
"key": "2",
"doc_count": 2
}
]
}
}
}
Here, in the "aggregations" part,
"key": "1"
means your "oid":"1", and
"doc_count": 1
means there is 1 hit in the query with "oid":"1".
Since you know how many tag/value pairs you are querying, say N, only the "key"s whose "doc_count" equals N are the results you are after. In this example, you are querying tag:t1 (with value aaa) and tag:t2 (with value ddd), thus N=2. You can iterate over the bucket list to find the "key"s whose "doc_count" equals 2.
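The client-side filtering step can be sketched in a few lines of Python. The bucket list is copied from the sample output above; the helper name `oids_matching_all` is hypothetical. The sketch assumes each (oid, tag) pair occurs at most once in the index, and that the terms aggregation returned every relevant oid (by default it returns only the top 10 buckets).

```python
# Buckets copied from the sample aggregation output above.
SAMPLE_BUCKETS = [
    {"key": "1", "doc_count": 1},
    {"key": "2", "doc_count": 2},
]


def oids_matching_all(buckets, required_matches):
    """Keep only the oids that were hit by every one of the N sub-queries."""
    return [bucket["key"] for bucket in buckets if bucket["doc_count"] == required_matches]


# Two tag/value pairs were queried, so N = 2.
matching_oids = oids_matching_all(SAMPLE_BUCKETS, required_matches=2)
print(matching_oids)
```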
However, there should be a better way. If you change your mapping to a document-like style, i.e. store all fields of one oid in one doc, life will be much easier.
{
"properties": {
"oid": { "type": "integer" },
"tag-1": { "type": "text" },
"value-1": { "type": "text" },
"tag-2": { "type": "text" },
"value-2": { "type": "text" }
}
}
When you want to add new tag-value pairs, get the original doc for the oid concerned, add the new pair to it, and put the whole doc back into Elasticsearch with the same _id as the original. Most of the time dynamic mapping will work properly in your case, which means you don't need to declare the mapping for new fields explicitly.
NoSQL databases like Elasticsearch are not designed to handle the kind of SQL-style query you are asking for.

How do I write an ElasticSearch query to find unique elements in columns?

For example, if I have a SQL query:
SELECT distinct emp_id, salary FROM TABLE_EMPLOYEE
what would be its ElasticSearch equivalent?
This is what I have come up with so far:
{
"aggs": {
"Employee": {
"terms": {
"field":["emp_id", "salary" ],
"size": 1000
}
}
}
}
Instead of sending a list of fields to perform distinct upon, send them as separate aggregations.
{
"aggs": {
"Employee": {
"terms": {
"field": "emp_id",
"size": 10
}
},
"Salary":{
"terms": {
"field": "salary",
"size": 10
}
}
},
"size": 0
}
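The same request body can be generated for any list of fields. The Python sketch below only builds the dictionary; the helper name `distinct_values_body` is hypothetical. Note that separate terms aggregations list the distinct values of each field on its own, not the distinct (emp_id, salary) combinations that SELECT DISTINCT would return.

```python
def distinct_values_body(fields, size=10):
    """Build a search body with one terms aggregation per field and no hits."""
    return {
        "size": 0,
        "aggs": {field: {"terms": {"field": field, "size": size}} for field in fields},
    }


body = distinct_values_body(["emp_id", "salary"])
print(body)
```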
Following up on our conversation, you would issue the following HTTP request using curl:
curl -XGET 'localhost:9200/<your index>/<type>/_search?pretty'

Scope 0 count terms in aggregation in ElasticSearch

I am running an aggregation on the "location" field of my documents, which also contain a "city" field. I query the documents on the city field and aggregate them on the location field.
{
"aggs": {
"locations": {
"terms": {
"field": "location",
"min_doc_count": 0
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"city": "mumbai",
"_cache": true
}
}
]
}
}
}
}
}
The counts and aggregations come back fine along with the hits. My problem is that I want the aggregation to include buckets with a doc count of 0, and the aggregation then returns all locations with a 0 count, even those that belong to other cities. I want to get 0-count locations only for that city, i.e. scope the 0-count locations to the city.
I tried to achieve this with a nested aggregation, placing location inside a nested city and then aggregating, and by combining a filter aggregation with a terms aggregation, but I still get the same result. Is there any way to achieve this, or is Elasticsearch inherently built to work like this?
ES version: 1.6
My mapping looks like this:
{
"service": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"location": {
"type": "string",
"index": "not_analyzed"
},
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Sample docs to index
{
"name": "a",
"location": "x",
"city": "mumbai"
}
{
"name": "b",
"location": "x",
"city": "mumbai"
}
{
"name": "c",
"location": "y",
"city": "chennai"
}
You should try to sort your terms aggregation (embedded in a filter aggregation) by ascending doc count, and you'll get all the terms with a 0 doc count first. Note that by default you'll only get the first 10 terms. If you have fewer than 10 terms with a 0 doc count, you'll see them all; otherwise you might need to increase the size parameter to something higher than 10.
{
"aggs": {
"city_filter": {
"filter": {
"term": {
"city": "mumbai"
}
},
"aggs": {
"locations": {
"terms": {
"field": "location",
"min_doc_count": 0,
"size": 20, <----- add this if you have more than ten 0-doc-count terms
"order": { <----- add this to see 0-doc-count first
"_count": "asc"
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"city": "mumbai",
"_cache": true
}
}
]
}
}
}
}
}

Aggregations on most recent document in group using elasticsearch

Suppose there are several documents per person that contain values:
{
"name": "John",
"value": 1,
"timestamp": "2014-06-15"
}
{
"name": "John",
"value": 2,
"timestamp": "2014-06-16"
}
{
"name": "Sam",
"value": 2,
"timestamp": "2014-06-15"
}
{
"name": "Sam",
"value": 3,
"timestamp": "2014-06-16"
}
How do I get a list of the most recent documents for each person?
How do I get an average of the values for the list of the most recent documents for each person? Given the sample data, this would be 2.5, not 2.
Is there some combination of buckets and metrics that could achieve this result? Will I need to implement a custom aggregator as part of a plugin, or must this sort of computation be performed in memory?
If you only need to find the persons with the most recent documents, try something like this:
"aggs": {
"personName": {
"terms": {
"field": "name",
"size": 5,
"order": {"timeCreated": "desc"}
},
"aggs": {
"timeCreated": {
"max": {"field": "timestamp"}
}
}
}
}
The second operation is just an aggregation, and to get the average of the value field you could try something like:
curl -XPOST "http://DOMAIN:9200/your/data/_search" -d'
{
"size": 0,
"aggregations": {
"the_name": {
"terms": {
"field": "name",
"order": {
"value_avg": "desc"
}
},
"aggregations": {
"value_avg": {
"avg": {
"field": "value"
}
}
}
}
}
}'
To solve your first issue, I would recommend ordering the response by date and then, in your own code, ignoring any document whose name you have already seen (that is, filtering the data after Elasticsearch responds).
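The client-side post-processing suggested here can be sketched in Python as follows. It works on the sample documents from the question, keeps the most recent document per name, and averages their values. The helper name `latest_per_person` is hypothetical, and the sketch assumes all matching documents have already been fetched from Elasticsearch.

```python
# Sample documents from the question, with timestamps as ISO date strings.
DOCS = [
    {"name": "John", "value": 1, "timestamp": "2014-06-15"},
    {"name": "John", "value": 2, "timestamp": "2014-06-16"},
    {"name": "Sam", "value": 2, "timestamp": "2014-06-15"},
    {"name": "Sam", "value": 3, "timestamp": "2014-06-16"},
]


def latest_per_person(docs):
    """Return the most recent document for each name."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["name"])
        # ISO yyyy-MM-dd strings sort chronologically, so string comparison is enough.
        if current is None or doc["timestamp"] > current["timestamp"]:
            latest[doc["name"]] = doc
    return latest


latest = latest_per_person(DOCS)
average_latest = sum(doc["value"] for doc in latest.values()) / len(latest)
print(latest)
print(average_latest)
```

The average over the most recent documents is 2.5, matching the figure expected in the question, whereas averaging all four values would give 2.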