Scope 0 count terms in aggregation in ElasticSearch - lucene

i am doing aggregations on "location" field in my document ,where there is also a "city" field in the same document.I am querying the document on city field and aggregating the documents on location field.
{
"aggs": {
"locations": {
"terms": {
"field": "location",
"min_doc_count": 0
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"city": "mumbai",
"_cache": true
}
}
]
}
}
}
}
}
Now the count and aggregations come fine and along with the hits.but my problem is that i want to do aggregation with 'doc-count' set to 0 and the aggregation bucket returns me all the lcoations with 0 count which even falls in other city.I want to get 0 count locations only for that city.want to scope the context of 0 count location to city.
I tried achieving this by nested aggregation placing location inside nested city and then doing aggs, or combining the filter aggs with terms agg but still getting the same result.Is there any way to achieve this or elasticsearch is inherently build to work like this.
ES Version - 1.6
My mapping looks like this:
{
"service": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"location": {
"type": "string",
"index": "not_analyzed"
},
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Sample docs to index
{
"name": "a",
"location": "x",
"city": "mumbai"
}
{
"name": "b",
"location": "x",
"city": "mumbai"
}
{
"name": "c",
"location": "y"
"city": "chennai"
}

You should try to sort your terms aggregation (embedded into a filter aggregation) by ascending doc count and you'll get all the terms with 0 doc count first. Note that by default, you'll only get the first 10 terms, if you have less terms with 0 doc count, you'll see them all, otherwise you might need to increase the size parameter to something higher than 10.
{
"aggs": {
"city_filter": {
"filter": {
"term": {
"city": "mumbai"
}
},
"aggs": {
"locations": {
"terms": {
"field": "location",
"min_doc_count": 0,
"size": 20, <----- add this if you have more than ten 0-doc-count terms
"order": { <----- add this to see 0-doc-count first
"_count": "asc"
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"city": "mumbai",
"_cache": true
}
}
]
}
}
}
}
}

Related

ES6: Joining of subqueries to two different rows through the AND operator

I have following index:
+-----+-----+-------+
| oid | tag | value |
+-----+-----+-------+
| 1 | t1 | aaa |
| 1 | t2 | bbb |
| 2 | t1 | aaa |
| 2 | t2 | ddd |
| 2 | t3 | eee |
+-----+-----+-------+
where: oid - object ID, tag - property name, value - property value.
Mappings:
"mappings": {
"document": {
"_all": { "enabled": false },
"properties": {
"oid": { "type": "integer" },
"tag": { "type": "text" }
"value": { "type": "text" },
}
}
}
This simple structure allows store any number of object properties and it is a quite simple to search by one property or by more using OR logical operator.
E.g. get object oid's where:
(tag='t1' AND value='aaa') OR (tag='t2' AND value='ddd')
ES query:
{
"_source": { "includes":["oid"] },
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "tag": "t1" } },
{ "term": { "value": "aaa" } }
]
}
},
{
"bool": {
"must": [
{ "term": { "tag": "t2" } },
{ "term": { "value": "ddd" } }
]
}
}
],
"minimum_should_match": "1"
}
}
}
But it is hard to search by two or more properties using AND logical operator. So the question is how to join two sub-queries to two different records through the AND operator. E.g. get object oid's where:
(tag='t1' AND value='aaa') AND (tag='t2' AND value='ddd')
In this case result must be: { "oid": "2" }
Searching data contains in two different records and applying MUST instead of SHOULD from the previous example returns nothing in this case.
I have two equivalents in SQL of what I need:
SELECT i1.[oid]
FROM [index] i1 INNER JOIN [index] i2 ON i1.oid = i2.oid
WHERE
(i1.tag='t1' AND i1.value='aaa')
AND
(i2.tag='t2' AND i2.value='ddd')
---------
SELECT [oid] FROM [index] WHERE tag='t1' AND value='aaa'
INTERSECT
SELECT [oid] FROM [index] WHERE tag='t2' AND value='ddd'
Do the two requests and merge them on the client is not the option.
Elastic Search version is 6.1.1
In order to achieve what you want, you need to use the nested type, i.e. your mapping should look like this:
PUT my-index
{
"mappings": {
"doc": {
"properties": {
"oid": {
"type": "keyword"
},
"data": {
"type": "nested",
"properties": {
"tag": {
"type": "keyword"
},
"value": {
"type": "text"
}
}
}
}
}
}
}
The documents would be indexed like this:
PUT /my-index/doc/_bulk
{ "index": {"_id": 1}}
{ "oid": 1, "data": [ {"tag": "t1", "value": "aaa"}, {"tag": "t2", "value": "bbb"}] }
{ "index": {"_id": 2}}
{ "oid": 2, "data": [ {"tag": "t1", "value": "aaa"}, {"tag": "t2", "value": "ddd"}, {"tag": "t3", "value": "eee"}] }
Then you can make your query work like this:
POST my-index/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"filter": [
{
"term": {
"data.tag": "t1"
}
},
{
"term": {
"data.value": "aaa"
}
}
]
}
}
}
},
{
"nested": {
"path": "data",
"query": {
"bool": {
"filter": [
{
"term": {
"data.tag": "t2"
}
},
{
"term": {
"data.value": "ddd"
}
}
]
}
}
}
}
]
}
}
}
There might be one way, which is a little ugly: adding terms aggregations to your query body.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "tag": "t1" } },
{ "term": { "value": "aaa" } }
]
}
},
{
"bool": {
"must": [
{ "term": { "tag": "t2" } },
{ "term": { "value": "ddd" } }
]
}
}
],
"minimum_should_match": "1"
}
},
"size": 0,
"aggs": {
"find_joined_oid": {
"terms": {
"field": "oid.keyword"
}
}
}
}
If everything goes right, this will output something like
{
"took": 123,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 123,
"max_score": 0,
"hits": []
},
"aggregations": {
"find_joined_oid": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1
},
{
"key": "2",
"doc_count": 2
}
}
}
}
Here, in the "aggregations" part,
"key": "1"
means your "oid":"1", and
"doc_counts": 1
means there is 1 hit in query with "oid":"1".
As you know how many tags you are querying to match, say N, in the aggregations result body, only those "key"s with "doc_count" equal to N are the result you're pursuing. In this example, you are querying tag:t1 (with value aaa) and tag:t2 (with value ddd), thus N=2. You can iterate in the result bucket list to find out those "key"s who have "doc_count" equal to 2.
However, there should be a better way. If you would alter your mapping to a document like style, ie. store all fields of one oid in one doc, life will be much easier.
{
"properties": {
"oid": { "type": "integer" },
"tag-1": { "type": "text" }
"value-1": { "type": "text" },
"tag-2": { "type": "text" }
"value-2": { "type": "text" }
}
}
When you want to add new tag-value pairs, just get the original doc with oid concerned, put new tag-pair into the doc, and put the whole new doc back into Elasticsearch with the same _id which you get from the original one. Most of the time dynamic mapping will work properly in your case, which means you don't need to assert mapping for new fields explicitly.
No-SQL databases like Elasticsearch and others are not designed to handle such SQL style query you are asking.

Elastic Search Query for displaying the documents when it contains all the set of specific properties

I am new to Elastic Search APIs. I have a requirement where i need to query and list the documents which compulsorily contains following properties, say
"request: "/v3?id=100000" & "type: "GET"
Result should contains list of documents containing both the above. I have tried the following and it gets either of the above.
{
"query": {
"match": {
"type": "GET"
}
}
}
I tried
{
"query": {
"match": {
"type": "GET",
"request: "/v3/id=100000"
}
}
}
It fails...
Can someone suggest me a query to list all the docs with both the properties set as above ? Not sure how to use filters, if I try it shows failures - parse exceptions.
My example document:
{
"_index": "logstash-2016.04.22",
"_type": "endpoint-access",
"_id": "fAhTQkDRQTiHKlzuleNA",
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2016-04-22T15:26:35.153Z",
"offset": "43714176",
"ident": "-",
"auth": "-",
"timestamp": "22/Apr/2016:15:26:35 +0000",
"type": "GET",
"request": "/v3?id=1b32e833-b521",
"httpversion": "1.1",
"response": "500",
"bytes": "265",
"referrer": "-",
"agent": "-",
"x_forwarded_for": "\"101.2.123.24\""
"host": "101.123.115.167"
},
"sort": [
1461338795153,
1461338795153
]
}
You may use "must" to get the result:
{
"query": {
"bool": {
"must": [
{
"match": {
"type": "GET"
}
},
{
"match": {
"request": "/v3/id=100000"
}
}
]
}
}
}

Cloudant Search Queries Index Function

I can't find very much documentation on how to properly define the index function such that I can do a full text search on the information that I need.
I've used the Alchemy API to add "entities" json to my documents.
For instance, I have a document with the following:
"_id": "redacted",
"_rev": "redacted",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
I haven't been able to find any code snippets showing how to construct a search index function that looks at nested json.
My stab at indexing the whole object appears to be incorrect.
This is the full design document:
{
"_id": "_design/entities",
"_rev": "redacted",
"views": {},
"language": "javascript",
"indexes": {
"entities": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.entities.relevance > 0.5){\n index(\"default\", doc.entities.text, {\"store\":\"yes\"});\n }\n\n}"
}
}
}
And the search index formatted a little bit more clearly is
function (doc) {
if (doc.entities.relevance > 0.5){
index("default", doc.entities.text, {"store":"yes"});
}
}
Adding the for loop as suggested below makes a lot of sense.
However, I still am not able to return any results.
My query is
"https://user.cloudant.com/calbills/_design/entities/_search/entities?q=Governors"
Server response is:
{"total_rows":0,"bookmark":"g2o","rows":[]}
The "for..in" style loop doesn't seem to work.
However, I do get results using the more standard for loop loops.
function (doc) {
if(doc.entities){
var arrayLength = doc.entities.length;
for (var i = 0; i < arrayLength; i++) {
if (parseFloat(doc.entities[i].relevance) > 0.5)
index("default", doc.entities[i].text);
}
}
}
Cheers!
Your need to loop on the elements in the doc.entities array.
function (doc) {
for(entity in doc.entities){
if (parseFloat(entity.relevance) > 0.5){
index("default", entity.text, {"store":"yes"});
}
}
}
This is what I tried :
function(doc){
if(doc.entities){
for( var p in doc.entities ){
if (doc.entities[p].relevance > 0.5)
{
index("entitiestext", doc.entities[p].text, {"store":"yes"});
}
}
}
}
Query String used :"q=entitiestext:California Constitution&include_docs=true"
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
},
"doc": {
"_id": "redacted",
"_rev": "4-7f6e6db246abcf2f884dc0b91451272a",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
}
}
]
}
Query String used: q=entitiestext:California Constitution
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
}
}
]
}

Elasticsearch: Update mapping field type ID from long to string

I changed the elasticsearch mapping field type from:
"articles": {
"properties": {
"id": {
"type": "long"
}}}
to
"articles": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
}
After that I did the following steps:
Create the index with new mapping
Reindex the mapping to the new index
After the mapping update my previous query filter doesn't work anymore and I have no results:
GET /art/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"type": {
"value": "articles"
}
},
{
"term": {
"id": "123467679"
}
}
]
}
}
}
},
"size": 1,
"sort": [
{
"_score": "desc"
}
]
}
If I check with this query the result is what I expect:
GET /art/articles/_search
{
"query": {
"match_all": {}
}
}
I would appreciate if somebody have some idea why after the field type change the query is no longer working.
Thanks!
The problem in the query was with ID filter.
The query works correctly changing the filter from:
"term": {
"id": "123467679"
}
in:
"term": {
"_id": "123467679"
}
I'm still a beginner with elasticsearch to figure out why the mapping change broke the query although I did the reindex, but "_id" fixed my query.
You can find more informations in the :
elasticsearch mapping reference documentation.

hierarchical faceting with Elasticsearch

I'm using elasticsearch and need to implement facet search for hierarchical object as follow:
category 1 (10)
subcategory 1 (4)
subcategory 2 (6)
category 2 (X)
...
So I need to get facets for two related objects. Documentation says that it's possible to get such kind of facets for numeric value, but I need it for strings http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-stats-facet.html
Here is another interesting topic, unfortunately it's old: http://elasticsearch-users.115913.n3.nabble.com/Pivot-facets-td2981519.html
Does it possible with elastic search?
If so, how can I do that?
The previous solution works really well until you have no more than a multi-level tag on a single-document. In this case a simple aggregation doesn't work, because the flat structure of the lucene fields mix the results on the internal aggregation.
See the example below:
DELETE /test_category
POST /test_category
# Insert a doc with 2 hierarchical tags
POST /test_category/test/1
{
"categories": [
{
"cat_1": "1",
"cat_2": "1.1"
},
{
"cat_1": "2",
"cat_2": "2.2"
}
]
}
# Simple two-levels aggregations query
GET /test_category/test/_search?search_type=count
{
"aggs": {
"main_category": {
"terms": {
"field": "categories.cat_1"
},
"aggs": {
"sub_category": {
"terms": {
"field": "categories.cat_2"
}
}
}
}
}
}
That's the WRONG response that I have got on ES 1.4, where the fields on the internal aggregation are mixed at a document level:
{
...
"aggregations": {
"main_category": {
"buckets": [
{
"key": "1",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1",
"doc_count": 1
},
{
"key": "2.2", <= WRONG
"doc_count": 1
}
]
}
},
{
"key": "2",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1", <= WRONG
"doc_count": 1
},
{
"key": "2.2",
"doc_count": 1
}
]
}
}
]
}
}
}
A Solution can be to use nested objects. These are the steps to do:
1) Define a new type in the schema with nested objects
POST /test_category/test2/_mapping
{
"test2": {
"properties": {
"categories": {
"type": "nested",
"properties": {
"cat_1": {
"type": "string"
},
"cat_2": {
"type": "string"
}
}
}
}
}
}
# Insert a single document
POST /test_category/test2/1
{"categories":[{"cat_1":"1","cat_2":"1.1"},{"cat_1":"2","cat_2":"2.2"}]}
2) Run a nested aggregation query:
GET /test_category/test2/_search?search_type=count
{
"aggs": {
"categories": {
"nested": {
"path": "categories"
},
"aggs": {
"main_category": {
"terms": {
"field": "categories.cat_1"
},
"aggs": {
"sub_category": {
"terms": {
"field": "categories.cat_2"
}
}
}
}
}
}
}
}
That's the response, now correct, that I have got:
{
...
"aggregations": {
"categories": {
"doc_count": 2,
"main_category": {
"buckets": [
{
"key": "1",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "1.1",
"doc_count": 1
}
]
}
},
{
"key": "2",
"doc_count": 1,
"sub_category": {
"buckets": [
{
"key": "2.2",
"doc_count": 1
}
]
}
}
]
}
}
}
}
The same solution can be extended to a more than two-levels hierarchy facet.
Currently, elasticsearch does not support hierarchical facetting out-of-the-box. But the upcoming 1.0 release features a new aggregations module, that can be used to get these kind of facets (which are more like pivot-facets rather than hierarchical facets). Version 1.0 is currently in beta, you can download the second beta and test out aggregatins by yourself. Your example might look like
curl -XPOST 'localhost:9200/_search?pretty' -d '
{
"aggregations": {
"main category": {
"terms": {
"field": "cat_1",
"order": {"_term": "asc"}
},
"aggregations": {
"sub category": {
"terms": {
"field": "cat_2",
"order": {"_term": "asc"}
}
}
}
}
}
}'
The idea is, to have a different field for each level of facetting and bucket your facets based on the terms of the first level (cat_1). These aggregations then would have sub-buckets, based on the terms of the second level (cat_2). The result may look like
{
"aggregations" : {
"main category" : {
"buckets" : [ {
"key" : "category 1",
"doc_count" : 10,
"sub category" : {
"buckets" : [ {
"key" : "subcategory 1",
"doc_count" : 4
}, {
"key" : "subcategory 2",
"doc_count" : 6
} ]
}
}, {
"key" : "category 2",
"doc_count" : 7,
"sub category" : {
"buckets" : [ {
"key" : "subcategory 1",
"doc_count" : 3
}, {
"key" : "subcategory 2",
"doc_count" : 4
} ]
}
} ]
}
}
}