Elasticsearch / Lucene highlight

I'm using ElasticSearch to index documents.
My mapping is:
"mongodocid": {
"boost": 1.0,
"store": "yes",
"type": "string"
},
"fulltext": {
"boost": 1.0,
"index": "analyzed",
"store": "yes",
"type": "string",
"term_vector": "with_positions_offsets"
}
To highlight the complete fulltext I am setting number_of_fragments to 0.
If I run the following Lucene-style query_string query:
{
"highlight": {
"pre_tags": "<b>",
"fields": {
"fulltext": {
"number_of_fragments": 0
}
},
"post_tags": "</b>"
},
"query": {
"query_string": {
"query": "fulltext:test"
}
},
"size": 100
}
For some documents in the result set the length of the highlighted fulltext is smaller than the fulltext itself.
Since I am setting number_of_fragments to 0 and pre_tags/post_tags are added, this should not happen.
Now comes the strange behaviour: if I search for just one of the failing documents, like this:
{
"highlight": {
"pre_tags": "<b>",
"fields": {
"fulltext": {
"number_of_fragments": 0
}
},
"post_tags": "</b>"
},
"query": {
"query_string": {
"query": "fulltext:test AND mongodocid:4d0a861c2ebef6032c00b1ec"
}
},
"size": 100
}
then all works fine.
Any ideas?

Sounds like an issue which has been fixed in 0.14.0 (see #479). As of this writing 0.14.0 hasn't been released yet; can you try master?
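If it helps to verify which build a node is actually running (before and after switching to master), the root endpoint reports it; the host/port below are just the usual default, adjust as needed:
curl -XGET 'http://localhost:9200/'
The version object in the response should show the running version number.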

Search including special characters in MongoDB Atlas

I'm facing an issue when I try to search for several words including a special character (the section sign "§").
Example: AB § 32.
I want the words "AB" and "32" and the symbol "§" all to be present in the found documents.
In some cases the document is found, in others not.
If my document contains the following text then search finds it:
Lagrum: 32 § 1 mom. första stycket a) kommunalskattelagen (1928:370) AB
But if document contains this text then search doesn't find:
Lagrum: 32 § 1 mom. första stycket AB
For the symbol "§" I use the UTF-8 encoding "\xc2\xa7".
The index uses the "lucene.swedish" analyzer.
"Content": [
{
"analyzer": "lucene.swedish",
"minGrams": 4,
"tokenization": "nGram",
"type": "autocomplete"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
The query looks like:
{
"index": "test_index",
"compound": {
"filter": [
{
"text": {
"query": [
"111111111111"
],
"path": "ProductId"
}
}
],
"must": [
{
"autocomplete": {
"query": [
"AB"
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"\xc2\xa7",
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"32"
],
"path": "Content"
}
}
]
},
"count": {
"type": "lowerBound",
"threshold": 500
}
}
The question is: what is wrong with the search, and how can I get the correct result (both of the documents mentioned above)?
Focusing only on the content field, here is an index definition that should work for your requirements. The docs are here. Let me know if this works for you.
{
"mappings": {
"dynamic": false,
"fields": {
"content": [
{
"type": "autocomplete",
"tokenization": "nGram",
"minGrams": 4,
"maxGrams": 7,
"foldDiacritics": false,
"analyzer": "lucene.whitespace"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
}
}
}
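A sketch of how the compound query from the question could be pointed at this definition, assuming it runs as a $search aggregation stage and that the path matches the field name as actually stored in the documents (the question uses Content, the definition above uses content):
{
"$search": {
"index": "test_index",
"compound": {
"filter": [
{ "text": { "query": "111111111111", "path": "ProductId" } }
],
"must": [
{ "autocomplete": { "query": "AB", "path": "content" } },
{ "autocomplete": { "query": "\xc2\xa7", "path": "content" } },
{ "autocomplete": { "query": "32", "path": "content" } }
]
}
}
}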

Elastic Search Query for displaying the documents when it contains all the set of specific properties

I am new to the Elasticsearch APIs. I have a requirement where I need to query and list the documents that must contain both of the following properties, say
"request": "/v3?id=100000" and "type": "GET"
The result should be the list of documents containing both of the above. I have tried the following and it matches either one of them:
{
"query": {
"match": {
"type": "GET"
}
}
}
I tried
{
"query": {
"match": {
"type": "GET",
"request: "/v3/id=100000"
}
}
}
It fails...
Can someone suggest a query to list all the docs with both properties set as above? I'm not sure how to use filters; when I try them I get failures (parse exceptions).
My example document:
{
"_index": "logstash-2016.04.22",
"_type": "endpoint-access",
"_id": "fAhTQkDRQTiHKlzuleNA",
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2016-04-22T15:26:35.153Z",
"offset": "43714176",
"ident": "-",
"auth": "-",
"timestamp": "22/Apr/2016:15:26:35 +0000",
"type": "GET",
"request": "/v3?id=1b32e833-b521",
"httpversion": "1.1",
"response": "500",
"bytes": "265",
"referrer": "-",
"agent": "-",
"x_forwarded_for": "\"101.2.123.24\""
"host": "101.123.115.167"
},
"sort": [
1461338795153,
1461338795153
]
}
You may use a bool query with "must" to get the result:
{
"query": {
"bool": {
"must": [
{
"match": {
"type": "GET"
}
},
{
"match": {
"request": "/v3/id=100000"
}
}
]
}
}
}
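If request is an analyzed string field (as it would be in a typical Logstash mapping), match tokenizes the URL and may also hit documents that contain only some of its tokens; a match_phrase clause is stricter while keeping the same bool/must structure. A sketch:
{
"query": {
"bool": {
"must": [
{ "match": { "type": "GET" } },
{ "match_phrase": { "request": "/v3/id=100000" } }
]
}
}
}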

Scope 0 count terms in aggregation in ElasticSearch

I am doing aggregations on the "location" field in my documents, where there is also a "city" field in the same document. I am querying the documents on the city field and aggregating them on the location field.
{
"aggs": {
"locations": {
"terms": {
"field": "location",
"min_doc_count": 0
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"city": "mumbai",
"_cache": true
}
}
]
}
}
}
}
}
Now the count and aggregations come back fine along with the hits, but my problem is that when I run the aggregation with min_doc_count set to 0, the aggregation buckets return all the locations with a 0 count, even locations that belong to other cities. I want to get 0-count locations only for that city, i.e. I want to scope the 0-count locations to the city.
I tried achieving this with a nested aggregation, placing location inside city and then aggregating, and also by combining a filter aggregation with the terms aggregation, but I still get the same result. Is there any way to achieve this, or is Elasticsearch inherently built to work like this?
ES Version - 1.6
My mapping looks like this:
{
"service": {
"_source": {
"enabled": true
},
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"location": {
"type": "string",
"index": "not_analyzed"
},
"city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
Sample docs to index
{
"name": "a",
"location": "x",
"city": "mumbai"
}
{
"name": "b",
"location": "x",
"city": "mumbai"
}
{
"name": "c",
"location": "y"
"city": "chennai"
}
You should try to sort your terms aggregation (embedded in a filter aggregation) by ascending doc count, and you'll get all the terms with a 0 doc count first. Note that by default you'll only get the first 10 terms; if you have fewer than 10 terms with a 0 doc count you'll see them all, otherwise you might need to increase the size parameter to something higher than 10.
{
"aggs": {
"city_filter": {
"filter": {
"term": {
"city": "mumbai"
}
},
"aggs": {
"locations": {
"terms": {
"field": "location",
"min_doc_count": 0,
"size": 20, <----- add this if you have more than ten 0-doc-count terms
"order": { <----- add this to see 0-doc-count first
"_count": "asc"
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"city": "mumbai",
"_cache": true
}
}
]
}
}
}
}
}

Elasticsearch: Update mapping field type ID from long to string

I changed the elasticsearch mapping field type from:
"articles": {
"properties": {
"id": {
"type": "long"
}}}
to
"articles": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
}}}
After that I did the following steps:
Create the index with the new mapping
Reindex the data into the new index
After the mapping update my previous query filter doesn't work anymore and I get no results:
GET /art/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"type": {
"value": "articles"
}
},
{
"term": {
"id": "123467679"
}
}
]
}
}
}
},
"size": 1,
"sort": [
{
"_score": "desc"
}
]
}
If I check with this query the result is what I expect:
GET /art/articles/_search
{
"query": {
"match_all": {}
}
}
I would appreciate it if somebody had an idea why the query is no longer working after the field type change.
Thanks!
The problem in the query was with the ID filter.
The query works correctly after changing the filter from:
"term": {
"id": "123467679"
}
to:
"term": {
"_id": "123467679"
}
I'm still too much of a beginner with Elasticsearch to figure out why the mapping change broke the query even though I did the reindex, but "_id" fixed my query.
You can find more information in the Elasticsearch mapping reference documentation.
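For reference, here is the query from the question with only the filter changed to _id (same index, type and value as above):
GET /art/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"type": {
"value": "articles"
}
},
{
"term": {
"_id": "123467679"
}
}
]
}
}
}
},
"size": 1,
"sort": [
{
"_score": "desc"
}
]
}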

ElasticSearch: filtering documents based on field length?

Is there a way to filter ElasticSearch documents based on the length of a specific field?
For instance, I have a bunch of documents with the field "body", and I only want to return results where the number of characters in body is > 1000. Is there a way to do this in ES without having to add an extra field with the length to the index?
Use the script filter, like this:
"filtered" : {
"query" : {
...
},
"filter" : {
"script" : {
"script" : "doc['body'].length > 1000"
}
}
}
EDIT
Sorry, meant to reference the query DSL guide on script filters
You can also create a custom tokenizer and use it in a multi-fields property, as in the following:
PUT test_index
{
"settings": {
"analysis": {
"analyzer": {
"character_analyzer": {
"type": "custom",
"tokenizer": "character_tokenizer"
}
},
"tokenizer": {
"character_tokenizer": {
"type": "nGram",
"min_gram": 1,
"max_gram": 1
}
}
}
},
"mappings": {
"person": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"words_count": {
"type": "token_count",
"analyzer": "standard"
},
"length": {
"type": "token_count",
"analyzer": "character_analyzer"
}
}
}
}
}
}
}
PUT test_index/person/1
{
"name": "John Smith"
}
PUT test_index/person/2
{
"name": "Rachel Alice Williams"
}
GET test_index/person/_search
{
"query": {
"term": {
"name.length": 10
}
}
}
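Since name.length is an integer-valued token_count sub-field, you can also filter on ranges instead of an exact count, which matches the "more than N characters" requirement from the question. A sketch against the index above:
GET test_index/person/_search
{
"query": {
"range": {
"name.length": {
"gt": 10
}
}
}
}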