Elasticsearch not_analyzed and analyzed - indexing

Hello. For a certain requirement I have made all the fields in the index not_analyzed:
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "my_template": {
            "match_mapping_type": "string",
            "mapping": {
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
But now, as per our requirement, I have to make certain fields analyzed and keep the rest of the fields not_analyzed.
My data looks like this:
{ "field1":"Value1",
"field2":"Value2",
"field3":"Value3",
"field4":"Value3",
"field5":"Value4",
"field6":"Value5",
"field7":"Value6",
"field8":"",
"field9":"ce-3sdfa773-7sdaf2-989e-5dasdsdf",
"field10":"12345678",
"field11":"ertyu12345ffd",
"field12":"A",
"field13":"Value7",
"field14":"Value8",
"field15":"Value9",
"field16":"Value10",
"field17":"Value11",
"field18":"Value12",
"field19":{
"field20":"Value13",
"field21":"Value14"
},
"field22":"Value15",
"field23":"ipaddr",
"field24":"datwithtime",
"field25":"Value6",
"field26":"0",
"field20":"0",
"field28":"0"
}
If I change my template as per the recommendation to something like this:
{
  "template": "*",
  "mappings": {
    "_default_": {
      "properties": {
        "filed6": {
          "type": "string",
          "analyzer": "keyword",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      },
      "dynamic_templates": [
        {
          "my_template": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
Then I get an error when I insert data, stating:
{"error":"MapperParsingException[failed to parse [field19]]; nested: ElasticsearchIllegalArgumentException[unknown property [field20 ]]; ","status":400}

In short, you want to change the mapping of your index.
If your index does not contain any data (which, I suppose, is not the case), then you can simply delete the index and create it again with the new mapping.
If your index contains data, you will have to reindex it.
Steps for reindexing:
1. Put all data from the existing index into a dummy index (see the sketch below).
2. Delete the existing index and create it with the new mapping.
3. Transfer the data from the dummy index into the newly created index.
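On Elasticsearch 2.3 and later, the _reindex API can do the copy steps for you; on older versions you would scan/scroll and bulk-index instead. A minimal sketch (the index names are placeholders):

POST /_reindex
{
  "source": { "index": "existing_index" },
  "dest": { "index": "dummy_index" }
}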
You can also have a look at Elasticsearch index aliases, which let you switch clients over to the new index without downtime.
If you want to use the same field as analyzed and not_analyzed at the same time, you have to use a multifield:
"title": {
"type": "multi_field",
"fields": {
"title": { "type": "string" },
"raw": { "type": "string", "index": "not_analyzed" }
}
}
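With that in place, queries can target either variant: a match query on title uses the analyzed version, while a term query on title.raw matches the exact, untouched value. A small sketch (the index name my_index is made up):

GET /my_index/_search
{
  "query": {
    "match": { "title": "quick brown fox" }
  }
}

GET /my_index/_search
{
  "query": {
    "term": { "title.raw": "The Quick Brown Fox" }
  }
}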
For defining a multifield in dynamic_templates, use:
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "my_template": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}

You can either write multiple dynamic templates or define explicit properties, depending on your requirements. Both will work smoothly.
1) Multiple Templates
{
  "mappings": {
    "your_doctype": {
      "dynamic_templates": [
        {
          "analyzed_values": {
            "match_mapping_type": "*",
            "match_pattern": "regex",
            "match": "title|summary",
            "mapping": {
              "type": "string",
              "analyzer": "keyword"
            }
          }
        },
        {
          "date_values": {
            "match_mapping_type": "date",
            "match": "*_date",
            "mapping": {
              "type": "date"
            }
          }
        },
        {
          "exact_values": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
Here title and summary are analyzed by the keyword analyzer. I have also added a date template, which is optional; it will map create_date etc. as date. The last template will match anything else and map it as not_analyzed, which fulfills your requirement.
2) Add analyzed fields as properties.
{
  "mappings": {
    "your_doctype": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "keyword"
        },
        "summary": {
          "type": "string",
          "analyzer": "keyword"
        }
      },
      "dynamic_templates": [
        {
          "any_values": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
Here the title and summary fields are analyzed while the rest will be not_analyzed.
You would have to reindex the data no matter which solution you take.
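Either way, you can verify what the dynamic templates actually produced by indexing a document and inspecting the generated mapping. A quick sketch (the index name my_index is made up):

PUT /my_index/your_doctype/1
{
  "title": "some analyzed text",
  "field1": "Value1"
}

GET /my_index/_mapping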
EDIT 1: After looking at your data and mapping, there is one slight problem. Your data contains an object structure, and you are mapping everything apart from filed6 as string, but field19 is an object, not a string, hence ES throws the error. The solution is to let ES decide each field's datatype with {dynamic_type}. Change your mapping to this:
{
  "template": "*",
  "mappings": {
    "_default_": {
      "properties": {
        "filed6": {
          "type": "string",
          "analyzer": "keyword",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      },
      "dynamic_templates": [
        {
          "my_template": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "{dynamic_type}", <--- this will decide if the field is an object or a string
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
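With {dynamic_type}, field19 should be detected as an object, and its inner string fields still pick up not_analyzed from the template. The generated mapping for your sample document should then contain something roughly like this (abridged; the exact output depends on your ES version):

"field19": {
  "properties": {
    "field20": { "type": "string", "index": "not_analyzed" },
    "field21": { "type": "string", "index": "not_analyzed" }
  }
}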
Hope this helps!!

Related

JSON Schema template for validation

I am new to defining JSON schemas and validating JSON against them.
Here is a sample JSON document for which I want to define a schema for validation:
{
  "version": "1.0",
  "config": {
    "globalConfig": {
      "ClientNames": [
        "client1", "client2", "client3"
      ]
    },
    "ClientConfigs": [
      {
        "ClientName": "client1",
        "property1": "some value",
        "property2": "some value"
      },
      {
        "ClientName": "client2",
        "property1": "some value",
        "property2": "some value"
      },
      {
        "ClientName": "client3",
        "property1": "some value",
        "property2": "some value"
      }
    ]
  }
}
From what I understand, "ClientConfigs" would be an array of objects (let's say ClientConfig), each containing ClientName, property1 and property2. Here is what I think the schema would look like:
{
  "$schema": "http://json-schema.org/draft-01/schema#",
  "title": "ClientConfig",
  "type": "object",
  "description": "Some configuration",
  "properties": {
    "version": {
      "type": "string"
    },
    "config": {
      "$ref": "#/definitions/config"
    }
  },
  "definitions": {
    "config": {
      "type": "object",
      "properties": {
        "globalConfig": {
          "type": "object",
          "description": "Global config for all clients",
          "properties": {
            "ClientNames": {
              "type": "array",
              "minItems": 1,
              "items": {
                "type": "string"
              }
            }
          }
        },
        "ClientConfigs": {
          "type": "array",
          "description": "List of configs for different clients",
          "minItems": 1,
          "items": {
            "$ref": "#/definitions/ClientConfig"
          }
        }
      }
    },
    "ClientConfig": {
      "type": "object",
      "properties": {
        "ClientName": {
          "type": "string"
        },
        "property1": {
          "type": "string"
        },
        "property2": {
          "type": "string"
        }
      }
    }
  }
}
I want to validate 2 things with the schema:
1. ClientName in every element of the ClientConfigs array is one of the values from "ClientNames", i.e. an individual ClientConfig in the "ClientConfigs" array may only use client names defined in the "ClientNames" property.
2. Every client name present in "ClientNames" is defined as an element in the "ClientConfigs" array; to be more precise, a ClientConfig is defined for every client name present in "ClientNames".
Here is an example which is NOT valid according to my requirements:
{
  "version": "1.0",
  "config": {
    "globalConfig": {
      "ClientNames": [
        "client1", "client2", "client3"
      ]
    },
    "ClientConfigs": [
      {
        "ClientName": "client4",
        "property1": "some value",
        "property2": "some value"
      }
    ]
  }
}
It is invalid because:
1. It doesn't define a ClientConfig for client1, client2 and client3.
2. It defines a ClientConfig for client4, which is not present in "ClientNames".
Is it possible to do such validation using a JSON schema? If yes, how?
You cannot reference instance data in your JSON Schema. This is considered business logic and is outside the scope of JSON Schema.
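That said, if the client list is known at the time the schema is produced (for example, the schema is generated by a build step), one workaround is to hardcode the names: an enum covers requirement 1, and one contains clause per client (draft-06 and later) covers requirement 2. A sketch of just the relevant part, assuming the fixed list client1-client3:

"ClientConfigs": {
  "type": "array",
  "items": {
    "properties": {
      "ClientName": { "enum": ["client1", "client2", "client3"] }
    }
  },
  "allOf": [
    { "contains": { "properties": { "ClientName": { "const": "client1" } } } },
    { "contains": { "properties": { "ClientName": { "const": "client2" } } } },
    { "contains": { "properties": { "ClientName": { "const": "client3" } } } }
  ]
}

Otherwise, the cross-check has to live in application code.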

Apply additional constraints to a referred definition in JSON schema

I defined a validType in the schema, where every attribute should have text and annotation.
I want to add an additional constraint to refine it: the text of course must follow "pattern": "[a-z]{2}[0-9]{2}". Is there any way I can apply the constraint directly without copy-pasting the content of validType?
Schema:
{
  "type": "object",
  "definitions": {
    "validType": {
      "description": "a self-defined type, can be complicated",
      "type": "object",
      "properties": {
        "text": {
          "type": "string"
        },
        "annotation": {
          "type": "string"
        }
      }
    }
  },
  "properties": {
    "name": {
      "$ref": "#/definitions/validType"
    },
    "course": {
      "$ref": "#/definitions/validType"
    }
  }
}
Data:
{"name":{
"text":"example1",
"annotation":"example1Notes"},
"course":{
"text":"example2",
"annotation":"example2Notes"}}
The expected schema for course should work like this:
{"course": {
"type": "object",
"properties": {
"text": {
"type": "string",
"pattern":"[a-z]{2}[0-9]{2}"
},
"annotation": {
"type": "string"
}
}
}}
But instead of repeating the big block of validType, I am hoping for something similar to the format below:
{"course": {
"$ref": "#/definitions/validType"
"text":{"pattern":"[a-z][0-9]"}
}}
Yup! You can add constraints, but you cannot modify the constraints you reference.
To add constraints, you need to understand that in draft-07 and earlier, $ref is the only key honored in a subschema: when it is present, any other keys alongside it are ignored.
As such, you need to create two subschemas, one of which has your reference, and the other your additional constraint.
You then wrap these two subschemas in an allOf.
Here's how that would look:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "properties": {
    "course": {
      "allOf": [
        {
          "$ref": "#/definitions/validType"
        },
        {
          "properties": {
            "text": {
              "pattern": "[a-z][0-9]"
            }
          }
        }
      ]
    }
  }
}
Have a play using https://jsonschema.dev
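For example, against the schema above this instance should pass, since the "e2" in "example2" satisfies the unanchored pattern [a-z][0-9]:

{
  "course": {
    "text": "example2",
    "annotation": "example2Notes"
  }
}

Changing "text" to a value with no lowercase-letter-digit pair, such as "EXAMPLE", should make validation fail.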

Elasticsearch query for displaying documents that contain all of a set of specific properties

I am new to the Elasticsearch APIs. I have a requirement where I need to query and list the documents which must contain both of the following properties, say
"request": "/v3?id=100000" and "type": "GET"
The result should contain the list of documents matching both of the above. I have tried the following, and it gets either of the above:
{
  "query": {
    "match": {
      "type": "GET"
    }
  }
}
I tried
{
  "query": {
    "match": {
      "type": "GET",
      "request": "/v3/id=100000"
    }
  }
}
It fails...
Can someone suggest a query to list all the docs with both properties set as above? I am not sure how to use filters; when I try, it shows failures - parse exceptions.
My example document:
{
  "_index": "logstash-2016.04.22",
  "_type": "endpoint-access",
  "_id": "fAhTQkDRQTiHKlzuleNA",
  "_score": null,
  "_source": {
    "@version": "1",
    "@timestamp": "2016-04-22T15:26:35.153Z",
    "offset": "43714176",
    "ident": "-",
    "auth": "-",
    "timestamp": "22/Apr/2016:15:26:35 +0000",
    "type": "GET",
    "request": "/v3?id=1b32e833-b521",
    "httpversion": "1.1",
    "response": "500",
    "bytes": "265",
    "referrer": "-",
    "agent": "-",
    "x_forwarded_for": "\"101.2.123.24\"",
    "host": "101.123.115.167"
  },
  "sort": [
    1461338795153,
    1461338795153
  ]
}
You may use a bool query with "must" to get the result:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "type": "GET"
          }
        },
        {
          "match": {
            "request": "/v3/id=100000"
          }
        }
      ]
    }
  }
}
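If these are exact values on not_analyzed fields, you could also put the clauses in the filter context, which skips scoring and is cacheable. A sketch for Elasticsearch 2.x and later (on 1.x the equivalent is a filtered query):

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "GET" } },
        { "term": { "request": "/v3/id=100000" } }
      ]
    }
  }
}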

Elasticsearch: Update mapping field type ID from long to string

I changed the Elasticsearch mapping field type from:
"articles": {
"properties": {
"id": {
"type": "long"
}}}
to
"articles": {
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
}
After that I did the following steps:
1. Created the index with the new mapping.
2. Reindexed the data into the new index.
After the mapping update my previous query filter doesn't work anymore and I get no results:
GET /art/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "type": {
                "value": "articles"
              }
            },
            {
              "term": {
                "id": "123467679"
              }
            }
          ]
        }
      }
    }
  },
  "size": 1,
  "sort": [
    {
      "_score": "desc"
    }
  ]
}
If I check with this query, the result is what I expect:
GET /art/articles/_search
{
  "query": {
    "match_all": {}
  }
}
I would appreciate it if somebody had an idea why the query no longer works after the field type change.
Thanks!
The problem in the query was with the ID filter.
The query works correctly after changing the filter from:
"term": {
"id": "123467679"
}
in:
"term": {
"_id": "123467679"
}
I'm still too much of a beginner with Elasticsearch to figure out why the mapping change broke the query even though I did the reindex, but "_id" fixed my query.
You can find more information in the Elasticsearch mapping reference documentation.
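As an aside, when matching on the document ID there is also the dedicated ids query, which sidesteps the question of how _id is indexed. A sketch:

GET /art/articles/_search
{
  "query": {
    "ids": { "values": ["123467679"] }
  }
}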

ElasticSearch: filtering documents based on field length?

Is there a way to filter Elasticsearch documents based on the length of a specific field?
For instance, I have a bunch of documents with the field "body", and I only want to return results where the number of characters in body is > 1000. Is there a way to do this in ES without having to add an extra field holding the length to the index?
Use the script filter, like this:
"filtered" : {
"query" : {
...
},
"filter" : {
"script" : {
"script" : "doc['body'].length > 1000"
}
}
}
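Note that doc['body'] reads indexed terms, so on an analyzed field the count may not equal the raw character length. On recent versions (5.x+), assuming a keyword sub-field named body.keyword exists, the same idea in Painless would look roughly like:

{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": "doc['body.keyword'].value.length() > 1000"
        }
      }
    }
  }
}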
EDIT
Sorry, I meant to reference the query DSL guide on script filters.
You can also create a custom tokenizer and use it in a multi-field property, as in the following:
PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "character_analyzer": {
          "type": "custom",
          "tokenizer": "character_tokenizer"
        }
      },
      "tokenizer": {
        "character_tokenizer": {
          "type": "nGram",
          "min_gram": 1,
          "max_gram": 1
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            },
            "words_count": {
              "type": "token_count",
              "analyzer": "standard"
            },
            "length": {
              "type": "token_count",
              "analyzer": "character_analyzer"
            }
          }
        }
      }
    }
  }
}
PUT test_index/person/1
{
  "name": "John Smith"
}

PUT test_index/person/2
{
  "name": "Rachel Alice Williams"
}

GET test_index/person/_search
{
  "query": {
    "term": {
      "name.length": 10
    }
  }
}
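With this setup, the term query on name.length should return document 1: the single-character nGram tokenizer emits one token per character (spaces included), so "John Smith" yields 10 tokens while "Rachel Alice Williams" yields 21.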