I defined a validType in the schema, where every attribute should have a text and an annotation.
I want to add an additional constraint to refine it: the text of course must follow "pattern": "[a-z]{2}[0-9]{2}". Is there any way I can apply the constraint directly, without copy-pasting the content of validType?
Schema:
{
  "type": "object",
  "definitions": {
    "validType": {
      "description": "a self-defined type, can be complicated",
      "type": "object",
      "properties": {
        "text": {
          "type": "string"
        },
        "annotation": {
          "type": "string"
        }
      }
    }
  },
  "properties": {
    "name": {
      "$ref": "#/definitions/validType"
    },
    "course": {
      "$ref": "#/definitions/validType"
    }
  }
}
Data:
{"name":{
"text":"example1",
"annotation":"example1Notes"},
"course":{
"text":"example2",
"annotation":"example2Notes"}}
The expected schema for course should work like this:
{"course": {
"type": "object",
"properties": {
"text": {
"type": "string",
"pattern":"[a-z]{2}[0-9]{2}"
},
"annotation": {
"type": "string"
}
}
}}
But instead of repeating the big validType block, I am hoping for something like the format below:
{
  "course": {
    "$ref": "#/definitions/validType",
    "text": { "pattern": "[a-z]{2}[0-9]{2}" }
  }
}
Yup! You can add constraints but you cannot modify the constraints you reference.
To add constraints, you need to understand that in draft-07 and earlier, $ref is effectively the only key honored in a subschema: when it is present, any sibling keywords are ignored.
As such, you need to create two subschemas, one of which has your reference, and the other your additional constraint.
You then wrap these two subschemas in an allOf.
Here's how that would look...
{
"$schema": "http://json-schema.org/draft-07/schema#",
"properties": {
"course": {
"allOf": [
{
"$ref": "#/definitions/validType"
},
{
"properties": {
"text": {
"pattern": "[a-z][0-9]"
}
}
}
]
}
}
}
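If you want to check the behaviour locally, here is a minimal sketch using Python's jsonschema package (the definitions block from the question is folded in, and the data values are the ones from the example):

from jsonschema import Draft7Validator

schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        "validType": {
            "type": "object",
            "properties": {
                "text": {"type": "string"},
                "annotation": {"type": "string"}
            }
        }
    },
    "properties": {
        "course": {
            "allOf": [
                {"$ref": "#/definitions/validType"},
                {"properties": {"text": {"pattern": "[a-z]{2}[0-9]{2}"}}}
            ]
        }
    }
}

validator = Draft7Validator(schema)
# "ab12" satisfies both the referenced type and the added pattern.
print(validator.is_valid({"course": {"text": "ab12", "annotation": "ok"}}))      # True
# "example2" has no two-letter/two-digit run, so the pattern rejects it.
print(validator.is_valid({"course": {"text": "example2", "annotation": "no"}}))  # False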
Have a play using https://jsonschema.dev
I need to define a JSON schema for a JSON document in which one field's key is the value of a previous field. Examples:
{
"key1": "SOME_VALUE",
"SOME_VALUE": "..."
}
{
"key1": "ANOTHER_VALUE",
"ANOTHER_VALUE": "..."
}
Moreover, the second field should be among the required ones.
I have been looking around, but I am not sure JSON Schema offers such a feature. Maybe some advanced semantic check?
Thanks for the help
The only way you could do this is if you knew the values in advance, but it looks like this is not possible for you. This would need to be in your business logic validation as opposed to your format validation.
So, thanks to Relequestual's suggestions, I managed to get to a solution.
Constraint: the possible values of "key1" must be finite and known in advance.
Suppose we need a JSON schema for validating a JSON that:
Requires the string properties "required_simple_property1" and "required_simple_property2".
Requires the property "key1" as an enum with 3 possible values ["value1", "value2", "value3"].
Requires a third property, whose key must be the value taken by key1.
This can be accomplished with a schema like:
"oneOf": [
{
"required": [
"required_simple_property1",
"required_simple_property2",
"value1"
],
"properties": {
"key1": {
"type": "string",
"const": "value1"
}
}
},
{
"required": [
"required_simple_property1",
"required_simple_property2",
"value2"
],
"properties": {
"key1": {
"type": "string",
"const": "value2"
}
}
},
{
"required": [
"required_simple_property1",
"required_simple_property2",
"value3"
],
"properties": {
"key1": {
"type": "string",
"const": "value3"
}
}
}
],
"properties": {
"required_simple_property1": {
"type": "string"
},
"required_simple_property2": {
"type": "string"
},
"value1": {
... (anything)
},
"value2": {
... (anything)
},
"value3": {
... (anything)
},
}
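Since the three branches differ only in the value, the schema can also be generated programmatically instead of written by hand; a minimal Python sketch (property and value names as above):

import json

values = ["value1", "value2", "value3"]

schema = {
    "oneOf": [
        {
            "required": ["required_simple_property1", "required_simple_property2", v],
            "properties": {"key1": {"type": "string", "const": v}}
        }
        for v in values
    ],
    "properties": {
        "required_simple_property1": {"type": "string"},
        "required_simple_property2": {"type": "string"},
        # An empty schema accepts anything, like the "... (anything)" placeholders.
        **{v: {} for v in values}
    }
}

print(json.dumps(schema, indent=2))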
Need to create an Avro schema for this:
{"city":"XXXXXX", "brand":"YYYY", "discount": {} }
{"city":"XXXXXX", "brand":"YYYY", "discount": {"name": "Freedom", "value": 100} }
{"city":"XXXXXX", "brand":"YYYY", "discount": {"name": "Festive Sale", "value": 100} }
I tried the below schema, which does not work:
{ "type":"record", "name":"simple_avro",
"fields":[ { "name":"city", "type":"string" },
{ "name":"brand", "type":"string" },
{ "name":"discount",
"type":{ "type":"record", "name":"discount", "default":"",
"fields":[ { "name":"discount_name", "type":"string", "default":"null" },
{ "name":"discount_value", "type":"float", "default":0 }
] }}
] }
For the discount field, I have tried setting the default to "[]", "{}", and "", but none of these work.
I don't think an empty {} object is allowed in any case, but if you want to allow no object at all, then it needs to be a union type, designated by an array for the type, and the default value goes on the outer field rather than inside the record body:
{ "name":"discount",
"type" : [
"null",
{ "type":"record", "name":"discount", "fields": [...] }
],
"default" : "null"
In general, I find this easier to express in IDL format.
Then, a valid message could be {"city":"XXXXXX", "brand":"YYYY"}
I have the following index:
+-----+-----+-------+
| oid | tag | value |
+-----+-----+-------+
| 1 | t1 | aaa |
| 1 | t2 | bbb |
| 2 | t1 | aaa |
| 2 | t2 | ddd |
| 2 | t3 | eee |
+-----+-----+-------+
where: oid - object ID, tag - property name, value - property value.
Mappings:
"mappings": {
"document": {
"_all": { "enabled": false },
"properties": {
"oid": { "type": "integer" },
"tag": { "type": "text" }
"value": { "type": "text" },
}
}
}
This simple structure allows storing any number of object properties, and it is quite simple to search by one property or by several combined with the OR logical operator.
E.g. get object oid's where:
(tag='t1' AND value='aaa') OR (tag='t2' AND value='ddd')
ES query:
{
"_source": { "includes":["oid"] },
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "tag": "t1" } },
{ "term": { "value": "aaa" } }
]
}
},
{
"bool": {
"must": [
{ "term": { "tag": "t2" } },
{ "term": { "value": "ddd" } }
]
}
}
],
"minimum_should_match": "1"
}
}
}
But it is hard to search by two or more properties combined with the AND logical operator. So the question is how to join two sub-queries against two different records through the AND operator, e.g. get object oid's where:
(tag='t1' AND value='aaa') AND (tag='t2' AND value='ddd')
In this case the result must be: { "oid": "2" }
The searched data is contained in two different records, so applying MUST instead of SHOULD from the previous example returns nothing in this case.
I have two equivalents in SQL of what I need:
SELECT i1.[oid]
FROM [index] i1 INNER JOIN [index] i2 ON i1.oid = i2.oid
WHERE
(i1.tag='t1' AND i1.value='aaa')
AND
(i2.tag='t2' AND i2.value='ddd')
---------
SELECT [oid] FROM [index] WHERE tag='t1' AND value='aaa'
INTERSECT
SELECT [oid] FROM [index] WHERE tag='t2' AND value='ddd'
Doing the two requests and merging them on the client is not an option.
The Elasticsearch version is 6.1.1.
In order to achieve what you want, you need to use the nested type, i.e. your mapping should look like this:
PUT my-index
{
"mappings": {
"doc": {
"properties": {
"oid": {
"type": "keyword"
},
"data": {
"type": "nested",
"properties": {
"tag": {
"type": "keyword"
},
"value": {
"type": "text"
}
}
}
}
}
}
}
The documents would be indexed like this:
PUT /my-index/doc/_bulk
{ "index": {"_id": 1}}
{ "oid": 1, "data": [ {"tag": "t1", "value": "aaa"}, {"tag": "t2", "value": "bbb"}] }
{ "index": {"_id": 2}}
{ "oid": 2, "data": [ {"tag": "t1", "value": "aaa"}, {"tag": "t2", "value": "ddd"}, {"tag": "t3", "value": "eee"}] }
Then you can make your query work like this:
POST my-index/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "data",
"query": {
"bool": {
"filter": [
{
"term": {
"data.tag": "t1"
}
},
{
"term": {
"data.value": "aaa"
}
}
]
}
}
}
},
{
"nested": {
"path": "data",
"query": {
"bool": {
"filter": [
{
"term": {
"data.tag": "t2"
}
},
{
"term": {
"data.value": "ddd"
}
}
]
}
}
}
}
]
}
}
}
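With the two sample documents above, only the document with oid 2 satisfies both nested clauses, so the hits in the response should contain just that document, roughly like this (abridged; exact metadata varies):

{
  "hits": {
    "total": 1,
    "hits": [
      { "_id": "2", "_source": { "oid": 2, "data": [ ... ] } }
    ]
  }
}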
There might be one way, which is a little ugly: adding a terms aggregation to your query body.
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{ "term": { "tag": "t1" } },
{ "term": { "value": "aaa" } }
]
}
},
{
"bool": {
"must": [
{ "term": { "tag": "t2" } },
{ "term": { "value": "ddd" } }
]
}
}
],
"minimum_should_match": "1"
}
},
"size": 0,
"aggs": {
"find_joined_oid": {
"terms": {
"field": "oid.keyword"
}
}
}
}
If everything goes right, this will output something like
{
  "took": 123,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 123,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "find_joined_oid": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 1
        },
        {
          "key": "2",
          "doc_count": 2
        }
      ]
    }
  }
}
Here, in the "aggregations" part,
"key": "1"
means your "oid": "1", and
"doc_count": 1
means there is 1 hit in the query with "oid": "1".
As you know how many tag/value pairs you are querying, say N, only the "key"s whose "doc_count" equals N in the aggregation result are the oids you are after. In this example, you are querying tag:t1 (with value aaa) and tag:t2 (with value ddd), thus N=2, so you can iterate over the result buckets and pick out the "key"s with "doc_count" equal to 2, as in the sketch below.
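A minimal Python sketch of that filtering step (assuming response holds the parsed JSON body shown above):

# Keep only the oids that matched every tag/value pair in the query.
N = 2  # number of tag/value pairs queried
buckets = response["aggregations"]["find_joined_oid"]["buckets"]
matching_oids = [b["key"] for b in buckets if b["doc_count"] == N]
print(matching_oids)  # ["2"] for the sample data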
However, there should be a better way. If you alter your mapping to a document-like style, i.e. store all fields of one oid in one document, life will be much easier:
{
  "properties": {
    "oid": { "type": "integer" },
    "tag-1": { "type": "text" },
    "value-1": { "type": "text" },
    "tag-2": { "type": "text" },
    "value-2": { "type": "text" }
  }
}
When you want to add new tag-value pairs, get the original document for the oid concerned, put the new pair into it, and put the whole document back into Elasticsearch with the same _id you got from the original; a sketch follows below. Most of the time dynamic mapping will work properly in your case, which means you don't need to declare mappings for the new fields explicitly.
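A sketch of that read-modify-write flow with the sample data (using the oid as the document _id here is an assumption):

GET my-index/doc/2

PUT my-index/doc/2
{
  "oid": 2,
  "tag-1": "t1", "value-1": "aaa",
  "tag-2": "t2", "value-2": "ddd",
  "tag-3": "t3", "value-3": "eee"
}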
NoSQL databases like Elasticsearch are not designed to handle the kind of SQL-style join you are asking for.
Hello. For a certain requirement I have made the whole index not_analyzed:
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "string",
"mapping": {
"index": "not_analyzed"
}
}
}
]
}
}
}
But now, as per our requirements, I have to make certain fields analyzed and keep the rest of the fields not_analyzed.
My data looks like this:
{ "field1":"Value1",
"field2":"Value2",
"field3":"Value3",
"field4":"Value3",
"field5":"Value4",
"field6":"Value5",
"field7":"Value6",
"field8":"",
"field9":"ce-3sdfa773-7sdaf2-989e-5dasdsdf",
"field10":"12345678",
"field11":"ertyu12345ffd",
"field12":"A",
"field13":"Value7",
"field14":"Value8",
"field15":"Value9",
"field16":"Value10",
"field17":"Value11",
"field18":"Value12",
"field19":{
"field20":"Value13",
"field21":"Value14"
},
"field22":"Value15",
"field23":"ipaddr",
"field24":"datwithtime",
"field25":"Value6",
"field26":"0",
"field20":"0",
"field28":"0"
}
If I change my template as per the recommendation to something like this:
{
  "template": "*",
  "mappings": {
    "_default_": {
      "properties": {
        "filed6": {
          "type": "string",
          "analyzer": "keyword",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      },
      "dynamic_templates": [
        {
          "my_template": {
            "match_mapping_type": "*",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ]
    }
  }
}
Then I get an error when I insert data, stating:
{"error":"MapperParsingException[failed to parse [field19]]; nested: ElasticsearchIllegalArgumentException[unknown property [field20 ]]; ","status":400}
In short, you want to change the mapping of your index.
If your index does not contain any data (which I suppose is not the case), then you can simply delete the index and create it again with the new mapping.
If your index contains data, you will have to reindex it.
Steps for reindexing:
Put all data from the existing index into a dummy index.
Delete the existing index and create it again with the new mapping.
Transfer the data from the dummy index back into the newly created index (a sketch follows below).
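If your Elasticsearch version has the _reindex API, each transfer step can be done with a single call; a minimal sketch (index names are assumptions):

POST _reindex
{
  "source": { "index": "my_index" },
  "dest": { "index": "my_index_dummy" }
}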
You can also have a look at Elasticsearch index aliases, which can make this kind of switch transparent to clients.
If you want to use the same field as analyzed and not analyzed at the same time, you have to use a multi-field:
"title": {
"type": "multi_field",
"fields": {
"title": { "type": "string" },
"raw": { "type": "string", "index": "not_analyzed" }
}
}
For defining a multi-field in dynamic_templates, use:
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "string",
"mapping": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
]
}
}
}
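As a quick sanity check after reindexing with this template, a term query against the not_analyzed raw subfield should match the exact value (index and field names are assumptions):

POST my-index/_search
{
  "query": {
    "term": { "field6.raw": "Value5" }
  }
}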
You can either write multiple templates or have separate properties depending on your requirements. Both will work smoothly.
1) Multiple Templates
{
"mappings": {
"your_doctype": {
"dynamic_templates": [
{
"analyzed_values": {
"match_mapping_type": "*",
"match_pattern": "regex",
"match": "title|summary",
"mapping": {
"type": "string",
"analyzer": "keyword"
}
}
},
{
"date_values": {
"match_mapping_type": "date",
"match": "*_date",
"mapping": {
"type": "date"
}
}
},
{
"exact_values": {
"match_mapping_type": "*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Here, title and summary are analyzed with the keyword analyzer. I have also added an optional date template, which will map fields like create_date as date. The last template will match anything else and map it as a not_analyzed string, which fulfills your requirements.
2) Add analyzed fields as explicit properties.
{
"mappings": {
"your_doctype": {
"properties": {
"title": {
"type": "string",
"analyzer": "keyword"
},
"summary": {
"type": "string",
"analyzer": "keyword"
}
},
"dynamic_templates": [
{
"any_values": {
"match_mapping_type": "*",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Here the title and summary fields are analyzed, while the rest will be not_analyzed.
You would have to reindex the data no matter which solution you take.
EDIT 1: After looking at your data and mapping, there is one slight problem. Your data contains an object structure, yet the template maps everything apart from filed6 as string; field19 is an object, not a string, and hence ES throws the error. The solution is to let ES decide the datatype of each field with {dynamic_type}. Change your mapping to this:
{
"template": "*",
"mappings": {
"_default_": {
"properties": {
"filed6": {
"type": "string",
"analyzer": "keyword",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
},
"dynamic_templates": [
{
"my_template": {
"match_mapping_type": "*",
"mapping": {
"type": "{dynamic_type}", <--- this will decide if field is either object or string.
"index": "not_analyzed"
}
}
}
]
}
}
}
Hope this helps!!