Just started with jsonschema. I want to describe a collection of objects for a property where the key may not be known a priori.
Here is my starting point:
"tx_properties": {
"type": "object",
"anyOf": [
{
"required": [
"original_msg"
]
}
],
"properties": {
"original_msg": {
"type": "string"
}
}
}
}
I want to be able to validate the additions of more properties for tx_properties that may have different types but are not known at schema definition time.
For example I might have, in JSON:
"tx_properties": {
"original_msg": "foo",
"something_else": "bar",
"or_something_numeric": 172,
"or_even_deeper_things": {
"fungible": false
}
}
As a n00b I'm a bit stuck on how to accomplish this. The use of anyOf is what I thought I needed, at least in the final solution.
As @Evert said, "additionalProperties": false can be used to indicate that no properties other than those listed under the properties keyword (and those matched by patternProperties) are permitted. If additionalProperties is omitted, the behaviour is as if the schema said "additionalProperties": true (that is, additional properties are permitted).
Also note that the value of additionalProperties doesn't have to be a boolean: it can be a subschema itself, so that additional properties are allowed only when their values validate against that subschema.
Reference: https://json-schema.org/understanding-json-schema/reference/object.html#additional-properties
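As a rough illustration of the boolean-or-subschema behaviour described above, here is a minimal hand-rolled sketch in Python. It is not a real JSON Schema validator (in practice you would use an existing implementation); check_object, matches_type and JSON_TYPES are names made up for this example, and the sketch only understands required, properties, type and additionalProperties.

```python
# Hand-rolled sketch of "additionalProperties" semantics; not a full validator.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
}


def matches_type(value, type_name):
    # bool is a subclass of int in Python, so exclude it from "number".
    if type_name == "number" and isinstance(value, bool):
        return False
    return isinstance(value, JSON_TYPES[type_name])


def check_object(schema, instance):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    declared = schema.get("properties", {})
    extra = schema.get("additionalProperties", True)  # omitted behaves like true
    for name, value in instance.items():
        if name in declared:
            sub = declared[name]
        elif extra is False:
            errors.append(f"additional property not allowed: {name}")
            continue
        elif extra is True:
            continue
        else:
            sub = extra  # subschema applied to every undeclared property
        allowed = sub.get("type")
        if allowed is None:
            continue
        if isinstance(allowed, str):
            allowed = [allowed]
        if not any(matches_type(value, t) for t in allowed):
            errors.append(f"wrong type for property: {name}")
    return errors


# Assumed schema: original_msg is required, anything else must be a
# string, number or object.
schema = {
    "type": "object",
    "required": ["original_msg"],
    "properties": {"original_msg": {"type": "string"}},
    "additionalProperties": {"type": ["string", "number", "object"]},
}
instance = {
    "original_msg": "foo",
    "something_else": "bar",
    "or_something_numeric": 172,
    "or_even_deeper_things": {"fungible": False},
}

print(check_object(schema, instance))  # []
print(check_object(schema, {"original_msg": "foo", "flag": True}))
```

The second call reports the boolean flag property, because the subschema under additionalProperties does not list "boolean" among the accepted types.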
When a property has const or enum, what are the benefits or downsides of providing type too?
{
"type": "object",
"properties": {
"has_car": {
"title": "Do you have a car?",
"enum": ["yes", "no"],
"type": "string",
"$comment": "Do I need type here?"
},
"car_brand": {
"title": "What's your car brand?",
"type": "string"
},
"terms": {
"title": "I accept my car terms",
"const": "acknowledged",
"$comment": "Do I need type here?"
}
}
}
There really is no benefit. const and enum verify exact values, so type adds nothing, as you suspect.
No, it is not required, because keywords are applied separately. Depending on what consumes the schema, though, it can help to define both. For code generation, adding type helps because the generator can restrict the allowed input types using that information. So that is a potential benefit.
One downside is that your schema definition is larger, and if new enum values with different types are added later, those new types need to be added to type as well.
I have an existing JSON data feed between multiple systems which I do not control and cannot change. I have been tasked with writing a schema for this feed. The existing JSON looks in part like this:
"ids": [
{ "type": "payroll", "value": "011808237" },
{ "type": "geid", "value": "31826" }
]
When I try to define a JSON schema for this, I wind up with a scrap of schema that looks like this:
"properties": {
"type": { <====================== PROBLEM!!!!
"type": "string",
"enum": [ "payroll", "geid" ]
},
"value": {
"type": [ "string", "null" ],
"pattern": "^[0-9]*$"
}
}
As you might guess, when the JSON validator hits that "type" on the line marked "PROBLEM!!!" it gets upset and throws an error about how type needs to be a string or array.
That's a bug in the particular implementation you're using, and should be reported as such. It should be able to handle properties-that-look-like-keywords just fine. In fact, the metaschema (the schema used to validate schemas) uses "type" in exactly this way, along with all the other keywords too: e.g. http://json-schema.org/draft-07/schema
I wonder if it is not making use of the official test suite (https://github.com/json-schema-org/JSON-Schema-Test-Suite)?
You didn't indicate what implementation you're using, or what language, but perhaps you can find an alternative one here: https://json-schema.org/implementations.html#validators
I found at least a work-around if not a proper solution. Instead of "type" inside "properties", use "^type$" inside "patternProperties". I.e.
"patternProperties": {
"^type$": {
"type": "string"
}
}
Unfortunately there doesn't seem to be a great way to make "^type$" a required property. I've settled for listing all the other properties as "required" and setting the min and max property counts to the number that should be there.
I'm going through the docs to try and figure out how loops work so I can validate every object of an array of objects match the schema.
It seems like recursion is what I want but the example given doesn't work: https://json-schema.org/understanding-json-schema/structuring.html
I'm trying to validate that example but it's always "valid". I tried changing all the field names in the JSON and it doesn't matter.
Not sure what's happening. For this example how would I validate every child matches the person schema (without statically writing out each one in the schema).
For example, I want to validate this JSON. There could be any number of objects under toplevel and any number of objects under "objectsList". I want to make sure every object under "objectsList" has the right field names and types (again without hard coding the entire thing in the schema):
{
"toplevel": {
"objectOne": {
"objectsList": [
{
"field1": 1231,
"field2": "sekfjlskjflsdf",
"field3": ["ssss","eeee"]
},
{
"field1": 11,
"field2": "sef",
"field3": ["eeee","qqqq"]
},
{
"field1": 1231,
"field2": "wwwww",
"field3": ["sisjflkssss","esdfsdeee"]
}
]
},
"objectTwo": {
"objectsList": [
{
"field1": 99999,
"field2": "yuyuyuyuyu",
"field3": ["ssssuuu","eeeeeee"]
},
{
"field1": 221,
"field2": "vesdlkfjssef",
"field3": ["ewerweeee","ddddq"]
}
]
}
}
}
What's wrong?
The problem here is not the recursion – your schema looks good.
The underlying issue is the same as here: https://stackoverflow.com/a/61038256/5127499
JSON Schema is designed for extensibility. That means it allows any kind of additional properties to be added as long as they are not conflicting with the known/expected keywords.
Solution
The solution here is to add "additionalProperties": false in your "person" (from the screenshot) and top-level schema to prevent those incorrect objects from being accepted. The same goes for your second example: in any definition with "type": "object" you'd have to add "additionalProperties": false if you don't want to allow these extraneous properties.
Alternatively, you can declare your expected properties as required to ensure that at least those are present.
Why?
As per json-schema.org/understanding-json-schema (emphasis mine):
The additionalProperties keyword is used to control the handling of extra stuff, that is, properties whose names are not listed in the properties keyword. By default any additional properties are allowed.
The additionalProperties keyword may be either a boolean or an object. If additionalProperties is a boolean and set to false, no additional properties will be allowed.
To address the screenshot you posted and why the instance passes:
The schema is looking to find a person property, but that property doesn't exist.
The schema does not declare that person is required.
The schema does not declare requirements on undefined properties, so it will always accept the personsdfsd property with whatever value is in it, without checking it further.
So in short, your JSON data is bad and your schema doesn't have any protections against that.
Other than that, your schema looks good. It should validate that items in the children property match the person definition's subschema.
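To make the three points above concrete, here is a small sketch in Python. It is a toy check, not a real JSON Schema implementation: validate is a made-up helper that only looks at properties, required and the boolean form of additionalProperties, and the schemas are simplified stand-ins for the ones in the question.

```python
def validate(schema, instance):
    """Toy check covering only properties / required / additionalProperties (boolean)."""
    if not isinstance(instance, dict):
        return True  # this sketch only constrains objects
    if any(name not in instance for name in schema.get("required", [])):
        return False
    declared = schema.get("properties", {})
    for name, value in instance.items():
        if name in declared:
            # Known property: descend into its subschema.
            if not validate(declared[name], value):
                return False
        elif schema.get("additionalProperties", True) is False:
            # Unknown property, and the schema forbids extras.
            return False
    return True


person = {"type": "object", "properties": {"name": {"type": "string"}}}

# Like the schema in the question: nothing required, extras allowed by default.
loose = {"type": "object", "properties": {"person": person}}

# Same schema with the two protections suggested above.
strict = {**loose, "required": ["person"], "additionalProperties": False}

bad = {"personsdfsd": {"nameXYZ": "Elizabeth"}}

print(validate(loose, bad))   # True: the misspelled key is simply ignored
print(validate(strict, bad))  # False: "person" is missing
print(validate(strict, {"person": {"name": "Elizabeth"}}))  # True
```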
I am struggling a bit with how the appsettings.Development.json overrides or otherwise merges with the appsettings.json. I am not sure how to "clear" a node out of appsettings.json by using the appsettings.Development.json file.
For reference, I am using the default builder as seen here https://github.com/aspnet/MetaPackages/blob/rel/2.0.0-preview1/src/Microsoft.AspNetCore/WebHost.cs#L159-L160
appsettings.json
{
"Policy": {
"roles": [
{
"name": "inventoryAdmin",
"subjects": [ "bob", "alice" ],
"identityRoles": [ "ActiveDirectory-Role-Manager" ]
}
]
}
}
Given that example, why can I not do the following in my:
appsettings.Development.Json:
{ "Policy": { "Roles": [] } }
or
{ "Policy": { "Roles": null } }
When I check the output via something like Configuration.Get<PolicyServer.Local.Policy>().Roles I still get 3 roles back.
This question is hopefully going to guide me on how I can override a node and not just clear it. So I am hoping to start simple and work my way there.
All of the settings that go into your IConfiguration instance are simply key-value pairs. Take the following, simplified example JSON:
{
"Roles": [
{ "Name": "Role1", "Subjects": [ "Alice", "Bob" ] },
{ "Name": "Role2", "Subjects": [ "Charlie" ] }
]
}
Although this is essentially a tree structure, it maps into the following key-value pairs when added to your IConfiguration instance (there are some additional empty values here, but they're not part of this discussion):
Roles =
Roles:0:Name = Role1
Roles:0:Subjects:0 = Alice
Roles:0:Subjects:1 = Bob
Roles:1:Name = Role2
Roles:1:Subjects:0 = Charlie
You can see that this mimics the hierarchy of your JSON, where the names are object properties and the numbers are indexes into arrays. That first one is important: There's a key of Roles which has no value, because values can only be simple strings and it's just a parent in itself.
Now, when you add an extra JSON file to the IConfiguration instance setup, it maps to a new set of key-value pairs that get applied on top of those that exist. Take the following additional JSON:
{
"Roles": []
}
This simply overwrites the existing Roles key and sets it to, well, the same value it already has: nothing. The same applies if you use null in your JSON file - that's just how this stuff works.
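The flattening and layering described above can be modelled in a few lines of Python. This is only a sketch of the explanation in this answer, not the actual .NET configuration provider (whose handling of empty arrays and null may differ between versions); flatten is a hypothetical helper written for this example.

```python
def flatten(node, prefix=""):
    """Flatten nested JSON-like data into 'A:0:B' style key-value pairs."""
    pairs = {}
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        # Leaf value: stored as a string, with null modelled as empty.
        pairs[prefix] = "" if node is None else str(node)
        return pairs
    if prefix:
        pairs[prefix] = ""  # a parent key carries no value of its own
    for key, child in children:
        child_prefix = f"{prefix}:{key}" if prefix else str(key)
        pairs.update(flatten(child, child_prefix))
    return pairs


base = {
    "Roles": [
        {"Name": "Role1", "Subjects": ["Alice", "Bob"]},
        {"Name": "Role2", "Subjects": ["Charlie"]},
    ]
}
override = {"Roles": []}

# Later sources are applied on top of earlier ones, key by key.
merged = {**flatten(base), **flatten(override)}

print(flatten(override))         # {'Roles': ''}
print(merged == flatten(base))   # True: nothing was removed
print(merged["Roles:0:Name"])    # Role1
```

Because the override only contributes the single key Roles, none of the Roles:0:... or Roles:1:... keys from the base file are touched.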
In terms of a solution here, I suggest seeing if you can rework your appsettings.json approach. For example, you might be able to put the role configuration itself into e.g. an appsettings.Production.json file and leave the default version blank so that it doesn't exist in your development environment. In other words, try and model your different appsettings.json files to be additive themselves.
I store different kinds of documents in a single index with strict predefined mapping. All of them have some field (say, "body"), but I'd want them to be analyzed slightly differently when indexed (for example, to use different token filters for specific documents) and treated the same way when searched. As far as I know, analyzers can't be specified per document.
What I also considered to use:
Object fields with differently analyzed subfields for document kinds, so each document has only one filled subfield (like, "body.mail", "body.html"). The problem is that I couldn't search on the whole "body" field which would look through all its subfields (to not break the existing application).
New reincarnation of multi-fields (to have a "body" field with a generic analyzer and custom-analyzed "mail", "html", etc. inside it). However, I'm not sure if it's possible to use them directly while indexing and indirectly while searching (e.g., to save an object with {"mail":"smth"} to use a specific index analyzer, then search by "query":{"body":"smth"} to use the generic search analyzer).
To separate "body" into several fields with different mappings, remove them from _all, and set copy_to to a single body field. I'm not sure, but I suspect it will add substantial index overhead due to copying.
As I mentioned in the comments, what you want is not possible. Your requirement, in one sentence, is: have the same data analyzed in multiple ways, but searched as a single field because this would break the existing application.
             -- body.html
             -- body.email
body field ---- body.content --- all searched as "body"
             ...
             -- body.destination
             -- body.whatever
Your first option is multi-fields, which has this exact purpose in mind: have the same data analyzed in multiple ways. The problem is that you cannot search for "body" and expect ES to search body.html, body.email and so on. Even if this were possible, you want the sub-fields to be searched with different analyzers. Again, not possible. This option requires you to change the application and search each field in a multi_match or a query_string query.
Your second option - reincarnation of multi-fields - will again not work, because you cannot refer to body and have ES, in the background, match mail, content etc.
Third option - using copy_to - will not work because copying to another field "X" means the copied data will be analyzed with X's analyzer, and this breaks your requirement of having the same data analyzed differently.
There could be a fourth option - "path": "just_name" from multi_fields - which at first look should work. Meaning, you can have three multi-fields (email, content, html) which all have a body sub-field. Having "path": "just_name" allows you to search just for body even if body is a sub-field of multiple other fields. But this is not possible because this type of multi-fields will not accept different analyzers for the same body.
Either way, you need to change something in your requirements, because they will not work the way you want them to.
That being said, I'm curious to see what queries you are using in your application. It would be a simple change (yes, you will need to change your app) from querying the body field to querying body.* in a multi_match.
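For reference, the request body for such a query could be built like this. It is only a sketch: body_query is a made-up helper, "smth" is a placeholder search term, and it assumes the sub-fields live under body (body.html, body.email, ...) as in the diagram above.

```python
import json


def body_query(text):
    # multi_match accepts wildcards in field names, so "body.*" covers
    # every sub-field under body.
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["body.*"],
            }
        }
    }


# Print the JSON that would be sent as the search request body.
print(json.dumps(body_query("smth"), indent=2))
```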
And I have another solution for you: create multiple indices, one index for each analyzer of your body. For example, for mail, content and html you define three indices:
PUT /multi_fields1
{
"mappings": {
"test": {
"properties": {
"body": {
"type": "string",
"index_analyzer": "whitespace",
"search_analyzer": "standard"
}
}
}
}
}
PUT /multi_fields2
{
"mappings": {
"test": {
"properties": {
"body": {
"type": "string",
"index_analyzer": "standard",
"search_analyzer": "standard"
}
}
}
}
}
PUT /multi_fields3
{
"mappings": {
"test": {
"properties": {
"body": {
"type": "string",
"index_analyzer": "keyword",
"search_analyzer": "standard"
}
}
}
}
}
You see that all of them have the same type and the same field name - body - but different index_analyzers. Then you define an alias:
POST _aliases
{
"actions": [
{"add": {
"index": "multi_fields1",
"alias": "multi"}},
{"add": {
"index": "multi_fields2",
"alias": "multi"}},
{"add": {
"index": "multi_fields3",
"alias": "multi"}}
]
}
Name your alias the same as your current index. The application doesn't need to change: it will use the same name for index search, but this name will no longer point to an index but to an alias, which in turn refers to your multiple indices. What needs to change is how you index the documents, because an html document needs to go in the multi_fields1 index, for example, an email document needs to be indexed in the multi_fields2 index, etc.
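On the indexing side, the routing could be as simple as a lookup from document kind to concrete index name. A minimal sketch, assuming each document's kind is known at index time; INDEX_BY_KIND and index_for are hypothetical names, and the mapping of "content" to multi_fields3 is my assumption (only html and email are spelled out above).

```python
# Which concrete index each kind of document is written to.
# Searches still go through the "multi" alias and never use this table.
INDEX_BY_KIND = {
    "html": "multi_fields1",
    "email": "multi_fields2",
    "content": "multi_fields3",  # assumed; not stated explicitly above
}


def index_for(kind):
    """Return the index a document of the given kind should be indexed into."""
    try:
        return INDEX_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"no index configured for document kind: {kind!r}") from None


print(index_for("html"))   # multi_fields1
print(index_for("email"))  # multi_fields2
```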
Whatever solution you find/choose, your requirements need to change because the way you want it is not possible.
I think you can use multi-fields. With multi-fields you can define analyzers (both index and search) for each sub-field, and search on the corresponding fields based on the application's requirements.
In general, the index analyzer can differ from field to field, and the same goes for the search analyzer.
{
"your_type" : {
"properties":{
"body" : {
"type" : "string",
"index" : "analyzed",
"index_analyzer" : "index_body_analyzer",
"search_analyzer" : "search_body_analyzer",
"fields" : {
"mail" : {
"type" : "string",
"index" : "analyzed",
"index_analyzer" : "index_bodymail_analyzer",
"search_analyzer" : "search_bodymail_analyzer"
},
"html": {
"type" : "string",
"index" : "analyzed",
"index_analyzer" : "index_bodyhtml_analyzer",
"search_analyzer" : "search_bodyhtml_analyzer"
}
}
}
}
}