Have one property define the types in another array property, using JSON Schema?

Given this example JSON:
{
"type": "number",
"values": [ 34, 42, 99 ]
}
Is it possible to define JSON Schema that makes sure that the contents of
the values array are of the type specified in another property (in this example type)?
Above, type says that the values array can only contain numbers (using the specifier "number").
Or specify that values contains strings:
{
"type": "string",
"values": [ "hello", "world" ]
}

Yes, you can use the "items" keyword. If its value is a single schema, then that schema applies to every element of the array.
{
"type": "array",
"items": { "type": "string" }
}
Assuming you're using a draft4 schema like most people do, section 8.2.3.1 of the specification states:
8.2.3.1. If "items" is a schema
If items is a schema, then the child instance must be valid against
this schema, regardless of its index, and regardless of the value of
"additionalItems".

Yes, but you will have to write an if/then block for each type you want to support. Note that if/then/else was introduced in draft-07, so it is not available in draft-04.
The Understanding JSON Schema guide has a section on if/then/else: http://json-schema.org/understanding-json-schema/reference/conditionals.html
Here is an extract that explains how if/then/else works.
For example, let’s say you wanted to write a schema to handle
addresses in the United States and Canada. These countries have
different postal code formats, and we want to select which format to
validate against based on the country. If the address is in the United
States, the postal_code field is a “zipcode”: five numeric digits
followed by an optional four digit suffix. If the address is in
Canada, the postal_code field is a six digit alphanumeric string where
letters and numbers alternate.
{
"type": "object",
"properties": {
"street_address": {
"type": "string"
},
"country": {
"enum": ["United States of America", "Canada"]
}
},
"if": {
"properties": { "country": { "const": "United States of America" } }
},
"then": {
"properties": { "postal_code": { "pattern": "[0-9]{5}(-[0-9]{4})?" } }
},
"else": {
"properties": { "postal_code": { "pattern": "[A-Z][0-9][A-Z] [0-9][A-Z][0-9]" } }
}
}
For each type you want to support, you would need to write an if/then object and wrap all of them in an allOf.
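Applied to the question's example, that approach could look like the sketch below. It assumes a draft-07 (or later) validator and supports only the "number" and "string" cases; the enum on type and the required list are additions of this sketch, not part of the original question. Making "type" required matters, because an "if" that only uses "properties" also passes when the property is absent.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "enum": ["number", "string"] },
    "values": { "type": "array" }
  },
  "required": ["type", "values"],
  "allOf": [
    {
      "if": { "properties": { "type": { "const": "number" } } },
      "then": { "properties": { "values": { "items": { "type": "number" } } } }
    },
    {
      "if": { "properties": { "type": { "const": "string" } } },
      "then": { "properties": { "values": { "items": { "type": "string" } } } }
    }
  ]
}
```

With this schema, { "type": "number", "values": [ 34, 42, 99 ] } validates, while { "type": "number", "values": [ "hello" ] } does not.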

Related

Validate phone only when provided using Json Schema

I am using the following JSON schema to validate a phone number if one is provided.
Accepted validation
Min length 10
Max length 20
and Pattern
If phone is null or empty, no validation is required
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"Item": {
"type": "object",
"properties": {
"Phone": {
"anyOf": [
{
"type": "integer",
"minLength": 10,
"maxLength": 20,
"pattern": "^(\\([0-9]{3}\\))?[0-9]{3}-[0-9]{4}$"
},
{
"type": [ "integer", "null" ]
}
]
}
}
}
}
}
Can you please suggest what is missing in the above schema?
Thank you!
Remove integer from the null case. It's allowing all integers through, which overrides the phone number case.
Secondarily, if possible, you may want to use a later draft for your schema. Draft 4 is quite old. Check with your validator to see if it supports a newer draft.
There are errors in your schema, but you're missing the understanding about how JSON Schema works in terms of applicability.
JSON Schema has many keywords that are only applicable to a specific type. When the value's type is not one that a keyword applies to, that keyword has no effect.
The subschema for "Phone" can be simplified as the following:
{
"type": ["string", "null"],
"minLength": 10,
"maxLength": 20,
"pattern": "^(\\([0-9]{3}\\))?[0-9]{3}-[0-9]{4}$"
}
The keywords minLength, maxLength, and pattern are only applicable to strings. If the value is not a string (for example, it is null), those keywords are not applicable, and so are ignored.
(I've not checked your regex here, just copied what you had already.)
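The question also asks that an empty phone value skip validation. The simplified subschema above would reject an empty string, because "" is a string and fails minLength. A minimal sketch that also accepts the empty string, assuming draft-04 (so no "const") and keeping the original regex unchanged:

```json
{
  "anyOf": [
    { "type": "null" },
    { "type": "string", "maxLength": 0 },
    {
      "type": "string",
      "minLength": 10,
      "maxLength": 20,
      "pattern": "^(\\([0-9]{3}\\))?[0-9]{3}-[0-9]{4}$"
    }
  ]
}
```

Note that with this regex, a number without the area code (such as "456-7890") is only 8 characters long and so fails minLength: 10; whether that is intended depends on the requirements.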

Is it possible to be agnostic on the properties' names?

Let's say I want to have a schema for characters from a superhero comics. I want the schema to validate json objects like this one:
{
"Name": "Roberta",
"Age": 15,
"Abilities": {
"Super_Strength": {
"Cost": 10,
"Effect": "+5 to Strength"
}
}
}
My idea is to do it like that:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "characters_schema.json",
"title": "Characters",
"description": "One of the characters for my game",
"type": "object",
"properties": {
"Name": {
"type": "string"
},
"Age": {
"type": "integer"
},
"Abilities": {
"description": "what the character can do",
"type": "object"
}
},
"required": ["Name", "Age"]
}
And use a second schema for abilities:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "abilities_schema.json",
"title": "Abilities",
"type": "object",
"properties": {
"Cost": {
"description": "how much mana the ability costs",
"type": "integer"
},
"Effect": {
"type": "string"
}
}
}
But I can't figure how to merge Abilities in Characters. I could easily tweak the schema so that it validates characters formatted like:
{
"Name": "Roberta",
"Age": 15,
"Abilities": [
{
"Name": "Super_Strength",
"Cost": 10,
"Effect": "+5 to Strength"
}
]
}
But as I need the name of the ability to be used as a key I don't know what to do.
You need to use the additionalProperties keyword.
The behavior of this keyword depends on the presence and annotation
results of "properties" and "patternProperties" within the same schema
object. Validation with "additionalProperties" applies only to the
child values of instance names that do not appear in the annotation
results of either "properties" or "patternProperties".
https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.10.3.2.3
In layman's terms, if you don't define properties or patternProperties, the schema value of additionalProperties is applied to all values in the object at that instance location.
Often additionalProperties is only given a true or false value, but remember, booleans are valid schema values, and so is any other schema.
If you have constraints on the keys for the object, you may wish to use patternProperties followed by additionalProperties: false.
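As a sketch of how this could look for the characters example, the ability schema is inlined here under "$defs" and applied to every value of Abilities through additionalProperties. The definition name "ability" is hypothetical, and "$id" is left out to keep reference resolution simple; referencing the separate abilities schema by its "$id" should also work if your validator is set up to resolve it.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Characters",
  "type": "object",
  "properties": {
    "Name": { "type": "string" },
    "Age": { "type": "integer" },
    "Abilities": {
      "description": "what the character can do",
      "type": "object",
      "additionalProperties": { "$ref": "#/$defs/ability" }
    }
  },
  "required": ["Name", "Age"],
  "$defs": {
    "ability": {
      "type": "object",
      "properties": {
        "Cost": { "type": "integer" },
        "Effect": { "type": "string" }
      }
    }
  }
}
```

Any key under Abilities (for example "Super_Strength") is accepted, and its value must be a valid ability object.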

In JSON schema, define and reference a reusable enum type?

I noticed the question "Reusable enum types in json schema", which talks about defining a reusable enum type in JSON schema.
I would have assumed USING this reusable enum type would be trivial, simply specifying (in this case) the value of "MyEnum" for a "type" value.
I don't know if the results from Oxygen XML are authoritative, but I tried something like the following:
{
"$schema": "https://json-schema.org/draft/2019-09/schema#",
"type": "object",
"properties": {
"content": {"$ref": "#/definitions/content_type"}
},
"additionalProperties": false,
"definitions": {
"costCategory_type": {
"type": "object",
"enum": ["VH", "H", "M", "L"]
},
"allowedDevices_type": {
"type": "object",
"properties": {
"costCategory": {
"type": "costCategory_type"
},
On the line near the bottom of this, where I reference "costCategory_type", Oxygen gives me a syntax error, saying
#/definitions/allowedDevices_type/properties/costCategory/type: unknown type: [costCategory_type]
What am I missing?
Yes, the type keyword can only have values from the list null, boolean, object, array, string, number, integer. You can reference definitions with the $ref keyword:
...
"properties": {
"costCategory": {
"$ref": "#/definitions/costCategory_type"
}
}
(incidentally, your definition won't ever evaluate successfully as-is since you define it as being the "object" type, but the list of values in the enum are all strings.)
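Putting both points together, the two definitions involved might look like the sketch below. It covers only the parts of the question's schema that are visible (the original snippet is truncated), and it changes the enum definition's type to "string" so that the listed values can actually match.

```json
{
  "definitions": {
    "costCategory_type": {
      "type": "string",
      "enum": ["VH", "H", "M", "L"]
    },
    "allowedDevices_type": {
      "type": "object",
      "properties": {
        "costCategory": { "$ref": "#/definitions/costCategory_type" }
      }
    }
  }
}
```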

Valid name for definitions item

Is it correct to make a definition (say, with the name "abc") and then refer to it from a property also called "abc" whose type is "array"? Or is that incorrect, so that the array and its items have to have different names?
Thanks!
{
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "newSchema.json",
"title": "newSchema",
"type": "object",
"definitions": {
"abc": {
"properties": {
"some_col": {
"description": "hi",
"type": "integer"
}
}
}
},
"properties": {
"abc": {
"type": "array",
"items": {
"$ref": "#/definitions/abc"
}
}
}
}
It's a totally valid JSON structure and JSON Schema setup.
If you intend for others to read your generated schemas, you could add annotations to them to give additional information, such as "This is an array of [table]" and "this object represents a row in [table]".
See the Schema Annotations section of the JSON Schema draft-7 validation specification.
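For instance, annotations could be added to the schema from the question roughly as follows. This is only an illustration: the "title" and "description" wording is made up, assuming that "abc" corresponds to a table.

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "newSchema",
  "type": "object",
  "definitions": {
    "abc": {
      "title": "abc row",
      "description": "This object represents a row in the abc table",
      "properties": {
        "some_col": { "description": "hi", "type": "integer" }
      }
    }
  },
  "properties": {
    "abc": {
      "description": "This is an array of abc rows",
      "type": "array",
      "items": { "$ref": "#/definitions/abc" }
    }
  }
}
```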

Schema to load JSON to Google BigQuery

Suppose I have the following JSON, which is the result of parsing URL parameters from a log file.
{
"title": "History of Alphabet",
"author": [
{
"name": "Larry"
}
]
}
{
"title": "History of ABC"
}
{
"number_pages": "321",
"year": "1999"
}
{
"title": "History of XYZ",
"author": [
{
"name": "Steve",
"age": "63"
},
{
"nickname": "Bill",
"dob": "1955-03-29"
}
]
}
All the top-level fields ("title", "author", "number_pages", "year") are optional. So are the second-level fields, such as those inside "author".
How should I make a schema for this JSON when loading it to BQ?
A related question:
For example, suppose there is another similar table, but its data is from a different date, so it may have a different schema. Is it possible to query across these 2 tables?
How should I make a schema for this JSON when loading it to BQ?
The following schema should work. You may want to change some of the types (e.g. maybe you want the dob field to be a TIMESTAMP instead of a STRING), but the general structure should be similar. Since fields are NULLABLE by default, all of these fields should handle not being present for a given row. Because author is an array in your data, it is declared with mode REPEATED.
[
{
"name": "title",
"type": "STRING"
},
{
"name": "author",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "name",
"type": "STRING"
},
{
"name": "age",
"type": "STRING"
},
{
"name": "nickname",
"type": "STRING"
},
{
"name": "dob",
"type": "STRING"
}
]
},
{
"name": "number_pages",
"type": "INTEGER"
},
{
"name": "year",
"type": "INTEGER"
}
]
A related question: For example, suppose there is another similar table, but its data is from a different date, so it may have a different schema. Is it possible to query across these 2 tables?
It should be possible to union two tables with differing schemas without too much difficulty.
Here's a quick example of how it works over public data, using BigQuery legacy SQL, where a comma between table expressions means UNION ALL (kind of a silly example, since the tables contain zero fields in common, but it shows the concept):
SELECT * FROM
(SELECT * FROM publicdata:samples.natality),
(SELECT * FROM publicdata:samples.shakespeare)
LIMIT 100;
Note that you need the SELECT * around each table or the query will complain about the differing schemas.