Merge two Json Schemas - jsonschema

I am new to JSON and JSON schema validation.
I have the following schema to validate a single employee object:
{
"$schema":"http://json-schema.org/draft-03/schema#",
"title":"Employee Type Schema",
"type":"object",
"properties":
{
"EmployeeID": {"type": "integer","minimum": 101,"maximum": 901,"required":true},
"FirstName": {"type": "string","required":true},
"LastName": {"type": "string","required":true},
"JobTitle": {"type": "string"},
"PhoneNumber": {"type": "string","required":true},
"Email": {"type": "string","required":true},
"Address":
{
"type": "object",
"properties":
{
"AddressLine": {"type": "string","required":true},
"City": {"type": "string","required":true},
"PostalCode": {"type": "string","required":true},
"StateProvinceName": {"type": "string","required":true}
}
},
"CountryRegionName": {"type": "string"}
}
}
and I have the following schema to validate an array of the same employee object:
{
"$schema": "http://json-schema.org/draft-03/schema#",
"title": "Employee set",
"type": "array",
"items":
{
"type": "object",
"properties":
{
"EmployeeID": {"type": "integer","minimum": 101,"maximum": 301,"required":true},
"FirstName": {"type": "string","required":true},
"LastName": {"type": "string","required":true},
"JobTitle": {"type": "string"},
"PhoneNumber": {"type": "string","required":true},
"Email": {"type": "string","required":true},
"Address":
{
"type": "object",
"properties":
{
"AddressLine": {"type": "string","required":true},
"City": {"type": "string","required":true},
"PostalCode": {"type": "string","required":true},
"StateProvinceName": {"type": "string","required":true}
}
},
"CountryRegionName": {"type": "string"}
}
}
}
Can you please show me how to merge them so that I can use a single schema to validate either a single employee object or an entire collection? Thanks.

(Note: this question was also asked on the JSON Schema Google Group, and this answer is adapted from there.)
With "$ref", you can have something like this for your array:
{
"type": "array",
"items": {"$ref": "/schemas/path/to/employee"}
}
If you want something to be an array or a single item, then you can use "oneOf":
{
"oneOf": [
{"$ref": "/schemas/path/to/employee"}, // the root schema, defining the object
{
"type": "array", // the array schema.
"items": {"$ref": "/schemas/path/to/employee"}
}
]
}
The original Google Groups answer also contains some advice on using "definitions" to organise schemas so all these variants can exist in the same file.
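As a rough sketch of the "definitions" idea (not the text of the Google Groups answer), the employee schema can be stored once under "definitions" and referenced from both branches of "oneOf". Note that "oneOf" and "definitions" need draft-04 or later, while the schemas in the question declare draft-03, so the per-property "required": true flags would have to become a "required" array. The helper name build_employee_or_list_schema and the trimmed property list are assumptions made for illustration.

```python
import json

def build_employee_or_list_schema(employee_schema):
    """Wrap one employee schema so a single object or an array of them validates."""
    ref = {"$ref": "#/definitions/employee"}
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "definitions": {"employee": employee_schema},
        "oneOf": [
            ref,                              # a single employee object
            {"type": "array", "items": ref},  # a collection of employees
        ],
    }

# Trimmed, draft-04 style version of the employee schema from the question.
employee = {
    "type": "object",
    "properties": {
        "EmployeeID": {"type": "integer", "minimum": 101, "maximum": 901},
        "FirstName": {"type": "string"},
        "LastName": {"type": "string"},
    },
    "required": ["EmployeeID", "FirstName", "LastName"],
}

combined = build_employee_or_list_schema(employee)
print(json.dumps(combined, indent=2))
```

The printed document can be saved as the single schema file; the two original schemas differ in the EmployeeID maximum (901 vs 301), so one value has to be chosen when merging.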

Related

Avro Schema: multiple records reference same data type issue: Unknown union branch

I have an Avro schema in which the Customer record references the CustomerAddress record:
[
{
"type": "record",
"namespace": "com.example",
"name": "CustomerAddress",
"fields": [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["string", "int"] },
{ "name": "type","type": {"type": "enum","name": "type","symbols": ["POBOX","RESIDENTIAL","ENTERPRISE"]}}
]
},
{
"type": "record",
"namespace": "com.example",
"name": "Customer",
"fields": [
{ "name": "first_name", "type": "string" },
{ "name": "middle_name", "type": ["null", "string"], "default": null },
{ "name": "last_name", "type": "string" },
{ "name": "age", "type": "int" },
{ "name": "height", "type": "float" },
{ "name": "weight", "type": "float" },
{ "name": "automated_email", "type": "boolean", "default": true },
{ "name": "customer_emails", "type": {"type": "array","items": "string"},"default": []},
{ "name": "customer_address", "type": "com.example.CustomerAddress" }
]
}
]
I have this JSON payload:
{
"Customer" : {
"first_name": "John",
"middle_name": null,
"last_name": "Smith",
"age": 25,
"height": 177.6,
"weight": 120.6,
"automated_email": true,
"customer_emails": ["ning.chang#td.com", "test#td.com"],
"customer_address":
{
"address": "21 2nd Street",
"city": "New York",
"postcode": "10021",
"type": "RESIDENTIAL"
}
}
}
When I run the command: java -jar avro-tools-1.8.2.jar fromjson --schema-file customer.avsc customer.json
I get the following exception:
Exception in thread "main" org.apache.avro.AvroTypeException: Unknown union branch Customer
In your JSON data you use the key Customer, but the schema file is a union of two records, so the top-level key has to be the fully qualified name of the record you mean. It should be com.example.Customer.
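A minimal sketch of that rename, assuming the payload has exactly one top-level key and that the namespace is known. The helper qualify_top_level_key is hypothetical and is not part of avro-tools. Fields whose type is itself a union, such as postcode, may need similar wrapping in Avro's JSON encoding, which this sketch does not attempt.

```python
import json

def qualify_top_level_key(payload, namespace):
    """Rename the single top-level key to its namespace-qualified form."""
    # Unpacking fails loudly if the payload has more than one top-level key.
    (name, value), = payload.items()
    if "." in name:  # already qualified, leave untouched
        return payload
    return {f"{namespace}.{name}": value}

# Shortened version of the payload from the question.
payload = {"Customer": {"first_name": "John", "last_name": "Smith"}}
fixed = qualify_top_level_key(payload, "com.example")
print(json.dumps(fixed))
```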

Export BigQuery table schema to JSON Schema

It is possible to export a bigquery table schema to a JSON file but the resulting JSON file is a bigquery table schema and not a JSON schema.
I am looking for a way to generate a JSON schema using a bigquery table based on the standard available here: https://json-schema.org/
This looks something like this:
{
"definitions": {},
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://example.com/root.json",
"type": "object",
"title": "The Root Schema",
"required": [
"glossary"
],
"properties": {
"glossary": {
"$id": "#/properties/glossary",
"type": "object",
"title": "The Glossary Schema",
"required": [
"title",
"GlossDiv"
],
"properties": {
"title": {
"$id": "#/properties/glossary/properties/title",
"type": "string",
"title": "The Title Schema",
"default": "",
"examples": [
"example glossary"
],
"pattern": "^(.*)$"
},
"GlossDiv": {
"$id": "#/properties/glossary/properties/GlossDiv",
"type": "object",
"title": "The Glossdiv Schema",
"required": [
"title",
"GlossList"
],
"properties": {
"title": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/title",
"type": "string",
"title": "The Title Schema",
"default": "",
"examples": [
"S"
],
"pattern": "^(.*)$"
},
"GlossList": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList",
"type": "object",
"title": "The Glosslist Schema",
"required": [
"GlossEntry"
],
"properties": {
"GlossEntry": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry",
"type": "object",
"title": "The Glossentry Schema",
"required": [
"ID",
"SortAs",
"GlossTerm",
"Acronym",
"Abbrev",
"GlossDef",
"GlossSee"
],
"properties": {
"ID": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/ID",
"type": "string",
"title": "The Id Schema",
"default": "",
"examples": [
"SGML"
],
"pattern": "^(.*)$"
},
"SortAs": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/SortAs",
"type": "string",
"title": "The Sortas Schema",
"default": "",
"examples": [
"SGML"
],
"pattern": "^(.*)$"
},
"GlossTerm": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/GlossTerm",
"type": "string",
"title": "The Glossterm Schema",
"default": "",
"examples": [
"Standard Generalized Markup Language"
],
"pattern": "^(.*)$"
},
"Acronym": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/Acronym",
"type": "string",
"title": "The Acronym Schema",
"default": "",
"examples": [
"SGML"
],
"pattern": "^(.*)$"
},
"Abbrev": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/Abbrev",
"type": "string",
"title": "The Abbrev Schema",
"default": "",
"examples": [
"ISO 8879:1986"
],
"pattern": "^(.*)$"
},
"GlossDef": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/GlossDef",
"type": "object",
"title": "The Glossdef Schema",
"required": [
"para",
"GlossSeeAlso"
],
"properties": {
"para": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/GlossDef/properties/para",
"type": "string",
"title": "The Para Schema",
"default": "",
"examples": [
"A meta-markup language, used to create markup languages such as DocBook."
],
"pattern": "^(.*)$"
},
"GlossSeeAlso": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/GlossDef/properties/GlossSeeAlso",
"type": "array",
"title": "The Glossseealso Schema",
"items": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/GlossDef/properties/GlossSeeAlso/items",
"type": "string",
"title": "The Items Schema",
"default": "",
"examples": [
"GML",
"XML"
],
"pattern": "^(.*)$"
}
}
}
},
"GlossSee": {
"$id": "#/properties/glossary/properties/GlossDiv/properties/GlossList/properties/GlossEntry/properties/GlossSee",
"type": "string",
"title": "The Glosssee Schema",
"default": "",
"examples": [
"markup"
],
"pattern": "^(.*)$"
}
}
}
}
}
}
}
}
}
}
}
BigQuery does not use the json-schema standard for its table schemas. I found two projects that have code available to go from json-schema to a BigQuery schema:
jsonschema-bigquery
jsonschema-transpiler
You could try using those projects as a reference to create the opposite transformation. You could also file a feature request with the BigQuery team, asking them to include the json-schema standard as an output format option.
No, this is not possible without writing a program to do it for you.
I have filed a feature request for this functionality:
https://issuetracker.google.com/issues/145308573
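As a starting point for such a program, here is a hedged sketch that walks an exported BigQuery schema (a list of fields with name, type, mode and nested fields) and emits a JSON Schema document. The type mapping, the fallback of unmapped types to "string", and the treatment of NULLABLE as a union with "null" are all assumptions; the function names are made up for this example.

```python
import json

# Assumed mapping from BigQuery column types to JSON Schema types.
BQ_TO_JSON_TYPE = {
    "STRING": "string",
    "INTEGER": "integer", "INT64": "integer",
    "FLOAT": "number", "FLOAT64": "number",
    "BOOLEAN": "boolean", "BOOL": "boolean",
}

def field_to_schema(field):
    """Convert one BigQuery field definition into a JSON Schema fragment."""
    bq_type = field["type"].upper()
    if bq_type in ("RECORD", "STRUCT"):
        schema = fields_to_schema(field.get("fields", []))
    else:
        # Unmapped types (TIMESTAMP, DATE, ...) fall back to string here.
        schema = {"type": BQ_TO_JSON_TYPE.get(bq_type, "string")}
    mode = field.get("mode", "NULLABLE").upper()
    if mode == "REPEATED":
        return {"type": "array", "items": schema}
    if mode == "NULLABLE":
        schema["type"] = [schema["type"], "null"]
    return schema

def fields_to_schema(fields):
    """Convert a list of BigQuery fields into an object schema."""
    required = [f["name"] for f in fields if f.get("mode", "").upper() == "REQUIRED"]
    schema = {
        "type": "object",
        "properties": {f["name"]: field_to_schema(f) for f in fields},
    }
    if required:
        schema["required"] = required
    return schema

# Hypothetical table schema used only to exercise the converter.
bq_schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
    {"name": "owner", "type": "RECORD", "mode": "NULLABLE",
     "fields": [{"name": "email", "type": "STRING"}]},
]
json_schema = {"$schema": "http://json-schema.org/draft-07/schema#",
               **fields_to_schema(bq_schema)}
print(json.dumps(json_schema, indent=2))
```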

When using JSON Schema for validation, why does it not recursively validate child entities?

When using the following schema
{
"$schema": "http://json-schema.org/schema#",
"definitions":{
"entity": {
"type": "object",
"properties": {
"parent": {"type": ["null","string"]},
"exclude": {"type": "boolean"},
"count": {"type": ["null","integer"]},
"EntityType": {"type": "string"},
"Children": {
"type": "array",
"items": {"$ref":"#/definitions/entity"}
}
}
}
},
"required": ["parent","EntityType","count"]
}
On this provided body of JSON
{
"parent": "null",
"EntityType": "test",
"count": "null",
"Children": [
{
"EntityType": "test",
"count": 3
},
{
"EntityType": "test"
}
],
"Extra": "somevalue"
}
It should be returning that I have provided an invalid Json object, however it does not seem to be doing so.
That said, if I were to have the root node not succeed (by removing one of the required fields) the validation works and says that I haven't provided a required field. Is there a reason that I am not able to validate the json recursively?
It looks like you want parent, EntityType, and count to be required properties of the entity definition. However they're only required at the root, not the entity level. I would suggest that you move the required keyword into the entity definition, then reference the definition as part of an allOf to ensure the root is compliant.
{
"$schema": "http://json-schema.org/schema#",
"definitions":{
"entity": {
"type": "object",
"properties": {
"parent": {"type": ["null","string"]},
"exclude": {"type": "boolean"},
"count": {"type": ["null","integer"]},
"EntityType": {"type": "string"},
"Children": {
"type": "array",
"items": {"$ref":"#/definitions/entity"}
}
},
"required": ["parent","EntityType","count"]
}
},
"allOf": [{"$ref": "#/definitions/entity"}]
}
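To illustrate what moving "required" into the entity definition changes, here is a small hand-rolled sketch. It is not a JSON Schema validator; it only applies the same three required keys at every level of the tree, which is what the "$ref" inside "Children" does once "required" lives in the definition. The function name missing_required is an assumption.

```python
REQUIRED = ["parent", "EntityType", "count"]

def missing_required(entity, path="root"):
    """Return 'path.field' for every required field missing at any depth."""
    problems = [f"{path}.{key}" for key in REQUIRED if key not in entity]
    for index, child in enumerate(entity.get("Children", [])):
        problems += missing_required(child, f"{path}.Children[{index}]")
    return problems

# The JSON body from the question.
document = {
    "parent": "null",
    "EntityType": "test",
    "count": "null",
    "Children": [
        {"EntityType": "test", "count": 3},
        {"EntityType": "test"},
    ],
    "Extra": "somevalue",
}
print(missing_required(document))
# ['root.Children[0].parent', 'root.Children[1].parent', 'root.Children[1].count']
```

With "required" only at the root, as in the original schema, the equivalent check would stop after the first line of the function and report nothing for this document.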

how to get type reference from value in json schema

I've got a json schema and I have 3 types of media, caption, image, and avatar.
Each of these media types has a different structure, so I'm using $ref and oneOf to specify which are valid options.
However, I can't figure out how to specify which ref to use based on the value of a sibling.
My schema looks like this
const mediaSchema = {
"type": "object",
"required": ["mediaType", "content", "points"],
"properties":{
"mediaType": {"type":"string", "pattern": "^(image|avatar|caption)$"},
"content": {
"oneOf": [
{"$ref":"#/definitions/image"},
{"$ref": "#/definitions/caption"},
{"$ref": "#/definitions/avatar"}
],
}
},
"definitions": {
"caption":
{"type": "object",
"required": ["text"],
"properties": {
"text": {"type": "string"},
"fontSize": {"type": "string", "pattern": "^[0-9]{1,3}px$"}
}
},
"image": {"type": "string", "format": "url"},
"avatar":
{"type": "object",
"properties": {
"name": {"type": "string"},
"image": {"type": "string", "format":"url"}
}
}
}
}
and when I define an avatar like
mediaItem = {
"mediaType":"avatar",
"content": {
"name": "user name",
"avatar": "https://urlToImage"
}
}
it should be valid, but if I define an avatar as
mediaItem = {
"mediaType": "avatar",
"content": "https://urlToImage"
}
it should throw an error as that is not valid for a media type of avatar.
You are on the right track, but you should put the oneOf dispatcher at the root of the schema, and use the value of "mediaType" as a discriminator by giving each of the 3 branches its own constant, like this:
{
"oneOf": [
{
"type": "object",
"properties": {
"mediaType": {
"const": "avatar"
},
"content": { "$ref": "#/definitions/avatar" }
},
"required": ["mediaType", "content"]
},
// ...
],
"definitions": {
// ...
}
}
Note: the "const" keyword was introduced in draft-06 of JSON Schema, so the validator implementation you use may not support it. In that case you can replace "const": "avatar" with a single-element enum like "enum": ["avatar"]
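The elided branches all follow the same pattern, so they can be generated. This is a sketch under the assumption that every mediaType value has a definition of the same name; the helper media_branch is hypothetical, and it uses the single-element enum form so it also works with validators that lack "const".

```python
import json

def media_branch(media_type):
    """One oneOf branch: a fixed mediaType paired with its content definition."""
    return {
        "type": "object",
        "properties": {
            # Single-element enum works where "const" is not supported.
            "mediaType": {"enum": [media_type]},
            "content": {"$ref": f"#/definitions/{media_type}"},
        },
        "required": ["mediaType", "content"],
    }

# Definitions copied from the schema in the question.
definitions = {
    "caption": {
        "type": "object",
        "required": ["text"],
        "properties": {
            "text": {"type": "string"},
            "fontSize": {"type": "string", "pattern": "^[0-9]{1,3}px$"},
        },
    },
    "image": {"type": "string", "format": "url"},
    "avatar": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "image": {"type": "string", "format": "url"},
        },
    },
}

media_schema = {
    "oneOf": [media_branch(name) for name in definitions],
    "definitions": definitions,
}
print(json.dumps(media_schema, indent=2))
```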

JSON Schema - bq load errors

This is part of the JSON file used:
{"body1": {"posts": {"children": [{"row": {"acceptedanswerid": "26", "answercount": "5", "body": "<p>Now that the Engineer update has come, there will be lots of Engineers building up everywhere. How should this best be handled?</p>\n", "commentcount": "7", "creationdate": "2010-07-07T19:06:25.043", "id": "1", "lastactivitydate": "2010-08-27T22:38:43.840", "lasteditdate": "2010-08-27T22:38:43.840", "lasteditordisplayname": "", "lasteditoruserid": "56", "owneruserid": "11", "posttypeid": "1", "score": "10", "tags": "<strategy><team-fortress-2><tactics>", "title": "In Team Fortress 2, what is a good strategy to deal with lots of engineers turtling on the other team?", "viewcount": "1166"}}, {"row": {"acceptedanswerid": "184", "answercount": "3", "body": "<p>I know I can create a Warp Gate and teleport to Pylons, but I have no idea how to make Warp Prisms or know if there's any other unit capable of transporting.</p>\n\n<p>I would in particular like this to built remote bases in 1v1</p>\n", "commentcount": "2", "creationdate": "2010-07-07T19:07:58.427", "id": "2", "lastactivitydate": "2010-07-08T00:21:13.163", "lasteditdate": "2010-07-08T00:16:46.013", "lasteditordisplayname": "", "lasteditoruserid": "68", "owneruserid": "10", "posttypeid": "1", "score": "5", "tags": "<starcraft-2><how-to><protoss>", "title": "What protoss unit can transport others?", "viewcount": "398"}}]}}}
This is the schema used:
{
"name":"body1", "type": "STRING",
"name":"posts", "type": "STRING",
"name":"children", "type":"RECORD",
"fields": [
{"name": "row", "type": "STRING"},
{"name": "acceptedanswerid", "type": "STRING"},
{"name": "answercount", "type": "STRING"},
{"name": "body", "type": "STRING"},
{"name": "commentcount", "type": "STRING"},
{"name": "creationdate", "type": "STRING"},
{"name": "id", "type": "string"},
{"name": "lasteditdate", "type": "integer"},
{"name": "lasteditordisplayname", "type": "string"},
{"name": "lasteditoruserid", "type": "string"},
{"name": "owneruserid", "type": "string"},
{"name": "posttypeid", "type": "string"},
{"name": "score", "type": "string"},
{"name": "tags", "type": "string"},
{"name": "title", "type": "string"},
{"name": "viewcount", "type": "string"}
]
}
The problem is in how I have written the schema, but I could not find detailed documentation on how to build one for nested data like this. Can anyone help me?
Following Gil's suggestion, I modified my original schema into this valid JSON:
{
"name":"body1", "type": "RECORD",
"fields": [
{"name":"posts", "type": "RECORD",
"fields": [
{"name":"children", "type": "RECORD",
"fields": [
{"name": "row", "type": "STRING"},
{"name": "acceptedanswerid", "type": "STRING"},
{"name": "answercount", "type": "STRING"},
{"name": "body", "type": "STRING"},
{"name": "commentcount", "type": "STRING"},
{"name": "creationdate", "type": "STRING"},
{"name": "id", "type": "string"},
{"name": "lasteditdate", "type": "integer"},
{"name": "lasteditordisplayname", "type": "string"},
{"name": "lasteditoruserid", "type": "string"},
{"name": "owneruserid", "type": "string"},
{"name": "posttypeid", "type": "string"},
{"name": "score", "type": "string"},
{"name": "tags", "type": "string"},
{"name": "title", "type": "string"},
{"name": "viewcount", "type": "string"}
]}]}]}
The bq command returns:
File: 0 / Offset:0 / Line:1 / Column:8 / Field:body1: no such field
Looking at the raw data you've provided, it looks like "children" is a child of "posts", which in turn is a child of "body1" - meaning that everything is nested, and not 3 fields in the same hierarchy as you've described.
You should create your schema to reflect this, e.g. (not tested):
{
"name":"body1", "type": "RECORD",
"fields": [
{
"name":"posts", "type": "RECORD",
"fields": [
{
"name":"children", "type": "RECORD",
"fields": [
{"name": "row", "type": "STRING"},
{"name": "acceptedanswerid", "type": "STRING"},
{"name": "answercount", "type": "STRING"},
{"name": "body", "type": "STRING"},
{"name": "commentcount", "type": "STRING"},
{"name": "creationdate", "type": "STRING"},
{"name": "id", "type": "string"},
{"name": "lasteditdate", "type": "integer"},
{"name": "lasteditordisplayname", "type": "string"},
{"name": "lasteditoruserid", "type": "string"},
{"name": "owneruserid", "type": "string"},
{"name": "posttypeid", "type": "string"},
{"name": "score", "type": "string"},
{"name": "tags", "type": "string"},
{"name": "title", "type": "string"},
{"name": "viewcount", "type": "string"}
]
}
]
}
]
}
EDIT 1
OK, I took your input example and ran it through a schema generator (https://github.com/tottokug/BigQuerySchemaGenerator), and it gave:
[
{
"name": "body1",
"type": "RECORD",
"fields": [
{
"name": "posts",
"type": "RECORD",
"fields": [
[
{
"name": "row",
"type": "RECORD",
"fields": [
{
"name": "acceptedanswerid",
"type": "STRING"
},
{
"name": "answercount",
"type": "STRING"
},
{
"name": "body",
"type": "STRING"
},
{
"name": "commentcount",
"type": "STRING"
},
{
"name": "creationdate",
"type": "STRING"
},
{
"name": "id",
"type": "STRING"
},
{
"name": "lastactivitydate",
"type": "STRING"
},
{
"name": "lasteditdate",
"type": "STRING"
},
{
"name": "lasteditordisplayname",
"type": "STRING"
},
{
"name": "lasteditoruserid",
"type": "STRING"
},
{
"name": "owneruserid",
"type": "STRING"
},
{
"name": "posttypeid",
"type": "STRING"
},
{
"name": "score",
"type": "STRING"
},
{
"name": "tags",
"type": "STRING"
},
{
"name": "title",
"type": "STRING"
},
{
"name": "viewcount",
"type": "STRING"
}
]
}
],
[
{
"name": "row",
"type": "RECORD",
"fields": [
{
"name": "acceptedanswerid",
"type": "STRING"
},
{
"name": "answercount",
"type": "STRING"
},
{
"name": "body",
"type": "STRING"
},
{
"name": "commentcount",
"type": "STRING"
},
{
"name": "creationdate",
"type": "STRING"
},
{
"name": "id",
"type": "STRING"
},
{
"name": "lastactivitydate",
"type": "STRING"
},
{
"name": "lasteditdate",
"type": "STRING"
},
{
"name": "lasteditordisplayname",
"type": "STRING"
},
{
"name": "lasteditoruserid",
"type": "STRING"
},
{
"name": "owneruserid",
"type": "STRING"
},
{
"name": "posttypeid",
"type": "STRING"
},
{
"name": "score",
"type": "STRING"
},
{
"name": "tags",
"type": "STRING"
},
{
"name": "title",
"type": "STRING"
},
{
"name": "viewcount",
"type": "STRING"
}
]
}
]
]
}
]
}
]
Does this work?
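For comparison, the nesting in the sample data can also be derived mechanically. The sketch below is not the BigQuerySchemaGenerator project and has not been run against bq load; it assumes every leaf value is a STRING (true for the sample shown) and that a JSON array maps to a field with mode REPEATED. Under those assumptions "children" comes out as a repeated RECORD containing a "row" RECORD, which in turn holds the post fields. The function name infer_fields is made up.

```python
import json

def infer_fields(record):
    """Infer BigQuery field definitions from one decoded JSON object."""
    fields = []
    for name, value in record.items():
        field = {"name": name, "type": "STRING"}  # leaves assumed to be strings
        if isinstance(value, list):
            field["mode"] = "REPEATED"
            value = value[0] if value else None   # sample the first element
        if isinstance(value, dict):
            field["type"] = "RECORD"
            field["fields"] = infer_fields(value)
        fields.append(field)
    return fields

# Shortened version of the data from the question.
sample = {"body1": {"posts": {"children": [
    {"row": {"id": "1", "title": "first", "viewcount": "1166"}},
    {"row": {"id": "2", "title": "second", "viewcount": "398"}},
]}}}

schema = infer_fields(sample)
print(json.dumps(schema, indent=2))
```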