Subclassing Avro record? - oop

I have two types of Avro records that both extend avro.SpecificRecord. Is there a way to make one a subclass of the other in Java? One of them is PersonRecord and the one I would like to be its subclass is EmployeeRecord. The reason I don't want to populate normal Java classes with the Avro data is that I am using Hadoop and would like to work with the Avro files directly if possible.
To clarify, it is the polymorphism that I am interested in. I would like to be able to pass an EmployeeRecord to a function that takes a PersonRecord as its argument.
Thanks!

I think I understand what you're trying to do ("subclass" an Avro record in the Avro Schema definition file) but I don't think it's possible.
Instead, you could nest a PersonRecord member inside EmployeeRecord and follow it with the employee-specific fields. For example:
{
"type": "record",
"name": "PersonRecord",
"namespace": "com.yourapp",
"fields": [
{
"name": "first",
"type": "string"
},
{ etc... }
]
}
{
"type": "record",
"name": "EmployeeRecord",
"namespace": "com.yourapp",
"fields": [
{
"name": "PersonInfo",
"type": "PersonRecord"
},
{
"name": "salary",
"type": "int"
},
{ etc... }
]
}
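With the nested layout above, the polymorphism asked about becomes delegation: code written for a person receives the nested PersonInfo member. The following is only a minimal sketch in Python, assuming the records have been decoded into plain dictionaries; the function name greeting and the sample values are hypothetical and not part of any Avro API.

```python
def greeting(person: dict) -> str:
    """Works on any record shaped like PersonRecord (only 'first' is used)."""
    return f"Hello, {person['first']}"


# Hypothetical sample data matching the two schemas above.
person = {"first": "Grace"}
employee = {"PersonInfo": {"first": "Ada"}, "salary": 50000}

# An employee is not a person here, but it *has* one, so pass the nested member.
person_message = greeting(person)
employee_message = greeting(employee["PersonInfo"])
```

The same idea carries over to the generated Java classes: a method that takes a PersonRecord would be called with the nested person member of the EmployeeRecord rather than with the EmployeeRecord itself.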


Using $vars within json schema $ref is undefined

While following the documentation for using variables in json schema I noticed the following example fails. It looks like the number-type doesn't get stored as a variable and cannot be read.
{
"$id": "http://example.com/number#",
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": ["natural", "integer"]
},
"value": {
"$ref": "#/definitions/{+number-type}",
"$vars": {
"number-type": {"$ref": "1/type"}
}
}
},
"required": ["type", "value"],
"definitions": {
"natural": {
"type": "integer",
"minimum": 0
},
"integer": {
"type": "integer"
}
}
}
results in
Could not find a definition for #/definitions/{+number-type}
tl;dr $vars is not a JSON Schema keyword. It is an implementation-specific extension.
The documentation you link to is not JSON Schema. It is documentation for a specific library which adds a preprocessing step to its JSON Schema processing model.
As such, this would only ever work when using that library, and would not create an interoperable or reusable JSON Schema, if that's a consideration.
If you are using that library specifically, it sounds like a bug, and you should file an Issue in the appropriate repo. As you haven't provided any code, I can't tell what implementation you are using, so I can't be sure on that.
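To make the "preprocessing step" concrete, here is a rough sketch of what such a step might do before ordinary validation runs. It is not the library's code: the helper names expand_ref and resolve_pointer are hypothetical, and the variable is read directly from the instance instead of through the "1/type" relative pointer. A validator without such a step sees the literal string "#/definitions/{+number-type}", which matches the error in the question.

```python
import re


def expand_ref(template: str, variables: dict) -> str:
    """Replace each {+name} placeholder with the matching variable value."""
    return re.sub(r"\{\+([^}]+)\}", lambda m: str(variables[m.group(1)]), template)


def resolve_pointer(document: dict, ref: str) -> dict:
    """Follow a local '#/a/b' style reference inside document."""
    node = document
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


schema = {
    "definitions": {
        "natural": {"type": "integer", "minimum": 0},
        "integer": {"type": "integer"},
    }
}
instance = {"type": "natural", "value": 5}

# The variable is taken from the data being validated, then substituted.
ref = expand_ref("#/definitions/{+number-type}", {"number-type": instance["type"]})
target = resolve_pointer(schema, ref)
```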

How do I subclass a JSON schema

How do I subclass in JSON-Schema?
First I restrict myself to draft-07, because that's all I can find implementations of.
The naive way to do sub-classing is described in
https://json-schema.org/understanding-json-schema/structuring.html#extending
But this works poorly with 'additionalProperties': false.
Why bother with 'additionalProperties': false?
Without it, nearly any random garbage input JSON will be considered valid, since every
'error' (mistaken JSON) will just be treated as an additional property.
Recapping https://json-schema.org/understanding-json-schema/structuring.html#extending
use allOf(baseClass)
then add your own properties
The problem with this is that it doesn't work with 'additionalProperties': false. Because of the
unclear but apparently unfortunate definition of additionalProperties, it ONLY applies
to properties defined locally (in that sub-schema), so one or the other schema will fail validation.
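The failure can be reproduced with a toy validator. The sketch below is an assumption-laden illustration, not a real JSON Schema implementation: validate is a hypothetical function that handles only local $ref, allOf, type (object and string), properties, required and additionalProperties: false, and the schema names are made up. It shows that additionalProperties only sees the properties declared next to it, so the inherited fields are rejected.

```python
def validate(schema, instance, root):
    """Toy validator for a small subset of draft-07 keywords."""
    if "$ref" in schema:
        target = root
        for part in schema["$ref"].lstrip("#/").split("/"):
            target = target[part]
        return validate(target, instance, root)
    for sub in schema.get("allOf", []):
        if not validate(sub, instance, root):
            return False
    if schema.get("type") == "object" and not isinstance(instance, dict):
        return False
    if schema.get("type") == "string" and not isinstance(instance, str):
        return False
    if isinstance(instance, dict):
        properties = schema.get("properties", {})
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, value in instance.items():
            if name in properties:
                if not validate(properties[name], value, root):
                    return False
            elif schema.get("additionalProperties") is False:
                # Only *locally* declared properties count as known.
                return False
    return True


ROOT = {
    "definitions": {
        "address": {
            "type": "object",
            "properties": {"street_address": {"type": "string"}, "city": {"type": "string"}},
            "required": ["street_address", "city"],
        },
        # Naive "subclass": extend with allOf, then try to lock it down.
        "typed_address": {
            "allOf": [{"$ref": "#/definitions/address"}],
            "properties": {"type": {"type": "string"}},
            "additionalProperties": False,
        },
    }
}
doc = {"street_address": "1 Main St", "city": "Springfield", "type": "business"}

strict = ROOT["definitions"]["typed_address"]
relaxed = {k: v for k, v in strict.items() if k != "additionalProperties"}

strict_ok = validate(strict, doc, ROOT)    # inherited fields look "additional" here
relaxed_ok = validate(relaxed, doc, ROOT)  # passes once the restriction is dropped
```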
Alternative Approaches:
meta languages/interpreters layered on top of JSON Schema
(such as https://github.com/mokkabonna/json-schema-merge-allof)
This is not a good choice, as the schema can only be used from JavaScript (or the
language of that meta processor) and is not easily interoperable with other tools.
https://github.com/java-json-tools/json-schema-validator/wiki/v5%3A-merge
An alternative I will propose as a 'solution' / answer
How do I subclass in JSON-Schema?
You don't, because JSON Schema is not object oriented and schemas are not classes. JSON Schema is designed for validation. A schema is a collection of constraints.
But, let's look at it from an OO perspective anyway.
Composition over inheritance
The first thing to note is that JSON Schema doesn't support an analog to inheritance. You might be familiar with the old OO wisdom, "composition over inheritance". The Go language chooses not to support inheritance at all, so JSON Schema is in good company with that approach. If you build your system using only composition, you will have no issues with "additionalProperties": false.
Polymorphism
Let's say that thinking in terms of composition is too foreign (it takes time to learn to think differently) or you don't have control over how your types are designed. If, for whatever reason, you need to model your data using inheritance, you can use the allOf pattern you're familiar with. The allOf pattern isn't quite the same as inheritance, but it's the closest you're going to get.
As you've noted, "additionalProperties": false wreaks havoc in conjunction with the allOf pattern. So, why should you leave this out? The OO answer is polymorphism. Let's say you have a "Person" type and a "Student" type that extends "Person". If you have a Student, you should be able to pass it to a method that accepts a Person. It doesn't matter that Student has a few properties that Person doesn't, when it's being used as a Person, the extra properties are simply ignored. If you use "additionalProperties": false, your types can't be polymorphic.
None of this is the kind of solution you are asking for, but hopefully it gives you a different perspective from which to consider alternatives that solve your problem in a way that is more idiomatic for JSON Schema.
I struggled with that, especially since I had to use legacy versions of JSON Schema. And I found that the solution is a tiny bit verbose but quite easy to read and understand.
Let's say that you want to describe this kind of type:
interface Book {
pageCount: number
}
interface Comic extends Book {
imageCount: number
}
interface Encyclopedia extends Book {
volumeCount: number
}
// This is the schema I want to represent:
type ComicOrEncyclopedia = Comic | Encyclopedia
Here is how I can both handle polymorphism and forbid any extra-prop (while obviously enforcing inherited types in the "child" definitions):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {
"bookDefinition": {
"type": "object",
"properties": {
"imageCount": {
"type": "number"
},
"pageCount": {
"type": "number"
},
"volumeCount": {
"type": "number"
}
}
},
"comicDefinition": {
"type": "object",
"allOf": [{ "$ref": "#/definitions/bookDefinition" }],
"properties": {
"imageCount": {},
"pageCount": {},
"volumeCount": {
"not": {}
}
},
"required": ["imageCount", "pageCount"],
"additionalProperties": false
},
"encyclopediaDefinition": {
"type": "object",
"allOf": [{ "$ref": "#/definitions/bookDefinition" }],
"properties": {
"imageCount": {
"not": {}
},
"pageCount": {},
"volumeCount": {}
},
"required": ["pageCount", "volumeCount"],
"additionalProperties": false
}
},
"type": "object",
"oneOf": [
{ "$ref": "#/definitions/comicDefinition" },
{ "$ref": "#/definitions/encyclopediaDefinition" }]
}
This isn't a GREAT answer. But until the definition of JSONSchema is improved (or someone provides a better answer) - this is what I've come up with as workable.
Basically, you define two copies of each type: the first with all the details but no 'additionalProperties: false' flag, then a second, REFERENCING the first, but with 'additionalProperties: false' set.
The first you can think of as an 'abstract class' and the second as a 'concrete class'.
Then, to 'subclass', you use the https://json-schema.org/understanding-json-schema/structuring.html#extending approach, but referencing the ABSTRACT class, and then add the 'additionalProperties: false'. SADLY, to make this work, you must also REPEAT all the inherited properties (but no need to include their type info - just their names) - due to the sad choice for how JSONSchema draft 7 appears to interpret additionalProperties.
An EXAMPLE - based on https://json-schema.org/understanding-json-schema/structuring.html#extending should help:
https://www.jsonschemavalidator.net/s/3fhU3O1X
(reproduced here in case the other site/link is not permanent/reliable)
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://TEST",
"definitions": {
"interface-address": {
"type": "object",
"properties": {
"street_address": {
"type": "string"
},
"city": {
"type": "string"
},
"state": {
"type": "string"
}
},
"required": ["street_address", "city", "state"]
},
"concrete-address": {
"allOf": [
{
"$ref": "#/definitions/interface-address"
}
],
"properties": {
"street_address": {},
"city": {},
"state": {}
},
"additionalProperties": false
},
"in-another-file-subclass-address": {
"allOf": [
{
"$ref": "#/definitions/interface-address"
}
],
"additionalProperties": false,
"properties": {
"street_address": {},
"city": {},
"state": {},
"type": {
"enum": ["residential", "business"]
}
},
"required": ["type"]
},
"test-of-address-schemas": {
"type": "object",
"properties": {
"interface-address-allows-bad-fields": {
"$ref": "#/definitions/interface-address"
},
"use-concrete-address-to-only-admit-legit-addresses-without-extra-crap": {
"$ref": "#/definitions/concrete-address"
},
"still-can-subclass-using-interface-not-concrete": {
"$ref": "#/definitions/in-another-file-subclass-address"
}
}
}
},
"anyOf": [
{
"$ref": "#/definitions/test-of-address-schemas"
}
]
}
and example document:
{
"interface-address-allows-bad-fields":{
"street_address":"s",
"city":"s",
"state":"s",
"allow-bad-fields-this-is-why-we-need-additionalProperties":"s"
},
"use-concrete-address-to-only-admit-legit-addresses-without-extra-crap":{
"street_address":"s",
"city":"s",
"state":"s"
},
"still-can-subclass-using-interface-not-concrete":{
"street_address":"s",
"city":"s",
"state":"s",
"type":"business"
}
}
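Repeating the inherited property names by hand is the error-prone part of this approach, so it may be worth generating the "concrete" schemas from the "abstract" ones. The sketch below assumes schemas are held as plain Python dictionaries; make_concrete is a hypothetical helper, not part of any JSON Schema library.

```python
import copy


def make_concrete(abstract_ref, abstract_schema, extra_properties=None, extra_required=None):
    """Build a closed schema that references an open ('abstract') one.

    Inherited property names are repeated with empty schemas so that
    additionalProperties: false does not reject them.
    """
    properties = {name: {} for name in abstract_schema.get("properties", {})}
    properties.update(copy.deepcopy(extra_properties or {}))
    concrete = {
        "allOf": [{"$ref": abstract_ref}],
        "properties": properties,
        "additionalProperties": False,
    }
    if extra_required:
        concrete["required"] = list(extra_required)
    return concrete


interface_address = {
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"},
    },
    "required": ["street_address", "city", "state"],
}

concrete_address = make_concrete("#/definitions/interface-address", interface_address)
subclass_address = make_concrete(
    "#/definitions/interface-address",
    interface_address,
    extra_properties={"type": {"enum": ["residential", "business"]}},
    extra_required=["type"],
)
```

The two results correspond to the concrete-address and in-another-file-subclass-address definitions in the schema above.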

Is it possible to inline JSON schemas into a JSON document? [duplicate]

For example a schema for a file system, directory contains a list of files. The schema consists of the specification of file, next a sub type "image" and another one "text".
At the bottom there is the main directory schema. Directory has a property content which is an array of items that should be sub types of file.
Basically what I am looking for is a way to tell the validator to look up the value of a "$ref" from a property in the json object being validated.
Example json:
{
"name":"A directory",
"content":[
{
"fileType":"http://x.y.z/fs-schema.json#definitions/image",
"name":"an-image.png",
"width":1024,
"height":800
},
{
"fileType":"http://x.y.z/fs-schema.json#definitions/text",
"name":"readme.txt",
"lineCount":101
},
{
"fileType":"http://x.y.z/extended-fs-schema-video.json",
"name":"demo.mp4",
"hd":true
}
]
}
The "pseudo" schema follows. Note that the "image" and "text" definitions are included in the same schema, but they might be defined elsewhere:
{
"id": "http://x.y.z/fs-schema.json",
"definitions": {
"file": {
"type": "object",
"properties": {
"name": { "type": "string" },
"fileType": {
"type": "string",
"format": "uri"
}
}
},
"image": {
"allOf": [
{ "$ref": "#definitions/file" },
{
"properties": {
"width": { "type": "integer" },
"height": { "type": "integer"}
}
}
]
},
"text": {
"allOf": [
{ "$ref": "#definitions/file" },
{ "properties": { "lineCount": { "type": "integer"}}}
]
}
},
"type": "object",
"properties": {
"name": { "type": "string"},
"content": {
"type": "array",
"items": {
"allOf": [
{ "$ref": "#definitions/file" },
{ *"$refFromProperty"*: "fileType" } // the magic thing
]
}
}
}
}
The validation parts of JSON Schema alone cannot do this - it represents a fixed structure. What you want requires resolving/referencing schemas at validation-time.
However, you can express this using JSON Hyper-Schema, and a rel="describedby" link:
{
"title": "Directory entry",
"type": "object",
"properties": {
"fileType": {"type": "string", "format": "uri"}
},
"links": [{
"rel": "describedby",
"href": "{+fileType}"
}]
}
So here, it takes the value from "fileType" and uses it to calculate a link with relation "describedby" - which means "the schema at this location also describes the current data".
The problem is that most validators do not take any notice of any links (including "describedby" ones). You need to find a "hyper-validator" that does.
UPDATE: the tv4 library has added this as a feature
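If no hyper-schema-aware validator is available, the same "look up the schema named by the data" step can be done in application code. The following is a simplified sketch only: FIELD_TYPES_BY_FILE_TYPE and check_entry are hypothetical names, the registry stands in for real schema resolution, and unlike the pseudo schema in the question it treats the listed fields as required.

```python
# Hypothetical registry: fileType URI -> expected field types.
FIELD_TYPES_BY_FILE_TYPE = {
    "http://x.y.z/fs-schema.json#definitions/image": {"width": int, "height": int},
    "http://x.y.z/fs-schema.json#definitions/text": {"lineCount": int},
}


def check_entry(entry: dict) -> bool:
    """Pick the rule set named by the entry's own fileType, then apply it."""
    expected = FIELD_TYPES_BY_FILE_TYPE.get(entry.get("fileType"))
    if expected is None:
        return False  # unknown schema URI: nothing to validate against
    if not isinstance(entry.get("name"), str):
        return False  # every file needs a name, as in the base "file" definition
    return all(isinstance(entry.get(field), kind) for field, kind in expected.items())


image = {
    "fileType": "http://x.y.z/fs-schema.json#definitions/image",
    "name": "an-image.png",
    "width": 1024,
    "height": 800,
}
video = {
    "fileType": "http://x.y.z/extended-fs-schema-video.json",
    "name": "demo.mp4",
    "hd": True,
}

image_ok = check_entry(image)
video_ok = check_entry(video)  # not registered in this sketch
```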
I think cloudfeet's answer is a valid solution. You could also use the same approach described here.
You would have a file object type which could be "anyOf" all the subtypes you want to define. You would use an enum in order to be able to reference and validate against each of the subtypes.
If the sub-type schemas are in the same JSON Schema file, you don't need to reference the URI explicitly with the "$ref". A correct draft4 validator will find the enum value and will try to validate against that "subschema" in the JSON Schema tree.
In draft5 (in progress) a "switch" statement has been proposed, which will allow alternatives to be expressed in a more explicit way.

Schema to load JSON to Google BigQuery

Suppose I have the following JSON, which is the result of parsing URL parameters from a log file.
{
"title": "History of Alphabet",
"author": [
{
"name": "Larry"
}
]
}
{
"title": "History of ABC"
}
{
"number_pages": "321",
"year": "1999"
}
}
{
"title": "History of XYZ",
"author": [
{
"name": "Steve",
"age": "63"
},
{
"nickname": "Bill",
"dob": "1955-03-29"
}
]
}
All the fields in top-level, "title", "author", "number_pages", "year" are optional. And so are the fields in the second level, inside "author", for example.
How should I make a schema for this JSON when loading it to BQ?
A related question:
For example, suppose there is another similar table, but the data is from different date, so it's possible to have different schema. Is it possible to query across these 2 tables?
How should I make a schema for this JSON when loading it to BQ?
The following schema should work. You may want to change some of the types (e.g. maybe you want the dob field to be a TIMESTAMP instead of a STRING), but the general structure should be similar. Since fields are NULLABLE by default, all of these fields should handle not being present for a given row. Note that author is an array in the sample data, so it is declared with mode REPEATED.
[
{
"name": "title",
"type": "STRING"
},
{
"name": "author",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "name",
"type": "STRING"
},
{
"name": "age",
"type": "STRING"
},
{
"name": "nickname",
"type": "STRING"
},
{
"name": "dob",
"type": "STRING"
}
]
},
{
"name": "number_pages",
"type": "INTEGER"
},
{
"name": "year",
"type": "INTEGER"
}
]
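One practical detail when loading: BigQuery's JSON load format is newline-delimited, so each record has to sit on a single line, not pretty-printed as in the question. A minimal sketch of that conversion using only the standard library follows; to_ndjson is a hypothetical helper name, and the records are the ones from the question.

```python
import json


def to_ndjson(records):
    """Serialize each record on its own line (newline-delimited JSON)."""
    return "\n".join(json.dumps(record, separators=(",", ":")) for record in records)


records = [
    {"title": "History of Alphabet", "author": [{"name": "Larry"}]},
    {"title": "History of ABC"},
    {"number_pages": "321", "year": "1999"},
    {
        "title": "History of XYZ",
        "author": [
            {"name": "Steve", "age": "63"},
            {"nickname": "Bill", "dob": "1955-03-29"},
        ],
    },
]

ndjson = to_ndjson(records)
```

Fields that are absent from a record are simply left out of its line; with NULLABLE columns they load as NULL.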
A related question: For example, suppose there is another similar table, but the data is from different date, so it's possible to have different schema. Is it possible to query across these 2 tables?
It should be possible to union two tables with differing schemas without too much difficulty.
Here's a quick example of how it works over public data (kind of a silly example, since the tables contain zero fields in common, but shows the concept):
SELECT * FROM
(SELECT * FROM publicdata:samples.natality),
(SELECT * FROM publicdata:samples.shakespeare)
LIMIT 100;
Note that you need the SELECT * around each table or the query will complain about the differing schemas.

reusing an object for multiple JSON schemas

I have two separate JSON schemas (used to validate HTTP request endpoints for a REST API) where they both accept the same exact object, but have different required fields (this is a create vs update request). Is there a way I can reuse a single definition of this object and only change the required fields? I know how to use $ref for reusing an object as a property of another object, but I cannot figure out how to reuse an entire object as the top-level object in a schema. My failed attempt so far:
event.json
{
"id": "event",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"start_date": {
"type": "integer"
},
"end_date": {
"type": "integer"
},
"description": {
"type": "string"
}
},
"additionalProperties": false
}
event-create.json
{
"id": "event-create",
"type": "object",
"$ref": "event",
"additionalProperties": false,
"required": [ "name", "description" ]
}
Obviously that doesn't work. It seems like it tries to insert the entirety of 'event' into the definition of 'event-create', including the ID and such. I tried referencing event#/properties to no avail. I can't seem to do a $ref as the sole value inside a properties property either. Any ideas?
Any members other than "$ref" in a JSON Reference object SHALL be ignored.
- https://datatracker.ietf.org/doc/html/draft-pbryan-zyp-json-ref-03#section-3
This is why your example doesn't work. Anything other than the $ref field is supposed to be ignored.
Support for $ref is limited to fields whose type is a JSON Schema. That is why trying to use it for properties doesn't work. properties is a plain object whose values are JSON Schemas.
The best way to do this is with allOf. In this case allOf can sort-of be thought of as a list of mixin schemas.
{
"id": "event-create",
"type": "object",
"allOf": [{ "$ref": "event" }],
"required": ["name", "description"]
}
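If several request schemas differ only in their required lists, the allOf wrapper can be produced by a small helper so that the shared object is defined once. This is only a sketch assuming the schemas are built as Python dictionaries before being written out; with_required and the "event-update" id are hypothetical.

```python
def with_required(schema_id, base_ref, required):
    """Wrap a shared base schema via allOf and add per-endpoint required fields."""
    schema = {"id": schema_id, "type": "object", "allOf": [{"$ref": base_ref}]}
    if required:
        schema["required"] = list(required)
    return schema


# Create requests must carry name and description; updates may omit anything.
event_create = with_required("event-create", "event", ["name", "description"])
event_update = with_required("event-update", "event", [])
```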
I found some syntax that seems to work, but I'm not terribly happy with it:
{
"id": "event-create",
"allOf": [
{ "$ref": "event" },
{ "required": [ "name", "description" ] }
]
}
Seems like an abuse of the allOf operator, particularly for another case where there are no required fields (thus only one element inside the allOf). But it works, so I'm going with it unless someone has a better idea.