Why is the Avro default value not used? (with avro-python) - serialization

I'm serializing some data with Avro (using the Python library), and I'm having a hard time figuring out how to make the "default" value work.
I have this schema:
{
"type": "record",
"fields":[
{"name": "amount", "type": "long"},
{"name": "currency", "type": "string", "default": "EUR"}
],
"name": "Monetary"
}
As I understood it, I could pass an amount and no currency, and the currency field would take the "EUR" value. However, if I don't pass a "currency" field when writing, I get the error avro.io.AvroTypeException: The datum { ... } is not an example of the schema xxx...
If I change the currency field's type to a union ["string", "null"], the data is serialized, but currency is null.
So it seems the "default" value is not taken into account at all.
What am I missing? Are default values applicable to primitive types?
Thanks in advance

Here is the relevant quote from the Avro specification:
default: A default value for this field, used when reading instances that lack this field (optional)
The default value is used when you read data that was written with one schema (the writer's schema) using another schema (the reader's schema). If a field does not exist in the writer's schema (so the written instance lacks this field), the instance you get will take the default value from the reader's schema.
That's it!
The default value is not used when you read/write an instance using the same schema.
So, in your example, once you give the currency field a default value, if you read an instance that was written with an older schema which did not contain the currency field, the instance you get will contain the default value you've defined in your schema.
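To make the writer/reader distinction concrete, here is a minimal pure-Python model of how record resolution treats defaults. It is a simplified sketch (top-level record fields only, no type checks, promotion or nested schemas), not the avro library's implementation; `validate_for_write` and `resolve_record` are hypothetical helpers, and the write check only mirrors the behaviour reported in the question.

```python
import json

# Old (writer's) schema: data was written before "currency" existed.
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "Monetary",
 "fields": [{"name": "amount", "type": "long"}]}
""")

# New (reader's) schema: adds "currency" with a default.
READER_SCHEMA = json.loads("""
{"type": "record", "name": "Monetary",
 "fields": [{"name": "amount", "type": "long"},
            {"name": "currency", "type": "string", "default": "EUR"}]}
""")


def validate_for_write(datum, schema):
    """Writing checks the datum against ONE schema; defaults are not applied.

    Simplified: only checks that every field is present.
    """
    missing = [f["name"] for f in schema["fields"] if f["name"] not in datum]
    if missing:
        raise TypeError(f"datum {datum} is missing fields {missing}")


def resolve_record(datum, writer_schema, reader_schema):
    """Reading: fields absent from the writer's schema take the reader's default."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:  # writer-only fields are dropped
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise TypeError(f"field {name!r} has no value and no default")
    return result


old_datum = {"amount": 100}  # as decoded with WRITER_SCHEMA
print(resolve_record(old_datum, WRITER_SCHEMA, READER_SCHEMA))
# {'amount': 100, 'currency': 'EUR'}

try:
    validate_for_write(old_datum, READER_SCHEMA)  # the situation in the question
except TypeError as err:
    print(err)
```

The default only appears in the second step, where two different schemas are involved; writing with a single schema never consults it.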
It's worth mentioning that when you use a union, the default value must correspond to the first type in the union.
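The union rule can be checked mechanically. This is a minimal sketch assuming only a handful of primitive branch types; the `PRIMITIVE_MATCHES` table and `union_default_ok` helper are hypothetical, not part of any Avro library. With the question's ["string", "null"] union, "EUR" is a legal default, while a ["null", "string"] union would require null.

```python
# Simplified checks for a few Avro primitive types, applied to JSON-decoded defaults.
# Only the types listed here are supported by this sketch.
PRIMITIVE_MATCHES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def union_default_ok(field):
    """True if a union field's default matches the union's FIRST branch."""
    branches = field["type"]
    if not isinstance(branches, list) or "default" not in field:
        raise ValueError("expected a union field with a default")
    first = branches[0]
    return PRIMITIVE_MATCHES[first](field["default"])


print(union_default_ok({"name": "currency", "type": ["string", "null"], "default": "EUR"}))  # True
print(union_default_ok({"name": "currency", "type": ["null", "string"], "default": "EUR"}))  # False
```

This is also why optional fields are commonly declared as ["null", ...] with "default": null.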

Related

How to sort Redis list of objects using the properties of object

I have JSON data (see the example below) which I'm storing in a Redis list using 'rpush' with the key 'person'.
Example data:
[
{ "name": "john", "age": 30, "role": "developer" },
{ "name": "smith", "age": 45, "role": "manager" },
{ "name": "ram", "age": 35, "role": "tester" }
]
Now when I get this data using lrange person 0 -1, it gives me results as '[object Object]'.
So, to actually get them with their property names, I store them stringified and parse them back into objects to use the object properties.
But the issue with converting to a string is that I'm not able to sort them by any property, say name, age or role.
My question is, how do I store this JSON in a Redis list and sort it by any of the properties?
Thanks.
Very recently I posted an answer to a very similar question.
The easiest approach is to use the Redis Search module (which makes the approach portable to many clients / languages):
1. Store each needed object as a separate key, following a prefixed key pattern (keys named prefix:something) and a consistent structure (all such keys are JSON, which requires the RedisJSON module, and all contain the field you want to sort by).
2. Create a search index with FT.CREATE, using the ON JSON parameter to index JSON-type keys, and likely the PREFIX parameter to index just the needed keys. In the SCHEMA part, add an "x AS y type" entry for every needed search field, where x is the JSONPath to the field (e.g. $.age), y is the attribute name used in queries, and type is the field type (TEXT, TAG, NUMERIC, etc.; see the documentation), optionally adding SORTABLE to the fields that need to be sorted.
3. Use the FT.SEARCH command with a query made of conditions such as @name:john, @role:{manager} or @age:[30 40] (or * to match everything), and optionally SORTBY field ASC|DESC.
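The three steps above can be sketched in Python by building each Redis command as an argument list. The helper functions, the person: key prefix and the idx:person index name are hypothetical. Building plain lists keeps the sketch runnable without a server; with redis-py, each list could be sent with `r.execute_command(*cmd)` to a Redis instance that has RedisJSON and RediSearch loaded (e.g. Redis Stack).

```python
import json

PEOPLE = [
    {"name": "john", "age": 30, "role": "developer"},
    {"name": "smith", "age": 45, "role": "manager"},
    {"name": "ram", "age": 35, "role": "tester"},
]


def json_set_commands(objects, prefix):
    """Step 1: one JSON key per object, e.g. person:1, person:2, ..."""
    return [["JSON.SET", f"{prefix}{i}", "$", json.dumps(obj)]
            for i, obj in enumerate(objects, start=1)]


def ft_create_command(index, prefix, fields):
    """Step 2: index JSON keys under `prefix`; fields are (path, alias, type)."""
    cmd = ["FT.CREATE", index, "ON", "JSON", "PREFIX", 1, prefix, "SCHEMA"]
    for path, alias, field_type in fields:
        cmd += [path, "AS", alias, field_type, "SORTABLE"]
    return cmd


def ft_search_command(index, query, sort_by, descending=False):
    """Step 3: run `query` and sort by an indexed SORTABLE attribute."""
    return ["FT.SEARCH", index, query, "SORTBY", sort_by,
            "DESC" if descending else "ASC"]


commands = json_set_commands(PEOPLE, "person:")
commands.append(ft_create_command("idx:person", "person:", [
    ("$.name", "name", "TEXT"),
    ("$.age", "age", "NUMERIC"),
    ("$.role", "role", "TAG"),
]))
commands.append(ft_search_command("idx:person", "*", "age"))

# With redis-py (not executed here):
#   r = redis.Redis()
#   for cmd in commands:
#       print(r.execute_command(*cmd))
for cmd in commands:
    print(cmd)
```

To filter as well as sort, replace the "*" query with a condition such as @age:[30 40] (numeric range) or @role:{manager} (tag).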
Otherwise, it's possible to just get all keys that follow a pattern using the KEYS command and sort them with manual, language-specific code. That is, of course, more involved, depends on the language and available libraries, and is therefore less portable.
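For the manual route, sorting happens entirely in the client, so the same idea also works on the existing list of stringified objects. A minimal Python sketch; `sort_json_strings` is a hypothetical helper, and `raw` stands in for what LRANGE (or one GET per key) would return.

```python
import json
from operator import itemgetter


def sort_json_strings(raw_values, prop, reverse=False):
    """Parse JSON strings and sort the resulting dicts by `prop`.

    json.loads also accepts bytes, which is what redis-py returns by default.
    """
    objects = [json.loads(v) for v in raw_values]
    return sorted(objects, key=itemgetter(prop), reverse=reverse)


# What `LRANGE person 0 -1` returns when each element was stored stringified.
raw = [
    '{"name": "john", "age": 30, "role": "developer"}',
    '{"name": "smith", "age": 45, "role": "manager"}',
    '{"name": "ram", "age": 35, "role": "tester"}',
]

print([p["name"] for p in sort_json_strings(raw, "age")])                # ['john', 'ram', 'smith']
print([p["name"] for p in sort_json_strings(raw, "role")])               # ['john', 'smith', 'ram']
print([p["name"] for p in sort_json_strings(raw, "age", reverse=True)])  # ['smith', 'ram', 'john']
```

Every sort has to fetch and parse all objects, which is exactly the work the search index avoids.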

Is there a way to add a default to a json schema array

I just want to understand if there is a way to add a default set of values to an array. (I don't think there is.)
Ideally, I would like something like the following to work, i.e. the fileTypes element defaults to the array ["jpg", "png"]:
"fileTypes": {
"description": "The accepted file types.",
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"enum": ["jpg", "png", "pdf"]
},
"default": ["jpg", "png"]
},
That being said, the above does seem to validate as JSON Schema; however, in VS Code, for example, this default value is not populated when creating documents, unlike other defaults (such as those for strings).
It appears to be valid based on the spec.
9.2. "default"
There are no restrictions placed on the value of this keyword. When multiple occurrences of this keyword are applicable to a single sub-instance, implementations SHOULD remove duplicates.
This keyword can be used to supply a default JSON value associated with a particular schema. It is RECOMMENDED that a default value be valid against the associated schema.
See https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.9.2
It's up to the tooling to take advantage of that keyword, and it sounds like VS Code does not.
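To show what "taking advantage of the keyword" means for a tool, here is a minimal Python sketch that fills missing properties from their "default". `apply_defaults` is a hypothetical helper that only follows "properties"; general-purpose validators such as the Python jsonschema package validate only and do not insert defaults on their own.

```python
import copy

FILE_SCHEMA = {
    "type": "object",
    "properties": {
        "fileTypes": {
            "description": "The accepted file types.",
            "type": "array",
            "minItems": 1,
            "items": {"type": "string", "enum": ["jpg", "png", "pdf"]},
            "default": ["jpg", "png"],
        }
    },
}


def apply_defaults(instance, schema):
    """Return a copy of `instance` with missing properties filled from "default".

    Simplified: follows only "properties" (no $ref, allOf, items, ...).
    """
    result = copy.deepcopy(instance)
    for name, subschema in schema.get("properties", {}).items():
        if name not in result:
            if "default" in subschema:
                # Copy so later edits to the document can't mutate the schema.
                result[name] = copy.deepcopy(subschema["default"])
        elif isinstance(result[name], dict):
            result[name] = apply_defaults(result[name], subschema)
    return result


print(apply_defaults({}, FILE_SCHEMA))                      # {'fileTypes': ['jpg', 'png']}
print(apply_defaults({"fileTypes": ["pdf"]}, FILE_SCHEMA))  # {'fileTypes': ['pdf']}
```

Array defaults need no special handling here; whether an editor such as VS Code does the same is a property of that editor, not of the schema.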

Common type restrictions in JSON schema

Can I define common type restrictions (or a new type) that I can reuse for several properties in a JSON Schema? I am referencing some type definitions, but I do not get what I want. For instance:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "Common types",
"definitions": {
"YN": {
"description": "Y or N field (can be empty, too)",
"type": "string",
"minLength": 0,
"maxLength": 1,
"enum": [ "Y", "N", "" ]
},
"HHMM": {
"description": "Time in HHMM format (or empty).",
"type": "string",
"minLength": 0,
"maxLength": 4,
"pattern": "^[0-2][0-9][0-5][0-9]$|^$"
}
},
"properties" : {
"is_registered": {
"description": "User registered. (this description is overridden)",
"$ref": "#/definitions/YN"
},
"is_valid": {
"description": "User valid. (this description is overridden)",
"$ref": "#/definitions/YN"
},
"timeofday": {
"description": "User registered at HHMM. (this description is overridden)",
"$ref": "#/definitions/HHMM"
}
}
}
In the presented schema I have two string types with some restrictions (enum, pattern, etc.). I do not want to repeat these restrictions in every field of such a type. Therefore, I have defined them in definitions and reused them. If the type constraints change, I change only the definitions.
However, I have two issues.
First, the description is duplicated. If I load this into XMLSpy, only the description of the type is shown, not the description of the actual field. If the description of the type is empty, the description of the field is not used either. I tried combining title and description so that the title would come from the common definition and the description from the field. It seems that the title and description are always taken from the common type definition. How can I use a common type together with the field's own description, which tells what this field actually is?
Second, if the description is inherited from definitions, can I instead reuse just a common pattern (or any other single type property), referencing a pattern defined in definitions or somewhere else?
In order to answer your question, first consider that JSON Schema does not have inheritance, only references.
For draft-04, using a reference means that the WHOLE subschema object is replaced by the referenced schema object, so any keywords next to $ref are ignored. This means your more specific field description is lost. (You can wrap the references in allOf, but it probably won't do what you want in terms of generating documentation.)
If you can move to draft 2019-09, $ref can be used alongside other keywords, as it is then classified as an applicator keyword. I don't know whether the tooling you're using will work as you expect, though.
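The two options can be compared by generating both schema shapes from Python dicts. This is a sketch; `field_schema`, `defs_keyword` and `build_schema` are hypothetical helpers, and whether XMLSpy (or any other tool) then shows the field's own description is tool-specific.

```python
import json

DRAFT_04 = "http://json-schema.org/draft-04/schema#"
DRAFT_2019_09 = "https://json-schema.org/draft/2019-09/schema"


def defs_keyword(draft):
    """draft-04 keeps shared schemas in "definitions"; 2019-09 renamed it "$defs"."""
    return "definitions" if draft == DRAFT_04 else "$defs"


def field_schema(description, type_name, draft):
    """Property schema that reuses a shared type and keeps its own description."""
    ref = f"#/{defs_keyword(draft)}/{type_name}"
    if draft == DRAFT_04:
        # draft-04 ignores keywords next to $ref, so the $ref goes inside allOf.
        return {"description": description, "allOf": [{"$ref": ref}]}
    # 2019-09+: $ref is an applicator, so "description" can sit beside it.
    return {"description": description, "$ref": ref}


def build_schema(draft):
    return {
        "$schema": draft,
        defs_keyword(draft): {
            "YN": {"type": "string", "enum": ["Y", "N", ""]},
        },
        "properties": {
            "is_registered": field_schema("User registered.", "YN", draft),
            "is_valid": field_schema("User valid.", "YN", draft),
        },
    }


for draft in (DRAFT_04, DRAFT_2019_09):
    print(json.dumps(build_schema(draft), indent=2))
```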
As for your second question, this is not built into JSON Schema. $ref can only point to a schema (or subschema). If you want to de-duplicate common parts which are NOT schemas, a common approach is to use a templating engine and compile your schemas at build time or runtime. I've seen this done at scale using jsonnet.
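The same build-time idea works with plain Python in place of jsonnet. In this sketch, the shared pattern is a Python constant that gets copied into each property, so every field keeps its own description and no $ref is involved. `hhmm_field`, `generate_schema` and the extra timeofexit property are hypothetical.

```python
import json

# Shared, non-schema fragment: reused by value when the schema is generated.
HHMM_PATTERN = "^[0-2][0-9][0-5][0-9]$|^$"


def hhmm_field(description):
    """Complete property schema for an HHMM field with its own description."""
    return {
        "description": description,
        "type": "string",
        "maxLength": 4,
        "pattern": HHMM_PATTERN,
    }


def generate_schema():
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "properties": {
            "timeofday": hhmm_field("User registered at HHMM."),
            "timeofexit": hhmm_field("User left at HHMM."),  # hypothetical field
        },
    }


# Run at build time and publish the generated file as the schema.
print(json.dumps(generate_schema(), indent=2))
```

Changing HHMM_PATTERN and regenerating updates every field at once, which is the de-duplication that $ref cannot provide for a single keyword.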

Use of type : object and properties in JSON schema

I'm new to JSON.
I see various JSON examples, like the following, where complex values are prefixed with "type": "object" and "properties": { }:
{
"$schema": "http://json-schema.org/draft-06/schema#",
"motor" : {
"type" : "object",
"properties" : {
"class" : "string",
"voltage" : "number",
"amperage" : "number"
}
}
}
I have written JSON without type, object, and properties, like the following.
{
"$schema": "http://json-schema.org/draft-06/schema#",
"motor" : {
"class" : "string",
"voltage" : "number",
"amperage" : "number"
}
}
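and submitted it to an online JSON Schema validator, with no errors.
What is the purpose of type:object, properties { }? Is it optional?
Yes, it is optional. Validators ignore keywords they don't recognize, such as "motor" here, so both of your schemas pass; for the same reason, neither of them actually constrains anything. Try removing it and run your validator.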
{
"$schema": "http://json-schema.org/draft-06/schema#",
"foo": "bar"
}
You don't even need the $schema keyword, i.e. {} is valid JSON (and a valid schema that accepts any JSON value).
I would start by understanding what JSON is. https://www.json.org/ is the best place to start, but you may prefer something easier to read like https://www.w3schools.com/js/js_json_intro.asp.
A schema is just a template (or definition) to make sure you're producing valid JSON for the consumer.
As an example, let's say you have an application that parses some JSON, looks for a key named test_score, and saves the value (the score) in a database table/column. For this example we'll call the table tests and the column score. Since a database column requires a type, we'll choose a numeric type, i.e. integer, for our score column.
A valid JSON example for this may look like:
{
"test_score": 100
}
Following this example the application would parse the key test_score and save the value 100 to the tests.score database table/column.
But let's say a score is absent, so you put in a string, i.e. "NA":
{
"test_score": "NA"
}
When the application attempts to save NA to the database it will error because NA is a string not an integer which the database expects.
If you put each of those examples into any online JSON validator, both are valid JSON. However, while it's valid JSON to use "NA" or 100, "NA" is not valid for the actual application that needs to consume the JSON.
So now you may understand that the author of the JSON may wonder:
What are the different valid types I can use as values for my test score?
The responsibility then falls on the writers of the application to provide some sort of definition (i.e. a schema) that the clients (authors) can reference, so the author knows exactly how to structure the JSON so the application can process it accordingly. Having a schema also allows you to validate/test your JSON, so you know it can be processed by the application without actually having to send your JSON through the application.
So, putting it all together, let's say in the schema you see:
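"test_score": {
"type": "integer",
"format": "tinyint",
"minimum": 0,
"maximum": 255
},
The writer of the JSON now knows that they must pass an integer in the range 0 to 255, the range of a tinyint. (Note that "tinyint" is not a standard JSON Schema format, so standard validators enforce the range only through minimum and maximum.) They no longer have to use trial and error with different values to see which ones the application processes. This is a big benefit of having a schema.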
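As a sketch of validating a document before sending it to the application, here is a hand-written Python check of the test_score rule. `check_test_score` is hypothetical, and the range is taken from standard minimum/maximum keywords (0 to 255) rather than from "format": "tinyint", which standard validators do not interpret. A real project would more likely run a JSON Schema validator against the full schema.

```python
import json

# The rule expressed with standard JSON Schema keywords.
TEST_SCORE_SCHEMA = {"type": "integer", "minimum": 0, "maximum": 255}


def check_test_score(document_text):
    """Return a list of problems with the "test_score" value (empty if OK)."""
    doc = json.loads(document_text)
    if "test_score" not in doc:
        return ["test_score is missing"]
    value = doc["test_score"]
    # bool is a subclass of int in Python, but JSON true/false are not integers.
    # Simplification: 100.0 (an integer per JSON Schema) is rejected here.
    if not isinstance(value, int) or isinstance(value, bool):
        return [f"test_score must be an integer, got {value!r}"]
    if not TEST_SCORE_SCHEMA["minimum"] <= value <= TEST_SCORE_SCHEMA["maximum"]:
        return [f"test_score {value} is outside 0..255"]
    return []


print(check_test_score('{"test_score": 100}'))   # []
print(check_test_score('{"test_score": "NA"}'))  # ["test_score must be an integer, got 'NA'"]
```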

Change/Update Field name in the NiFi Schema Text property Across various parallel flows

I have a few identical parallel flows (as shown in the screenshot). I have a ConvertRecord processor in each of the identical flows, and in its Record Reader I have used "Schema Text Field Property" as the access strategy and specified the "Schema Text". For example:
{
"type": "record",
"name": "AVLRecord0",
"fields" : [
{"name": "TimeOfDay", "type": "string", "logicalType":"timestamp-millis"},
{"name":"Field1", "type": "double"},
{"name":"Field2", "type": "double"},
{"name":"Field3", "type": "double"},
{"name": "Filename", "type": "string"}
]
}
Let's say I have used the above schema in the ConvertRecord processors across various parallel flows, and now I want to rename one field from Field to Field_Name. Is there any way I can do it in one go across the Schema Text of all the ConvertRecord processors?
If I want to change/update one of the fields in the Schema Text, do I have to change/update the field name in each processor manually? Or is there a global way that will change the field name across all the parallel flows I have?
Is there any way I can update the Schema Text across various processors in one go?
Any help is much appreciated! Thanks
As you are using the Schema Text property, you need to change it in every ConvertRecord processor manually.
Try this approach instead:
In the ConvertRecord processor, use the Schema Access Strategy Use Schema Name Property (configured on its Record Reader).
Then set up an AvroSchemaRegistry and define your schema by adding a new property (the property name is the schema name and the value is the Avro schema text).
For example, I added a property named sch and set its value to the Avro schema.
After the GetFile processor, use an UpdateAttribute processor to add a schema.name attribute (e.g. with value sch) to the flowfile.
Now, in the reader controller service, set the Schema Access Strategy to Use Schema Name Property and the Schema Registry to the AvroSchemaRegistry that has already been set up.
By following this approach, we are not defining the schema in every ConvertRecord processor; instead, all of them refer to the same schema defined in the AvroSchemaRegistry. If you want to change a field name, you just go into the registry and change the value once.
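With a single registry entry, a rename is one edit to one property value. If you prefer to script that edit rather than retype the schema, a small Python helper can produce the new schema text to paste into the AvroSchemaRegistry property. `rename_field` is hypothetical, does not talk to NiFi, and assumes the schema is a single flat record; the schema below is a trimmed version of the one in the question.

```python
import json


def rename_field(schema_text, old_name, new_name):
    """Return Avro schema text with one top-level record field renamed."""
    schema = json.loads(schema_text)
    names = [f["name"] for f in schema["fields"]]
    if old_name not in names:
        raise KeyError(f"no field named {old_name!r}")
    if new_name in names:
        raise ValueError(f"field {new_name!r} already exists")
    for field in schema["fields"]:
        if field["name"] == old_name:
            field["name"] = new_name
    return json.dumps(schema, indent=2)


SCHEMA_TEXT = """
{"type": "record", "name": "AVLRecord0",
 "fields": [{"name": "Field1", "type": "double"},
            {"name": "Filename", "type": "string"}]}
"""

print(rename_field(SCHEMA_TEXT, "Field1", "Field_Name"))
```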
Flow:
1. GetFile
2. UpdateAttribute //add schema.name attribute
3. ConvertRecord //define/use AvroSchemaRegistry and access strategy as schema name property
...other processors
Refer to this link for more details regarding defining/using AvroSchemaRegistry.