Avro schema specification won't take same namespace

I defined a schema as follows:
{ "namespace":"configschemas.avro",
"type":"record",
"name":"pathObject",
"fields":
[
{ "name":"pathString",
"type" : "string",
"default" : "null"
}
,
{ "name":"needsConversion",
"type" : "boolean" ,
"default" : false
}
]
}
The second schema won't compile after compiling the above schema.
{ "namespace" : "configschemas.avro",
"type" : "array" ,
"items" : configschemas.avro.pathObject
}
All the schemas are under the same directory and the namespaces are the same as well. I can't spot the flaw.
Error while compiling the second schema:
Input files to compile:
logPaths.avsc
Exception in thread "main" org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected character ('p' (code 112)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: logPaths.avsc; line: 3, column: 13]
at org.apache.avro.Schema$Parser.parse(Schema.java:967)
at org.apache.avro.Schema$Parser.parse(Schema.java:932)
at org.apache.avro.tool.SpecificCompilerTool.run(SpecificCompilerTool.java:73)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('p' (code 112)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: logPaths.avsc; line: 3, column: 13]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442)
at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:2090)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:555)
at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:192)
at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1344)
at org.apache.avro.Schema$Parser.parse(Schema.java:965)
... 4 more

I'm uncertain how you're invoking the schema parser, but parsing both schemas through the same parser should work, as this demonstrates:
@Grapes([
    @Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
import org.apache.avro.Schema;

String recordSchema = '''
{
    "namespace": "configschemas.avro",
    "type": "record",
    "name": "pathObject",
    "fields": [
        { "name": "pathString", "type": "string", "default": "null" },
        { "name": "needsConversion", "type": "boolean", "default": false }
    ]
}''';
// Note the quotes: a reference to a named type must be a JSON string.
// The bare configschemas.avro.pathObject in the original file is exactly
// what Jackson chokes on ("Unexpected character ('p')").
String arraySchema = '''
{
    "namespace": "configschemas.avro",
    "type": "array",
    "items": "configschemas.avro.pathObject"
}''';
try {
    // One Parser instance remembers named types across parse calls,
    // so the array schema can resolve pathObject.
    Schema.Parser parser = new Schema.Parser();
    System.out.println(parser.parse(recordSchema));
    System.out.println(parser.parse(arraySchema));
} catch (Throwable t) {
    t.printStackTrace();
}
So if you load all the schemas in a namespace together, it should work (you can keep them in separate files; just feed the text of each file to the same Schema.Parser).
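For the original two-file setup, the quoting is the actual fix: in Avro's JSON syntax a reference to a named type must itself be a JSON string, which is why the parser reports Unexpected character ('p') at the bare reference. The corrected logPaths.avsc is simply:
{
    "namespace": "configschemas.avro",
    "type": "array",
    "items": "configschemas.avro.pathObject"
}
(pathObject still has to be known to the parser, so parse or compile its schema first.)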

Related

Finding record which doesn't contain some value in array fields

I am using Sequelize + TypeScript over Node (with a PostgreSQL DB) and I have the following model:
id: number,
someField: string,
arr1: number[],
arr2: number[]
and I'm trying to find all records in which arr1 and arr2 don't contain a certain value.
As far as I've seen my only option in one query is a mix between Op.not and Op.contains,
so I've tried the following queries:
/// Number 1
where: {
arr1: {
[Op.not] : {[Op.contains]: [someValue]}
},
arr2: {
[Op.not] : {[Op.contains]: [someValue]}
}
},
/// Number 2
where: {
[Op.not]: [
{arr1: {[Op.contains]: [someValue]}},
{arr2: {[Op.contains]: [someValue]}}
]
},
Now, number 1 does compile in TypeScript, but when trying to run it, the following error is returned:
{
"errorId": "db.failure",
"message": "Database error occurred",
"innerError":
{
"name": "SequelizeValidationError",
"errors":
[
{
"message": "{} is not a valid array",
"type": "Validation error",
"path": "arr1",
"value": {},
"origin": "FUNCTION",
"instance": null,
"validatorKey": "ARRAY validator",
"validatorName": null,
"validatorArgs": []
}
]
}
}
So I tried number 2, which didn't compile at all, giving the following TS error:
Type '{ [Op.not]: ({ arr1: { [Op.contains]: [number]; }; } | { arr2: { [Op.contains]: [number]; }; })[]; }' is not assignable to type 'WhereOptions<any>'.
Types of property '[Op.not]' are incompatible.
Type '({ arr1: { [Op.contains]: [number]; }; } | { arr2: { [Op.contains]: [number]; }; })[]' is not assignable to type 'undefined'
So the question is: what am I doing wrong? In other words, how can I make that query without fetching all records and filtering in code?
Thanks!
You have to use notIn and not contains; maybe then it will work:
Official Docs: https://sequelize.org/master/manual/model-querying-basics.html
where: {
arr1: {
[Op.notIn]: someValueArray
},
arr2: {
[Op.notIn]: someValueArray
}
},
Apparently the second option is the correct one, but what was incorrect was Sequelize's typings; a // @ts-ignore comment fixes the problem, as in the sketch below.
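For reference, a minimal sketch of option 2 with the typing error suppressed (someValue, arr1 and arr2 are placeholders from the question; only the sequelize import is real):
import { Op, WhereOptions } from 'sequelize';

const someValue = 42; // placeholder for the value to exclude

const where: WhereOptions = {
    // @ts-ignore -- sequelize's typings don't model an array under Op.not, though it works at runtime
    [Op.not]: [
        { arr1: { [Op.contains]: [someValue] } },
        { arr2: { [Op.contains]: [someValue] } },
    ],
};
Pass it to the query as findAll({ where }) as usual.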

Karate - Nested JSON object schema validation causes KarateException

Feature: Test Karate schema validation
Scenario: Test nested json objects
* def response = read('tasks.json')
* def schema = { ab: "##[] string", c: "##[] string" }
* match response ==
"""
{
id: '#string',
name: '#string',
obj1: '#(schema)' ,
obj2: '##(schema)' ,
obj3: '#(schema)' ,
obj4: '#null'
}
"""
Following is the JSON file used (tasks.json):
{
"id": "ad:p2:53456:4634:yu",
"name": "name",
"obj1": {
"ab": [
"test"
],
"c": null
},
"obj2": null,
"obj3": {
"ab": [
"tester"
],
"c": [
"t1", "t2"
]
},
"obj4": null
}
Error: com.intuit.karate.exception.KarateException: javascript evaluation failed: string, ReferenceError: "string" is not defined in at line number 1
I have tried multiple ways, like:
obj1: '#(^schema)',
obj1: '#object schema'
but I am not able to fix the issue.
It should be ##[] #string; read the docs: https://github.com/intuit/karate#schema-validation
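i.e. with the # prefix on the type, the schema line in the scenario becomes:
* def schema = { ab: "##[] #string", c: "##[] #string" }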

JSON Schema required validation

I have a JSON schema where all values are required. For example:
....
{
    "properties": {
        "minimumDelay": {
            "type": "number"
        },
        "length": {
            "type": "number"
        }
    },
    "required": [
        "minimumDelay",
        "length"
    ]
}
Here the JSON data will be valid only if I enter both minimumDelay and length values.
But my requirement is that the JSON data must be valid when I enter either one of the values (like an XOR case). How must my schema be modified to achieve this?
In JSON Schema, the XOR operator is oneOf.
{
"properties" : {
"minimumDelay" : {
"type" : "number"
},
"length" : {
"type" : "number"
}
},
"oneOf": [
{ "required": ["minimumDelay"] },
{ "required": ["length"] }
]
}
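Since oneOf matches when exactly one branch validates, { "minimumDelay": 100 } and { "length": 5 } both pass, while {} fails (no branch matches) and { "minimumDelay": 100, "length": 5 } fails as well (both branches match).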

Morphline config file not indexing avro nested data

I am generating an index for my Avro data in Solr. Indexes are only generated for data elements at the root level, not for nested ones.
Below is a sample of my Avro schema (not including all of it):
{
"type" : "record",
"name" : "abcd",
"namespace" : "xyz",
"doc" : "Schema Definition for Low Fare Search Shopping Request/Response Data",
"fields" : [ {
"name" : "ShopID",
"type" : "string"
}, {
"name" : "RqSysTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RqTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RsSysTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RsTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "Request",
"type" : {
"type" : "record",
"name" : "RequestStruct",
"fields" : [ {
"name" : "TransactionID",
"type" : [ "string", "null" ]
}, {
"name" : "AgentSine",
"type" : [ "string", "null" ]
}, {
"name" : "CabinPref",
"type" : [ {
"type" : "array",
"items" : {
"type" : "record",
"name" : "CabinStruct",
"fields" : [ {
"name" : "Cabin",
"type" : [ "string", "null" ]
}, {
"name" : "PrefLevel",
"type" : [ "string", "null" ]
} ]
}
}, "null" ]
}, {
"name" : "CountryCode",
"type" : [ "string", "null" ]
}, {
"name" : "PassengerStatus",
"type" : [ "string", "null" ]
}, {
}
How do I refer to "TransactionID" in my morphline config file? I tried all options, but it does not generate an index for nested data elements.
Below is a sample of my morphline config file:
extractAvroPaths {
    flatten : true
    paths : {
        ShopID : /ShopID
        RqSysTimestamp : /RqSysTimestamp
        RqTimestamp : /RqTimestamp
        RsSysTimestamp : /RsSysTimestamp
        RsTimestamp : /RsTimestamp
        TransactionID : "/Request/RequestStruct/TransactionID"
        AgentSine : "/Request/RequestStruct/AgentSine"
        Cabin : /Cabin
        PrefLevel : /PrefLevel
        CountryCode : /CountryCode
        FrequentFlyerStatus : /FrequentFlyerStatus
    }
}
The toAvro command expects a java.util.Map as input when converting to a nested Avro record, so this is my solution:
morphlines: [
{
id: convertJsonToAvro
importCommands: [ "org.kitesdk.**" ]
commands: [
# read the JSON blob
{ readJson: {} }
# java code
{
java {
imports : """
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.kitesdk.morphline.base.Fields;
import java.io.IOException;
import java.util.Set;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
"""
code : """
String jsonStr = record.getFirstValue(Fields.ATTACHMENT_BODY).toString();
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> map = null;
try {
map = (Map<String, Object>)mapper.readValue(jsonStr, Map.class);
} catch (IOException e) {
e.printStackTrace();
}
Set<String> keySet = map.keySet();
for (String o : keySet) {
record.put(o, map.get(o));
}
return child.process(record);
"""
}
}
# convert the extracted fields to an avro object
# described by the schema in this field
{ toAvro {
schemaFile: /etc/flume/conf/a1/like_user_event_realtime.avsc
} }
#{ logInfo { format : "loginfo: {}", args : ["#{}"] } }
# serialize the object as avro
{ writeAvroToByteArray: {
format: containerlessBinary
} }
]
}
]

JSON Schema - require all properties

The required field in JSON Schema
JSON Schema features the properties, required and additionalProperties fields. For example,
{
"type": "object",
"properties": {
"elephant": {"type": "string"},
"giraffe": {"type": "string"},
"polarBear": {"type": "string"}
},
"required": [
"elephant",
"giraffe",
"polarBear"
],
"additionalProperties": false
}
Will validate JSON objects like:
{
"elephant": "Johnny",
"giraffe": "Jimmy",
"polarBear": "George"
}
But it will fail if the set of properties is not exactly elephant, giraffe, polarBear.
The problem
I often copy-paste the list of properties to the list of required, and suffer from annoying bugs when the lists don't match due to typos and other silly errors.
Is there a shorter way to denote that all properties are required, without explicitly naming them?
You can just use the "minProperties" property instead of explicitly naming all the fields.
{
"type": "object",
"properties": {
"elephant": {"type": "string"},
"giraffe": {"type": "string"},
"polarBear": {"type": "string"}
},
"additionalProperties": false,
"minProperties": 3
}
I doubt there exists a way to specify required properties other than explicitly naming them in the required array.
But if you encounter this issue very often, I would suggest you write a small script that post-processes your JSON schema and automatically adds the required array for all defined objects.
The script just needs to traverse the JSON schema tree and, at each level where a "properties" keyword is found, add a "required" keyword listing all keys contained in properties at that level.
Let the machines do the boring stuff.
I do this in code with a one-liner, for instance when I want required enforced for inserting into a DB, but only want to validate against the plain schema when performing an update.
prepareSchema(action) {
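// R.clone (presumably Ramda) deep-copies the shared schema so the insert case can mutate its own copy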
const actionSchema = R.clone(schema)
switch (action) {
case 'insert':
actionSchema.$id = `/${schema.$id}-Insert`
actionSchema.required = Object.keys(schema.properties)
return actionSchema
default:
return schema
}
}
If you are using the jsonschema library in Python, use custom validators.
First, create the custom validator:
import jsonschema
from jsonschema import Draft4Validator
from jsonschema.exceptions import ValidationError

# Custom validator for requiring all properties listed in the instance to be in the 'required' list of the instance
def allRequired(validator, allRequired, instance, schema):
if not validator.is_type(instance, "object"):
return
if allRequired and "required" in instance:
# requiring all properties to 'required'
instanceRequired = instance["required"]
instanceProperties = list(instance["properties"].keys())
for property in instanceProperties:
if property not in instanceRequired:
yield ValidationError("%r should be required but only the following are required: %r" % (property, instanceRequired))
for property in instanceRequired:
if property not in instanceProperties:
yield ValidationError("%r should be in properties but only the following are properties: %r" % (property, instanceProperties))
Then extend an existing validator:
all_validators = dict(Draft4Validator.VALIDATORS)
all_validators['allRequired'] = allRequired
customValidator = jsonschema.validators.extend(
validator=Draft4Validator,
validators=all_validators
)
now test:
schema = {"allRequired": True}
instance = {"properties": {"name": {"type": "string"}}, "required": []}
v = customValidator(schema)
errors = list(v.iter_errors(instance))
for error in errors:
    print(error.message)
you will get the error:
'name' should be required but only the following are required: []
As suggested by others, here's such a post-processing Python script:
def schema_to_strict(schema):
if schema['type'] not in ['object', 'array']:
return schema
if schema['type'] == 'array':
schema['items'] = schema_to_strict(schema['items'])
return schema
for k, v in schema['properties'].items():
schema['properties'][k] = schema_to_strict(v)
schema['required'] = list(schema['properties'].keys())
schema['additionalProperties'] = False
return schema
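Note that this sketch assumes every object schema has a properties key and every array schema an items key; schemas without a type field would also need a guard before the first lookup.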
You can use the function below:
export function addRequiredAttributeRecursive(schema) {
if (schema.type === 'object') {
schema.required = [];
Object.keys(schema.properties).forEach((key) => {
schema.required.push(key);
if (schema.properties[key].type === 'object') {
schema.properties[key] = addRequiredAttributeRecursive(
schema.properties[key],
);
} else if (schema.properties[key].type === 'array') {
schema.properties[key].items = addRequiredAttributeRecursive(
schema.properties[key].items,
);
}
});
} else if (schema.type === 'array') {
if (schema.items.type === 'object') {
schema.items = addRequiredAttributeRecursive(schema.items);
}
}
return schema;
}
It recursively writes the required attribute for every object property in the schema you have.
If you are using JavaScript, you can use a property getter.
{
"type": "object",
"properties": {
"elephant": {"type": "string"},
"giraffe": {"type": "string"},
"polarBear": {"type": "string"}
},
get required() { return Object.keys(this.properties) },
"additionalProperties": false
}
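Because required is computed from the keys of properties every time it is read, the two lists can never drift apart; the trade-off is that the schema has to live in JavaScript code rather than in a plain .json file.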