Understanding JSON Schema errors using ajv - jsonschema

I have the following schema and json to validate using ajv.
const schema = {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": [ "countries" ],
"definitions": {
"europeDef": {
"type": "object",
"required": ["type"],
"properties": { "type": {"const": "europe"} }
},
"asiaDef": {
"type": "object",
"required": ["type"],
"properties": { "type": {"const": "asia"} }
}
},
"properties": {
"countries": {
"type": "array",
"items": {
"oneOf":[
{ "$ref": "#/definitions/europeDef" },
{ "$ref": "#/definitions/asiaDef"}
]
}
}
}
}
const data = {
"countries":[
{"type": "asia"},
{"type": "europe1"}
]
}
const isValid = ajv.validate(schema, data); //schema, data
if(! isValid){
console.log(ajv.errors);
}
and the error is:
[
{
keyword: 'const',
dataPath: '/countries/1/type',
schemaPath: '#/definitions/europeDef/properties/type/const',
params: { allowedValue: 'europe' },
message: 'should be equal to constant'
},
{
keyword: 'const',
dataPath: '/countries/1/type',
schemaPath: '#/definitions/asiaDef/properties/type/const',
params: { allowedValue: 'asia' },
message: 'should be equal to constant'
},
{
keyword: 'oneOf',
dataPath: '/countries/1',
schemaPath: '#/properties/countries/items/oneOf',
params: { passingSchemas: null },
message: 'should match exactly one schema in oneOf'
}
]
I know why the error is appearing (reason: as I have used 'europe1' and it is not conforming the schema standard)
I have following questions from the above error situation:
Being, I have provided 'asia' as a valid const, the error stills talks about 'asia' as part of second entry in the array. Why did it showing as an error despite schema is absolute fine from asia perspective. Is this because 'oneOf' getting used ? In other words, it is very hard to understand, what and where is the error and what is not?
For asia, 'message: 'should be equal to constant' (2nd item of the array) is misleading imo. It gives an impression that there are still some problems with the 'asia'.
How to parse this error: on the basis of schemaPath or dataPath? Also in any case, it will still give an impression that there is a problem in terms of 'asia' (and actually its not)
Also, how to explain the above error output to a novice, as the novice will still say, why asia is coming part of error despite its correct?
Also, if the schema become more complex using oneOf/anyOf,allOf or using if-then-else, the ajv.errors output becomes more complex to understand and to explain (when certain condition are accurate but displayed as error, example asia here)
Are there any theory/documentaion/guidelines to understand the errors in a better way?

For JSON Schema draft 2019-09, we created several standardised output formats. ajv provides one of the most useful outputs from a draft-07 schema in comparison to many libraries.
When looking at the errors, what you might be overlooking is the dataPath value.
In answer to 1, the errors reported are all when applying to data path /countries/1. /countries/0 is fine, as you say. Arrays in javascript start at 0.
I think knowing that also answers all your other questions.
I think you may have assumed that arrays start at 1, and the data path was referring to asia object while it's actually targeting europe1 object.
Please do comment if I'm missing something or you're still confused on this.

Related

Where do I store GraphQL errors

Basically what the title asks. As far as I know, GraphQL responses are structured as follows:
{
"data": { ... },
"errors":
{
"message": "This message describes what happened",
"path": [
"somePath",
"inGraphQLQuery"
],
"locations": [
{
"line": 2,
"column": 3
}
]
}
]
}
And that works for simple errors. But what if my error is more complicated than just a message? For example, I have mutation that changes some fields of user, such as username, email, password, etc. That mutation might result in error if user-provided data for any field is invalid. But what do I do if validation for multiple fields fails? Ideal way in my opinion to show that to user would be to send structure like this
[
{
"field": "username",
"details": "Username must contain only latin letters and digits"
},
{
"field": "email",
"details": "Email address must contain '#'"
}
{
"field": "password",
"details": "Password must contain at least 8 characters"
}
]
But to my knowledge, there's no way to send this in errors section. So I tend to put this into data section of response and specify that user will receive either User or FieldValidationError.
So, to sum it up, there are some structures that I can't fit into errors, so (for consistency) it seems logical to put every error into data. And by doing so, errors section becomes redundant, since it isn't used for errors at all. What am I getting wrong?

Can you use separate files for json subschemas?

I am new to using JSON schemas and I am pretty confused about subschemas. I have done many searches and read https://json-schema.org/understanding-json-schema/structuring.html but I feel like I am not getting some basic concepts.
I'd like to break up a schema into several files. For instance, I have a metric schema that I would like nested in a category schema. Can a subschema be a separate file that is referenced or is it a block of code in the same file as the base schema? If they are separate files, how do you reference the other file? I have tried using a lot of various values for $ref with the $id of the nested file but it doesn't seem to work.
I don't think I really understand the $id and $schema fields. I have read the docs on them but leave still feeling confused. Does the $id need to be a valid URI? The docs seem to say that they don't. And I just copied the $schema value from the jsonschema site examples.
Any help would be appreciated about what I am doing wrong.
(added the following after Ether's reply)
The error messages I get are:
KeyError: 'http://mtm/metric'
and variations on
jsonschema.exceptions.RefResolutionError: HTTPConnectionPool(host='mtm', port=80): Max retries exceeded with url: /metric (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fe9204a31c0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
Here is the category schema in category_schema.json:
{
"$id": "http://mtm/category",
"$schema":"https://json-schema.org/draft/2020-12/schema",
"title":"Category Schema",
"type":"object",
"required":["category_name", "metrics"],
"properties": {
"category_name":{
"description": "The name of the category.",
"type":"string"
},
"metrics":{
"description": "The list of metrics for this category.",
"type":"array",
"items": {
"$ref": "/metric"
}
}
}
}
And here is the metric schema in metric_schema.json:
{
"$id": "http://mtm/metric",
"$schema":"https://json-schema.org/draft/2020-12/schema",
"title":"Metric Schema",
"description":"Schema of metric data.",
"type":"object",
"required": ["metric_name"],
"properties": {
"metric_name":{
"description": "The name of the metric in standard English. e.g. Live Views (Millions)",
"type":"string"
},
"metric_format": {
"description": "The format of the metric value. Can be one of: whole, decimal, percent, or text",
"type": "string",
"enum": ["integer", "decimal", "percent", "text"]
}
}
}
Yes, you can reference schemas in other documents, but the URIs need to be correct, and you need to add the files manually to the evaluator if they are not network- or filesystem-available.
In your first schema, you declare its uri is "http://mtm/category". But then you say "$ref": "/mtm/metric" -- since that's not absolute, the $id URI will be used as a base to resolve it. The full URI resolves to "http://mtm/mtm/metric", which is not the same as the identifier used in the second schema, so the document won't be found. This should be indicated in the error message (which you didn't provide).

JSON Schema Array Validation Woes Using oneOf

Hope I might find some help with this validation issue: I have a JSON array that can have multiple object types (video, image). Within that array, the objects have a rel field value. The case I'm working on is that there can only be one object with "rel": "primaryMedia" allowed — either a video or an image.
Here are the object representations for the image and video case, both showing the "rel": "primaryMedia".
{
"media": [
{
"caption": "Caption goes here",
"id": "ncim87659842",
"rel": "primaryMedia",
"type": "image"
},
{
"description": "Shaima Swileh arrived in San Francisco after fighting for 17 months to get a waiver from the U.S. government to be allowed into the country to visit her son.",
"headline": "Yemeni mother arrives in U.S. to be with dying 2-year-old son",
"id": "mmvo1402810947621",
"rel": "primaryMedia",
"type": "video"
}
]
}
Here's a stripped-down version of the schema I created to validate this using oneOf (assuming this will handle my case). It doesn't however, work as intended.
{
"$id": "http://example.com/schema/rockcms/article.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"definitions": {},
"properties": {
"media": {
"items": {
"oneOf": [
{
"additionalProperties": false,
"properties": {
"caption": {
"type": "string"
},
"id": {
"type": "string"
},
"rel": {
"enum": [
"primaryMedia"
],
"type": "string"
},
"type": {
"enum": [
"image"
],
"type": "string"
}
},
"required": [
"caption",
"id",
"rel",
"type"
],
"type": "object"
},
{
"additionalProperties": false,
"properties": {
"description": {
"type": "string"
},
"headline": {
"type": "string"
},
"id": {
"type": "string"
},
"rel": {
"enum": [
"primaryMedia"
],
"type": "string"
},
"type": {
"enum": [
"video"
],
"type": "string"
}
},
"required": [
"description",
"headline",
"id",
"rel",
"type"
],
"type": "object"
}
]
},
"type": "array"
}
}
}
Using the JSON Schema validator at https://www.jsonschemavalidator.net, the schema validates when data is correctly presented, but the doesn't work when trying to catch errors.
In the case below, headline is added for a video, and it's missing id. This should fail because headline isn't allowed on video, and id is required.
{
"media": [
{
"headline": "Yemeni mother arrives in U.S. to be with dying 2-year-old son",
"rel": "primaryMedia",
"type": "video"
}
]
}
The results I get from the validator, however, aren't entirely expected. Seems it's conflating the two object schemas in its response.
In addition to this, I've separately found that the schema will allow the population of BOTH a video and image object in media, which isn't expected.
Been trying to figure out what I've done wrong, but am stumped. Would very much appreciate some feedback, if anyone has to offer. Thanks in advance!
Validator Response:
Message: JSON is valid against no schemas from 'oneOf'.
Schema path: #/properties/media/items/oneOf
Message: Property 'headline' has not been defined and the schema does not allow additional properties.
Schema path: #/properties/media/items/oneOf/0/additionalProperties
Message: Value "video" is not defined in enum.
Schema path: #/properties/media/items/oneOf/0/properties/type/enum
Message: Required properties are missing from object: description, id.
Schema path: #/properties/media/items/oneOf/1/required
Message: Required properties are missing from object: caption, id.
Schema path: #/properties/media/items/oneOf/0/required
Your question is formed of three parts, but the first two are linked, so I'll address those, although they don't have a "solution" as such.
How can I validate that only one object in an array has a specific key, and the others do not.
You cannot do this with JSON Schema.
The set of JSON Schema keywords which are applicable to arrays do not have a means for expressing "one of the values must match a schema", but rather are applicable to either ALL of the items in a the array, or A SPECIFIC item in the array (if items is an array as opposed to an object).
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-6.4
The validation output is not what I expect. I expect to only see the failing branch of oneOf which relates to my object type, which
is defined by the type key.
The JSON Schema specification (as of draft-7) does not specify any format for returning errors, however the error structure you get is pretty "complete" in terms of what it's telling you (and is similar to how we are specifying errors should be returned for draft-8).
Consider, the validator knows nothing about your schema or your JSON instance in terms of your business logic.
When validating a JSON instance, a validator may step through all values, and test validation rules against all applicable subschemas. Looking at your errors, both of the schemas in oneOf are applicable to all items in your array, and so all are tested for validation. If one does not satisfy the condition, the others will be tested also. The validator cannot know, when using a oneOf, what your intent was.
You MAY be able to get around this issue, by using an if / then combination. If your if schema is simply a const of the type, and your then schema is the full object type, you may get a cleaner error response, but I haven't tested this theory.
I want ALL items in the array to be one of the types. What's going on here?
From the spec...
items:
If "items" is a schema, validation succeeds if all elements in the
array successfully validate against that schema.
oneOf:
An instance validates successfully against this keyword if it
validates successfully against exactly one schema defined by this
keyword's value.
What you have done is say: Each item in the array should be valid according to [schemaA]. SchemA: The object should be valid according to on of these schemas: [the schemas inside the oneOf].
I can see how this is confusing when you think "items must be one of the following", but items applies the value schema to each of the items in the array independantly.
To make it do what you mean, move the oneOf above items, and then refactor one of the media types to another items schema.
Here's a sudo schema.
oneOf: [items: properties: type: const: image], [items: properties: type: image]
Let me know if any of this isn't clear or you have any follow up questions.

Apache Druid segment merge task submition failure

I am using Druid 0.9.1.1 and trying to merge all the segment of a datasource per day to a single segment. Whereas the merge task initiation fails with error :
{"error":"Instantiation of [simple type, class io.druid.timeline.DataSegment] value failed: null (through reference chain: java.util.ArrayList[0])"}
I have got the segment details from segment metadata query. There is no help from driud documents as only specify raw structure of the overall query, but not the required segment detail structure(Below is how druid document suggests).
{
"type": "merge",
"id": <task_id>,
"dataSource": <task_datasource>,
"aggregations": <list of aggregators>,
"segments": <JSON list of DataSegment objects to merge>
}
example queries :
{
"type": "merge",
"id": "envoy_merge_task",
"dataSource": "dcap.envoy.diskmounts.kafka",
"segments": [{"id":"dcap.sermon.threshold.kafka_2017-05-22T00:00:00.000Z_2017-05-23T00:00:00.000Z_2017-05-22T07:00:02.951Z","intervals":["2017-05-22T00:00:00.000Z/2017-05-23T00:00:00.000Z"],"columns":{},"size":5460959,"numRows":41577,"aggregators":null,"queryGranularity":null},{"id":"dcap.sermon.threshold.kafka_2017-05-22T00:00:00.000Z_2017-05-23T00:00:00.000Z_2017-05-22T07:00:02.951Z_1","intervals":["2017-05-22T00:00:00.000Z/2017-05-23T00:00:00.000Z"],"columns":{},"size":5448881,"numRows":41577,"aggregators":null,"queryGranularity":null},{"id":"dcap.sermon.threshold.kafka_2017-05-22T00:00:00.000Z_2017-05-23T00:00:00.000Z_2017-05-22T07:00:02.951Z_2","intervals":["2017-05-22T00:00:00.000Z/2017-05-23T00:00:00.000Z"],"columns":{},"size":5454452,"numRows":41571,"aggregators":null,"queryGranularity":null},{"id":"dcap.sermon.threshold.kafka_2017-05-22T00:00:00.000Z_2017-05-23T00:00:00.000Z_2017-05-22T07:00:02.951Z_3","intervals":["2017-05-22T00:00:00.000Z/2017-05-23T00:00:00.000Z"],"columns":{},"size":5456267,"numRows":41569,"aggregators":null,"queryGranularity":null}] }
I have tried different forms of structure for "segments" key, results in same error.
example :
"segments": [{"id":"dcap.envoy.diskmounts.kafka_2017-05-21T06:00:00.000Z_2017-05-21T07:00:00.000Z_2017-05-21T06:02:43.482Z"},{"id":"dcap.envoy.diskmounts.kafka_2017-05-21T06:00:00.000Z_2017-05-21T07:00:00.000Z_2017-05-21T06:02:43.482Z_1"},{"id":"dcap.envoy.diskmounts.kafka_2017-05-21T06:00:00.000Z_2017-05-21T07:00:00.000Z_2017-05-21T06:02:43.482Z_2"},{"id":"dcap.envoy.diskmounts.kafka_2017-05-21T06:00:00.000Z_2017-05-21T07:00:00.000Z_2017-05-21T06:02:43.482Z_3"}]
What is right structure for segment-merge tasks.
the format i used for segments is
"segments":[
{
"dataSource": "wikiticker88",
"interval": "2015-09-12T02:00:00.000Z/2015-09-12T03:00:00.000Z",
"version": "2018-01-16T07:23:16.425Z",
"loadSpec": {
"type": "local",
"path": "/home/linux/druid-0.11.0/var/druid/segments/wikiticker88/2015-09-12T02:00:00.000Z_2015-09-12T03:00:00.000Z/2018-01-16T07:23:16.425Z/0/index.zip"
},
"dimensions": "channel,cityName,comment,countryIsoCode,countryName,isAnonymous,isMinor,isNew,isRobot,isUnpatrolled,metroCode,namespace,page,regionIsoCode,regionName,user",
"metrics": "count,added,deleted,delta,user_unique",
"shardSpec": {
"type": "none"
},
"binaryVersion": 9,
"size": 198267,
"identifier": "wikiticker88_2015-09-12T02:00:00.000Z_2015-09-12T03:00:00.000Z_2018-01-16T07:23:16.425Z"
},
]
use this to get your metadata of segments
/druid/coordinator/v1/metadata/datasources/{dataSourceName}/segments?full

Error loading file stored in Google Cloud Storage to Big Query

I have been trying to create a job to load a compressed json file from Google Cloud Storage to a Google BigQuery table. I have read/write access in both Google Cloud Storage and Google BigQuery. Also, the uploaded file belongs in the same project as the BigQuery one.
The problem happens when I access to the resource behind this url https://www.googleapis.com/upload/bigquery/v2/projects/NUMERIC_ID/jobs by means of a POST request. The content of the request to the abovementioned resource can be found as follows:
{
"kind" : "bigquery#job",
"projectId" : NUMERIC_ID,
"configuration": {
"load": {
"sourceUris": ["gs://bucket_name/document.json.gz"],
"schema": {
"fields": [
{
"name": "id",
"type": "INTEGER"
},
{
"name": "date",
"type": "TIMESTAMP"
},
{
"name": "user_agent",
"type": "STRING"
},
{
"name": "queried_key",
"type": "STRING"
},
{
"name": "user_country",
"type": "STRING"
},
{
"name": "duration",
"type": "INTEGER"
},
{
"name": "target",
"type": "STRING"
}
]
},
"destinationTable": {
"datasetId": "DATASET_NAME",
"projectId": NUMERIC_ID,
"tableId": "TABLE_ID"
}
}
}
}
However, the error doesn't make any sense and can also be found below:
{
"error": {
"errors": [
{
"domain": "global",
"reason": "invalid",
"message": "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
}
],
"code": 400,
"message": "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
}
}
I know the problem doesn't lie either in the project id or in the access token placed in the authentication header, because I have successfully created an empty table before. Also I specify the content-type header to be application/json which I don't think is the issue here, because the body content should be json encoded.
Thanks in advance
Your HTTP request is malformed -- BigQuery doesn't recognize this as a load job at all.
You need to look into the POST request, and check the body you send.
You need to ensure that all the above (which seams correct) is the body of the POST call. The above Json should be on a single line, and if you manually creating the multipart message, make sure there is an extra newline between the headers and body of each MIME type.
If you are using some sort of library, make sure the body is not expected in some other form, like resource, content, or body. I've seen libraries that use these differently.
Try out the BigQuery API explorer: https://developers.google.com/bigquery/docs/reference/v2/jobs/insert and ensure your request body matches the one made by the API.