RapidJSON gives validation success even when a required field is missing - jsonschema

I was expecting RapidJSON to report a validation error, since my JSON file doesn't include one of the 'required' fields mentioned in the schema. However, for some reason this doesn't happen.
dbconf.json (the JSON file):
{
    "MAX_CONNECTION_PER_HOST": 20,
    "QUEUE_IO_SIZE": 10485,
    "Garbage": 50000
}
Here's the test code along with the schema.
#include "rapidjson/document.h"
#include "rapidjson/error/en.h"
#include "rapidjson/schema.h"
#include <rapidjson/stringbuffer.h>
#include<iostream>
#include<string>
#include<fstream>
using namespace std;
const char g_plJsonSchema[]="{\
\"$schema\": \"http://json-schema.org/draft-04/schema#\",\
\"title\": \"Schema\",\
\"description\": \"JSON schema for validating Json file\",\
\"type\": \"object\",\
\"properties\": {\
\"MAX_CONNECTION_PER_HOST\": { \"type\": \"number\" },\
\"QUEUE_IO_SIZE\": { \"type\": \"number\" },\
\"REQUEST_LOW_WATER_MARK\": { \"type\": \"number\" },\
\"required\": [\
\"MAX_CONNECTION_PER_HOST\",\
\"QUEUE_IO_SIZE\",\
\"REQUEST_LOW_WATER_MARK\"\
]\
}\
}";
int main()
{
rapidjson::Document l_peerAddSchemaDoc, l_peerAddDataDoc;
l_peerAddSchemaDoc.Parse(g_plJsonSchema);
if(l_peerAddSchemaDoc.HasParseError())
{
printf("JSON schema file is not a valid JSON file\n");
return -1;
}
std::ifstream l_confDataIStream("dbconf.json");
std::string l_confDataIStreamStr((std::istreambuf_iterator<char>(l_confDataIStream)),(std::istreambuf_iterator<char>()));
l_peerAddDataDoc.Parse(l_confDataIStreamStr.c_str());
rapidjson::SchemaDocument l_schemaDocument(l_peerAddSchemaDoc);
rapidjson::SchemaValidator l_SchemaValidator(l_schemaDocument);
if(!l_peerAddDataDoc.Accept(l_SchemaValidator))
{
rapidjson::StringBuffer sb;
l_SchemaValidator.GetInvalidSchemaPointer().StringifyUriFragment(sb);
printf("Invalid schema: %s\n", sb.GetString());
printf("Invalid keyword: %s\n", l_SchemaValidator.GetInvalidSchemaKeyword());
sb.Clear();
l_SchemaValidator.GetInvalidDocumentPointer().StringifyUriFragment(sb);
printf("Invalid document: %s\n", sb.GetString());
}
else
printf("\nJson file validated with the given schema successfully\n");
return 0;
}
I get the following output
Json file validated with the given schema successfully

Your issue here is that "required" should be at the root level, not inside "properties". In fact, you currently have an invalid schema, as every value of a key inside "properties" must itself be an object (a subschema).
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Schema",
    "description": "JSON schema for validating Json file",
    "type": "object",
    "properties": {
        "MAX_CONNECTION_PER_HOST": {
            "type": "number"
        },
        "QUEUE_IO_SIZE": {
            "type": "number"
        },
        "REQUEST_LOW_WATER_MARK": {
            "type": "number"
        }
    },
    "required": [
        "MAX_CONNECTION_PER_HOST",
        "QUEUE_IO_SIZE",
        "REQUEST_LOW_WATER_MARK"
    ]
}
For testing, I validated the instance against this schema using https://www.jsonschemavalidator.net.
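If you want to drop the corrected schema back into the C++ test program, a C++11 raw string literal avoids all the line-continuation backslashes. This is only a sketch; g_plJsonSchema is the identifier from the question, and the schema body is the corrected one above:
// Corrected schema as a raw string literal: "required" now sits at the root
// level, next to "properties", so a missing key fails validation.
const char g_plJsonSchema[] = R"({
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Schema",
    "description": "JSON schema for validating Json file",
    "type": "object",
    "properties": {
        "MAX_CONNECTION_PER_HOST": { "type": "number" },
        "QUEUE_IO_SIZE": { "type": "number" },
        "REQUEST_LOW_WATER_MARK": { "type": "number" }
    },
    "required": ["MAX_CONNECTION_PER_HOST", "QUEUE_IO_SIZE", "REQUEST_LOW_WATER_MARK"]
})";
With this schema, the dbconf.json above should now be rejected, because REQUEST_LOW_WATER_MARK is missing.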

Related

JSON Schema v7: formatMinimum & formatMaximum validate everything

I am using the ajv JSON schema library (v7) and trying to validate a date based on some value. It looks pretty straightforward using formatMinimum/formatMaximum, but it seems that every date passes validation when these keywords are used.
Here's my schema:
"some-date": {
"type": "object",
"properties": {
"data": {
"type": "object",
"properties": {
"value": {
"type": "string",
"format": "date-time",
"formatMinimum": "2021-03-10T14:25:00.000Z"
}
}
}
}
}
Here's the json:
{
    "some-date": {
        "data": {
            "value": "2011-03-10T14:25:00.000Z"
        }
    }
}
Here's how I am validating:
const ajv = new Ajv({allErrors: true})
require('ajv-formats')(ajv)
require('ajv-errors')(ajv)
require('ajv-keywords')(ajv)
const validate = ajv.validate(mySchema)
const isValid = validate(myJSON)
I've tried it on JSONSchemalint and it validates the above JSON against the given schema. Also, I have tried several dates and it validates everything.
Please let me know if I am missing something.
Thanks
I'm not sure where you're getting formatMinimum and formatMaximum from, but they are not standard keywords in the JSON Schema specification, under any version. Are they documented as supported keywords in the implementation that you are using?
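Separately from the keyword question, the validation calls in the question look off: ajv.validate(schema, data) returns a boolean, whereas ajv.compile(schema) is what returns a reusable validating function. A minimal sketch of the usual flow, assuming Ajv v7 with ajv-formats as in the question (mySchema and myJSON stand for the schema and data shown above):
const Ajv = require("ajv")
const addFormats = require("ajv-formats")

const ajv = new Ajv({allErrors: true})
addFormats(ajv)

// compile() returns a validating function; calling it returns true/false
// and leaves any failures on validate.errors
const validate = ajv.compile(mySchema)
const isValid = validate(myJSON)
if (!isValid) console.log(validate.errors)
Depending on the Ajv version and plugins in use, compiling a schema that contains unrecognized keywords may also trigger a strict-mode error, which would surface the formatMinimum problem directly.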

How to correctly validate array of objects using JustinRainbow/JsonSchema

I have code that correctly validates an article returned from an endpoint that returns single articles. I'm pretty sure it's working correctly as it gives a validation error when I deliberately don't include a required field in the article.
I also have this code that tries to validate an array of articles returned from an endpoint that returns an array of articles. However, I'm pretty sure that isn't working correctly, as it always says the data is valid, even when I deliberately don't include a required field in the articles.
How do I correctly validate an array of data against the schema?
The full test code is below as a standalone runnable test. Both of the tests should fail; however, only one of them does.
<?php
declare(strict_types=1);
error_reporting(E_ALL);
require_once __DIR__ . '/vendor/autoload.php';
// Return the definition of the schema, either as an array
// or a PHP object
function getSchema($asArray = false)
{
$schemaJson = <<< 'JSON'
{
"swagger": "2.0",
"info": {
"termsOfService": "http://swagger.io/terms/",
"version": "1.0.0",
"title": "Example api"
},
"paths": {
"/articles": {
"get": {
"tags": [
"article"
],
"summary": "Find all articles",
"description": "Returns a list of articles",
"operationId": "getArticleById",
"produces": [
"application/json"
],
"responses": {
"200": {
"description": "successful operation",
"schema": {
"type": "array",
"items": {
"$ref": "#/definitions/Article"
}
}
}
},
"parameters": [
]
}
},
"/articles/{articleId}": {
"get": {
"tags": [
"article"
],
"summary": "Find article by ID",
"description": "Returns a single article",
"operationId": "getArticleById",
"produces": [
"application/json"
],
"parameters": [
{
"name": "articleId",
"in": "path",
"description": "ID of article to return",
"required": true,
"type": "integer",
"format": "int64"
}
],
"responses": {
"200": {
"description": "successful operation",
"schema": {
"$ref": "#/definitions/Article"
}
}
}
}
}
},
"definitions": {
"Article": {
"type": "object",
"required": [
"id",
"title"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"title": {
"type": "string",
"description": "The title for the link of the article"
}
}
}
},
"schemes": [
"http"
],
"host": "example.com",
"basePath": "/",
"tags": [],
"securityDefinitions": {
},
"security": [
{
"ApiKeyAuth": []
}
]
}
JSON;
return json_decode($schemaJson, $asArray);
}
// Extract the schema of the 200 response of an api endpoint.
function getSchemaForPath($path)
{
$swaggerData = getSchema(true);
if (isset($swaggerData["paths"][$path]['get']["responses"][200]['schema']) !== true) {
echo "response not defined";
exit(-1);
}
return $swaggerData["paths"][$path]['get']["responses"][200]['schema'];
}
// JsonSchema needs to know about the ID used for the top-level
// schema apparently.
function aliasSchema($prefix, $schemaForPath)
{
$aliasedSchema = [];
foreach ($schemaForPath as $key => $value) {
if ($key === '$ref') {
$aliasedSchema[$key] = $prefix . $value;
}
else if (is_array($value) === true) {
$aliasedSchema[$key] = aliasSchema($prefix, $value);
}
else {
$aliasedSchema[$key] = $value;
}
}
return $aliasedSchema;
}
// Test the data matches the schema.
function testDataMatches($endpointData, $schemaForPath)
{
// Setup the top level schema and get a validator from it.
$schemaStorage = new \JsonSchema\SchemaStorage();
$id = 'file://example';
$swaggerClass = getSchema(false);
$schemaStorage->addSchema($id, $swaggerClass);
$factory = new \JsonSchema\Constraints\Factory($schemaStorage);
$jsonValidator = new \JsonSchema\Validator($factory);
// Alias the schema for the endpoint, so JsonSchema can work with it.
$schemaForPath = aliasSchema($id, $schemaForPath);
// Validate the things
$jsonValidator->check($endpointData, (object)$schemaForPath);
// Process the result
if ($jsonValidator->isValid()) {
echo "The supplied JSON validates against the schema definition: " . \json_encode($schemaForPath) . " \n";
return;
}
$messages = [];
$messages[] = "End points does not validate. Violations:\n";
foreach ($jsonValidator->getErrors() as $error) {
$messages[] = sprintf("[%s] %s\n", $error['property'], $error['message']);
}
$messages[] = "Data: " . \json_encode($endpointData, JSON_PRETTY_PRINT);
echo implode("\n", $messages);
echo "\n";
}
// We have two data sets to test. A list of articles.
$articleListJson = <<< JSON
[
{
"id": 19874
},
{
"id": 19873
}
]
JSON;
$articleListData = json_decode($articleListJson);
// A single article
$articleJson = <<< JSON
{
"id": 19874
}
JSON;
$articleData = json_decode($articleJson);
// This passes, when it shouldn't as none of the articles have a title
testDataMatches($articleListData, getSchemaForPath("/articles"));
// This fails correctly, as it is correct for it to fail to validate, as the article doesn't have a title
testDataMatches($articleData, getSchemaForPath("/articles/{articleId}"));
The minimal composer.json is:
{
"require": {
"justinrainbow/json-schema": "^5.2"
}
}
Edit-2: 22nd May
I have been digging further, and it turns out that the issue is caused by your top-level conversion to object:
$jsonValidator->check($endpointData, (object)$schemaForPath);
If you hadn't done that, it would have all worked:
$jsonValidator->check($endpointData, $schemaForPath);
So it doesn't seem to be a bug; it was just wrong usage. If you simply remove the (object) cast and run the code:
$ php test.php
End points does not validate. Violations:
[[0].title] The property title is required
[[1].title] The property title is required
Data: [
{
"id": 19874
},
{
"id": 19873
}
]
End points does not validate. Violations:
[title] The property title is required
Data: {
"id": 19874
}
Edit-1
To fix the original code, you would need to update CollectionConstraint.php:
/**
* Validates the items
*
* @param array $value
* @param \stdClass $schema
* @param JsonPointer|null $path
* @param string $i
*/
protected function validateItems(&$value, $schema = null, JsonPointer $path = null, $i = null)
{
if (is_array($schema->items) && array_key_exists('$ref', $schema->items)) {
$schema->items = $this->factory->getSchemaStorage()->resolveRefSchema((object)$schema->items);
var_dump($schema->items);
};
if (is_object($schema->items)) {
This will handle your use case for sure, but if you'd rather not change code in the dependency, then use my original answer.
Original Answer
The library has a bug/limitation: in src/JsonSchema/Constraints/CollectionConstraint.php it doesn't resolve a $ref inside the items schema. If I update your code like below
// Alias the schema for the endpoint, so JsonSchema can work with it.
$schemaForPath = aliasSchema($id, $schemaForPath);
if (array_key_exists('items', $schemaForPath))
{
$schemaForPath['items'] = $factory->getSchemaStorage()->resolveRefSchema((object)$schemaForPath['items']);
}
// Validate the things
$jsonValidator->check($endpointData, (object)$schemaForPath);
and run it again, I get the violations I need:
$ php test2.php
End points does not validate. Violations:
[[0].title] The property title is required
[[1].title] The property title is required
Data: [
{
"id": 19874
},
{
"id": 19873
}
]
End points does not validate. Violations:
[title] The property title is required
Data: {
"id": 19874
}
You either need to fix CollectionConstraint.php or open an issue with the developer of the repo. Or else manually resolve the $ref entries in the whole schema, as shown above. My code resolves the issue specific to your schema, but adapting it to any other schema should not be a big issue.
EDIT: The important thing here is that the provided schema document is an instance of a Swagger schema, which employs an extended subset of JSON Schema to define certain cases of requests and responses. The Swagger 2.0 schema itself can be validated by its own JSON Schema, but it cannot act directly as a JSON Schema for the API response structure.
In case the entity schema is compatible with standard JSON Schema, you can perform validation with a general-purpose validator, but you have to provide all relevant definitions. That is easy when you have absolute references, but more complicated for local (relative) references that start with #/; IIRC they must be defined in the local schema.
The problem here is that you are trying to use schema references detached from their resolution scope. I've added an id to make the references absolute, so they no longer need to be resolved within that scope.
"$ref": "http://example.com/my-schema#/definitions/Article"
The code below works well.
<?php
require_once __DIR__ . '/vendor/autoload.php';
$swaggerSchemaData = json_decode(<<<'JSON'
{
"id": "http://example.com/my-schema",
"swagger": "2.0",
"info": {
"termsOfService": "http://swagger.io/terms/",
"version": "1.0.0",
"title": "Example api"
},
"paths": {
"/articles": {
"get": {
"tags": [
"article"
],
"summary": "Find all articles",
"description": "Returns a list of articles",
"operationId": "getArticleById",
"produces": [
"application/json"
],
"responses": {
"200": {
"description": "successful operation",
"schema": {
"type": "array",
"items": {
"$ref": "http://example.com/my-schema#/definitions/Article"
}
}
}
},
"parameters": [
]
}
},
"/articles/{articleId}": {
"get": {
"tags": [
"article"
],
"summary": "Find article by ID",
"description": "Returns a single article",
"operationId": "getArticleById",
"produces": [
"application/json"
],
"parameters": [
{
"name": "articleId",
"in": "path",
"description": "ID of article to return",
"required": true,
"type": "integer",
"format": "int64"
}
],
"responses": {
"200": {
"description": "successful operation",
"schema": {
"$ref": "http://example.com/my-schema#/definitions/Article"
}
}
}
}
}
},
"definitions": {
"Article": {
"type": "object",
"required": [
"id",
"title"
],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"title": {
"type": "string",
"description": "The title for the link of the article"
}
}
}
},
"schemes": [
"http"
],
"host": "example.com",
"basePath": "/",
"tags": [],
"securityDefinitions": {
},
"security": [
{
"ApiKeyAuth": []
}
]
}
JSON
);
$schemaStorage = new \JsonSchema\SchemaStorage();
$schemaStorage->addSchema('http://example.com/my-schema', $swaggerSchemaData);
$factory = new \JsonSchema\Constraints\Factory($schemaStorage);
$validator = new \JsonSchema\Validator($factory);
$schemaData = $swaggerSchemaData->paths->{"/articles"}->get->responses->{"200"}->schema;
$data = json_decode('[{"id":1},{"id":2,"title":"Title2"}]');
$validator->validate($data, $schemaData);
var_dump($validator->isValid()); // bool(false)
$data = json_decode('[{"id":1,"title":"Title1"},{"id":2,"title":"Title2"}]');
$validator->validate($data, $schemaData);
var_dump($validator->isValid()); // bool(true)
I'm not sure I fully understand your code here, but I have an idea based on some assumptions.
Assuming $typeForEndPoint is the schema you're using for validation, your items keyword needs to be an object rather than an array.
The items keyword can be either an object or an array. If it's an object, that schema applies to every item in the array being validated. If it's an array, each schema in it applies only to the item at the same position in the array being validated (see the sketch after the quoted spec text below).
This means you're only validating the first item in the array.
If "items" is a schema, validation succeeds if all elements in the
array successfully validate against that schema.
If "items" is an array of schemas, validation succeeds if each element
of the instance validates against the schema at the same position, if
any.
https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-6.4.1
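To make that concrete, here is a sketch of the two forms, reusing the Article reference from the question's schema (everything else is illustrative only):
Object form - the schema applies to every element of the array:
{
    "type": "array",
    "items": { "$ref": "#/definitions/Article" }
}
Array (tuple) form - each schema applies only to the element at the same position; elements beyond the list are not constrained by "items":
{
    "type": "array",
    "items": [ { "$ref": "#/definitions/Article" } ]
}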
jsonValidator doesn't like a mix of object and array associations.
You can use either:
$jsonValidator->check($endpointData, $schemaForPath);
or
$jsonValidator->check($endpointData, json_decode(json_encode($schemaForPath)));

JSON Schema - specify string length based on input property

Is it possible with JSON Schema to specify a string length that depends on the value of a property in the item that is being validated?
For example, I have a document with a "foo" property whose value is 3. I would like to ensure that the "bar" property is a string of at least length 3.
Sample JSON
{
    "foo": 3,
    "bar": "111"
}
JSON Schema
{
    "properties": {
        "foo": {
            "type": "integer",
            "minimum": 1
        },
        "bar": {
            "type": "string",
            "minLength": "{$foo}"
        }
    }
}
There is a v5 proposal for a $data keyword that would "allow schemas to use values from the data, specified using JSON Pointers or Relative JSON Pointers".
Using your example:
{
    "properties": {
        "foo": {
            "type": "integer",
            "minimum": 1
        },
        "bar": {
            "type": "string",
            "minLength": { "$data": "1/foo" }
        }
    }
}
Support for the $data keyword will obviously depend upon the validator you are using. Some validators do support the v5 proposals.
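For what it's worth, Ajv is one validator that implements $data references, behind an opt-in flag. A minimal sketch, assuming a reasonably recent Ajv version and reusing the property names from the example above:
const Ajv = require("ajv")

// $data references are opt-in
const ajv = new Ajv({$data: true})

const schema = {
  type: "object",
  properties: {
    foo: {type: "integer", minimum: 1},
    // minLength is read from the sibling "foo" value at validation time
    bar: {type: "string", minLength: {$data: "1/foo"}}
  }
}

const validate = ajv.compile(schema)
console.log(validate({foo: 3, bar: "111"}))  // true  - "111" has length 3
console.log(validate({foo: 5, bar: "111"}))  // false - "111" is shorter than foo requires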

Avro schema specification won't take same namespace

I defined a schema that goes like this:
{ "namespace":"configschemas.avro",
"type":"record",
"name":"pathObject",
"fields":
[
{ "name":"pathString",
"type" : "string",
"default" : "null"
}
,
{ "name":"needsConversion",
"type" : "boolean" ,
"default" : false
}
]
}
The second schema won't compile after compiling the above schema.
{ "namespace" : "configschemas.avro",
"type" : "array" ,
"items" : configschemas.avro.pathObject
}
All the schemas are in the same directory and the namespaces are the same as well. I can't find the flaw.
Error while compiling second schema:
Input files to compile:
logPaths.avsc
Exception in thread "main" org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected character ('p' (code 112)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: logPaths.avsc; line: 3, column: 13]
at org.apache.avro.Schema$Parser.parse(Schema.java:967)
at org.apache.avro.Schema$Parser.parse(Schema.java:932)
at org.apache.avro.tool.SpecificCompilerTool.run(SpecificCompilerTool.java:73)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('p' (code 112)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: logPaths.avsc; line: 3, column: 13]
at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442)
at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:2090)
at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:555)
at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:192)
at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1344)
at org.apache.avro.Schema$Parser.parse(Schema.java:965)
... 4 more
I'm uncertain how you're invoking the schema parser, but putting both schemas in the same schema file should work, as this demonstrates
@Grapes([
    @Grab(group='org.apache.avro', module='avro', version='1.7.7')
])
import org.apache.avro.Schema;
String schema = '''
{
"namespace":"configschemas.avro",
"type":"record",
"name":"pathObject",
"fields":[
{
"name":"pathString",
"type":"string",
"default":"null"
},
{
"name":"needsConversion",
"type":"boolean",
"default":false
}
]
}
{
"namespace":"configschemas.avro",
"type":"array",
"items":configschemas.avro.pathObject
}'''
try {
System.out.println(new Schema.Parser().parse(schema));
} catch (Throwable t) {
t.printStackTrace();
}
So if you load all schemas in a namespace together, it should work (you can keep them in separate files; just load the text from the files together).
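As a side note, the Jackson exception in the question points at the bare identifier on the "items" line: in Avro's JSON syntax, a reference to a previously defined named type is written as a quoted string, so the array schema would normally read:
{
    "namespace": "configschemas.avro",
    "type": "array",
    "items": "configschemas.avro.pathObject"
}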

How to specify an analyzer while creating an index in ElasticSearch

I'd like to specify an analyzer, name it, and use that name in a mapping while creating an index. I'm lost; my ES instance always returns an error message.
This is, roughly, what I'd like to do:
"settings": {
"mappings": {
"alfedoc": {
"properties": {
"id": { "type": "string" },
"alfefield": { "type": "string", "analyzer": "alfeanalyzer" }
}
}
},
"analysis": {
"analyzer": {
"alfeanalyzer": {
"type": "pattern",
"pattern":"\\s+"
}
}
}
}
But this does not seem to work; the ES instance always returns me an error like
MapperParsingException[mapping [alfedoc]]; nested: MapperParsingException[Analyzer [alfeanalyzer] not found for field [alfefield]];
I tried putting the "analysis" branch of the dictionary at several places (inside the mapping etc.) but to no avail. I guess a working complete example (which I couldn't find up to now) would help me along as well. Probably I'm missing something rather basic.
"analysis" goes in the "settings" block, which goes either before or after the "mappings" block when creating an index.
"settings": {
"analysis": {
"analyzer": {
"alfeanalyzer": {
"type": "pattern",
"pattern": "\\s+"
}
}
}
},
"mappings": {
"alfedoc": { ... }
}
Here's a good, complete example: Example 1
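For reference, a complete create-index request combining both blocks might look like the sketch below. It assumes an older Elasticsearch version where mapping types such as "alfedoc" and the "string" field type are still supported (as in the question); the index name alfeindex is made up for illustration:
PUT /alfeindex
{
    "settings": {
        "analysis": {
            "analyzer": {
                "alfeanalyzer": {
                    "type": "pattern",
                    "pattern": "\\s+"
                }
            }
        }
    },
    "mappings": {
        "alfedoc": {
            "properties": {
                "id": { "type": "string" },
                "alfefield": { "type": "string", "analyzer": "alfeanalyzer" }
            }
        }
    }
}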