Can a JSON schema validator be killed with this schema? - jsonschema

I tried this out with several JSON schema validators and some fail, but the hard part is figuring out how much memory a validator uses before it chokes and gets killed.
It turns out that we can implement finite state machines in JSON schema. To do so, the FSM nodes are object schemas and the FSM edges are a set of JSON Pointers wrapped in an anyOf. The whole thing is rather simple to do, but being able to do this has some consequences: what if we create an FSM that requires 2^N time or memory (depth first search or breadth first search, respectively) given a JSON schema with N definitions and some input to validate?
So let's create a JSON Schema with N definitions to implement a non-deterministic finite state machine (NFA) over an alphabet of two symbols a and b. All we need to do is to encode the regex
(a{N}|a(a|b+){0,N-1}b)*x, where x denotes the end. In the worst case, the NFA for this regex takes 2^N time to match text or 2^N memory (e.g. when converted to a deterministic finite state machine). Now notice that the word abbx can be represented by the JSON Pointer /a/b/b/x, which in JSON is equivalent to {"a":{"b":{"b":{"x":true}}}}.
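For illustration, the nesting can be generated mechanically for any word; a minimal Python sketch:

```python
from functools import reduce

def word_to_json(word):
    # Wrap the word from the inside out, so "abbx" becomes
    # {"a": {"b": {"b": {"x": True}}}} as described above.
    return reduce(lambda inner, ch: {ch: inner}, reversed(word), True)

word_to_json("abbx")  # {'a': {'b': {'b': {'x': True}}}}
```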
To encode this NFA as a schema, we first add a definition for state "0":
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$ref": "#/definitions/0",
"definitions": {
"0": {
"type": "object",
"properties": {
"a": { "$ref": "#/definitions/1" },
"x": { "type": "boolean" }
},
"additionalProperties": false
},
Then we add N-1 definitions for each state <DEF> to the schema where <DEF> is enumerated "1", "2", "3", ... "N-1":
"<DEF>": {
"type": "object",
"properties": {
"a": { "$ref": "#/definitions/<DEF>+1" },
"b": {
"anyOf": [
{ "$ref": "#/definitions/0" },
{ "$ref": "#/definitions/<DEF>" }
]
}
},
"additionalProperties": false
},
where "<DEF>+1" wraps back to "0" when <DEF> is equal to N-1.
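Putting the pieces together, here is a hypothetical Python sketch that emits the whole schema for a given N (assuming N >= 2), following the template above:

```python
import json

def make_schema(n):
    # State "0" has an "a" edge to state "1" and accepts the end marker "x".
    defs = {
        "0": {
            "type": "object",
            "properties": {
                "a": {"$ref": "#/definitions/1"},
                "x": {"type": "boolean"},
            },
            "additionalProperties": False,
        }
    }
    # States "1" .. "N-1"; the "a" edge from the last state wraps back to "0".
    for i in range(1, n):
        defs[str(i)] = {
            "type": "object",
            "properties": {
                "a": {"$ref": "#/definitions/%d" % ((i + 1) % n)},
                "b": {
                    "anyOf": [
                        {"$ref": "#/definitions/0"},
                        {"$ref": "#/definitions/%d" % i},
                    ]
                },
            },
            "additionalProperties": False,
        }
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "$ref": "#/definitions/0",
        "definitions": defs,
    }

print(json.dumps(make_schema(4), indent=2))
```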
This "NFA" on a two-letter alphabet has N states, with only one initial and one final state. The equivalent minimal DFA has 2^N (2 to the power N) states. This means that in the worst case, a validator that uses this schema must either take 2^N time or use 2^N memory "cells" to validate the input.
I don't see where this logic can go wrong, unless validators take shortcuts to approximate the validity checking.
I found this here.

I think in principle you are right. I am not 100% sure about the schema construction you've described, but theoretically it should be possible to construct a schema which requires 2^N time or space, exactly for the reasons you describe.
In practice, most schema processors will probably just try to recursively validate the anyOf, so that would be exponential time.

Related

Parallel synchronous Service calls in Mule

I have a request something like
{
"wrapper": [
{
"key1": "A",
"key2": "B"
},
{
"key1": "C",
"key2": "D"
},
{
"key1": "E",
"key2": "F"
}
]
}
I want to make synchronous parallel calls to a single API (endpoint) in Mule. I cannot use scatter-gather because:
1. There is a single endpoint. 2. The list inside "wrapper" can be of arbitrary size; in the sample above there are 3 elements.
The VM component is also ruled out, as the calls there would be sequential too.
I'm able to process the data sequentially using the forEach component, but can this be done in parallel?
If I could also collect data from each call that would be great.

Spinnaker: set stage param to pipeline param if set, otherwise to a pipeline expression using trigger params

Problem Summary:
Set an envvar for a stage to an optional pipeline param if it is not null (for manual execution case)
Set an envvar for a stage to a pipeline expression value using trigger params if pipeline param is null (for triggered execution from jenkins)
Details:
I have an optional pipeline param APK_URL. If it is set then I want to use it as the envVar APK_URL for the stage. If it is not set then I want to use a url built from trigger parameters via a pipeline expression. I tried the following:
{
...
"parameterConfig": [
{
"description": "The url or url-key for apk to deploy to device",
"label": "APK_URL",
"name": "APK_URL",
"required": false
}
],
"stages": [
{
...
"containers": [
{
"args": [],
"command": [
"./scripts/entrypoint-deploy.sh"
],
"envVars": [
{
"name": "APK_URL",
"value": "(${parameters[\"APK_URL\"]} != null) ? ${parameters[\"APK_URL\"]} : https://my.com/application/${trigger['buildInfo']['scm']['branch']}/${trigger['buildInfo']['artifacts'][0]['displayPath']}"
}
],
...
}
]
...
}
When I run the pipeline manually (no trigger) with APK_URL param specified it gives an error:
Failed to evaluate [value] EL1007E: Property or field 'branch' cannot be found on null
It seems that the java ternary operator is evaluating the url made from trigger parameters even though APK_URL is not null.
Can someone tell me how to set the stage parameter to the pipeline parameter for manual execution, and to the pipeline expression for triggered execution? TIA.
Spinnaker does not actually provide such functionality directly.
But you can do it in several different ways:
Script Stage
Jenkins Stage
Run Job Stage
And implement whatever logic you need there.
Another option is to add several conditional stages which will be skipped depending on your input.
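One more thing worth checking, assuming Spinnaker evaluates each ${...} block in a value string independently (which would explain why the trigger-based part runs even when APK_URL is set): move the entire ternary inside a single expression, so only the chosen branch is evaluated. A sketch using the property names from the question, untested:

```json
{
"name": "APK_URL",
"value": "${parameters['APK_URL'] != null ? parameters['APK_URL'] : 'https://my.com/application/' + trigger['buildInfo']['scm']['branch'] + '/' + trigger['buildInfo']['artifacts'][0]['displayPath']}"
}
```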

How to validate a JSON object against a JSON schema based on object's type described by a field?

I have a number of objects (messages) that I need to validate against a JSON schema (draft-04). Each object is guaranteed to have a "type" field, which describes its type, but every type has a completely different set of other fields, so each type of object needs a unique schema.
I see several possibilities, none of which are particularly appealing, but I hope I'm missing something.
Possibility 1: Use oneOf for each message type. I guess this would work, but the problem is very long validation errors when something goes wrong: validators tend to report every schema that failed, which includes ALL elements in the "oneOf" array.
{
"oneOf":
[
{
"type": "object",
"properties":
{
"t":
{
"type": "string",
"enum":
[
"message_type_1"
]
}
}
},
{
"type": "object",
"properties":
{
"t":
{
"type": "string",
"enum":
[
"message_type_2"
]
},
"some_other_property":
{
"type": "integer"
}
},
"required":
[
"some_other_property"
]
}
]
}
Possibility 2: Nested "if", "then", "else" triads. I haven't tried it, but I guess that maybe errors would be better in this case. However, it's very cumbersome to write, as nested if's pile up.
Possibility 3: A separate schema for every possible value of "t". This is the simplest solution, however I dislike it, because it precludes me from using common elements in schemas (via references).
So, are these my only options, or can I do better?
Since "type" is a JSON Schema keyword, I'll follow your lead and use "t" as the type-discrimination field, for clarity.
There's no particular keyword to accomplish or indicate this (however, see https://github.com/json-schema-org/json-schema-spec/issues/31 for discussion). This is because, for the purposes of validation, everything you need to do is already possible. Errors are secondary to validation in JSON Schema. All we're trying to do is limit how many errors we see, since it's obvious there's a point where errors are no longer productive.
Normally when you're validating a message, you know its type first, then you read the rest of the message. For example in HTTP, if you're reading a line that starts with Date: and the next character isn't a number or letter, you can emit an error right away (e.g. "Unexpected tilde, expected a month name").
However in JSON, this isn't true, since properties are unordered, and you might not encounter the "t" until the very end, if at all. "if/then" can help with this.
But first, begin by factoring out the most important constraints and moving them to the top.
First, use "type": "object" and "required":["t"] in your top level schema, since that's true in all cases.
Second, use "properties" and "enum" to enumerate all its valid values. This way if "t" really is entered wrong, it will be an error out of your top-level schema, instead of a subschema.
If all of these constraints pass, but the document is still invalid, then it's easier to conclude the problem must be with the other contents of the message, and not the "t" property itself.
Now in each sub-schema, use "const" to match the subschema to the type-name.
We get a schema like this:
{
"type": "object",
"required": ["t"],
"properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
"oneOf": [
{
"type": "object",
"properties": {
"t": { "const": "message_type_1" }
}
},
{
"type": "object",
"properties": {
"t": { "const": "message_type_2" },
"some_other_property": {
"type": "integer"
}
},
"required": [ "some_other_property" ]
}
]
}
Now, split out each type into a different schema file. Make it human-accessible by naming the file after the "t". This way, an application can read a stream of objects and pick the schema to validate each object against.
{
"type": "object",
"required": ["t"],
"properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
"oneOf": [
{"$ref": "message_type_1.json"},
{"$ref": "message_type_2.json"}
]
}
Theoretically, a validator now has enough information to produce much cleaner errors (though I'm not aware of any validators that can do this).
So, if this doesn't produce clean enough error reporting for you, you have two options:
First, you can implement part of the validation process yourself. As described above, use a streaming JSON parser like Oboe.js to read each object in a stream, parse the object and read the "t" property, then apply the appropriate schema.
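A minimal pure-Python sketch of that dispatch idea; the per-type checker functions here are hypothetical stand-ins for running a real schema validator against the matching subschema:

```python
def check_message_type_2(obj):
    # Stand-in for validating against message_type_2.json.
    errors = []
    if not isinstance(obj.get("some_other_property"), int):
        errors.append("some_other_property: expected an integer")
    return errors

CHECKERS = {
    "message_type_1": lambda obj: [],  # stand-in for message_type_1.json
    "message_type_2": check_message_type_2,
}

def validate_message(obj):
    # Dispatch on "t" first, so only the relevant schema produces errors.
    t = obj.get("t")
    if t not in CHECKERS:
        return ["t not one of %s" % ", ".join(sorted(CHECKERS))]
    return CHECKERS[t](obj)
```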
Or second, if you really want to do this purely in JSON Schema, use "if/then" statements inside "allOf":
{
"type": "object",
"required": ["t"],
"properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
"allOf": [
{"if":{"properties":{"t":{"const":"message_type_1"}}}, "then":{"$ref": "message_type_1.json"}},
{"if":{"properties":{"t":{"const":"message_type_2"}}}, "then":{"$ref": "message_type_2.json"}}
]
}
This should produce errors to the effect of:
t not one of "message_type_1" or "message_type_2"
or
(because t="message_type_2") some_other_property not an integer
and not both.
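Assuming the Python jsonschema package is available (and using draft-07, where "if"/"then" were introduced), the pattern can be exercised like this, with the subschemas inlined instead of kept in separate files:

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "required": ["t"],
    "properties": {"t": {"enum": ["message_type_1", "message_type_2"]}},
    "allOf": [
        {
            "if": {"properties": {"t": {"const": "message_type_1"}}},
            "then": True,  # message_type_1 has no extra constraints here
        },
        {
            "if": {"properties": {"t": {"const": "message_type_2"}}},
            "then": {
                "properties": {"some_other_property": {"type": "integer"}},
                "required": ["some_other_property"],
            },
        },
    ],
}

v = Draft7Validator(schema)
v.is_valid({"t": "message_type_1"})                            # True
v.is_valid({"t": "message_type_2"})                            # False: missing property
v.is_valid({"t": "message_type_2", "some_other_property": 5})  # True
```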

Elasticsearch / Lucene Misspelled Whitespace

How can I make Elasticsearch correct queries in which the keywords should contain whitespace but were instead typed without it? E.g.
"thisisaquery" -> "this is a query"
my current settings are:
"settings": {
"index": {
"analysis": {
"analyzer": {
"autocomplete": {
"tokenizer": "whitespace",
"filter": [
"lowercase", "engram"
]
}
},
"filter": {
"engram": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 10
}
}
}
}
}
There isn't an out-of-the-box tokenizer/token filter to explicitly handle what you're asking for. The closest would be the compound word token filter, which requires manually providing a dictionary file; in your case that may require the full English dictionary to work correctly. Even with that it would likely have issues with words that are stems of other words, abbreviations, etc., without a lot of additional logic. It may be good enough though, depending on your exact requirements.
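For reference, a sketch of what that dictionary-based approach could look like using Elasticsearch's dictionary_decompounder token filter; the word_list here is a toy stand-in for a real dictionary file:

```json
"filter": {
"decompound": {
"type": "dictionary_decompounder",
"word_list": ["this", "is", "a", "query"]
}
}
```

Adding "decompound" to the analyzer's filter chain would then also emit the dictionary subwords it finds inside "thisisaquery" (subject to the filter's min_subword_size and related settings).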
This Ruby project claims to do this. You might try it if you're using Ruby, or just look at their code and copy their analyzer settings for it :)
https://github.com/ankane/searchkick

iPhone NSArray from Dictionary of Dictionary values

I have a Dictionary of Dictionaries which is being returned to me in JSON format
{
"neverstart": {
"color": 0,
"count": 0,
"uid": 32387,
"id": 73129,
"name": "neverstart"
},
"dev": {
"color": 0,
"count": 1,
"uid": 32387,
"id": 72778,
"name": "dev"
},
"iphone": {
"color": 0,
"count": 1,
"uid": 32387,
"id": 72777,
"name": "iphone"
}
}
I also have an NSArray containing the ids required for an item, e.g. [72777, 73129].
What I need to be able to do is get a dictionary of id => name for the items in the array. I know this is possible by iterating through the array and then looping through all the values in the dictionary and checking them, but it seems like there should be a less long-winded method of doing this.
Excuse my ignorance as I am just trying to find my way around the iPhone SDK and learning Objective C and Cocoa.
First off, since you're using JSON, I'm hoping you've already found BSJSONAdditions and/or json-framework, both of them open-source projects for parsing JSON into native Cocoa structures for you. This blog post gives an idea of how to use the latter to get an NSDictionary from a JSON string.
The problem then becomes one of finding the matching values in the dictionary. I'm not aware of a single method that does what you're looking for — the Cocoa frameworks are quite powerful, but are designed to be very generic and flexible. However, it shouldn't be too hard to put together in not too many lines... (Since you're programming on iPhone, I'll use fast enumeration to make the code cleaner.)
NSDictionary* jsonDictionary = ...
NSDictionary* innerDictionary;
NSArray* requiredIDs = ...
NSMutableDictionary* matches = [NSMutableDictionary dictionary];
for (id key in jsonDictionary) {
innerDictionary = [jsonDictionary objectForKey:key];
if ([requiredIDs containsObject:[innerDictionary objectForKey:@"id"]])
[matches setObject:[innerDictionary objectForKey:@"name"]
forKey:[innerDictionary objectForKey:@"id"]];
}
This code may have typos, but the concepts should be sound. Also note that the call to [NSMutableDictionary dictionary] will return an autoreleased object.
Have you tried this NSDictionary method:
+ (id)dictionaryWithObjects:(NSArray *)objects forKeys:(NSArray *)keys
Drew's got the answer... I found that the GNUstep reference for NSDictionary was helpful in a bare-bones way the other day when I had a similar question.
http://www.gnustep.org/resources/documentation/Developer/Base/Reference/NSDictionary.html