JsonSchema number validation with multiple ranges - jsonschema

Is there a supported way (without using the anyOf keyword) to specify multiple ranges for a number in JsonSchema?
E.g., a value can either be in the range of 0-10 or 50-100 (the range of 10 < x < 50 is considered invalid).
The anyOf keyword can be used as follows:
{
  "anyOf": [
    {
      "type": "number",
      "minimum": 0,
      "maximum": 10
    },
    {
      "type": "number",
      "minimum": 50,
      "maximum": 100
    }
  ]
}
Additionally, if the only allowed values were whole integers, I could use an enum and actually hand-specify each allowed number, but obviously that's less ideal than specifying ranges.
So, just wondering if there is a way to accomplish this with something like a "restrictions" keyword:
//NOTE: the below is not actually supported (I don't think), just using it as an example of what I'm interested in
{
  "type": "number",
  "restrictions": [
    {
      "minimum": 0,
      "maximum": 10
    },
    {
      "minimum": 50,
      "maximum": 100
    }
  ]
}
Also, for those wondering why I want this when anyOf is available: I have some custom tooling to maintain, and supporting anyOf would be more of a lift than something specific to numeric validation.

The scenario you describe is exactly why anyOf exists. So no: if you want to express a logical OR, you need to implement the keyword that expresses it. I don't see why adding a new custom keyword would make things any easier.

What is equivalent to multiple types in OpenAPI 3.1? anyOf or oneOf?

I want to convert multiple types (supported in the latest drafts of JSON Schema, and therefore also in OpenAPI v3.1) to anyOf or oneOf, but I am a bit confused about which one the types should be mapped to. Or can I map to either of the two?
P.S. I do have knowledge of anyOf, oneOf, etc., but the behavior of multiple types is a little ambiguous. (I know the schema is invalid; it is just an example focused on the type conversion.)
{
  "type": ["null", "object", "integer", "string"],
  "properties": {
    "prop1": {
      "type": "string"
    },
    "prop2": {
      "type": "string"
    }
  },
  "enum": [2, 3, 4, 5],
  "const": "sample const entry",
  "exclusiveMinimum": 1.22,
  "exclusiveMaximum": 50,
  "maxLength": 10,
  "minLength": 2,
  "format": "int32"
}
I am converting it this way.
{
  "anyOf": [{
    "type": "null"
  },
  {
    "type": "object",
    "properties": {
      "prop1": {
        "type": "string"
      },
      "prop2": {
        "type": "string"
      }
    }
  },
  {
    "type": "integer",
    "enum": [2, 3, 4, 5],
    "exclusiveMinimum": 1.22,
    "exclusiveMaximum": 50,
    "format": "int32"
  },
  {
    "type": "string",
    "maxLength": 10,
    "minLength": 2,
    "const": "sample const entry"
  }]
}
anyOf gives you a closer match for the semantics than oneOf.
The problem (or benefit!) of oneOf is that it will fail if an instance happens to match two different cases.
That is unlikely to be what you want, given that the source of your conversion has those looser semantics.
Imagine converting ["integer", "number"], for example: if the input was a 1, you would match both subschemas and fail under oneOf.
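For instance, here is a sketch of that failure mode (a hypothetical schema, not from your conversion):
{
  "oneOf": [
    { "type": "integer" },
    { "type": "number" }
  ]
}
The instance 1 matches both subschemas, since every integer is also a valid number in JSON Schema, so oneOf rejects it while anyOf accepts it.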
First of all, your example is not valid:
The initial schema doesn't match anything, it's an "impossible" schema. The "enum": [2, 3, 4, 5] and "const": "sample const entry" constraints are mutually exclusive, and so are "const": "sample const entry" and "maxLength": 10.
The rewritten schema is not equivalent to the original schema, because the enum and const were moved from the root level into subschemas. Yes, this way the schema makes more sense and will sort of work (e.g. it will match the specified numbers - but not strings, because of the const vs maxLength contradiction), but it is not the same as the original schema.
With regard to oneOf/anyOf:
It depends.
The choice between anyOf and oneOf depends on the context, i.e. whether an instance can match more than one subschema or exactly one subschema - in other words, whether a multiple-subschema match is considered OK or an error. Nullable references typically need anyOf rather than oneOf, but other cases vary from schema to schema.
For example,
"type": ["number", "integer"]
corresponds to anyOf because there's an overlap - integer values are also valid "number" values in JSON Schema.
Whereas
"type": ["string", "integer"]
can be represented using either oneOf or anyOf. oneOf is semantically closer, since strings and integers are totally different data types with no overlap. But technically anyOf also works; it's just that there won't be more than one subschema match in this particular case.
In your example, all base type values are distinct with no overlap, so I would use oneOf, but technically anyOf will also work.

snap selection to week/month/year

I'm using Vega Lite in a crossfiltery app with an external data source.
I'd like the selection on a chart with a temporal scale to "snap" or "round" to a time interval which I'll supply in the spec.
I don't see anything about this in the selections documentation, so I guess I will probably need to generate a Vega spec from the Vega Lite spec, and then patch it. That's not something I've done yet, but I'm eager to learn.
However I am surprised not to find this question on SO, GitHub or Slack. I think I've searched for all combinations of {vega, vega-lite} x {round, snap}.
Closest I can find is an issue related to snapping selections for ordinal scales.
Am I using the wrong terminology?
It turns out the issue regarding snapping ordinal scales is more about snapping to time intervals, and this comment by Jeff Heer covers the topic thoroughly.
The resulting Vega expressions are lengthy, but the process is straightforward:
1. Get the x-coordinate value.
2. Run it through a scale inversion (x.invert) to map from pixel domain to data domain.
3. Perform "snapping" by rounding / truncating values in the data domain.
4. Re-apply the x scale to get the snapped pixel coordinate.
We can change the update fields of the signals to implement this truncation.
snap to year
Say the selection is named selection1. Vega-Lite will generate a Vega spec including the signal selection1_x (redacted for brevity):
{
  "name": "selection1_x",
  "value": [],
  "on": [
    {
      "events": {
        "source": "scope",
        "type": "mousedown",
        "filter": // ...
      },
      "update": "[x(unit), x(unit)]"
    },
    {
      "events": {
        "source": "scope",
        "type": "mousemove",
        "between": // ...
      },
      "update": "[selection1_x[0], clamp(x(unit), 0, width)]"
    },
    // ...
We can replace the x(unit) expressions in the first on item, and the clamp(x(unit), 0, width) expression in the second on item, using the steps above:
{
  "name": "selection1_x",
  "value": [],
  "on": [
    {
      "events": {
        "source": "scope",
        "type": "mousedown",
        "filter": // ...
      },
      "update": "[scale(\"x\",datetime(year(invert(\"x\",x(unit))),0)),scale(\"x\",datetime(year(invert(\"x\",x(unit))),0))]"
    },
    {
      "events": {
        "source": "scope",
        "type": "mousemove",
        "between": // ...
      },
      "update": "[selection1_x[0],scale(\"x\",datetime(year(invert(\"x\",clamp(x(unit),0,width))),0))]"
    },
    // ...
The lines are getting long, and this is without considering intervals smaller than a year. You probably don't want to write this stuff by hand!
smaller intervals
This method supports truncating to any of the parameters of the Vega datetime expression:
datetime(year, month[, day, hour, min, sec, millisec])
Returns a new Date instance. The month is 0-based, such that 1 represents February.
You have to specify at least year and month, so above we had zeros for month.
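For example, a month-snapping update would keep the year and month components and drop everything below them. I haven't tested this, but it follows the same pattern as the year expression above:
[selection1_x[0], scale("x", datetime(
  year(invert("x", clamp(x(unit),0,width))),
  month(invert("x", clamp(x(unit),0,width)))))]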
week
To snap to week, you can subtract the day of the week from the date of the month:
[selection1_x[0], scale("x", datetime(
year(invert("x", clamp(x(unit),0,width))),
month(invert("x", clamp(x(unit),0,width))),
date(invert("x", clamp(x(unit),0,width)))
- day(invert("x", clamp(x(unit),0,width)))))]
I haven't learned Vega well enough to know if I can simplify these expressions.
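One idea, which I haven't verified: factor the repeated inversion into its own signal and reference it from the update expressions. The signal name inverted_x below is made up:
{
  "name": "inverted_x",
  "on": [
    {
      "events": {"source": "scope", "type": "mousemove"},
      "update": "invert(\"x\", clamp(x(unit), 0, width))"
    }
  ]
}
The week-snapping update would then shorten to something like "[selection1_x[0], scale('x', datetime(year(inverted_x), month(inverted_x), date(inverted_x) - day(inverted_x)))]".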
onward
Also vega-time is in the pipeline, providing the functionality of d3-time intervals, so this should get a lot simpler in the near future!

Collapsing a group using Google Sheets API

So, as a workaround to difficulties creating a new sheet with groups, I am trying to create and collapse these groups in a separate call to batchUpdate. I can request an addDimensionGroup successfully, but when I request an updateDimensionGroup to collapse the group I just created, either in the same API call or in a separate one, I get this error:
{
  "error": {
    "code": 400,
    "message": "Invalid requests[1].updateDimensionGroup: dimensionGroup.depth must be \u003e 0",
    "status": "INVALID_ARGUMENT"
  }
}
But I'm passing depth as 0 as seen by the following JSON which I send in my request:
{
  "requests": [{
    "addDimensionGroup": {
      "range": {
        "dimension": "ROWS",
        "sheetId": 0,
        "startIndex": 2,
        "endIndex": 5
      }
    }
  }, {
    "updateDimensionGroup": {
      "dimensionGroup": {
        "range": {
          "dimension": "ROWS",
          "sheetId": 0,
          "startIndex": 2,
          "endIndex": 5
        },
        "depth": 0,
        "collapsed": true
      },
      "fields": "*"
    }
  }],
  "includeSpreadsheetInResponse": true
}
...
I'm not entirely sure what I am supposed to provide for "fields". The documentation for UpdateDimensionGroupRequest says it is supposed to be a string ("string (FieldMask format)"), but the FieldMask definition itself shows the possibility of multiple paths, and doesn't tell me how they are supposed to be separated in a single string.
What am I doing wrong here?
The error message is actually instructing you that the dimensionGroup.depth value must be > 0:
If you call spreadsheets.get() on your sheet, and request only the DimensionGroup data, you'll note that your created group is actually at depth 1:
GET https://sheets.googleapis.com/v4/spreadsheets/{SSID}?fields=sheets(rowGroups)&key={API_KEY}
This makes sense, since the depth is (per API spec):
depth (number): The depth of the group, representing how many groups have a range that wholly contains the range of this group.
Note that any given particular DimensionGroup "wholly contains its own range" by definition.
If your goal is to change the status of the DimensionGroup, then you need to set its collapsed property:
{
  "requests": [
    {
      "updateDimensionGroup": {
        "dimensionGroup": {
          "range": {
            "sheetId": <your sheet id>,
            "dimension": "ROWS",
            "startIndex": 2,
            "endIndex": 5
          },
          "collapsed": true,
          "depth": 1
        },
        "fields": "collapsed"
      }
    }
  ]
}
For this particular Request, the only attribute you can set is collapsed - the other properties are used to identify the desired DimensionGroup to manipulate. Thus, specifying fields: "*" is equivalent to fields: "collapsed". This is not true for the majority of requests: for most of them, specifying fields: "*" and then omitting a non-required request parameter is interpreted as "delete that missing parameter from the server's representation". (As for the FieldMask question: multiple paths in the string form are separated by commas, e.g. "collapsed,depth".)
To change a DimensionGroup's depth, you must add or remove other DimensionGroups that encompass it.
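For example (a sketch using made-up indices): adding a second, wider group whose range wholly contains the first group's rows would bump the original group to depth 2:
{
  "requests": [{
    "addDimensionGroup": {
      "range": {
        "dimension": "ROWS",
        "sheetId": 0,
        "startIndex": 1,
        "endIndex": 6
      }
    }
  }]
}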

Fuzzy Like This on Attachment Returns Nothing on Partial Word

I have my mapping like this:
{
  "doc": {
    "mappings": {
      "mydocument": {
        "properties": {
          "file": {
            "type": "attachment",
            "path": "full",
            "fields": {
              "file": {
                "type": "string",
                "store": true,
                "term_vector": "with_positions_offsets"
              },
              "author": {
                ...
When I search for a complete word I get the result:
"query": {
"fuzzy_like_this" : {
"fields" : ["file"],
"like_text" : "This_is_something_I_want_to_search_for",
"max_query_terms" : 12
}
},
"highlight" : {
"number_of_fragments" : 3,
"fragment_size" : 650,
"fields" : {
"file" : { }
}
}
But if I change the search term to "This_is_something_I_want" I get nothing. What am I missing?
To implement a partial match, we must first understand what fuzzy_like_this does, and then decide what you want partial matching to return. fuzzy_like_this performs two key functions.
First, the like_text will be analyzed using the default analyzer. All the resulting tokens will then be used to find documents based on term frequency, or tf-idf.
This typically means that the input term will be split on spaces and lowercased. This_is_something_I_want will therefore be tokenized to this_is_something_i_want. Unless you have files containing this exact term, no documents will match.
Secondly, all terms will be fuzzified. Fuzzy searches score terms based on how many character changes need to be made to one word to match another. For instance, to get from bat to hat we need to make 1 character change.
In our case, to get from this_is_something_i_want to this_is_something_i_want_to_search_for, we would need to make 14 character changes (adding _to_search_for). Standard fuzzy search only allows for 3 character changes when working with terms longer than 5 or 6 characters. Increasing the fuzzy limit to 14 would, however, produce severely skewed results.
So neither of these functions will help produce the results you seek.
Here is what I can suggest:
You can implement an analyzer that splits on underscores, similar to this. The tokens produced will then be ['this', 'is', 'something', 'i', 'want'], which can correctly be matched against the sample case.
Alternatively, if all you want is a document that starts with the specified text, you can use a phrase prefix query instead of fuzzy_like_this. Documentation is here. A sketch of both options follows.
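Here is a minimal sketch of both suggestions. The analyzer name underscore_analyzer is made up, and a mapping change requires re-creating the index and reindexing, so treat this as a starting point rather than a drop-in fix. First, index settings using the built-in pattern analyzer to split on underscores:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "underscore_analyzer": {
          "type": "pattern",
          "pattern": "_",
          "lowercase": true
        }
      }
    }
  }
}
And a phrase prefix query for the starts-with case:
{
  "query": {
    "match_phrase_prefix": {
      "file": "This_is_something_I_want"
    }
  }
}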

JSON-schema: validating an integer formatted in a string with min and max values

Using a JSON schema validator (like z-schema), I would like to validate an integer formatted as a string, e.g.:
{
  "myvalue": "45"
}
Currently, the validation schema is the following:
{
  "type": "string",
  "pattern": "^[0-9]+$"
}
However, now it would be great to be able to validate a minimum and maximum value, like:
{
  "type": "integer",
  "minimum": 0,
  "maximum": 32
}
However, the above JSON value "45" is not an integer, so it fails validation against that schema.
Without changing the type to integer, the best you can do is use the pattern keyword to enforce the range with a regular expression. Here is an example of a regular expression that matches the integers 0 through 32:
/^[1-2]?[0-9]$|^3[0-2]$/
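Plugged into the original schema (the same anchored regular expression as the pattern value), that looks like:
{
  "type": "string",
  "pattern": "^[1-2]?[0-9]$|^3[0-2]$"
}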