Can JSON Schema support constraints on array items at specific indexes - jsonschema

A good schema language will allow a high degree of control on value constraints.
My quick impression of JSON Schema, however, is that one cannot go beyond specifying that a value must be an array whose items all share a single allowable type; one apparently cannot specify, for example, that the first item must be of one type and the second item of another. Is this view mistaken?

Yes, it can be done. Here is an example of an array schema with the types of the first three items specified:
{
  "type": "array",
  "items": [
    { "type": "number" },
    { "type": "string" },
    { "type": "integer" }
  ]
}
When you validate against this schema, the 1st, 2nd, and 3rd items each need to match their declared type.
If the array has more than three items, the extra ones have no specified type, so they won't fail validation. An array with fewer than three items will also validate, as long as each item that is present has the correct type.
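To make the rule concrete, here is a minimal hand-rolled Python check that mirrors the tuple-typing semantics above: positional types for the first three items, extra items unconstrained, shorter arrays allowed. This is an illustration of the rule, not a replacement for a real JSON Schema validator (e.g. the Python "jsonschema" package):

```python
POSITIONAL_TYPES = [
    (int, float),  # "number" accepts integers and floats
    (str,),        # "string"
    (int,),        # "integer"
]

def matches_tuple_schema(value):
    if not isinstance(value, list):
        return False
    for item, allowed in zip(value, POSITIONAL_TYPES):
        # bool is a subclass of int in Python, so rule it out explicitly
        if isinstance(item, bool) or not isinstance(item, allowed):
            return False
    return True  # items beyond the declared prefix are unconstrained
```

Note that in newer JSON Schema drafts (2020-12 onward), this positional form is spelled `prefixItems` rather than the array form of `items`.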
Source, and a good read I found last week when I started with JSON Schema: Understanding JSON Schema (array section, page 24 of the PDF).
PS: English is not my first language, so let me know of any mistakes in spelling, punctuation, or grammar.

Related

Extract specific key from array of jsons in Amazon Redshift

Background
I am working in Amazon Redshift database using SQL. I have a table and one of the column called attributes contains data that looks like this:
[{"name": "Color", "value": "Beige"},{"name":"Size", "value":Small"}]
or
[{"name": "Size", "value": "Small"},{"name": "Color", "value": "Blue"},{"name": "Material", "value": "Cotton"}]
From what I understand, the above is a series of path elements in a JSON string.
Issue
I am looking to extract the color value in each JSON string. I am unsure how to proceed. I know that if color was in the same location I could use the index to indicate where to extract from. But that is not the case here.
What I tried
select json_extract_array_element_text(attributes, 1) as color_value,
       json_extract_path_text(color_value, 'value') as color
from my_table
This query works for some rows but not all, as the position of the color element differs.
I would appreciate any help here, as I am very new to SQL and have only done basic querying. I have been using the following page as a reference
First off, your data is in array format (between [ ]), not object format (between { }). The page you mention describes a function for extracting data from JSON objects, not arrays. Array format also presents a challenge: you need to know the numeric position of the element you wish to extract.
Based on your example data, it seems like objects are the way to go. If so, you will want to reformat your data to look more like:
{"Color": "Beige", "Size": "Small"}
and
{"Size": "Small", "Color": "Blue", "Material": "Cotton"}
This conversion only works if the "name" values are unique in your data.
With this the function you selected - JSON_EXTRACT_PATH_TEXT() - will pull the values you want from the data.
Now, changing the data may not be an option, and dealing with these arrays will make things harder and less performant. To do this you will need to expand the arrays by cross joining with a set of numbers covering every index up to the maximum length of your arrays. For the samples you gave, you would cross join with the values 0, 1, and 2 so that your 3-element arrays can be fully extracted. You can then filter to only those rows that have a "name" of "Color".
The function you will need for extracting elements from an array is JSON_EXTRACT_ARRAY_ELEMENT_TEXT(), and since you have objects stored in the array, you will then need to run JSON_EXTRACT_PATH_TEXT() on the results.
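The extraction logic described above (expand the array by index, pull "name"/"value" from each element, keep only name = "Color") can also be sketched client-side in Python; the column values below are the samples from the question:

```python
import json

rows = [  # sample values of the "attributes" column from the question
    '[{"name": "Color", "value": "Beige"},{"name": "Size", "value": "Small"}]',
    '[{"name": "Size", "value": "Small"},{"name": "Color", "value": "Blue"},'
    '{"name": "Material", "value": "Cotton"}]',
]

def color_of(attributes_json):
    # Equivalent of: expand each array element, extract "name" and "value"
    # per element, then filter on name = "Color".
    for element in json.loads(attributes_json):
        if element.get("name") == "Color":
            return element.get("value")
    return None

colors = [color_of(r) for r in rows]
```

This is only an illustration of the filtering logic; inside Redshift itself you would express the same thing with the cross join and the two JSON functions named above.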

Cosmosdb index precision for fixed value strings

I want to index over a field in a collection whose values can be only 4 characters long
{ "field": "abcd" }
Can I use an index precision of 4 like below to save on RU's without having any side effects?
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": 4
},
]
For Range indexes, the index term length will never exceed the actual string length. So, if all of your strings are 4 characters long, this will have no impact (neither positive nor negative). You're better off, however, setting the precision to -1 so that we don't have to change your index policy in the future in case the length of the strings changes.
Based on this official statement, the choice of index precision may affect the performance of string range queries. However, there is no specific statement about the size of the effect, as there is for Hash indexes. So I suggest you run an actual test against simulated data to find out for yourself.
BTW, if you want to perform ORDER BY queries against your string properties, you must specify a precision of -1 for the corresponding paths.
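Put together, the recommendation amounts to the following policy fragment, shown here as a Python dict mirroring the question's (legacy) indexing-policy shape; precision -1 indexes the full string and is also what ORDER BY on string paths requires:

```python
recommended_policy = {
    "indexes": [
        {
            "kind": "Range",
            "dataType": "String",
            "precision": -1,  # full-string precision; safe if lengths change
        }
    ]
}
```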
There are more documents about saving RUs for your reference.
1.https://lbadri.wordpress.com/2018/04/07/is-azure-cosmos-db-really-expensive/
2.https://medium.com/@thomasweiss_io/how-i-learned-to-stop-worrying-and-love-cosmos-dbs-request-units-92c68c62c938

BACnet deserialization: How do I know if a new list elements starts

I'm implementing a generic BACnet decoder and came across the following question, to which I can't seem to find the answer in the BACnet standard. Chapter 20.2.1.3.2, "Constructed Data", does not answer my question, or I might not fully understand it.
Let's assume I have a List (SEQUENCE OF) with elements of type Record (SEQUENCE).
Said record has 4 fields, identified by context tag, where field 0 and 1 are optional.
I further assume that the order, in which those fields are serialized, can be arbitrary (because they're identified by their context tags).
The data could look like this (each number indicates a field/column):
[{ "3", "0", "2" }, { "1", "2", "3" }]
Over the wire, the only "structure information" I assume I get are the open / close tags for the list.
That means:
Open Tag List
ctxTagColumn3, valueColumn3,
ctxTagColumn0, valueColumn0,
ctxTagColumn2, valueColumn2,
ctxTagColumn1, valueColumn1,
ctxTagColumn2, valueColumn2,
ctxTagColumn3, valueColumn3
Close Tag List
How do I know, after I've read the last column data ("2") of my first list item, that I must begin decoding the second item, starting with a value for column "1"?
Which of my assumptions is wrong?
Thank you and kind regards
Pascal
The order of the elements of a SEQUENCE is always known and, by definition, shall not be arbitrary. Furthermore, not all conceivable combinations can be encoded: in BACnet, all type definitions must be universally decodable.
Assuming I understand you correctly: the "order" cannot be "arbitrary"; i.e.:
SEQUENCE = *ordered* collection of variables of **different** types
SEQUENCE OF = *ordered* collection of variables of **same** type
The tag number of an item's (SD) context tag will differ (possibly by an increment, e.g. +1) from that of the containing (PD) context tag, so you could check for that. Better still: if the tag number is <= 5 (the 'length' value), it is an SD context tag for one of your items, rather than a (or the closing) PD context tag (the 'type' value) delimiting the end of your items.
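Since the fields of a SEQUENCE are encoded in a fixed (ascending) tag-number order, a decoder can start a new list element whenever the context tag number stops increasing. A minimal Python sketch of that idea, with each tagged value simplified to a (tag_number, value) pair rather than the real wire encoding:

```python
def split_records(tagged_values):
    """Split a flat [(tag_no, value), ...] stream (everything between the
    list's PD open and close tags) into records, relying on the fact that
    SEQUENCE fields are encoded in ascending tag order: a non-increasing
    tag number marks the start of a new record."""
    records, current, prev_tag = [], [], -1
    for tag_no, value in tagged_values:
        if tag_no <= prev_tag:
            records.append(current)
            current = []
        current.append((tag_no, value))
        prev_tag = tag_no
    if current:
        records.append(current)
    return records
```

With an ordered stream such as fields (0, 2, 3) followed by (1, 2, 3), the drop from tag 3 back to tag 1 is what signals the record boundary.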

How to get the original entity value from Wit.Ai?

I was wondering if there is a way to also return the original value of an entity from Wit.Ai.
For instance, my entity "states" correctly maps the misspelled and lower case word massachusets to Massachusetts. But it does not return the original word. So, I cannot easily tag the incorrect word.
{
  "msg_id": "a6ac0938-d92c-45f4-8c41-4ca990d83415",
  "_text": "What is the temperature in massachusets?",
  "entities": {
    "states": [
      {
        "confidence": 0.7956227869227184,
        "type": "value",
        "value": "Massachusetts"
      }
    ]
  }
}
I would really appreciate it if you know how I can accomplish that with Wit.Ai.
Thanks
Keep the search strategy of "states" as free-text & keywords. That way you can extract the original word from the message: declaring an entity as a keyword matches the input against the closest keyword and returns that keyword, whereas free-text returns the original word from the message.
You may have to train Wit each time you do so, by highlighting "Massachusetts" as the resolved value. This will make Wit understand that you do not agree with the autocorrection.

Evaluate column value into rows

I have a column whose value is a json array. For example:
[{"att1": "1", "att2": "2"}, {"att1": "3", "att2": "4"}, {"att1": "5", "att2": "6"}]
What I would like is a view in which each element of the JSON array is transformed into a row, and the attributes of each JSON object into columns. Keep in mind that the JSON array doesn't have a fixed size.
Any ideas on how I can achieve this?
A stored-procedure lexer to run against the string? Anything else, like trying a variable in the SQL or using a regexp, will be tricky, I imagine.
If you need it for client-side viewing only, can you use a JSON-decoding library (json_decode() if you are on PHP) and then build markup from that?
But if you're going to use it for any DB work at all, I reckon it shouldn't be stored as JSON.
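If client-side decoding is acceptable, the json_decode() suggestion translates directly to other languages; here is a Python sketch using the question's sample value (the att1/att2 keys come from that example):

```python
import json

def rows_from_json_column(value):
    """Turn a JSON-array column value into (att1, att2) row tuples,
    one row per array element, as the desired view would."""
    return [(obj.get("att1"), obj.get("att2")) for obj in json.loads(value)]

sample = ('[{"att1": "1", "att2": "2"}, {"att1": "3", "att2": "4"}, '
          '{"att1": "5", "att2": "6"}]')
```

This handles arrays of any length, since it simply iterates over whatever elements the decoded array contains.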