JSON schema for a map stored as an array [key1, val1, key2, val2, ...]

Is it possible to create a JSON schema for an array of undefined length (other than it always having an even number of elements) that captures a map stored as an array?
i.e., as described in the title: [key1, val1, key2, val2, ...]
It seems that the only option for an array of undetermined length is a single "items" schema applied to every element (though that schema could conceptually be a oneOf). However, that doesn't enforce the ordering of the key/value restrictions: while it would validate valid uses, it would also validate invalid ones.
If I knew how long the array would be, I could enforce it by specifying the types for all keys and values in their respective positions, but that's not the case here.
Yes, it would be nice if the API worked with a map/object instead of an array here, but this is an old API that I'm trying to write a JSON schema for, so it probably can't be changed.
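For illustration, here is a minimal sketch of the single-item-schema approach described above (the string and number subschemas are placeholders for whatever the real keys and values look like). It accepts alternating keys and values, but it equally accepts them out of order, which is exactly the limitation in question:
{
  "type": "array",
  "items": {
    "oneOf": [
      { "type": "string" },
      { "type": "number" }
    ]
  }
}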

Related

Parquet: NULL, or zero-length array?

I'm designing a schema in Avro that will ultimately become the schema for a Parquet file to be queried by Hive.
There are several instances where I've got a nested column that is an array of some record type, and the parent record may have zero or more of those records. To use a more concrete example, let's say that I have a Person record with a Children field. A Person can have zero or more children.
Are there any persuasive arguments on whether the Children field should be an array that can have zero items, or should instead be defined as a union of [null, array]?
That is, if there are zero children, should I use NULL, or should I use a zero-length array?
This early in my learning curve it appears to be a philosophical choice. But I don't know what I don't know, and so I'm hoping the community can share their insights based on experience that I don't have: should this be a NULLable column, or simply an array that could have zero elements in it?
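For concreteness, the two alternatives look roughly like this as Avro field declarations (the Child type name is a placeholder for whatever the real record is called):
Always an array, empty when there are no children:
{ "name": "children", "type": { "type": "array", "items": "Child" }, "default": [] }
Nullable union, null when there are no children:
{ "name": "children", "type": ["null", { "type": "array", "items": "Child" }], "default": null }
Note that with the union, readers have to distinguish the null case from the empty-array case, whereas the plain array collapses both into one.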

Invalid value for implementation data types

How do you configure the invalid value for implementation data types? Especially for arrays: based on the size of the array, do we have to create that many array value specifications?
For a single ImplementationDataType of category ARRAY, you just create one ArrayValueSpecification that contains as many NumericalValueSpecifications as required.

Use Postgres to parse stringified JSON object

I've been using Postgres to store JSON objects as strings, and now I want to utilize PG's built-in json and jsonb types to store the objects more efficiently.
Basically, I want to parse the stringified JSON and put it in a json column from within PG, without having to resort to reading all the values into Python and parsing them there.
Ideally, my migration should look like this:
UPDATE table_name SET json_column=parse_json(string_column);
I looked at Postgres's JSON functions, and there doesn't seem to be a method of doing this, even though it seems pretty trivial. For the record, my JSON objects are just one-dimensional arrays of strings.
Is there any way to do this?
There is no need for a parse_json function; just change the type of the column:
ALTER TABLE table_name
ALTER COLUMN json_column TYPE json USING json_column::json;
Note that if you plan on doing a lot of JSON operations on these values (e.g. extracting elements from objects, modifying objects, etc.), it's better to use jsonb; json should only be used for storing JSON data. Also, as Laurenz Albe points out, if you don't need to do any JSON operations on these values and you are not interested in the validation that PostgreSQL can do on them (e.g. because you trust that the source always provides valid JSON), then using text is a perfectly valid option (or bytea).
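If jsonb is the better fit for your workload, the same one-step migration applies; this is simply the jsonb variant of the statement above:
ALTER TABLE table_name
ALTER COLUMN json_column TYPE jsonb USING json_column::jsonb;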

Dynamic type cast in select query

I have totally rewritten my question because the original description of the problem was inaccurate!
We have to store a lot of different information about a specific region. For this we need a flexible data structure that does not limit the possibilities for the user.
So we've created a key-value table for this additional data, described by a meta table that contains the datatype of each value.
We already use this information for queries over our REST API; we automatically wrap the requested field in a cast.
We return this data together with information from other tables as a JSON object. We convert the corresponding rows from the data table into a JSON object with array_agg and json_object:
...
CASE
WHEN count(prop.name) = 0 THEN '{}'::json
ELSE json_object(array_agg(prop.name), array_agg(prop.value))
END AS data
...
This works very well. The problem is that if we store something like a floating-point number in this field, we get back a string representation of that number:
e.g. 5.231 is returned as "5.231"
Now we would like to CAST this value in our SELECT statement into the right data type so that the JSON result is correctly formatted. We have all the information we need, so we tried the following:
SELECT
json_object(array_agg(data.name),
-- here I cast the value into the right datatype!
-- results in an error
array_agg(CAST(value AS datatype))) AS data
FROM data
JOIN (
SELECT name, datatype
FROM meta)
AS info
ON info.name = data.name
The error message is the following:
ERROR: type "datatype" does not exist
LINE 3: array_agg(CAST(value AS datatype))) AS data
^
Query failed
PostgreSQL said: type "datatype" does not exist
So is it possible to dynamically cast the value text to the PostgreSQL type named in the datatype column, so that we get back a well-formatted JSON object?
First, that's a terrible abuse of SQL and ought to be avoided in practically all scenarios. If you have a scenario where this is legitimate, you probably already know your RDBMS so intimately that you're writing custom indexing plugins and wouldn't even think of asking this question...
If you tell us what you're actually trying to do, there's about a 99.9% chance we can tell you a better way to do it.
Now with that disclaimer aside:
This is not possible without using dynamic SQL. In PostgreSQL you can accomplish it with PL/pgSQL's EXECUTE statement, which you can read about in the manual; it basically boils down to building the query text and executing it.
Note, however, that even with this method, every row of a given result column in the same query must have the same data type. In other words, you can't expect row 1 to come back as VARCHAR and row 2 as INT. That is completely impossible.
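For illustration, here is a minimal sketch of that dynamic-SQL route, assuming the type names stored in meta.datatype are valid PostgreSQL type names such as 'numeric' (the function name cast_to_json is made up for this example):
CREATE OR REPLACE FUNCTION cast_to_json(val text, datatype text)
RETURNS json
LANGUAGE plpgsql
AS $$
DECLARE
  result json;
BEGIN
  -- %L quotes the value as a literal; %s splices the type name in verbatim,
  -- so only use this with trusted or whitelisted datatype values.
  EXECUTE format('SELECT to_json(CAST(%L AS %s))', val, datatype)
  INTO result;
  RETURN result;
END;
$$;
-- e.g. SELECT cast_to_json('5.231', 'numeric');  -- yields 5.231, not "5.231"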
The problem you have is that json_object builds an object out of a string array for the keys and another string array for the values. So if you feed JSON values into it, it will always return an error.
So the first problem is that you have to use a json or jsonb column for the values, or convert the values from string to json with to_json().
The second problem is that you then need a different function to build your JSON object, because you now have text keys and json values rather than two text arrays. For this there is an aggregate function called json_object_agg.
Then your output should be like the one you expected! Here is the full query:
SELECT
json_object_agg(data.name, to_json(data.value)) AS data
FROM data
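If the values live in a plain text column, to_json() alone will still produce JSON strings. One sketch of a way to get properly typed output without dynamic SQL is to branch on the datatype column and cast explicitly (the datatype names 'numeric' and 'boolean' are assumptions about what the meta table actually contains):
SELECT
json_object_agg(
  data.name,
  CASE info.datatype
    WHEN 'numeric' THEN to_json(data.value::numeric)
    WHEN 'boolean' THEN to_json(data.value::boolean)
    ELSE to_json(data.value)  -- everything else stays a JSON string
  END
) AS data
FROM data
JOIN meta AS info ON info.name = data.name;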

Lucene field from TokenStream with stored values

I have a field which needs to come from a token stream; it cannot be instantiated with a string and then analyzed into tokens. For example, I might want to combine the data from multiple columns (in my RDBMS) into a single Lucene field, but analyze each column in its own way. So I cannot simply concatenate them all into a single string and then analyze the resulting string.
The problem I am running into now is that fields created from token streams cannot be stored, which makes sense in the general case since the stream may not have an obvious string representation. However, I know the string representation, and I would like to store that.
I tried adding the same field twice, once stored with string data and once coming from a token stream, but it seems that this can't be done. Apart from some hack like adding a field with a name of "myfield__stored", is there a way to do this?
I am using Lucene 2.9.2.
I found a way. You can sneak it in by instantiating it as a normal field but calling SetTokenStream later:
Field f = new Field(Name, StringValue, Store, Analyzed, TV);
f.SetTokenStream(TokenStreamValue);
Because the reader/string value is only indexed when the token stream value is null, it is the token stream that gets indexed here. The store logic looks at the string/reader value regardless of any token stream, so it is the string value that gets stored.