Extract specific key from array of jsons in Amazon Redshift - sql

Background
I am working in an Amazon Redshift database using SQL. I have a table, and one of its columns, called attributes, contains data that looks like this:
[{"name": "Color", "value": "Beige"},{"name": "Size", "value": "Small"}]
or
[{"name": "Size", "value": "Small"},{"name": "Color", "value": "Blue"},{"name": "Material", "value": "Cotton"}]
From what I understand, the above is a series of path elements in a JSON string.
Issue
I am looking to extract the color value in each JSON string. I am unsure how to proceed. I know that if color was in the same location I could use the index to indicate where to extract from. But that is not the case here.
What I tried
select json_extract_array_element_text(attributes, 1) as color_value, json_extract_path_text(color_value, 'value') as color from my_table
This query works for some rows but not all, as the position of the color value differs from row to row.
I would appreciate any help here, as I am very new to SQL and have only done basic querying. I have been using the following page as a reference

First off, your data is in array format (between [ ]), not object format (between { }). The page you mention documents a function for extracting data from JSON objects, not arrays. Array format also presents challenges, as you need to know the numeric position of the element you wish to extract.
Based on your example data, it seems like objects are the way to go. If so, you will want to reformat your data to be more like:
{"Color": "Beige", "Size": "Small"}
and
{"Size": "Small", "Color": "Blue", "Material": "Cotton"}
This conversion only works if the "name" values are unique in your data.
With this the function you selected - JSON_EXTRACT_PATH_TEXT() - will pull the values you want from the data.
Now, changing the data may not be an option, and dealing with these arrays will make things harder and less performant. To do this you will need to expand the arrays by cross joining with a set of numbers that contains every index up to the maximum length of your arrays. For example, for the samples you gave you will need to cross join with the values 0, 1, 2 so that your 3-element array can be fully extracted. You can then filter for only those rows that have a "name" of "Color".
The function you will need for extracting elements from an array is JSON_EXTRACT_ARRAY_ELEMENT_TEXT() and since you have objects stored in the array you will need to run JSON_EXTRACT_PATH_TEXT() on the results.
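The approach described above can be sketched as follows. This is a minimal, untested sketch rather than a definitive query: it assumes the table is called my_table as in the question, that no array has more than three elements (so the hard-coded numbers 0, 1, 2 are enough), and that every attributes value is valid JSON. The numbers CTE and its column i are names made up for illustration.

```sql
-- Hypothetical helper: one row per array index that can occur (0, 1, 2).
WITH numbers AS (
    SELECT 0 AS i
    UNION ALL SELECT 1
    UNION ALL SELECT 2
)
SELECT
    t.attributes,
    -- Pull the i-th object out of the array, then read its "value" key.
    json_extract_path_text(
        json_extract_array_element_text(t.attributes, n.i),
        'value'
    ) AS color
FROM my_table t
CROSS JOIN numbers n
-- Skip indexes past the end of shorter arrays.
WHERE n.i < json_array_length(t.attributes)
  -- Keep only the array element whose "name" is "Color".
  AND json_extract_path_text(
          json_extract_array_element_text(t.attributes, n.i),
          'name'
      ) = 'Color';
```

If the arrays can be longer than three elements, the numbers set would need to cover the longest array in the table. Rows that have no "Color" element produce no output row with this filter.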

Related

How to do case insensitive search on dictionary key in CosmosDB?

I can't figure out a way to do a case insensitive search on dictionary keys in CosmosDB. My objects look like this:
...
"Codes": {
"CodeSystem1": [
"A1", "A2"
],
"CodeSystem2": [
"x1","x2"
]
},
...
Codes is a Dictionary<string, List<string>>
My query looks like this:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.Codes["CodeSystem2"], 'x1')
However, I'd like to do a LOWER() on both the dictionary key and value, but it doesn't work like that.
SELECT * FROM c WHERE ARRAY_CONTAINS(c.Codes[LOWER("CodeSystem2")], LOWER('x1'))
Any ideas? I can't change the structure of the objects, and I would rather not do the filtering in my .NET code.
LOWER/UPPER will not work with Array elements as you would want. If you have something like this:
"CodeSystem4": [
"Z1"
],
"CodeSystem5": "Z3"
We can use the lower with element CodeSystem5 as below:
select * from c where lower(c.Codes["CodeSystem5"]) = Lower('Z3')
But we cannot do the same with 'CodeSystem4' with ARRAY_CONTAINS, it will not return any result.
Also as per the below article, "The LOWER system function does not utilize the index. If you plan to do frequent case insensitive comparisons, the LOWER system function may consume a significant amount of RU's. If this is the case, instead of using the LOWER system function to normalize data each time for comparisons, you can normalize the casing upon insertion."
https://learn.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-lower
One way is to add another searchable array in lower case so that the query can match against it. Otherwise, we can filter the results through the SDK.
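A minimal sketch of the first option, assuming each document is written with an extra property that holds the same dictionary with its keys and values lower-cased at insertion time. The property name CodesLower is hypothetical and does not appear in the question. The application would lower-case the key and the search value before building the query:

```sql
-- CodesLower is a hypothetical, lower-cased copy of Codes stored on each document.
-- Both the key ("codesystem2") and the value ('x1') are lower-cased by the caller.
SELECT *
FROM c
WHERE ARRAY_CONTAINS(c.CodesLower["codesystem2"], 'x1')
```

This keeps the comparison free of LOWER() at query time, in line with the documentation's advice to normalize the casing upon insertion.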

How to get the equivalent of combining [contains] and [in] operators in the same query?

So I have a field that's a multi-choice on the Directus back end so when the JSON comes out of the API it's a one-dimensional array, like so:
"field_name": [
"",
"option 6",
"option 11",
""
]
(btw I have no idea why all these fields produce those blank values, but that's a matter for another day)
I am trying to make an interface on the front end where you can select one or more of these values and the result will come back if ANY of them are found for that record. Think of it like a tag list, if the item has just one of the values it should be returned.
I can use the [contains] operator to find if it has one of the values I'm looking for, but I can only pass a single value, whereas I need all that have either optionX OR optionY OR optionZ. I would basically need a combination of [contains] and [in] to achieve what I'm trying to do. Is there a way to achieve this?
I've also tried setting the [logical] operator to OR, but then that screws up the other filters that need to be included as AND (or I'm doing something wrong). Not to mention the query gets completely unruly.
Help?

Cosmosdb index precision for fixed value strings

I want to index over a field in a collection whose values can be only 4 characters long
{ "field": "abcd" }
Can I use an index precision of 4 like below to save on RU's without having any side effects?
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": 4
},
]
For Range indexes, the index term length will never exceed the actual string length. So, if all of your strings are 4 characters long, then this will not have any impact (neither positive nor negative). You're better off, however, to set the precision to -1 so that we don't have to change your index policy in the future in case the length of the strings changes.
Based on this official statement, the choice of index precision might affect the performance of string range queries. There is no specific statement about the effect on Hash indexes, so I suggest you run an actual test with simulated data to see the real impact.
BTW, if you want to perform ORDER BY queries against your string properties, you must specify a precision of -1 for the corresponding paths.
There are more documents about saving RUs for your reference.
1. https://lbadri.wordpress.com/2018/04/07/is-azure-cosmos-db-really-expensive/
2. https://medium.com/@thomasweiss_io/how-i-learned-to-stop-worrying-and-love-cosmos-dbs-request-units-92c68c62c938

Can JSON Schema support constraints on array items at specific indexes

A good schema language will allow a high degree of control on value constraints.
My quick impression of JSON Schema, however, is that one cannot go beyond specifying that an item must be an array with a single allowable type; one cannot apparently specify, for example, that the first item must be of one type, and the item at the second index of another type. Is this view mistaken?
Yes, it can be done. Here is an example of an array with the types of the first three items specified:
{
"type": "array",
"items": [
{
"type": "number"
},
{
"type": "string"
},
{
"type": "integer"
}
]
}
When you validate against the schema, the 1st, 2nd and 3rd items need to match their types.
If you have more than three items in your array, the extra ones don't have a specified type, so they won't fail validation. An array with fewer than 3 items will also validate, as long as the type of each item present is correct.
Source and a good read I found last week when I started with JSON Schema: Understanding JSON Schema (array section on page 24 of the PDF)
PS: English is not my first language, so let me know of any mistakes in spelling, punctuation or grammar.

Evaluate column value into rows

I have a column whose value is a json array. For example:
[{"att1": "1", "att2": "2"}, {"att1": "3", "att2": "4"}, {"att1": "5", "att2": "6"}]
What I would like is to provide a view where each element of the JSON array is transformed into a row and the attributes of each JSON object into columns. Keep in mind that the JSON array doesn't have a fixed size.
Any ideas on how I can achieve this?
A stored procedure lexer to run against the string? Anything else, like trying a variable in the SQL or using regexp, I imagine will be tricky.
If you need it for client-side viewing only, can you use a JSON decoding library (json_decode() if you are on PHP) and then build markup from that?
But if you're going to use it for any DB work at all, I reckon it shouldn't be stored as JSON.