how do i select certain key/value pair from json field inside a SQL table in SNOWFLAKE - sql

I am currently working on building a dataware house in snowflake for the business that i work for and i have encounter some problems. I used to apply the function Json_value in TSQL for extracting certain key/value pair from json format field inside my original MSSQL DB.
All the other field are in the regular SQL format but there is this one field that i really need that is formated in JSON and i can't seems to exact the key/value pair that i need.
I'm new to SnowSQL and i can't seems to find a way to extract this within a regular query. Does anyone knows a way around my problem ?
* ID /// TYPE /// Name (JSON_FORMAT)/// Amount *
1 5 {En: "lunch, fr: "diner"} 10.00
I would like to extract this line (for exemple) and be able to only retrieve the EN: "lunch" part from my JSON format field.
Thank you !

Almost any time you use JSON in Snowflake, it's advisable to use the VARIANT data type. You can use the parse_json function to convert a string into a variant with JSON.
select
parse_json('{En: "lunch", fr: "diner"}') as VARIANT_COLUMN,
VARIANT_COLUMN:En::string as ENGLISH_WORD;
In this sample, the first column converts your JSON into a variant named VARIANT_COLUMN. The second column uses the variant, extracting the "En" property and casting it to a string data type.
You can define columns as variant and store JSON natively. That's going to improve performance and allow parsing using dot notation in SQL.

For anyone else who also stumbles upon this question:
You can also use JSON_EXTRACT_PATH_TEXT. Here is an example, if you wanted to create a new column called meal.
select json_extract_path_text(Name,'En') as meal from ...

Related

Data Factory expression substring? Is there a function similar like right?

Please help,
How could I extract 2019-04-02 out of the following string with Azure data flow expression?
ABC_DATASET-2019-04-02T02:10:03.5249248Z.parquet
The first part of the string received as a ChildItem from a GetMetaData activity is dynamically. So in this case it is ABC_DATASET that is dynamic.
Kind regards,
D
There are several ways to approach this problem, and they are really dependent on the format of the string value. Each of these approaches uses Derived Column to either create a new column or replace the existing column's value in the Data Flow.
Static format
If the format is always the same, meaning the length of the sections is always the same, then substring is simplest:
This will parse the string like so:
Useful reminder: substring and array indexes in Data Flow are 1-based.
Dynamic format
If the format of the base string is dynamic, things get a tad trickier. For this answer, I will assume that the basic format of {variabledata}-{timestamp}.parquet is consistent, so we can use the hyphen as a base delineator.
Derived Column has support for local variables, which is really useful when solving problems like this one. Let's start by creating a local variable to convert the string into an array based on the hyphen. This will lead to some other problems later since the string includes multiple hyphens thanks to the timestamp data, but we'll deal with that later. Inside the Derived Column Expression Builder, select "Locals":
On the right side, click "New" to create a local variable. We'll name it and define it using a split expression:
Press "OK" to save the local and go back to the Derived Column. Next, create another local variable for the yyyy portion of the date:
The cool part of this is I am now referencing the local variable array that I created in the previous step. I'll follow this pattern to create a local variable for MM too:
I'll do this one more time for the dd portion, but this time I have to do a bit more to get rid of all the extraneous data at the end of the string. Substring again turns out to be a good solution:
Now that I have the components I need isolated as variables, we just reconstruct them using string interpolation in the Derived Column:
Back in our data preview, we can see the results:
Where else to go from here
If these solutions don't address your problem, then you have to get creative. Here are some other functions that may help:
regexSplit
left
right
dropLeft
dropRight

Escaping JSON special characters with JSON_QUERY not working

A project I'm working on involves storing a string of data in a table column. The table will have other columns relevant to the records. We decided to store the string data column using JSON.
From the table, a view will parse the JSON column into separate columns. The view will also have columns derived from the other main table columns. The data from the view is then used to populate parts of a document through SSRS.
When loading data into the main table, I need to utilize separate tables for deriving the other column values and the JSON column. I decided to use common table expressions for this. At the end of the query, I bring together the derived columns from the different common table expressions, including the JSON column, and insert them into the main table.
I had it almost done until I realized that when I use FOR JSON to create the JSON column, it escapes special characters. I did some research and have been trying to use the JSON_QUERY function to get around this but it's not working. Here is a simplification of the problem:
WITH Table1
(
First_Name_JSON
)
As
(
SELECT 'Tim/' As First_Name
FOR JSON PATH
)
SELECT JSON_QUERY(Table1.First_Name_JSON) as first_name
FROM Table1
FOR JSON PATH
Here is the output:
[{"first_name":[{"First_Name":"Tim\/"}]}]
Why is it still escaping? The documentation shows that passing a column that was created by a FOR JSON should make the JSON_QUERY function return it without escaped characters.
I know that this works:
SELECT JSON_QUERY('{"Firt_Name": "Tim/"}') as first_name
FOR JSON PATH
Output:
[{"first_name":{"Firt_Name": "Tim/"}}]
However, I need to be able to pass a column that's holding JSON data already because it's pretty long logic with many columns. Using FOR JSON is ideal for making changes versus hard coding the JSON format around each column.
I must be missing something. Thanks for any help.
It's quite simple:
{"Firt_Name": "Tim/"} is valid JSON, so JSON_QUERY can return it as is. Tim/ is not valid so needs escaping first.
Quote from the docs:
Using JSON_QUERY with FOR JSON
JSON_QUERY returns a valid JSON fragment. As a result, FOR JSON doesn't escape special characters in the JSON_QUERY return value.
If you're returning results with FOR JSON, and you're including data that's already in JSON format (in a column or as the result of an expression), wrap the JSON data with JSON_QUERY without the path parameter.
Given your use case, is it not possible to pass through the JSON to where you need it and un-escape it there? OPENJSON and JSON_VALUE are capable of this.

How to extract a value from JSON column with MariaDB, being this not an exact value in the JSON field?

I need to extract a field from a JSON string with MariaDB and search for specific patterns in that field.
This field is just a property of all the properties the JSON object has. I had read the documentation and I saw the JSON_EXTRACT function. I am still a newbie with databases so I would like some help in this matter.
{"user_id":"1","status_id":"1","text":"Hello, world"}
Lets say I want to get all the "text" values that have the "world" in the database table. I can extract with JSON_EXTRACT. But I want patterns, not absolute values.
How can I do that?
You can extract the value with json_extract(), and then do pattern matching with like:
select t.*
from mytable t
where json_extract(my_json_col, '$.text') like '%world%'

Add Array to field instead of String

I have the following problem:
I want to add my field value the value of value= [0,16,33,50,67,84,101,118,135,152,169,186,203,220,237,254,271,288,305,322,338,355,372,389,406,423,440,457,474,491,508,525,542,559,576,593,610,627,644,661,677,694,711,728,745,762,779,796,813,830,847,864,881,898,915,932,949,966,983,1000,1016,1033,1050,1067,1084,1101,1118,1135,1152,1169,1186,1203,1220,1237,1254,1271,1288,1305,1322,1338,1355,1372,1389,1406,1423,1440,1457,1474,1491,1508,1525,1542,1559,1576,1593,1610,1627,1644,1661,1677]
I tried to use JSON or any other field type it return me the value as a string (with "") and as I am doing stuff, it would not work. How to work around this?
I'm not entirely sure if this answers your question, but Directus 6 saves data in only the MySQL 5 datatypes. Therefore, CSV / JSON values are saved as strings (often in the TEXT datatype). If you want to use this data in your application as an array / JSON, you will have to convert it yourself.
The Directus team is working to support more (custom) datatypes in future versions, so the API can respond with nested arrays/objects in JSON.

Dynamic type cast in select query

I have totally rewritten my question because of inaccurate description of the problem!
We have to store a lot of different informations about a specific region. For this we need a flexible data structure which does not limit the possibilities for the user.
So we've create a key-value table for this additional data which is described through a meta table which contains the datatype of the value.
We already use this information for queries over our rest api. We then automatically wrap the requested field with into a cast.
SQL Fiddle
We return this data together with information form other tables as a JSON object. We convert the corresponding rows from the data-table with array_agg and json_object into a JSON object:
...
CASE
WHEN count(prop.name) = 0 THEN '{}'::json
ELSE json_object(array_agg(prop.name), array_agg(prop.value))
END AS data
...
This works very well. Now the problem we have is if we store data like a floating point number into this field, we then get returned a string representation of this number:
e.g. 5.231 returns as "5.231"
Now we would like to CAST this number during our select statement into the right data-format so the JSON result would be correctly formatted. We have all the information we need so we tried following:
SELECT
json_object(array_agg(data.name),
-- here I cast the value into the right datatype!
-- results in an error
array_agg(CAST(value AS datatype))) AS data
FROM data
JOIN (
SELECT name, datatype
FROM meta)
AS info
ON info.name = data.name
The error message is following:
ERROR: type "datatype" does not exist
LINE 3: array_agg(CAST(value AS datatype))) AS data
^
Query failed
PostgreSQL said: type "datatype" does not exist
So is it possible to dynamically cast the text of the data_type column to a postgresql type to return a well-formatted JSON object?
First, that's a terrible abuse of SQL, and ought to be avoided in practically all scenarios. If you have a scenario where this is legitimate, you probably already know your RDBMS so intimately, that you're writing custom indexing plugins, and wouldn't even think of asking this question...
If you tell us what you're actually trying to do, there's about a 99.9% chance we can tell you a better way to do it.
Now with that disclaimer aside:
This is not possible, without using dynamic SQL. With a sufficiently recent version of PostgreSQL, you can accomplish this with the use of 'EXECUTE IMMEDIATE', which you can read about in the manual. It basically boils down to using EXEC.
Note, however, that even using this method, the result for every row fetched in the same query must have the same data type. In other words, you can't expect that row 1 will have a data type of VARCHAR, and row 2 will have INT. That is completely impossible.
The problem you have is, that json_object does create an object out of a string array for the keys and another string array for the values. So if you feed your JSON objects into this method, it will always return an error.
So the first problem is, that you have to use a JSON or JSONB column for the values. Or you can convert the values from string to json with to_json().
Now the second problem is that you need to use another method to create your json object because you want to feed it with a string array for the keys and a json-object array for the values. For this there is a method called json_object_agg.
Then your output should be like the one you expected! Here the full query:
SELECT
json_object_agg(data.name, to_json(data.value)) AS data
FROM data