How can I convert a string to JSON in Snowflake? - sql

I have this string: {id: evt_1jopsdgqxhp78yqp7pujesee, created: 2021-08-14t16:38:17z}. I would like to convert it to JSON. I tried parse_json but got an error; to_variant just converted it to the string "{id: evt_1jopsdgqxhp78yqp7pujesee, created: 2021-08-14t16:38:17z}".

To Gokhan's and Simon's point, the original data isn't valid JSON.
If you're 100% (1000%) certain it'll "ALWAYS" arrive in exactly that shape, you can treat it as a string-parsing exercise and do something like the following, but the moment someone changes the format a bit, it will break.
create temporary table abc (str varchar);
insert into abc values ('{id: evt_1jopsdgqxhp78yqp7pujesee, created: 2021-08-14t16:38:17z}');
select to_json(parse_json(json_str)) as json_json
from (
    select split_part(ltrim(str, '{'), ',', 1) as part_a,
           split_part(rtrim(str, '}'), ',', 2) as part_b,
           split_part(trim(part_a), ': ', 1) as part_a_name,
           split_part(trim(part_a), ': ', 2) as part_a_val,
           split_part(trim(part_b), ': ', 1) as part_b_name,
           split_part(trim(part_b), ': ', 2) as part_b_val,
           '{"'||part_a_name||'":"'||part_a_val||'", "'||part_b_name||'":"'||part_b_val||'"}' as json_str
    from abc);
which returns valid JSON:
{"created":"2021-08-14t16:38:17z","id":"evt_1jopsdgqxhp78yqp7pujesee"}
Overall this is very fragile, but if you must do it, feel free to.

Your JSON is not valid, as you can confirm with any online validator, e.g.:
https://jsonlint.com/
This is a valid JSON version of your data:
{
"id": "evt_1jopsdgqxhp78yqp7pujesee",
"created": "2021-08-14t16:38:17z"
}
And you can parse it successfully using parse_json:
select parse_json('{ "id": "evt_1jopsdgqxhp78yqp7pujesee", "created": "2021-08-14t16:38:17z"}');
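Once the string is valid JSON, the individual fields can be pulled out of the resulting VARIANT with Snowflake's colon/cast notation. A minimal sketch (the column aliases are just for illustration):

```sql
-- Sketch: extract typed fields from the parsed VARIANT
select
    v:id::string      as id,
    v:created::string as created
from (
    select parse_json('{ "id": "evt_1jopsdgqxhp78yqp7pujesee", "created": "2021-08-14t16:38:17z"}') as v
);
```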

Related

Parse string as JSON with Snowflake SQL

I have a field in a table of our db that works like an event-like payload, where all changes to different entities are gathered. See example below for a single field of the object:
'---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc'
Since accessing this field with pure SQL is a pain, I was thinking of parsing it as a JSON so that it would look like this:
{
"field_one":"1",
"field_two": "20",
"field_three": "4",
"id": "1234",
"another_id": "5678",
"some_text": "Hey you",
"a_date": "2022-11-29",
"utc": "2022-11-29 15:29:28.159296000 Z",
"another_date": "2022-11-30",
"utc": "2022-11-30 13:34:59.000000000 Z"
}
And then just use a Snowflake-native approach to access the values I need.
As you can see, though, there are two fields called utc: one refers to the first date (a_date), and the second refers to the second date (another_date). I believe these are nested in the object, but it's difficult to tell given the format of the field.
This is a problem since I can't differentiate between one utc and another when giving the string the format I need and running a parse_json() function (due to both keys using the same name).
My SQL so far looks like the following:
select
object,
replace(object, '---\n', '{"') || '"}' as first,
replace(first, '\n', '","') as second_,
replace(second_, ': ', '":"') as third,
replace(third, ' ', '') as fourth,
replace(fourth, ' ', '') as last
from my_table
(Steps third and fourth are needed because I have some fields that have extra spaces in them)
And this actually gives me the format I need, but due to what I mentioned around the utc keys, I cannot parse the string as a JSON.
Also note that the structure of the string might change from row to row, meaning that some rows might gather two utc keys, while others might have one, and others even five.
Any ideas on how to overcome that?
Replace only one occurrence with regexp_replace():
with data as (
select '---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc' o
)
select parse_json(last2)
from (
select o,
replace(o, '---\n', '{"') || '"}' as first,
replace(first, '\n', '","') as second_,
replace(second_, ': ', '":"') as third,
replace(third, ' ', '') as fourth,
replace(fourth, ' ', '') as last,
regexp_replace(last, '"utc"', '"utc2"', 1, 2) last2
from data
)
;
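Since the question notes that some rows may carry one, two, or even five utc keys, hard-coding occurrence numbers won't scale. A more general sketch (assuming the key: value line format shown above, and renaming duplicates to utc_1, utc_2, ..., which is just one possible naming choice): split the payload on newlines, number repeated keys with a window function, and reassemble the JSON:

```sql
-- Sketch: number duplicate keys (utc -> utc_1, utc_2, ...) before parsing.
-- Assumes every line follows the "key: value" format from the question.
with data as (
  select '---\nfield_one: 1\nfield_two: 20\na_date: 2022-11-29\nutc: this_utc\nanother_date: 2022-11-30\nutc: another_utc' as o
),
lines as (
  select t.value as line, t.index as idx
  from data, lateral split_to_table(replace(data.o, '---\n', ''), '\n') t
),
numbered as (
  select split_part(line, ': ', 1) as k,
         split_part(line, ': ', 2) as v,
         row_number() over (partition by split_part(line, ': ', 1) order by idx) as n,
         count(*) over (partition by split_part(line, ': ', 1)) as c,
         idx
  from lines
)
select parse_json(
         '{' || listagg('"' || k || iff(c > 1, '_' || n, '') || '":"' || v || '"', ',')
                within group (order by idx) || '}') as j
from numbered;
```

This sidesteps the duplicate-key problem entirely, at the cost of renamed keys downstream.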
This may not be what you want, but it seems to me that your problem could be solved if each UTC timestamp replaced the date preceding it, since those keys are not duplicated. You can always derive the dates once you have the timestamps. If this makes sense, see if you can apply your parse_json solution to this output instead:
set str='---\nfield_one: 1\nfield_two: 20\nfield_three: 4\nid: 1234\nanother_id: 5678\nsome_text: Hey you\na_date: 2022-11-29\nutc: 2022-11-29 15:29:28.159296000 Z\nanother_date: 2022-11-30\nutc: 2022-11-30 13:34:59.000000000 Z';
select regexp_replace($str,'[0-9]{4}-[0-9]{2}-[0-9]{2}\nutc:')

How to remove all \ from nested json in SQL Redshift?

I've got some problems extracting values from nested JSON in a column.
The column's values look almost like nested JSON, but some of the JSON has \ characters between values, and I need to clean them out.
The JSON looks like this:
{"mopub_json":
"{\"currency\":\"USD\",
\"country\":\"US\",
\"publisher_revenue\":0.01824}
"}
I need to get currency and publisher_revenue as separate columns, and tried this:
SET json_serialization_enable TO true;
SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, "\t", '')
FROM table_name)
I receive the next error:
[Amazon](500310) Invalid operation: column "\t" does not exist in events
When I'm trying this:
SET json_serialization_parse_nested_strings TO true;
SELECT
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'publisher_revenue') as revenue_mopub,
JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json', 'currency') as currency_mopub
FROM(
SELECT replace(column_name, chr(92), '')
FROM table_name)
I receive
Invalid operation: JSON parsing error
When I try to extract the values without any replace, I get empty columns.
Thank you for your help!
So your JSON isn't valid. JSON doesn't allow multiline text strings, and I expect that's the issue. Based on your query, I think you don't want a single key and string but the whole structure. The reason the quotes are backslashed is that they are inside a string. The JSON should look like:
{
"mopub_json": {
"currency": "USD",
"country": "US",
"publisher_revenue": 0.01824
}
}
Then the SQL you have should work.
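If fixing the stored data isn't an option, one common Redshift workaround (a sketch, not tested against your data) is to call JSON_EXTRACT_PATH_TEXT twice: the inner call returns the value of mopub_json as plain text with the backslash escapes resolved, and the outer call then extracts from that text. Note this only works if the embedded string is otherwise valid JSON; literal line breaks inside the quoted string, as in the example above, would still need to be removed first.

```sql
-- Sketch: "mopub_json" holds a JSON string, so extract it first
-- (JSON_EXTRACT_PATH_TEXT un-escapes it), then extract from the result.
SELECT
  JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json'), 'currency')          AS currency_mopub,
  JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_PATH_TEXT(column_name, 'mopub_json'), 'publisher_revenue') AS revenue_mopub
FROM table_name;
```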

Using JSON_VALUE() when value contains unescaped double quotes

I have a table in the database where one field (name of the field: JSONDetail) stores JSON. Recently we encountered a problem where one of the values in this field contains unescaped double quotes. It's due to a migration from another system, which allowed double quotes to be stored in the database without a backslash before them.
Example (see field "comment"):
{
"noteId": "a34f17c4-f4fd-45ea-b4da-732ef8126a6b",
"memberName": "Test LINKOUS",
"tenantId": "548bead1-bdab-e811-bce7-0003ff21d46b",
"noteType": "General Note",
"memberId": "84cf0adb-850d-e711-80c8-000d3a103f46",
"createdOn": "2020-09-13T17:47:33.2864868Z",
"comment": "test "word" test",
"contacts": [
{
"otherContactType": "",
"communicationType": ""
}
]
}
We need to identify such cases in the database. I tried:
select JSON_VALUE (JSONDetail, '$.comment') as Comment
But instead of test "word" test, it returned
How can I return what is actually stored in key "comment"?
SQL Server does not have a "fix_json" function.
To find the junk records:
select *
from table
where ISJSON(json_col) = 0
Fix the found records via a back-end language (PHP, C#, etc.).
To prevent such behavior in the future, add a constraint:
ALTER TABLE table
ADD CONSTRAINT [record should be formatted as JSON]
CHECK (ISJSON(json_col)=1)
If each comment key is followed by a contacts key throughout the JSONDetail column, then you can use the following code block, which combines the SUBSTRING(), PATINDEX(), TRIM() and LEN() functions to extract the whole value of the comment key and compare it with the value returned by JSON_VALUE(JSONDetail, '$.comment'):
WITH t(json_extracted,str) AS
(
SELECT JSON_VALUE (JSONDetail, '$.comment'),
SUBSTRING(
JSONDetail,
PATINDEX('%"comment"%', JSONDetail),
PATINDEX('%"contacts"%', JSONDetail)-PATINDEX('%"comment"%', JSONDetail)
)
FROM tab
), t2(json_extracted,str) AS
(
SELECT json_extracted,
TRIM(
SUBSTRING( str, PATINDEX('%:%', str) + 1,
PATINDEX('%,%', str) - PATINDEX('%:%', str) - 1 ) )
FROM t
)
SELECT SUBSTRING(str,2,LEN(str)-2) AS extracted_comment,
CASE WHEN json_extracted = SUBSTRING(str,2,LEN(str)-2)
THEN
'No'
ELSE
'Yes'
END AS "is_it_corrupted"
FROM t2
[EDIT] It wasn't practical to infer the field's location in the JSON string from its length. Using CHARINDEX to search for the field names, this code finds and fixes the 'comment' values in the JSON.
Data
drop table if exists #json_to_fix;
go
create table #json_to_fix(
json_col nvarchar(max));
declare @json nvarchar(max)=N'
{
"noteId": "a34f17c4-f4fd-45ea-b4da-732ef8126a6b",
"memberName": "Test LINKOUS",
"tenantId": "548bead1-bdab-e811-bce7-0003ff21d46b",
"noteType": "General Note",
"memberId": "84cf0adb-850d-e711-80c8-000d3a103f46",
"createdOn": "2020-09-13T17:47:33.2864868Z",
"comment": "test "word" test",
"contacts": [
{
"otherContactType": "",
"communicationType": ""
}
]
}';
insert #json_to_fix(json_col) values (@json);
Query
select s.not_escaped, fix.string_to_fix,
replace(fix.string_to_fix, '"', '') fixed
from #json_to_fix j
cross apply
(select charindex('"comment":', j.json_col, 1) strt_ndx) c_start
cross apply
(select charindex('"contacts"', j.json_col, c_start.strt_ndx) end_ndx) c_end
cross apply
(select substring(json_col, c_start.strt_ndx, c_end.end_ndx-c_start.strt_ndx-11) not_escaped) s
cross apply
(select substring(s.not_escaped, 13, len(s.not_escaped)-13) string_to_fix) fix
Output
not_escaped string_to_fix fixed
"comment": "test "word" test" test "word" test test word test
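Building on the query above, the repair can also be written back in place. This is a sketch adopting one possible repair policy (drop the inner quotes entirely), so adjust as needed:

```sql
-- Sketch: replace the broken fragment with its quote-stripped version,
-- touching only rows that currently fail ISJSON.
update j
set json_col = replace(j.json_col, fix.string_to_fix,
                       replace(fix.string_to_fix, '"', ''))
from #json_to_fix j
cross apply
    (select charindex('"comment":', j.json_col, 1) strt_ndx) c_start
cross apply
    (select charindex('"contacts"', j.json_col, c_start.strt_ndx) end_ndx) c_end
cross apply
    (select substring(json_col, c_start.strt_ndx, c_end.end_ndx - c_start.strt_ndx - 11) not_escaped) s
cross apply
    (select substring(s.not_escaped, 13, len(s.not_escaped) - 13) string_to_fix) fix
where isjson(j.json_col) = 0;
```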

SQL Remove \n and parse JSON in one command

The data is formatted like so:
Query:
select X from DB
Output:
{\n "_id": "5a7e4b7cf36d3920dd24bc0e",\n "price": 0,\n "name": "XXX"\n}
What I'm trying to do is both remove the \n characters and parse the response itself. I'd like to grab just the _id field.
My current query is not quite right:
Step 1: Remove the \n characters:
SELECT REPLACE(REPLACE(X, CHAR(13), ''), CHAR(10), '') from DB
Output:
{"_id": "5a7e4b7cf36d3920dd24bc0e", "price": 0,"name": "XXX"}
Question: How can I tweak this query to parse the JSON and return the _id field all at once? I've tried this with no luck:
SELECT PARSE_JSON(REPLACE(REPLACE(X, CHAR(13), ''), CHAR(10), '')) from DB
^ This query just outputs the same as the first query.
Have you tried
SELECT X:_id FROM DB
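If X is stored as VARCHAR rather than VARIANT, the colon syntax needs a PARSE_JSON first. A sketch combining the two steps (assuming the \n shown are real newline characters, which PARSE_JSON treats as whitespace; if they are literal backslash-n pairs, keep the REPLACE step from the question):

```sql
-- Sketch: parse the VARCHAR and pull _id in one step.
SELECT PARSE_JSON(X):_id::string AS id
FROM DB;
```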

SQL Server 2017 Selecting JSON embedded within a JSON field

In SQL Server 2017, I'd like to "SELECT" a JSON object embedded within another as a string so we can store/process them later.
eg JSON:
[
{"key1":"value1",
"level2_Obj":{"key2":"value12"}
},
{"key1":"value2",
"level2_Obj":{"key22":"value22"}
},
]
From the above JSON, I'd like to SELECT the whole level2_Obj object as a string; below is the result I'd like the selection to produce:
value1 |{"key2" :"value12"}
value2 |{"key22":"value22"}
I tried the below with no luck:
SELECT * FROM
OPENJSON(@json,'$."data1"')
WITH(
[key1] nvarchar(50),
[embedded_json] nvarchar(max) '$."level2Obj"'
) AS DAP
Can someone please help me select the contents of the 2nd-level JSON object as a string?
The idea is to write the 1st-level JSON properties into individual cells and the rest of the JSON levels into a single column of type nvarchar(max) (i.e. the whole sub-level JSON object goes into a single column as a string, for further processing in later stages).
Good day,
Firstly, your JSON text is not properly formatted: there is an extra comma after the last object in the array. I will remove this extra comma for the sake of the answer, but if this is the format you have, then the first step will be to clean the text and make sure it is well formed.
Please check if this solve your needs:
declare @json nvarchar(MAX) = '
[
{
"key1":"value1",
"level2_Obj":{"key2":"value12"}
}
,
{
"key1":"value2",
"level2_Obj":{"key22":"value22"}
}
]
'
SELECT JSON_VALUE (t1.[value], '$."key1"'), JSON_QUERY (t1.[value], '$."level2_Obj"')
FROM OPENJSON(@json,'$') t1
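For reference, the WITH-clause attempt from the question can also work once the nested column is declared AS JSON (without AS JSON, an nvarchar(max) column uses JSON_VALUE semantics and returns NULL for objects in lax mode) and the path matches the actual key name level2_Obj. A sketch:

```sql
SELECT *
FROM OPENJSON(@json, '$')
WITH (
    [key1]          nvarchar(50),
    [embedded_json] nvarchar(max) '$."level2_Obj"' AS JSON
) AS DAP;
```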