Using JSON_VALUE() when value contains unescaped double quotes - sql

I have a table in the database where one field (field name: JSONDetail) stores JSON. Recently we encountered a problem: one of the values in this field contains unescaped double quotes. This is due to a migration from another system, which allowed double quotes to be stored in the database without a backslash before them.
Example (see field "comment"):
{
    "noteId": "a34f17c4-f4fd-45ea-b4da-732ef8126a6b",
    "memberName": "Test LINKOUS",
    "tenantId": "548bead1-bdab-e811-bce7-0003ff21d46b",
    "noteType": "General Note",
    "memberId": "84cf0adb-850d-e711-80c8-000d3a103f46",
    "createdOn": "2020-09-13T17:47:33.2864868Z",
    "comment": "test "word" test",
    "contacts": [
        {
            "otherContactType": "",
            "communicationType": ""
        }
    ]
}
We need to identify such cases in the database. I tried:
select JSON_VALUE (JSONDetail, '$.comment') as Comment
But it did not return test "word" test.
How can I return what is actually stored in key "comment"?

SQL Server does not have a "fix_json" function.
To find the junk records:
select *
from table
where ISJSON(json_col) = 0
Fix the found records via a back-end language (PHP, C#, etc.).
To prevent such behavior in the future, add a constraint:
ALTER TABLE table
ADD CONSTRAINT [record should be formatted as JSON]
CHECK (ISJSON(json_col) = 1)
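For example (a sketch only, reusing the same placeholder table and column names), with the constraint in place an insert of malformed JSON is rejected instead of being stored silently:
-- hypothetical illustration: ISJSON() returns 0 for this value, so the CHECK constraint rejects the row
INSERT INTO table (json_col)
VALUES (N'{"comment": "test "word" test"}');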

If the comment key is always followed by the contacts key throughout the JSONDetail column, then you can use the following code block, which combines the SUBSTRING(), PATINDEX(), TRIM() and LEN() functions to extract the whole value of the comment key and compare it with the value extracted by JSON_VALUE(JSONDetail, '$.comment'):
WITH t(json_extracted, str) AS
(
    SELECT JSON_VALUE(JSONDetail, '$.comment'),
           SUBSTRING(
               JSONDetail,
               PATINDEX('%"comment"%', JSONDetail),
               PATINDEX('%"contacts"%', JSONDetail) - PATINDEX('%"comment"%', JSONDetail)
           )
      FROM tab
), t2(json_extracted, str) AS
(
    SELECT json_extracted,
           TRIM(
               SUBSTRING( str, PATINDEX('%:%', str) + 1,
                          PATINDEX('%,%', str) - PATINDEX('%:%', str) - 1 ) )
      FROM t
)
SELECT SUBSTRING(str, 2, LEN(str) - 2) AS extracted_comment,
       CASE WHEN json_extracted = SUBSTRING(str, 2, LEN(str) - 2)
            THEN 'No'
            ELSE 'Yes'
       END AS "is_it_corrupted"
  FROM t2
Demo

[EDIT] It wasn't practical to infer the field's location in the JSON string based on length. Using a CHARINDEX search for the field names, this code finds and fixes the 'comment' values in the JSON.
Data
drop table if exists #json_to_fix;
go
create table #json_to_fix(
    json_col nvarchar(max));
declare @json nvarchar(max)=N'
{
    "noteId": "a34f17c4-f4fd-45ea-b4da-732ef8126a6b",
    "memberName": "Test LINKOUS",
    "tenantId": "548bead1-bdab-e811-bce7-0003ff21d46b",
    "noteType": "General Note",
    "memberId": "84cf0adb-850d-e711-80c8-000d3a103f46",
    "createdOn": "2020-09-13T17:47:33.2864868Z",
    "comment": "test "word" test",
    "contacts": [
        {
            "otherContactType": "",
            "communicationType": ""
        }
    ]
}';
insert #json_to_fix(json_col) values (@json);
Query
select s.not_escaped, fix.string_to_fix,
replace(fix.string_to_fix, '"', '') fixed
from #json_to_fix j
cross apply
(select charindex('"comment":', j.json_col, 1) strt_ndx) c_start
cross apply
(select charindex('"contacts"', j.json_col, c_start.strt_ndx) end_ndx) c_end
cross apply
(select substring(json_col, c_start.strt_ndx, c_end.end_ndx-c_start.strt_ndx-11) not_escaped) s
cross apply
(select substring(s.not_escaped, 13, len(s.not_escaped)-13) string_to_fix) fix
Output
not_escaped                      string_to_fix       fixed
-------------------------------  ------------------  ---------------
"comment": "test "word" test"    test "word" test    test word test

Related

DB2 LUW - JSON_Table reading json array data into tabular format

I am trying to convert JSON array data into tabular format using the JSON_TABLE function. I tried to run the query below but I am getting the following error:
SQL/JSON scalar required. SQLCODE=-16413, SQLSTATE=2203F, DRIVER=4.19.56
The same query is working fine when the number of elements in Employees array is 1, but not otherwise.
SELECT E."id", E."name"
FROM JSON_TABLE
(
'{
"Employees": [
{
"id": 1,
"name": "Kathi"
},
{
"id": 2,
"name": "Pavan"
}
]
}', 'strict $' COLUMNS
(
"id" INTEGER PATH 'strict $.Employees[*].id'
, "name" VARCHAR(20) PATH 'strict $.Employees[*].name'
) ERROR ON ERROR
) AS E;
Just for the benefit of others who are looking for examples: I found the link below on the internet; example 3 basically covers my case. We need to use JSON_TABLE from the SYSTOOLS schema.
https://www.worldofdb2.com/profiles/blogs/convert-json-data-to-relational-format
JSON_TABLE produces only one row per JSON document.
You may only use the following:
"id" INTEGER PATH 'strict $.Employees[n].id'
, "name" VARCHAR(20) PATH 'strict $.Employees[n].name'
where n = {0, 1} for an array of 2 elements, as in your example.
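For instance, a sketch of the query from the question adjusted to read the first array element (n = 0); the second element would use [1] in the same way:
SELECT E."id", E."name"
FROM JSON_TABLE
(
    '{"Employees": [ {"id": 1, "name": "Kathi"}, {"id": 2, "name": "Pavan"} ]}',
    'strict $' COLUMNS
    (
        "id" INTEGER PATH 'strict $.Employees[0].id'
        , "name" VARCHAR(20) PATH 'strict $.Employees[0].name'
    ) ERROR ON ERROR
) AS E;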

Postgres jsonb field to array

I was going through the Postgres Jsonb documentation but was unable to find a solution for a small issue I'm having.
I've got a table : MY_TABLE
that has the following columns:
User, Name, Data and Purchased
One thing to note is that "Data" is a jsonb column with multiple fields. One of the fields inside "Data" is "Attributes", but it is currently a string. How can I go about changing this to a list of strings?
I have tried using json_build_array but have not had any luck
So for example, I'd want my jsonb to look like :
{
    "Id": 1,
    "Attributes": ["Test"]
}
instead of
{
    "Id": 1,
    "Attributes": "Test"
}
I only care about the "Attributes" field inside of the Json, not any other fields.
I also want to ensure for some Attributes that have an empty string "Attributes": "", they get mapped to an empty list and not a list with an empty string ([] not [""])
You can use jsonb_set(), and some conditional logic for the empty string:
jsonb_set(
    data,
    '{Attributes}',
    case when data ->> 'Attributes' <> ''
        then jsonb_build_array(data ->> 'Attributes')
        else '[]'::jsonb
    end
)
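A minimal usage sketch, assuming the table is my_table, the jsonb column is named data (lower-cased versions of the names in the question), and Attributes is still a plain string in every row:
UPDATE my_table
SET data = jsonb_set(
    data,
    '{Attributes}',
    CASE WHEN data ->> 'Attributes' <> ''
         THEN jsonb_build_array(data ->> 'Attributes')  -- "Test" becomes ["Test"]
         ELSE '[]'::jsonb                               -- "" becomes []
    END
);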

BigQuery select except double nested column

I am trying to remove a column from a BigQuery table and I've followed the instructions as stated here:
https://cloud.google.com/bigquery/docs/manually-changing-schemas#deleting_a_column_from_a_table_schema
This did not work directly as the column I'm trying to remove is nested twice in a struct. The following SO questions are relevant but none of them solve this exact case.
Single nested field:
BigQuery select * except nested column
Double nested field (solution has all fields in the schema enumerated, which is not useful for me as my schema is huge):
BigQuery: select * replace from multiple nested column
I've tried adapting the above solutions and I think I'm close but can't quite get it to work.
This one will remove the field, but returns only the nested field, not the whole table (for the examples I want to remove a.b.field_name. See the example schema at the end):
SELECT AS STRUCT * EXCEPT(a), a.* REPLACE (
(SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`
This next attempt gives me an error: Scalar subquery produced more than one element:
WITH a_tmp AS (
SELECT AS STRUCT a.* REPLACE (
(SELECT AS STRUCT a.b.* EXCEPT (field_name)) AS b
)
FROM `table`
)
SELECT * REPLACE (
(SELECT AS STRUCT a.* FROM a_tmp) AS a
)
FROM `table`
Is there a generalised way to solve this? Or am I forced to use the enumerated solution in the 2nd link?
Example Schema:
[
    {
        "name": "a",
        "type": "RECORD",
        "fields": [
            {
                "name": "b",
                "type": "RECORD",
                "fields": [
                    {
                        "name": "field_name",
                        "type": "STRING"
                    },
                    {
                        "name": "other_field_name",
                        "type": "STRING"
                    }
                ]
            }
        ]
    }
]
I would like the final schema to be the same but without field_name.
Below is for BigQuery Standard SQL
#standardSQL
SELECT * REPLACE(
(SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
AS a)
FROM `project.dataset.table`
You can test and play with it using dummy data as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT STRUCT<b STRUCT<field_name STRING, other_field_name STRING>>(STRUCT('1', '2')) a
)
SELECT * REPLACE(
(SELECT AS STRUCT(SELECT AS STRUCT a.b.* EXCEPT (field_name)) b)
AS a)
FROM `project.dataset.table`

Issues with JSON_EXTRACT in Presto for keys containing ' ' character

I'm using Presto(0.163) to query data and am trying to extract fields from a json.
I have a json like the one given below, which is present in the column 'style_attributes':
"attributes": {
"Brand Fit Name": "Regular Fit",
"Fabric": "Cotton",
"Fit": "Regular",
"Neck or Collar": "Round Neck",
"Occasion": "Casual",
"Pattern": "Striped",
"Sleeve Length": "Short Sleeves",
"Tshirt Type": "T-shirt"
}
I'm unable to extract the 'Sleeve Length' field (value 'Short Sleeves').
Below is the query I'm using:
Select JSON_EXTRACT(style_attributes,'$.attributes.Sleeve Length') as length from table;
The query fails with the following error- Invalid JSON path: '$.attributes.Sleeve Length'
For fields without a ' ' (space) in the key, the query runs fine.
I tried to find the resolution in the Presto documentation, but with no success.
presto:default> select json_extract_scalar('{"attributes":{"Sleeve Length": "Short Sleeves"}}','$.attributes["Sleeve Length"]');
_col0
---------------
Short Sleeves
or
presto:default> select json_extract_scalar('{"attributes":{"Sleeve Length": "Short Sleeves"}}','$["attributes"]["Sleeve Length"]');
_col0
---------------
Short Sleeves
JSON Function Changes
The json_extract and json_extract_scalar functions now
support the square bracket syntax:
SELECT json_extract(json, '$.store[book]');
SELECT json_extract(json,'$.store["book name"]');
As part of this change, the set of characters
allowed in a non-bracketed path segment has been restricted to
alphanumeric, underscores and colons. Additionally, colons cannot be
used in a un-quoted bracketed path segment. Use the new bracket syntax
with quotes to match elements that contain special characters.
https://github.com/prestodb/presto/blob/c73359fe2173e01140b7d5f102b286e81c1ae4a8/presto-docs/src/main/sphinx/release/release-0.75.rst
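Applied to the question's column and key, the bracket syntax would look roughly like this (a sketch; style_attributes and the table name are taken from the question):
SELECT json_extract_scalar(style_attributes, '$.attributes["Sleeve Length"]') AS length
FROM table;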
SELECT
tags -- It is column with Json string data
,json_extract(tags , '$.Brand') AS Brand
,json_extract(tags , '$.Portfolio') AS Portfolio
,cost
FROM
TableName
Sample data for tags - {"Name": "pxyblob", "Owner": "", "Env": "prod", "Service": "", "Product": "", "Portfolio": "OPSXYZ", "Brand": "Limo", "AssetProtectionLevel": "", "ComponentInfo": ""}
Here is the correct answer.
Let's say:
JSON: {"Travel Date":"2017-9-22", "City": "Seattle"}
Column name: ITINERARY
and I want to extract 'Travel Date' from that JSON. Then:
Query: SELECT JSON_EXTRACT(ITINERARY, "$.\"Travel Date\"") from Table
Note: just add \" at the start and end of the key name.
Hope this works for your needs. :)

Get JSON_VALUE with Oracle SQL when multiple nodes share the same name

I have some JSON stored in my Oracle database, and I need to extract values from it.
The problem is that some of the fields are duplicated.
When I try this, it works as there is only one firstname key in the options array:
SELECT
JSON_VALUE('{"increment_id":"2500000043","item_id":"845768","options":[{"firstname":"Kevin"},{"lastname":"Test"}]}', '$.options.firstname') AS value
FROM DUAL;
Which returns 'Kevin'.
However, when there are two values for the firstname field:
SELECT JSON_VALUE('{"increment_id":"2500000043","item_id":"845768","options":[{"firstname":"Kevin"},{"firstname":"Okay"},{"lastname":"Test"}]}', '$.options.firstname') AS value
FROM DUAL;
It only returns NULL.
Is there any way to select the first occurrence of 'firstname' in this context?
JSON_VALUE returns one SQL value from the JSON data (or SQL NULL if the key does not exist).
If you have a collection of values (a JSON array) and you want one specific item of the array, you use array subscripts (square brackets) as in JavaScript: for example, [2] selects the third item and [0] selects the first item.
To get the first array item in your example, you have to change the path expression from '$.options.firstname' to '$.options[0].firstname'.
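As a quick sketch with the document from the question (trimmed to the options array), using [1] instead would return the second firstname, 'Okay':
SELECT JSON_VALUE(
    '{"options":[{"firstname":"Kevin"},{"firstname":"Okay"},{"lastname":"Test"}]}',
    '$.options[1].firstname') AS value
FROM DUAL;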
You can use this query:
SELECT JSON_VALUE('{
    "increment_id": "2500000043",
    "item_id": "845768",
    "options": [
        {
            "firstname": "Kevin"
        },
        {
            "firstname": "Okay"
        },
        {
            "lastname": "Test"
        }
    ]
}', '$.options[0].firstname') AS value
FROM DUAL;