Check if one (or more) value in a list contain in JSON in redshift - sql

I have a column in redshift that contains a JSON.
I want to create a query that checks if one value in a list exists in the JSON, and then returns the path of the value.
For example I habe those list and JSON:
possible_values = ["html","php","python"]
json = {"encoding":"ASCI","web":{"code_language":"php"}}
So I'm expected for this result:
TRUE, path: "web\code_language"
Does it possible?

Related

Filter based on 2 JSON properties after using json_extract

Imported database tables :
id | JSON
-------------|---------
Signed 32int | Raw JSON
It is easier to search via the properties of the JSON data than by id of the row itself. Each piece of JSON data contains (for this demo):
json: {
displayProperties: {},
hash: "foo"
itemType: "bar"
}
When I select I would like to matching hash, and then filter those results by a matching itemType.
My query :
SELECT json_extract(ItemDefinition.json, '$')
FROM ItemDefinition, json_tree(ItemDefinition.json, '$')
WHERE json_tree.key = 'hash' AND json_tree.value IN ${hashList}
However this returns every item that has a matching hash value. From here, I would like to also filter by key: itemType and value: "19". So I tried :
SELECT json_extract(ItemDefinition.json, '$')
FROM ItemDefinition, json_tree(ItemDefinition.json, '$')
WHERE json_tree.key = 'hash' AND json_tree.value IN ${hashList}
AND WHERE json_tree.key = 'itemType' AND json_tree.value = 19
But this isn't syntactically correct, let alone output what I am looking for. Error:
SQLITE_ERROR: near "WHERE": syntax error
The title of the question turned out to not be accurate to what I was looking for. I miss-understood what json_tree actually did. json_tree actually builds a new object with values that are filled in by the database.
What I was actually looking for was to filter by a specific value in the json column, which can be achieved by json_extract. json_extract('{column}', $.{filterValue}) will pull the raw json object out of the json column
This is the query that is working for me now:
SELECT json_extract(ItemDefinition.json, '$')
FROM ItemDefinition, json_tree(ItemDefinition.json, '$')
WHERE json_tree.key = 'hash'
AND json_tree.value IN ${hashList}
AND json_extract(ItemDefinition.json, '$.itemType') = 19
This selects the json column from ItemDefinition
Creates a json_tree from the json column
Filters results by json tree key and value
Finally filters by the property itemType from the raw json column

Is it possible to add index values to the rows of a postgreSQL query when expanding JSON array with json_array_elements?

I have a database of resume data in json format which I am trying to transform.
One of the sections in each jsib is work_history, this is in the form of a json array i.e.
"work_experience":[
{
"job_title":"title",
"job_description":"description"
},
{
"job_title":"title",
"job_description":"description"
}
]
I am iterating over each resume (json file) and importing this data into a new table using dbt and postgreSQL with each element of the array being a new row with the associated metadata of the resume. Here is the code I used for this
select
json_array_elements(rjt.raw_json::json -> 'data' -> 'work_experience') as we,
json_array_elements(rjt.raw_json::json -> 'data' -> 'work_experience') -> 'job_title' as "name",
rjt.uuid as uuid
from raw_json_table rjt
The last thing that I need to do is add a column that lists the index that each job came from within its individual workexperience array i.e. if a job was the third element in the array it would have a 2 in the "source_location" column. How can I generate this index such that it starts at 0 for each new json file.
Move the function to the FROM clause (where set-returning functions should be used). Then you can use with ordinality which also returns a column that indicates the index inside the array
select w.experience as we,
w.experience ->> 'job_title' as "name",
w.experience ->> 'job_description' as "description",
w.idx as "index",
rjt.uuid as uuid
from raw_json_table rjt
left join json_array_elements(rjt.raw_json::json -> 'data' -> 'work_experience') with ordinality
as w(experience, idx) on true
The left join is necessary so that rows from raw_json_table that don't contain array elements are still included.

Extracting JSON returns null (Presto Athena)

I'm working with SQL Presto in Athena and in a table I have a column named "data.input.additional_risk_data.basket" that has a json like this:
[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]
I need to extract some of the data there, for example data.input.additional_risk_data.basket.val.item_reference. I'm not used to working with jsons but I tried a few things:
json_extract("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference')
json_extract_scalar("data.input.additional_risk_data.basket", '$.data.input.additional_risk_data.basket.val.item_reference)
They all returned null. I'm wondering what is the correct way to get the values from that json
Thank you!
There are multiple "problems" with your data and json path selector. Keys are not conventional (and I have not found a way to tell athena to escape them) and your json is actually an array of json objects. What you can do - cast data to an array and process it. For example:
-- sample data
WITH dataset (json_val) AS (
VALUES (json '[
{
"data.input.additional_risk_data.basket.val.brand":null,
"data.input.additional_risk_data.basket.val.category":null,
"data.input.additional_risk_data.basket.val.item_reference":"26484651",
"data.input.additional_risk_data.basket.val.name":"Nike Force 1",
"data.input.additional_risk_data.basket.val.product_name":null,
"data.input.additional_risk_data.basket.val.published_date":null,
"data.input.additional_risk_data.basket.val.quantity":"1",
"data.input.additional_risk_data.basket.val.size":null,
"data.input.additional_risk_data.basket.val.subCategory":null,
"data.input.additional_risk_data.basket.val.unit_price":769.0,
"data.input.additional_risk_data.basket.val.upc":null,
"data.input.additional_risk_data.basket.val.url":null
}
]')
)
--query
select arr[1]['data.input.additional_risk_data.basket.val.item_reference'] item_reference -- or use unnest if there are actually more than 1 element in array expected
from(
select cast(json_val as array(map(varchar, json))) arr
from dataset
)
Output:
item_reference
"26484651"

POSTGRES JSON: Updating array value in column

I am using POSTGRES SQL JSON.
In json column the value is stored as array which I want to update using SQL query
{"roles": ["Admin"]}
The output in table column should be
{"roles": ["SYSTEM_ADMINISTRATOR"]}
I tried different queries but it is not working.
UPDATE public.bo_user
SET json = jsonb_set(json, '{roles}', to_jsonb('SYSTEM_ADMINISTRATOR')::jsonb, true);
UPDATE public.bo_user
SET json = jsonb_set(json, '{roles}', to_jsonb('["SYSTEM_ADMINISTRATOR"]')::jsonb, true);
ERROR: could not determine polymorphic type because input has type unknown
SQL state: 42804
Kindly help me with the query
but at the moment it is to update the value at 0 index
That can be done using an index based "path" for jsonb_set()
update bo_user
set "json" = jsonb_set("json", '{roles,0}'::text[], '"SYSTEM_ADMINISTRATOR"')
where "json" #>> '{roles,0}' = 'Admin'
The "path" '{roles,0}' references the first element in the array and that is replaced with the constant "SYSTEM_ADMINISTRATOR"' Note the double quotes inside the SQL string literal which are required for a valid JSON string
The WHERE clause ensures that you don't accidentally change the wrong value.
So this worked.
UPDATE public.bo_user
SET json = jsonb_set(json, '{roles}', ('["SYSTEM_ADMINISTRATOR"]')::jsonb, true)
where id = '??';

Get all entries for a specific json tag only in postgresql

I have a database with a json field which has multiple parts including one called tags, there are other entries as below but I want to return only the fields with "{"tags":{"+good":true}}".
"{"tags":{"+good":true}}"
"{"has_temps":false,"tags":{"+good":true}}"
"{"tags":{"+good":true}}"
"{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}"
I can get part of the way there with this statement in my where clause trips.metadata->'tags'->>'+good' = 'true' but that returns all instances where tags are good and true including all entries above. I want to return entries with the specific statement "{"tags":{"+good":true}}" only. So taking out the two entries that begin has_temps.
Any thoughts on how to do this?
With jsonb column the solution is obvious:
with trips(metadata) as (
values
('{"tags":{"+good":true}}'::jsonb),
('{"has_temps":false,"tags":{"+good":true}}'),
('{"tags":{"+good":true}}'),
('{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}')
)
select *
from trips
where metadata = '{"tags":{"+good":true}}';
metadata
-------------------------
{"tags":{"+good":true}}
{"tags":{"+good":true}}
(2 rows)
If the column's type is json then you should cast it to jsonb:
...
where metadata::jsonb = '{"tags":{"+good":true}}';
If I get you right, you can check text value of the "tags" key, like here:
select true
where '{"has_temps":false,"too_long":true,"too_long_as_of":"2016-02-12T12:28:28.238+00:00","tags":{"+good":true}}'::json->>'tags'
= '{"+good":true}'