Migrating data from jsonb to integer[] SQL - sql

I have jsonb field(data) in Postgresql with a structure like:
{ "id" => { "some_key" => [1, 2, 3] } }
I need to migrate the value to a different field.
t.jsonb "data"
t.integer "portals", default: [], array: true
When I'm trying to do like this:
UPDATE table_name
SET portals = ARRAY[data -> '1' ->> 'portals']
WHERE id = 287766
It raises an error:
Caused by PG::DatatypeMismatch: ERROR: column "portals" is of type integer[] but expression is of type text[]

Here is one way to do it. But if you search the site, as you should had to do, you get more.
Schema
create table t (
data jsonb
);
insert into t values ('{"1" : { "k1" : [1,2,3,5]} }');
insert into t values ('{"2" : { "k2" : [4,5,6,7]} }');
create table i (
id int,
v int[]
)
Some tests
select data -> '1' -> 'k1'
from t
where data ? '1'
;
insert into i values(1,ARRAY[1,2,3]);
update i
set v = (select replace(replace(data -> '1' ->> 'k1', '[', '{'), ']', '}')::int[] from t where data ? '1')
where id = 1;
select * from i;
The above gets array as a text, as you did. After that, just some text replacements to cast the text to an integer array literal.
DB Fiddle

Related

Query key values in a json column

I have a table "jobs" with one of the columns called "check_list" ( varchar(max) that has JSON values, an example value would be
{
"items":[
{
"name":"machine 1",
"state":"",
"comment":"",
"isReleaseToProductionCheck":true,
"mnachine_id":10
},
{
"name":"machine 2",
"state":"",
"comment":"",
"isReleaseToProductionCheck":true,
"machine_id":12
}
]
}
Now how would I write a SQL query to only return the rows where the column "check_list" has items[machine_id] = 12
In the end after some trial and error this was the solution that worked for me. I had to add the ISJSON check because some of the older data was invalid
WITH jobs (id, workorder, selectedMachine) AS(
SELECT
[id],
[workorder],
(
select
*
from
openjson(check_list, '$.items') with (machine_id int '$.machine_id')
where
machine_id = 12
) as selectedMachine
FROM
engineering_job_schedule
WHERE
ISJSON(check_list) > 0
)
Select
*
from
jobs
where
selectedMachine = 12

Expanding a Struct of Struct to columns in bigquery

I am working with a BQ table that has a format of a STRUCT of STRUCTs.
It looks as follows:
I would like to have a table which looks like follows:
property_hs_email_last_click_date_value
currentlyinworkflow_value
hs_first_engagement_object_id_value
hs_first_engagement_object_id_value__st
5/5/2022 23:00:00
Y
1
'Hey'
The challenge is that there are 500 fields and I would like to make this efficient instead of writing out every single line as follows:
SELECT property_hs_email_last_click_date as property_hs_email_last_click_date_value,
properties.currentlyinworkflow.value as currentlyinworkflow_value,
properties.hs_first_engagement_object_id.value as properties.hs_first_engagement_object_id_value,
properties.hs_first_engagement_object_id.value__st as hs_first_engagement_object_id_value__st
Any suggestions on how to make this more efficient?
Edit:
Here's a query that creates a table such as this:
create or replace table `project.database.TestTable` (
property_hs_email_last_click_date STRUCT < value string >,
properties struct < currentlyinworkflow struct < value string > ,
hs_first_engagement_object_id struct < value numeric , value__st string >,
first_conversion_event_name struct < value string >
>
);
insert into `project.database.TestTable`
values (struct('12/2/2022 23:00:02'), struct(struct('Yes'), struct(1, 'Thursday'), struct('Festival')) );
insert into `project.database.TestTable`
values (struct('14/2/2021 12:00:02'), struct(struct('No'), struct(5, 'Friday'), struct('Phone')) )
Below is quite generic script that extracts all leaves in JSON and then presents them as columns
create temp function extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp function extract_all_leaves(input string) returns string language js as '''
function flattenObj(obj, parent = '', res = {}){
for(let key in obj){
let propName = parent ? parent + '.' + key : key;
if(typeof obj[key] == 'object'){
flattenObj(obj[key], propName, res);
} else {
res[propName] = obj[key];
}
}
return JSON.stringify(res);
}
return flattenObj(JSON.parse(input));
''';
create temp table temp_table as (
select offset, key, value, format('%t', t) row_id
from your_table t,
unnest([struct(to_json_string(t) as json)]),
unnest([struct(extract_all_leaves(json) as leaves)]),
unnest(extract_keys(leaves)) key with offset
join unnest(extract_values(leaves)) value with offset
using(offset)
);
execute immediate (select '''
select * except(row_id) from (select * except(offset) from temp_table)
pivot (any_value(value) for replace(key, '.', '__') in (''' || keys_list || '''
))'''
from (select string_agg('"' || replace(key, '.', '__') || '"', ',' order by offset) keys_list from (
select key, min(offset) as offset from temp_table group by key
))
);
if applied to sample data in your question
create temp table your_table as (
select struct('12/2/2022 23:00:02' as value) as property_hs_email_last_click_date ,
struct(
struct('Yes' as value) as currentlyinworkflow ,
struct(1 as value, 'Thursday' as value__st) as hs_first_engagement_object_id ,
struct('Festival' as value) as first_conversion_event_name
) as properties
union all
select struct('14/2/2021 12:00:02'), struct(struct('No'), struct(5, 'Friday'), struct('Phone'))
);
the output is

How to transform jsonb object to an array of objects?

From this:
"data": {
"media_object_uuid": ["5171167e-c109-4926-9606-5212ee250e2f"]
}
to this:
"data": {
"media_object":[{"media_object_uuid": "5171167e-c109-4926-9606-5212ee250e2f"]
}
In words, I want to extract the first value of this array and set it on the new field media_object_uuid inside media_object. My approach to resolve this was:
update demo_test set data = jsonb_set(data,'{media_object_uuid}',('{"media_object": { "media_object_uuid":' || (data->"media_object_uuid"[0])::text || '}}'));
But I have in return media_object_uuid column doesn't exist
I think you need to call data->"media_object_uuid" as data->'media_oject_uuid', since in Postgres, double-quotes are used to refer to column/entity names (unless they are first encapsulated in single-quotes):
edb=# create table abc (data jsonb);
CREATE TABLE
edb=# insert into abc values ('{
edb'# "media_object_uuid": ["5171167e-c109-4926-9606-5212ee250e2f"]
edb'# }');
INSERT 0 1
edb=# select ('{"media_object": [{ "media_object_uuid":' || (data->'media_object_uuid')::text || '}]}') from abc;
?column?
-------------------------------------------------------------------------------------
{"media_object": [{ "media_object_uuid":["5171167e-c109-4926-9606-5212ee250e2f"]}]}
(1 row)
edb=# update abc set data = jsonb_set(data,'{media_object_uuid}', ('{"media_object": [{ "media_object_uuid":' || (data->'media_object_uuid')::text || '}]}')::jsonb);
UPDATE 1
edb=# select * from abc;
data
------------------------------------------------------------------------------------------------------------
{"media_object_uuid": {"media_object": [{"media_object_uuid": ["5171167e-c109-4926-9606-5212ee250e2f"]}]}}
(1 row)

Snowflake get_path() or flatten() array query - to find latest key:value

I have a column 'amp' in a table 'EXAMPLE'. Column 'amp' is an array which looks like this:
[{
"list": [{
"element": {
"x_id": "12356789XXX",
"y_id": "12356789XXX38998",
}
},
{
"element": {
"x_id": "5677888356789XXX",
"y_id": "1XXX387688",
}
}]
}]
How should I query using get_path() or flatten() to extract the latest x_id and y_id value (or other alternative)
In this example it is only 2 elements, but there could 1 to 6000 elements containing x_id and y_id.
Help much appreciated!
Someone may have a more elegant way than this, but you can use a CTE. In the first table expression, grab the max of the array. In the second part, grab the values you need.
set json = '[{"list": [{"element": {"x_id": "12356789XXX","y_id": "12356789XXX38998"}},{"element": {"x_id": "5677888356789XXX","y_id": "1XXX387688",}}]}]';
create temp table foo(v variant);
insert into foo select parse_json($json);
with
MAX_INDEX(M) as
(
select max("INDEX") MAX_INDEX
from foo, lateral flatten(v, recursive => true)
),
VALS(V, P, K) as
(
select "VALUE", "PATH", "KEY"
from foo, lateral flatten(v, recursive => true)
)
select k as "KEY", V::string as VALUE from vals, max_index
where VALS.P = '[0].list[' || max_index.m || '].element.x_id' or
VALS.P = '[0].list[' || max_index.m || '].element.y_id'
;
Assuming that the outer array ALWAYS contains a single dictionary element, you could use this:
SELECT amp[0]:"list"[ARRAY_SIZE(amp[0]:"list")-1]:"element":"x_id"::VARCHAR AS x_id
,amp[0]:"list"[ARRAY_SIZE(amp[0]:"list")-1]:"element":"y_id"::VARCHAR AS y_id
FROM T
;
Or if you prefer a bit more modularity/readability, you could use this:
WITH CTE1 AS (
SELECT amp[0]:"list" AS _ARRAY
FROM T
)
,CTE2 AS (
SELECT _ARRAY[ARRAY_SIZE(_ARRAY)-1]:"element" AS _DICT
FROM CTE1
)
SELECT _DICT:"x_id"::VARCHAR AS x_id
,_DICT:"y_id"::VARCHAR AS y_id
FROM CTE2
;
Note: I have not used FLATTEN here because I did not see a good reason to use it.

How To Split Pipe-Delimited Column and insert each value into new table Once?

I have an old database with a gazillion records (more or less) that have a single tags column (with tags being pipe-delimited) that looks like so:
Breakfast
Breakfast|Brunch|Buffet|Burger|Cakes|Crepes|Deli|Dessert|Dim Sum|Fast Food|Fine Wine|Spirits|Kebab|Noodles|Organic|Pizza|Salad|Seafood|Steakhouse|Sushi|Tapas|Vegetarian
Breakfast|Brunch|Buffet|Burger|Deli|Dessert|Fast Food|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
Breakfast|Brunch|Buffet|Cakes|Crepes|Dessert|Fine Wine|Spirits|Salad|Seafood|Steakhouse|Tapas|Teahouse
Breakfast|Brunch|Burger|Crepes|Salad
Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Vegetarian
Breakfast|Brunch|Deli|Dessert|Organic|Salad
Breakfast|Brunch|Dessert|Dim Sum|Hot Pot|Seafood
Breakfast|Brunch|Dessert|Dim Sum|Seafood
Breakfast|Brunch|Dessert|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood
Breakfast|Brunch|Dessert|Fine Wine|Spirits|Salad|Vegetarian
Is there a way one could retrieve each tag and insert it into a new table tag_id | tag_nm using MySQL only?
Here is my attempt which uses PHP..., I imagine this could be more efficient with a clever MySQL query. I've placed the relationship part of it there too. There's no escaping and error checking.
$rs = mysql_query('SELECT `venue_id`, `tag` FROM `venue` AS a');
while ($row = mysql_fetch_array($rs)) {
$tag_array = explode('|',$row['tag']);
$venueid = $row['venue_id'];
foreach ($tag_array as $tag) {
$rs2 = mysql_query("SELECT `tag_id` FROM `tag` WHERE tag_nm = '$tag'");
$tagid = 0;
while ($row2 = mysql_fetch_array($rs2)) $tagid = $row2['tag_id'];
if (!$tagid) {
mysql_execute("INSERT INTO `tag` (`tag_nm`) VALUES ('$tag')");
$tagid = mysql_insert_id;
}
mysql_execute("INSERT INTO `venue_tag_rel` (`venue_id`, `tag_id`) VALUES ($venueid, $tagid)");
}
}
After finding there is no official split function I've solved the issue using only MySQL like so:
1: I created the function strSplit
CREATE FUNCTION strSplit(x varchar(21845), delim varchar(255), pos int) returns varchar(255)
return replace(
replace(
substring_index(x, delim, pos),
substring_index(x, delim, pos - 1),
''
),
delim,
''
);
Second I inserted the new tags into my new table (real names and collumns changed, to keep it simple)
INSERT IGNORE INTO tag (SELECT null, strSplit(`Tag`,'|',1) AS T FROM `old_venue` GROUP BY T)
Rinse and repeat increasing the pos by one for each collumn (in this case I had a maximum of 8 seperators)
Third to get the relationship
INSERT INTO `venue_tag_rel`
(Select a.`venue_id`, b.`tag_id` from `old_venue` a, `tag` b
WHERE
(
a.`Tag` LIKE CONCAT('%|',b.`tag_nm`)
OR a.`Tag` LIKE CONCAT(b.`tag_nm`,'|%')
OR a.`Tag` LIKE CONCAT(CONCAT('%|',b.`tag_nm`),'|%')
OR a.`Tag` LIKE b.`tag_nm`
)
)