Redshift PostgreSQL - How to Parse Nested JSON

I am trying to parse a JSON text using JSON_EXTRACT_PATH_TEXT() function.
JSON sample:
{
  "data":[
    {
      "name":"ping",
      "idx":0,
      "cnt":27,
      "min":16,
      "max":33,
      "avg":24.67,
      "dev":5.05
    },
    {
      "name":"late",
      "idx":0,
      "cnt":27,
      "min":8,
      "max":17,
      "avg":12.59,
      "dev":2.63
    }
  ]
}
I tried JSON_EXTRACT_PATH_TEXT(event, '{"name":"late"}', 'avg') to get 'avg' for name = "late", but it returns blank.
Can anyone help, please?
Thanks

This is a rather complicated task in Redshift, which, unlike Postgres, has poor support for managing JSON and no function to unnest arrays.
Here is one way to do it using a number table; you need to populate the table with incrementing numbers starting at 0, like:
create table nums as
select 0 as i union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
;
Once the table is created, you can use it to walk the JSON array using json_extract_array_element_text(), and check its content with json_extract_path_text():
select json_extract_path_text(item, 'avg') as my_avg
from (
    select json_extract_array_element_text(t.items, n.i, true) as item
    from (
        select json_extract_path_text(mycol, 'data', true) as items
        from mytable
    ) t
    inner join nums n on n.i < json_array_length(t.items, true)
) t
where json_extract_path_text(item, 'name') = 'late';
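As a quick sanity check outside SQL (not part of the original answer), the same walk-the-array-and-filter logic can be sketched in Python against the sample payload from the question:

```python
import json

# Payload mirroring the sample JSON from the question.
event = json.loads("""
{"data": [
  {"name": "ping", "idx": 0, "cnt": 27, "min": 16, "max": 33, "avg": 24.67, "dev": 5.05},
  {"name": "late", "idx": 0, "cnt": 27, "min": 8, "max": 17, "avg": 12.59, "dev": 2.63}
]}
""")

# Walk the "data" array and keep the avg of the element whose name is "late",
# mirroring the json_extract_array_element_text / json_extract_path_text combination.
late_avg = next(item["avg"] for item in event["data"] if item["name"] == "late")
print(late_avg)  # 12.59
```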

You'll need to use json_array_elements for that (note: this function exists in PostgreSQL but not in Redshift):
select obj->'avg'
from foo f, json_array_elements(f.event->'data') obj
where obj->>'name' = 'late';
Working example
create table foo (id int, event json);
insert into foo values (1,'{
"data":[
{
"name":"ping",
"idx":0,
"cnt":27,
"min":16,
"max":33,
"avg":24.67,
"dev":5.05
},
{
"name":"late",
"idx":0,
"cnt":27,
"min":8,
"max":17,
"avg":12.59,
"dev":2.63
}]}');

Related

Is there something like Spark's unionByName in BigQuery?

I'd like to concatenate tables with different schemas, filling unknown values with null.
Of course, simply using UNION ALL does not work this way:
WITH
x AS (SELECT 1 AS a, 2 AS b ),
y AS (SELECT 3 AS b, 4 AS c )
SELECT * FROM x
UNION ALL
SELECT * FROM y
a b
1 2
3 4
(unwanted result)
In Spark, I'd use unionByName to get the following result:
a     b     c
1     2     null
null  3     4
(wanted result)
Of course, I can manually create the needed query (adding NULLs) in BigQuery like so:
SELECT a, b, NULL c FROM x
UNION ALL
SELECT NULL a, b, c FROM y
But I'd prefer to have a generic solution, not requiring me to generate something like that.
So, is there something like unionByName in BigQuery? Or can one come up with a generic SQL function for this?
Consider below approach (I think it is as generic as one can get)
create temp function json_extract_keys(input string) returns array<string> language js as """
return Object.keys(JSON.parse(input));
""";
create temp function json_extract_values(input string) returns array<string> language js as """
return Object.values(JSON.parse(input));
""";
create temp table temp_table as (
select json, key, value
from (
select to_json_string(t) json from table_x as t
union all
select to_json_string(t) from table_y as t
) t, unnest(json_extract_keys(json)) key with offset
join unnest(json_extract_values(json)) value with offset
using(offset)
order by key
);
execute immediate(select '''
select * except(json) from temp_table
pivot (any_value(value) for key in ("''' || string_agg(distinct key, '","') || '"))'
from temp_table
)
if applied to sample data in your question - output is
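For comparison (not part of the original answer), the align-by-name union itself is simple to express procedurally. A minimal pure-Python sketch, with rows as dicts and a hypothetical helper name union_by_name, collecting column names from all inputs and filling gaps with None:

```python
def union_by_name(*tables):
    """Union lists of row-dicts, filling missing columns with None."""
    # Collect every column name that appears in any input table, in order.
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Emit each row with the full column set, None where a value is missing.
    return [{col: row.get(col) for col in columns} for table in tables for row in table]

x = [{"a": 1, "b": 2}]
y = [{"b": 3, "c": 4}]
print(union_by_name(x, y))
# [{'a': 1, 'b': 2, 'c': None}, {'a': None, 'b': 3, 'c': 4}]
```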

PostgreSQL: Select unique rows where distinct values are in list

Say that I have the following table:
with data as (
select 'John' "name", 'A' "tag", 10 "count"
union all select 'John', 'B', 20
union all select 'Jane', 'A', 30
union all select 'Judith', 'A', 40
union all select 'Judith', 'B', 50
union all select 'Judith', 'C', 60
union all select 'Jason', 'D', 70
)
I know there are a number of distinct tag values, namely (A, B, C, D).
I would like to select the unique names that only have the tag A.
I can get close by doing
-- wrong!
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1
however, this will include unique names that only have 1 distinct tag, regardless of which tag it is.
I am using PostgreSQL, although having more generic solutions would be great.
You're almost there - you already have groups with one tag, now just test if it is the tag you want:
select
distinct("name")
from data
group by "name"
having count(distinct tag) = 1 and max(tag)='A'
(Note max could be min as well - SQL just doesn't have a single() aggregate function, but that's a different story.)
You can use not exists here:
select distinct "name"
from data d
where "tag" = 'A'
and not exists (
select * from data d2
where d2."name" = d."name" and d2."tag" != d."tag"
);
This is one possible way of solving it:
select
distinct("name")
from data
where "name" not in (
-- create list of names we want to exclude
select distinct name from data where "tag" != 'A'
)
But I don't know if it's the best or most efficient one.
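All three queries implement the same rule - keep a name only if its set of tags is exactly {'A'} - which is easy to verify against the sample data in plain Python (a sanity-check sketch, not part of any answer):

```python
from collections import defaultdict

# Sample data from the question: (name, tag, count).
data = [
    ("John", "A", 10), ("John", "B", 20), ("Jane", "A", 30),
    ("Judith", "A", 40), ("Judith", "B", 50), ("Judith", "C", 60),
    ("Jason", "D", 70),
]

# Group the tags seen for each name.
tags_by_name = defaultdict(set)
for name, tag, _ in data:
    tags_by_name[name].add(tag)

# Keep names whose tag set is exactly {'A'}.
only_a = sorted(n for n, tags in tags_by_name.items() if tags == {"A"})
print(only_a)  # ['Jane']
```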

Hierarchically aggregate JSON depending on value in row using PostgreSQL10

I have a PostgreSQL 10 table that works as a "dictionary" and is structured as follows:
key
value
style_selection_color
style_selection_weight
style_line_color
style_line_weight
...
Now I was wondering whether there is a way of building a JSON from the values in the table that creates a hierarchy based on the value of "key"?
Something like:
style --> selection --> color and
style --> line --> color
Ending up with a JSON:
{
style: [
selection: {
color: "...",
weight: "..."
},
line: {
color: "...",
weight: "..."
}
]
}
Is such a feat achievable? If so, how would I go about it?
Could it be done so that regardless of what keys I have in my table it always returns the JSON properly built?
Thanks in advance
Working solution with Postgres 10 and above
I propose a generic solution which converts the key data into the text[] type so that it can be used as the path argument of the standard jsonb_set() function.
But as we will iterate on the jsonb_set() function, we first need to create an aggregate function based on it:
CREATE AGGREGATE jsonb_set_agg(p text[], z jsonb, b boolean)
( sfunc = jsonb_set
, stype = jsonb
, initcond = '{}'
)
Then we convert the key data into text[] and automatically generate the list of paths that will let us build the final jsonb data progressively and iteratively:
SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
The final query looks like this:
WITH list AS
( SELECT i.id
, max(i.id) OVER (PARTITION BY t.key) AS id_max
, p.path[1 : i.id] AS jsonpath
, to_jsonb(t.value) AS value
FROM mytable AS t
CROSS JOIN LATERAL string_to_array(t.key, '_') AS p(path)
CROSS JOIN LATERAL generate_series(1, array_length(p.path, 1)) AS i(id)
)
SELECT jsonb_set_agg( l.jsonpath
, CASE
WHEN l.id = l.id_max THEN l.value
ELSE '{}' :: jsonb
END
, true
ORDER BY l.id
)
FROM list AS l
And the result is slightly different from your expectation (the top-level JSON array is replaced by a JSON object), but that seems more logical to me:
{"style": {"line": {"color": "C"
, "weight": "D"
}
, "selection": {"color": "A"
, "weight": "B"
}
}
}
full test result in dbfiddle.
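The key-splitting idea above (split on '_' and set each prefix path, like jsonb_set with an empty object for intermediate levels) can also be sketched outside SQL. A minimal Python version of the same nesting, assuming keys follow the style_<group>_<property> pattern and using placeholder values A-D:

```python
# Rows mirroring the dictionary table: (key, value).
rows = [
    ("style_selection_color", "A"),
    ("style_selection_weight", "B"),
    ("style_line_color", "C"),
    ("style_line_weight", "D"),
]

result = {}
for key, value in rows:
    *path, leaf = key.split("_")
    node = result
    for part in path:          # walk/create intermediate objects, like jsonb_set with '{}'
        node = node.setdefault(part, {})
    node[leaf] = value         # set the final scalar, like the id = id_max branch

print(result)
# {'style': {'selection': {'color': 'A', 'weight': 'B'}, 'line': {'color': 'C', 'weight': 'D'}}}
```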
Well, I am not sure about your Postgres version; hoping this works on your version, I tried it on version 11.
;WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
classes as (
select dtbl.style,
dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
) AS properties
from dtbl
group by dtbl.style, dtbl.cls),
styles as (
select style
from dtbl
group by style
)
,
class_obj as (
select classes.style,
classes.cls,
json_build_object(
classes.cls, classes.properties) as cls_json
from styles
join classes on classes.style = styles.style
)
select json_build_object(
class_obj.style,
json_agg(class_obj.cls_json)
)
from class_obj
group by style
;
If you change the first part of the query to match your table and column names this should work.
The idea is to build the JSON objects nested, but you cannot do this in one pass, as Postgres does not let you nest json_agg calls; this is why we have to use more than one query: first build the line and selection objects, then aggregate them into the style objects.
Sorry for the naming, this is the best I could do.
EDIT1:
This is the output of that query.
{"stil" : [{"line" : [{"weight" : "200"}]}]}
{"style" : [{"selection" : [{"color" : "red"}, {"weight" : "1rem"}]}, {"line" : [{"color" : "gray"}, {"weight" : "200"}]}]}
Looking at this output, it is not exactly what you wanted; you got an array of objects for the properties :)
You wanted {"color":"red", "weight": "1rem"} but the output is
[{"color":"red"}, {"weight": "1rem"}]
EDIT2:
Well, json_object_agg is the solution, so I used json_object_agg to build the prop objects; now I am thinking this might be made even simpler.
This is the new output from the query.
{"stil" : [{"line" : { "weight" : "200" }}]}
{"style" : [{"selection" : { "color" : "red", "weight" : "1rem" }}, {"line" : { "color" : "gray", "weight" : "200" }}]}
This is the trimmed-down version; json_object_agg made things a bit simpler, so I got rid of some subselects. Tested on Postgres 10.
https://www.db-fiddle.com/f/tjzNBoQ3LTbECfEWb9Nrcp/0
;
WITH dtbl as (
select split_part(tbl.col, '_', 1) as style,
split_part(tbl.col, '_', 2) as cls,
split_part(tbl.col, '_', 3) as property_name,
tbl.val
from (
select 'style_selection_color' as col, 'red' as val
union all
select 'style_selection_weight', '1rem'
union all
select 'style_line_color', 'gray'
union all
select 'style_line_weight', '200'
union all
select 'stil_line_weight', '200'
) as tbl
),
result as (
select dtbl.style,
dtbl.cls,
json_build_object(dtbl.cls,
(
SELECT json_object_agg(
nested_props.property_name, nested_props.val
)
FROM (
SELECT dtbl2.property_name,
dtbl2.val
FROM dtbl dtbl2
where dtbl2.style = dtbl.style
and dtbl2.cls = dtbl.cls
) AS nested_props
)) AS cls_json
from dtbl
group by dtbl.style, dtbl.cls)
select json_build_object(
result.style,
json_agg(result.cls_json)
)
from result
group by style
;
You can think of dtbl as your main table; I just added a bonus row called stil, similar to the other rows, to make sure that the grouping is correct.
Here is the output:
{"style":
[{"line":{"color":"gray", "weight":"200"}},
{"selection":{"color":"red","weight":"1rem"}}]
}
{"stil":[{"line":{"weight":"200"}}]}

Find and replace pattern inside BigQuery string

Here is my BigQuery table. I am trying to find out the URLs that were displayed but not viewed.
create table dataset.url_visits(ID INT64 ,displayed_url string , viewed_url string);
select * from dataset.url_visits;
ID Displayed_URL Viewed_URL
1 url11,url12 url12
2 url9,url12,url13 url9
3 url1,url2,url3 NULL
In this example, I want to display
ID Displayed_URL Viewed_URL unviewed_URL
1 url11,url12 url12 url11
2 url9,url12,url13 url9 url12,url13
3 url1,url2,url3 NULL url1,url2,url3
Split each string into an array and unnest it. Use a CASE to check whether each displayed item appears in the viewed list, then combine the results back into a string or an array.
Select ID, string_agg(viewing ) as viewed,
string_agg(not_viewing ) as not_viewed,
array_agg(viewing ignore nulls) as viewed_array
from (
Select ID ,
case when display in unnest(split(Viewed_URL)) then display else null end as viewing,
case when display in unnest(split(Viewed_URL)) then null else display end as not_viewing,
from (
Select 1 as ID, "url11,url12" as Displayed_URL, "url12" as Viewed_URL UNION ALL
Select 2, "url9,url12,url13", "url9" UNION ALL
Select 3, "url1,url2,url3", NULL UNION ALL
Select 4, "url9,url12,url13", "url9,url12"
),unnest(split(Displayed_URL)) as display
)
group by 1
Consider below approach
select *, (
select string_agg(url)
from unnest(split(Displayed_URL)) url
where url != ifnull(Viewed_URL, '')
) unviewed_URL
from `project.dataset.table`
if applied to sample data in your question - output is
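Both answers compute "displayed minus viewed" per row. The same logic in a small Python sketch (not part of either answer; unviewed is a hypothetical helper name), splitting the comma-separated strings and preserving display order:

```python
def unviewed(displayed, viewed):
    """Return the comma-separated displayed URLs that were not viewed."""
    viewed_set = set(viewed.split(",")) if viewed else set()
    return ",".join(u for u in displayed.split(",") if u not in viewed_set)

# Sample rows from the question: (ID, Displayed_URL, Viewed_URL).
rows = [
    (1, "url11,url12", "url12"),
    (2, "url9,url12,url13", "url9"),
    (3, "url1,url2,url3", None),
]
for rid, displayed, viewed in rows:
    print(rid, unviewed(displayed, viewed))
# 1 url11
# 2 url12,url13
# 3 url1,url2,url3
```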

How do I combine 2 records with a single field into 1 row with 2 fields (Oracle 11g)?

Here's a sample data
record1: field1 = test2
record2: field1 = test3
The actual output I want is
record1: field1 = test2 | field2 = test3
I've looked around the net but can't find what I'm looking for. I can use a custom function to get it in this format but I'm trying to see if there's a way to make it work without resorting to that.
thanks a lot
You need to use pivot:
with t(id, d) as (
select 1, 'field1 = test2' from dual union all
select 2, 'field1 = test3' from dual
)
select *
from t
pivot (max (d) for id in (1, 2))
If you don't have the id field you can generate it, but you will have XML type:
with t(d) as (
select 'field1 = test2' from dual union all
select 'field1 = test3' from dual
), t1(id, d) as (
select ROW_NUMBER() OVER(ORDER BY d), d from t
)
select *
from t1
pivot xml (max (d) for id in (select id from t1))
There are several ways to approach this - google pivot rows to columns. Here is one set of answers: http://www.dba-oracle.com/t_converting_rows_columns.htm
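The pivot itself - two single-field records becoming one row with two fields - reduces to a few lines in a procedural sketch (an illustration of the idea, not Oracle code; the field1/field2 names follow the desired output in the question):

```python
# The field1 values of record1 and record2 from the sample data.
values = ["test2", "test3"]

# Number the rows (like ROW_NUMBER) and turn each number into a column name,
# which is what PIVOT (max(d) FOR id IN (1, 2)) does.
pivoted = {f"field{i}": v for i, v in enumerate(values, start=1)}
print(pivoted)  # {'field1': 'test2', 'field2': 'test3'}
```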