Aggregate jsonb map with string array as value in postgresql - sql

I have a PostgreSQL table with a jsonb column containing maps with strings as keys and string arrays as values. I want to aggregate all the maps into a single jsonb map, with no duplicate values in the string arrays. How can I do this in Postgres?
Eg:
Input: {"a": ["1", "2"]}, {"a": ["2", "3"], "b": ["5"]}
Output: {"a": ["1", "2", "3"], "b": ["5"]}
I tried the '||' operator, but it overwrites values when the maps share a key.
Eg:
Input: SELECT '{"a": ["1", "2"]}'::jsonb || '{"a": ["3"], "b": ["5"]}'::jsonb;
Output: {"a": ["3"], "b": ["5"]}

Using jsonb_object_agg with a series of cross joins:
select jsonb_object_agg(t.key, t.a) from (
select v.key, jsonb_agg(distinct v1.value) a from objects o
cross join jsonb_each(o.tags) v
cross join jsonb_array_elements(v.value) v1
group by v.key) t
See fiddle.
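To try the query above without the fiddle, here is a self-contained sketch. The objects table and tags column are the names used in the answer; the CTE that fills them with two rows is an assumption taken from the question's example input, and the aliases kv, elem and vals are only illustrative.

```sql
-- Hypothetical sample data standing in for the real objects table
WITH objects(tags) AS (
  VALUES ('{"a": ["1", "2"]}'::jsonb),
         ('{"a": ["2", "3"], "b": ["5"]}'::jsonb)
)
SELECT jsonb_object_agg(t.key, t.vals) AS merged
FROM (
  -- One row per key, holding the de-duplicated array of its elements
  SELECT kv.key, jsonb_agg(DISTINCT elem.value) AS vals
  FROM objects o
  CROSS JOIN jsonb_each(o.tags) AS kv                  -- one row per key/array pair
  CROSS JOIN jsonb_array_elements(kv.value) AS elem    -- one row per array element
  GROUP BY kv.key
) t;
```

With these two rows the result should be the map from the question, {"a": ["1", "2", "3"], "b": ["5"]}. The DISTINCT inside jsonb_agg is what removes the repeated "2".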

You can use the jsonb_object_agg aggregate function for the final step. It takes a set of key-value pairs and returns a single JSONB object. Because the values are arrays, first expand each array into its elements with jsonb_array_elements, then rebuild one de-duplicated array per key with jsonb_agg(DISTINCT ...). Calling jsonb_agg on the arrays themselves would nest them instead (for example [["1", "2"], ["3"]]). Here is an example query:
SELECT jsonb_object_agg(key, value)
FROM (
-- Rebuild one de-duplicated array per key
SELECT subq.key, jsonb_agg(DISTINCT elem.value) AS value
FROM (
SELECT 'a' AS key, '["1", "2"]'::jsonb AS value
UNION ALL
SELECT 'a' AS key, '["3"]'::jsonb AS value
UNION ALL
SELECT 'b' AS key, '["5"]'::jsonb AS value
) subq
-- Expand each array into one row per element
CROSS JOIN jsonb_array_elements(subq.value) AS elem
GROUP BY subq.key
) subq2;
This will give you the following result:
{"a": ["1", "2", "3"], "b": ["5"]}

Related

Key value table to json in BigQuery

Hey all,
I have a table that looks like this:
row | key | val
1   | a   | 100
2   | b   | 200
3   | c   | "apple"
4   | d   | {}
I want to convert it into JSON:
{
"a": 100,
"b": 200,
"c": "apple",
"d": {}
}
Note: the number of rows can change, so this is only an example.
Thanks in advance!
With string manipulation:
WITH sample_table AS (
SELECT 'a' key, '100' value UNION ALL
SELECT 'b', '200' UNION ALL
SELECT 'c', '"apple"' UNION ALL
SELECT 'd', '{}'
)
SELECT '{' || STRING_AGG(FORMAT('"%s": %s', key, value)) || '}' json
FROM sample_table;
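This returns a JSON string similar to your expected output. Note that the values are inserted verbatim, so each one must already be a valid JSON literal (as '"apple"' is in the sample data).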

BigQuery - Correlated subquery unnesting array not working

I'm trying to join array elements in BigQuery but I am getting the following error message: Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Imagine I have two mapping tables:
CREATE OR REPLACE TABLE `test.field_id_name` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("s1", "string1"),
STRUCT("s2", "string2"),
STRUCT("s3", "string3")]
)
)
CREATE OR REPLACE TABLE `test.field_values` (
id STRING,
name STRING
) AS (
SELECT * FROM UNNEST(
[STRUCT("v1", "val1"),
STRUCT("v2", "val2"),
STRUCT("v3", "val3")]
)
)
And I have the following as input:
CREATE OR REPLACE TABLE `test.input` AS
SELECT [
STRUCT<id STRING, value ARRAY<STRING>>("s1", ["v1"]),
STRUCT("s2", ["v1"]),
STRUCT("s3", ["v1"])
] records
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2"]),
STRUCT("s2", ["v1", "v2"]),
STRUCT("s3", ["v1", "v2"])
]
UNION ALL
SELECT [
STRUCT("s1", ["v1", "v2", "v3"]),
STRUCT("s2", ["v1", "v2", "v3"]),
STRUCT("s3", ["v1", "v2", "v3"])
]
I am trying to produce this output:
SELECT [
STRUCT<id_mapped STRING, value_mapped ARRAY<STRING>>("string1", ["val1"]),
STRUCT("string2", ["val1"]),
STRUCT("string3", ["val1"])
] records
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2"]),
STRUCT("string2", ["val1", "val2"]),
STRUCT("string3", ["val1", "val2"])
]
UNION ALL
SELECT [
STRUCT("string1", ["val1", "val2", "val3"]),
STRUCT("string2", ["val1", "val2", "val3"]),
STRUCT("string3", ["val1", "val2", "val3"])
]
However the following query is failing with the correlated subqueries error.
SELECT
ARRAY(
SELECT
STRUCT(fin.name, ARRAY(SELECT fv.name FROM UNNEST(value) v JOIN test.field_values fv ON (v = fv.id)))
FROM UNNEST(records) r
JOIN test.field_id_name fin ON (fin.id = r.id)
)
FROM test.input
Below is for BigQuery Standard SQL:
#standardSQL
SELECT ARRAY_AGG(STRUCT(id AS id_mapped, val AS value_mapped)) AS records
FROM (
SELECT fin.name AS id, ARRAY_AGG(fv.name) AS val, FORMAT('%t', t) id1, FORMAT('%t', RECORD) id2
FROM `test.input` t,
UNNEST(records) record,
UNNEST(value) val
JOIN `test.field_id_name` fin ON record.id = fin.id
JOIN `test.field_values` fv ON val = fv.id
GROUP BY id, id1, id2
)
GROUP BY id1
Applied to the sample data from your question, this returns exactly the output you expect.

pivot from multiple rows to multiple columns in hive

I have a Hive table like the following:
(id:int, vals: Map<String, int> , type: string)
id, vals, type
1, {"foo": 1}, "a"
1, {"foo": 2}, "b"
2, {"foo": 3}, "a"
2, {"foo": 1}, "b"
Now, there are only two types.
I want to change this to the following schema:
id, type_a_vals, type_b_vals
1, {"foo": 1}, {"foo": 2}
2, {"foo": 3}, {"foo": 1}
If any "type" is missing for an id, its column can be null.
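An easy way, keeping in mind the map column, would be a self join.
-- coalesce keeps the id even when one of the two types is missing
select coalesce(ta.id, tb.id) as id, ta.vals as type_a_vals, tb.vals as type_b_vals
from (select * from tbl where type = 'a') ta
full join (select * from tbl where type = 'b') tb on ta.id = tb.id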
Questions like this are usually solved with conditional aggregation, as below. However, doing so on a map column would produce an error.
select id
,max(case when type = 'a' then vals end) as type_a_vals
,max(case when type = 'b' then vals end) as type_b_vals
from tbl
group by id

How to use a subset of the row columns when converting to JSON?

I have a table t with some columns a, b and c. I use the following query to convert rows to a JSON array of objects:
SELECT COALESCE(JSON_AGG(t ORDER BY c), '[]'::json)
FROM t
This returns as expected:
[
{
"a": ...,
"b": ...,
"c": ...
},
{
"a": ...,
"b": ...,
"c": ...
}
]
Now I want the same result, but with only columns a and b in the output. I will still use column c for ordering. The best I came up with is as following:
SELECT COALESCE(JSON_AGG(JSON_BUILD_OBJECT('a', a, 'b', b) ORDER BY c), '[]'::json)
FROM t
[
{
"a": ...,
"b": ...
},
{
"a": ...,
"b": ...
}
]
Although this works fine, I am wondering if there is a more elegant way to do this. It frustrates me that I have to manually define the JSON properties. Of course, I understand that I have to enumerate the columns a and b, but it's odd that I have to copy/paste the corresponding JSON property name, which is exactly the same as the column name anyway.
Is there another way to do this?
You can use row_to_json instead of building the object manually:
CREATE TABLE foobar (a text, b text, c text);
INSERT INTO foobar VALUES
('1', 'LOREM', 'A'),
('2', 'LOREM', 'B'),
('3', 'LOREM', 'C');
--Using CTE
WITH tmp AS (
SELECT a, b FROM foobar ORDER BY c
)
SELECT json_agg(row_to_json(t)) FROM tmp t;
--Using subquery
SELECT
json_agg(row_to_json(t))
FROM
(SELECT a, b FROM foobar ORDER BY c) t;
EDIT: As you stated, result order is a strict requirement. In this case you could use a row constructor to build json object:
--Using a type to build json with desired keys
CREATE TYPE mytype AS (a text, b text);
SELECT
json_agg(
to_json(
CAST(
ROW(a, b) AS mytype
)
)
ORDER BY c)
FROM
foobar;
--Ignoring column names (the keys default to f1, f2)
SELECT
json_agg(
to_json(
ROW(a, b)
)
ORDER BY c)
FROM
foobar;
SQL Fiddle here.
Perform the ordering in a subquery or CTE and then apply json_agg:
SELECT COALESCE(JSON_AGG(t2), '[]'::json)
FROM (SELECT a, b FROM t ORDER BY c) t2
Alternatively, use jsonb. The jsonb type allows deleting items by specifying their key:
SELECT coalesce(jsonb_agg(row_to_json(t)::jsonb - 'c'
order by c), '[]'::jsonb)
FROM t
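Another option that avoids both the extra type and the repeated property names is to build the row from a small inline subquery. This is only a sketch, assuming the question's table t with columns a, b and c; the alias x is illustrative.

```sql
-- The scalar subquery yields a row value whose field names are a and b,
-- so json_agg uses them as the JSON keys; c is only used for ordering.
SELECT COALESCE(
         json_agg((SELECT x FROM (SELECT t.a, t.b) AS x) ORDER BY t.c),
         '[]'::json
       )
FROM t;
```

The column list is written once, and the ordering stays inside the aggregate, so it does not rely on the row order of a subquery.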

PostgreSQL get any value from jsonb object

I want to get the value of either key 'a' or 'b' if either one exists. If neither exists, I want the value of any key in the map.
Example:
'{"a": "aaa", "b": "bbbb", "c": "cccc"}' should return aaa.
'{"b": "bbbb", "c": "cccc"}' should return bbbb.
'{"c": "cccc"}' should return cccc.
Currently I'm doing it like this:
SELECT COALESCE(o ->> 'a', o ->> 'b', o->> 'c') FROM...
The problem is that I don't really want to name key 'c' explicitly since there are objects that can have any key.
So how do I achieve the desired effect of "Get value of either 'a' or 'b' if either exists. If neither exists, grab anything that exists."?
I am using postgres 9.6.
Maybe too long:
with c(j) as (values('{"a": "aaa", "b": "bbbb", "c": "cccc"}'::jsonb))
, m as (select j,jsonb_object_keys(j) k from c)
, f as (select * from m where k not in ('a','b') limit 1)
select COALESCE(j ->> 'a', j ->> 'b', j->>k) from f;
coalesce
----------
aaa
(1 row)
And with no a, b keys:
with c(j) as (values('{"a1": "aaa", "b1": "bbbb", "c": "cccc"}'::jsonb))
, m as (select j,jsonb_object_keys(j) k from c)
, f as (select * from m where k not in ('a','b') limit 1)
select COALESCE(j ->> 'a', j ->> 'b', j->>k) from f;
coalesce
----------
cccc
(1 row)
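The idea is to extract all keys with jsonb_object_keys, take an arbitrary one that is not 'a' or 'b' (limit 1 with no order by, so which one you get is not defined), and use it as the last argument of coalesce. Note that if the object has no key other than 'a' and 'b', f is empty and the query returns no row.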
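A more compact variant, as a sketch only: put the fallback in a scalar subquery over jsonb_each_text, so each input row still produces a result even when the object contains only 'a' or 'b'. The CTE c with the three sample objects from the question and the aliases e and picked are assumptions for illustration.

```sql
WITH c(j) AS (
  VALUES ('{"a": "aaa", "b": "bbbb", "c": "cccc"}'::jsonb),
         ('{"b": "bbbb", "c": "cccc"}'::jsonb),
         ('{"c": "cccc"}'::jsonb)
)
SELECT j,
       COALESCE(
         j ->> 'a',
         j ->> 'b',
         -- Fallback: the value of an arbitrary key (no ORDER BY, so not deterministic)
         (SELECT e.value FROM jsonb_each_text(j) AS e LIMIT 1)
       ) AS picked
FROM c;
```

For the three sample objects this picks aaa, bbbb and cccc respectively. jsonb_each_text is available in 9.6, so this should work on the version mentioned in the question.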