PostgreSQL get any value from jsonb object - sql

I want to get the value of either key 'a' or 'b' if either one exists. If neither exists, I want the value of any key in the map.
Example:
'{"a": "aaa", "b": "bbbb", "c": "cccc"}' should return aaa.
'{"b": "bbbb", "c": "cccc"}' should return bbbb.
'{"c": "cccc"}' should return cccc.
Currently I'm doing it like this:
SELECT COALESCE(o ->> 'a', o ->> 'b', o->> 'c') FROM...
The problem is that I don't really want to name key 'c' explicitly since there are objects that can have any key.
So how do I achieve the desired effect of "Get value of either 'a' or 'b' if either exists. If neither exists, grab anything that exists."?
I am using postgres 9.6.

maybe too long:
t=# with c(j) as (values('{"a": "aaa", "b": "bbbb", "c": "cccc"}'::jsonb))
, m as (select j,jsonb_object_keys(j) k from c)
, f as (select * from m where k not in ('a','b') limit 1)
t-# select COALESCE(j ->> 'a', j ->> 'b', j->>k) from f;
coalesce
----------
aaa
(1 row)
and with no a,b keys:
t=# with c(j) as (values('{"a1": "aaa", "b1": "bbbb", "c": "cccc"}'::jsonb))
, m as (select j,jsonb_object_keys(j) k from c)
, f as (select * from m where k not in ('a','b') limit 1)
t-# select COALESCE(j ->> 'a', j ->> 'b', j->>k) from f;
coalesce
----------
cccc
(1 row)
The idea is to extract all keys with jsonb_object_keys, take the first "random" one (random because there is no ORDER BY; LIMIT 1), and then use it as the last COALESCE argument.
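A shorter variant of the same idea (a sketch; the table and column names `t`/`o` are assumptions, not from the question): enumerate every key/value pair with jsonb_each_text and let ORDER BY express the preference for 'a', then 'b', then anything:

```sql
-- Prefer 'a', then 'b', then any other key.
-- Booleans sort with TRUE first under DESC, so matching keys win.
SELECT (
  SELECT v.value
  FROM jsonb_each_text(t.o) AS v(key, value)
  ORDER BY (v.key = 'a') DESC, (v.key = 'b') DESC
  LIMIT 1
)
FROM t;
```

This works on 9.6 and keeps the "grab anything that exists" fallback without naming 'c'.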

Related

Aggregate jsonb map with string array as value in postgresql

I have a postgresql table with a jsonb column containing maps with strings as keys and string arrays as values. I want to aggregate all the maps into a single jsonb map. There should be no duplicate values in any string array. How can I do this in Postgres?
Eg:
Input: {"a": ["1", "2"]}, {"a": ["2", "3"], "b": ["5"]}
Output: {"a": ["1", "2", "3"], "b": ["5"]}
I tried the '||' operator, but it overwrites the value if the same key exists in both maps.
Eg:
Input: SELECT '{"a": ["1", "2"]}'::jsonb || '{"a": ["3"], "b": ["5"]}'::jsonb;
Output: {"a": ["3"], "b": ["5"]}
Using jsonb_object_agg with a series of cross joins:
select jsonb_object_agg(t.key, t.a) from (
  select v.key, jsonb_agg(distinct v1.value) a
  from objects o
  cross join jsonb_each(o.tags) v
  cross join jsonb_array_elements(v.value) v1
  group by v.key) t
See fiddle.
You can use the jsonb_object_agg aggregate function, which takes a set of key-value pairs and returns a JSONB object. To avoid nested arrays and duplicates, first unnest each array with jsonb_array_elements, then re-aggregate the distinct elements per key. Here is an example query:
SELECT jsonb_object_agg(key, value)
FROM (
  SELECT key, jsonb_agg(DISTINCT elem) AS value
  FROM (
    SELECT 'a' AS key, '["1", "2"]'::jsonb AS arr
    UNION ALL
    SELECT 'a', '["3"]'::jsonb
    UNION ALL
    SELECT 'b', '["5"]'::jsonb
  ) maps
  CROSS JOIN jsonb_array_elements(arr) AS elem
  GROUP BY key
) agg;
This will give you the following result:
{"a": ["1", "2", "3"], "b": ["5"]}

Key value table to json in BigQuery

Hey all,
I have a table that looks like this:
row  key  val
1    a    100
2    b    200
3    c    "apple"
4    d    {}
I want to convert it into JSON:
{
"a": 100,
"b": 200,
"c": "apple",
"d": {}
}
Note: the number of lines can change so this is only an example
Thx in advance!
With string manipulation,
WITH sample_table AS (
  SELECT 'a' key, '100' value UNION ALL
  SELECT 'b', '200' UNION ALL
  SELECT 'c', '"apple"' UNION ALL
  SELECT 'd', '{}'
)
SELECT '{' || STRING_AGG(FORMAT('"%s": %s', key, value)) || '}' json
FROM sample_table;
This gives a result matching your expected output.
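If your project has the BigQuery JSON type available, the string assembly can possibly be avoided with JSON_OBJECT's two-array form (a sketch I have not run against this data; note the matching ORDER BY in both aggregates, which keeps keys and values aligned):

```sql
SELECT JSON_OBJECT(
  ARRAY_AGG(key ORDER BY key),
  ARRAY_AGG(PARSE_JSON(value) ORDER BY key)
) AS json
FROM sample_table;
```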

How to query on fields from nested records without referring to the parent records in BigQuery?

I have data structured as follows:
{
  "results": {
    "A": {"first": 1, "second": 2, "third": 3},
    "B": {"first": 4, "second": 5, "third": 6},
    "C": {"first": 7, "second": 8, "third": 9},
    "D": {"first": 1, "second": 2, "third": 3},
    ... },
  ...
}
i.e. nested records, where the lowest level has the same schema for all records in the level above. The schema would be similar to this:
results RECORD NULLABLE
results.A RECORD NULLABLE
results.A.first INTEGER NULLABLE
results.A.second INTEGER NULLABLE
results.A.third INTEGER NULLABLE
results.B RECORD NULLABLE
results.B.first INTEGER NULLABLE
...
Is there a way to do (e.g. aggregate) queries in BigQuery on fields from the lowest level, without knowledge of the keys on the (direct) parent level? Put differently, can I do a query on first for all records in results without having to specify A, B, ... in my query?
I would for example want to achieve something like
SELECT SUM(results.*.first) FROM table
in order to get 1+4+7+1 = 13,
but SELECT results.*.first isn't supported.
(I've tried playing around with STRUCTs, but haven't gotten far.)
The trick below is for BigQuery Standard SQL:
#standardSQL
SELECT id, (
  SELECT AS STRUCT
    SUM(first) AS sum_first,
    SUM(second) AS sum_second,
    SUM(third) AS sum_third
  FROM UNNEST([a]||[b]||[c]||[d])
).*
FROM `project.dataset.table`,
UNNEST([results])
You can test and play with the above using the dummy/sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 AS id, STRUCT(
    STRUCT(1 AS first, 2 AS second, 3 AS third) AS A,
    STRUCT(4 AS first, 5 AS second, 6 AS third) AS B,
    STRUCT(7 AS first, 8 AS second, 9 AS third) AS C,
    STRUCT(1 AS first, 2 AS second, 3 AS third) AS D
  ) AS results
)
SELECT id, (
  SELECT AS STRUCT
    SUM(first) AS sum_first,
    SUM(second) AS sum_second,
    SUM(third) AS sum_third
  FROM UNNEST([a]||[b]||[c]||[d])
).*
FROM `project.dataset.table`,
UNNEST([results])
with output
Row  id  sum_first  sum_second  sum_third
1    1   13         17         21
Is there a way to do (e.g. aggregate) queries in BigQuery on fields from the lowest level, without knowledge of the keys on the (direct) parent level?
Below is for BigQuery Standard SQL and totally avoids referencing parent records (A, B, C, D, etc.)
#standardSQL
CREATE TEMP FUNCTION Nested_SUM(entries ANY TYPE, field_name STRING) AS ((
  SELECT SUM(CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64))
  FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(entries), r'":{(.*?)}')) entry,
    UNNEST(SPLIT(entry)) kv
  WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') = field_name
));
SELECT id,
  Nested_SUM(results, 'first') AS first_sum,
  Nested_SUM(results, 'second') AS second_sum,
  Nested_SUM(results, 'third') AS third_sum,
  Nested_SUM(results, 'forth') AS forth_sum
FROM `project.dataset.table`
Applied to the sample data from your question, as in the example below:
#standardSQL
CREATE TEMP FUNCTION Nested_SUM(entries ANY TYPE, field_name STRING) AS ((
  SELECT SUM(CAST(SPLIT(kv, ':')[OFFSET(1)] AS INT64))
  FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(entries), r'":{(.*?)}')) entry,
    UNNEST(SPLIT(entry)) kv
  WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') = field_name
));
WITH `project.dataset.table` AS (
  SELECT 1 AS id, STRUCT(
    STRUCT(1 AS first, 2 AS second, 3 AS third) AS A,
    STRUCT(4 AS first, 5 AS second, 6 AS third) AS B,
    STRUCT(7 AS first, 8 AS second, 9 AS third) AS C,
    STRUCT(1 AS first, 2 AS second, 3 AS third) AS D
  ) AS results
)
SELECT id,
  Nested_SUM(results, 'first') AS first_sum,
  Nested_SUM(results, 'second') AS second_sum,
  Nested_SUM(results, 'third') AS third_sum,
  Nested_SUM(results, 'forth') AS forth_sum
FROM `project.dataset.table`
output is
Row  id  first_sum  second_sum  third_sum  forth_sum
1    1   13         17         21         null
I adapted Mikhail's answer in order to support grouping on the values of the lowest-level fields:
#standardSQL
CREATE TEMP FUNCTION Nested_AGGREGATE(entries ANY TYPE, field_name STRING) AS ((
  SELECT ARRAY(
    SELECT AS STRUCT
      TRIM(SPLIT(kv, ':')[OFFSET(1)], '"') AS value,
      COUNT(SPLIT(kv, ':')[OFFSET(1)]) AS count
    FROM UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(entries), r'":{(.*?)}')) entry,
      UNNEST(SPLIT(entry)) kv
    WHERE TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') = field_name
    GROUP BY TRIM(SPLIT(kv, ':')[OFFSET(1)], '"')
  )
));
SELECT id,
  Nested_AGGREGATE(results, 'first') AS first_agg,
  Nested_AGGREGATE(results, 'second') AS second_agg,
  Nested_AGGREGATE(results, 'third') AS third_agg
FROM `project.dataset.table`
Output for WITH `project.dataset.table` AS (SELECT 1 AS id, STRUCT( STRUCT(1 AS first, 2 AS second, 3 AS third) AS A, STRUCT(4 AS first, 5 AS second, 6 AS third) AS B, STRUCT(7 AS first, 8 AS second, 9 AS third) AS C, STRUCT(1 AS first, 2 AS second, 3 AS third) AS D) AS results ):
Row  id  first_agg.value  first_agg.count  second_agg.value  second_agg.count  third_agg.value  third_agg.count
1    1   1                2                2                 2                 3                2
         4                1                5                 1                 6                1
         7                1                8                 1                 9                1

How to use a subset of the row columns when converting to JSON?

I have a table t with some columns a, b and c. I use the following query to convert rows to a JSON array of objects:
SELECT COALESCE(JSON_AGG(t ORDER BY c), '[]'::json)
FROM t
This returns as expected:
[
  {
    "a": ...,
    "b": ...,
    "c": ...
  },
  {
    "a": ...,
    "b": ...,
    "c": ...
  }
]
Now I want the same result, but with only columns a and b in the output. I will still use column c for ordering. The best I came up with is as following:
SELECT COALESCE(JSON_AGG(JSON_BUILD_OBJECT('a', a, 'b', b) ORDER BY c), '[]'::json)
FROM t
[
  {
    "a": ...,
    "b": ...
  },
  {
    "a": ...,
    "b": ...
  }
]
Although this works fine, I am wondering if there is a more elegant way to do this. It frustrates me that I have to manually define the JSON properties. Of course, I understand that I have to enumerate the columns a and b, but it's odd that I have to copy/paste the corresponding JSON property name, which is exactly the same as the column name anyway.
Is there a another way to do this?
You can use row_to_json instead of manually building the object:
CREATE TABLE foobar (a text, b text, c text);
INSERT INTO foobar VALUES
('1', 'LOREM', 'A'),
('2', 'LOREM', 'B'),
('3', 'LOREM', 'C');
--Using CTE
WITH tmp AS (
  SELECT a, b FROM foobar ORDER BY c
)
SELECT json_agg(row_to_json(t)) FROM tmp t

--Using subquery
SELECT json_agg(row_to_json(t))
FROM (SELECT a, b FROM foobar ORDER BY c) t;
EDIT: As you stated, result order is a strict requirement. In this case you could use a row constructor to build the json object:
--Using a type to build json with desired keys
CREATE TYPE mytype AS (a text, b text);
SELECT json_agg(
    to_json(CAST(ROW(a, b) AS mytype))
    ORDER BY c)
FROM foobar;

--Ignoring column names...
SELECT json_agg(
    to_json(ROW(a, b))
    ORDER BY c)
FROM foobar;
SQL Fiddle here.
Perform the ordering in a subquery or CTE and then apply json_agg:
SELECT COALESCE(JSON_AGG(t2), '[]'::json)
FROM (SELECT a, b FROM t ORDER BY c) t2
Alternatively, use jsonb: the jsonb type allows deleting items by key.
SELECT coalesce(jsonb_agg(row_to_json(t)::jsonb - 'c'
order by c), '[]'::jsonb)
FROM t
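On Postgres 9.5+ the row_to_json/cast step can be skipped with to_jsonb, which accepts the row directly (same semantics, slightly shorter; a sketch against the same table t):

```sql
-- Build jsonb per row, drop key 'c', aggregate in order of c
SELECT coalesce(jsonb_agg(to_jsonb(t) - 'c' ORDER BY c), '[]'::jsonb)
FROM t;
```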

Idiomatic equivalent to map structure

My analytics involves aggregating rows and storing, for a field someField, the number of occurrences of each of its values across all the rows.
Sample data structure
[someField, someKey]
I'm trying to GROUP BY someKey and then, for each of the results, be able to know how many times each someField value occurred.
Example:
[someField: a, someKey: 1],
[someField: a, someKey: 1],
[someField: b, someKey: 1],
[someField: c, someKey: 2],
[someField: d, someKey: 2]
What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
Does it work for you?
WITH data AS (
  SELECT 'a' someField, 1 someKey UNION ALL
  SELECT 'a', 1 UNION ALL
  SELECT 'b', 1 UNION ALL
  SELECT 'c', 2 UNION ALL
  SELECT 'd', 2)
SELECT
  someKey,
  ARRAY_AGG(STRUCT(someField, freq)) fields
FROM (
  SELECT
    someField,
    someKey,
    COUNT(someField) freq
  FROM data
  GROUP BY 1, 2
)
GROUP BY 1
Results:
It won't give exactly the format you are looking for, but it should serve the same queries your desired result would. As you said, for each key you can retrieve how many times (column freq) each someField value occurred.
I've been looking for a way to aggregate structs and couldn't find one, but retrieving the results as an ARRAY of STRUCTs turned out to be quite straightforward.
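If an actual map-shaped result like {"a": 2, "b": 1} per key is needed, the STRING_AGG/FORMAT trick from the BigQuery key-value answer earlier in this page can be layered on top of the same inner query (a sketch, untested):

```sql
-- Assemble a JSON-looking string per someKey from the counted pairs
SELECT
  someKey,
  '{' || STRING_AGG(FORMAT('"%s": %d', someField, freq), ', ') || '}' AS fields
FROM (
  SELECT someField, someKey, COUNT(someField) freq
  FROM data
  GROUP BY 1, 2
)
GROUP BY someKey
```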
There's probably a smarter way to do this (and to get it in the format you want, e.g. using an ARRAY for the second column), but this might be enough for you:
with sample as (
  select 'a' as someField, 1 as someKey UNION ALL
  select 'a' as someField, 1 as someKey UNION ALL
  select 'b' as someField, 1 as someKey UNION ALL
  select 'c' as someField, 2 as someKey UNION ALL
  select 'd' as someField, 2 as someKey)
SELECT
  someKey,
  SUM(IF(someField = 'a', 1, 0)) AS a,
  SUM(IF(someField = 'b', 1, 0)) AS b,
  SUM(IF(someField = 'c', 1, 0)) AS c,
  SUM(IF(someField = 'd', 1, 0)) AS d
FROM sample
GROUP BY someKey
ORDER BY someKey ASC
Results:
someKey  a  b  c  d
-------------------
1        2  1  0  0
2        0  0  1  1
This is a well-used technique in BigQuery (see here).
I'm trying to GROUP BY someKey and then be able to know for each of the results how many time there was each someField values
#standardSQL
SELECT
someKey,
someField,
COUNT(someField) freq
FROM yourTable
GROUP BY 1, 2
-- ORDER BY someKey, someField
What I would like to achieve:
[someKey: 1, fields: {a: 2, b: 1}],
[someKey: 2, fields: {c: 1, d: 1}],
This is different from what you expressed in words: it is called pivoting, and based on your comment ("the a, b, c, and d keys are potentially infinite") it is most likely not what you need. That said, pivoting is easily doable too (if you have some finite number of field values), and you can find plenty of related posts.