I have a jsonb column on my DB called reactions which has the structure like below.
[
  {
    "id": "1234",
    "count": 1
  },
  {
    "id": "2345",
    "count": 1
  }
]
The field holds an array of objects, each with an id and a count field. I'm trying to find which object in the reactions field has the highest count across the DB. Basically, I'd like to sum the counts for each id and find the max.
I've figured out how to sum up all of the reaction counts, but I'm getting stuck on grouping by the id and finding the sum for each individual id.
SELECT SUM((x->>'count')::integer) FROM (SELECT id, reactions FROM messages) as m
CROSS JOIN LATERAL jsonb_array_elements(m.reactions) AS x
Ideally I'd end up with something like this:
id | sum
-----------
1234 | 100
2345 | 70
5678 | 50
The messages table looks something like this
id | user | reactions
------------------------
1 | 3456 | jsonb
2 | 8573 | jsonb
The calculation takes a few transformation steps.
Flatten the jsonb column from an array into individual jsonb objects using the jsonb_array_elements function:
postgres=# select jsonb_array_elements(reactions)::jsonb as data from messages;
data
----------------------------
{"id": "1234", "count": 1}
{"id": "2345", "count": 1}
{"id": "1234", "count": 1}
{"id": "2345", "count": 1}
...
Populate each jsonb object into separate columns with the jsonb_populate_record function:
postgres=# create table data(id text ,count int);
CREATE TABLE
postgres=# select r.* from (select jsonb_array_elements(reactions)::jsonb as data from messages) as tmp, jsonb_populate_record(NULL::data, data) r;
id | count
------+-------
1234 | 1
2345 | 1
1234 | 1
2345 | 1
...
Do the sum with GROUP BY:
postgres=# select r.id, sum(r.count) from (select jsonb_array_elements(reactions)::jsonb as data from messages) as tmp, jsonb_populate_record(NULL::data, data) r group by r.id;
id | sum
------+-----
2345 | 2
1234 | 2
...
The above steps should get you there.
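The flatten → populate → group-and-sum pipeline above can be sketched in plain Python over data shaped like the question's reactions column (illustrative values only):

```python
from collections import defaultdict

# Sample rows mimicking the messages table's reactions column
# (illustrative values, not real data).
messages = [
    [{"id": "1234", "count": 1}, {"id": "2345", "count": 1}],
    [{"id": "1234", "count": 1}, {"id": "2345", "count": 1}],
]

totals = defaultdict(int)
for reactions in messages:
    # jsonb_array_elements step: unnest the array into individual objects
    for reaction in reactions:
        # GROUP BY id, SUM(count) step
        totals[reaction["id"]] += reaction["count"]

print(dict(totals))  # {'1234': 2, '2345': 2}
```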
You can use the query below to convert the jsonb array into standard rows with jsonb_to_recordset.
see https://dba.stackexchange.com/questions/203250/getting-specific-key-values-from-jsonb-into-columns
select "id", sum("count")
from messages
left join lateral jsonb_to_recordset(reactions) x ("id" text, "count" int) on true
group by "id" order by 1;
I have a table that looks like this:
| user | attribute | value |
|--------|-------------|---------|
| 1 | A | 10 |
| 1 | A | 20 |
| 1 | B | 5 |
| 2 | B | 10 |
| 2 | B | 15 |
| 2 | C | 100 |
| 2 | C | 200 |
I'd like to group this table by user and collect the sum of the value field into a JSON or a MAP with attributes as keys, like:
| user | sum_values_by_attribute |
|------|--------------------------|
| 1 | {"A": 30, "B": 15} |
| 2 | {"B": 25, "C": 300} |
Is there a way to do that in Hive?
I've found related questions, but none of them consider the case of a summation over the values.
A JSON string corresponding to map<string, int> can be built in Hive using native functions only: aggregate by user and attribute, build "key": value pairs, collect them into an array, concatenate the array using concat_ws, and add curly braces.
Demo:
with initial_data as (
select stack(7,
1,'A',40,
1,'A',20,
1,'B',5,
2,'B',10,
2,'B',15,
2,'C',100,
2,'C',200) as (`user`, attribute, value )
)
select `user`, concat('{',concat_ws(',',collect_set(concat('"', attribute, '": ',sum_value))), '}') as sum_values_by_attribute
from
(--aggregate groupby user, attribute
select `user`, attribute, sum(value) as sum_value from initial_data group by `user`, attribute
)s
group by `user`;
Result ( JSON string ):
user sum_values_by_attribute
1 {"A": 60,"B": 5}
2 {"B": 25,"C": 300}
Note: if you are running this on Spark, you can cast the result as map<string, int>; Hive does not support casting to complex types.
A map<string, string> can also be built easily using native functions only: build the same array of key-value pairs but without double quotes (like A:10), concatenate it into a comma-delimited string using concat_ws, and convert it to a map with the str_to_map function (the same WITH CTE is skipped here):
select `user`, str_to_map(concat_ws(',',collect_set(concat(attribute, ':',sum_value)))) as sum_values_by_attribute
from
(--aggregate groupby user, attribute
select `user`, attribute, sum(value) as sum_value from initial_data group by `user`, attribute
)s
group by `user`;
Result ( map<string, string> ):
user sum_values_by_attribute
1 {"A":"60","B":"5"}
2 {"B":"25","C":"300"}
And if you need map<string, int>, unfortunately it cannot be done with Hive native functions only, because str_to_map returns map<string, string>, not map<string, int>. You can try the brickhouse collect function:
add jar '~/brickhouse/target/brickhouse-0.6.0.jar'; --check brickhouse site https://github.com/klout/brickhouse for instructions
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select `user`, collect(attribute, sum_value) as sum_values_by_attribute
from
(--aggregate groupby user, attribute
select `user`, attribute, sum(value) as sum_value from initial_data group by `user`, attribute
)s
group by `user`;
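The two-level aggregation all of these Hive queries perform (sum by user and attribute, then collect the sums into one map per user) can be sketched in Python using the demo values:

```python
from collections import defaultdict

# Rows from the stack() demo above.
rows = [
    (1, 'A', 40), (1, 'A', 20), (1, 'B', 5),
    (2, 'B', 10), (2, 'B', 15), (2, 'C', 100), (2, 'C', 200),
]

# Inner query: group by (user, attribute) and sum the values.
by_user_attr = defaultdict(int)
for user, attr, value in rows:
    by_user_attr[(user, attr)] += value

# Outer query: collect the per-attribute sums into one map per user.
result = defaultdict(dict)
for (user, attr), total in sorted(by_user_attr.items()):
    result[user][attr] = total

print(dict(result))  # {1: {'A': 60, 'B': 5}, 2: {'B': 25, 'C': 300}}
```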
You can first calculate the sum by attribute and user, and then use collect_list.
Please let me know if the output below is fine.
SQL below:
select `user`,
collect_list(concat(att,":",cast(val as string))) sum_values_by_attribute
from
(select `user`,`attribute` att, sum(`value`) val from tmp2 group by `user`,`attribute`) tmp2
group by `user`;
Testing Query -
create table tmp2 ( `user` int, `attribute` string, `value` int);
insert into tmp2 select 1,'A',40;
insert into tmp2 select 1,'A',20;
insert into tmp2 select 1,'B',5;
insert into tmp2 select 2,'C',20;
insert into tmp2 select 1,'B',10;
insert into tmp2 select 2,'B',10;
insert into tmp2 select 2,'C',10;
select `user`,
collect_list(concat(att,":",cast(val as string))) sum_values_by_attribute
from
(select `user`,`attribute` att, sum(`value`) val from tmp2 group by `user`,`attribute`) tmp2
group by `user`;
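Using the test rows above, the logic of this query (sum per user and attribute, then collect_list of attr:sum strings) can be sketched in Python:

```python
from collections import defaultdict

# Rows from the testing inserts above.
rows = [
    (1, 'A', 40), (1, 'A', 20), (1, 'B', 5),
    (2, 'C', 20), (1, 'B', 10), (2, 'B', 10), (2, 'C', 10),
]

# Inner query: sum(`value`) grouped by user and attribute.
sums = defaultdict(int)
for user, attr, value in rows:
    sums[(user, attr)] += value

# Outer query: collect_list(concat(att, ":", val)) per user.
collected = defaultdict(list)
for (user, attr), total in sorted(sums.items()):
    collected[user].append(f"{attr}:{total}")

print(dict(collected))  # {1: ['A:60', 'B:15'], 2: ['B:10', 'C:30']}
```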
I'm using psql and I have a table that looks like this:
id | dashboard_settings
-----------------------
1 | {"query": {"year_end": 2018, "year_start": 2015, "category": ["123"]}}
There are numerous rows, but for every row the "category" value is an array with one integer (in string format).
Is there a way I can 'unpackage' the category object? So that it just has 123 as an integer?
I've tried this but had no success:
SELECT jsonb_extract_path_text(dashboard_settings->'query', 'category') from table
This returns:
jsonb_extract_path_text | ["123"]
when I want:
jsonb_extract_path_text | 123
You need to use the array element access operator, which is simply ->> followed by the array index:
select jsonb_extract_path(dashboard_settings->'query', 'category') ->> 0
from the_table
alternatively:
select dashboard_settings -> 'query' -> 'category' ->> 0
from the_table
Consider:
select dashboard_settings->'query'->'category'->>0 c from mytable
Demo on DB Fiddle:
| c |
| :-- |
| 123 |
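The path traversal dashboard_settings->'query'->'category'->>0 (plus a cast) corresponds to this plain-Python access, sketched with the sample document from the question:

```python
import json

dashboard_settings = json.loads(
    '{"query": {"year_end": 2018, "year_start": 2015, "category": ["123"]}}'
)

# ->'query'->'category' walks into the nested object;
# ->>0 pulls the first array element as text, which is then cast to int.
category = int(dashboard_settings["query"]["category"][0])
print(category)  # 123
```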
I want to check the existence of an attribute in a JSONB column using SQL.
Using this is I can check if attribute equals value:
SELECT count(*) AS "count" FROM "table" WHERE column->'node' #> '[{"Attribute":"value"}]'
What syntax do I use to check the existence of Attribute?
Usually you'll check for null:
SELECT count(*) AS "count" FROM "table"
WHERE column->'node'->'Attribute' is not null
The ? operator means "Does the string exist as a top-level key within the JSON value?". However, you want to check whether a key exists in a nested JSON array of objects, so you cannot use the operator directly. You have to unnest the arrays.
Sample data:
create table my_table(id serial primary key, json_column jsonb);
insert into my_table (json_column) values
('{"node": [{"Attribute":"value"}, {"other key": 0}]}'),
('{"node": [{"Attribute":"value", "other key": 0}]}'),
('{"node": [{"Not Attribute":"value"}]}');
Use jsonb_array_elements() in a lateral join to find out whether a key exists in any element of the array:
select
id,
value,
value ? 'Attribute' as key_exists_in_object
from my_table
cross join jsonb_array_elements(json_column->'node')
id | value | key_exists_in_object
----+----------------------------------------+----------------------
1 | {"Attribute": "value"} | t
1 | {"other key": 0} | f
2 | {"Attribute": "value", "other key": 0} | t
3 | {"Not Attribute": "value"} | f
(4 rows)
But this is not exactly what you are expecting. You need to aggregate results for arrays:
select
id,
json_column->'node' as array,
bool_or(value ? 'Attribute') as key_exists_in_array
from my_table
cross join jsonb_array_elements(json_column->'node')
group by id
order by id
id | array | key_exists_in_array
----+--------------------------------------------+---------------------
1 | [{"Attribute": "value"}, {"other key": 0}] | t
2 | [{"Attribute": "value", "other key": 0}] | t
3 | [{"Not Attribute": "value"}] | f
(3 rows)
Well, this looks a bit complex. You can make it simpler by wrapping the logic in a function:
create or replace function key_exists_in_array(key text, arr jsonb)
returns boolean language sql immutable as $$
select bool_or(value ? key)
from jsonb_array_elements(arr)
$$;
select
id,
json_column->'node' as array,
key_exists_in_array('Attribute', json_column->'node')
from my_table
id | array | key_exists_in_array
----+--------------------------------------------+---------------------
1 | [{"Attribute": "value"}, {"other key": 0}] | t
2 | [{"Attribute": "value", "other key": 0}] | t
3 | [{"Not Attribute": "value"}] | f
(3 rows)
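The bool_or(value ? 'Attribute') aggregation amounts to an "any element has this key" check, which can be sketched in Python over the sample rows:

```python
# Sample documents from the insert statements above.
rows = [
    {"node": [{"Attribute": "value"}, {"other key": 0}]},
    {"node": [{"Attribute": "value", "other key": 0}]},
    {"node": [{"Not Attribute": "value"}]},
]

# jsonb_array_elements + bool_or(value ? 'Attribute'):
# True if any object in the node array contains the key.
flags = [any("Attribute" in obj for obj in row["node"]) for row in rows]
print(flags)  # [True, True, False]
```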
Given the following data in the jsonb column p06 in the table ryzom_characters:
-[ RECORD 1 ]------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p06 | {
"id": 675010,
"cname": "Bob",
"rpjobs": [
{
"progress": 25
},
{
"progress": 13
},
{
"progress": 30
}
]
}
I am attempting to sum the value of progress. I have attempted the following:
SELECT
c.cname AS cname,
jsonb_array_elements(c.p06->'rpjobs')::jsonb->'progress' AS value
FROM ryzom_characters c
Where cid = 675010
ORDER BY value DESC
LIMIT 50;
Which correctly lists the values:
cname | value
--------+-------
Savisi | 30
Savisi | 25
Savisi | 13
(3 rows)
But now I would like to sum these values, which could be null.
How do I correctly sum an object field within an array?
Here is the table structure:
Table "public.ryzom_characters"
Column | Type | Collation | Nullable | Default
---------------+------------------------+-----------+----------+---------
cid | bigint | | |
cname | character varying(255) | | not null |
p06 | jsonb | | |
x01 | jsonb | | |
Use the function jsonb_array_elements() in a lateral join in the from clause:
select cname, sum(coalesce(value, '0')::int) as value
from (
select
p06->>'cname' as cname,
value->>'progress' as value
from ryzom_characters
cross join jsonb_array_elements(p06->'rpjobs')
where cid = 675010
) s
group by cname
order by value desc
limit 50;
You can use left join instead of cross join to protect the query against inconsistent data:
left join jsonb_array_elements(p06->'rpjobs')
on jsonb_typeof(p06->'rpjobs') = 'array'
where p06->'rpjobs' <> 'null'
The function jsonb_array_elements() is a set-returning function. You should therefore use it as a row source (in the FROM clause). After the call you have a table where every row contains an array element. From there on it is relatively easy.
SELECT cname,
sum(coalesce((r.prog->>'progress')::int, 0)) AS value
FROM ryzom_characters c,
jsonb_array_elements(c.p06->'rpjobs') r (prog)
WHERE c.cid = 675010
GROUP BY cname
ORDER BY value DESC
LIMIT 50;
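The coalesce-then-sum logic both answers use can be sketched in Python with the sample document (an empty object is added here to show the null handling):

```python
# Sample p06 document from the question, plus an empty rpjobs entry
# to exercise the coalesce/default-to-zero path.
p06 = {
    "id": 675010,
    "cname": "Bob",
    "rpjobs": [{"progress": 25}, {"progress": 13}, {"progress": 30}, {}],
}

# coalesce((value->>'progress')::int, 0), then sum over the unnested array;
# a missing progress key contributes 0 instead of breaking the sum.
total = sum(int(job.get("progress", 0)) for job in p06.get("rpjobs") or [])
print(total)  # 68
```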