PostgreSQL order by array overlap - sql

I want to sort a table by an array column: the record with the most overlapping elements should be on top. I already have a WHERE clause that filters the records by array overlap. Using the same array, I want to determine the number of overlapping elements for sorting. Do you have an idea of what the ORDER BY clause might look like?
My Table
SELECT * FROM "nodes"
+-------+--------------------------+
| name  | tags                     |
+-------+--------------------------+
| Max   | ["foo", "orange", "app"] |
| Peter | ["foo", "bar", "baz"]    |
| Maria | ["foo", "bar"]           |
| John  | ["apple"]                |
+-------+--------------------------+
Result with WHERE
SELECT * FROM "nodes" WHERE (tags && '{"foo", "bar", "baz"}')
+-------+--------------------------+
| name  | tags                     |
+-------+--------------------------+
| Max   | ["foo", "orange", "app"] |
| Peter | ["foo", "bar", "baz"]    |
| Maria | ["foo", "bar"]           |
+-------+--------------------------+
Result with ORDER BY
SELECT * FROM "nodes" WHERE (tags && '{"foo", "bar", "baz"}') ORDER BY ????
+-------+--------------------------+
| name  | tags                     |
+-------+--------------------------+
| Peter | ["foo", "bar", "baz"]    |
| Maria | ["foo", "bar"]           |
| Max   | ["foo", "orange", "app"] |
+-------+--------------------------+

The only thing I can think of is to create a function that computes the number of common elements:
create or replace function num_overlaps(p_one text[], p_other text[])
  returns bigint
as
$$
  select count(*)
  from (
    select * from unnest(p_one)
    intersect
    select * from unnest(p_other)
  ) x
$$
language sql
immutable;
Then use it in the order by clause:
SELECT *
FROM nodes
WHERE tags && '{"foo", "bar", "baz"}'
order by num_overlaps(tags, '{"foo", "bar", "baz"}') desc;
The drawback is that you need to repeat the list of tags you are testing for.
It's unclear to me whether those values are JSON arrays (that is the syntax in the sample data) or native Postgres arrays (the && operator doesn't work on JSON arrays). If you are using jsonb, you can replace unnest() with jsonb_array_elements_text().
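To avoid repeating the tag list, one option is to supply the search array once through a CTE and reference it in both clauses. A sketch, assuming native text[] tags; the names search and s_tags are made up for the example:

```sql
-- sketch: provide the search array once so the WHERE and ORDER BY
-- clauses both reference the same value
with search (s_tags) as (
    values ('{"foo", "bar", "baz"}'::text[])
)
select n.*
from nodes n
cross join search
where n.tags && search.s_tags
order by num_overlaps(n.tags, search.s_tags) desc;
```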

First of all, both sides of the && operator need to be arrays: use STRING_TO_ARRAY(translate(tags::text, '[] "', ''), ',')::text[] instead of tags, and STRING_TO_ARRAY('foo,bar,baz', ',') instead of the '{"foo", "bar", "baz"}' pattern.
Then you can unnest the elements of the tags column with the JSON_ARRAY_ELEMENTS() function and count how many elements of the returned value column occur within the '{"foo", "bar", "baz"}' pattern, using the STRPOS() and SIGN() functions together with SUM() aggregation:
SELECT name, tags::text
FROM "nodes"
CROSS JOIN JSON_ARRAY_ELEMENTS(tags) AS js
WHERE ( STRING_TO_ARRAY(translate(tags::text, '[] "', ''), ',')::text[]
&& STRING_TO_ARRAY('foo,bar,baz',','))
GROUP BY name, tags::text
ORDER BY SUM( SIGN( STRPOS('{"foo", "bar", "baz"}'::text,value::text) ) ) DESC
However, you may have repeated elements within the tags column, in which case the above query fails. So I suggest using the query below, where duplicate rows are eliminated by the DISTINCT keyword:
SELECT name, tags
FROM
(
SELECT DISTINCT name, tags::text, STRPOS('{"foo", "bar", "baz"}'::text,value::text)
FROM "nodes"
CROSS JOIN JSON_ARRAY_ELEMENTS(tags) AS js
WHERE ( STRING_TO_ARRAY(translate(tags::text, '[] "', ''), ',')::text[]
&& STRING_TO_ARRAY('foo,bar,baz',','))
) n
GROUP BY name, tags::text
ORDER BY SUM( SIGN( strpos ) ) DESC
Demo

Related

Query Sum of jsonb column of objects

I have a jsonb column on my DB called reactions which has the structure like below.
[
{
"id": "1234",
"count": 1
},
{
"id": "2345",
"count": 1
}
]
The field holds an array of objects, each with a count and id field. I'm trying to find which object in the reactions field has the highest count across the DB. Basically, I'd like to sum the counts for each id and find the max.
I've figured out how to sum up all of the reaction counts, but I'm getting stuck grouping it by the ID, and finding the sum for each individual id.
SELECT SUM((x->>'count')::integer) FROM (SELECT id, reactions FROM messages) as m
CROSS JOIN LATERAL jsonb_array_elements(m.reactions) AS x
Ideally I'd end up with something like this:
id   | sum
-----+-----
1234 | 100
2345 | 70
5678 | 50
The messages table looks something like this:
id | user | reactions
---+------+----------
1  | 3456 | jsonb
2  | 8573 | jsonb
The calculation takes a few transformation steps.
Flatten the jsonb column from an array into individual jsonb objects using the jsonb_array_elements function:
postgres=# select jsonb_array_elements(reactions)::jsonb as data from messages;
data
----------------------------
{"id": "1234", "count": 1}
{"id": "2345", "count": 1}
{"id": "1234", "count": 1}
{"id": "2345", "count": 1}
...
Populate each jsonb object into separate columns with the jsonb_populate_record function:
postgres=# create table data(id text ,count int);
CREATE TABLE
postgres=# select r.* from (select jsonb_array_elements(reactions)::jsonb as data from messages) as tmp, jsonb_populate_record(NULL::data, data) r;
id | count
------+-------
1234 | 1
2345 | 1
1234 | 1
2345 | 1
...
Do the sum with GROUP BY:
postgres=# select r.id, sum(r.count) from (select jsonb_array_elements(reactions)::jsonb as data from messages) as tmp, jsonb_populate_record(NULL::data, data) r group by r.id;
id | sum
------+-----
2345 | 2
1234 | 2
...
The above steps should do it.
You can use the query below to convert the jsonb array to standard rows.
see https://dba.stackexchange.com/questions/203250/getting-specific-key-values-from-jsonb-into-columns
select x."id", sum(x."count")
from messages
left join lateral jsonb_to_recordset(reactions) x ("id" text, "count" int) on true
group by x."id" order by 1;
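Since the question asks for the object with the highest total, a small variation of this approach ranks the reactions by their summed count. A sketch; the x qualifier avoids ambiguity with messages.id:

```sql
-- sketch: order by the aggregated count so the most-used reaction comes first
select x."id", sum(x."count") as total
from messages
left join lateral jsonb_to_recordset(reactions) x ("id" text, "count" int) on true
group by x."id"
order by total desc
limit 1;  -- drop the limit to see all reactions ranked
```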

Hive: Aggregate values by attribute into a JSON or MAP field

I have a table that looks like this:
| user | attribute | value |
|------|-----------|-------|
| 1    | A         | 10    |
| 1    | A         | 20    |
| 1    | B         | 5     |
| 2    | B         | 10    |
| 2    | B         | 15    |
| 2    | C         | 100   |
| 2    | C         | 200   |
I'd like to group this table by user and collect the sum of the value field into a JSON or a MAP with attributes as keys, like:
| user | sum_values_by_attribute |
|------|-------------------------|
| 1    | {"A": 30, "B": 15}      |
| 2    | {"B": 25, "C": 300}     |
Is there a way to do that in Hive?
I've found related questions such as this and this but none consider the case of a summation over values.
A JSON string corresponding to map<string, int> can be built in Hive using native functions only: aggregate by user and attribute, concatenate "key": value pairs, collect them into an array, concatenate the array using concat_ws, and add curly braces.
Demo:
with initial_data as (
select stack(7,
1,'A',40,
1,'A',20,
1,'B',5,
2,'B',10,
2,'B',15,
2,'C',100,
2,'C',200) as (`user`, attribute, value )
)
select `user`, concat('{',concat_ws(',',collect_set(concat('"', attribute, '": ',sum_value))), '}') as sum_values_by_attribute
from
(--aggregate groupby user, attribute
select `user`, attribute, sum(value) as sum_value from initial_data group by `user`, attribute
)s
group by `user`;
Result ( JSON string ):
user sum_values_by_attribute
1 {"A": 60,"B": 5}
2 {"B": 25,"C": 300}
Note: if you are running this on Spark, you can cast the result (as map<string, int>); Hive does not support casting to complex types.
A map<string, string> can also be built easily using native functions only: build the same array of key-value pairs but without double quotes (like A:10), concatenate it into a comma-delimited string using concat_ws, and convert to a map using the str_to_map function (the same WITH CTE is skipped):
select `user`, str_to_map(concat_ws(',',collect_set(concat(attribute, ':',sum_value)))) as sum_values_by_attribute
from
(--aggregate groupby user, attribute
select `user`, attribute, sum(value) as sum_value from initial_data group by `user`, attribute
)s
group by `user`;
Result ( map<string, string> ):
user sum_values_by_attribute
1 {"A":"60","B":"5"}
2 {"B":"25","C":"300"}
And if you need map<string, int>, unfortunately it cannot be done using Hive native functions only, because str_to_map returns map<string, string>, not map<string, int>. You can try the brickhouse collect function:
add jar '~/brickhouse/target/brickhouse-0.6.0.jar'; --check brickhouse site https://github.com/klout/brickhouse for instructions
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select `user`, collect(attribute, sum_value) as sum_values_by_attribute
from
(--aggregate groupby user, attribute
select `user`, attribute, sum(value) as sum_value from initial_data group by `user`, attribute
)s
group by `user`;
You can first calculate the sum by attribute and user and then use collect_list.
Please let me know if the output below is fine.
SQL below:
select `user`,
collect_list(concat(att,":",cast(val as string))) sum_values_by_attribute
from
(select `user`, `attribute` att, sum(`value`) val from tmp2 group by `user`, `attribute`) tmp2
group by `user`;
Testing query:
create table tmp2 ( `user` int, `attribute` string, `value` int);
insert into tmp2 select 1,'A',40;
insert into tmp2 select 1,'A',20;
insert into tmp2 select 1,'B',5;
insert into tmp2 select 2,'C',20;
insert into tmp2 select 1,'B',10;
insert into tmp2 select 2,'B',10;
insert into tmp2 select 2,'C',10;
select `user`,
collect_list(concat(att,":",cast(val as string))) sum_values_by_attribute
from
(select `user`, `attribute` att, sum(`value`) val from tmp2 group by `user`, `attribute`) tmp2
group by `user`;

Unnest json string array

I'm using psql and I have a table that looks like this:
id | dashboard_settings
-----------------------
1 | {"query": {"year_end": 2018, "year_start": 2015, "category": ["123"]}}
There are numerous rows, but for every row the "category" value is an array with one integer (in string format).
Is there a way I can 'unpackage' the category object? So that it just has 123 as an integer?
I've tried this but had no success:
SELECT jsonb_extract_path_text(dashboard_settings->'query', 'category') from table
This returns:
jsonb_extract_path_text | ["123"]
when I want:
jsonb_extract_path_text | 123
You need to use the array access operator, which is simply ->> followed by the array index:
select jsonb_extract_path(dashboard_settings->'query', 'category') ->> 0
from the_table
alternatively:
select dashboard_settings -> 'query' -> 'category' ->> 0
from the_table
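Since the goal is an actual integer 123 rather than the text '123', you can cast the extracted value. A sketch, assuming the array always holds a numeric string:

```sql
-- ->> 0 extracts the first element as text; the cast turns it into an int
select (dashboard_settings -> 'query' -> 'category' ->> 0)::int as category
from the_table;
```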
Consider:
select dashboard_settings->'query'->'category'->>0 c from mytable
Demo on DB Fiddle:
| c |
| :-- |
| 123 |

JSONB nested array query - check existence of attribute

I want to check the existence of an attribute in a JSONB column using SQL.
Using this is I can check if attribute equals value:
SELECT count(*) AS "count" FROM "table" WHERE column->'node' #> '[{"Attribute":"value"}]'
What syntax do I use to check the existence of Attribute?
Usually you'll check for null:
SELECT count(*) AS "count" FROM "table"
WHERE column->'node'->'Attribute' is not null
The ? operator means "Does the string exist as a top-level key within the JSON value?" However, you want to check whether a key exists in a nested JSON array of objects, so you cannot use the operator directly. You have to unnest the arrays.
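A quick illustration of why ? alone is not enough here (hypothetical literals):

```sql
select '{"node": [{"Attribute": "value"}]}'::jsonb ? 'node';       -- true: top-level key
select '{"node": [{"Attribute": "value"}]}'::jsonb ? 'Attribute';  -- false: the key is nested
```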
Sample data:
create table my_table(id serial primary key, json_column jsonb);
insert into my_table (json_column) values
('{"node": [{"Attribute":"value"}, {"other key": 0}]}'),
('{"node": [{"Attribute":"value", "other key": 0}]}'),
('{"node": [{"Not Attribute":"value"}]}');
Use jsonb_array_elements() in a lateral join to find out whether a key exists in any element of the array:
select
id,
value,
value ? 'Attribute' as key_exists_in_object
from my_table
cross join jsonb_array_elements(json_column->'node')
id | value | key_exists_in_object
----+----------------------------------------+----------------------
1 | {"Attribute": "value"} | t
1 | {"other key": 0} | f
2 | {"Attribute": "value", "other key": 0} | t
3 | {"Not Attribute": "value"} | f
(4 rows)
But this is not exactly what you are expecting. You need to aggregate results for arrays:
select
id,
json_column->'node' as array,
bool_or(value ? 'Attribute') as key_exists_in_array
from my_table
cross join jsonb_array_elements(json_column->'node')
group by id
order by id
id | array | key_exists_in_array
----+--------------------------------------------+---------------------
1 | [{"Attribute": "value"}, {"other key": 0}] | t
2 | [{"Attribute": "value", "other key": 0}] | t
3 | [{"Not Attribute": "value"}] | f
(3 rows)
Well, this looks a bit complex. You can make it easier by wrapping the logic in a function:
create or replace function key_exists_in_array(key text, arr jsonb)
returns boolean language sql immutable as $$
select bool_or(value ? key)
from jsonb_array_elements(arr)
$$;
select
id,
json_column->'node' as array,
key_exists_in_array('Attribute', json_column->'node')
from my_table
id | array | key_exists_in_array
----+--------------------------------------------+---------------------
1 | [{"Attribute": "value"}, {"other key": 0}] | t
2 | [{"Attribute": "value", "other key": 0}] | t
3 | [{"Not Attribute": "value"}] | f
(3 rows)

How to sum a value in a JSONB array in Postgresql?

Given the following data in the jsonb column p06 in the table ryzom_characters:
-[ RECORD 1 ]------------------
p06 | {
"id": 675010,
"cname": "Bob",
"rpjobs": [
{
"progress": 25
},
{
"progress": 13
},
{
"progress": 30
}
]
}
I am attempting to sum the value of progress. I have attempted the following:
SELECT
c.cname AS cname,
jsonb_array_elements(c.p06->'rpjobs')::jsonb->'progress' AS value
FROM ryzom_characters c
Where cid = 675010
ORDER BY value DESC
LIMIT 50;
Which correctly lists the values:
cname | value
--------+-------
Savisi | 30
Savisi | 25
Savisi | 13
(3 rows)
But now I would like to sum these values, which could be null.
How do I correctly sum an object field within an array?
Here is the table structure:
Table "public.ryzom_characters"
Column | Type | Collation | Nullable | Default
---------------+------------------------+-----------+----------+---------
cid | bigint | | |
cname | character varying(255) | | not null |
p06 | jsonb | | |
x01 | jsonb | | |
Use the function jsonb_array_elements() in a lateral join in the from clause:
select cname, sum(coalesce(value, '0')::int) as value
from (
select
p06->>'cname' as cname,
value->>'progress' as value
from ryzom_characters
cross join jsonb_array_elements(p06->'rpjobs')
where cid = 675010
) s
group by cname
order by value desc
limit 50;
You can use left join instead of cross join to protect the query against inconsistent data:
left join jsonb_array_elements(p06->'rpjobs')
on jsonb_typeof(p06->'rpjobs') = 'array'
where p06->'rpjobs' <> 'null'
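Put together, the defensive version might look like this (a sketch based on the query above; the alias elem is made up for the example):

```sql
-- sketch: the left join keeps the row even when 'rpjobs' is missing
-- or is not an array, instead of dropping it from the result
select cname, sum(coalesce(value, '0')::int) as value
from (
    select
        p06->>'cname' as cname,
        elem->>'progress' as value
    from ryzom_characters
    left join jsonb_array_elements(p06->'rpjobs') as elem
        on jsonb_typeof(p06->'rpjobs') = 'array'
    where cid = 675010
) s
group by cname
order by value desc
limit 50;
```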
The function jsonb_array_elements() is a set-returning function. You should therefore use it as a row source (in the FROM clause). After the call you have a table where every row contains an array element. From there on it is relatively easy.
SELECT cname,
       sum(coalesce((r.prog->>'progress')::int, 0)) AS value
FROM ryzom_characters c,
     jsonb_array_elements(c.p06->'rpjobs') r (prog)
WHERE c.cid = 675010
GROUP BY cname
ORDER BY value DESC
LIMIT 50;