Remove duplicates from array_agg, where elements are also arrays - sql

In a Postgres 11 database, I've got two arrays in two views which are joined to MAIN table:
vw_ideas_role_direction_mappings.user_direction_id - array (like {101,103,} or {101,103} or {101,} or {101,,,,104})
vw_ideas_role_category_mappings.user_direction_id - array like previous.
DDL of view vw_ideas_role_category_mappings:
category_id - int8
user_direction_id - array
-- no constraints
DDL of view vw_ideas_role_direction_mappings:
direction_id - int8
user_direction_id - array
-- no constraints
DDL table idea:
id - bigserial
-- no constraints
And the following query, where I join everything:
SELECT i.id,
array_agg(dvircm.user_direction_id || dvirdm.user_direction_id) AS directions_id
FROM idea.ideas i
LEFT JOIN vw_ideas_role_direction_mappings dvirdm ON dvirdm.direction_id = i.direction_id
LEFT JOIN vw_ideas_role_category_mappings dvircm ON dvircm.category_id = i.category_id
GROUP BY i.id
So the arrays can contain NULLs and duplicates.
This query does not remove them. Furthermore, it throws the error "cannot accumulate arrays of different dimensionality". Maybe that's because of the commas (',') before or after the digits in the arrays? I build the array user_direction_id with this expression:
array_agg(distinct vw_user_data_all_roles.direction_id)
How can I get rid of the error and remove duplicates and NULLs after combining the two arrays?

I think this would do what you want:
SELECT i.id, sub.directions_id
FROM idea.ideas i
LEFT JOIN LATERAL (
SELECT ARRAY (
SELECT u.id
FROM vw_ideas_role_direction_mappings d, unnest(d.user_direction_id) u(id)
WHERE d.direction_id = i.direction_id
AND u.id IS NOT NULL
UNION
SELECT u.id
FROM vw_ideas_role_category_mappings c, unnest(c.user_direction_id) u(id)
WHERE c.category_id = i.category_id
AND u.id IS NOT NULL
)
) sub(directions_id) ON sub.directions_id <> '{}'; -- exclude empty array?
UNION after unnesting removes duplicate array elements.
NULL values are removed.
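A minimal, self-contained sketch of that deduplication step. The two array literals are made up for illustration (modeled on the question's {101,103,} examples) and stand in for the user_direction_id columns of the two views:

```sql
-- Hypothetical literals replace the two views' user_direction_id arrays.
SELECT ARRAY (
   SELECT u.id
   FROM   unnest('{101,103,NULL}'::bigint[]) AS u(id)
   WHERE  u.id IS NOT NULL            -- drop NULL elements
   UNION                              -- UNION (not UNION ALL) folds duplicates
   SELECT u.id
   FROM   unnest('{101,NULL,NULL,104}'::bigint[]) AS u(id)
   WHERE  u.id IS NOT NULL
) AS directions_id;
```

The resulting array holds 101, 103 and 104 once each. UNION does not guarantee any particular element order.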
About the ARRAY constructor:
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
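Since the ARRAY constructor always returns a row (like an aggregate function), a plain CROSS JOIN LATERAL would also keep every row from ideas. Else, we'd use LEFT JOIN .. ON true. With the join condition used above, ideas with an empty result are kept as well, but show NULL instead of an empty array.
About the LATERAL join: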
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
If you need to preserve some original order, consider WITH ORDINALITY and ORDER BY ... See:
PostgreSQL unnest() with element number
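As a rough sketch of the WITH ORDINALITY idea, assuming "original order" means the position of the first occurrence of each element. The array literal and the column alias ord are made up for illustration:

```sql
-- Hypothetical input array; ord is the element position added by WITH ORDINALITY.
SELECT ARRAY (
   SELECT u.id
   FROM   unnest('{103,101,NULL,103,104}'::bigint[]) WITH ORDINALITY AS u(id, ord)
   WHERE  u.id IS NOT NULL      -- drop NULL elements
   GROUP  BY u.id               -- one row per distinct element
   ORDER  BY min(u.ord)         -- sort by first position in the input
) AS directions_id;
```

For this input the result is {103,101,104}: duplicates and NULL are gone, and the first-seen order is kept.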

Related

SQL: CROSS JOIN UNNEST and include data from rows with NULLs in CROSS JOIN UNNEST column

I'm looking for help with the SQL query below. The column policy_array is NULL in some rows and holds an array of policy data in others. I would like the output to include rows even when policy_array is NULL.
The query below performs the CROSS JOIN UNNEST as expected, but, also as expected, it drops every row where policy_array is NULL. I can imagine a workaround with an intermediate table where NULLs in policy_array are replaced by something else, but I would really prefer not to do that.
SELECT
policy,
account_id,
rejects,
overturns,
appeals,
submits
FROM relevant_table
CROSS JOIN UNNEST(policy_array) AS p (policy)
WHERE
...
There are two options. Either use LEFT JOIN with ON true:
FROM relevant_table
LEFT JOIN UNNEST(policy_array) AS p (policy) ON true
Or, a bit more hackish, use the fact that UNNEST accepts multiple arrays and add a second array with one element (also note the succinct comma syntax for CROSS JOIN UNNEST):
FROM relevant_table,
UNNEST(policy_array, array[1]) AS p (policy, ignored)
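A small sketch of the first option on inline data. The question does not name the database engine, so this is written in PostgreSQL syntax as an assumption, and the two sample rows are invented for illustration:

```sql
-- Hypothetical sample rows: account 2 has a NULL policy_array.
SELECT t.account_id, p.policy
FROM  (VALUES (1, ARRAY['a','b'])
            , (2, NULL::text[])) AS t(account_id, policy_array)
LEFT   JOIN unnest(t.policy_array) AS p(policy) ON true;
```

Account 1 yields one row per array element, while account 2 is kept as a single row with policy NULL. With CROSS JOIN, account 2 would be missing from the result.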

How to apply a filter on jsonb array of objects - after aggregating?

I have the following SELECT statement:
SELECT
cards.*,
COUNT(cards.*) OVER() AS full_count,
p.printing_information
FROM
cards
LEFT JOIN
(SELECT
pr.card_id, jsonb_agg(to_jsonb(pr)) AS printing_information
FROM
printings pr
GROUP BY
pr.card_id) p ON cards.card_id = p.card_id
WHERE
...
I would like to be able to query on set_id that is within the printings table. I tried to do this within my above select statement by including pr.set_id but it then required a GROUP BY pr.card_id, pr.set_id which then made a row per printing rather than having all printings within the printing_information sub-array.
If I can't work out how to do the above, is it possible to search within the printing_information array of jsonb?
Ideally I would like to be able to do something like:
WHERE p.printing_information->set_id = '123'
Unfortunately I can't do that as it's within an array.
What's the best way to achieve this? I could just do post-processing of the result to strip out unnecessary results, but I feel there must be a better way.
SELECT cards.*
, count(cards.*) over() AS full_count
, p.printing_information
FROM cards
LEFT JOIN (
SELECT pr.card_id, jsonb_agg(to_jsonb(pr)) AS printing_information
FROM printings pr
WHERE pr.set_id = '123' -- HERE!
GROUP BY pr.card_id
) p ON cards.card_id = p.card_id
WHERE ...
This is much cheaper than filtering after the fact. And can be supported with an index on (set_id) - unlike any attempts to filter on the dynamically generated jsonb column.
This is efficient as long as we need to aggregate all or most rows from table printings anyway. But your added WHERE ... implies more filters on the outer SELECT. If that results in only a few rows from printings being needed, a LATERAL subquery should be more efficient:
SELECT c.* -- table cards is aliased as c below, so reference the alias
, count(c.*) OVER() AS full_count
, p.printing_information
FROM cards c
CROSS JOIN LATERAL (
SELECT jsonb_agg(to_jsonb(pr)) AS printing_information
FROM printings pr
WHERE pr.card_id = c.card_id
AND pr.set_id = '123' -- here!
) p
WHERE ... -- selective filter here?!?
See:
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Aside, there is no "array of jsonb" here. The subquery produces a jsonb containing an array.
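To illustrate that last point, a quick check on inline data (the two sample rows are invented and stand in for table printings) should show a single jsonb value whose JSON type is array, not a Postgres array of jsonb:

```sql
-- Hypothetical rows replace table printings.
SELECT pg_typeof(jsonb_agg(to_jsonb(pr)))    AS pg_type    -- Postgres data type of the aggregate
     , jsonb_typeof(jsonb_agg(to_jsonb(pr))) AS json_type  -- JSON type inside that value
FROM  (VALUES (1, '123'), (1, '456')) AS pr(card_id, set_id);
```

Here pg_type comes back as jsonb (rather than jsonb[]) and json_type as array.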

Postgresql Iterate over an array field and use the records for another query

I'm trying to iterate over an array field in order to use every element as a parameter to a query and finally join all results, but I need help to get there.
I have a table with an array field called fleets that can hold one or more values, e.g. {1,2,3}. I need to iterate over every value to get all vehicles belonging to these fleets.
With a subquery I'm getting 3 rows with the values 1, 2, 3:
SELECT * FROM vehicles WHERE fleet_fk=(
SELECT unnest(fleets) FROM auth_user WHERE id=4)
I'm using PostgreSQL 9.4.
If your query raises "ERROR: more than one row returned by a subquery used as an expression", you should use ANY:
SELECT * FROM vehicles
WHERE fleet_fk = ANY(
SELECT unnest(fleets) FROM auth_user WHERE id=4)
Since fleets is an array column you have a couple of options.
Either use the ANY construct directly (no need to unnest()):
SELECT * FROM vehicles
WHERE fleet_fk = ANY((SELECT fleets FROM auth_user WHERE id = 4)::int[]); -- the cast selects the array form of ANY; assumes fleets is int[]
Or rewrite as join:
SELECT v.*
FROM auth_user a
JOIN vehicles v ON v.fleet_fk = ANY(a.fleets)
WHERE a.id = 4;
Or you can unnest(), then you don't need ANY any more:
SELECT v.*
FROM auth_user a
, unnest(a.fleets) fleet_fk -- implicit LATERAL join
JOIN vehicles v USING (fleet_fk)
WHERE a.id = 4;
This is assuming you don't have another column named fleet_fk in auth_user. Use the more explicit ON clause for the join in this case to avoid the ambiguity.
Be aware that there are two implementations of ANY:
ANY for sets
ANY for arrays
Behavior of the beasts is basically the same, you just feed them differently.
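A tiny side-by-side sketch of the two forms, using a made-up literal array instead of the fleets column:

```sql
-- Hypothetical literal array; both expressions test the same membership.
SELECT 2 = ANY ('{1,2,3}'::int[])                AS any_array  -- array form: fed one array value
     , 2 = ANY (SELECT unnest('{1,2,3}'::int[])) AS any_set;   -- set form: fed a subquery returning rows
```

Both columns evaluate to true. The only difference is whether ANY receives a single array value or a set of rows.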
DB design
Consider normalizing the hidden many-to-many (or one-to-many?) relationship in your DB schema:
How to implement a many-to-many relationship in PostgreSQL?

Flattening nested hierarchies in BigQuery

I have a BigQuery table with two nested levels of repeated field hierarchies.
I need to do self join (join the table with itself) on a leaf field in the inner level.
Usage of FLATTEN clause only flattens one level and I couldn't figure out how to do this.
In theory I need to write nested FLATTEN but I couldn't make this work.
Any help would be appreciated.
Example:
Given the following table structure:
a1, integer
a2, record (repeated)
a2.b1, integer
a2.b2, record (repeated)
a2.b2.c1, integer
How do I write a query that does a self join (JOIN EACH) on a2.b2.c1 on both sides?
Nested flatten -- that is flatten of a subquery -- should work. Note it requires a plethora of parentheses.
Given the schema:
{nested_repeated_f: [
{inner_nested_repeated_f: [
{string_f}]}]}
The following query will work:
SELECT t1.f1 FROM (
SELECT nested_repeated_f.inner_nested_repeated_f.string_f as f1
FROM (FLATTEN((
SELECT nested_repeated_f.inner_nested_repeated_f.string_f
FROM
(FLATTEN(lotsOdata.nested002, nested_repeated_f.inner_nested_repeated_f))
), nested_repeated_f))) as t1
JOIN (
SELECT nested_repeated_f.inner_nested_repeated_f.string_f as f2
FROM (FLATTEN((
SELECT nested_repeated_f.inner_nested_repeated_f.string_f
FROM
(FLATTEN(lotsOdata.nested002, nested_repeated_f.inner_nested_repeated_f))
), nested_repeated_f))) as t2
on t1.f1 = t2.f2

MySQL and UUID problem

I have a database where I am trying to group together similar column values and display the NULL values as separate entries.
Currently I have the following:
SELECT i.*, IFNULL(iset.set_id, UUID()) AS the_set FROM img_ref i
LEFT JOIN image_set iset ON iset.img_id = i.id
GROUP BY the_set
This works, provided there are entries in the image_set table. If there are no entries in that table, it simply groups together all the NULL values. If I remove the 'group by' statement, I get the individual rows with the unique identifier, different in each case.
It's unlikely that the image_set table would be empty, but if it were, all the 'separate' images would be grouped together as one entry instead of multiple ones.
Is there something that I'm doing wrong in the query?
You have to wrap it in a subquery first, then group by the derived UUIDs:
SELECT *
FROM (
SELECT i.*, IFNULL(iset.set_id, UUID()) AS the_set FROM img_ref i
LEFT JOIN image_set iset ON iset.img_id = i.id
) SQ
GROUP BY the_set