PostgreSQL: Aggregating combinations of names in list

I have a DB structure that looks something like this:
Table "public.person"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |

Table "public.person_name"
 Column |       Type        | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
 person | integer           |           | not null |
 name   | character varying |           |          |
Foreign-key constraints:
    "person_name_person_fkey" FOREIGN KEY (person) REFERENCES person(id)

Table "public.event"
 Column |       Type        | Collation | Nullable | Default
--------+-------------------+-----------+----------+---------
 id     | integer           |           | not null |
 name   | character varying |           |          |

Table "public.attendee"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 event  | integer |           |          |
 person | integer |           |          |
Foreign-key constraints:
    "attendee_event_fkey" FOREIGN KEY (event) REFERENCES public.event(id)
    "attendee_person_fkey" FOREIGN KEY (person) REFERENCES person(id)
With some sample data:
person:
 id
----
  0
  1
  2
  3

person_name:
 person |   name
--------+-----------
      0 | Alex
      0 | Alexander
      1 | Barbara
      1 | Barb
      2 | Cecilia
      3 | Dave
      3 | David

event:
 id |    name
----+------------
  0 | Wedding
  1 | Party
  2 | Funeral

attendee:
 event | person
-------+--------
     0 |      0
     0 |      1
     0 |      2
     1 |      1
     1 |      2
     2 |      2
     2 |      3
I'd like to make a select statement that returns all events, with a row for every combination of nicknames that all attendees have, like this:
 event_id | event_name |        attendee_list
----------+------------+-----------------------------
        0 | Wedding    | Alex, Barbara, Cecilia
        0 | Wedding    | Alexander, Barbara, Cecilia
        0 | Wedding    | Alex, Barb, Cecilia
        0 | Wedding    | Alexander, Barb, Cecilia
        1 | Party      | Barbara, Cecilia
        1 | Party      | Barb, Cecilia
        2 | Funeral    | Cecilia, Dave
        2 | Funeral    | Cecilia, David
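Put differently, each event's rows are the Cartesian product of its attendees' nickname sets, so the row count is the product of the set sizes: 2 × 2 × 1 = 4 for the Wedding, 2 × 1 = 2 for the Party and 1 × 2 = 2 for the Funeral. The short Python sketch below (not SQL, and not part of the question) spells out that semantics with itertools.product over the sample data held in plain dictionaries; the order in which it emits rows may differ from the listing above.

```python
from itertools import product

# Sample data from the question: nicknames per person, attendees per event.
person_names = {
    0: ["Alex", "Alexander"],
    1: ["Barbara", "Barb"],
    2: ["Cecilia"],
    3: ["Dave", "David"],
}
events = {0: "Wedding", 1: "Party", 2: "Funeral"}
attendees = {0: [0, 1, 2], 1: [1, 2], 2: [2, 3]}


def name_combinations(event_id):
    """One attendee list per element of the Cartesian product of the
    attendees' nickname sets (attendees taken in ascending person id)."""
    name_sets = [person_names[p] for p in sorted(attendees[event_id])]
    return [", ".join(combo) for combo in product(*name_sets)]


for event_id, event_name in events.items():
    for attendee_list in name_combinations(event_id):
        print(event_id, event_name, attendee_list, sep=" | ")
```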
My initial intuition was to join all of the tables together, group by event, and then use string_agg, but that puts every attendee's nicknames into a single list (of course, since it aggregates over the whole join). My second attempt was to select the attendee names from a subquery, but a subquery used as a column expression can't return more than one row. I also tried aggregating arrays instead, as described here, but you can't aggregate arrays of differing dimensionality. Finally, I tried using some recursive magic as described here, but found it difficult to adapt to my problem and ultimately couldn't get it to work.

Here's a recursive query that does it. I made an array of the person IDs, and in each step of the recursion I joined the next ID with the person_name table.
WITH RECURSIVE recur AS (
-- Anchor: one row per event, holding its attendee IDs in a fixed order
-- and an empty list of chosen names.
SELECT
attendee.event AS event_id,
event.name AS event_name,
array_agg(attendee.person ORDER BY attendee.person) AS person_id_list,
ARRAY[]::text[] AS person_name_list,
1 AS index
FROM attendee
JOIN event ON attendee.event = event.id
GROUP BY attendee.event, event.name
UNION ALL
-- Step: fan out over every name of the attendee at position "index".
SELECT
event_id,
event_name,
person_id_list,
person_name_list || person_name.name::text,
index + 1
FROM recur
JOIN person_name ON (person_name.person = recur.person_id_list[recur.index])
WHERE cardinality(recur.person_id_list) >= recur.index
)
-- Keep only the rows in which every attendee has been given a name.
SELECT event_id, event_name, array_to_string(person_name_list, ', ') AS attendee_list
FROM recur
WHERE cardinality(recur.person_id_list) < recur.index
ORDER BY event_id;
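For anyone who wants to try the same index-by-index recursion without PostgreSQL, here is a minimal sketch against SQLite through Python's built-in sqlite3 module. It is an adaptation, not the answer's exact query: SQLite has no arrays, so ROW_NUMBER() supplies each attendee's position (standing in for the person_id_list index), and the names are built up by string concatenation instead of a text[]. It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE person (id INTEGER PRIMARY KEY);
CREATE TABLE person_name (person INTEGER NOT NULL REFERENCES person(id), name TEXT);
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attendee (event INTEGER REFERENCES event(id),
                       person INTEGER REFERENCES person(id));
INSERT INTO person VALUES (0), (1), (2), (3);
INSERT INTO person_name VALUES (0, 'Alex'), (0, 'Alexander'), (1, 'Barbara'),
    (1, 'Barb'), (2, 'Cecilia'), (3, 'Dave'), (3, 'David');
INSERT INTO event VALUES (0, 'Wedding'), (1, 'Party'), (2, 'Funeral');
INSERT INTO attendee VALUES (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2), (2, 3);
"""

QUERY = """
WITH RECURSIVE
  -- Number each event's attendees 1..n (stands in for the array index).
  ranked AS (
    SELECT event, person,
           ROW_NUMBER() OVER (PARTITION BY event ORDER BY person) AS pos,
           COUNT(*) OVER (PARTITION BY event) AS n
    FROM attendee
  ),
  recur(event, pos, n, names) AS (
    -- Seed: one row per nickname of each event's first attendee.
    SELECT r.event, r.pos, r.n, pn.name
    FROM ranked AS r JOIN person_name AS pn ON pn.person = r.person
    WHERE r.pos = 1
    UNION ALL
    -- Step: fan out over every nickname of the attendee at the next position.
    SELECT recur.event, r.pos, recur.n, recur.names || ', ' || pn.name
    FROM recur
    JOIN ranked AS r ON r.event = recur.event AND r.pos = recur.pos + 1
    JOIN person_name AS pn ON pn.person = r.person
  )
SELECT e.id, e.name, recur.names AS attendee_list
FROM recur JOIN event AS e ON e.id = recur.event
WHERE recur.pos = recur.n  -- only lists that name every attendee
ORDER BY e.id, attendee_list
"""


def attendee_lists():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA_AND_DATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in attendee_lists():
    print(*row, sep=" | ")
```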

I think I got it figured out with the "recursive magic" linked before. The issue was that my real data was a little more complicated and each attendee had a "position" in the list, which didn't always work with the r.id < t.id constraint. Here's a query that works with the sample data in the question:
with recursive recur as (
select
array[person_name.person] as persons,
array[name] as names,
attendee.event
from person_name
join attendee
on person_name.person=attendee.person
union all
select
persons || t.person,
names || t.name,
attendee.event
from person_name t
join recur r
on t.person != all(r.persons)
join attendee
on t.person=attendee.person
and attendee.event=r.event
)
select event, names
from recur
where cardinality(names)=(
select count(*)
from attendee
where attendee.event=recur.event
);
This does return an additional row for every possible ordering of the attendees, but I'm fine with that (as I said, my real data has a "position" field that would constrain it). If you need only one ordering, that ordering has to be specified in the data somewhere; for example, adding back the r.id < t.id condition from the linked approach (here, requiring each newly added person ID to be greater than the last one in persons) would work.
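The "one ordering only" variant can be sketched in SQLite (via Python's sqlite3 module) by carrying the last person ID added and only appending attendees with a larger ID, which is the same idea as the r.id < t.id condition. This is an illustrative adaptation that assumes ascending person ID is an acceptable canonical order; since SQLite has no arrays, the last ID and a running count replace the persons array.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE person_name (person INTEGER NOT NULL, name TEXT);
CREATE TABLE attendee (event INTEGER, person INTEGER);
INSERT INTO person_name VALUES (0, 'Alex'), (0, 'Alexander'), (1, 'Barbara'),
    (1, 'Barb'), (2, 'Cecilia'), (3, 'Dave'), (3, 'David');
INSERT INTO attendee VALUES (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2), (2, 3);
"""

QUERY = """
WITH RECURSIVE recur(event, last_person, cnt, names) AS (
  -- Seed: every nickname of every attendee starts a candidate list.
  SELECT a.event, pn.person, 1, pn.name
  FROM person_name AS pn
  JOIN attendee AS a ON a.person = pn.person
  UNION ALL
  -- Step: only append attendees with a larger person id than the last one,
  -- so each set of people is built in exactly one (ascending) order.
  SELECT r.event, t.person, r.cnt + 1, r.names || ', ' || t.name
  FROM recur AS r
  JOIN attendee AS a ON a.event = r.event AND a.person > r.last_person
  JOIN person_name AS t ON t.person = a.person
)
SELECT event, names
FROM recur
-- Keep only lists that cover every attendee of the event.
WHERE cnt = (SELECT COUNT(*) FROM attendee WHERE attendee.event = recur.event)
ORDER BY event, names
"""


def ordered_lists():
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA_AND_DATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for event, names in ordered_lists():
    print(event, names, sep=" | ")
```

Lists that start from a later attendee can never reach the full count, so they drop out in the final filter, leaving one row per name combination.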

Related

get category with child categories next to it

I have a categories table.
Categories
CREATE TABLE IF NOT EXISTS public.categories
(
id integer NOT NULL DEFAULT nextval('categories_id_seq'::regclass),
name text COLLATE pg_catalog."default" NOT NULL,
description text COLLATE pg_catalog."default",
shell text COLLATE pg_catalog."default",
createdat timestamp with time zone DEFAULT now(),
"isChild" boolean DEFAULT false,
"motherCategory" text COLLATE pg_catalog."default" DEFAULT 'none'::text,
CONSTRAINT categories_pkey PRIMARY KEY (id)
)
I'm looking for output similar to this:
+-------------------------------------------------------+-----------------------------------------------------+
| motherCategory                                        | childCategories                                     |
+----+----------+---------------+----------+------------+----+---------+--------------+----------+------------+
| id | name     | description   | shell    | createdat  | id | name    | description  | shell    | createdat  |
+----+----------+---------------+----------+------------+----+---------+--------------+----------+------------+
| 1  | mother 1 | mother 1 desc | m1/shell | 13/12/2013 | 2  | child 1 | child 1 desc | c1/shell | 01/01/2014 |
|    |          |               |          |            +----+---------+--------------+----------+------------+
|    |          |               |          |            | 3  | child 2 | child 2 desc | c2/shell | 6/9/2069   |
+----+----------+---------------+----------+------------+----+---------+--------------+----------+------------+
| 4  | mother 2 | mother 2 desc | m2/shell | 01/02/2033 | none                                                |
+----+----------+---------------+----------+------------+----+---------+--------------+----------+------------+
| 5  | mother 3 | mother 3 desc | m3/shell | 11/11/2011 | 6  | child 3 | child 3 desc | c3/shell | 05/05/2005 |
+----+----------+---------------+----------+------------+----+---------+--------------+----------+------------+
It's a fairly complex query, at least for my level. Basically, my categories table holds both mother and child categories in one place and distinguishes them with two columns (isChild: boolean, motherCategory: integer): isChild tells SQL that a category is a child, and motherCategory stores the id of its mother category in the same table.
As for the query, I think it's self-explanatory: I want to show a list of categories where every mother category is placed next to all of its children, displaying all of their data as well, and if a mother doesn't have children, it returns none as the child element.
To be completely honest, I'm new to SQL, so I'm not even sure an output like this is possible, but if you have any idea, please help me out!
Thanks
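In my opinion, you should store mother and child categories in two separate tables.
I am assuming that motherCategory holds the id of the child's mother category.
With that, your expected output can be generated by the query below (SQL cannot merge cells, so the mother's columns are repeated for each child, and a mother without children gets NULLs in the child columns):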
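SELECT
m.id, m.name, m.description, m.shell, m.createdat
,c.id, c.name, c.description, c.shell, c.createdat
FROM
-- The columns were created as "isChild" and "motherCategory", so they must be quoted.
(SELECT id, name, description, shell, createdat FROM categories WHERE NOT "isChild") m LEFT JOIN
(SELECT id, name, description, shell, createdat, "motherCategory" FROM categories WHERE "isChild") c
-- "motherCategory" is declared as text in the DDL, so compare it with the id as text.
ON m.id::text = c."motherCategory"
ORDER BY m.id, c.id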
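A runnable sketch of the same LEFT JOIN idea, written as a plain self-join and run against SQLite through Python's sqlite3 module. It assumes a trimmed-down table (only id, name and the two flag columns) and stores "isChild" as 0/1 because SQLite has no boolean type; "motherCategory" is kept as text, as in the question's DDL, so the id is cast to text before comparing.

```python
import sqlite3

# Simplified stand-in for the question's categories table (assumption:
# description, shell and createdat are left out for brevity).
DDL_AND_DATA = """
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    "isChild" INTEGER DEFAULT 0,
    "motherCategory" TEXT DEFAULT 'none'
);
INSERT INTO categories (id, name, "isChild", "motherCategory") VALUES
    (1, 'mother 1', 0, 'none'), (2, 'child 1', 1, '1'), (3, 'child 2', 1, '1'),
    (4, 'mother 2', 0, 'none'), (5, 'mother 3', 0, 'none'), (6, 'child 3', 1, '5');
"""

# m ranges over mothers, c over their children. The child filter sits in the
# ON clause so that mothers without children keep one row with NULL child columns.
QUERY = """
SELECT m.id, m.name, c.id, c.name
FROM categories AS m
LEFT JOIN categories AS c
  ON c."isChild" = 1
 AND c."motherCategory" = CAST(m.id AS TEXT)
WHERE m."isChild" = 0
ORDER BY m.id, c.id
"""


def mothers_with_children():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL_AND_DATA)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in mothers_with_children():
    print(row)
```

Moving the c."isChild" = 1 condition into WHERE would turn the LEFT JOIN into an inner join in effect and drop mother 2 from the result.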

SQL - Get unique values by key selected by condition

I want to clean a dataset because there are repeated keys that should not be there. Although the key is repeated, other fields do change. On repetition, I want to keep those entries whose country field is not null. Let's see it with a simplified example:
| email | country |
| 1#x.com | null |
| 1#x.com | PT |
| 2#x.com | SP |
| 2#x.com | PT |
| 3#x.com | null |
| 3#x.com | null |
| 4#x.com | UK |
| 5#x.com | null |
Email acts as key, and country is the field which I want to filter by. On email repetition:
Retrieve the entry whose country is not null (case 1)
If there are several entries whose country is not null, retrieve one of them, the first occurrence for simplicity (case 2)
If all the entries have a null country, again, retrieve only one of them (case 3)
If the entry's key is not repeated, just retrieve it no matter what its country is (cases 4 and 5)
The expected output should be:
| email | country |
| 1#x.com | PT |
| 2#x.com | SP |
| 3#x.com | null |
| 4#x.com | UK |
| 5#x.com | null |
I have thought of doing a UNION or some type of JOIN to achieve this. One possibility could be querying:
SELECT
...
FROM (
SELECT *
FROM `myproject.mydataset.mytable`
WHERE country IS NOT NULL
) AS a
...
and then match it with the full table to add the values that are missing, but I can't figure out how to do that, since my experience with SQL is limited.
Also, I have read about the COALESCE function and I think it could be helpful for the task.
Consider the approach below:
select *
from `myproject.mydataset.mytable`
where true
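qualify row_number() over(partition by email order by country nulls last) = 1
Note that a table has no inherent row order, so "first occurrence" needs an explicit ordering column. As written, ties among non-null countries are broken alphabetically, so 2#x.com comes back with PT rather than SP; ordering by country is null first and then by such a column would keep the earliest non-null entry instead.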
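SQLite has no QUALIFY clause, so this sketch (run through Python's sqlite3 module) filters on ROW_NUMBER() in a subquery instead. It also assumes that insertion order, exposed by SQLite's implicit rowid, is what "first occurrence" means; in BigQuery you would need an explicit ordering column for that. Under that assumption it reproduces the expected output, including SP for 2#x.com.

```python
import sqlite3

ROWS = [
    ("1#x.com", None), ("1#x.com", "PT"),
    ("2#x.com", "SP"), ("2#x.com", "PT"),
    ("3#x.com", None), ("3#x.com", None),
    ("4#x.com", "UK"), ("5#x.com", None),
]

# Rank rows per email: non-null countries first (country IS NULL is 0 for
# them), then by rowid, i.e. insertion order, standing in for "first occurrence".
QUERY = """
SELECT email, country
FROM (
  SELECT email, country,
         ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY country IS NULL, rowid
         ) AS rn
  FROM mytable
)
WHERE rn = 1
ORDER BY email
"""


def deduplicate(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (email TEXT, country TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for email, country in deduplicate(ROWS):
    print(email, country)
```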

SQL / Oracle to Tableau - How to combine to sort based on two fields?

I have the following tables:
tbl_tasks
+---------+-------------+
| Task_ID | Assigned_ID |
+---------+-------------+
| 1 | 8 |
| 2 | 12 |
| 3 | 31 |
+---------+-------------+
tbl_resources
+---------+-----------+
| Task_ID | Source_ID |
+---------+-----------+
| 1 | 4 |
| 1 | 10 |
| 2 | 42 |
| 4 | 8 |
+---------+-----------+
A task is assigned to at least one person (denoted by "Assigned_ID"), and then any number of people can be assigned as a source (denoted by "Source_ID"). The ID numbers are all linked to names in another table: although the ID columns are named differently, they all refer to the same table.
Is there any way to combine the two tables by ID so that I can search by someone's ID number? For example, if I filter with WHERE User_ID = 8 to see which tasks person 8 is involved in, I should get back Task 1 and Task 4.
Right now, by joining all the tables together, I can easily filter on "Assigned" but not "Source" due to all the multiple entries in the table.
Use union all:
select distinct task_id
from ((select task_id, assigned_id as id
from tbl_tasks
) union all
(select task_id, source_id
from tbl_resources
)
) ti
where id = ?;
Note that this uses select distinct in case someone is assigned to the same task in both tables. If not, remove the distinct.
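A quick way to check the UNION ALL approach is to run it against the sample rows in SQLite via Python's sqlite3 module. The only dialect change is dropping the parentheses around the two branches, which SQLite does not accept in a compound SELECT; the ? placeholder is bound from Python.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_tasks (Task_ID INTEGER, Assigned_ID INTEGER);
CREATE TABLE tbl_resources (Task_ID INTEGER, Source_ID INTEGER);
INSERT INTO tbl_tasks VALUES (1, 8), (2, 12), (3, 31);
INSERT INTO tbl_resources VALUES (1, 4), (1, 10), (2, 42), (4, 8);
""")

# Stack both "who is involved" columns into one (task_id, id) list, then filter.
# DISTINCT covers someone who is both assignee and source of the same task.
QUERY = """
SELECT DISTINCT task_id
FROM (
  SELECT Task_ID AS task_id, Assigned_ID AS id FROM tbl_tasks
  UNION ALL
  SELECT Task_ID, Source_ID FROM tbl_resources
) AS ti
WHERE id = ?
ORDER BY task_id
"""


def tasks_for(user_id):
    return [task_id for (task_id,) in con.execute(QUERY, (user_id,))]


print(tasks_for(8))
```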

Subtract and add two columns from different tables

I want to subtract and add two columns from different tables.
table book:
BookID | BookName | Author | Edition | PublishingYear | copies| Shelf | Row
1 | SQL | Robert | 3 | 2005 | 3 | A |third
table Issue: (here I have joined with tblPerson to show the PersonName instead of the PersonID)
BookID | BookName | DateIssue | ReturnDate | PersonName | copies
1 | SQL | 2015-10-12 | 2015-10-12 | john | 1
table Return:
BookID | BookName | DateIssue | ReturnDate | PersonName | copies
1 | SQL | 2015-10-12 | 2015-10-12 | john | 1
SQL query:
Select (tblBook.copies) - (tblIssue.copies)
FROm tblBook
FULL join tblIssue
ON tblBook.copies = tblIssue.copies
This query doesn't subtract the two copies columns.
When I issue a book, I want to subtract copies in tblIssue from the original copies value in tblBook.
And when the book is returned (tblReturn), the copies column in tblBook should go back to its original value.
When you join tables, you need to join them on the keys that bind the tables together.
Joining the two tables on copies makes no logical sense, so join them on the primary key/foreign key column, BookID, instead.
Select (tblBook.copies) - (tblIssue.copies)
FROm tblBook
FULL join tblIssue
ON tblBook.BookId = tblIssue.BookId
If you select all columns instead of the difference, the join produces the following result:
BookID | BookName | Author | Edition | PublishingYear | copies| Shelf | Row | BookID | BookName | DateIssue | ReturnDate | PersonName | copies
1 | SQL | Robert | 3 | 2005 | 3 | A |third| 1 | SQL | 2015-10-12 | 2015-10-12 | john | 1
Subtracting the two copies columns then gives 3 - 1 = 2. In general, it is always important to work out what the join result should be before you plan your join strategy.
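To see why the join column matters, here is a small sketch using SQLite through Python's sqlite3 module with the question's sample rows. It uses LEFT JOIN instead of FULL JOIN (an assumption made for portability, since older SQLite versions lack FULL JOIN), which is enough to show the point: joining on copies matches nothing, while joining on BookID yields 3 - 1 = 2.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tblBook (BookID INTEGER PRIMARY KEY, BookName TEXT, copies INTEGER);
CREATE TABLE tblIssue (BookID INTEGER, BookName TEXT, DateIssue TEXT,
                       ReturnDate TEXT, PersonName TEXT, copies INTEGER);
INSERT INTO tblBook VALUES (1, 'SQL', 3);
INSERT INTO tblIssue VALUES (1, 'SQL', '2015-10-12', '2015-10-12', 'john', 1);
""")

# Joining on copies (3 vs 1) finds no matching issue row, so the difference is NULL.
ON_COPIES = """
SELECT tblBook.copies - tblIssue.copies
FROM tblBook LEFT JOIN tblIssue ON tblBook.copies = tblIssue.copies
"""

# Joining on the BookID key pairs the book with its issue row: 3 - 1 = 2.
ON_KEY = """
SELECT tblBook.copies - tblIssue.copies
FROM tblBook LEFT JOIN tblIssue ON tblBook.BookID = tblIssue.BookID
"""

print(con.execute(ON_COPIES).fetchall())
print(con.execute(ON_KEY).fetchall())
```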
EDIT 1
An example query showing how many copies are available at a given time:
Select tblBook.BookId,
MAX(tblBook.copies) - COALESCE(SUM(tblIssue.copies), 0) as countOfAvailableBooks
FROM tblBook
LEFT join tblIssue
ON tblBook.BookId = tblIssue.BookId
-- only count issues that are still outstanding on the given date
AND tblIssue.DateIssue <= '2015-10-12' AND tblIssue.ReturnDate >= '2015-10-12'
GROUP BY tblBook.BookId
For a given date (2015-10-12 in this case), this produces an aggregate count of how many copies of each book are available. It is not optimal, but given what you have provided, it seems like the best solution.
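A minimal sketch of the per-date availability count, run in SQLite via Python's sqlite3 module. It joins on BookID, keeps the date conditions in the ON clause of a LEFT JOIN so that books with no outstanding issues are still listed, and groups by book. The second book is a hypothetical row added only to show that case; dates are stored as ISO-8601 text, which compares correctly as strings.

```python
import sqlite3

DDL_AND_DATA = """
CREATE TABLE tblBook (BookID INTEGER PRIMARY KEY, BookName TEXT, copies INTEGER);
CREATE TABLE tblIssue (BookID INTEGER, BookName TEXT, DateIssue TEXT,
                       ReturnDate TEXT, PersonName TEXT, copies INTEGER);
-- Book 1 and its issue row come from the question; book 2 is hypothetical
-- and has never been issued.
INSERT INTO tblBook VALUES (1, 'SQL', 3), (2, 'Hypothetical title', 5);
INSERT INTO tblIssue VALUES (1, 'SQL', '2015-10-12', '2015-10-12', 'john', 1);
"""

QUERY = """
SELECT b.BookID,
       b.copies - COALESCE(SUM(i.copies), 0) AS available
FROM tblBook AS b
LEFT JOIN tblIssue AS i
  ON i.BookID = b.BookID
 AND i.DateIssue <= :day    -- issued on or before the day
 AND i.ReturnDate >= :day   -- not returned before the day
GROUP BY b.BookID, b.copies
ORDER BY b.BookID
"""

con = sqlite3.connect(":memory:")
con.executescript(DDL_AND_DATA)


def available_on(day):
    return con.execute(QUERY, {"day": day}).fetchall()


print(available_on("2015-10-12"))
print(available_on("2015-10-13"))
```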

SQL Select Text for multiple foreign keys to lookup table in same row

I have a table similar to the following that holds a history of changes to an item, with the old and new value of a status. The status number is a foreign key to a lookup table that holds the text, e.g. 1 = 'In Inventory', 2 = 'Destroyed', etc.
I want to present this as human-readable results, replacing the integer keys with the text from the lookup table, but I'm not quite sure how to do that, since I can't just join on a single foreign key.
Demo Database
+---------+-------------+-------------+------------+
| ITEM_ID | OLD_STATUS | NEW_STATUS | TIMESTAMP |
+---------+-------------+-------------+------------+
| 1 | 1 | 2 | 2012-03-25 |
| 1 | 2 | 3 | 2013-12-25 |
| 1 | 3 | 4 | 2015-03-25 |
+---------+-------------+-------------+------------+
You can join on the status table multiple times - something like this:
select i.item_id,
i.old_status,
i.new_status,
i.timestamp,
s1.statustext as old_status_text,
s2.statustext as new_status_text
from items i
join status s1 on i.old_status = s1.statusid
join status s2 on i.new_status = s2.statusid
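The double join can be tried out in SQLite via Python's sqlite3 module. The table and column names follow the answer (status, statusid, statustext), which the question does not actually show, and the texts for statuses 3 and 4 are hypothetical placeholders; only 1 = 'In Inventory' and 2 = 'Destroyed' come from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE status (statusid INTEGER PRIMARY KEY, statustext TEXT);
CREATE TABLE items (item_id INTEGER,
                    old_status INTEGER REFERENCES status(statusid),
                    new_status INTEGER REFERENCES status(statusid),
                    timestamp TEXT);
-- 1 and 2 are from the question; 3 and 4 are hypothetical placeholders.
INSERT INTO status VALUES (1, 'In Inventory'), (2, 'Destroyed'),
                          (3, 'Status 3'), (4, 'Status 4');
INSERT INTO items VALUES (1, 1, 2, '2012-03-25'), (1, 2, 3, '2013-12-25'),
                         (1, 3, 4, '2015-03-25');
""")

# One join per foreign-key column, each with its own alias, so both
# lookups hit the same status table within a single result row.
QUERY = """
SELECT i.item_id, i.timestamp,
       s1.statustext AS old_status_text,
       s2.statustext AS new_status_text
FROM items AS i
JOIN status AS s1 ON i.old_status = s1.statusid
JOIN status AS s2 ON i.new_status = s2.statusid
ORDER BY i.timestamp
"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
```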