Query database for distinct values and aggregate data based on condition - sql

I am trying to extract distinct items from a Postgres database pairing a column from a table with a column from another table based on a condition. Simplified version looks like this:
CREATE TABLE users
(
id SERIAL PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE photos
(
id INT PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
flag VARCHAR(255)
);
INSERT INTO users VALUES (1, 'Bob');
INSERT INTO users VALUES (2, 'Alice');
INSERT INTO users VALUES (3, 'John');
INSERT INTO photos VALUES (1001, 1, 'a');
INSERT INTO photos VALUES (1002, 1, 'b');
INSERT INTO photos VALUES (1003, 1, 'c');
INSERT INTO photos VALUES (1004, 2, 'a');
INSERT INTO photos VALUES (1004, 2, 'x');
What I need is to extract each user name, only once, and a flag value for each of them. The flag value should prioritize a specific one, let's say b. So, the result should look like:
Bob b
Alice a
Where Bob owns a photo having the b flag, while Alice does not and John has no photos. For Alice the output for the flag value is not important (a or x would be just as good) as long as she owns no photo flagged b.
The closest thing I found were some self-join queries where the flag value would have been aggregated using min() or max(), but I am looking for a particular value, which is not first, nor last. Moreover, I found out that you can define your own aggregate functions, but I wonder if there is an easier way of conditioning the query in order to obtain the required data.
Thank you!

Here is a method with aggregation:
select u.name,
coalesce(max(flag) filter (where flag = 'b'),
min(flag)
) as flag
from users u left join
photos p
on u.id = p.user_id
group by u.id, u.name;
That said, a more typical method would be a prioritization query. Perhaps:
select distinct on (u.id) u.name, p.flag
from users u left join
photos p
on u.id = p.user_id
order by u.id, (p.flag = 'b') desc;

Related

How can I get rows from a table that matches all the foreign keys from another table

Say I have two tables role and roleApp defined like this:
create table #tempRole(roleId int);
insert into #tempRole (roleId) values (1)
insert into #tempRole (roleId) values (2)
create table #tempRoleApp(roleId int, appId int);
insert into #tempRoleApp (roleId, appId) values (1, 26)
insert into #tempRoleApp (roleId, appId) values (2, 26)
insert into #tempRoleApp (roleId, appId) values (1, 27)
So, from #tempRoleApp table, I want to get only the rows that matches all the values of the #tempRole table (1 and 2), so in this case the output needs to be 26 (as it matches both 1 and 2) but not 27 as the table does not have 2, 27).
#tempRole table is actually the output from another query so it can have arbitrary number of values.
I tried few things like:
select *
from #tempRoleApp
where roleId = ALL(select roleId FROM #tempRole)
Which does not give anything... tried few more things but not getting what I want.
I believe this gives what you were looking for.
select tra.appId
from #tempRoleApp as tra
join #tempRole as tr on tra.roleId = tr.roleId
group by tra.appId
having count(distinct tra.roleId) = (select count(distinct roleId) from #tempRole)
It uses count distinct to get the total unique roleId's in the tempRole table and compares that with the unique count of these per appId, after confirming the roleIds match between the tables.
As you clarified in the comment, once you add another tempRole roleId, now no entry has all of the Ids so no rows are returned.

Selecting X amount of rows from one table depending on value of column from another joined table

I am trying to join several tables. To simplify the situation, there is a table called Boxes which has a foreign key column for another table, Requests. This means that with a simple join I can get all the boxes that can be used to fulfill a request. But the Requests table also has a column called BoxCount which limits the number of boxes that is needed.
Is there a way to structure the query in such a way that when I join the two tables, I will only get the number of rows from Boxes that is specified in the BoxCount column of the given Request, rather than all of the rows from Boxes that have a matching foreign key?
Script to initialize sample data:
CREATE TABLE Requests (
Id int NOT NULL PRIMARY KEY,
BoxCount Int NOT NULL);
CREATE TABLE Boxes (
Id int NOT NULL PRIMARY KEY,
Label varchar,
RequestId INT FOREIGN KEY REFERENCES Requests(Id));
INSERT INTO Requests (Id, BoxCount)
VALUES
(1, 2),
(2, 3);
INSERT INTO Boxes (Id, Label, RequestId)
VALUES
(1, 'A', 1),
(2, 'B', 1),
(3, 'C', 1),
(4, 'D', 2),
(5, 'E', 2),
(6, 'F', 2),
(7, 'G', 2);
So, for example, when the hypothetical query is ran, it should return boxes A and B (because the first Request only needs 2 boxes), but not C. Similarly it should also include boxes D, E and F, but not box G, because the second request only requires 3 boxes.
Here is another approach using ROWCOUNT - a common and useful technique that every sql writer should master. The idea here is that you create a sequential number for all boxes within a request and use that to compare to the box count for filtering.
with boxord as (select *,
ROW_NUMBER() OVER (PARTITION BY RequestId ORDER BY Id) as rno
from dbo.Boxes
)
select req.*, boxord.Label, boxord.rno
from dbo.Requests as req inner join boxord on req.Id = boxord.RequestId
where req.BoxCount >= boxord.rno
order by req.Id, boxord.rno
;
fiddle to demonstrate
The INNER JOIN keyword selects records that have matching values in both tables
SELECT (cols) FROM Boxes
INNER JOIN Request on Boxes.(FK_column) = request.id
WHERE Request.BoxCount = (value)
select r.id,
r.boxcount,
b.id,
b.label
from requests r
cross apply (
select top (r.BoxCount)
id, label
from boxes
where requestid = r.id
order by id
) b;

Can you sort the result in GROUP BY?

I have two tables one is objects with the attribute of id and is_green.The other table is object_closure with the attributes of ancestor_id, descendant_od, and created_at. ie.
Objects: id, is_green
Object_closure: ancestor_id, descendant_od, created_at
There are more attributes in the Object table but not necessary to mention in this question.
I have a query like this:
-- create a table
CREATE TABLE objects (
id INTEGER PRIMARY KEY,
is_green boolean
);
CREATE TABLE object_Closure (
ancestor_id INTEGER ,
descendant_id INTEGER,
created_at date
);
-- insert some values
INSERT INTO objects VALUES (1, 1 );
INSERT INTO objects VALUES (2, 1 );
INSERT INTO objects VALUES (3, 1 );
INSERT INTO objects VALUES (4, 0 );
INSERT INTO objects VALUES (5, 1 );
INSERT INTO objects VALUES (6, 1 );
INSERT INTO object_Closure VALUES (1, 2, 12-12-2020 );
INSERT INTO object_Closure VALUES (1, 3, 12-13-2020 );
INSERT INTO object_Closure VALUES (2, 3, 12-14-2020 );
INSERT INTO object_Closure VALUES (4, 5, 12-15-2020 );
INSERT INTO object_Closure VALUES (4, 6, 12-16-2020 );
INSERT INTO object_Closure VALUES (5, 6, 12-17-2020 );
-- fetch some values
SELECT
O.id,
P.id,
group_concat(DISTINCT P.id ) as p_ids
FROM objects O
LEFT JOIN object_Closure OC on O.id=OC.descendant_id
LEFT JOIN objects P on OC.ancestor_id=P.id AND P.is_green=1
GROUP BY O.id
The result is
query result
I would like to see P.id for O.id=6 is also 5 instead of null. Afterall,5 is still a parentID (p.id). More importantly, I also want the id shown in P.id as the first created id if there are more than one. (see P.created_at).
I understand the reason why it happens is that the first one the system pick is null, and the null was created by the join with the condition of is_green; however, I need to filter out those objects that are green only in the p.id.
I cannot do an inner join (because I need the other attributes of the table and sometimes both P.id and p_ids are null, but still need to show in the result) I cannot restructure the database. It is already there and cannot be changed. I also cannot just use a Min() or Max() aggregation because I want the ID that is picked is the first created one.
So is there a way to skip the null in the join?
or is there a way to filter the selection in the select clause?
or do an order by before the grouping?
P.S. My original code concat the P.id by the order of P.created_at. For some reason, I cannot replicate it in the online SQL simulator.

How to differentiate between no rows and foreign key reference not existing?

My users table contains Alice, Bob and Charles. Alice and Bob have a 3 and 2 fruits respectively. Charles has none. A relationship is established using a foreign key constraint foreign key (user_id) references users (id) and a unique (user_id, name) constraint, allowing zero or one fruit per user.
create table users (
id integer primary key,
firstname varchar(64)
);
create table fruits (
id integer primary key not null,
user_id integer not null,
name varchar(64) not null,
foreign key (user_id) references users (id),
unique (user_id, name)
);
insert into users (id, firstname) values (1, 'Alice');
insert into users (id, firstname) values (2, 'Bob');
insert into users (id, firstname) values (3, 'Charles');
insert into fruits (id, user_id, name) values (1, 1, 'grape');
insert into fruits (id, user_id, name) values (2, 1, 'apple');
insert into fruits (id, user_id, name) values (3, 1, 'pear');
insert into fruits (id, user_id, name) values (4, 2, 'orange');
insert into fruits (id, user_id, name) values (5, 2, 'cherry');
Charles does not have an orange, so there is no resulting row (first query below). However, running the same query for a user that does not exist (second query below) also returns no result.
test=# select * from fruits where user_id = 3 and name = 'orange';
id | user_id | name
----+---------+------
(0 rows)
test=# select * from fruits where user_id = 99 and name = 'orange';
id | user_id | name
----+---------+------
(0 rows)
Is it possible to perform a single query whilst simultaneously differentiating between the user not existing vs the user existing and not having a fruit?
If so, can this also be done to find all the fruits belonging to a particular user (i.e. select * from fruits where user_id = 3 vs select * from fruits where user_id = 99.
Use a LEFT [OUTER] JOIN:
SELECT u.id, u.firstname, f.name AS fruit
-- , COALESCE(f.name, '') AS fruit -- alternatively an empty string
FROM users u
LEFT JOIN fruits f ON f.user_id = u.id
AND f.name = 'orange'
WHERE u.id = 3;
If the user exists, you always get a row.
If the user has no fruit (of that name), fruit defaults to NULL - which is otherwise impossible since fruit.name is defined NOT NULL.
To get an empty string instead, throw in COALESCE(). But no fruit can be named '' then or it'll be ambiguous again. Add a CHECK constraint to be sure.
Related:
Postgres Left Join with where condition
Aside 1: "name" is not a good name.
Aside 2: Typically, this would be a many-to-many design with three tables: "users", "fruits" and "user_fruits". See:
How to implement a many-to-many relationship in PostgreSQL?

SQL: couple people who assisted to the same event

create table people(
id_pers int,
nom_pers char(25),
d_nais date,
d_mort date,
primary key(id_pers)
);
create table event(
id_evn int,
primary key(id_evn)
);
create table assisted_to(
id_pers int,
id_evn int,
foreign key (id_pers) references people(id_pers),
foreign key (id_evn) references event(id_evn)
);
insert into people(id_pers, nom_pers, d_nais, d_mort) values (1, 'A', current_date - integer '20', current_date);
insert into people(id_pers, nom_pers, d_nais, d_mort) values (2, 'B', current_date - integer '50', current_date - integer '20');
insert into people(id_pers, nom_pers, d_nais, d_mort) values (3, 'C', current_date - integer '25', current_date - integer '20');
insert into event(id_evn) values (1);
insert into event(id_evn) values (2);
insert into event(id_evn) values (3);
insert into event(id_evn) values (4);
insert into event(id_evn) values (5);
insert into assisted_to(id_pers, id_evn) values (1, 5);
insert into assisted_to(id_pers, id_evn) values (2, 5);
insert into assisted_to(id_pers, id_evn) values (2, 4);
insert into assisted_to(id_pers, id_evn) values (3, 5);
insert into assisted_to(id_pers, id_evn) values (3, 4);
insert into assisted_to(id_pers, id_evn) values (3, 3);
I need to find couples who assisted to the same event on any particular day.
I tried:
select p1.id_pers, p2.id_pers from people p1, people p2, assisted_event ae
where ae.id_pers = p1.id_pers
and ae.id_pers = p2.id_pers
But returns 0 rows.
What am I doing wrong?
Try this:
select distint ae.id_evn,
p1.nom_pers personA, p2.nom_pers PersonB
from assieted_to ae
Join people p1
On p1.id_pers = ae.id_pers
Join people p2
On p2.id_pers = ae.id_pers
And p2.id_pers > p1.id_pers
This generates all pairs of people [couples] who assisted on the same event. With your schema, there is no way to restrict the results to cases where they assisted on the same day. The assumption is that if they assisted on the same event, then that event can only have occurred on one day.
You select two persons, so you need to select two assisted_event rows as well, because each person has its own assignment row in the assisted_event table. The idea is to build a link between p1 and p2 through a pair of assisted_event rows sharing the same id_evn
select p1.id_pers, p2.id_pers
from people p1, people p2
where exists (
select *
from assisted_event e1
join assisted_event e2 on e1.id_evn=e2.id_evn
where e1.id_pers=p1.id_pers and e2.id_pers=p2.id_pers
)
When re-phrased into ANSI JOIN syntax so I can read it, your query reads:
select p1.id_pers, p2.id_pers
from assisted_event ae
inner join people p1 ON (ae.id_pers = p1.id_pers)
inner join people p2 ON (ae.id_pers = p2.id_pers)
Since id_pers is the primary key of p1, it is impossible for ae.id_pers to be simultaneously equal to p1.id_pers and p2.id_pers. You'll need to find another approach.
You don't need to join on people at all for this, though you'll probably want to in order to populate their details. You need to self-join the people-to-events join table not the people table in order to get the desired results, filtering the self-join to include only rows where the event ID is the same but the people are different. Using > rather than <> means you don't have to use another pass to filter out the (a,b) vs (b,a) pairings.
Something like:
select ae1.id_evn event_id, ae1.id_pers id_pers1, ae2.id_pers id_pers2
from assisted_to ae1
inner join assisted_to ae2
on (ae2.id_evn = ae1.id_evn and ae1.id_pers > ae2.id_pers)
You can now, if desired, add additional joins on the event and persion tables to populate details. You'll need to join people twice with different aliases to populate the two different "sides". See Charles Bretana's example.