How to Query a Hierarchy of Tables in SQL

I have a table, meeting. Among these meetings there exists a hierarchy. Some of them are yearly meetings, most are just regular meetings.
Every regular meeting is associated with at least one row in a join table, meeting_yearly_meeting. That table has two key columns: meeting_id and yearly_meeting_id.
Here is what these two tables look like:
meeting:
id SERIAL PRIMARY KEY,
title VARCHAR(255),
mappable BOOLEAN,
phone VARCHAR(255),
email VARCHAR(255),
city VARCHAR(255),
address VARCHAR(255),
zip VARCHAR(255),
latitude NUMERIC,
longitude NUMERIC,
description VARCHAR(255),
worship_time TIME,
state VARCHAR(255),
website VARCHAR(255),
lgbt_affirming BOOLEAN,
created TIMESTAMP default current_timestamp,
updated TIMESTAMP default current_timestamp
meeting_yearly_meeting:
id SERIAL PRIMARY KEY,
meeting_id SMALLINT,
yearly_meeting_id SMALLINT,
created TIMESTAMP default current_timestamp,
updated TIMESTAMP default current_timestamp
So from my /meetings endpoint, I want to return a collection of all meetings - both regular and yearly meetings. I want to return the meetings with all their columns, as well as an additional column: yearly_meeting.
For meeting records that have one or more associated meeting_yearly_meeting records, yearly_meeting would be a comma-delimited list of the titles of the meeting records designated as that meeting's yearly meetings. For those meetings that do not have any associated meeting_yearly_meeting records (and therefore are themselves yearly meetings), I want the yearly_meeting field to be NULL.
On my way to pursuing this goal, I tried something like this:
SELECT t1.*, t2.meeting_yearly_meeting AS yearly_meeting
FROM (
SELECT * FROM meeting
FULL JOIN meeting_yearly_meeting ON meeting.id = meeting_yearly_meeting.yearly_meeting_id;
) as t1,
(
SELECT CASE WHEN (meeting_yearly_meeting.id IS NOT NULL)
THEN (SELECT title FROM meeting WHERE meeting.id = meeting_yearly_meeting.yearly_meeting_id)
ELSE NULL
END
FROM (
SELECT meeting_yearly_meeting.* FROM meeting
FULL JOIN meeting_yearly_meeting ON meeting.id = meeting_yearly_meeting.meeting_id
) as meeting_yearly_meeting;
) as t2;
But this throws a syntax error.
I appreciate any insight others might have. Please let me know if there is any additional context or clarification you need!
UPDATE:
Sample meeting data: https://gist.github.com/micahbales/4013399c3fd23a0caf108124dab827c8
Sample meeting_yearly_meeting data: https://gist.github.com/micahbales/fcbdeef282bd7bf1014606cee43bfb5e
Expected return value example: https://gist.github.com/micahbales/13d2aafdc5d43c4b948dc39c2df51569

You can try to left join the yearly meetings and then use string_agg() to get your comma delimited list.
SELECT m1.id,
m1.title,
m1.mappable,
m1.phone,
m1.email,
m1.city,
m1.address,
m1.zip,
m1.latitude,
m1.longitude,
m1.description,
m1.worship_time,
m1.state,
m1.website,
m1.lgbt_affirming,
m1.created,
m1.updated,
string_agg(m2.title, ', ') yearly_meeting
FROM meeting m1
LEFT JOIN meeting_yearly_meeting mym1
ON mym1.meeting_id = m1.id
LEFT JOIN meeting m2
ON m2.id = mym1.yearly_meeting_id
GROUP BY m1.id,
m1.title,
m1.mappable,
m1.phone,
m1.email,
m1.city,
m1.address,
m1.zip,
m1.latitude,
m1.longitude,
m1.description,
m1.worship_time,
m1.state,
m1.website,
m1.lgbt_affirming,
m1.created,
m1.updated;
Edit:
A more "compact" solution could be using a correlated subquery.
SELECT m1.*,
(SELECT string_agg(m2.title, ', ')
FROM meeting_yearly_meeting mym1
LEFT JOIN meeting m2
ON m2.id = mym1.yearly_meeting_id
WHERE mym1.meeting_id = m1.id) yearly_meeting
FROM meeting m1;
But note, though it's less code, it's not necessarily faster.
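To see the left join plus aggregation in action, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite as a stand-in for PostgreSQL, so group_concat() takes the place of string_agg(); the tables are trimmed to the columns that matter and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; group_concat() plays the role of string_agg().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meeting (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE meeting_yearly_meeting (
        id INTEGER PRIMARY KEY, meeting_id INTEGER, yearly_meeting_id INTEGER);
    -- Hypothetical rows: meetings 1 and 4 are yearly meetings,
    -- meeting 2 belongs to one of them and meeting 3 to both.
    INSERT INTO meeting (id, title) VALUES
        (1, 'Example Yearly Meeting'), (2, 'First Monthly'),
        (3, 'Second Monthly'), (4, 'Other Yearly Meeting');
    INSERT INTO meeting_yearly_meeting (meeting_id, yearly_meeting_id) VALUES
        (2, 1), (3, 1), (3, 4);
""")

rows = conn.execute("""
    SELECT m1.id, m1.title, group_concat(m2.title, ', ') AS yearly_meeting
    FROM meeting m1
    LEFT JOIN meeting_yearly_meeting mym1 ON mym1.meeting_id = m1.id
    LEFT JOIN meeting m2 ON m2.id = mym1.yearly_meeting_id
    GROUP BY m1.id, m1.title
    ORDER BY m1.id
""").fetchall()

for row in rows:
    print(row)
```

Meetings without any join-table row (the yearly meetings themselves) come back with yearly_meeting set to NULL (None in Python), because the aggregate only ever sees NULL titles for them. The order of titles inside the list is not guaranteed unless you ask for one.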

Related

How to show clients with 0 reservations in certain year? (SQL)

I have these tables:
CREATE TABLE tour
(
id bigserial NOT NULL,
end_date DATE,
initial_price float8 NOT NULL,
start_date DATE,
destination_id int8,
guide_id int8,
PRIMARY KEY (id)
);
CREATE TABLE client_data
(
id bigserial NOT NULL,
name VARCHAR(255),
passport_number VARCHAR(255),
surname VARCHAR(255),
user_data_id int8,
PRIMARY KEY (id)
);
CREATE TABLE reservation
(
id bigserial not null,
actual_price float8 not null,
client_id int8,
tour_id int8,
PRIMARY KEY (id)
);
Where every reservation is connected to client_data and tour.
My goal is to show all clients that have not made any reservation in a certain year, e.g. clients that have no reservations in 2022.
I tried something like this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
GROUP BY client_data.name, reservation.id
HAVING COUNT(reservation.id) = 0;
Or this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Both of these work and show me clients that have no reservations IN GENERAL, but I need to show clients that have none in a certain year.
When I try to include
WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
the SQL statement returns 0 rows.
Any ideas how to do this?
EDIT:
I'll add the full data and schema I work with.
schema: https://pastebin.com/ETvrW1tQ
data: https://pastebin.com/h1WHT0zZ
You've almost got it. Adding WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31' returns 0 rows because WHERE is applied to the whole joined result: the rows kept for clients without reservations have NULL in every tour column, so the date condition can never be true for them. So, instead of adding the date condition in the WHERE clause, I'd suggest adding it in the join condition for tour. (LEFT JOIN is just shorthand for LEFT OUTER JOIN, so the two spellings are interchangeable.) I think the following should work:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT JOIN reservation ON client_data.id = reservation.client_id
LEFT JOIN tour ON tour.id = reservation.tour_id and tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Hope it helps
Edit
As OP mentioned, the above query doesn't work as intended: the WHERE reservation.id IS NULL condition still keeps only clients with no reservations at all, whatever the tour join condition says. I think we'll have to resort to a subquery (or CTE) here, which I previously wanted to avoid for performance reasons, though maybe we're getting ahead of ourselves on that. It may be possible to avoid it, but I can't think of the correct way at the moment, so here's a solution with a subquery that will hopefully work.
select * from client_data where id not in (
select distinct client_id from reservation r
join tour t on r.tour_id = t.id
where t.start_date BETWEEN '2022-01-01' AND '2022-12-31'
);
In this we first find out the client_ids that did make a reservation in the said time frame and filter them out from the client data.
Have attached a fiddle in which you can play around it a bit
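One caveat with the NOT IN form: reservation.client_id is nullable in the posted schema, and if the subquery ever returns a NULL, NOT IN evaluates to unknown for every client and the query returns nothing. A NOT EXISTS variant avoids that. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, with trimmed tables and made-up rows, to compare the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tour (id INTEGER PRIMARY KEY, start_date TEXT);
    CREATE TABLE client_data (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reservation (id INTEGER PRIMARY KEY, client_id INTEGER, tour_id INTEGER);
    INSERT INTO tour VALUES (1, '2022-05-01'), (2, '2021-05-01');
    INSERT INTO client_data VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    -- Hypothetical data: Ann booked in 2022, Bob only in 2021, Cy never;
    -- one 2022 reservation has no client attached (client_id is NULL).
    INSERT INTO reservation VALUES (1, 1, 1), (2, 2, 2), (3, NULL, 1);
""")

YEAR_FILTER = "t.start_date BETWEEN '2022-01-01' AND '2022-12-31'"

# NOT IN: the NULL client_id in the subquery makes the predicate unknown for everyone.
not_in = conn.execute(f"""
    SELECT name FROM client_data WHERE id NOT IN (
        SELECT r.client_id FROM reservation r
        JOIN tour t ON r.tour_id = t.id WHERE {YEAR_FILTER})
    ORDER BY name
""").fetchall()

# NOT EXISTS: correlated per client, so the NULL row simply never matches.
not_exists = conn.execute(f"""
    SELECT c.name FROM client_data c WHERE NOT EXISTS (
        SELECT 1 FROM reservation r
        JOIN tour t ON r.tour_id = t.id
        WHERE r.client_id = c.id AND {YEAR_FILTER})
    ORDER BY c.name
""").fetchall()

print(not_in)      # empty because of the NULL client_id
print(not_exists)  # clients with no 2022 reservation
```

If client_id can never be NULL in your data, the two forms return the same clients; adding WHERE client_id IS NOT NULL to the NOT IN subquery is another way to make it safe.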
This will return the tour id's in 2022 that do not have a corresponding tour id in reservation:
select id as tour_id
from tour
where start_date between '2022-01-01' and '2022-12-31'
except
select tour_id
from reservation;
But since TOUR does not have a client_id, then how would you expect to get the client_id or client_name?

Inner join removes some rows unnecessarily

I have 3 tables defined like so
CREATE TABLE participants(
id SERIAL PRIMARY KEY,
Name TEXT NOT NULL,
Title TEXT NOT NULL
);
CREATE TABLE meetings (
id SERIAL PRIMARY KEY,
Subject TEXT NOT NULL,
Organizer TEXT NOT NULL,
StartTime TIMESTAMP NOT NULL,
EndTime TIMESTAMP NOT NULL
);
CREATE TABLE meetings_participants(
meeting_id int not null,
participant_id int not null,
primary key (meeting_id, participant_id),
foreign key(meeting_id) references meetings(id),
foreign key(participant_id) references participants(id)
);
I want to find meetings happening today with participants in them.
When I run this query I basically get them
SELECT * from meetings
INNER JOIN meetings_participants ON meetings.id = meetings_participants.meeting_id
INNER JOIN participants ON meetings_participants.participant_id = participants.id
WHERE starttime::date = NOW()::date;
The problem is that this query discards meetings that have no participants yet; I still want to include them in my query result. How can I modify my query to do that?
You need a LEFT JOIN instead of INNER. By casting with ::date you are implying that you are only interested in meetings taking place today, whether or not they have already ended. You should still include EndTime in your query, taking into consideration that there might be meetings that span several days:
SELECT * from meetings
left join meetings_participants on meetings.id = meetings_participants.meeting_id
left join participants on meetings_participants.participant_id = participants.id
WHERE starttime::date <= NOW()::date and endtime::date >= NOW()::date ;
DBFiddle demo here.
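As a quick illustration of the difference between the two join types, here is a small sketch using Python's sqlite3 module (SQLite standing in for PostgreSQL). The tables are trimmed versions of the ones in the question, the rows are invented, and the date filter is left out to keep the focus on the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE meetings (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
    CREATE TABLE meetings_participants (
        meeting_id INTEGER NOT NULL REFERENCES meetings(id),
        participant_id INTEGER NOT NULL REFERENCES participants(id),
        PRIMARY KEY (meeting_id, participant_id));
    -- Hypothetical data: 'Standup' has one participant, 'Planning' has none yet.
    INSERT INTO participants VALUES (1, 'Ann');
    INSERT INTO meetings VALUES (1, 'Standup'), (2, 'Planning');
    INSERT INTO meetings_participants VALUES (1, 1);
""")

# Same query text, only the join type differs.
QUERY = """
    SELECT m.subject, p.name
    FROM meetings m
    {join} JOIN meetings_participants mp ON m.id = mp.meeting_id
    {join} JOIN participants p ON mp.participant_id = p.id
    ORDER BY m.id
"""
inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

print(inner_rows)  # only meetings that have participants
print(left_rows)   # every meeting; participant columns are NULL when there are none
```

Both joins have to be LEFT joins: if only the first one is changed, the second INNER JOIN drops the NULL-extended rows again.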
EDIT: Participants' name and title as JSON array:
SELECT id, subject, organizer, starttime, endtime, jsonb_pretty(tmp.participants)
from meetings m
left join lateral (
select jsonb_agg(row_to_json(tp)) as participants
from (select p.name, p.title
from meetings_participants mp
inner join participants p on mp.participant_id = p.id
where mp.meeting_id = m.id
) tp
) tmp on true
WHERE starttime::date <= NOW()::date
and endtime::date >= NOW()::date;
DBFiddle demo for participants added as JSON
You did not mention whether you want each participant on a separate row or as an aggregate (e.g. a comma-separated list). If the former, change the inner joins to left joins. For the latter, you could use:
SELECT meetings.*, (
SELECT string_agg(participants.name, ', ')
FROM meetings_participants
JOIN participants ON meetings_participants.participant_id = participants.id
WHERE meetings_participants.meeting_id = meetings.id
) AS participants_list
FROM meetings
WHERE starttime::date = current_date

Postgresql select count with join

I have two tables:
CREATE TABLE stores (
stores_id varchar PRIMARY KEY,
owner_id varchar
);
CREATE TABLE sets (
sets_id varchar PRIMARY KEY,
stores_id varchar not null,
owner_id varchar not null,
item_id varchar not null,
);
How do I make a request that shows the number of items on the sets in stores?
With selection by owner.
For example:
select
stores.*,
count(sets.item_id)
from stores
LEFT OUTER JOIN sets on stores.owner_id = sets.owner_id
where
stores.owner_id = 'e185775fc4f5'
GROUP BY stores.owner_id;
Thank you.
I think you'd need to join on both the store and the owner, then COUNT(DISTINCT item_id)
select
st.owner_id,
st.stores_id,
count(distinct se.item_id)
from stores st left join
sets se
on st.owner_id = se.owner_id
and st.stores_id = se.stores_id
group by st.owner_id, st.stores_id;
That will give a table that shows the owner, the store, then the number of items
Is this what you want?
select st.stores_id, count(se.item_id)
from stores st left join
sets se
on st.owner_id = se.owner_id
where st.owner_id = 'e185775fc4f5'
group by st.stores_id;
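The two answers differ in the join condition, and that changes the counts whenever an owner has more than one store. Here is a rough sketch with Python's sqlite3 module (SQLite in place of PostgreSQL) and invented ids to compare joining on owner_id alone with joining on both owner_id and stores_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stores (stores_id TEXT PRIMARY KEY, owner_id TEXT);
    CREATE TABLE sets (
        sets_id TEXT PRIMARY KEY,
        stores_id TEXT NOT NULL,
        owner_id TEXT NOT NULL,
        item_id TEXT NOT NULL);
    -- Hypothetical data: owner o1 has three stores; s1 holds two items,
    -- s2 holds one, s3 holds none.
    INSERT INTO stores VALUES ('s1', 'o1'), ('s2', 'o1'), ('s3', 'o1');
    INSERT INTO sets VALUES
        ('set1', 's1', 'o1', 'i1'),
        ('set2', 's1', 'o1', 'i2'),
        ('set3', 's2', 'o1', 'i3');
""")

# Join on owner only: every store of the owner matches every set of the owner.
owner_only = conn.execute("""
    SELECT st.stores_id, count(se.item_id)
    FROM stores st
    LEFT JOIN sets se ON st.owner_id = se.owner_id
    WHERE st.owner_id = 'o1'
    GROUP BY st.stores_id
    ORDER BY st.stores_id
""").fetchall()

# Join on owner and store: each store only matches its own sets.
both_keys = conn.execute("""
    SELECT st.stores_id, count(se.item_id)
    FROM stores st
    LEFT JOIN sets se ON st.owner_id = se.owner_id
                     AND st.stores_id = se.stores_id
    WHERE st.owner_id = 'o1'
    GROUP BY st.stores_id
    ORDER BY st.stores_id
""").fetchall()

print(owner_only)
print(both_keys)
```

With the owner-only join each store reports the owner's total, while the two-key join gives a per-store count and still lists the empty store with 0, since count() ignores the NULLs produced by the left join.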

Querying for who worked on an item first and second

I have a table that looks like this:
Id (PK, int, not null)
ReviewedBy (nvarchar(255), not null)
ReviewDateTime(datetime, not null)
Decision_id (int, not null)
Item_id (FK, int, not null)
The business process with this table is that each Item (shown by Item_id foreign key) is to be worked on by 2 people.
How can I query this table to determine who (ReviewedBy) reviewed the item first and who reviewed it second?
I'm really struggling to figure this out because I neglected to add a Type column to my table that would record which role the user was acting in. :(
Edit
Given the following data
Id,ReviewedBy,ReviewedWhen,SomeOtherId,
16,111111,2011-12-14 22:06:54,1,
17,187935,2011-12-14 22:07:03,1,
18,187935,2011-12-14 22:07:18,2,
19,187935,2011-12-14 22:07:20,3,
20,111111,2011-12-14 22:07:23,2,
21,187935,2011-12-14 22:07:26,3,
22,123456,2011-12-14 22:27:50,4,
with schema
CREATE TABLE [Reviews] (
[Id] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
[ReviewedBy] NVARCHAR(6) NOT NULL,
[ReviewedWhen] TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
[SomeOtherId] INTEGER NOT NULL
);
Executing the following to get a list of people who did second reviews will return rows where there is only one review for SomeOtherId.
select t1.*
from Reviews as t1
left outer join Reviews as t2
on (t1.SomeOtherId = t2.SomeOtherId and t1.ReviewedWhen < t2.ReviewedWhen)
where t2.SomeOtherId is null;
Solution
-- First checks
select t1.ReviewedBy, count(t1.Id)
from Reviews as t1
left outer join Reviews as t2
on (t2.SomeOtherId = t1.SomeOtherId and t1.ReviewedWhen > t2.ReviewedWhen)
where t2.SomeOtherID is null
group by t1.ReviewedBy;
-- Second checks
select t1.ReviewedBy, count(t1.Id)
from Reviews as t1
left outer join Reviews as t2
on (t2.SomeOtherId = t1.SomeOtherId and t1.ReviewedWhen < t2.ReviewedWhen)
where t2.SomeOtherID is null
and t1.Id not in (select Id from Reviews group by SomeOtherId having count(SomeOtherId) = 1)
group by t1.ReviewedBy;
Essentially, it was counting items where there was only one review as both a first and second check. All I had to do was ensure that when I'm counting second checks that I'm not including rows with only one review.
I thought I could achieve this in one query but guess not.
Try this:
select
t1.ReviewedBy FirstReviewer,
t2.ReviewedBy SecondReviewer
from
Table t1
left outer join Table t2 on t1.Item_Id = t2.Item_Id and t2.ReviewDateTime > t1.ReviewDateTime
If you want to only return rows that have been reviewed by two people, change the left outer join to an inner join.
If ReviewDateTime is never updated and Id is an identity column, you can change the join to compare Id rather than ReviewDateTime, which will be faster.
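One thing to watch with the self join above: the later review also appears as a t1 row (with no later t2), so each fully reviewed item shows up twice. A possible refinement is to keep only t1 rows that have no earlier review. The sketch below tries this with Python's sqlite3 module on the sample data from the question's edit; it assumes at most two reviews per item and distinct timestamps within an item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviews (
        Id INTEGER NOT NULL PRIMARY KEY,
        ReviewedBy TEXT NOT NULL,
        ReviewedWhen TEXT NOT NULL,
        SomeOtherId INTEGER NOT NULL);
    -- Sample rows from the question.
    INSERT INTO Reviews VALUES
        (16, '111111', '2011-12-14 22:06:54', 1),
        (17, '187935', '2011-12-14 22:07:03', 1),
        (18, '187935', '2011-12-14 22:07:18', 2),
        (19, '187935', '2011-12-14 22:07:20', 3),
        (20, '111111', '2011-12-14 22:07:23', 2),
        (21, '187935', '2011-12-14 22:07:26', 3),
        (22, '123456', '2011-12-14 22:27:50', 4);
""")

pairs = conn.execute("""
    SELECT f.SomeOtherId,
           f.ReviewedBy AS FirstReviewer,
           s.ReviewedBy AS SecondReviewer
    FROM Reviews f
    -- s is the later review of the same item, if there is one
    LEFT JOIN Reviews s
           ON s.SomeOtherId = f.SomeOtherId
          AND s.ReviewedWhen > f.ReviewedWhen
    -- keep f only when nothing was reviewed earlier for that item
    WHERE NOT EXISTS (
        SELECT 1 FROM Reviews e
        WHERE e.SomeOtherId = f.SomeOtherId
          AND e.ReviewedWhen < f.ReviewedWhen)
    ORDER BY f.SomeOtherId
""").fetchall()

for row in pairs:
    print(row)
```

This returns one row per item, with SecondReviewer left NULL for items that have only been reviewed once. If an item can have more than two reviews, the left join would need a further restriction to pick just the second one.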

SQL joins with multiple records into one with a default

My 'people' table has one row per person, and that person has a division (not unique) and a company (not unique).
I need to join people to p_features, c_features, d_features on:
people.person=p_features.num_value
people.division=d_features.num_value
people.company=c_features.num_value
... in a way that if there is a record match in p_features/d_features/c_features only, it would be returned, but if it was in 2 or 3 of the tables, the most specific record would be returned.
From my test data below, for example, query for person=1 would return
'FALSE'
person 3 returns maybe, person 4 returns true, and person 9 returns default
The biggest issue is that there are 100 features and I have queries that need to return all of them in one row. My previous attempt was a function which queried on feature,num_value in each table and did a foreach, but 100 features * 4 tables meant 400 reads and it brought the database to a halt it was so slow when I loaded up a few million rows of data.
create table p_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table c_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table d_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table default_features (
feature varchar(20),
feature_value varchar(128)
);
create table people (
person int8 not null,
division int8 not null,
company int8 not null
);
insert into people values (4,5,6);
insert into people values (3,5,6);
insert into people values (1,2,6);
insert into p_features values (4,'WEARING PANTS','TRUE');
insert into c_features values (6,'WEARING PANTS','FALSE');
insert into d_features values (5,'WEARING PANTS','MAYBE');
insert into default_features values('WEARING PANTS','DEFAULT');
You need to transpose the features into rows with a ranking. Here I used a common-table expression. If your database product does not support them, you can use temporary tables to achieve the same effect.
;With RankedFeatures As
(
Select 1 As FeatureRank, P.person, PF.feature, PF.feature_value
From people As P
Join p_features As PF
On PF.num_value = P.person
Union All
Select 2, P.person, PF.feature, PF.feature_value
From people As P
Join d_features As PF
On PF.num_value = P.division
Union All
Select 3, P.person, PF.feature, PF.feature_value
From people As P
Join c_features As PF
On PF.num_value = P.company
Union All
Select 4, P.person, DF.feature, DF.feature_value
From people As P
Cross Join default_features As DF
)
, HighestRankedFeature As
(
-- Lowest rank number = most specific match, tracked per person AND per feature
Select Min(FeatureRank) As FeatureRank, person, feature
From RankedFeatures
Group By person, feature
)
Select RF.person, RF.FeatureRank, RF.feature, RF.feature_value
From people As P
Join HighestRankedFeature As HRF
On HRF.person = P.person
Join RankedFeatures As RF
On RF.FeatureRank = HRF.FeatureRank
And RF.person = P.person
And RF.feature = HRF.feature
Order By P.person, RF.feature
I'm not sure I understood your question well, but to use a JOIN the tables need to be loaded already; then you write a SELECT statement with INNER JOIN, LEFT JOIN, or whatever you need to show.
If you post some more information, it may be easier to understand.
There are some aspects of your schema I'm not understanding, like how to relate to the default_features table if there's no match in any of the specific tables. The only possible join condition is on feature, but if there's no match in the other 3 tables, there's no value to join on. So, in my example, I've hard-coded the DEFAULT since I can't think of how else to get it.
Hopefully this can get you started and if you can clarify the model a bit more, the solution can be refined.
select p.person, coalesce(pf.feature_value, df.feature_value, cf.feature_value, 'DEFAULT')
from people p
left join p_features pf
on p.person = pf.num_value
left join d_features df
on p.division = df.num_value
left join c_features cf
on p.company = cf.num_value
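One possible way to bring default_features into the coalesce approach is to start from every (person, feature) pair and left join the three specific tables on both the key and the feature name. The sketch below uses Python's sqlite3 module with the question's test data. It assumes that default_features lists every feature and that each specific table has at most one row per (num_value, feature); the row for person 9 is made up, since the question mentions that person but never inserts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p_features (num_value INTEGER, feature TEXT, feature_value TEXT);
    CREATE TABLE c_features (num_value INTEGER, feature TEXT, feature_value TEXT);
    CREATE TABLE d_features (num_value INTEGER, feature TEXT, feature_value TEXT);
    CREATE TABLE default_features (feature TEXT, feature_value TEXT);
    CREATE TABLE people (person INTEGER NOT NULL, division INTEGER NOT NULL,
                         company INTEGER NOT NULL);
    INSERT INTO people VALUES (4, 5, 6), (3, 5, 6), (1, 2, 6);
    INSERT INTO people VALUES (9, 8, 7);  -- hypothetical: matches nothing specific
    INSERT INTO p_features VALUES (4, 'WEARING PANTS', 'TRUE');
    INSERT INTO c_features VALUES (6, 'WEARING PANTS', 'FALSE');
    INSERT INTO d_features VALUES (5, 'WEARING PANTS', 'MAYBE');
    INSERT INTO default_features VALUES ('WEARING PANTS', 'DEFAULT');
""")

rows = conn.execute("""
    SELECT p.person,
           -- most specific first: person, then division, then company, then default
           coalesce(pf.feature_value, df.feature_value,
                    cf.feature_value, d.feature_value) AS feature_value
    FROM people p
    CROSS JOIN default_features d          -- one row per person per feature
    LEFT JOIN p_features pf ON pf.num_value = p.person   AND pf.feature = d.feature
    LEFT JOIN d_features df ON df.num_value = p.division AND df.feature = d.feature
    LEFT JOIN c_features cf ON cf.num_value = p.company  AND cf.feature = d.feature
    ORDER BY p.person, d.feature
""").fetchall()

for row in rows:
    print(row)
```

This yields one row per person per feature rather than one wide row with 100 columns; pivoting those rows into columns would be a separate step.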