Inner join removes some rows unnecessarily - sql

I have 3 tables defined like so
CREATE TABLE participants(
id SERIAL PRIMARY KEY,
Name TEXT NOT NULL,
Title TEXT NOT NULL
);
CREATE TABLE meetings (
id SERIAL PRIMARY KEY,
Subject TEXT NOT NULL,
Organizer TEXT NOT NULL,
StartTime TIMESTAMP NOT NULL,
EndTime TIMESTAMP NOT NULL
);
CREATE TABLE meetings_participants(
meeting_id int not null,
participant_id int not null,
primary key (meeting_id, participant_id),
foreign key(meeting_id) references meetings(id),
foreign key(participant_id) references participants(id)
);
I want to find meetings happening today with participants in them.
When I run this query I basically get them
SELECT * from meetings
INNER JOIN meetings_participants ON meetings.id = meetings_participants.meeting_id
INNER JOIN participants ON meetings_participants.participant_id = participants.id
WHERE starttime::date = NOW()::date;
The problem is that this query discards meetings that have no participants yet, and I still want to include them in the result. How can I modify my query to do that?

You need a LEFT JOIN instead of an INNER JOIN. The ::date cast implies that you only care whether a meeting takes place today, regardless of whether it has already ended. You should still include EndTime in the condition, because a meeting may span several days:
SELECT * from meetings
left join meetings_participants on meetings.id = meetings_participants.meeting_id
left join participants on meetings_participants.participant_id = participants.id
WHERE starttime::date <= NOW()::date and endtime::date >= NOW()::date ;
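To see the difference between the two join types in isolation, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The sample rows are made up, and the date filter is left out so the example stays focused on the join type.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL, title TEXT NOT NULL);
CREATE TABLE meetings (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
CREATE TABLE meetings_participants (
    meeting_id INTEGER NOT NULL REFERENCES meetings(id),
    participant_id INTEGER NOT NULL REFERENCES participants(id),
    PRIMARY KEY (meeting_id, participant_id)
);
-- Made-up sample data: 'Planning' has no participants yet.
INSERT INTO participants VALUES (1, 'Ann', 'Engineer');
INSERT INTO meetings VALUES (1, 'Standup'), (2, 'Planning');
INSERT INTO meetings_participants VALUES (1, 1);
""")

QUERY = """
SELECT m.subject, p.name
FROM meetings m
{join} JOIN meetings_participants mp ON m.id = mp.meeting_id
{join} JOIN participants p ON mp.participant_id = p.id
ORDER BY m.id
"""

# Same query, only the join type differs.
inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

print(inner_rows)  # the meeting without participants is missing
print(left_rows)   # the meeting without participants appears with a NULL name
```

With LEFT JOIN, the empty meeting is kept and its participant columns come back as NULL (None in Python).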
EDIT: Participants' name and title as JSON array:
SELECT id, subject, organizer, starttime, endtime, jsonb_pretty(tmp.participants)
from meetings m
left join lateral (
select jsonb_agg(row_to_json(tp)) as participants
from (select p.name, p.title
from meetings_participants mp
inner join participants p on mp.participant_id = p.id
where mp.meeting_id = m.id
) tp
) tmp on true
WHERE starttime::date <= NOW()::date
and endtime::date >= NOW()::date;

You did not mention whether you want each participant on a separate row or as an aggregate (e.g. a comma-separated list). For the former, change INNER JOIN to LEFT JOIN. For the latter, you could use a correlated subquery:
SELECT meetings.*, (
SELECT string_agg(participants.name, ', ')
FROM meetings_participants
JOIN participants ON meetings_participants.participant_id = participants.id
WHERE meetings_participants.meeting_id = meetings.id
) AS participants_list
FROM meetings
WHERE starttime::date = current_date
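A rough way to check how the aggregate version treats a meeting with no participants is the sketch below. It assumes SQLite (via Python's sqlite3) in place of PostgreSQL, so group_concat stands in for string_agg. The data is invented and the date filter is omitted.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE meetings (id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
CREATE TABLE meetings_participants (meeting_id INTEGER NOT NULL, participant_id INTEGER NOT NULL);
-- Made-up sample data: 'Planning' has no participants yet.
INSERT INTO participants VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO meetings VALUES (1, 'Standup'), (2, 'Planning');
INSERT INTO meetings_participants VALUES (1, 1), (1, 2);
""")

# group_concat is SQLite's counterpart to PostgreSQL's string_agg.
rows = conn.execute("""
SELECT m.subject,
       (SELECT group_concat(p.name, ', ')
        FROM meetings_participants mp
        JOIN participants p ON mp.participant_id = p.id
        WHERE mp.meeting_id = m.id) AS participants_list
FROM meetings m
ORDER BY m.id
""").fetchall()

for subject, participants_list in rows:
    print(subject, participants_list)
```

Because the subquery is evaluated once per meeting, every meeting is returned, and an empty meeting simply gets NULL as its list.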

Related

How to show clients with 0 reservations in certain year? (SQL)

I have these tables:
CREATE TABLE tour
(
id bigserial NOT NULL,
end_date DATE,
initial_price float8 NOT NULL,
start_date DATE,
destination_id int8,
guide_id int8,
PRIMARY KEY (id)
);
CREATE TABLE client_data
(
id bigserial NOT NULL,
name VARCHAR(255),
passport_number VARCHAR(255),
surname VARCHAR(255),
user_data_id int8,
PRIMARY KEY (id)
);
CREATE TABLE reservation
(
id bigserial not null,
actual_price float8 not null,
client_id int8,
tour_id int8,
PRIMARY KEY (id)
);
Where every reservation is connected to client_data and tour.
My goal is to show all clients that have not made any reservation in a certain year, e.g. clients that have no reservations in 2022.
I tried something like this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
GROUP BY client_data.name, reservation.id
HAVING COUNT(reservation.id) = 0;
Or this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Both of these work and show me clients that have no reservations in general, but I need the clients that have no reservations in a certain year.
When I try to include
WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
the SQL statement returns 0 rows.
Any ideas how to do this?
EDIT:
I'll add the full data and schema I work with.
schema: https://pastebin.com/ETvrW1tQ
data: https://pastebin.com/h1WHT0zZ
You've got it almost right. WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31' returns 0 rows because WHERE is applied to the whole joined result set, so it filters out all the clients who didn't make a reservation in that period. So, instead of adding the date condition in the WHERE clause, I'd suggest adding it to the join condition for tour. Note that LEFT JOIN is just shorthand for LEFT OUTER JOIN, so either spelling gives the same result. I think the following should work:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT JOIN reservation ON client_data.id = reservation.client_id
LEFT JOIN tour ON tour.id = reservation.tour_id and tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Hope it helps
Edit
As the OP mentioned, the above query doesn't work as intended. I think we have to resort to a subquery (or CTE), which I previously wanted to avoid for performance reasons, though that concern may be premature. There may be a way to avoid it, but I can't think of one at the moment, so here's a solution with a subquery that will hopefully work.
select * from client_data where id not in (
select distinct client_id from reservation r
join tour t on r.tour_id = t.id
where t.start_date BETWEEN '2022-01-01' AND '2022-12-31'
);
Here we first find the client_ids that did make a reservation in the given time frame and then exclude them from the client data.
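One caveat with NOT IN: reservation.client_id is nullable in the schema from the question, and if the subquery returns even one NULL, NOT IN matches nothing. A NOT EXISTS variant avoids that. The sketch below is only an illustration, run against SQLite through Python's sqlite3 rather than PostgreSQL, with invented rows and dates stored as ISO text.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tour (id INTEGER PRIMARY KEY, start_date TEXT);
CREATE TABLE client_data (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reservation (id INTEGER PRIMARY KEY, client_id INTEGER, tour_id INTEGER);
INSERT INTO tour VALUES (1, '2022-05-01'), (2, '2021-07-15');
INSERT INTO client_data VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO reservation VALUES
    (1, 1, 1),     -- Ann booked a 2022 tour
    (2, 2, 2),     -- Bob booked only a 2021 tour
    (3, NULL, 1);  -- hypothetical reservation with no client set
""")

# NOT IN: the NULL client_id makes the predicate unknown for every non-match.
not_in = conn.execute("""
SELECT name FROM client_data
WHERE id NOT IN (
    SELECT r.client_id FROM reservation r
    JOIN tour t ON r.tour_id = t.id
    WHERE t.start_date BETWEEN '2022-01-01' AND '2022-12-31')
ORDER BY id
""").fetchall()

# NOT EXISTS: correlated on client id, unaffected by NULLs.
not_exists = conn.execute("""
SELECT c.name FROM client_data c
WHERE NOT EXISTS (
    SELECT 1 FROM reservation r
    JOIN tour t ON r.tour_id = t.id
    WHERE r.client_id = c.id
      AND t.start_date BETWEEN '2022-01-01' AND '2022-12-31')
ORDER BY c.id
""").fetchall()

print(not_in)
print(not_exists)
```

If client_id can never be NULL in your data, both forms return the same clients.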
This will return the ids of tours in 2022 that have no corresponding tour_id in reservation:
select id as tour_id
from tour
where start_date between '2022-01-01' and '2022-12-31'
except
select tour_id
from reservation;
But since tour does not have a client_id, how would you expect to get the client id or client name from it?

How to Query A Hierarchy of Tables in SQL

I have a table, meeting. Among these meetings there exists a hierarchy. Some of them are yearly meetings, most are just regular meetings.
All regular meetings will be associated with at least one join table, meeting_yearly_meeting. A meeting_yearly_meeting has two columns: meeting_id and yearly_meeting_id.
Here is what these two tables look like:
meeting:
id SERIAL PRIMARY KEY,
title VARCHAR(255),
mappable BOOLEAN,
phone VARCHAR(255),
email VARCHAR(255),
city VARCHAR(255),
address VARCHAR(255),
zip VARCHAR(255),
latitude NUMERIC,
longitude NUMERIC,
description VARCHAR(255),
worship_time TIME,
state VARCHAR(255),
website VARCHAR(255),
lgbt_affirming BOOLEAN,
created TIMESTAMP default current_timestamp,
updated TIMESTAMP default current_timestamp
meeting_yearly_meeting:
id SERIAL PRIMARY KEY,
meeting_id SMALLINT,
yearly_meeting_id SMALLINT,
created TIMESTAMP default current_timestamp,
updated TIMESTAMP default current_timestamp
So from my /meetings endpoint, I want to return a collection of all meetings - both regular and yearly meetings. I want to return the meetings with all their columns, as well as an additional column: yearly_meeting.
For meeting records that have one or more associated meeting_yearly_meeting records, yearly_meeting would be a comma-delimited list of the titles of the meeting records that are designated as that meeting's yearly meetings. For those meetings that do not have any associated meeting_yearly_meeting records (and therefore are themselves yearly meetings), I want the yearly_meeting field to be NULL.
On my way to pursuing this goal, I tried something like this:
SELECT t1.*, t2.meeting_yearly_meeting AS yearly_meeting
FROM (
SELECT * FROM meeting
FULL JOIN meeting_yearly_meeting ON meeting.id = meeting_yearly_meeting.yearly_meeting_id;
) as t1,
(
SELECT CASE WHEN (meeting_yearly_meeting.id IS NOT NULL)
THEN (SELECT title FROM meeting WHERE meeting.id = meeting_yearly_meeting.yearly_meeting_id)
ELSE NULL
END
FROM (
SELECT meeting_yearly_meeting.* FROM meeting
FULL JOIN meeting_yearly_meeting ON meeting.id = meeting_yearly_meeting.meeting_id
) as meeting_yearly_meeting;
) as t2;
But this throws a syntax error.
I appreciate any insight others might have. Please let me know if there is any additional context or clarification you need!
UPDATE:
Sample meeting data: https://gist.github.com/micahbales/4013399c3fd23a0caf108124dab827c8
Sample meeting_yearly_meeting data: https://gist.github.com/micahbales/fcbdeef282bd7bf1014606cee43bfb5e
Expected return value example: https://gist.github.com/micahbales/13d2aafdc5d43c4b948dc39c2df51569
You can try to left join the yearly meetings and then use string_agg() to get your comma delimited list.
SELECT m1.id,
m1.title,
m1.mappable,
m1.phone,
m1.email,
m1.city,
m1.address,
m1.zip,
m1.latitude,
m1.longitude,
m1.description,
m1.worship_time,
m1.state,
m1.website,
m1.lgbt_affirming,
m1.created,
m1.updated,
string_agg(m2.title, ', ') yearly_meeting
FROM meeting m1
LEFT JOIN meeting_yearly_meeting mym1
ON mym1.meeting_id = m1.id
LEFT JOIN meeting m2
ON m2.id = mym1.yearly_meeting_id
GROUP BY m1.id,
m1.title,
m1.mappable,
m1.phone,
m1.email,
m1.city,
m1.address,
m1.zip,
m1.latitude,
m1.longitude,
m1.description,
m1.worship_time,
m1.state,
m1.website,
m1.lgbt_affirming,
m1.created,
m1.updated;
Edit:
A more "compact" solution could be using a correlated subquery.
SELECT m1.*,
(SELECT string_agg(m2.title, ', ')
FROM meeting_yearly_meeting mym1
LEFT JOIN meeting m2
ON m2.id = mym1.yearly_meeting_id
WHERE mym1.meeting_id = m1.id) yearly_meeting
FROM meeting m1;
But note, though it's less code, it's not necessarily faster.
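As a quick sanity check that the grouped join and the correlated subquery return the same rows, here is a small sketch. It assumes SQLite (Python's sqlite3) instead of PostgreSQL, uses group_concat in place of string_agg, trims the meeting table to two columns, and uses invented meeting titles.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meeting (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE meeting_yearly_meeting (
    id INTEGER PRIMARY KEY, meeting_id INTEGER, yearly_meeting_id INTEGER);
-- Made-up data: meeting 1 is a yearly meeting, meeting 2 belongs to it.
INSERT INTO meeting VALUES (1, 'Northern Yearly Meeting'), (2, 'Riverside Meeting');
INSERT INTO meeting_yearly_meeting VALUES (1, 2, 1);
""")

# Variant 1: left joins plus GROUP BY.
grouped = conn.execute("""
SELECT m1.id, m1.title, group_concat(m2.title, ', ') AS yearly_meeting
FROM meeting m1
LEFT JOIN meeting_yearly_meeting mym1 ON mym1.meeting_id = m1.id
LEFT JOIN meeting m2 ON m2.id = mym1.yearly_meeting_id
GROUP BY m1.id, m1.title
ORDER BY m1.id
""").fetchall()

# Variant 2: correlated subquery.
correlated = conn.execute("""
SELECT m1.id, m1.title,
       (SELECT group_concat(m2.title, ', ')
        FROM meeting_yearly_meeting mym1
        LEFT JOIN meeting m2 ON m2.id = mym1.yearly_meeting_id
        WHERE mym1.meeting_id = m1.id) AS yearly_meeting
FROM meeting m1
ORDER BY m1.id
""").fetchall()

print(grouped)
print(correlated)
```

In both variants the yearly meeting itself gets NULL in the yearly_meeting column, which matches the requirement in the question.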

Heavily polymorphed table

I have a table events. Each event can be 'initiated' and/or 'received' by a User, Visitor or a Team and I want to model these associations.
I am thinking something like
Event
type
user_actor_id
user_subject_id
visitor_actor_id
visitor_subject_id
team_actor_id
team_subject_id
Where the actor/subject refers to who initiated/received the event
Is this the correct approach? It seems like I'd store a lot of redundant data and have to do a lot of conditional joins, as I would like to query the table and get a result like
actor_id:
actor_type (either user, visitor or team)
UPDATE:
Then I'd do a select query like this
select
coalesce(ua.id, va.id, ta.id) as actor_id,
(CASE WHEN ua.id IS NOT NULL THEN 'user' WHEN va.id IS NOT NULL THEN 'visitor' ELSE 'team' END) as author_type,
(CASE WHEN ua.id IS NOT NULL THEN ua.display_name WHEN va.id IS NOT NULL THEN va.name ELSE ta.name END) as author_name,
(CASE WHEN ua.id IS NOT NULL THEN ua.avatar WHEN va.id IS NOT NULL THEN va.avatar ELSE ta.icon END) as author_avatar
from events e
left join users ua on ua.id = e.user_actor_id
left join users us on us.id = e.user_subject_id
left join visitors va on va.id = e.visitor_actor_id
left join visitors vs on vs.id = e.visitor_subject_id
left join teams ta on ta.id = e.team_actor_id
left join teams ts on ts.id = e.team_subject_id
I would do the following:
create table tPeople ( -- contains as many rows as there are people
ID int,
Name nvarchar(max)
)
create table tRole ( -- contains three rows: Visitor, Team, User
ID int,
Name nvarchar(max)
)
create table tPeopleRole ( -- associates people with roles
People_ID int, -- FK to tPeople.ID
Role_ID int -- FK to tRole.ID
)
create table tEvent (
ID int,
Type_ID int,
InitiatedPeople_ID int, -- FK to tPeople.ID
ReceivedPeople_ID int -- FK to tPeople.ID
)
Then you can query tEvent and join on tPeople / tPeopleRole to get the initiator and receiver's names and/or roles.
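The join described above might look roughly like the following. This is a sketch only: it runs on SQLite through Python's sqlite3 (so the column types differ from the T-SQL above), the sample rows are invented, and it assumes each person has exactly one role.

```python
import sqlite3

# In-memory SQLite database; types adapted from the T-SQL sketch (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tRole (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tPeople (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tPeopleRole (
    People_ID INTEGER REFERENCES tPeople(ID),
    Role_ID INTEGER REFERENCES tRole(ID)
);
CREATE TABLE tEvent (
    ID INTEGER PRIMARY KEY,
    Type_ID INTEGER,
    InitiatedPeople_ID INTEGER REFERENCES tPeople(ID),
    ReceivedPeople_ID INTEGER REFERENCES tPeople(ID)
);
-- Made-up sample data: a user initiates an event received by a team.
INSERT INTO tRole VALUES (1, 'Visitor'), (2, 'Team'), (3, 'User');
INSERT INTO tPeople VALUES (1, 'alice'), (2, 'support');
INSERT INTO tPeopleRole VALUES (1, 3), (2, 2);
INSERT INTO tEvent VALUES (1, 10, 1, 2);
""")

# tPeople, tPeopleRole and tRole are each joined twice:
# once for the initiator and once for the receiver.
rows = conn.execute("""
SELECT e.ID,
       ip.Name AS actor_name,   ir.Name AS actor_type,
       rp.Name AS subject_name, rr.Name AS subject_type
FROM tEvent e
JOIN tPeople ip      ON ip.ID = e.InitiatedPeople_ID
JOIN tPeopleRole ipr ON ipr.People_ID = ip.ID
JOIN tRole ir        ON ir.ID = ipr.Role_ID
JOIN tPeople rp      ON rp.ID = e.ReceivedPeople_ID
JOIN tPeopleRole rpr ON rpr.People_ID = rp.ID
JOIN tRole rr        ON rr.ID = rpr.Role_ID
""").fetchall()

print(rows)
```

The query needs no CASE or COALESCE logic, because the actor and subject each come from a single table regardless of their role.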

Querying for who worked on an item first and second

I have a table that looks like this:
Id (PK, int, not null)
ReviewedBy (nvarchar(255), not null)
ReviewDateTime(datetime, not null)
Decision_id (int, not null)
Item_id (FK, int, not null)
The business process with this table is that each Item (shown by Item_id foreign key) is to be worked on by 2 people.
How can I query this table to determine who (ReviewedBy) reviewed the item first and who reviewed it second?
I'm really struggling to figure this out because I neglected to add a Type column to my table that would record which role the user was acting in. :(
Edit
Given the following data
Id,ReviewedBy,ReviewedWhen,SomeOtherId,
16,111111,2011-12-14 22:06:54,1,
17,187935,2011-12-14 22:07:03,1,
18,187935,2011-12-14 22:07:18,2,
19,187935,2011-12-14 22:07:20,3,
20,111111,2011-12-14 22:07:23,2,
21,187935,2011-12-14 22:07:26,3,
22,123456,2011-12-14 22:27:50,4,
with schema
CREATE TABLE [Reviews] (
[Id] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
[ReviewedBy] NVARCHAR(6) NOT NULL,
[ReviewedWhen] TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
[SomeOtherId] INTEGER NOT NULL
);
Executing the following to get a list of people who did second reviews will return rows where there is only one review for SomeOtherId.
select t1.*
from Reviews as t1
left outer join Reviews as t2
on (t1.SomeOtherId = t2.SomeOtherId and t1.ReviewedWhen < t2.ReviewedWhen)
where t2.SomeOtherId is null;
Solution
-- First checks
select t1.ReviewedBy, count(t1.Id)
from Reviews as t1
left outer join Reviews as t2
on (t2.SomeOtherId = t1.SomeOtherId and t1.ReviewedWhen > t2.ReviewedWhen)
where t2.SomeOtherID is null
group by t1.ReviewedBy;
-- Second checks
select t1.ReviewedBy, count(t1.Id)
from Reviews as t1
left outer join Reviews as t2
on (t2.SomeOtherId = t1.SomeOtherId and t1.ReviewedWhen < t2.ReviewedWhen)
where t2.SomeOtherID is null
and t1.Id not in (select Id from Reviews group by SomeOtherId having count(SomeOtherId) = 1)
group by t1.ReviewedBy;
Essentially, it was counting items where there was only one review as both a first and second check. All I had to do was ensure that when I'm counting second checks that I'm not including rows with only one review.
I thought I could achieve this in one query but guess not.
Try this:
select
t1.ReviewedBy FirstReviewer,
t2.ReviewedBy SecondReviewer
from
Table t1
left outer join Table t2 on t1.Item_Id = t2.Item_Id and t2.ReviewDateTime > t1.ReviewDateTime
If you want to only return rows that have been reviewed by two people, change the left outer join to an inner join.
If ReviewDateTime is never updated and Id is an identity column, you can change the join to join on Id rather than ReviewDateTime, which will be faster.
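Note that with a plain left outer self-join, the later review of an item also shows up as a "first" row with no second reviewer. One way to get a single row per item is to restrict the first side to reviews that have no earlier review for the same item. The sketch below tries this on the sample data from the question, using the Reviews schema (SQLite via Python's sqlite3). It assumes at most two reviews per item and distinct timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Reviews (
    Id INTEGER NOT NULL PRIMARY KEY,
    ReviewedBy TEXT NOT NULL,
    ReviewedWhen TEXT NOT NULL,
    SomeOtherId INTEGER NOT NULL
)""")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO Reviews VALUES (?, ?, ?, ?)",
    [
        (16, "111111", "2011-12-14 22:06:54", 1),
        (17, "187935", "2011-12-14 22:07:03", 1),
        (18, "187935", "2011-12-14 22:07:18", 2),
        (19, "187935", "2011-12-14 22:07:20", 3),
        (20, "111111", "2011-12-14 22:07:23", 2),
        (21, "187935", "2011-12-14 22:07:26", 3),
        (22, "123456", "2011-12-14 22:27:50", 4),
    ],
)

# f = the earliest review of each item, s = a later review of the same item.
pairs = conn.execute("""
SELECT f.SomeOtherId,
       f.ReviewedBy AS FirstReviewer,
       s.ReviewedBy AS SecondReviewer
FROM Reviews f
LEFT JOIN Reviews s
       ON s.SomeOtherId = f.SomeOtherId
      AND s.ReviewedWhen > f.ReviewedWhen
WHERE NOT EXISTS (
    SELECT 1 FROM Reviews e
    WHERE e.SomeOtherId = f.SomeOtherId
      AND e.ReviewedWhen < f.ReviewedWhen)
ORDER BY f.SomeOtherId
""").fetchall()

for row in pairs:
    print(row)
```

Items that have been reviewed only once come back with NULL as the second reviewer.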

PostgreSQL Query trimming results unnecessarily

I'm working on my first assignment using SQL on our class' PostgreSQL server. A sample database has the (partial here) schema:
CREATE TABLE users (
id int PRIMARY KEY,
userStatus varchar(100),
userType varchar(100),
userName varchar(100),
email varchar(100),
age int,
street varchar(100),
city varchar(100),
state varchar(100),
zip varchar(100),
CONSTRAINT users_status_fk FOREIGN KEY (userStatus) REFERENCES userStatus(name),
CONSTRAINT users_types_fk FOREIGN KEY (userType) REFERENCES userTypes(name)
);
CREATE TABLE events (
id int primary key,
title varchar(100),
edate date,
etime time,
location varchar(100),
user_id int, -- creator of the event
CONSTRAINT events_user_fk FOREIGN KEY (user_id) REFERENCES users(id)
);
CREATE TABLE polls (
id int PRIMARY KEY,
question varchar(100),
creationDate date,
user_id int, --creator of the poll
CONSTRAINT polls_user_fk FOREIGN KEY (user_id) REFERENCES users(id)
);
and a bunch of sample data (in particular, 127 sample users).
I have to write a query to find the number of polls created by a user within the past year, as well as the number of events created by a user that occurred in the past year. The trick is, I should have rows with 0s for both columns if the user had no such polls/events.
I have a query which seems to return the correct data, but only for 116 of the 127 users, and I cannot understand why the query is trimming these 11 users, when the WHERE clause only checks attributes of the poll/event. Following is my query:
SELECT u.id, u.userStatus, u.userType, u.email, -- Return user details
COUNT(DISTINCT e.id) AS NumEvents, -- Count number of events
COUNT(DISTINCT p.id) AS NumPolls -- Count number of polls
FROM (users AS u LEFT JOIN events AS e ON u.id = e.user_id) LEFT JOIN polls AS p ON u.id = p.user_id
WHERE (p.creationDate IS NULL OR ((now() - p.creationDate) < INTERVAL '1' YEAR) OR -- Only get polls created within last year
e.edate IS NULL OR ((now() - e.edate) < INTERVAL '1' YEAR)) -- Only get events that happened during last year
GROUP BY u.id, u.userStatus, u.userType, u.email;
Any help would be much appreciated.
Using a different query seemed to work. Here's what I ended up with:
SELECT u.id, u.userStatus, u.userType, u.email, COUNT(DISTINCT e.id) AS numevents, COUNT(DISTINCT p.id) AS numpolls
FROM users AS u LEFT OUTER JOIN (SELECT * FROM events WHERE ((now() - edate) < INTERVAL '1' YEAR)) AS e ON u.id = e.user_id
LEFT OUTER JOIN (SELECT * FROM polls WHERE ((now() - creationDate) < INTERVAL '1' YEAR)) AS p ON u.id = p.user_id
GROUP BY u.id, u.userStatus, u.userType, u.email
;
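The original query loses users because its WHERE clause runs after the joins: a user whose polls and events are all older than a year has no joined row that passes the filter, so the user disappears entirely. Filtering before or inside the join keeps those users with zero counts. The sketch below illustrates this with the date conditions placed in the ON clauses, which should behave like the filtered subqueries above. It assumes SQLite (Python's sqlite3) in place of PostgreSQL, trimmed-down tables, and invented rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, edate TEXT, user_id INTEGER REFERENCES users(id));
CREATE TABLE polls (id INTEGER PRIMARY KEY, creationDate TEXT, user_id INTEGER REFERENCES users(id));
-- User 1 has recent activity, user 2 only old activity, user 3 none.
INSERT INTO users VALUES (1, 'new@example.com'), (2, 'old@example.com'), (3, 'none@example.com');
INSERT INTO events VALUES (1, date('now', '-1 month'), 1), (2, date('now', '-3 years'), 2);
INSERT INTO polls VALUES (1, date('now', '-2 months'), 1), (2, date('now', '-4 years'), 2);
""")

# Filter in WHERE: applied after the joins, so user 2 is dropped.
where_filter = conn.execute("""
SELECT u.id, COUNT(DISTINCT e.id), COUNT(DISTINCT p.id)
FROM users u
LEFT JOIN events e ON u.id = e.user_id
LEFT JOIN polls p ON u.id = p.user_id
WHERE p.creationDate IS NULL OR p.creationDate > date('now', '-1 year')
   OR e.edate IS NULL OR e.edate > date('now', '-1 year')
GROUP BY u.id
ORDER BY u.id
""").fetchall()

# Filter in ON: old rows simply fail to join, the user row survives.
join_filter = conn.execute("""
SELECT u.id, COUNT(DISTINCT e.id), COUNT(DISTINCT p.id)
FROM users u
LEFT JOIN events e ON u.id = e.user_id AND e.edate > date('now', '-1 year')
LEFT JOIN polls p ON u.id = p.user_id AND p.creationDate > date('now', '-1 year')
GROUP BY u.id
ORDER BY u.id
""").fetchall()

print(where_filter)
print(join_filter)
```

Only the second query returns a row for every user, including the one whose activity is all older than a year.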
Try to avoid using DISTINCT where you can, for example by using sub-queries.