I have these tables:
CREATE TABLE tour
(
id bigserial NOT NULL,
end_date DATE,
initial_price float8 NOT NULL,
start_date DATE,
destination_id int8,
guide_id int8,
PRIMARY KEY (id)
);
CREATE TABLE client_data
(
id bigserial NOT NULL,
name VARCHAR(255),
passport_number VARCHAR(255),
surname VARCHAR(255),
user_data_id int8,
PRIMARY KEY (id)
);
CREATE TABLE reservation
(
id bigserial not null,
actual_price float8 not null,
client_id int8,
tour_id int8,
PRIMARY KEY (id)
);
Where every reservation is connected to client_data and tour.
My goal is to show all clients that have not made any reservation in a certain year, e.g. clients that have no reservations in 2022.
I tried something like this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
GROUP BY client_data.name, reservation.id
HAVING COUNT(reservation.id) = 0;
Or this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Both of these work and show me clients that have no reservations IN GENERAL, but I need to show clients that have no reservations in a certain year.
When I try to include
WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
the SQL statement returns 0 rows.
Any ideas how to do this?
EDIT:
I'll add the full data and schema I work with.
schema: https://pastebin.com/ETvrW1tQ
data: https://pastebin.com/h1WHT0zZ
You've gotten it almost right. The reason why WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31' returns 0 rows is that WHERE is applied to the whole result set after the joins. For a client without a reservation, tour.start_date is NULL, so the date condition removes exactly the rows you are looking for. So, instead of adding the date condition in the WHERE clause, I'd suggest adding it in the join condition for tour. Note also that LEFT JOIN and LEFT OUTER JOIN are synonyms, so the OUTER keyword can be dropped. I think the following should work:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT JOIN reservation ON client_data.id = reservation.client_id
LEFT JOIN tour ON tour.id = reservation.tour_id and tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Hope it helps
Edit
As OP mentioned, the above query doesn't work as intended, so I think we'll have to resort to a subquery (or CTE) here. I previously wanted to avoid one for performance reasons, but maybe we're getting ahead of ourselves on that. It may be possible to avoid it, but I can't think of the correct way at the moment, so here's a solution with a subquery that will hopefully work.
select * from client_data where id not in (
select distinct client_id from reservation r
join tour t on r.tour_id = t.id
where t.start_date BETWEEN '2022-01-01' AND '2022-12-31'
);
Here we first find the client_ids that did make a reservation in the given time frame and then filter them out of the client data.
I have attached a fiddle in which you can play around with it a bit.
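The same idea can also be written with NOT EXISTS, which avoids a pitfall of NOT IN: reservation.client_id is nullable in the schema above, and a single NULL in the subquery result makes NOT IN return no rows at all. The following is only a minimal sketch, assuming an in-memory SQLite database as a stand-in for PostgreSQL, made-up sample rows, and a hypothetical helper name clients_without_reservation_in:

```python
import sqlite3

def clients_without_reservation_in(conn, range_start, range_end):
    """Names of clients with no reservation for a tour starting in the range."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM client_data c
        WHERE NOT EXISTS (
            SELECT 1
            FROM reservation r
            JOIN tour t ON t.id = r.tour_id
            WHERE r.client_id = c.id
              AND t.start_date BETWEEN ? AND ?
        )
        ORDER BY c.name
        """,
        (range_start, range_end),
    ).fetchall()
    return [name for (name,) in rows]

# Hypothetical sample data, trimmed to the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tour (id INTEGER PRIMARY KEY, start_date TEXT);
    CREATE TABLE client_data (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reservation (id INTEGER PRIMARY KEY, client_id INTEGER, tour_id INTEGER);
    INSERT INTO tour VALUES (1, '2021-06-01'), (2, '2022-07-15');
    INSERT INTO client_data VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Ann booked a 2022 tour, Bob only a 2021 tour, Cid nothing.
    -- The last reservation has a NULL client_id on purpose.
    INSERT INTO reservation VALUES (1, 1, 2), (2, 2, 1), (3, NULL, 2);
    """
)

result = clients_without_reservation_in(conn, "2022-01-01", "2022-12-31")
print(result)
```

With this sample data, Bob and Cid are returned, and the reservation with a NULL client_id does not disturb the result.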
This will return the ids of tours in 2022 that do not have a corresponding tour_id in reservation:
select id as tour_id
from tour
where start_date between '2022-01-01' and '2022-12-31'
except
select tour_id
from reservation;
But since TOUR does not have a client_id, then how would you expect to get the client_id or client_name?
Related
I have 3 tables defined like so
CREATE TABLE participants(
id SERIAL PRIMARY KEY,
Name TEXT NOT NULL,
Title TEXT NOT NULL
);
CREATE TABLE meetings (
id SERIAL PRIMARY KEY,
Subject TEXT NOT NULL,
Organizer TEXT NOT NULL,
StartTime TIMESTAMP NOT NULL,
EndTime TIMESTAMP NOT NULL
);
CREATE TABLE meetings_participants(
meeting_id int not null,
participant_id int not null,
primary key (meeting_id, participant_id),
foreign key(meeting_id) references meetings(id),
foreign key(participant_id) references participants(id)
);
I want to find meetings happening today with participants in them.
When I run this query I basically get them
SELECT * from meetings
INNER JOIN meetings_participants ON meetings.id = meetings_participants.meeting_id
INNER JOIN participants ON meetings_participants.participant_id = participants.id
WHERE starttime::date = NOW()::date;
The problem is that this query discards meetings that have no participants yet, and I still want to include them in my query result. How can I modify my query to work like that?
You need a LEFT JOIN instead of INNER. By casting with ::date you imply that you are only interested in meetings taking place today, whether or not they have already ended. Still, you should include EndTime in your query, since there might be meetings that span several days:
SELECT * from meetings
left join meetings_participants on meetings.id = meetings_participants.meeting_id
left join participants on meetings_participants.participant_id = participants.id
WHERE starttime::date <= NOW()::date and endtime::date >= NOW()::date ;
DBFiddle demo here.
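To see the effect of the LEFT JOIN on a meeting without participants, here is a small sketch. It assumes SQLite as a stand-in for PostgreSQL (so date(...) replaces the ::date cast), made-up rows, a fixed "today" passed in as a parameter instead of NOW(), and a hypothetical helper name meetings_on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE meetings (id INTEGER PRIMARY KEY, subject TEXT NOT NULL,
                           starttime TEXT NOT NULL, endtime TEXT NOT NULL);
    CREATE TABLE meetings_participants (meeting_id INTEGER NOT NULL,
                                        participant_id INTEGER NOT NULL,
                                        PRIMARY KEY (meeting_id, participant_id));
    INSERT INTO participants VALUES (1, 'Ann');
    INSERT INTO meetings VALUES
        (1, 'Standup',  '2024-05-02 09:00:00', '2024-05-02 09:15:00'),
        (2, 'Planning', '2024-05-02 13:00:00', '2024-05-02 14:00:00'),
        (3, 'Retro',    '2024-05-03 10:00:00', '2024-05-03 11:00:00');
    -- Only the standup has a participant so far.
    INSERT INTO meetings_participants VALUES (1, 1);
    """
)

def meetings_on(conn, day):
    """Meetings overlapping the given day, with participant names where present."""
    return conn.execute(
        """
        SELECT m.subject, p.name
        FROM meetings m
        LEFT JOIN meetings_participants mp ON mp.meeting_id = m.id
        LEFT JOIN participants p ON p.id = mp.participant_id
        WHERE date(m.starttime) <= ? AND date(m.endtime) >= ?
        ORDER BY m.id
        """,
        (day, day),
    ).fetchall()

todays = meetings_on(conn, "2024-05-02")
print(todays)
```

The Planning meeting is kept with a NULL (None in Python) participant name, whereas an INNER JOIN would drop it.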
EDIT: Participants' name and title as JSON array:
SELECT id, subject, organizer, starttime, endtime, jsonb_pretty(tmp.participants)
from meetings m
left join lateral (
select jsonb_agg(row_to_json(tp)) as participants
from (select p.name, p.title
from meetings_participants mp
inner join participants p on mp.participant_id = p.id
where mp.meeting_id = m.id
) tp
) tmp on true
WHERE starttime::date <= NOW()::date
and endtime::date >= NOW()::date;
DBFiddle demo for participants added as JSON
You did not mention whether you want each participant on a separate row or as an aggregate (e.g. a comma-separated list). If the former, change inner to left join. For the latter case you could use:
SELECT meetings.*, (
SELECT string_agg(participants.name, ', ')
FROM meetings_participants
JOIN participants ON meetings_participants.participant_id = participants.id
WHERE meetings_participants.meeting_id = meetings.id
) AS participants_list
FROM meetings
WHERE starttime::date = current_date
I have a table which holds details of all Students currently enrolled in classes which looks like this:
CREATE TABLE studentInClass(
studentID int,
classID int,
FOREIGN KEY(studentID) references students(studentID),
foreign key(classID) references class(classID)
);
And another table which contains details of students who have paid for classes:
CREATE TABLE fees(
feesID INTEGER PRIMARY KEY AUTOINCREMENT,
StudentID INTEGER,
AmountPaid INT,
Date DATE,
FOREIGN KEY(StudentID) REFERENCES students(StudentID));
What I want to do is check whether a student who is in a class has not paid for that class. I am struggling to write a SQL query which does so. I have tried multiple queries such as:
Select studentInClass.StudentID
from fees, studentInClass
where fees.StudentID = studentInClass.StudentID;
But this returns no data. I'm not sure how to proceed from here. Any help will be appreciated.
You want an outer join:
select s.StudentID, (case when f.AmountPaid is not null
then 'Yes'
else 'No'
end) as Is_fees_paid
from studentInClass s left join
fees f
on f.StudentID = s.StudentID;
With NOT EXISTS:
select s.*
from studentInClass s
where not exists (
select 1 from fees
where studentid = s.studentid
)
With this you get all the rows from the table studentInClass whose studentid does not appear in the table fees.
It's not clear if you also need to check the date.
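Check this, please. Note that an inner join returns the students that do have a row in fees, i.e. those who have paid: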
select studentInClass.StudentID
from studentInClass inner join fees ON fees.StudentID = studentInClass.StudentID
I have a table, meeting. Among these meetings there exists a hierarchy. Some of them are yearly meetings, most are just regular meetings.
All regular meetings will be associated with at least one join table, meeting_yearly_meeting. A meeting_yearly_meeting has two columns: meeting_id and yearly_meeting_id.
Here is what these two tables look like:
meeting:
id SERIAL PRIMARY KEY,
title VARCHAR(255),
mappable BOOLEAN,
phone VARCHAR(255),
email VARCHAR(255),
city VARCHAR(255),
address VARCHAR(255),
zip VARCHAR(255),
latitude NUMERIC,
longitude NUMERIC,
description VARCHAR(255),
worship_time TIME,
state VARCHAR(255),
website VARCHAR(255),
lgbt_affirming BOOLEAN,
created TIMESTAMP default current_timestamp,
updated TIMESTAMP default current_timestamp
meeting_yearly_meeting:
id SERIAL PRIMARY KEY,
meeting_id SMALLINT,
yearly_meeting_id SMALLINT,
created TIMESTAMP default current_timestamp,
updated TIMESTAMP default current_timestamp
So from my /meetings endpoint, I want to return a collection of all meetings - both regular and yearly meetings. I want to return the meetings with all their columns, as well as an additional column: yearly_meeting.
For meeting records that have one or more associated meeting_yearly_meeting records, yearly_meeting would be a comma-delimited list of the titles of the meeting records that are designated as that meeting's yearly meetings. For those meetings that do not have any associated meeting_yearly_meeting records (and therefore are themselves yearly meetings), I want the yearly_meeting field to be NULL.
On my way to pursuing this goal, I tried something like this:
SELECT t1.*, t2.meeting_yearly_meeting AS yearly_meeting
FROM (
SELECT * FROM meeting
FULL JOIN meeting_yearly_meeting ON meeting.id = meeting_yearly_meeting.yearly_meeting_id;
) as t1,
(
SELECT CASE WHEN (meeting_yearly_meeting.id IS NOT NULL)
THEN (SELECT title FROM meeting WHERE meeting.id = meeting_yearly_meeting.yearly_meeting_id)
ELSE NULL
END
FROM (
SELECT meeting_yearly_meeting.* FROM meeting
FULL JOIN meeting_yearly_meeting ON meeting.id = meeting_yearly_meeting.meeting_id
) as meeting_yearly_meeting;
) as t2;
But this throws a syntax error.
I appreciate any insight others might have. Please let me know if there is any additional context or clarification you need!
UPDATE:
Sample meeting data: https://gist.github.com/micahbales/4013399c3fd23a0caf108124dab827c8
Sample meeting_yearly_meeting data: https://gist.github.com/micahbales/fcbdeef282bd7bf1014606cee43bfb5e
Expected return value example: https://gist.github.com/micahbales/13d2aafdc5d43c4b948dc39c2df51569
You can try to left join the yearly meetings and then use string_agg() to get your comma delimited list.
SELECT m1.id,
m1.title,
m1.mappable,
m1.phone,
m1.email,
m1.city,
m1.address,
m1.zip,
m1.latitude,
m1.longitude,
m1.description,
m1.worship_time,
m1.state,
m1.website,
m1.lgbt_affirming,
m1.created,
m1.updated,
string_agg(m2.title, ', ') yearly_meeting
FROM meeting m1
LEFT JOIN meeting_yearly_meeting mym1
ON mym1.meeting_id = m1.id
LEFT JOIN meeting m2
ON m2.id = mym1.yearly_meeting_id
GROUP BY m1.id,
m1.title,
m1.mappable,
m1.phone,
m1.email,
m1.city,
m1.address,
m1.zip,
m1.latitude,
m1.longitude,
m1.description,
m1.worship_time,
m1.state,
m1.website,
m1.lgbt_affirming,
m1.created,
m1.updated;
Edit:
A more "compact" solution could be using a correlated subquery.
SELECT m1.*,
(SELECT string_agg(m2.title, ', ')
FROM meeting_yearly_meeting mym1
LEFT JOIN meeting m2
ON m2.id = mym1.yearly_meeting_id
WHERE mym1.meeting_id = m1.id) yearly_meeting
FROM meeting m1;
But note, though it's less code, it's not necessarily faster.
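The correlated-subquery variant can be tried out quickly with the sketch below. It assumes SQLite as a stand-in for PostgreSQL, so group_concat takes the place of string_agg, and it uses made-up rows with only the columns that matter here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE meeting (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE meeting_yearly_meeting (id INTEGER PRIMARY KEY,
                                         meeting_id INTEGER,
                                         yearly_meeting_id INTEGER);
    INSERT INTO meeting VALUES (1, 'Yearly North'), (2, 'Yearly South'), (3, 'Local Friends');
    -- The local meeting belongs to both yearly meetings.
    INSERT INTO meeting_yearly_meeting (meeting_id, yearly_meeting_id) VALUES (3, 1), (3, 2);
    """
)

rows = conn.execute(
    """
    SELECT m1.id, m1.title,
           (SELECT group_concat(m2.title, ', ')
            FROM meeting_yearly_meeting mym
            JOIN meeting m2 ON m2.id = mym.yearly_meeting_id
            WHERE mym.meeting_id = m1.id) AS yearly_meeting
    FROM meeting m1
    ORDER BY m1.id
    """
).fetchall()

# Map each meeting title to its aggregated yearly meeting titles (or None).
yearly_by_title = {title: yearly for _, title, yearly in rows}
print(yearly_by_title)
```

Meetings without any meeting_yearly_meeting row get NULL (None in Python), because the aggregate runs over an empty set. The order of titles inside the list is not guaranteed unless the aggregate is given an explicit ordering.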
I have two entities: Proposal and Vote.
Proposal: A user can make a proposition.
Vote: A user can vote for a proposition.
CREATE TABLE `proposal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `vote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idea_id` int(11) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`)
);
Now I want to fetch rising proposals, which means:
- Proposal title
- Total number of all-time votes
- Has received votes within the last 3 days
I am trying to fetch this without a subSELECT because I am using Doctrine, which doesn't allow subSELECTs. So my approach is to join the votes table twice (first to fetch the total number of votes, second to be able to write a WHERE clause that filters on the last 3 days) using an INNER JOIN:
SELECT
p.title,
COUNT(v.p_id) AS votes,
DATEDIFF(NOW(), DATE(x.updated))
FROM proposal p
JOIN vote v ON p.id = v.p_id
INNER JOIN vote x ON p.id = x.p_id
WHERE DATEDIFF(NOW(), DATE(x.updated)) < 3
GROUP BY p.id
ORDER BY votes DESC;
It's clear that this returns a wrong vote count, as it triples the votes' COUNT(). That happens because the two joins pair every vote of a proposal with every recent vote of the same proposal, creating a cartesian product just as a CROSS JOIN does.
Is there any way I can get the proper amount without using a subSELECT?
Instead, you can create a kind of COUNTIF function using this pattern:
- COUNT(CASE WHEN <condition> THEN <field> ELSE NULL END)
For example...
SELECT
p.title,
COUNT(v.p_id) AS votes,
COUNT(CASE WHEN v.updated >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY) THEN v.p_id ELSE NULL END) AS new_votes
FROM
proposal p
JOIN
vote v
ON p.id = v.p_id
GROUP BY
p.title
ORDER BY
COUNT(v.p_id) DESC
;
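The COUNT(CASE ...) pattern can be checked on a tiny data set. The sketch below assumes SQLite as a stand-in for MySQL (date(?, '-3 days') replaces the MySQL date arithmetic), a vote column named p_id as used in the queries, made-up rows, and a fixed "today" so the outcome is reproducible. The HAVING clause is an addition that keeps only proposals with at least one recent vote, and rising_proposals is a hypothetical helper name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE proposal (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, p_id INTEGER NOT NULL, updated TEXT NOT NULL);
    INSERT INTO proposal VALUES (1, 'Bike lanes'), (2, 'New park');
    INSERT INTO vote (p_id, updated) VALUES
        (1, '2024-03-01'), (1, '2024-03-09'), (1, '2024-03-10'),
        (2, '2024-02-01'), (2, '2024-02-02');
    """
)

def rising_proposals(conn, today):
    """Total and recent (last 3 days) vote counts from a single join."""
    return conn.execute(
        """
        SELECT p.title,
               COUNT(v.p_id) AS votes,
               COUNT(CASE WHEN v.updated >= date(?, '-3 days') THEN v.p_id END) AS new_votes
        FROM proposal p
        JOIN vote v ON v.p_id = p.id
        GROUP BY p.id, p.title
        HAVING COUNT(CASE WHEN v.updated >= date(?, '-3 days') THEN v.p_id END) > 0
        ORDER BY votes DESC
        """,
        (today, today),
    ).fetchall()

rising = rising_proposals(conn, "2024-03-10")
print(rising)
```

Because vote is joined only once, each vote is counted once: the first proposal reports 3 votes in total and 2 recent ones, and the second proposal is dropped for having no recent votes.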
I have stock quantity information in my database.
One table, "stock", holds the product id (sku) along with the quantity and the id of the file it came from.
The other table, "stockfile", contains all the processed filenames along with dates.
Now I need to get all the products with their latest stock quantity values.
This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)
SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC
I already tried this:
SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC
But it did not work
SHOW CREATE TABLE stock:
CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1
This is an example of the frequently asked "greatest-n-per-group" question that we see every week on Stack Overflow. Follow that tag to see other similar solutions.
SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;
If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.
Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.
I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.
So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.
SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;
I'm assuming you want "latest" to be relative to touchdate; if you want to use creationdate instead, you can edit the query accordingly.
I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."
Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;
This joins stock to itself, looking for a row with the same sku and a more recent date. When none is found, s1 contains the most recent row for its sku. Each instance of stock also has to join to its stockfile to get the date.
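The anti-join idea can be tried on a few rows with the sketch below. It assumes SQLite as a stand-in for MySQL and made-up data, and it wraps the stock/stockfile join in a CTE (a hypothetical name, dated) instead of the parenthesized nested join. That is a substitution for readability: CTEs need MySQL 8.0 or later, while the nested-join form above also runs on older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stockfile (stockfileid INTEGER PRIMARY KEY, filename TEXT, date TEXT);
    CREATE TABLE stock (stockid INTEGER PRIMARY KEY, sku TEXT,
                        quantity INTEGER, stockfileid INTEGER);
    INSERT INTO stockfile VALUES (1, 'jan.csv', '2024-01-01'), (2, 'feb.csv', '2024-02-01');
    -- sku A appears in both files, sku B only in the older one.
    INSERT INTO stock VALUES (1, 'A', 10, 1), (2, 'A', 7, 2), (3, 'B', 5, 1);
    """
)

latest = conn.execute(
    """
    WITH dated AS (
        SELECT s.sku, s.quantity, f.filename, f.date, f.stockfileid
        FROM stock s
        JOIN stockfile f ON f.stockfileid = s.stockfileid
    )
    SELECT d1.sku, d1.quantity, d1.filename
    FROM dated d1
    LEFT JOIN dated d2
      ON d1.sku = d2.sku
     AND (d1.date < d2.date
          OR (d1.date = d2.date AND d1.stockfileid < d2.stockfileid))
    WHERE d2.sku IS NULL   -- no newer row exists for this sku
    ORDER BY d1.sku
    """
).fetchall()
print(latest)
```

For sku A only the quantity from the newer file survives, and sku B keeps its single row.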
Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.
Use:
SELECT DISTINCT s.stockid,
s.sku,
s.quantity,
sf.filename,
sf.date
FROM STOCK s
JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
JOIN (SELECT t.stockfileid,
MAX(t.date) 'max_date'
FROM STOCKFILE t
GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
AND x.max_date = sf.date
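-- MySQL has no TOP; use LIMIT in a scalar subquery instead
-- (LIMIT is not allowed inside an IN subquery).
select *
from stock
where stockfileid = (
select stockfileid
from stockfile
order by date desc
limit 1
)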
There are two common ways to accomplish this: a subquery or a self-join.
See this example of selecting the group-wise maximum at the MySQL site.
Edit, an example using a subquery:
SELECT stock.stockid, stock.sku, stock.quantity,
stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);