Select newest entry from a joined MySQL table - sql

I have stock quantity information in my database.
1 table, "stock", holds the productid (sku) along with the quantity and the filename from where it came.
The other table, "stockfile", contains all the processed filenames along with dates.
Now I need to get all the products with their latest stock quantity values.
This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)
SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC
I already tried this:
SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC
But it did not work
SHOW CREATE TABLE stock:
CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1

This is an example of the frequently-asked "greatest-n-per-group" question that we see every week on StackOverflow. Follow that tag to see other similar solutions.
SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;
If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.
Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.
I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.
So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.
SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;
I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.
I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."
Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;
This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.
Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.

Use:
SELECT DISTINCT s.stockid,
s.sku,
s.quantity,
sf.filename,
sf.date
FROM STOCK s
JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
JOIN (SELECT t.stockfileid,
MAX(t.date) 'max_date'
FROM STOCKFILE t
GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
AND x.max_date = sf.date

select *
from stock
where stockfileid in (
select top 1 stockfileid
from stockfile
order by date desc
)

There are two common ways to accomplish this: a sub query or a self-join.
See this example of selecting the group-wise maximum at the MySQL site.
Edit, an example using a subquery:
SELECT stock.stockid, stock.sku, stock.quantity,
stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);

Related

How to show clients with 0 reservations in certain year? (SQL)

I have these tables:
CREATE TABLE tour
(
id bigserial NOT NULL,
end_date DATE,
initial_price float8 NOT NULL,
start_date DATE,
destination_id int8,
guide_id int8,
PRIMARY KEY (id)
);
CREATE TABLE client_data
(
id bigserial NOT NULL,
name VARCHAR(255),
passport_number VARCHAR(255),
surname VARCHAR(255),
user_data_id int8,
PRIMARY KEY (id)
);
CREATE TABLE reservation
(
id bigserial not null,
actual_price float8 not null,
client_id int8,
tour_id int8,
PRIMARY KEY (id)
);
Where every reservation is connected to client_data and tour.
My goal is to show all clients that has not made any reservation in certain year eg. clients that have no reservations in 2022.
I tried something like this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
GROUP BY client_data.name, reservation.id
HAVING COUNT(reservation.id) = 0;
Or this:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT OUTER JOIN reservation ON client_data.id = reservation.client_id
LEFT OUTER JOIN tour ON tour.id = reservation.tour_id
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
These both ways work and show me clients that have no reservations IN GENERAL but I also need to show clients from certain year.
When I try to include
WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
the SQL statement returns 0 rows.
Any ideas how to do this?
EDIT:
I'll add full data and schema i work with.
schema: https://pastebin.com/ETvrW1tQ
data: https://pastebin.com/h1WHT0zZ
You've gotten it almost right. The reason why WHERE tour.start_date BETWEEN '2022-01-01' AND '2022-12-31' returns 0 rows is because it filters out all those clients who didn't make a reservation in that period as WHERE is applied to whole result set. So, instead of adding the date condition in the WHERE clause, I'd suggest adding it in the join condition for tour. Moreover I believe an OUTER JOIN wouldn't be required here either as you just want all the clients so, a LEFT JOIN should be sufficient. I think the following should work:
SELECT client_data.name, reservation.id, COUNT(reservation.id)
FROM client_data
LEFT JOIN reservation ON client_data.id = reservation.client_id
LEFT JOIN tour ON tour.id = reservation.tour_id and tour.start_date BETWEEN '2022-01-01' AND '2022-12-31'
WHERE reservation.id IS NULL
GROUP BY client_data.name, reservation.id;
Hope it helps
Edit
As OP mentioned the above query doesn't work as intended, I think we'll have to resort to using a subquery (or cte) here which I previously wanted to avoid due to performance reasons but maybe we're getting too ahead of ourselves on that. It's possible we can avoid it but I can't think of the correct way at the moment so here's a solution with subquery that will hopefully work.
select * from client_data where id not in (
select distinct client_id from reservation r
join tour t on r.tour_id = t.id
where t.start_date BETWEEN '2022-01-01' AND '2022-12-31'
);
In this we first find out the client_ids that did make a reservation in the said time frame and filter them out from the client data.
Have attached a fiddle in which you can play around it a bit
This will return the tour id's in 2022 that do not have a corresponding tour id in reservation:
select id as tour_id
from tour
where start_date between '2022-01-01' and '2022-12-31'
except
select tour_id
from reservation;
But since TOUR does not have a client_id, then how would you expect to get the client_id or client_name?

Select MAX price of a book with JOIN (2 tables) in SQL Server?

I have 2 tables
CREATE TABLE BOOKS
(
numbk INT PRIMARY KEY IDENTITY,
nombk NVARCHAR(60),
_numrub INT FOREIGN KEY REFERENCES CLASSIFICATION(numrub)
)
CREATE TABLE TARIFER
(
_numbk INT FOREIGN KEY REFERENCES BOOKS(numbk),
_nomed NVARCHAR(60) FOREIGN KEY REFERENCES EDITEURS(nomed),
_date DATE,
price DECIMAL(20,2),
PRIMARY KEY (_numouv, _nomed)
)
The question is: how do I list all titles of books (nombk) that have the max price?
PS: TRAFIER has the price columns, and a foreign key from BOOKS which is _numbk
I tried this:
select
o.nombk, max(prix)
from
TARIFER tr, books o
where
o.numbk = tr._numbk
group by
o.nombk
This lists all, but when I execute this:
select max(prix)
from TARIFER tr, books o
where o.numbk = tr._numbk
It returns only the max price. I don't know why. Could someone please explain?
In SQL Server, you can use TOP (1) WITH TIES:
select top (1) with ties b.nombk, t.prix
from books b join
TARIFER t
on b.numbk = t._numbk
order by t.prix desc;
Why not use just a subquery the get the max(prix) and then use that one to list all records with that prix:
select o.nombk ,prix
from TARIFER tr , books o
where o.numbk = tr._numbk
and tr.prix in (select max(prix) from TARIFER tr)
Both queries to aggregation, but not at the same level:
The first query has group by o.nombk, so it generate one record per book, and gives you the maximum price of this book accross all tarifers.
The second query has no group by clause, hence it gives you the maximum price of all books over all tarifers.
If you want the book with the higher price, there is no need to aggregate: you can join and sort the results by price:
select top (1) with ties b.*, t.*
from books b
inner join join tarifer t on b.numbk = t._numbk
order by t.prix desc;
top (1) with ties gives you the first record; if there are several records with the same, top price, the query returns them all.

How to successfully use JOIN queries?

These are my schemas:
CREATE TABLE CUSTOMER
(
customerID numeric,
name text,
email varchar(320),
cell varchar,
address varchar,
flag text NULL,
PRIMARY KEY(customerID)
);
CREATE TABLE REFERRALS
(
customerID numeric NOT NULL,
name text NOT NULL,
PRIMARY KEY(customerID, name)
);
CREATE TABLE RENTAL
(
customerID numeric NOT NULL,
model numeric NOT NULL,
borrowDate timestamp NOT NULL,
dueDate date NOT NULL,
charge money NOT NULL,
returnDate timestamp NULL,
addFees money NULL,
notes text NULL,
PRIMARY KEY(customerID, model, borrowDate)
);
CREATE TABLE SCOOTER
(
model bigserial NOT NULL,
manufacturer text NOT NULL,
country text NOT NULL,
range numeric NOT NULL,
weight numeric NOT NULL,
topspeed numeric NOT NULL,
condition text NOT NULL,
availability text NOT NULL,
PRIMARY KEY(model)
);
For the first query, I want to show the model and manufacturer columns from SCOOTER, the name column from CUSTOMER, and the dueDate column from RENTAL, but only for in rows where SCOOTER.model = RENTAL.model and where RENTAL.returnDate is NULL. And finally, in descending order by dueDate.
This is the query I wrote:
SELECT
s.model, s.manufacturer, c.name, r.duedate
FROM
SCOOTER AS s, CUSTOMER AS c
INNER JOIN
RENTAL AS r ON r.model = s.model AND r.returnDate IS NULL
ORDER BY
r.duedate DESC;
I get this error however:
HINT: There is an entry for table "s", but it cannot be referenced from this part of the query.
STATEMENT: SELECT s.model, s.manufacturer, c.name, r.duedate FROM SCOOTER AS s, CUSTOMER AS c
INNER JOIN RENTAL AS r ON r.model = s.model AND r.returnDate IS NULL ORDER BY r.duedate desc;
ERROR: invalid reference to FROM-clause entry for table "s"
LINE 2: INNER JOIN RENTAL AS r ON r.model = s.model AND r.returnDate...
^
HINT: There is an entry for table "s", but it cannot be referenced from this part of the query.
Well, I think you should study a little bit better SQL. You only connect table RENTAL and SCOOTER, but you left out the connection with CUSTOMER.
Your code should probably look more like
SELECT SCOOTER.model, SCOOTER.manufacturer, CUSTOMER.name, RENTAL.duedate
FROM SCOOTER
INNER JOIN RENTAL ON RENTAL.model = SCOOTER.model
INNER JOIN CUSTOMER ON RENTAL.customerID = CUSTOMER.customerID
WHERE RENTAL.returnDate IS NULL ORDER BY RENTAL.duedate desc;
Hope it helps!
Cheers
You're mixing join styles there, something to be avoided. Joins look like this:
SELECT * FROM
a
INNER JOIN b ON a.column = b.column
INNER JOIN c ON a.column = c.column ...
Every row from a is connected to every row from b, where the ON clause is true. Then every row from a-b is connected to C again where the ON clause is true. This causes the data to grow sideways as more data from more tables is joined on. Tables can even be joined to themselves.
It's hard (and off topic for SO) to go into depth about every aspect of JOINs so some background reading will probably be essential

What is the most efficient way of joining tables of different dimensions?

I have the following schema:
CREATE TABLE products (
id BIGSERIAL NOT NULL,
created_at_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
last_update_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id)
);
CREATE TABLE product_names (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
CREATE TABLE product_summaries (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
summary TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
And I want to select all Products.
However as you can see a Product contains a list of names and summaries (per language).
I can retrieve all Products
SELECT * FROM products
And then iterate all the rows (in this case in Kotlin), and then request the names and summaries:
SELECT * FROM product_names WHERE product_id = $id
And
SELECT * FROM product_summaries WHERE product_id = $id
However, this seems inefficient, since I am making 3 separate queries to the database.
I though of using JOINs to get all of this with one query, but then I get multiple repeated rows for each product_names and product_summaries entry.
So in the end, is there a better way of requesting all this data in one query?
You definitely don't want to do multiple queries and then iterate over them in the code. That's horribly inefficient. When you do the second JOIN, you need to include language in the JOIN. That should keep you from getting duplicate rows. This should give you one row for each unique combination of [products.id, product_names.language]
SELECT
products.id
,products.created_at_timestamp
,products.last_update_timestamp
,product_names.name
,product_summaries.summary
,product_names.language
FROM
products
INNER JOIN
product_names ON product_names.product_id = products.id
INNER JOIN
product_summaries ON product_summaries.product_id = products.id
AND product_summaries.language = product_names.language
I've found a way of doing it:
SELECT * FROM products as p INNER JOIN
(SELECT json_agg(product_names) as names, product_id FROM product_names GROUP BY product_id) as tb_names ON tb_names.product_id = p.id
INNER JOIN
(SELECT json_agg(product_summaries) as summaries, product_id FROM product_summaries GROUP BY product_id) as tb_summaries ON tb_summaries.product_id = p.id
returns:
1 | 2018-07-20 09:36:21.56904 | 2018-07-20 09:36:21.56904 | [{"product_id":1,"language":"EN","name":"lol"},
{"product_id":1,"language":"DE","name":"lel"}] | 1 [{"product_id":1,"language":"EN","summary":"deded"},
{"product_id":1,"language":"DE","summary":"rererere"},
{"product_id":1,"language":"FR","summary":"jejejeje"}] | 1
Basically I'm converting the multi-dimensional tables to JSON :)
Postgres is amazing!

Can I SQL join a table twice?

I have two entities: Proposal and Vote.
Proposal: A user can make a proposition.
Vote: A user can vote for a proposition.
CREATE TABLE `proposal` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
);
CREATE TABLE `vote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idea_id` int(11) NOT NULL,
`updated` datetime NOT NULL,
PRIMARY KEY (`id`),
);
Now I want to fetch rising Propsals, which means:
Proposal title
Total number of all time votes
has received votes within the last 3 days
I am trying to fetch without a subSELECT because I am using doctrine which doesn't allow subSELECTs. So my approach is to fetch by joining the votes table twice (first for fetching the total amount of votes, second to be able to create a WHERE clause to filter last 3 days) and do a INNER JOIN:
SELECT
p.title,
COUNT(v.p_id) AS votes,
DATEDIFF(NOW(), DATE(x.updated))
FROM proposal p
JOIN vote v ON p.id = v.p_id
INNER JOIN vote x ON p.id = x.p_id
WHERE DATEDIFF(NOW(), DATE(x.updated)) < 3
GROUP BY p.id
ORDER BY votes DESC;
It's clear that this will return a wrong votes amount as it triples the votes' COUNT(). It's actually , because it creates a cartesian product just as a CROSS JOIN does.
Is there any way I can get the proper amount without using a subSELECT?
Instead, you can create a kind of COUNTIF function using this pattern:
- COUNT(CASE WHEN <condition> THEN <field> ELSE NULL END)
For example...
SELECT
p.title,
COUNT(v.p_id) AS votes,
COUNT(CASE WHEN v.updated >= DATEADD(DAY, -3, CURRENT_DATE()) THEN v.p_id ELSE NULL END) AS new_votes
FROM
proposal p
JOIN
vote v
ON p.id = v.p_id
GROUP BY
p.title
ORDER BY
COUNT(v.p_id) DESC
;