Select MAX price of a book with JOIN (2 tables) in SQL Server? - sql

I have 2 tables
CREATE TABLE BOOKS
(
numbk INT PRIMARY KEY IDENTITY,
nombk NVARCHAR(60),
_numrub INT FOREIGN KEY REFERENCES CLASSIFICATION(numrub)
)
CREATE TABLE TARIFER
(
_numbk INT FOREIGN KEY REFERENCES BOOKS(numbk),
_nomed NVARCHAR(60) FOREIGN KEY REFERENCES EDITEURS(nomed),
_date DATE,
price DECIMAL(20,2),
PRIMARY KEY (_numouv, _nomed)
)
The question is: how do I list all titles of books (nombk) that have the max price?
PS: TRAFIER has the price columns, and a foreign key from BOOKS which is _numbk
I tried this:
select
o.nombk, max(prix)
from
TARIFER tr, books o
where
o.numbk = tr._numbk
group by
o.nombk
This lists all, but when I execute this:
select max(prix)
from TARIFER tr, books o
where o.numbk = tr._numbk
It returns only the max price. I don't know why. Could someone please explain?

In SQL Server, you can use TOP (1) WITH TIES:
select top (1) with ties b.nombk, t.prix
from books b join
TARIFER t
on b.numbk = t._numbk
order by t.prix desc;

Why not use just a subquery the get the max(prix) and then use that one to list all records with that prix:
select o.nombk ,prix
from TARIFER tr , books o
where o.numbk = tr._numbk
and tr.prix in (select max(prix) from TARIFER tr)

Both queries to aggregation, but not at the same level:
The first query has group by o.nombk, so it generate one record per book, and gives you the maximum price of this book accross all tarifers.
The second query has no group by clause, hence it gives you the maximum price of all books over all tarifers.
If you want the book with the higher price, there is no need to aggregate: you can join and sort the results by price:
select top (1) with ties b.*, t.*
from books b
inner join join tarifer t on b.numbk = t._numbk
order by t.prix desc;
top (1) with ties gives you the first record; if there are several records with the same, top price, the query returns them all.

Related

Finding max count of product

I am trying to find max count of product. The result must only display the brands which have max number of products in it. Can anyone suggest a better way to display result.
Here are the table details:
create table Brand (
Id integer PRIMARY KEY,
Name VARCHAR(40) NOT NULL
)
create table Product (
ID integer PRIMARY KEY,
Name VARCHAR(40) NOT NULL,
Price integer NOT NULL,
BrandId integer NOT NULL,
CONSTRAINT BrandId FOREIGN KEY (BrandId)
REFERENCES Brand(Id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I have used this SQL query below but I am seeing an error below. The result must display the list of the brands that have max number of products as compared to other brands.
select count(p.id) as count, b.name AS brand_name from product p join brand b on b.Id = p.BrandId
where count(p.id) = (select max(count) from product)
group by b.name
ERROR: aggregate functions are not allowed in WHERE
LINE 2: where count(p.id) = (select max(count) from product)
^
SQL state: 42803
Character: 105
In Postgres 13+, you can use FETCH WITH TIES:
select count(*) as count, b.name AS brand_name
from product p join
brand b
on b.Id = p.BrandId
group by b.name
order by count(*) desc
fetch first 1 row with ties;
In older versions you can use window functions.

Writing a query to combine results from multiple tables with all possible combinations

I have this database schema:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL UNIQUE
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name char(50) NOT NULL,
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
uid INTEGER REFERENCES users (id) NOT NULL,
pid INTEGER REFERENCES products (id) NOT NULL,
quantity INTEGER NOT NULL,
price FLOAT NOT NULL CHECK (price >= 0)
);
I am trying to write a query that will give me all combinations of users and products, as well as the total amount spent by the user on that product. Specifically, if I have 5 products and 5 users, there should be 25 rows in the table. Right now I have a query that almost gets the job done, however, if the user has never purchased that product then there is no row printed at all.
Here's what I've written so far:
SELECT u.name as username, p.name as productname, SUM(o.quantity * o.price) as totalPrice
FROM users u, orders o, products p
WHERE u.id = o.uid
AND p.id = o.pid
GROUP BY u.name, p.name
ORDER BY u.name, p.name
I figure that this requires some sort of join, but my SQL knowledge is limited and I am not sure what would be the best way to go about doing this. I think if somebody can help me figure this out then I will have a much better understanding.
You can do this using cross join and left join:
select u.name as username, p.name as productname,
sum(o.quantity * o.price) as totalPrice
from users u cross join
products p left join
orders o
on o.uid = u.id and o.pid = p.id
group by u.name, p.name;
The cross join generates all the rows. The left join brings in the matching rows. A simple rule when using SQL is: Never use commas in the FROM clause. Always use explicit JOIN syntax.

Matching similar entities based on many to many relationship

I have two entities in my database that are connected with a many to many relationship. I was wondering what would be the best way to list which entities have the most similarities based on it?
I tried doing a count(*) with intersect, but the query takes too long to run on every entry in my database (there are about 20k records). When running the query I wrote, CPU usage jumps to 100% and the database has locking issues.
Here is some code showing what I've tried:
My tables look something along these lines:
/* 20k records */
create table Movie(
Id INT PRIMARY KEY,
Title varchar(255)
);
/* 200-300 records */
create table Tags(
Id INT PRIMARY KEY,
Desc varchar(255)
);
/* 200,000-300,000 records */
create table TagMovies(
Movie_Id INT,
Tag_Id INT,
PRIMARY KEY (Movie_Id, Tag_Id),
FOREIGN KEY (Movie_Id) REFERENCES Movie(Id),
FOREIGN KEY (Tag_Id) REFERENCES Tags(Id),
);
(This works, but it is terribly slow)
This is the query that I wrote to try and list them:
Usually I also filter with top 1 & add a where clause to get a specific set of related data.
SELECT
bk.Id,
rh.Id
FROM
Movies bk
CROSS APPLY (
SELECT TOP 15
b.Id,
/* Tags Score */
(
SELECT COUNT(*) FROM (
SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = bk.Id
INTERSECT
SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = b.Id
) Q1
)
as Amount
FROM
Movies b
WHERE
b.Id <> bk.Id
ORDER BY Amount DESC
) rh
Explanation:
Movies have tags and the user can get try to find movies similar to the one that they selected based on other movies that have similar tags.
Hmm ... just an idea, but maybe I didnt understand ...
This query should return best matched movies by tags for a given movie ID:
SELECT m.id, m.title, GROUP_CONCAT(DISTINCT t.Descr SEPARATOR ', ') as tags, count(*) as matches
FROM stack.Movie m
LEFT JOIN stack.TagMovies tm ON m.Id = tm.Movie_Id
LEFT JOIN stack.Tags t ON tm.Tag_Id = t.Id
WHERE m.id != 1
AND tm.Tag_Id IN (SELECT Tag_Id FROM stack.TagMovies tm WHERE tm.Movie_Id = 1)
GROUP BY m.id
ORDER BY matches DESC
LIMIT 15;
EDIT:
I just realized that it's for M$ SQL ... but maybe something similar can be done...
You should probably decide on a naming convention and stick with it. Are tables singular or plural nouns? I don't want to get into that debate, but pick one or the other.
Without access to your database I don't know how this will perform. It's just off the top of my head. You could also limit this by the M.id value to find the best matches for a single movie, which I think would improve performance by quite a bit.
Also, TOP x should let you get the x closest matches.
SELECT
M.id,
M.title,
SM.id AS similar_movie_id,
SM.title AS similar_movie_title,
COUNT(*) AS matched_tags
FROM
Movie M
INNER JOIN TagsMovie TM1 ON TM1.movie_id = M.movie_id
INNER JOIN TagsMovie TM2 ON
TM2.tag_id = TM1.tag_id AND
TM2.movie_id <> TM1.movie_id
INNER JOIN Movie SM ON SM.movie_id = TM2.movie_id
GROUP BY
M.id,
M.title,
SM.id AS similar_movie_id,
SM.title AS similar_movie_title
ORDER BY
COUNT(*) DESC

SQL: How to speed up a query using indexing

I am trying to speed up a query to find all CUSTOMERs who have bought a MOTORCYCLE manufactured before 1970 AND bought another MOTORCYCLE manufactured after 2010. Since my query is running very slowly, I think that I need help with finding the better indexes. My attempts are documented below:
Tables
CREATE TABLE CUSTOMER (
id int PRIMARY KEY,
fname varchar(30),
lname varchar(30)
);
CREATE TABLE MOTORCYCLE (
id int PRIMARY KEY,
name varchar(30),
year int -- Manufactured year
);
CREATE TABLE SALES (
cid int,
mid int,
FOREIGN KEY(cid) REFERENCES CUSTOMER(id),
FOREIGN KEY(mid) REFERENCES MOTOCYCLE(id),
PRIMARY KEY(pid, mid, role)
);
Indexes
Here are my indexes (I am somewhat guessing with these, but this was my attempt):
CREATE UNIQUE INDEX customerID on CUSTOMER(id);
CREATE INDEX customerName on CUSTOMER(fname, lname);
CREATE UNIQUE INDEX motorcycleID on MOTORCYCLE(id);
CREATE INDEX motorcycleName on MOTORCYCLE(name);
CREATE INDEX motorcycleYear on MOTORCYCLE(year);
CREATE INDEX salesCustomerMotorcycleID on SALES(cid, mid);
CREATE INDEX salesCustomerID on SALES(cid);
CREATE INDEX castsMotorcycleID on SALES(mid);
Queries
My query to find the customers purchasing bikes manufactured before 1970 and after 2010 is here:
SELECT fname, lname
FROM (SALES INNER JOIN CUSTOMER ON SALES.cid=CUSTOMER.id) INNER JOIN MOTORCYCLE ON MOTORCYCLE.id=SALES.mid
GROUP BY CUSTOMER.id
HAVING MIN(MOTORCYCLE.year) < 1970 AND MAX(MOTORCYCLE.year) > 2010;
And here is another working query which avoids the GROUP BY and HAVING clauses:
SELECT DISTINCT C.id, fname, lname
FROM (CUSTOMER as C inner join (SALES as S1 INNER JOIN MOTORCYCLE as M1 ON M1.id=S1.mid) on C.id=S1.cid) inner join (SALES as S2 inner join MOTORCYCLE as M2 on S2.mid=M2.id) on C.id=S2.cid
WHERE (M1.year < 1970 AND M2.year > 2010);
Any suggestions on the kinds of indexes I can use to speed up my query? Or should I change my query?
UPDATE
I found another query that also works, but it is also too slow. It has been added above. Still, it might be helpful when finding an index to speed it up.
When you check out your queries with EXPLAIN QUERY PLAN, you see that in both cases, the database looks up many related records before it filters out unneeded records (with unwanted years).
The following queries look up the motorcycle IDs before matching; which one is faster depends on the details of your data and must be measured by you:
SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
FROM Sales
WHERE cid = Customer.id
AND mid IN (SELECT id
FROM Motorcycle
WHERE year < 1970))
AND EXISTS (SELECT 1
FROM Sales
WHERE cid = Customer.id
AND mid IN (SELECT id
FROM Motorcycle
WHERE year > 2010));
SELECT *
FROM Customer
WHERE EXISTS (SELECT 1
FROM Sales AS s1
JOIN Sales AS s2 ON s1.cid = s2.cid
WHERE s1.cid = Customer.id
AND s1.mid IN (SELECT id
FROM Motorcycle
WHERE year < 1970)
AND s2.mid IN (SELECT id
FROM Motorcycle
WHERE year > 2010));
SQL Fiddle
Why using group by when there's no using of aggregation function in the query?
Use distinct instead if you don't want to see any duplication

Select newest entry from a joined MySQL table

I have stock quantity information in my database.
1 table, "stock", holds the productid (sku) along with the quantity and the filename from where it came.
The other table, "stockfile", contains all the processed filenames along with dates.
Now I need to get all the products with their latest stock quantity values.
This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)
SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC
I already tried this:
SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC
But it did not work
SHOW CREATE TABLE stock:
CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1
This is an example of the frequently-asked "greatest-n-per-group" question that we see every week on StackOverflow. Follow that tag to see other similar solutions.
SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;
If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.
Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.
I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.
So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.
SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;
I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.
I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."
Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;
This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.
Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.
Use:
SELECT DISTINCT s.stockid,
s.sku,
s.quantity,
sf.filename,
sf.date
FROM STOCK s
JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
JOIN (SELECT t.stockfileid,
MAX(t.date) 'max_date'
FROM STOCKFILE t
GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
AND x.max_date = sf.date
select *
from stock
where stockfileid in (
select top 1 stockfileid
from stockfile
order by date desc
)
There are two common ways to accomplish this: a sub query or a self-join.
See this example of selecting the group-wise maximum at the MySQL site.
Edit, an example using a subquery:
SELECT stock.stockid, stock.sku, stock.quantity,
stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);