Flattening nested query in WHERE clause with NOT IN - sql

Suppose I have these two tables, simplified for the purpose of the question:
CREATE TABLE merchandises
(
id BIGSERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price INT NOT NULL
);
CREATE TABLE gifts
(
id BIGSERIAL NOT NULL PRIMARY KEY,
from_user VARCHAR(255) REFERENCES users(id),
to_user VARCHAR(255) REFERENCES users(id),
with_merchandise BIGINT REFERENCES merchandises(id)
);
The merchandises table lists the available merchandise. The gifts table records that a user has sent an item of merchandise to another user as a gift (a proper index is in place to avoid duplication).
What I would like to query is a list of merchandise that one user can send to another user, excluding any merchandise that the first user has already gifted to the second.
This is a query that works, but I hope to find one without a nested query, thinking that it might perform better thanks to the PostgreSQL optimizer.
SELECT DISTINCT ON (m.id) m.id, m.name, m.description
FROM merchandises m
WHERE m.id NOT IN (
SELECT g.with_merchandise
FROM gifts g
WHERE g.from_user = 'some_user_id' AND g.to_user = 'some_other_user_id'
)
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
In a previous attempt, I had this query, but I found out that it does not work:
SELECT DISTINCT ON (m.id) m.id, m.name, m.description
FROM merchandises m
LEFT JOIN gifts g
ON m.id = g.with_merchandise
WHERE g.id IS NULL
OR g.from_user <> 'some_user_id' AND g.to_user <> 'some_other_user_id'
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
This query does not work because, even though the WHERE clause filters out gift entries between the two specific users, two other users might have exchanged a gift with the same merchandise (the same with_merchandise value), and that row still passes the filter.

Even though you asked to remove the subquery, a NOT EXISTS subquery might run faster than NOT IN, especially if the NOT IN subquery returns a lot of values:
SELECT m.id, m.name, m.description
FROM merchandises m
WHERE NOT EXISTS (
SELECT 1
FROM gifts g
WHERE g.with_merchandise = m.id
AND g.from_user = 'some_user_id'
AND g.to_user = 'some_other_user_id'
)
This query can take advantage of a composite index on gifts (with_merchandise, from_user, to_user).
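Besides speed, there is a correctness difference worth knowing: with_merchandise is nullable in the schema above, and if the NOT IN subquery ever returns a NULL, NOT IN matches no rows at all, while NOT EXISTS is unaffected. Here is a minimal sketch of that behavior, using Python's sqlite3 module and made-up sample data (the users 'alice' and 'bob' and the three items are assumptions for illustration, not part of the question):

```python
import sqlite3

# In-memory database with a cut-down version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchandises (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gifts (
        id INTEGER PRIMARY KEY,
        from_user TEXT,
        to_user TEXT,
        with_merchandise INTEGER REFERENCES merchandises(id)
    );
    INSERT INTO merchandises (id, name) VALUES (1, 'mug'), (2, 'pen'), (3, 'cap');
    -- Hypothetical data: one real gift and one row with a NULL merchandise.
    INSERT INTO gifts (from_user, to_user, with_merchandise) VALUES
        ('alice', 'bob', 1),
        ('alice', 'bob', NULL);
""")
params = ("alice", "bob")

# NOT IN: the NULL in the subquery result makes every comparison unknown.
not_in_ids = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM merchandises m
    WHERE m.id NOT IN (
        SELECT g.with_merchandise
        FROM gifts g
        WHERE g.from_user = ? AND g.to_user = ?
    )
    ORDER BY m.id
""", params)]

# NOT EXISTS: only rows that actually match the merchandise are excluded.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM merchandises m
    WHERE NOT EXISTS (
        SELECT 1
        FROM gifts g
        WHERE g.with_merchandise = m.id
          AND g.from_user = ? AND g.to_user = ?
    )
    ORDER BY m.id
""", params)]

print(not_in_ids)      # []
print(not_exists_ids)  # [2, 3]
```

If with_merchandise were declared NOT NULL, both forms would return the same rows.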
If you would still rather use a LEFT JOIN, then move your conditions on from_user and to_user from the WHERE clause to the ON clause:
SELECT m.id, m.name, m.description
FROM merchandises m
LEFT JOIN gifts g ON m.id = g.with_merchandise
AND g.from_user = 'some_user_id' AND g.to_user = 'some_other_user_id'
WHERE g.id IS NULL
ORDER BY m.id ASC
LIMIT 20 OFFSET 0
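To see why the placement of the conditions matters, here is a small sketch that runs both variants against the same rows. It uses Python's sqlite3 module; the sample users and items are invented for the demonstration, and plain DISTINCT stands in for DISTINCT ON, which SQLite does not have:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchandises (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gifts (
        id INTEGER PRIMARY KEY,
        from_user TEXT,
        to_user TEXT,
        with_merchandise INTEGER REFERENCES merchandises(id)
    );
    INSERT INTO merchandises (id, name) VALUES (1, 'mug'), (2, 'pen'), (3, 'cap');
    -- Hypothetical data: alice gave bob item 1; carol gave dave items 1 and 2.
    INSERT INTO gifts (from_user, to_user, with_merchandise) VALUES
        ('alice', 'bob', 1),
        ('carol', 'dave', 1),
        ('carol', 'dave', 2);
""")
params = ("alice", "bob")

# Conditions in WHERE: carol's gift of item 1 passes the filter,
# so item 1 is wrongly reported as available for alice -> bob.
where_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT m.id
    FROM merchandises m
    LEFT JOIN gifts g ON m.id = g.with_merchandise
    WHERE g.id IS NULL
       OR g.from_user <> ? AND g.to_user <> ?
    ORDER BY m.id
""", params)]

# Conditions in ON: only alice -> bob gifts can match, and the
# IS NULL test keeps the merchandise that had no such match.
on_ids = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM merchandises m
    LEFT JOIN gifts g ON m.id = g.with_merchandise
        AND g.from_user = ? AND g.to_user = ?
    WHERE g.id IS NULL
    ORDER BY m.id
""", params)]

print(where_ids)  # [1, 2, 3]
print(on_ids)     # [2, 3]
```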

This uses a left outer join and should perform well.
SELECT m.*
FROM merchandises m
LEFT OUTER JOIN (
SELECT with_merchandise
FROM gifts
WHERE from_user = 'some_user_id' AND to_user = 'some_other_user_id'
GROUP BY with_merchandise
) g ON m.id = g.with_merchandise
WHERE g.with_merchandise IS NULL
ORDER BY m.id ASC
LIMIT 20 OFFSET 0

Related

SQL - Selecting highest scores for different categories

Let's say I've got a database with 3 tables:
Players (PK id_player, name...),
Tournaments (PK id_tournament, name...),
Game (PK id_turn, FK id_tournament, FK id_player and score)
Players participate in tournaments. The Game table keeps track of each player's score for different tournaments.
I want to create a view that looks like this:
tournament_name  Winner  highest_score
Tournament_1     Jones   300
Tournament_2     White   250
I tried different approaches, but I'm fairly new to SQL (and also to this forum).
I tried using a UNION ALL clause like:
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '1' group by "Id_player" order by
"Score" desc) where rownum <= 1
union all
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '2' group by "Id_player" order by
"Score" desc) where rownum <= 1;
And of course it works, but whenever a new tournament happens, I would have to manually add another SELECT statement with Id_tournament set to the next value.
EDIT:
So let's say that the player with id 1 scored 50 points in tournament A and player 2 scored 40 points. Player 1 wins, so the table should show only player 1 as the winner of this tournament (or, if possible, two or more players if it's a tie). The next row shows the winner of the second tournament. I don't think I'm going to record multiple games for one player in the same tournament, but if I did, it would probably use the average of all of that player's scores.
EDIT2:
Create table scripts:
create table players
(id_player numeric(5) constraint pk_id_player primary key, name
varchar2(50));
create table tournaments
(id_tournament numeric(5) constraint pk_id_tournament primary key,
name varchar2(50));
create table game
(id_game numeric(5) constraint pk_game primary key, id_player
numeric(5) constraint fk_id_player references players(id_player),
id_tournament numeric(5) constraint fk_id_tournament references
tournaments(id_tournament), score numeric(3));
FINAL EDIT:
OK, in case anyone is wondering, I used Jorge Campos's script, changed it a bit, and it works. Thank you all for helping. Unfortunately I cannot upvote comments yet, so I can only thank you by posting. Here's the final script:
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select g.id_tournament, g.id_player,
row_number() over (partition by t.name order by
score desc) as rd from game g join tournaments t on
g.id_tournament = t.id_tournament
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc;
This query could be simplified depending on the RDBMS you are using.
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select id_tournament,
id_player,
row_number() over (partition by id_tournament order by score desc) as rd
from game
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc
Assuming that what you want is to display the high score of each player in each tournament,
your query would look like the one below in MS SQL Server:
select
t.name as tournament_name,
p.name as Winner,
Max(g.score) as [Highest_Score]
from Tournaments t
Inner join Game g on t.id_tournament=g.id_tournament
inner join Players p on p.id_player=g.id_player
group by
g.id_tournament,
g.id_player,
t.name,
p.name
Please check whether this works for you:
SELECT tournemntData.id_tournament ,
tournemntData.name ,
dbo.Players.name ,
tournemntData.Score
FROM dbo.Game
INNER JOIN ( SELECT dbo.Tournaments.id_tournament ,
dbo.Tournaments.name ,
MAX(dbo.Game.score) AS Score
FROM dbo.Game
INNER JOIN dbo.Tournaments ON Tournaments.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
GROUP BY dbo.Tournaments.id_tournament ,
dbo.Tournaments.name
) tournemntData ON tournemntData.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
WHERE tournemntData.Score = dbo.Game.score
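One practical difference between the approaches above: row_number() keeps exactly one row per tournament, while joining back to the per-tournament MAX(score), as in the last query, returns every player tied for first place, which the question mentions as a nice-to-have. A minimal sketch of the join-to-maximum idea, run through Python's sqlite3 module with made-up scores (the second tournament is a deliberate tie):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (id_player INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tournaments (id_tournament INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE game (
        id_game INTEGER PRIMARY KEY,
        id_player INTEGER REFERENCES players(id_player),
        id_tournament INTEGER REFERENCES tournaments(id_tournament),
        score INTEGER
    );
    -- Hypothetical sample data.
    INSERT INTO players VALUES (1, 'Jones'), (2, 'White');
    INSERT INTO tournaments VALUES (1, 'Tournament_1'), (2, 'Tournament_2');
    INSERT INTO game (id_player, id_tournament, score) VALUES
        (1, 1, 300), (2, 1, 200),
        (1, 2, 250), (2, 2, 250);
""")

# Find the top score per tournament, then keep the games that reach it.
winners = conn.execute("""
    SELECT t.name, p.name, g.score
    FROM game g
    JOIN (
        SELECT id_tournament, MAX(score) AS top_score
        FROM game
        GROUP BY id_tournament
    ) best ON best.id_tournament = g.id_tournament
          AND best.top_score = g.score
    JOIN tournaments t ON t.id_tournament = g.id_tournament
    JOIN players p ON p.id_player = g.id_player
    ORDER BY t.name, p.name
""").fetchall()

for row in winners:
    print(row)
# ('Tournament_1', 'Jones', 300)
# ('Tournament_2', 'Jones', 250)
# ('Tournament_2', 'White', 250)
```

Wrapping the same SELECT in CREATE VIEW would give the view the question asks for, with no per-tournament edits needed when a new tournament is added.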

Matching similar entities based on many to many relationship

I have two entities in my database that are connected with a many to many relationship. I was wondering what would be the best way to list which entities have the most similarities based on it?
I tried doing a count(*) with intersect, but the query takes too long to run on every entry in my database (there are about 20k records). When running the query I wrote, CPU usage jumps to 100% and the database has locking issues.
Here is some code showing what I've tried:
My tables look something along these lines:
/* 20k records */
create table Movie(
Id INT PRIMARY KEY,
Title varchar(255)
);
/* 200-300 records */
create table Tags(
Id INT PRIMARY KEY,
Desc varchar(255)
);
/* 200,000-300,000 records */
create table TagMovies(
Movie_Id INT,
Tag_Id INT,
PRIMARY KEY (Movie_Id, Tag_Id),
FOREIGN KEY (Movie_Id) REFERENCES Movie(Id),
FOREIGN KEY (Tag_Id) REFERENCES Tags(Id),
);
This is the query that I wrote to try to list them (it works, but it is terribly slow).
Usually I also filter with TOP 1 and add a WHERE clause to get a specific set of related data.
SELECT
bk.Id,
rh.Id
FROM
Movies bk
CROSS APPLY (
SELECT TOP 15
b.Id,
/* Tags Score */
(
SELECT COUNT(*) FROM (
SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = bk.Id
INTERSECT
SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = b.Id
) Q1
)
as Amount
FROM
Movies b
WHERE
b.Id <> bk.Id
ORDER BY Amount DESC
) rh
Explanation:
Movies have tags, and the user can try to find movies similar to the one they selected, based on other movies that have similar tags.
Hmm, just an idea, but maybe I didn't understand the question.
This query should return the best-matched movies by tags for a given movie ID:
SELECT m.id, m.title, GROUP_CONCAT(DISTINCT t.Descr SEPARATOR ', ') as tags, count(*) as matches
FROM stack.Movie m
LEFT JOIN stack.TagMovies tm ON m.Id = tm.Movie_Id
LEFT JOIN stack.Tags t ON tm.Tag_Id = t.Id
WHERE m.id != 1
AND tm.Tag_Id IN (SELECT Tag_Id FROM stack.TagMovies tm WHERE tm.Movie_Id = 1)
GROUP BY m.id
ORDER BY matches DESC
LIMIT 15;
EDIT:
I just realized that it's for MS SQL Server, but maybe something similar can be done.
You should probably decide on a naming convention and stick with it. Are tables singular or plural nouns? I don't want to get into that debate, but pick one or the other.
Without access to your database I don't know how this will perform. It's just off the top of my head. You could also limit this by the M.id value to find the best matches for a single movie, which I think would improve performance by quite a bit.
Also, TOP x should let you get the x closest matches.
SELECT
M.id,
M.title,
SM.id AS similar_movie_id,
SM.title AS similar_movie_title,
COUNT(*) AS matched_tags
FROM
Movie M
INNER JOIN TagMovies TM1 ON TM1.movie_id = M.id
INNER JOIN TagMovies TM2 ON
TM2.tag_id = TM1.tag_id AND
TM2.movie_id <> TM1.movie_id
INNER JOIN Movie SM ON SM.id = TM2.movie_id
GROUP BY
M.id,
M.title,
SM.id,
SM.title
ORDER BY
COUNT(*) DESC
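As a rough illustration of the self-join idea restricted to a single movie (which is where I would expect the biggest saving), here is a sketch run through Python's sqlite3 module. The three movies and their tags are invented sample data, and LIMIT stands in for TOP:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Movie (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE TagMovies (
        Movie_Id INTEGER REFERENCES Movie(Id),
        Tag_Id INTEGER,
        PRIMARY KEY (Movie_Id, Tag_Id)
    );
    -- Hypothetical data: movie 1 has tags 1-3, movie 2 has tags 1-2, movie 3 has tag 3.
    INSERT INTO Movie VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO TagMovies VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 3);
""")

def similar_movies(movie_id, limit=15):
    """Return (movie id, shared tag count) pairs, best match first."""
    return conn.execute("""
        SELECT tm2.Movie_Id, COUNT(*) AS matched_tags
        FROM TagMovies tm1
        JOIN TagMovies tm2
          ON tm2.Tag_Id = tm1.Tag_Id
         AND tm2.Movie_Id <> tm1.Movie_Id
        WHERE tm1.Movie_Id = ?
        GROUP BY tm2.Movie_Id
        ORDER BY matched_tags DESC, tm2.Movie_Id
        LIMIT ?
    """, (movie_id, limit)).fetchall()

print(similar_movies(1))  # [(2, 2), (3, 1)]
```

Each shared tag produces one joined row, so COUNT(*) per candidate movie is the size of the tag intersection, without running an INTERSECT for every pair of movies.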

How to let COUNT show zeros in tables with many to many relationship?

The following query does not show the groups that no users belong to.
I would like to have them shown too, with a count of 0. How do I do this?
It should look like this:
Group A 8
Group B 0
Group C 2
This is what I get now:
Group A 8
Group C 2
SELECT UsersToGroups.GroupID,
groups.Group,
COUNT(UsersToGroups.UserID) AS countUsersPerGroup
FROM users_Groups AS groups
LEFT JOIN AssociationUsersToGroups AS UsersToGroups ON
UsersToGroups.GroupID =
groups.ID
LEFT JOIN users_Users AS users ON
UsersToGroups.UserID =
users.ID
GROUP BY GroupID,
groups.Group
ORDER BY groups.Group ASC
This query will select all groups:
SELECT groups.ID,
groups.Group
FROM users_Groups AS groups
If you add LEFT JOIN AssociationUsersToGroups you should receive groups with number of participants:
SELECT groups.ID,
groups.Group,
COUNT(UsersToGroups.UserID) AS countUsersPerGroup
FROM users_Groups AS groups
LEFT JOIN AssociationUsersToGroups AS UsersToGroups ON
UsersToGroups.GroupID =
groups.ID
GROUP BY groups.ID, groups.Group
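Note that it matters which COUNT you use here. COUNT(column) skips NULLs, so a group with no matching association rows gets 0, whereas COUNT(*) counts the single NULL-extended row that the LEFT JOIN produces and reports 1. A small sketch using Python's sqlite3 module; the table and column names below are simplified stand-ins, not the ones from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the two tables.
    CREATE TABLE user_groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users_to_groups (group_id INTEGER, user_id INTEGER);
    INSERT INTO user_groups VALUES (1, 'Group A'), (2, 'Group B'), (3, 'Group C');
""")
# Group A gets 8 users, Group C gets 2, Group B stays empty.
memberships = [(1, user_id) for user_id in range(1, 9)] + [(3, 1), (3, 2)]
conn.executemany("INSERT INTO users_to_groups VALUES (?, ?)", memberships)

counts = conn.execute("""
    SELECT g.name,
           COUNT(utg.user_id) AS count_users,  -- ignores the NULL from the outer join
           COUNT(*)           AS count_rows    -- counts the NULL-extended row too
    FROM user_groups g
    LEFT JOIN users_to_groups utg ON utg.group_id = g.id
    GROUP BY g.id, g.name
    ORDER BY g.name
""").fetchall()

for row in counts:
    print(row)
# ('Group A', 8, 8)
# ('Group B', 0, 1)
# ('Group C', 2, 2)
```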
First of all, I don't see why you need to join the users table.
There's no need to, assuming that you have a foreign key relationship between "users" and the "users-to-group" association
table with ON DELETE CASCADE.
This works for me:
-- setting up test-tables and test-data
create table #Groups
(
ID int,
GroupName varchar(100)
)
create table #UsersToGroup
(
GroupID int,
UserID int
)
insert into #Groups
values (1,'Group A'),(2,'Group B'),(3,'Group C')
insert into #UsersToGroup
values (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(3,1),(3,2)
-- the query you want:
select g.ID as GroupID,
g.GroupName,
count(utg.UserID) as countUsersPerGroup
from #Groups g
left join #UsersToGroup utg on g.ID = utg.GroupID
group by g.ID, g.GroupName
order by g.GroupName asc
-- cleanup
drop table #Groups
drop table #UsersToGroup

mysql "group by" very slow query

I have this query on a table with about 100k records. It runs quite slowly (3-4s); when I take out the GROUP BY it's much faster (less than 0.5s). I'm quite at a loss as to how to fix this:
SELECT msg.id,
msg.thread_id,
msg.senderid,
msg.recipientid,
from_user.username AS from_name,
to_user.username AS to_name
FROM msgtable AS msg
LEFT JOIN usertable AS from_user ON msg.senderid = from_user.id
LEFT JOIN usertable AS to_user ON msg.recipientid = to_user.id
GROUP BY msg.thread_id
ORDER BY msg.id desc
msgtable has indexes on thread_id, id, senderid and recipientid.
explain returns:
id  select_type  table      type    possible_keys  key      key_len  ref                 rows    Extra
1   SIMPLE       msg        ALL     NULL           NULL     NULL     NULL                162346  Using temporary; Using filesort
1   SIMPLE       from_user  eq_ref  PRIMARY        PRIMARY  4        db.msg.senderid     1
1   SIMPLE       to_user    eq_ref  PRIMARY        PRIMARY  4        db.msg.recipientid  1
Any ideas on how to speed this up while returning the same result? (There are multiple messages per thread; I want to return only one message per thread in this query.)
Thanks in advance.
Try this:
select m.thread_id, m.id, m.senderid, m.recipientid,
f.username as from_name, t.username as to_name
from msgtable m
join usertable f on m.senderid = f.id
join usertable t on m.recipientid = t.id
where m.id = (select MAX(id) from msgtable where thread_id = m.thread_id)
Or this:
select m.thread_id, m.id, m.senderid, m.recipientid,
(select username from usertable where id = m.senderid) as from_name,
(select username from usertable where id = m.recipientid) as to_name
from msgtable m
where m.id = (select MAX(id) from msgtable where thread_id = m.thread_id)
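For what it's worth, here is a small runnable sketch of the first variant, using Python's sqlite3 module and invented messages. The composite index on (thread_id, id) and its name are assumptions on my part; it is the kind of index that usually lets the per-thread MAX(id) lookup be answered from the index alone, but check EXPLAIN on your own data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usertable (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE msgtable (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER,
        senderid INTEGER,
        recipientid INTEGER
    );
    -- Hypothetical index name; supports MAX(id) per thread_id.
    CREATE INDEX idx_msg_thread_id ON msgtable (thread_id, id);

    -- Hypothetical sample data: two threads with several messages each.
    INSERT INTO usertable VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO msgtable VALUES
        (1, 10, 1, 2), (2, 10, 2, 1), (3, 20, 1, 3),
        (4, 10, 1, 2), (5, 20, 3, 1);
""")

# One row per thread: the message with the highest id in that thread.
latest = conn.execute("""
    SELECT m.id, m.thread_id, f.username AS from_name, t.username AS to_name
    FROM msgtable m
    JOIN usertable f ON m.senderid = f.id
    JOIN usertable t ON m.recipientid = t.id
    WHERE m.id = (SELECT MAX(id) FROM msgtable WHERE thread_id = m.thread_id)
    ORDER BY m.id DESC
""").fetchall()

for row in latest:
    print(row)
# (5, 20, 'carol', 'alice')
# (4, 10, 'alice', 'bob')
```

Unlike the original GROUP BY, which lets MySQL pick an arbitrary message from each thread, this always returns the newest one, so the result is deterministic.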
Why were the user tables left joined? Can a message be missing a sender or a recipient?
The biggest problem is that you have no usable indexes on msgtable. Create an index on at least senderid and recipientid, and it should help the speed of your query, as it will limit the number of results needing to be scanned.

how can i rewrite a select query in this situation

Here are two tables in a parent/child relationship.
What I need to do is select students with their average mark:
CREATE TABLE dbo.Students(
Id int NOT NULL,
Name varchar(15) NOT NULL,
CONSTRAINT PK_Students PRIMARY KEY CLUSTERED (Id)
)
CREATE TABLE [dbo].[Results](
Id int NOT NULL,
Subject varchar(15) NOT NULL,
Mark int NOT NULL
)
ALTER TABLE [dbo].[Results] WITH CHECK ADD CONSTRAINT [FK_Results_Students] FOREIGN KEY([Id])
REFERENCES [dbo].[Students] ([Id])
I wrote a query like this:
SELECT name , coalesce(avg(r.[mark]),0) as Avmark
FROM students s
LEFT JOIN results r ON s.[id]=r.[id]
GROUP BY s.[name]
ORDER BY ISNULL(AVG(r.[mark]),0) DESC;
But the result is all of the students with their average mark in descending order. What I need is to restrict the result set to the students who have the highest average mark. For example, if there are two students with an average mark of 50 and one with 25, I need to display only the two students with 50. If only one student has the highest average mark, only that student should appear in the result set. What is the best way to do this?
SQL Server 2005+, using CTEs:
WITH grade_average AS (
SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id),
highest_average AS (
SELECT MAX(ga.avg_mark) 'highest_avg_mark'
FROM grade_average ga)
SELECT DISTINCT
s.name,
ga.avg_mark
FROM STUDENTS s
JOIN grade_average ga ON ga.id = s.id
JOIN highest_average ha ON ha.highest_avg_mark = ga.avg_mark
Non-CTE equivalent:
SELECT DISTINCT
s.name,
ga.avg_mark
FROM STUDENTS s
JOIN (SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id) ga ON ga.id = s.id
JOIN (SELECT MAX(ga.avg_mark) 'highest_avg_mark'
FROM (SELECT r.id,
AVG(r.mark) 'avg_mark'
FROM RESULTS r
GROUP BY r.id) ga) ha ON ha.highest_avg_mark = ga.avg_mark
If you're using a relatively new version of MS SQL server, you can use WITH to make this simple to write:
WITH T AS (
SELECT
name,
coalesce(avg(r.[mark]),0) as mark
FROM students s
LEFT JOIN results r ON s.[id]=r.[id]
GROUP BY s.[name])
SELECT name as 'ФИО', mark as 'Средний бал'
FROM T
WHERE T.mark = (SELECT MAX(mark) from T)
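The same CTE pattern is portable beyond SQL Server. Below is a minimal sketch run through Python's sqlite3 module with invented students and marks; note that it groups by the student id as well as the name, which is my own small change so that two students who share a name are not merged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- As in the question, results.id refers to the student's id.
    CREATE TABLE results (
        id INTEGER NOT NULL REFERENCES students(id),
        subject TEXT NOT NULL,
        mark INTEGER NOT NULL
    );
    -- Hypothetical data: Ann and Bob both average 50, Cid averages 25.
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO results VALUES
        (1, 'math', 40), (1, 'art', 60),
        (2, 'math', 50),
        (3, 'math', 25);
""")

# Compute each student's average once, then keep only the rows
# whose average equals the overall maximum (ties included).
top_students = conn.execute("""
    WITH t AS (
        SELECT s.id, s.name, COALESCE(AVG(r.mark), 0) AS mark
        FROM students s
        LEFT JOIN results r ON s.id = r.id
        GROUP BY s.id, s.name
    )
    SELECT name, mark
    FROM t
    WHERE mark = (SELECT MAX(mark) FROM t)
    ORDER BY name
""").fetchall()

print(top_students)  # [('Ann', 50.0), ('Bob', 50.0)]
```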
Is it as simple as this? For all versions of SQL Server 2000+
SELECT TOP 1 WITH TIES
name, ISNULL(avg(r.[mark]),0) as AvMark
FROM
students s
LEFT JOIN
results r ON s.[id]=r.[id]
GROUP BY
s.[name]
ORDER BY
ISNULL(avg(r.[mark]),0) DESC;
SELECT name as 'ФИО',
coalesce(avg(r.[mark]),0) as 'Средний бал'
FROM students s
LEFT JOIN results r
ON s.[id]=r.[id]
GROUP BY s.[name]
HAVING AVG(r.[mark]) >= 50
ORDER BY ISNULL(AVG(r.[mark]),0) DESC
Read more about the HAVING clause.