Slow/heavy query, can it be made more efficient? - sql

I have the query below - which works, but it takes about 80 seconds to run in MSSMS. I am wondering if it can be made more efficient?
An explanation will follow below.
SELECT A.courseid, A.studentid, attendanceStatus, lessonDate,
(SELECT top 1 SnoozeFrom FROM [tblConsecutiveSnooze] C
WHERE A.courseID = C.courseID AND A.StudentID = C.StudentID
ORDER BY SnoozeFrom DESC ) AS latestSnooze ,
(SELECT top 1 SnoozeTimes FROM [tblConsecutiveSnooze] D
WHERE A.courseID = D.courseID AND A.StudentID = D.StudentID
ORDER BY SnoozeFrom DESC ) AS snoozeTimes
FROM [tblStudentAttendance] A INNER JOIN tblcourses
ON A.courseID = tblcourses.courseID
WHERE [lessonDate] > getdate()-21
AND EXISTS (SELECT * FROM tblstudentCourses B WHERE A.courseID = B.courseID
AND B.[DateLeft] IS NULL AND A.StudentID = B.StudentID)
ORDER BY courseid , studentID, [lessonDate]
So what I am doing is trying to access all student attendance records (from tblStudentAttendance) within the last 21 days, when i have confirmed (via the EXISTS) that the student is indeed still registered to the course.
Those 2 sub SELECT queries can be combined into one, but that does not have an impact on the run time of the query.
What does seem to effect the run time greatly is the EXISTS condition. So any suggestions appreciated.
UPDATE:
Sql Plan: http://199.127.217.23/mm.sqlplan
Indexes:
tblStudentAttendance (id, PK)
tblCourses (courseID, PK)
tblConsecutiveSnooze (id, PK)
tblstudentCourses (id, PK)

If you look at the execution plan you posted you will find that adding the missing index could improve the query performance by up to 33%. You can try adding the following non-clustered index to the tblStudentCourses Table. You can change the index name to whatever suits you.
USE [hvps453_datab2]
GO
CREATE NONCLUSTERED INDEX [NC_tblStuddentCourses_DL_SI_CI]
ON [dbo].[tblStudentCourses] ([DateLeft])
INCLUDE ([studentID],[courseId])
GO

How about a derived table with a row number to find the most recent SnoozeFrom record? Alternatively a CTE could also be used.
SELECT A.courseid
, A.studentid
, attendanceStatus
, lessonDate
, ConsecSnooze.SnoozeFrom AS latestSnooze
, ConsecSnooze.SnoozeTimes AS snoozeTimes
FROM [tblStudentAttendance] A
INNER JOIN tblcourses
ON A.courseID = tblcourses.courseID
LEFT JOIN (
SELECT SnoozeFrom
, SnoozeTimes
, C.courseID
, C.StudentID
, ROW_NUMBER() OVER (PARTITION BY C.CourseID, C.StudentID ORDER BY SnoozeFrom DESC) AS RowNum
FROM [tblConsecutiveSnooze] C
) as ConsecSnooze ON ConsecSnooze.courseID = A.courseID
AND ConsecSnooze.StudentID = A.studentID
AND ConsecSnooze.RowNum = 1
WHERE [lessonDate] > getdate() - 21
AND EXISTS (
SELECT *
FROM tblstudentCourses B
WHERE A.courseID = B.courseID
AND B.[DateLeft] IS NULL
AND A.StudentID = B.StudentID
)
ORDER BY courseid
, studentID
, [lessonDate]

If the primary way that you access tblConsecutiveSnooze is by CourseID and StudentID, then I highly recommend that you change the PK to be nonclustered and add a clustered index on CourseID, StudentID. This is far superior to just adding a nonclustered index and leaving the clustered PK on id. Furthermore, it's possible you don't even need the id column, if there are no FKs to it (that don't make sense to switch to CourseID, StudentID). While I am a proponent of surrogate keys, not every table needs them!
I'd also like to recommend that you stop naming columns simply id. A column name should be the same in every table it exists within, including its base table.

Related

How to select from multiple tables in a group by query?

I have some database tables containing some documents that people need to sign. The tables are defined (somewhat simplified) as follows.
create table agreement (
id integer NOT NULL,
name character varying(50) NOT NULL,
org_id integer NOT NULL,
CONSTRAINT agreement_pkey PRIMARY KEY (id)
CONSTRAINT org FOREIGN KEY (org_id) REFERENCES org (id) MATCH SIMPLE
)
create table version (
id integer NOT NULL,
content text NOT NULL,
publish_date timestamp NOT NULL,
agreement_id integer NOT NULL,
CONSTRAINT version_pkey PRIMARY KEY (id)
CONSTRAINT agr FOREIGN KEY (agreement_id) REFERENCES agreement (id) MATCH SIMPLE
)
I skipped the org table, to reduce clutter. I have been trying to write a query that would give me all the right agreement information for a given org. So far, I can do
SELECT a.id, a.name FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.name = $1
GROUP BY a.id
This seems to give me a single record for each agreement that belongs to the org I want and has at least one version. But I need to also include content and date published of the latest version available. How do I do that?
Also, I have a separate table called signatures that links to a user and a version. If possible, I would like to extend this query to only include agreements where a given user didn't yet sign the latest version.
Edit: reflected the need for the org join, since I select orgs by name rather than by id
You can use a correlated subquery:
SELECT a.id, a.name, v.*
FROM agreement a JOIN
version v
ON a.id = v.agreement_id
WHERE a.org_id = $1 AND
v.publish_date = (SELECT MAX(v2.publish_date) FROM version v2 WHERE v2.agreement_id = v.agreement_id);
Notes:
The org table is not needed because agreement has an org_id.
No aggregation is needed for this query. You are filtering for the most recent record.
The correlated subquery is one method that retrieves the most recent version.
Postgresql has Window Functions.
Window functions allow you to operate a sort over a specific column or set of columns. the rank function returns the row's place in the results for the sort. If you filter to just where the rank is 1 then you will always get just one row and it will be the highest sorted for the partition.
select u.id, u.name, u.content, u.publish_date from (
SELECT a.id, a.name, v.content, v.publish_date, rank() over (partition by a.id order by v.id desc) as pos
FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
) as u
where pos = 1
SELECT a.id, a.name, max(v.publish_date) publish_date FROM agreement AS a
JOIN version as v ON (a.id = v.agreement_id)
JOIN org as o ON (o.id = a.org_id)
WHERE o.id = $1
GROUP BY a.id, a.name

SQL - Selecting highest scores for different categories

Lets say i've got a db with 3 tables:
Players (PK id_player, name...),
Tournaments (PK id_tournament, name...),
Game (PK id_turn, FK id_tournament, FK id_player and score)
Players participate in tournaments. Table called Game keeps track of each player's score for different tournaments)
I want to create a view that looks like this:
torunament_name Winner highest_score
Tournament_1 Jones 300
Tournament_2 White 250
I tried different aproaches but I'm fairly new to sql (and alsoto this forum)
I tried using union all clause like:
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '1' group by "Id_player" order by
"Score" desc) where rownum <= 1
union all
select * from (select "Id_player", avg("score") as "Score" from
"Game" where "Id_tournament" = '2' group by "Id_player" order by
"Score" desc) where rownum <= 1;
and ofc it works but whenever a tournament happens, i would have to manually add a select statement to this with Id_torunament = nextvalue
EDIT:
So lets say that player with id 1 scored 50 points in tournament a, player 2 scored 40 points, player 1 wins, so the table should show only player 1 as the winner (or if its possible 2or more players if its a tie) of this tournament. Next row shows the winner of second tournament. I dont think Im going to put multiple games for one player in the same tournament, but if i would, it would probably count avg from all his scores.
EDIT2:
Create table scripts:
create table players
(id_player numeric(5) constraint pk_id_player primary key, name
varchar2(50));
create table tournaments
(id_tournament numeric(5) constraint pk_id_tournament primary key,
name varchar2(50));
create table game
(id_game numeric(5) constraint pk_game primary key, id_player
numeric(5) constraint fk_id_player references players(id_player),
id_tournament numeric(5) constraint fk_id_tournament references
tournaments(id_tournament), score numeric(3));
RDBM screenshot
FINAL EDIT:
Ok, in case anyone is wondering I used Jorge Campos script, changed it a bit and it works. Thank you all for helping. Unfortunately I cannot upvote comments yet, so I can only thank by posting. Heres the final script:
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select g.id_tournament, g.id_player,
row_number() over (partition by t.name order by
score desc) as rd from game g join tournaments t on
g.id_tournament = t.id_tournament
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc;
This query could be simplified depending on the RDBMs you are using.
select
t.name,
p.name as winner,
g.score
from
game g inner join tournaments t
on g.id_tournament = t.id_tournament
inner join players p
on g.id_player = p.id_player
inner join
(select id_tournament,
id_player,
row_number() over (partition by t.name order by score desc) as rd
from game
) a
on g.id_player = a.id_player
and g.id_tournament = a.id_tournament
and a.rd=1
order by t.name, g.score desc
Assuming what you want as "Display high score of each player in each tournament"
your query would be like below in MS Sql server
select
t.name as tournament_name,
p.name as Winner,
Max(g.score) as [Highest_Score]
from Tournmanents t
Inner join Game g on t.id_tournament=g.id_tournament
inner join Players p on p.id_player=g.id_player
group by
g.id_tournament,
g.id_player,
t.name,
p.name
Please check this if this works for you
SELECT tournemntData.id_tournament ,
tournemntData.name ,
dbo.Players.name ,
tournemntData.Score
FROM dbo.Game
INNER JOIN ( SELECT dbo.Tournaments.id_tournament ,
dbo.Tournaments.name ,
MAX(dbo.Game.score) AS Score
FROM dbo.Game
INNER JOIN dbo.Tournaments ONTournaments.id_tournament = Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
GROUP BY dbo.Tournaments.id_tournament ,
dbo.Tournaments.name
) tournemntData ON tournemntData.id_tournament =Game.id_tournament
INNER JOIN dbo.Players ON Players.id_player = Game.id_player
WHERE tournemntData.Score = dbo.Game.score

Matching similar entities based on many to many relationship

I have two entities in my database that are connected with a many to many relationship. I was wondering what would be the best way to list which entities have the most similarities based on it?
I tried doing a count(*) with intersect, but the query takes too long to run on every entry in my database (there are about 20k records). When running the query I wrote, CPU usage jumps to 100% and the database has locking issues.
Here is some code showing what I've tried:
My tables look something along these lines:
/* 20k records */
create table Movie(
Id INT PRIMARY KEY,
Title varchar(255)
);
/* 200-300 records */
create table Tags(
Id INT PRIMARY KEY,
Desc varchar(255)
);
/* 200,000-300,000 records */
create table TagMovies(
Movie_Id INT,
Tag_Id INT,
PRIMARY KEY (Movie_Id, Tag_Id),
FOREIGN KEY (Movie_Id) REFERENCES Movie(Id),
FOREIGN KEY (Tag_Id) REFERENCES Tags(Id),
);
(This works, but it is terribly slow)
This is the query that I wrote to try and list them:
Usually I also filter with top 1 & add a where clause to get a specific set of related data.
SELECT
bk.Id,
rh.Id
FROM
Movies bk
CROSS APPLY (
SELECT TOP 15
b.Id,
/* Tags Score */
(
SELECT COUNT(*) FROM (
SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = bk.Id
INTERSECT
SELECT x.Tag_Id FROM TagMovies x WHERE x.Movie_Id = b.Id
) Q1
)
as Amount
FROM
Movies b
WHERE
b.Id <> bk.Id
ORDER BY Amount DESC
) rh
Explanation:
Movies have tags and the user can get try to find movies similar to the one that they selected based on other movies that have similar tags.
Hmm ... just an idea, but maybe I didnt understand ...
This query should return best matched movies by tags for a given movie ID:
SELECT m.id, m.title, GROUP_CONCAT(DISTINCT t.Descr SEPARATOR ', ') as tags, count(*) as matches
FROM stack.Movie m
LEFT JOIN stack.TagMovies tm ON m.Id = tm.Movie_Id
LEFT JOIN stack.Tags t ON tm.Tag_Id = t.Id
WHERE m.id != 1
AND tm.Tag_Id IN (SELECT Tag_Id FROM stack.TagMovies tm WHERE tm.Movie_Id = 1)
GROUP BY m.id
ORDER BY matches DESC
LIMIT 15;
EDIT:
I just realized that it's for M$ SQL ... but maybe something similar can be done...
You should probably decide on a naming convention and stick with it. Are tables singular or plural nouns? I don't want to get into that debate, but pick one or the other.
Without access to your database I don't know how this will perform. It's just off the top of my head. You could also limit this by the M.id value to find the best matches for a single movie, which I think would improve performance by quite a bit.
Also, TOP x should let you get the x closest matches.
SELECT
M.id,
M.title,
SM.id AS similar_movie_id,
SM.title AS similar_movie_title,
COUNT(*) AS matched_tags
FROM
Movie M
INNER JOIN TagsMovie TM1 ON TM1.movie_id = M.movie_id
INNER JOIN TagsMovie TM2 ON
TM2.tag_id = TM1.tag_id AND
TM2.movie_id <> TM1.movie_id
INNER JOIN Movie SM ON SM.movie_id = TM2.movie_id
GROUP BY
M.id,
M.title,
SM.id AS similar_movie_id,
SM.title AS similar_movie_title
ORDER BY
COUNT(*) DESC

SELECT Statement in CASE

Please don't downgrade this as it is bit complex for me to explain. I'm working on data migration so some of the structures look weird because it was designed by someone like that.
For ex, I have a table Person with PersonID and PersonName as columns. I have duplicates in the table.
I have Details table where I have PersonName stored in a column. This PersonName may or may not exist in the Person table. I need to retrieve PersonID from the matching records otherwise put some hardcode value in PersonID.
I can't write below query because PersonName is duplicated in Person Table, this join doubles the rows if there is a matching record due to join.
SELECT d.Fields, PersonID
FROM Details d
JOIN Person p ON d.PersonName = p.PersonName
The below query works but I don't know how to replace "NULL" with some value I want in place of NULL
SELECT d.Fields, (SELECT TOP 1 PersonID FROM Person where PersonName = d.PersonName )
FROM Details d
So, there are some PersonNames in the Details table which are not existent in Person table. How do I write CASE WHEN in this case?
I tried below but it didn't work
SELECT d.Fields,
CASE WHEN (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) = null
THEN 123
ELSE (SELECT TOP 1 PersonID
FROM Person
WHERE PersonName = d.PersonName) END Name
FROM Details d
This query is still showing the same output as 2nd query. Please advise me on this. Let me know, if I'm unclear anywhere. Thanks
well.. I figured I can put ISNULL on top of SELECT to make it work.
SELECT d.Fields,
ISNULL(SELECT TOP 1 p.PersonID
FROM Person p where p.PersonName = d.PersonName, 124) id
FROM Details d
A simple left outer join to pull back all persons with an optional match on the details table should work with a case statement to get your desired result.
SELECT
*
FROM
(
SELECT
Instance=ROW_NUMBER() OVER (PARTITION BY PersonName),
PersonID=CASE WHEN d.PersonName IS NULL THEN 'XXXX' ELSE p.PersonID END,
d.Fields
FROM
Person p
LEFT OUTER JOIN Details d on d.PersonName=p.PersonName
)AS X
WHERE
Instance=1
Ooh goody, a chance to use two LEFT JOINs. The first will list the IDs where they exist, and insert a default otherwise; the second will eliminate the duplicates.
SELECT d.Fields, ISNULL(p1.PersonID, 123)
FROM Details d
LEFT JOIN Person p1 ON d.PersonName = p1.PersonName
LEFT JOIN Person p2 ON p2.PersonName = p1.PersonName
AND p2.PersonID < p1.PersonID
WHERE p2.PersonID IS NULL
You could use common table expressions to build up the missing datasets, i.e. your complete Person table, then join that to your Detail table as follows;
declare #n int;
-- set your default PersonID here;
set #n = 123;
-- Make sure previous SQL statement is terminated with semilcolon for with clause to parse successfully.
-- First build our unique list of names from table Detail.
with cteUniqueDetailPerson
(
[PersonName]
)
as
(
select distinct [PersonName]
from [Details]
)
-- Second get unique Person entries and record the most recent PersonID value as the active Person.
, cteUniquePersonPerson
(
[PersonID]
, [PersonName]
)
as
(
select
max([PersonID]) -- if you wanted the original Person record instead of the last, change this to min.
, [PersonName]
from [Person]
group by [PersonName]
)
-- Third join unique datasets to get the PersonID when there is a match, otherwise use our default id #n.
-- NB, this would also include records when a Person exists with no Detail rows (they are filtered out with the final inner join)
, cteSudoPerson
(
[PersonID]
, [PersonName]
)
as
(
select
coalesce(upp.[PersonID],#n) as [PersonID]
coalesce(upp.[PersonName],udp.[PersonName]) as [PersonName]
from cteUniquePersonPerson upp
full outer join cteUniqueDetailPerson udp
on udp.[PersonName] = p.[PersonName]
)
-- Fourth, join detail to the sudo person table that includes either the original ID or our default ID.
select
d.[Fields]
, sp.[PersonID]
from [Details] d
inner join cteSudoPerson sp
on sp.[PersonName] = d.[PersonName];

SQL - Find duplicates with equivalencies

I'm having trouble wrapping my mind around developing this SQL query. Given the following two tables:
ACADEMIC_HISTORY ( STUDENT_ID, TERM, COURSE_ID, COURSE_GRADE )
COURSE_EQUIVALENCIES ( COURSE_ID, COURSE_ID_EQUIVALENT )
What would be the best way to detect if students have taken the same (or an equivalent) course in the past with a passing grade (C or better)?
Example
Student #1 took the course ABC001 and received a grade of C. Ten years later, the course was renamed ABC011 and the appropriate entry was made in COURSE_EQUIVALENCIES. The student retook the course under this new name and received a grade of B. How can I construct a SQL query that will detect the duplicate courses and only count the first passing grade?
(The actual case is significantly more complicated, but this should get me started.)
Thanks in advance.
EDIT:
It's not even necessary to keep or discard any information. A query that simply shows classes with duplicates will be sufficient.
you could use something like:
SELECT
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVELANCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVELANT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID
http://sqlfiddle.com/#!3/d608f/20
Sorry posted with a bug.. it preferred the score of the actual course requested over any equivalencies - fixed now
this only looks for one level of equivalencies.. but maybe you want to enforce that and have that part of the data entry process.. review all possible equivalencies and enter the valid ones
EDIT: for first pass of qualifying course (using numbered terms..)
SELECT TOP 1
STUDENT_ID
,MIN (COURSE_GRADE)
FROM (
SELECT * FROM
ACADEMIC_HISTORY
WHERE COURSE_ID =1
UNION
SELECT
h.STUDENT_ID
,h2.COURSE_ID
,h2.TERM
,h2.COURSE_GRADE
FROM
ACADEMIC_HISTORY AS h
LEFT OUTER JOIN COURSE_EQUIVELANCIES as e
ON e.COURSE_ID = h.COURSE_ID
LEFT OUTER JOIN ACADEMIC_HISTORY as h2
ON h.STUDENT_ID = h2.STUDENT_ID
AND h2.COURSE_ID = e.COURSE_ID_EQUIVELANT
WHERE
h.COURSE_ID =1
) AS t
WHERE STUDENT_ID =1
GROUP BY STUDENT_ID, TERM
ORDER BY TERM ASC
http://sqlfiddle.com/#!3/fdded/6
(note TOP is a t-sql command for MySQL you need LIMIT)
The data (in LOWERCASE)
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp;
SET search_path='tmp';
CREATE TABLE academic_history
( student_id INTEGER NOT NULL
, course_id CHAR(6)
, course_grade CHAR(1)
, PRIMARY KEY(student_id,course_id)
);
INSERT INTO academic_history ( student_id,course_id,course_grade) VALUES
(1, 'ABC001' , 'C' )
, (1, 'ABC011' , 'B' )
, (2, 'ABC011' , 'A' )
;
CREATE TABLE course_equivalencies
( course_id CHAR(6)
, course_id_equivalent CHAR(6)
);
INSERT INTO course_equivalencies(course_id,course_id_equivalent) VALUES
( 'ABC011' , 'ABC001' )
;
The query:
-- EXPLAIN ANALYZE
WITH canon AS (
SELECT ah.student_id AS student_id
, ah.course_id AS course_id
, COALESCE (eq.course_id_equivalent,ah.course_id) AS course_id_equivalent
FROM academic_history ah
LEFT JOIN course_equivalencies eq ON eq.course_id = ah.course_id
)
SELECT h.student_id
, c.course_id_equivalent
, MIN(h.course_grade) AS the_grade
FROM academic_history h
JOIN canon c ON c.student_id = h.student_id AND c.course_id = h.course_id
GROUP BY h.student_id, c.course_id_equivalent
ORDER BY h.student_id, c.course_id_equivalent
;
The output:
NOTICE: drop cascades to 2 other objects
DETAIL: drop cascades to table tmp.academic_history
drop cascades to table tmp.course_equivalencies
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "academic_history_pkey" for table "academic_history"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 1
student_id | course_id_equivalent | the_grade
------------+----------------------+-----------
1 | ABC001 | B
2 | ABC001 | A
(2 rows)