Giving priority to values in an SQL statement - sql

Is there any way to write a query that gives priority to some value?
For instance, I have:
SELECT TOP (20)
r.MD5, r.Title, r.Link, t.Category, t.tfidf, COUNT(r.MD5) AS matching_terms
FROM
Resource AS r INNER JOIN tags AS t ON r.MD5 = t.MD5
WHERE
(t.Category IN ('algorithm', 'k-means', 'statistics', 'clustering', 'science'))
GROUP BY r.MD5, r.Title, r.Link, t.Category, t.tfidf
ORDER BY matching_terms DESC, t.tfidf DESC
I want 'algorithm' to be given higher priority when ranking the results. Any ideas?

I'm not sure how high a priority you want to make 'algorithm', but in any case, you can add this to the ORDER BY clause, in order to make it the most important category (all other categories are equally important):
ORDER BY ..., CASE WHEN t.Category = 'algorithm' THEN 0 ELSE 1 END, ...
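A minimal, runnable sketch of the CASE-based ordering, using Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server. SQLite has no TOP, so this only shows the ORDER BY part; the table is a cut-down version of tags and its rows are made up.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (MD5 TEXT, Category TEXT, tfidf REAL)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [("a1", "statistics", 0.9), ("b2", "algorithm", 0.2), ("c3", "science", 0.5)],
)

# CASE maps 'algorithm' to 0 and every other category to 1, so 'algorithm'
# rows sort first; tfidf DESC then orders rows within each bucket.
rows = conn.execute(
    """
    SELECT MD5, Category
    FROM tags
    ORDER BY CASE WHEN Category = 'algorithm' THEN 0 ELSE 1 END,
             tfidf DESC
    """
).fetchall()
print(rows)
```

Even though its tfidf is the lowest, the 'algorithm' row comes first.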
If, however, your concept of "priority" is somehow tied to the importance of the matching_terms expression, you could also try something like this (you would have to nest your select from above):
SELECT TOP (20) *
FROM (
[your original select without the TOP (20) clause]
) AS sub
ORDER BY (sub.matching_terms * CASE WHEN sub.Category = 'algorithm'
THEN 1.5 ELSE 1 END) DESC, sub.tfidf DESC
But that's just an example to give you an idea.
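A sketch of the weighted variant, again with SQLite as a stand-in (LIMIT 20 plays the role of TOP (20)). The table original_results is hypothetical: it holds invented rows shaped like the output of the asker's inner SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding what the asker's inner SELECT would return
# (one row per MD5/Category group); the values are invented.
conn.execute(
    "CREATE TABLE original_results (MD5 TEXT, Category TEXT, matching_terms INTEGER, tfidf REAL)"
)
conn.executemany(
    "INSERT INTO original_results VALUES (?, ?, ?, ?)",
    [("a1", "statistics", 2, 0.4), ("b2", "algorithm", 2, 0.2), ("c3", "clustering", 1, 0.9)],
)

# The outer query scales matching_terms by 1.5 for 'algorithm' rows before ranking.
rows = conn.execute(
    """
    SELECT sub.MD5
    FROM (SELECT * FROM original_results) AS sub
    ORDER BY sub.matching_terms
             * CASE WHEN sub.Category = 'algorithm' THEN 1.5 ELSE 1 END DESC,
             sub.tfidf DESC
    LIMIT 20
    """
).fetchall()
ranked = [r[0] for r in rows]
print(ranked)  # b2 (2 * 1.5 = 3.0) now outranks a1 (2 * 1 = 2)
```

Without the weight, a1 and b2 tie on matching_terms and a1 wins on tfidf; the 1.5 factor is what lifts b2 to the top.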
UPDATE: Following your comment, you can write a CASE expression like this:
ORDER BY CASE t.Category WHEN 'algorithm' THEN 0
WHEN 'k-means' THEN 1
WHEN 'statistics' THEN 2
WHEN 'clustering' THEN 3
WHEN 'science' THEN 4 END
Alternatively (especially if your list of categories is large), add a sort column to tags that contains the priority. Then you can simply order by that column.
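A minimal sketch of the sort-column approach in SQLite, assuming a hypothetical integer column named priority on tags (lower value means higher priority); the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'priority' is a hypothetical name for the sort column; lower sorts first.
conn.execute("CREATE TABLE tags (MD5 TEXT, Category TEXT, tfidf REAL, priority INTEGER)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?)",
    [
        ("a1", "science", 0.8, 4),
        ("b2", "k-means", 0.1, 1),
        ("c3", "algorithm", 0.3, 0),
        ("d4", "statistics", 0.6, 2),
    ],
)

# No CASE needed: the priority lives in the data, so changing it needs no query edit.
rows = conn.execute(
    "SELECT Category FROM tags ORDER BY priority, tfidf DESC"
).fetchall()
categories = [r[0] for r in rows]
print(categories)
```

The trade-off is that the priorities now live in the table, so a different ranking per query would need a different column or a separate lookup table.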

SELECT TOP (10) r.MD5, r.Title, r.Link, t.Category, t.tfidf, COUNT(r.MD5) AS matching_terms
FROM Resource AS r INNER JOIN
tags AS t ON r.MD5 = t.MD5
WHERE (t.Category IN ('astrophysics', 'athletics', 'sports', 'football', 'soccer'))
GROUP BY r.MD5, r.Title, r.Link, t.Category, t.tfidf
ORDER BY (CASE t.Category WHEN 'astrophysics' THEN 0 WHEN 'athletics' THEN 1 WHEN 'sports' THEN 2 WHEN 'football' THEN 3 WHEN 'soccer' THEN 4 END)
Thanks to Lukas Eder for giving me the idea.

Related

Getting undesired output in SQL Server while using pivot

I am looking for an output of:
but getting this instead:
The two tables I have used are Table 1 and Table 2.
I am also providing a link to the tables: Kaggle Dataset
The code I have:
SELECT *
FROM
(SELECT
nr.region, Medal,
COUNT(Medal) AS 'Total_Medal'
FROM
athlete_events AS ae
JOIN
noc_regions AS nr ON ae.NOC = nr.NOC
WHERE
Medal <> 'NA'
GROUP BY
Medal, nr.region) AS t1
PIVOT
(COUNT(Medal)
FOR Medal IN ([Gold], [Silver], [Bronze])
) pivot_table
ORDER BY
Total_Medal DESC
Please help me solve this; consider me a novice.
Try this; it should give you the exact output:
Select OH.Region,
Count(Case When O.Medal='Gold' Then 1 End) AS Gold,
Count(Case When O.Medal='Silver' Then 1 End) AS Silver,
Count(Case When O.Medal='Bronze' Then 1 End) AS Bronze
from athlete_events O
Join noc_regions OH On OH.NOC = O.NOC
Group by OH.Region
Order By Gold Desc,Silver Desc,Bronze Desc
I think you probably just want something like:
SELECT *
FROM (
SELECT nr.region, medal
FROM athlete_events ae INNER JOIN noc_regions nr
ON ae.noc = nr.noc
WHERE medal <> 'NA'
) t1
PIVOT(COUNT(medal) FOR medal in ([Gold], [Silver], [Bronze])) pt
ORDER BY gold+silver+bronze DESC
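There doesn't appear to be any reason for your inner grouping and counting. The pivot can handle the count and, since you don't want the total medal count in your output (except for ordering), you can just sum the medal columns in the ORDER BY. Keeping Total_Medal in the pivot's source is also what splits your output: every source column that is neither aggregated nor pivoted becomes an implicit grouping column, so each region gets one row per distinct Total_Medal value.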
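SQLite has no PIVOT operator, so this runnable sketch swaps in conditional aggregation (the same technique as the first answer) and then orders by the per-region total, as the PIVOT answer does. The table layouts are reduced to the columns the query needs, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE noc_regions (NOC TEXT, region TEXT);
    CREATE TABLE athlete_events (NOC TEXT, Medal TEXT);
    """
)
# Made-up sample rows; 'NA' marks an event without a medal, as the question's WHERE implies.
conn.executemany(
    "INSERT INTO noc_regions VALUES (?, ?)",
    [("USA", "USA"), ("GBR", "UK"), ("FRA", "France")],
)
conn.executemany(
    "INSERT INTO athlete_events VALUES (?, ?)",
    [("USA", "Gold"), ("USA", "Gold"), ("USA", "Bronze"),
     ("GBR", "Silver"), ("GBR", "NA"),
     ("FRA", "Gold"), ("FRA", "Silver"), ("FRA", "Silver"), ("FRA", "NA")],
)

# One COUNT per medal type replaces the PIVOT (CASE without ELSE yields NULL,
# which COUNT ignores); the outer query orders by the per-region total.
rows = conn.execute(
    """
    SELECT region, Gold, Silver, Bronze
    FROM (
        SELECT nr.region,
               COUNT(CASE WHEN ae.Medal = 'Gold' THEN 1 END) AS Gold,
               COUNT(CASE WHEN ae.Medal = 'Silver' THEN 1 END) AS Silver,
               COUNT(CASE WHEN ae.Medal = 'Bronze' THEN 1 END) AS Bronze
        FROM athlete_events AS ae
        JOIN noc_regions AS nr ON ae.NOC = nr.NOC
        WHERE ae.Medal <> 'NA'
        GROUP BY nr.region
    ) AS pt
    ORDER BY Gold + Silver + Bronze DESC, region
    """
).fetchall()
print(rows)
```

France and USA tie on 3 medals here, so the added region tie-breaker decides their order.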
So I kind of gave up on PIVOT myself. It caused me too many problems in the past (always my own mistakes, of course), and when I can, I try a different approach.
So I'll show you a query that gets the results you want, but it won't answer your PIVOT question. This method isn't the most efficient, but it's fine for this number of records, and I think it makes the query more readable.
SELECT
region
, SUM(Gold) AS 'Gold'
, SUM(Silver) AS 'Silver'
, SUM(Bronze) AS 'Bronze'
FROM
noc_regions
LEFT JOIN (SELECT NOC, COUNT(*) AS 'Gold' FROM athlete_events WHERE Medal = 'Gold' GROUP BY NOC) gold_events ON noc_regions.NOC = gold_events.NOC
LEFT JOIN (SELECT NOC, COUNT(*) AS 'Silver' FROM athlete_events WHERE Medal = 'Silver' GROUP BY NOC) silver_events ON noc_regions.NOC = silver_events.NOC
LEFT JOIN (SELECT NOC, COUNT(*) AS 'Bronze' FROM athlete_events WHERE Medal = 'Bronze' GROUP BY NOC) bronze_events ON noc_regions.NOC = bronze_events.NOC
GROUP BY
region
ORDER BY
Gold DESC
, Silver DESC
, Bronze DESC
, region ASC;

SQL aggregate function alias

I'm a beginner at SQL and this is the question I have been asked to solve:
Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name, no_big_city, big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.
Now, as far as I can see, the following query returns correct results:
SELECT state.name AS state_name
, COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
, SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
HAVING
COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
However, the two aggregate expressions used in the code each appear twice. My question: is there any way to remove this duplication while preserving functionality?
To be clear, I have already tried using the alias, but I just get a "column does not exist" error.
The manual clarifies:
An output column's name can be used to refer to the column's value in
ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses;
there you must write out the expression instead.
Bold emphasis mine.
You can avoid typing long expressions repeatedly with a subquery or CTE:
SELECT state_name, no_big_city, big_city_population
FROM (
SELECT s.name AS state_name
, COUNT(*) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
, SUM(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
FROM state s
JOIN place p ON s.code = p.state_code
GROUP BY s.name -- can be an input column name as well; best table-qualified to avoid ambiguity
) sub
WHERE no_big_city >= 5
OR big_city_population >= 1000000
ORDER BY state_name;
While at it, I simplified the conditional aggregates with the aggregate FILTER clause (Postgres 9.4+); see also:
How can I simplify this game statistics query?
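The subquery/CTE idea can be checked with a small runnable sketch in SQLite: the aggregates are written once inside a CTE (per_state is a hypothetical name), and the outer query filters on their aliases. CASE is used instead of FILTER because FILTER on aggregates only arrived in SQLite 3.30; the states and places are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE place (state_code TEXT, type TEXT, population INTEGER);
    """
)
# Invented data: Alpha has 5 big cities, Beta has 1 city of 1.2 million,
# Gamma has 2 big cities and a large 'town' that must not count.
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [("AA", "Alpha"), ("BB", "Beta"), ("CC", "Gamma")])
places = [("AA", "city", 100000)] * 5 + [
    ("BB", "city", 1200000),
    ("CC", "city", 200000), ("CC", "city", 200000), ("CC", "town", 2000000),
]
conn.executemany("INSERT INTO place VALUES (?, ?, ?)", places)

# Each aggregate is written once in the CTE; the outer WHERE reuses the aliases.
query = """
WITH per_state AS (
    SELECT s.name AS state_name,
           COUNT(CASE WHEN p.type = 'city' AND p.population >= 100000 THEN 1 END) AS no_big_city,
           SUM(CASE WHEN p.type = 'city' AND p.population >= 100000 THEN p.population END)
               AS big_city_population
    FROM state AS s
    JOIN place AS p ON s.code = p.state_code
    GROUP BY s.name
)
SELECT state_name, no_big_city, big_city_population
FROM per_state
WHERE no_big_city >= 5 OR big_city_population >= 1000000
ORDER BY state_name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Alpha qualifies by count, Beta by population, and Gamma by neither.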
However, I suggest this simpler and faster query to begin with:
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM state s
JOIN (
SELECT state_code AS code -- alias just to simplify join
, count(*) AS no_big_city
, sum(population) AS big_city_population
FROM place
WHERE type = 'city'
AND population >= 100000
GROUP BY 1 -- can be ordinal number referencing position in SELECT list
HAVING count(*) >= 5 OR sum(population) >= 1000000 -- simple expressions now
) p USING (code)
ORDER BY 1; -- can also be ordinal number
This also demonstrates another way to reference expressions in GROUP BY and ORDER BY: by their ordinal position in the SELECT list. Only use that if it doesn't impair readability and maintainability.
Not sure if this is a comment or an answer, since it is more about preference than technique, but I'll post it anyway.
When I need to reference calculated columns (often many at once), I put the calculations in a derived table and then reference them by their aliases outside of it. This syntax should be valid ANSI SQL, but I am not familiar with PostgreSQL.
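The "filter first, then aggregate" query can be sketched in SQLite the same way, since SQLite also accepts JOIN ... USING and ordinal GROUP BY/ORDER BY. The data is the same invented sample as before; the HAVING clause now only needs plain count(*) and sum().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE place (state_code TEXT, type TEXT, population INTEGER);
    """
)
# Invented data, same shape as the question's schema.
conn.executemany("INSERT INTO state VALUES (?, ?)",
                 [("AA", "Alpha"), ("BB", "Beta"), ("CC", "Gamma")])
places = [("AA", "city", 100000)] * 5 + [
    ("BB", "city", 1200000),
    ("CC", "city", 200000), ("CC", "city", 200000), ("CC", "town", 2000000),
]
conn.executemany("INSERT INTO place VALUES (?, ?, ?)", places)

# WHERE removes non-cities and small cities before grouping,
# so HAVING can use plain aggregates instead of CASE expressions.
query = """
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM state AS s
JOIN (
    SELECT state_code AS code,
           count(*) AS no_big_city,
           sum(population) AS big_city_population
    FROM place
    WHERE type = 'city' AND population >= 100000
    GROUP BY 1
    HAVING count(*) >= 5 OR sum(population) >= 1000000
) AS p USING (code)
ORDER BY 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```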
select * from (
SELECT STATE.NAME AS state_name
,COUNT(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
,SUM(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM STATE
INNER JOIN place
ON STATE.code = place.state_code
GROUP BY state_name
) sub
where no_big_city >= 5
or big_city_population >= 1000000
--HAVING COUNT(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5
-- OR SUM(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
The nice thing about this approach is that, although you add complexity via a subquery (derived table), each formula is kept in one place, so any change only has to happen once. I don't know whether this performs worse than repeating the calculation in the HAVING clause, but I can't imagine it would be much worse.
The SELECT clause defines what you want to select from the table(s) after the WHERE clause has filtered them.
GROUP BY defines how the filtered records are grouped for the aggregate functions in the SELECT list, and HAVING filters those groups before the SELECT list is evaluated, so the output aliases cannot be used in HAVING.
But you can wrap the grouped records in a derived table and select from it. Something like this:
SELECT state_name, no_big_city, big_city_population
FROM
(
SELECT
state.name AS state_name,
COUNT(1) AS no_big_city,
SUM(place.population) AS big_city_population
FROM state JOIN place ON state.code = place.state_code
WHERE
place.type = 'city' AND
place.population >= 100000
GROUP BY state.name
) AS sub
WHERE
no_big_city >= 5 OR
big_city_population >= 1000000
ORDER BY state_name
Also, moving the conditions
place.type = 'city' AND
place.population >= 100000
out of the CASE expressions into WHERE should perform better: places that are not cities, or are small cities, are not aggregated at all, especially if there is an index on the place.type column.

Select Case is not working with Order by

I was using a simple sql query and getting an ordered list, but when I changed some of the values in the column I'm sorting by, those rows were no longer being sorted correctly.
select distinct u.Email,
case
when l.region_id is null then 'EU'
else l.region_id
end
as Location
from TB_User u
left join cat..location l on l.location=u.Location
where u.Username in (....)
order by l.region_id
I have about 5 rows that returned NULL for their region_id, so they sorted to the top of the result set. When I added the CASE and replaced their value, they still remained at the top. Is there any way to make these rows sort according to their replacement value?
You can also use CASE in the ORDER BY. But here it seems you want to order by the output column that uses the CASE:
ORDER BY Location
If you instead want the null-regions at the bottom:
ORDER BY CASE WHEN l.region_id is null THEN 0 ELSE 1 END DESC,
Location ASC
If your RDBMS doesn't support ordering by an output alias (SQL Server does), you have to repeat the expression:
ORDER BY CASE WHEN l.region_id IS NULL THEN 'EU' ELSE l.region_id END ASC
You are ordering by the raw column value, which is still NULL.
If you want to order by the CASE expression, just copy it into the ORDER BY clause:
order by
case
when l.region_id is null then 'EU'
else l.region_id end
If you are using SQL Server, use this within the SELECT statement:
ISNULL(l.region_id, 'EU') AS Location
and then
ORDER BY 2
This will make your query:
SELECT DISTINCT u.Email, ISNULL(l.region_id, 'EU') AS Location
FROM TB_User u
LEFT JOIN cat..location l ON l.location=u.Location
WHERE u.Username in (....)
ORDER BY 2
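A runnable comparison of ordering by the raw column versus the substituted value, in SQLite. SQLite has no ISNULL() function, so the sketch swaps in the standard COALESCE (which SQL Server also supports); the cat.. database prefix is dropped, DISTINCT is omitted because the sample has no duplicates, and the emails and regions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TB_User (Email TEXT, Location TEXT);
    CREATE TABLE location (location TEXT, region_id TEXT);
    INSERT INTO TB_User VALUES
        ('a@example.com', 'L1'), ('b@example.com', 'L2'), ('c@example.com', 'L9');
    INSERT INTO location VALUES ('L1', 'US'), ('L2', 'APAC');
    """
)

base = """
SELECT u.Email, COALESCE(l.region_id, 'EU') AS Location
FROM TB_User AS u
LEFT JOIN location AS l ON l.location = u.Location
"""
# Ordering by the raw column: the unmatched user's NULL region sorts first.
raw = [r[0] for r in conn.execute(base + "ORDER BY l.region_id")]
# Ordering by the output alias: the substituted 'EU' sorts between APAC and US.
fixed = [r[0] for r in conn.execute(base + "ORDER BY Location")]
print(raw, fixed)
```

In SQLite a bare ORDER BY identifier that matches an output alias refers to that alias, so ORDER BY Location sorts by the COALESCE result rather than u.Location.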

SQL Specific Order

Given the following query:
SELECT DISTINCT n.iswinner, i.name
FROM nominees n, institutions i
WHERE n.iid = i.iid and n.filmname = '127 Hours'
ORDER BY name
I get the output:
iswinner name
NULL academy awards
NULL baftas
NULL critics' choice awards
NULL golden globes
NULL screen actors guild
NULL writers guild of america
I am trying to figure out whether it is possible to order this output in a more specific manner. The order I am looking for is: first 'academy awards', then 'golden globes', then anything with 'guild' in its name, and finally everything else alphabetically. So the output I'm looking for is more like this:
iswinner name
NULL academy awards
NULL golden globes
NULL screen actors guild
NULL writers guild of america
NULL baftas
NULL critics' choice awards
Is there a way to do such a thing? I believe I should use something like CASE, but I couldn't seem to figure out the correct syntax for it. Thanks for any help.
Yes, there is a way to do this and, just as you thought, you can do it with a CASE expression. Something like the following should do the trick:
SELECT
DISTINCT n.iswinner,
i.name,
CASE
WHEN i.name = 'academy awards' THEN 1
WHEN i.name = 'golden globes' THEN 2
WHEN i.name like '%guild%' THEN 3
ELSE 4
END AS SortOrder
FROM nominees n, institutions i
WHERE n.iid = i.iid and n.filmname = '127 Hours'
ORDER BY
SortOrder,
i.name
To give you a little more information on what is being done here: we compute a CASE expression in the SELECT list and then order by it. Based on the value of i.name, we assign an integer to order by: 'academy awards' is assigned 1, 'golden globes' is assigned 2, anything containing 'guild' is assigned 3, and everything else is assigned 4. So we first order by this CASE value (which gives the specific ordering you want) and then by the name column, which satisfies your second requirement of ordering everything else by name (those rows all share the value 4).
I hope this makes sense to you.
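A runnable sketch of the CASE-based custom sort in SQLite. To keep it short it drops the nominees join and uses only a simplified institutions table holding the six names from the question; the alias SortOrder is used because ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified: only the institution names from the question, no nominees join.
conn.execute("CREATE TABLE institutions (name TEXT)")
names = ["academy awards", "baftas", "critics' choice awards", "golden globes",
         "screen actors guild", "writers guild of america"]
conn.executemany("INSERT INTO institutions VALUES (?)", [(n,) for n in names])

rows = conn.execute(
    """
    SELECT name,
           CASE
               WHEN name = 'academy awards' THEN 1
               WHEN name = 'golden globes' THEN 2
               WHEN name LIKE '%guild%' THEN 3
               ELSE 4
           END AS SortOrder
    FROM institutions
    ORDER BY SortOrder, name
    """
).fetchall()
ordered = [r[0] for r in rows]
print(ordered)
```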
Same query using GROUP BY rather than DISTINCT:
SELECT
n.iswinner,
i.name
FROM nominees n, institutions i
WHERE n.iid = i.iid and n.filmname = '127 Hours'
GROUP BY
n.iswinner,
i.name
ORDER BY
CASE
WHEN i.name = 'academy awards' THEN 1
WHEN i.name = 'golden globes' THEN 2
WHEN i.name like '%guild%' THEN 3
ELSE 4
END,
i.name
order by
case
when name = 'academy awards' then 1
when name = 'golden globes' then 2
when name like '%guild%' then 3
else 4
end
, name
No need for a CASE expression in databases that can sort boolean values, such as PostgreSQL. SQL Fiddle
select *
from (
select distinct n.iswinner, i.name
from nominees n, institutions i
where n.iid = i.iid and n.filmname = '127 Hours'
) s
order by
name != 'academy awards',
name != 'golden globes',
name not like '%guild%',
name
This works because false sorts before true.
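The same trick can be tried in SQLite, where a comparison evaluates to 0 (false) or 1 (true), so rows for which a comparison is false sort first. As in the previous sketch, this uses a simplified institutions table with the six names from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE institutions (name TEXT)")
names = ["academy awards", "baftas", "critics' choice awards", "golden globes",
         "screen actors guild", "writers guild of america"]
conn.executemany("INSERT INTO institutions VALUES (?)", [(n,) for n in names])

# Each comparison yields 0 or 1; a row that matches a rule gets 0 for that key
# and therefore sorts ahead of rows that get 1.
rows = conn.execute(
    """
    SELECT name
    FROM institutions
    ORDER BY name != 'academy awards',
             name != 'golden globes',
             name NOT LIKE '%guild%',
             name
    """
).fetchall()
ordered = [r[0] for r in rows]
print(ordered)
```

SQL Server has no boolean type usable in ORDER BY, so there the CASE version is needed.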

Efficiently pull different columns using a common correlated subquery

I need to pull multiple columns from a subquery which also requires a WHERE filter referencing columns of the FROM table. I have a couple of questions about this:
Is there another solution to this problem besides mine below?
Is another solution even necessary or is this solution efficient enough?
Example:
In the following example I'm writing a view to present test scores, particularly to discover failures that may need to be addressed or retaken.
I cannot simply use JOIN because I need to filter my actual subquery first (notice I'm getting TOP 1 for the "examinee", sorted either by score or date descending)
My goal is to avoid writing (and executing) essentially the same subquery repeatedly.
SELECT ExamineeID, LastName, FirstName, Email,
(SELECT COUNT(examineeTestID)
FROM exam.ExamineeTest tests
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestTimeCommitted,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentTimeCommitted
FROM exam.Examinee E
To answer your second question first: yes, a better way is in order. The query you're using is hard to understand and hard to maintain, and even if its performance is acceptable now, it's a shame to query the same table multiple times when you don't need to; performance may also stop being acceptable if your application ever grows to an appreciable size.
To answer your first question, I have a few methods for you. These assume SQL Server 2005 or later except where noted.
Note that the queries below leave out the bestExamineeTestID and currentExamineeTestID columns; if you need them, add ExamineeTestID to the inner select lists. When an examinee has taken no tests, all of the best/current columns are NULL.
You can think of OUTER/CROSS APPLY as an operator that lets you move correlated subqueries from the SELECT list into the FROM clause. An APPLY subquery can contain an outer reference to a previously named table and can return more than one column, so the work is done once per row of the outer query rather than once for each column.
SELECT
ExamineeID,
LastName,
FirstName,
Email,
B.Attempts,
BestScore = B.Score,
BestDateDue = B.DateDue,
BestTimeCommitted = B.TimeCommitted,
CurrentScore = C.Score,
CurrentDateDue = C.DateDue,
CurrentTimeCommitted = C.TimeCommitted
FROM
exam.Examinee E
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted,
Attempts = Count(*) OVER ()
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY Score DESC
) B
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY DateDue DESC
) C
You should experiment to see whether my Count(*) OVER () performs better than an additional OUTER APPLY that just gets the count. If you're not restricting which rows are read from exam.Examinee, it may be better to just do a normal aggregate in a derived table.
Here's another method that (sort of) gets all the data in one swoop. It could conceivably perform better than the other queries, but in my experience windowing functions can get surprisingly expensive in some situations, so testing is in order.
WITH Data AS (
SELECT
*,
Count(*) OVER (PARTITION BY ExamineeID) Cnt,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
FROM
exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
), Vals AS (
SELECT
ExamineeID,
Max(Cnt) Attempts,
Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) CurrentScore,
Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) CurrentDateDue,
Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM Data
GROUP BY
ExamineeID
)
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
V.Attempts,
V.BestScore, V.BestDateDue, V.BestTimeCommitted,
V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
-- change join to INNER if you only want examinees who've tested
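The single-pass ROW_NUMBER() idea can be tried end to end in SQLite (window functions need SQLite 3.25 or later). This is a reduced sketch: the table keeps only the columns it needs, the rows are invented, and the DateDue DESC tie-breaker in ScoreOrder is an addition of this sketch, not part of the answer, so that tied scores pick a predictable row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ExamineeTest (
           ExamineeTestID INTEGER, ExamineeID INTEGER, TestID INTEGER,
           TestRevisionID INTEGER, Score INTEGER, DateDue TEXT)"""
)
# Invented rows; test 13 belongs to a different TestID and must be filtered out.
conn.executemany(
    "INSERT INTO ExamineeTest VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10, 1, 2, 3, 70, "2024-01-10"),
        (11, 1, 2, 3, 90, "2024-02-10"),
        (12, 1, 2, 3, 80, "2024-03-10"),
        (13, 1, 9, 3, 100, "2024-04-01"),
        (20, 2, 2, 3, 50, "2024-01-05"),
    ],
)

query = """
WITH Data AS (
    SELECT ExamineeID, ExamineeTestID, Score, DateDue,
           COUNT(*) OVER (PARTITION BY ExamineeID) AS Cnt,
           -- DateDue DESC is an added tie-breaker for equal scores
           ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY Score DESC, DateDue DESC) AS ScoreOrder,
           ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) AS DueOrder
    FROM ExamineeTest
    WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT ExamineeID,
       MAX(Cnt) AS Attempts,
       MAX(CASE WHEN ScoreOrder = 1 THEN ExamineeTestID END) AS BestExamineeTestID,
       MAX(CASE WHEN ScoreOrder = 1 THEN Score END) AS BestScore,
       MAX(CASE WHEN DueOrder = 1 THEN ExamineeTestID END) AS CurrentExamineeTestID,
       MAX(CASE WHEN DueOrder = 1 THEN Score END) AS CurrentScore
FROM Data
GROUP BY ExamineeID
ORDER BY ExamineeID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each MAX(CASE ...) picks the single value from the row numbered 1, so the best and current test come out of one scan of the filtered rows.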
Finally, here's a SQL Server 2000 method:
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
Y.Attempts,
Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN ( -- change to inner if you only want examinees who've tested
SELECT
X.ExamineeID,
X.Cnt Attempts,
Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM
(
SELECT ExamineeID, Max(Score) MaxScore, Max(DateDue) MaxDateDue, Count(*) Cnt
FROM exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
GROUP BY ExamineeID
) X
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
INNER JOIN exam.ExamineeTest T
ON X.ExamineeID = T.ExamineeID
AND (
(Y.Which = 1 AND X.MaxScore = T.Score)
OR (Y.Which = 2 AND X.MaxDateDue = T.DateDue)
)
WHERE
T.TestRevisionID = 3
AND T.TestID = 2
GROUP BY
X.ExamineeID,
X.Cnt
) Y ON E.ExamineeID = Y.ExamineeID
If (ExamineeID, Score) or (ExamineeID, DateDue) is not unique, this query can return values mixed from different tied rows, because the MAX() aggregates then run over several matching tests. That's probably not unlikely with Score. If neither is unique, you need to use (or add) some additional column that guarantees uniqueness so it can be used to select one row. If only Score can be duplicated, an additional pre-query that gets the max Score first, then dovetails in the max DateDue, would pull the most recent of the tied highest scores at the same time as getting the most recent data. Let me know if you need more SQL Server 2000 help.
Note: The biggest thing that is going to control whether CROSS APPLY or a ROW_NUMBER() solution is better is whether you have an index on the columns that are being looked up and whether the data is dense or sparse.
Index + you're pulling only a few examinees with lots of tests each = CROSS APPLY wins.
Index + you're pulling a huge number of examinees with only a few tests each = ROW_NUMBER() wins.
No index = string concatenation/value packing method wins (not shown here).
The GROUP BY solution I gave for SQL Server 2000 will probably perform the worst, but that's not guaranteed. As I said, testing is in order.
If any of my queries do give performance problems, let me know and I'll see what I can do to help. I probably have typos, as I didn't write any DDL to recreate your tables, but I did my best without running the queries.
If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that get pushed to by a trigger on the ExamineeTest table that would always keep them updated. However, this is denormalization and probably not necessary or a good idea unless you've scaled so awfully big that retrieving results becomes unacceptably long.
It's not the same subquery. It's three different subqueries:
COUNT() over all matching rows
TOP (1) ORDER BY Score DESC
TOP (1) ORDER BY DateDue DESC
You can't execute it fewer than 3 times.
The question is, how to make it execute no more than 3 times.
One option would be to write 3 inline table-valued functions and use them with OUTER APPLY. Make sure they are actually inline; a multi-statement function can make performance drop dramatically. One of these three functions might be:
create function dbo.topexaminee_byscore(@ExamineeID int)
returns table
as
return (
SELECT top (1)
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted
FROM exam.ExamineeTest
WHERE (ExamineeID = @ExamineeID) AND (TestRevisionID = 3) AND (TestID = 2)
ORDER BY Score DESC
)
Another option would be to do essentially the same, but with subqueries. Because you fetch data for all students anyway, there shouldn't be too much of a difference performance-wise. Create three subqueries, for example:
select ExamineeID, bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted
from (
SELECT
ExamineeID,
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted,
row_number() over (partition by ExamineeID order by Score DESC) as takeme
FROM exam.ExamineeTest
WHERE (TestRevisionID = 3) AND (TestID = 2)
) as foo
where foo.takeme = 1
Write the same kind of subquery for ORDER BY DateDue DESC, and a third one that counts all records per examinee, each selecting its respective columns.
Join these three on ExamineeID.
What is going to be better/more performant/more readable is up to you. Do some testing.
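A runnable SQLite sketch of the "three subqueries joined on ExamineeID" variant (window functions need SQLite 3.25 or later). The table is reduced to the columns used, the rows are invented, and only the score columns are selected to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ExamineeTest (ExamineeID INTEGER, TestID INTEGER,"
    " TestRevisionID INTEGER, Score INTEGER, DateDue TEXT)"
)
# Invented rows for two examinees.
conn.executemany(
    "INSERT INTO ExamineeTest VALUES (?, ?, ?, ?, ?)",
    [(1, 2, 3, 70, "2024-01-10"), (1, 2, 3, 90, "2024-02-10"),
     (1, 2, 3, 80, "2024-03-10"), (2, 2, 3, 50, "2024-01-05")],
)

# b: best score per examinee, c: most recent score, a: attempt count.
query = """
SELECT b.ExamineeID, a.Attempts, b.bestScore, c.currentScore
FROM (
    SELECT ExamineeID, Score AS bestScore,
           ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) AS takeme
    FROM ExamineeTest WHERE TestRevisionID = 3 AND TestID = 2
) AS b
JOIN (
    SELECT ExamineeID, Score AS currentScore,
           ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) AS takeme
    FROM ExamineeTest WHERE TestRevisionID = 3 AND TestID = 2
) AS c ON c.ExamineeID = b.ExamineeID AND c.takeme = 1
JOIN (
    SELECT ExamineeID, COUNT(*) AS Attempts
    FROM ExamineeTest WHERE TestRevisionID = 3 AND TestID = 2
    GROUP BY ExamineeID
) AS a ON a.ExamineeID = b.ExamineeID
WHERE b.takeme = 1
ORDER BY b.ExamineeID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This reads the filtered rows three times instead of once per output column, which is the saving the answer describes.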
It looks like you can replace the three columns that are based on the alias "bestTest" with a view. All three of those subqueries have the same WHERE clause and the same ORDER BY clause.
Ditto for the subquery aliased "bestNewTest", and again for the subquery aliased "currentTest".
If I counted right, that would replace 8 subqueries with 3 views. You can join on the views. I think the joins would be faster, but if I were you, I'd check the execution plan of both versions.
You could use a CTE and OUTER APPLY.
;WITH testScores AS
(
SELECT ExamineeID, ExamineeTestID, Score, DateDue, TimeCommitted
FROM exam.ExamineeTest
WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT E.ExamineeID, E.LastName, E.FirstName, E.Email, total.Attempts,
bestTest.*, currentTest.*
FROM exam.Examinee AS E
LEFT OUTER JOIN
(
SELECT ExamineeID, COUNT(ExamineeTestID) AS Attempts
FROM testScores
GROUP BY ExamineeID
) AS total ON E.ExamineeID = total.ExamineeID
OUTER APPLY
(
SELECT TOP 1 t.ExamineeTestID, t.Score, t.DateDue, t.TimeCommitted
FROM testScores AS t
WHERE E.ExamineeID = t.ExamineeID
ORDER BY t.Score DESC
) AS bestTest (bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted)
OUTER APPLY
(
SELECT TOP 1 t.ExamineeTestID, t.Score, t.DateDue, t.TimeCommitted
FROM testScores AS t
WHERE E.ExamineeID = t.ExamineeID
ORDER BY t.DateDue DESC
) AS currentTest (currentExamineeTestID, currentScore, currentDateDue,
currentTimeCommitted)