SQL - Query results that display a record with multiple rows of data on one row horizontally

I've provided statements to create the two example tables for my question below.
In this example, the second table contains the same StudentId multiple times because the student has multiple ClassIds. However, I need the query results to be displayed horizontally, on one row per student. (The maximum number of ClassIds per StudentId is 15.)
How would you write a SELECT statement that joins the two tables and repeats all the class data across that same row?
CREATE TABLE Student (
StudentId int,
FirstName VarChar (255),
LastName VarChar (255)
);
CREATE TABLE Classes (
StudentId int,
ClassId int,
ClassName VarChar (255),
ClassCost int
);
INSERT INTO Student (StudentId, FirstName, LastName)
VALUES
(123, 'Carol', 'Dwek'),
(456, 'Cal', 'Newport');
INSERT INTO Classes (StudentId, ClassId, ClassName,ClassCost)
VALUES
(123, 972, 'Psychology',30),
(456, 214, 'Focus',99),
(123, 903, 'Sociology',30),
(456, 851, 'Meditation',99),
(456, 911, 'Reading',20),
(456, 111, 'Deep Work',50),
(456, 117, 'Time Management',25),
(456, 999, 'Goal Setting',50);

If you are happy to use a hard-coded ceiling on the number of classes a student can attend, then a way that should perform better than multiple self-joins (which will likely re-evaluate the row numbering multiple times) is to pivot on the row number instead.
The same ordering that provides the row numbering (StudentId, ClassId) can also be used for the grouping by StudentId, provided there is an index such as a primary key on (StudentId, ClassId).
The query is still pretty ugly, though, and this is best done in the application (if there is an application and you aren't just running ad hoc queries in SSMS to view the results there).
With Numbered As
(
SELECT *,
rn = row_number() over (PARTITION BY StudentID ORDER BY ClassID)
FROM Classes
), Pivoted As
(
SELECT StudentId,
ClassId1 = MAX(CASE WHEN rn = 1 THEN ClassId END),
ClassName1 = MAX(CASE WHEN rn = 1 THEN ClassName END),
ClassCost1 = MAX(CASE WHEN rn = 1 THEN ClassCost END),
ClassId2 = MAX(CASE WHEN rn = 2 THEN ClassId END),
ClassName2 = MAX(CASE WHEN rn = 2 THEN ClassName END),
ClassCost2 = MAX(CASE WHEN rn = 2 THEN ClassCost END),
ClassId3 = MAX(CASE WHEN rn = 3 THEN ClassId END),
ClassName3 = MAX(CASE WHEN rn = 3 THEN ClassName END),
ClassCost3 = MAX(CASE WHEN rn = 3 THEN ClassCost END),
ClassId4 = MAX(CASE WHEN rn = 4 THEN ClassId END),
ClassName4 = MAX(CASE WHEN rn = 4 THEN ClassName END),
ClassCost4 = MAX(CASE WHEN rn = 4 THEN ClassCost END),
ClassId5 = MAX(CASE WHEN rn = 5 THEN ClassId END),
ClassName5 = MAX(CASE WHEN rn = 5 THEN ClassName END),
ClassCost5 = MAX(CASE WHEN rn = 5 THEN ClassCost END),
ClassId6 = MAX(CASE WHEN rn = 6 THEN ClassId END),
ClassName6 = MAX(CASE WHEN rn = 6 THEN ClassName END),
ClassCost6 = MAX(CASE WHEN rn = 6 THEN ClassCost END)
-- ...add the same three lines for rn = 7 through 15 to cover the stated maximum of 15 classes
FROM Numbered
GROUP BY StudentId
)
SELECT S.FirstName, S.LastName, P.*
FROM Student S
JOIN Pivoted P
ON P.StudentId = S.StudentId
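To try the numbering-then-pivot idea without SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later, which added ROW_NUMBER(). It loads the question's sample data and pivots three slots instead of six. It also folds the join to Student into the grouping query. Apart from that, it is the same MAX(CASE WHEN rn = ...) pattern.

```python
import sqlite3

# Sample data from the question (Student and Classes tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentId INT, FirstName VARCHAR(255), LastName VARCHAR(255));
CREATE TABLE Classes (StudentId INT, ClassId INT, ClassName VARCHAR(255), ClassCost INT);
INSERT INTO Student VALUES (123, 'Carol', 'Dwek'), (456, 'Cal', 'Newport');
INSERT INTO Classes VALUES
  (123, 972, 'Psychology', 30), (456, 214, 'Focus', 99),
  (123, 903, 'Sociology', 30), (456, 851, 'Meditation', 99),
  (456, 911, 'Reading', 20), (456, 111, 'Deep Work', 50),
  (456, 117, 'Time Management', 25), (456, 999, 'Goal Setting', 50);
""")

# Number each student's classes by ClassId, then pivot on that number.
# Only three slots are shown; add more CASE columns for a higher ceiling.
query = """
WITH Numbered AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY ClassId) AS rn
  FROM Classes
)
SELECT S.FirstName, S.LastName, N.StudentId,
  MAX(CASE WHEN rn = 1 THEN ClassId END)   AS ClassId1,
  MAX(CASE WHEN rn = 1 THEN ClassName END) AS ClassName1,
  MAX(CASE WHEN rn = 2 THEN ClassId END)   AS ClassId2,
  MAX(CASE WHEN rn = 2 THEN ClassName END) AS ClassName2,
  MAX(CASE WHEN rn = 3 THEN ClassId END)   AS ClassId3,
  MAX(CASE WHEN rn = 3 THEN ClassName END) AS ClassName3
FROM Student S
JOIN Numbered N ON N.StudentId = S.StudentId
GROUP BY S.FirstName, S.LastName, N.StudentId
ORDER BY N.StudentId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A student with fewer classes than slots gets None (NULL) in the unused columns.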

This is only possible in a single query if you know in advance how many classes a student might possibly have. If you don't know and can't guess a reasonable maximum, the SQL language simply will NOT be able to produce the desired output in a single static statement; you would have to build the query dynamically.
If you can guess the maximum course load, the query looks like this:
WITH NumberedClasses As (
SELECT *, row_number() over (partition by StudentID order by ClassID) rn
FROM Classes
)
SELECT s.*
,c1.ClassId, c1.ClassName, c1.ClassCost
,c2.ClassID, c2.ClassName, c2.ClassCost
-- ...
,cn.ClassID, cn.ClassName, cn.ClassCost
FROM Student s
LEFT JOIN NumberedClasses c1 ON c1.StudentID = s.StudentID AND c1.rn = 1
LEFT JOIN NumberedClasses c2 ON c2.StudentID = s.StudentID And c2.rn = 2
-- ...
LEFT JOIN NumberedClasses cn ON cn.StudentID = s.StudentID And cn.rn = {n}
Note: this tends to be SLOW, and not just a little slow: with a reasonable amount of data it can take several minutes (or longer) to finish. And yes, you really do have to repeat yourself in two places, once for each possible class enrollment.
It's also worth mentioning here that this kind of PIVOT is antithetical to the formal set theory that underpins all relational databases. For this reason, you're usually MUCH better off doing this work in the client code or reporting tool.
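Here is a small sqlite3 sketch of the self-join variant with two slots, under the same SQLite 3.25+ assumption. Student 789 is hypothetical. It is included to show that LEFT JOIN keeps students with no classes, who simply get NULLs. The inner JOIN to Pivoted in the previous answer would drop such students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentId INT, FirstName TEXT, LastName TEXT);
CREATE TABLE Classes (StudentId INT, ClassId INT, ClassName TEXT, ClassCost INT);
INSERT INTO Student VALUES (123, 'Carol', 'Dwek'), (456, 'Cal', 'Newport'),
  (789, 'No', 'Classes');  -- hypothetical student with no enrollments
INSERT INTO Classes VALUES (123, 972, 'Psychology', 30), (123, 903, 'Sociology', 30),
  (456, 214, 'Focus', 99);
""")

# One LEFT JOIN per slot; each join picks the numbered row with a given rn.
query = """
WITH NumberedClasses AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY StudentId ORDER BY ClassId) AS rn
  FROM Classes
)
SELECT s.StudentId, c1.ClassName, c2.ClassName
FROM Student s
LEFT JOIN NumberedClasses c1 ON c1.StudentId = s.StudentId AND c1.rn = 1
LEFT JOIN NumberedClasses c2 ON c2.StudentId = s.StudentId AND c2.rn = 2
ORDER BY s.StudentId
"""
rows = conn.execute(query).fetchall()
print(rows)
```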

Related

flatten data in SQL based on fixed set of column

I am stuck on a specific scenario of flattening data and need help with it. I need the output as flattened data, but the number of values per ID is not fixed, so I want to restrict the output to a fixed set of columns.
Given table 'test_table':
ID  Name  Property
1   C1    xxx
2   C2    xyz
2   C3    zz
The scenario is that column Name can have any number of values for an ID. I need to flatten the data so that there is one row per ID. Since the Name values vary with each ID, I want to flatten them into a fixed set of 3 columns, Co1, Co2 and Co3. The output should look like this:
ID  Co1  Co1_Property  Co2   Co2_Property  Co3   Co3_Property
1   C1   xxx           null  null          null  null
2   C2   xyz           C3    zz            null  null
I could not think of a solution using PIVOT or aggregation. Any help would be appreciated.
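In BigQuery you can use arrays. SAFE_ORDINAL returns NULL instead of raising an error when the position is beyond the end of the array: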
select id,
array_agg(name order by name)[safe_ordinal(1)] as name_1,
array_agg(property order by name)[safe_ordinal(1)] as property_1,
array_agg(name order by name)[safe_ordinal(2)] as name_2,
array_agg(property order by name)[safe_ordinal(2)] as property_2,
array_agg(name order by name)[safe_ordinal(3)] as name_3,
array_agg(property order by name)[safe_ordinal(3)] as property_3
from t
group by id;
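The array approach can be mirrored in plain Python to show the logic; this is not BigQuery itself. The sketch collects each ID's names and properties into lists ordered by name. It then reads positions 1 to 3 with a lookup that returns None past the end, the way SAFE_ORDINAL does. The helper names safe_ordinal and flatten are hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (ID, Name, Property)
test_table = [(1, "C1", "xxx"), (2, "C2", "xyz"), (2, "C3", "zz")]

def safe_ordinal(items, n):
    """1-based lookup that returns None past the end, like SAFE_ORDINAL."""
    return items[n - 1] if 1 <= n <= len(items) else None

def flatten(rows, width=3):
    result = []
    # groupby needs rows sorted by the key (ID); sorting the tuples also
    # orders each group by Name, like array_agg(... order by name).
    for id_, group in groupby(sorted(rows), key=lambda r: r[0]):
        group = list(group)
        names = [r[1] for r in group]
        props = [r[2] for r in group]
        out = [id_]
        for i in range(1, width + 1):
            out += [safe_ordinal(names, i), safe_ordinal(props, i)]
        result.append(tuple(out))
    return result

print(flatten(test_table))
```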
All current answers are too verbose and repeat the same code fragments again and again. If you need to account for more columns, you have to copy, paste and add more lines, which makes them even more verbose!
My preference is to avoid this kind of coding and use something more generic, as in the example below:
select * from (
select *, row_number() over(partition by id order by name) col
from `project.dataset.table`)
pivot (max(name) as name, max(property) as property for col in (1, 2, 3))
Applied to the sample data in your question, this returns one row per id, with a name/property column pair for each value in the IN list.
If you want to change the number of output columns, you simply modify the for col in (1, 2, 3) part of the query.
For example, if you wanted 5 columns, you would use for col in (1, 2, 3, 4, 5). That simple!
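Another way to avoid copy-pasting the conditional-aggregation lines is to generate them; this idea is not taken from the answers here. Below is a sketch in Python with sqlite3. The pivot_sql helper is hypothetical. Changing n plays the same role as editing the IN (1, 2, 3) list.

```python
import sqlite3

def pivot_sql(n: int) -> str:
    """Build a ROW_NUMBER() + MAX(CASE ...) pivot with n name/property slots."""
    cols = []
    for i in range(1, n + 1):
        cols.append(f"MAX(CASE WHEN col = {i} THEN name END) AS name_{i}")
        cols.append(f"MAX(CASE WHEN col = {i} THEN property END) AS property_{i}")
    return (
        "SELECT id, " + ", ".join(cols) + " FROM ("
        " SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS col"
        " FROM test_table) AS t GROUP BY id ORDER BY id"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INT, name TEXT, property TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?)",
                 [(1, "C1", "xxx"), (2, "C2", "xyz"), (2, "C3", "zz")])

rows = conn.execute(pivot_sql(2)).fetchall()
print(rows)
```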
The standard practice is to use conditional aggregation: use CASE expressions to pick which row goes to which column, then MAX() to collapse the multiple rows into one row per id...
SELECT
id,
MAX(CASE WHEN name = 'C1' THEN name END) AS co1,
MAX(CASE WHEN name = 'C1' THEN property END) AS co1_property,
MAX(CASE WHEN name = 'C2' THEN name END) AS co2,
MAX(CASE WHEN name = 'C2' THEN property END) AS co2_property,
MAX(CASE WHEN name = 'C3' THEN name END) AS co3,
MAX(CASE WHEN name = 'C3' THEN property END) AS co3_property
FROM
yourTable
GROUP BY
id
Background info:
Not having an ELSE in the CASE expression implicitly means ELSE NULL
The intention is therefore for each column to receive NULL from every input row, except the row being pivoted into that column
Aggregates, such as MAX() essentially skip NULL values
MAX( {NULL,NULL,'xxx',NULL,NULL} ) therefore equals 'xxx'
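The NULL-skipping behaviour described above can be checked directly. This sqlite3 sketch uses a hypothetical five-row table in which only one row matches the CASE condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Five input rows; only the third matches the CASE condition below.
conn.execute("CREATE TABLE t (seq INT, property TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "aaa"), (2, "bbb"), (3, "xxx"), (4, "ccc"), (5, "ddd")])

# CASE with no ELSE yields NULL for the four non-matching rows...
per_row = [r[0] for r in conn.execute(
    "SELECT CASE WHEN seq = 3 THEN property END FROM t ORDER BY seq")]
# ...and MAX() ignores those NULLs, leaving the single real value.
collapsed = conn.execute(
    "SELECT MAX(CASE WHEN seq = 3 THEN property END) FROM t").fetchone()[0]
print(per_row, collapsed)
```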
A similar approach "bunches" the values to the left (so that NULL values only ever appear on the right...)
That approach first uses ROW_NUMBER() to give each row a value corresponding to the column you want to put that row into.
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seq_num
FROM
yourTable
)
SELECT
id,
MAX(CASE WHEN seq_num = 1 THEN name END) AS co1,
MAX(CASE WHEN seq_num = 1 THEN property END) AS co1_property,
MAX(CASE WHEN seq_num = 2 THEN name END) AS co2,
MAX(CASE WHEN seq_num = 2 THEN property END) AS co2_property,
MAX(CASE WHEN seq_num = 3 THEN name END) AS co3,
MAX(CASE WHEN seq_num = 3 THEN property END) AS co3_property
FROM
sorted
GROUP BY
id
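To see the difference between the two approaches on the question's sample data, this sqlite3 sketch runs both. Only the name columns are shown, for brevity. Pivoting on the literal name fixes each name to its own column, while pivoting on seq_num packs the values to the left.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INT, name TEXT, property TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)",
                 [(1, "C1", "xxx"), (2, "C2", "xyz"), (2, "C3", "zz")])

# Pivot on the literal name: each name always lands in its own fixed column.
by_name = conn.execute("""
SELECT id,
  MAX(CASE WHEN name = 'C1' THEN name END) AS co1,
  MAX(CASE WHEN name = 'C2' THEN name END) AS co2,
  MAX(CASE WHEN name = 'C3' THEN name END) AS co3
FROM yourTable GROUP BY id ORDER BY id""").fetchall()

# Pivot on seq_num: values are packed to the left, NULLs only on the right.
by_seq = conn.execute("""
WITH sorted AS (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seq_num
  FROM yourTable
)
SELECT id,
  MAX(CASE WHEN seq_num = 1 THEN name END) AS co1,
  MAX(CASE WHEN seq_num = 2 THEN name END) AS co2,
  MAX(CASE WHEN seq_num = 3 THEN name END) AS co3
FROM sorted GROUP BY id ORDER BY id""").fetchall()
print(by_name)
print(by_seq)
```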

How to merge 2 tables into 1 row, with multiple entries from second table in Oracle SQL

Newbie question on (Oracle) SQL.
I'd like this table :
ash_id ash_contact_name ash_contact_telefoonnummer
15313 Name1 022457852114
15313 Name2 122457852114
15313 Name3 222457852114
15313 Name4 322457852114
15313 Name5 422457852114
To Look like this in 1 row :
15313 Name1 022457852114 Name2 122457852114 Name3 222457852114 Name4 322457852114 Name5 422457852114
So I get the id from the first table only once, followed by multiple columns with the names and phone numbers. My code currently looks like this:
select ash.ash_id ,
con.ash_contact_name, con.ash_contact_telefoonnummer
from D00ASH01.ash_admin_stakeholder ash,ash_contacts con
where con.ash_id = ash.ash_id and con.ash_id = 15313
order by ash.ash_id
The final code will not include "con.ash_id = 15313", as I will need all the entries. The end result will include more fields from the first table, so I cannot just use the second table alone. For now, I want to start by building it up simply.
I tried to make it work with a join but did not manage to.
All suggestions welcome,
thanks
Check this; it might help you. If it doesn't, let me know.
SELECT ash.ash_id, con.ash_contact_name, con.ash_contact_telefoonnummer FROM D00ASH01.ash_admin_stakeholder ash INNER JOIN ash_contacts con ON con.ash_id = ash.ash_id WHERE con.ash_id = 15313 ORDER BY ash.ash_id;
SELECT * FROM D00ASH01.ash_admin_stakeholder ash INNER JOIN ash_contacts con USING (ash_id) WHERE ash_id = 15313 ORDER BY ash_id;
I found the solution for this:
select
c.ash_id,
c.ash_naam_kbo_NL ,
c.ash_naam_kbo_FR ,
c.ash_naam_kbo_DE,
max(Case when rn = 1 then c.ASH_CONTACT_NAME else null end) as name1,
max(Case when rn = 1 then c.ASH_CONTACT_GSMNUMMER else null end) as gsm1,
max(Case when rn = 1 then c.ASH_CONTACT_FAXNUMMER else null end) as fax1,
max(Case when rn = 1 then c.ASH_CONTACT_EMAILADRES else null end) as email1,
max(Case when rn = 2 then c.ASH_CONTACT_NAME else null end) as name2,
max(Case when rn = 2 then c.ASH_CONTACT_GSMNUMMER else null end) as gsm2,
max(Case when rn = 2 then c.ASH_CONTACT_FAXNUMMER else null end) as fax2,
max(Case when rn = 2 then c.ASH_CONTACT_EMAILADRES else null end) as email2
from (
select
table_ash.ash_id,
table_ash.ash_naam_kbo_NL ,
table_ash.ash_naam_kbo_FR ,
table_ash.ash_naam_kbo_DE,
table_contacts.ASH_CONTACT_NAME,
table_contacts.ASH_CONTACT_GSMNUMMER,
table_contacts.ASH_CONTACT_FAXNUMMER,
table_contacts.ASH_CONTACT_EMAILADRES,
ROW_NUMBER () over (partition by table_ash.ash_id order by table_contacts.ASH_CONTACT_NAME) rn
from
ASH_ADMIN_STAKEHOLDER table_ash,
ash_contacts table_contacts
where
table_ash.ash_id = table_contacts.ash_id) c
group by
c.ash_id,
c.ash_naam_kbo_NL ,
c.ash_naam_kbo_FR ,
c.ash_naam_kbo_DE;
So there's a SELECT inside a SELECT. The trick is to generate a row number and use it like an index: for each row number you need, include a set of MAX(CASE ...) expressions in the outer query. This code is the basis of the answer I needed.
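Here is a sqlite3 sketch of the same row-number-as-index trick on the contact data from the question. The two tables are reduced to the columns shown in the question. The phone numbers are stored as text to keep the leading zero. Contacts are numbered by name, which is an assumption; any column that gives a stable order works. The MAX(CASE ...) pairs are generated in a loop rather than typed out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ash_admin_stakeholder (ash_id INT);
CREATE TABLE ash_contacts (ash_id INT, ash_contact_name TEXT,
                           ash_contact_telefoonnummer TEXT);
INSERT INTO ash_admin_stakeholder VALUES (15313);
INSERT INTO ash_contacts VALUES
  (15313, 'Name1', '022457852114'), (15313, 'Name2', '122457852114'),
  (15313, 'Name3', '222457852114'), (15313, 'Name4', '322457852114'),
  (15313, 'Name5', '422457852114');
""")

# One name/phone pair of MAX(CASE ...) columns per row number, here 1..5.
slots = ",\n".join(
    f"MAX(CASE WHEN rn = {i} THEN ash_contact_name END) AS name{i}, "
    f"MAX(CASE WHEN rn = {i} THEN ash_contact_telefoonnummer END) AS tel{i}"
    for i in range(1, 6)
)
query = f"""
SELECT c.ash_id, {slots}
FROM (
  SELECT a.ash_id, k.ash_contact_name, k.ash_contact_telefoonnummer,
         ROW_NUMBER() OVER (PARTITION BY a.ash_id ORDER BY k.ash_contact_name) AS rn
  FROM ash_admin_stakeholder a
  JOIN ash_contacts k ON k.ash_id = a.ash_id
) c
GROUP BY c.ash_id
"""
row = conn.execute(query).fetchone()
print(row)
```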
SQL code written in Oracle:
WITH CTE AS (
  SELECT
    UP.CLASS,
    UP.NS || UP.RN AS NSR,
    UP.VAL
  FROM (
    SELECT
      ROW_NUMBER() OVER (PARTITION BY S.CLASS ORDER BY S.SID) RN,
      S.*
    FROM STAKEHOLDER S
  ) SS
  UNPIVOT (VAL FOR NS IN (NAME, SID)) UP
)
SELECT *
FROM CTE
PIVOT (MAX(VAL) FOR NSR IN ('NAME1' AS NAME1, 'SID1' AS SID1,
                            'NAME2' AS NAME2, 'SID2' AS SID2,
                            'NAME3' AS NAME3, 'SID3' AS SID3))
This is not difficult if we approach it the natural way. After grouping the table by CLASS, we convert the NAME and SID columns into rows and generate names for the values that will become columns. Each name is the original column name plus the member's position within its group: NAME1, SID1, NAME2, SID2,… for group 1 and NAME1, SID1, … for group 2. Then we concatenate the groups and transpose rows to columns. The problem is that SQL does not support dynamic row-to-column/column-to-row transposition. When there are few columns and they are fixed, the language can manage the transposition, but as the number of columns grows it becomes more and more awkward: enumerating every column to be converted is tedious, and the SQL code becomes bloated. If the columns are dynamic, SQL has to resort to complex, roundabout methods.
Yet, it is really easy to code the transposition task with the open-source esProc SPL:
| |A|
|1|=connect("ORACLE")|
|2|=A1.query#x("SELECT * FROM STAKEHOLDER ORDER BY CLASS,SID")|
|3|=A2.fname().m(2:)|
|4|=A2.group#o(CLASS)|
|5|=A4.conj(~.news(A3;CLASS,A3(#)/A4.~.#:COL,~:VAL))|
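The unpivot, rename and pivot sequence described above can be sketched in plain Python. In this version the set of output columns comes from the data instead of being listed by hand. The STAKEHOLDER rows are hypothetical, because the original post does not show its data. transpose is an illustrative helper and is not part of esProc or Oracle.

```python
from collections import defaultdict

# Hypothetical STAKEHOLDER rows: (CLASS, SID, NAME). Not from the original post.
stakeholder = [
    (1, 101, "Ann"), (1, 102, "Bob"), (1, 103, "Cid"),
    (2, 201, "Dee"), (2, 202, "Eve"),
]

def transpose(rows, value_cols=("NAME", "SID")):
    # Step 1: number rows within each CLASS, ordered by SID (like ROW_NUMBER()).
    numbered = defaultdict(list)
    for cls, sid, name in sorted(rows):
        numbered[cls].append({"NAME": name, "SID": sid})
    result = {}
    for cls, members in numbered.items():
        # Step 2 (unpivot): one (column name + row number, value) pair per cell.
        pairs = [(f"{col}{rn}", member[col])
                 for rn, member in enumerate(members, start=1)
                 for col in value_cols]
        # Step 3 (pivot): the generated names become the columns of one row.
        result[cls] = dict(pairs)
    return result

for cls, row in transpose(stakeholder).items():
    print(cls, row)
```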

SQL Joined Tables - Multiple rows on joined table per 'on' matched field merged into one row?

I have two tables I am pulling data from. Here is a minimal recreation of what I have:
Select
Jobs.Job_Number,
Jobs.Total_Amount,
Job_Charges.Charge_Code,
Job_Charges.Charge_Amount
From
DB.Jobs
Inner Join
DB.Job_Charges
On
Jobs.Job_Number = Job_Charges.Job_Number;
So, what happens is that I end up getting a row for each different Charge_Code and Charge_Amount per Job_Number. Everything else on the row is the same. Is it possible to have it return something more like:
Job_Number - Total_Amount - Charge_Code[1] - Charge_Amount[1] - Charge_Code[2] - Charge_Amount[2]
etc.?
This way it creates one line per job number with each associated charge and amount on the same line. I have been reading through W3 but haven't been able to tell definitively if this is possible or not. Anything helps, thank you!
To pivot your resultset over a fixed number of columns, you can use row_number() and conditional aggregation:
select
job_number,
total_amount,
max(case when rn = 1 then charge_code end) charge_code1,
max(case when rn = 1 then charge_amount end) charge_amount1,
max(case when rn = 2 then charge_code end) charge_code2,
max(case when rn = 2 then charge_amount end) charge_amount2,
max(case when rn = 3 then charge_code end) charge_code3,
max(case when rn = 3 then charge_amount end) charge_amount3
from (
select
j.job_number,
j.total_amount,
c.charge_code,
c.charge_amount,
row_number() over(partition by j.job_number, j.total_amount order by c.charge_code) rn
from DB.Jobs j
inner join DB.Job_Charges c on j.job_number = c.job_number
) t
group by job_number, total_amount
The above query handles up to 3 charge codes and amounts per job number (ordered by charge code). You can expand the select clause with more max(case ...) expressions to handle more of them.
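Below is a runnable sqlite3 sketch of the query above, with two slots and hypothetical jobs and charges. The partition column is qualified as j.job_number because both tables have a job_number column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Jobs (Job_Number INT, Total_Amount INT);
CREATE TABLE Job_Charges (Job_Number INT, Charge_Code TEXT, Charge_Amount INT);
-- hypothetical sample data
INSERT INTO Jobs VALUES (1, 150), (2, 40);
INSERT INTO Job_Charges VALUES (1, 'LAB', 100), (1, 'MAT', 50), (2, 'LAB', 40);
""")

query = """
SELECT job_number, total_amount,
  MAX(CASE WHEN rn = 1 THEN charge_code END)   AS charge_code1,
  MAX(CASE WHEN rn = 1 THEN charge_amount END) AS charge_amount1,
  MAX(CASE WHEN rn = 2 THEN charge_code END)   AS charge_code2,
  MAX(CASE WHEN rn = 2 THEN charge_amount END) AS charge_amount2
FROM (
  SELECT j.job_number, j.total_amount, c.charge_code, c.charge_amount,
         -- qualified: both tables have a job_number column
         ROW_NUMBER() OVER (PARTITION BY j.job_number ORDER BY c.charge_code) AS rn
  FROM Jobs j
  JOIN Job_Charges c ON j.job_number = c.job_number
) t
GROUP BY job_number, total_amount
ORDER BY job_number
"""
rows = conn.execute(query).fetchall()
print(rows)
```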

T-SQL query Fetching data in a single line where multiple rows have similar filters

We have a table that contains 2 rows with the same document no. and line no. I want to combine them into a single row on which I can also see some values from the other row.
In the original dataset, the document no. and line no. are the same in 2 rows, and I need the values from the 2nd row to appear on the same row as the 1st, as in the expected result.
You can assign row numbers with the ROW_NUMBER() function based on the GST component or Entry_no, because there are three types of GST (CGST, SGST and IGST), plus UGST, which applies to union territories.
select doc,
max(case when seq = 1 then Entry_no end) as Entry_no,
max(case when seq = 1 then GSTComp end) as GSTComp1,
max(case when seq = 1 then [GST%] end) as [GST%1],
max(case when seq = 1 then GSTAmt end) as GSTAmt1,
-- . . . same pattern for seq = 2 and seq = 3
max(case when seq = 3 then GSTAmt end) as GSTAmt3
from (select *, row_number() over (partition by doc order by Entry_no) as seq
from [table]
) t
group by doc;
SELECT
t1.Entry_no
, t1.doc
, t1.GSTComp
, t1.[GST%]
, t1.GSTAmt
, t2.GSTComp AS GSTComp2
, t2.GSTAmt AS GSTAmt2
FROM [table] t1 INNER JOIN [table] t2
ON t1.doc = t2.doc AND t1.lin = t2.lin
AND t1.Entry_no < t2.Entry_no -- pair each row with the later row only, not with itself
Try this query
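Here is a sqlite3 sketch of the self-join with hypothetical rows. The table name gst_lines is made up, since the question does not give one. The sketch assumes exactly two rows per doc and lin, ordered by Entry_no. SQLite accepts the [GST%] bracket quoting for compatibility with SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gst_lines (Entry_no INT, doc TEXT, lin INT,"
             " GSTComp TEXT, [GST%] REAL, GSTAmt REAL)")
# Hypothetical rows: one document line split into CGST and SGST entries.
conn.executemany("INSERT INTO gst_lines VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, "D001", 10000, "CGST", 9.0, 90.0),
                  (2, "D001", 10000, "SGST", 9.0, 90.0)])

# Without t1.Entry_no < t2.Entry_no, each row would also join to itself,
# giving four rows instead of one.
rows = conn.execute("""
SELECT t1.Entry_no, t1.doc, t1.GSTComp, t1.[GST%], t1.GSTAmt,
       t2.GSTComp AS GSTComp2, t2.GSTAmt AS GSTAmt2
FROM gst_lines t1
JOIN gst_lines t2
  ON t1.doc = t2.doc AND t1.lin = t2.lin AND t1.Entry_no < t2.Entry_no
""").fetchall()
print(rows)
```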

Efficiently pull different columns using a common correlated subquery

I need to pull multiple columns from a subquery which also requires a WHERE filter referencing columns of the FROM table. I have a couple of questions about this:
Is there another solution to this problem besides mine below?
Is another solution even necessary or is this solution efficient enough?
Example:
In the following example I'm writing a view to present test scores, particularly to discover failures that may need to be addressed or retaken.
I cannot simply use JOIN because I need to filter my actual subquery first (notice I'm getting TOP 1 for the "examinee", sorted either by score or date descending)
My goal is to avoid writing (and executing) essentially the same subquery repeatedly.
SELECT ExamineeID, LastName, FirstName, Email,
(SELECT COUNT(examineeTestID)
FROM exam.ExamineeTest tests
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestTimeCommitted,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentTimeCommitted
FROM exam.Examinee E
To answer your second question first: yes, a better way is in order. The query you're using is hard to understand and hard to maintain, and even if the performance is acceptable now, it's a shame to query the same table multiple times when you don't need to. The performance may also stop being acceptable if your application ever grows to an appreciable size.
To answer your first question, I have a few methods for you. These assume SQL Server 2005 or later unless noted otherwise.
Note that the queries below omit bestExamineeTestID and currentExamineeTestID; if you need them, add ExamineeTestID to the column lists in the same way as Score.
You can think of OUTER/CROSS APPLY as an operator that lets you move correlated subqueries from the SELECT list into the FROM clause. They can have an outer reference to a previously named table, and can return more than one column. This lets you do the work once per subquery rather than once for each output column.
SELECT
ExamineeID,
LastName,
FirstName,
Email,
B.Attempts,
BestScore = B.Score,
BestDateDue = B.DateDue,
BestTimeCommitted = B.TimeCommitted,
CurrentScore = C.Score,
CurrentDateDue = C.DateDue,
CurrentTimeCommitted = C.TimeCommitted
FROM
exam.Examinee E
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted,
Attempts = Count(*) OVER ()
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY Score DESC
) B
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY DateDue DESC
) C
You should experiment to see whether my Count(*) OVER () performs better than an additional OUTER APPLY that just gets the count. If you're not restricting which examinees you pull from the exam.Examinee table, it may be better to just do a normal aggregate in a derived table.
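This plain-Python sketch gives a rough mental model of what APPLY buys you; it is not how SQL Server executes it. The correlated "subquery" runs once per outer row and hands back a whole row, instead of running once per output column. OUTER APPLY's NULLs for examinees with no tests are mimicked with None. The data and the apply_best helper are hypothetical.

```python
# Hypothetical ExamineeTest rows: (ExamineeID, Score, DateDue)
tests = [(1, 70, "2024-01-10"), (1, 85, "2024-02-10"),
         (1, 60, "2024-03-10"), (2, 90, "2024-01-05")]
examinees = [1, 2, 3]

def apply_best(examinee_id):
    """Evaluate the 'subquery' once and return a whole row (None if no match)."""
    mine = [t for t in tests if t[0] == examinee_id]
    if not mine:
        return None
    return max(mine, key=lambda t: t[1])  # like TOP 1 ... ORDER BY Score DESC

result = []
for e in examinees:
    best = apply_best(e)
    score, due = (best[1], best[2]) if best else (None, None)
    result.append((e, score, due))
print(result)
```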
Here's another method that (sort of) gets all the data in one swoop. It could conceivably perform better than the other queries, but in my experience windowing functions can get very, and surprisingly, expensive in some situations, so testing is in order.
WITH Data AS (
SELECT
*,
Count(*) OVER (PARTITION BY ExamineeID) Cnt,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
FROM
exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
), Vals AS (
SELECT
ExamineeID,
Max(Cnt) Attempts,
Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) CurrentScore,
Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) CurrentDateDue,
Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM Data
GROUP BY
ExamineeID
)
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
V.Attempts,
V.BestScore, V.BestDateDue, V.BestTimeCommitted,
V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
-- change join to INNER if you only want examinees who've tested
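Here is a sqlite3 sketch of the windowing method with hypothetical ExamineeTest rows; it outputs only the Score and DateDue columns. It filters on TestRevisionID = 3 and TestID = 2 before numbering, so the row for another test does not count as an attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ExamineeTest (ExamineeTestID INT, ExamineeID INT, TestRevisionID INT,
  TestID INT, Score INT, DateDue TEXT, TimeCommitted TEXT);
-- hypothetical rows; the last one belongs to a different test and is filtered out
INSERT INTO ExamineeTest VALUES
  (1, 10, 3, 2, 70, '2024-01-10', '09:00'),
  (2, 10, 3, 2, 85, '2024-02-10', '10:00'),
  (3, 10, 3, 2, 60, '2024-03-10', '11:00'),
  (4, 10, 1, 1, 99, '2024-04-10', '12:00');
""")

query = """
WITH Data AS (
  SELECT *,
    COUNT(*) OVER (PARTITION BY ExamineeID) AS Cnt,
    ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) AS ScoreOrder,
    ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) AS DueOrder
  FROM ExamineeTest
  WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT ExamineeID, MAX(Cnt) AS Attempts,
  MAX(CASE WHEN ScoreOrder = 1 THEN Score END)   AS BestScore,
  MAX(CASE WHEN ScoreOrder = 1 THEN DateDue END) AS BestDateDue,
  MAX(CASE WHEN DueOrder = 1 THEN Score END)     AS CurrentScore,
  MAX(CASE WHEN DueOrder = 1 THEN DateDue END)   AS CurrentDateDue
FROM Data
GROUP BY ExamineeID
"""
rows = conn.execute(query).fetchall()
print(rows)
```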
Finally, here's a SQL Server 2000 method:
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
Y.Attempts,
Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN ( -- change to inner if you only want examinees who've tested
SELECT
X.ExamineeID,
X.Cnt Attempts,
Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM
(
SELECT ExamineeID, Max(Score) MaxScore, Max(DateDue) MaxDueDate, Count(*) Cnt
FROM exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
GROUP BY ExamineeID
) X
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
INNER JOIN exam.ExamineeTest T
ON X.ExamineeID = T.ExamineeID
AND (
(Y.Which = 1 AND X.MaxScore = T.Score)
OR (Y.Which = 2 AND X.MaxDueDate = T.DateDue)
)
WHERE
T.TestRevisionID = 3
AND T.TestID = 2
GROUP BY
X.ExamineeID,
X.Cnt
) Y ON E.ExamineeID = Y.ExamineeID
This query can return surprising results if the combination (ExamineeID, Score) or (ExamineeID, DateDue) matches multiple rows, which is quite likely with Score: the join then picks up every tied row, and the MAX() values may come from different tests. If neither is unique, you need to use (or add) some additional column that guarantees uniqueness so it can be used to select one row. If only Score can be duplicated, an additional pre-query that gets the max Score first, then dovetailing in the max DateDue, would pull the most recent of the tests tied for the highest score at the same time as getting the most recent data. Let me know if you need more SQL Server 2000 help.
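The tie problem is easy to reproduce. In this sqlite3 sketch with hypothetical data, two attempts share the top score, so joining back on (ExamineeID, MAX(Score)) matches both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ExamineeTest (ExamineeID INT, Score INT, DateDue TEXT);
-- hypothetical: two attempts tie on the top score of 85
INSERT INTO ExamineeTest VALUES
  (10, 85, '2024-01-10'), (10, 85, '2024-02-10'), (10, 60, '2024-03-10');
""")

# Joining back on (ExamineeID, MAX(Score)) matches every tied attempt.
matches = conn.execute("""
SELECT T.ExamineeID, T.Score, T.DateDue
FROM (SELECT ExamineeID, MAX(Score) AS MaxScore
      FROM ExamineeTest GROUP BY ExamineeID) X
JOIN ExamineeTest T ON T.ExamineeID = X.ExamineeID AND T.Score = X.MaxScore
ORDER BY T.DateDue
""").fetchall()
print(matches)
```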
Note: The biggest thing that is going to control whether CROSS APPLY or a ROW_NUMBER() solution is better is whether you have an index on the columns that are being looked up and whether the data is dense or sparse.
Index + you're pulling only a few examinees with lots of tests each = CROSS APPLY wins.
Index + you're pulling a huge number of examinees with only a few tests each = ROW_NUMBER() wins.
No index = string concatenation/value packing method wins (not shown here).
The GROUP BY solution that I gave for SQL Server 2000 will probably perform the worst, but that's not guaranteed. Like I said, testing is in order.
If any of my queries do give performance problems let me know and I'll see what I can do to help. I'm sure I probably have typos as I didn't work up any DDL to recreate your tables, but I did my best without trying it.
If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that get pushed to by a trigger on the ExamineeTest table that would always keep them updated. However, this is denormalization and probably not necessary or a good idea unless you've scaled so awfully big that retrieving results becomes unacceptably long.
It's not the same subquery; it's three different subqueries:
COUNT() over all matching rows
TOP (1) ORDER BY Score DESC
TOP (1) ORDER BY DateDue DESC
You can't get away with executing it fewer than 3 times.
The question is how to make it execute no more than 3 times.
One option would be to write 3 inline table-valued functions and use them with OUTER APPLY. Make sure they are actually inline; otherwise performance can drop dramatically. One of these three functions might be:
create function dbo.topexaminee_byscore(@ExamineeID int)
returns table
as
return (
SELECT top (1)
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted
FROM exam.ExamineeTest
WHERE (ExamineeID = @ExamineeID) AND (TestRevisionID = 3) AND (TestID = 2)
ORDER BY Score DESC
)
Another option would be to do essentially the same, but with subqueries. Because you fetch data for all students anyway, there shouldn't be too much of a difference performance-wise. Create three subqueries, for example:
select ExamineeID, bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted
from (
SELECT
ExamineeID,
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted,
row_number() over (partition by ExamineeID order by Score DESC) as takeme
FROM exam.ExamineeTest
WHERE (TestRevisionID = 3) AND (TestID = 2)
) as foo
where foo.takeme = 1
Do the same with ORDER BY DateDue DESC, and with a plain COUNT(*) ... GROUP BY ExamineeID over all records, selecting the respective columns each time.
Join these three on ExamineeID.
What is going to be better/more performant/more readable is up to you. Do some testing.
It looks like you can replace the three columns that are based on the alias "bestTest" with a view. All three of those subqueries have the same WHERE clause and the same ORDER BY clause.
Ditto for the subquery aliased "bestNewTest". Ditto ditto for the subquery aliased "currentTeest".
If I counted right, that would replace 8 subqueries with 3 views. You can join on the views. I think the joins would be faster, but if I were you, I'd check the execution plan of both versions.
You could use a CTE and OUTER APPLY.
;WITH testScores AS
(
SELECT ExamineeID, ExamineeTestID, Score, DateDue, TimeCommitted
FROM exam.ExamineeTest
WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT E.ExamineeID, E.LastName, E.FirstName, E.Email, total.Attempts,
bestTest.*, currentTest.*
FROM exam.Examinee AS E
LEFT OUTER JOIN
(
SELECT ExamineeID, COUNT(ExamineeTestID) AS Attempts
FROM testScores
GROUP BY ExamineeID
) AS total ON E.ExamineeID = total.ExamineeID
OUTER APPLY
(
SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
FROM testScores AS t
WHERE E.ExamineeID = t.ExamineeID
ORDER BY Score DESC
) AS bestTest (bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted)
OUTER APPLY
(
SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
FROM testScores AS t
WHERE E.ExamineeID = t.ExamineeID
ORDER BY DateDue DESC
) AS currentTest (currentExamineeTestID, currentScore, currentDateDue,
currentTimeCommitted)