SQL Query - Select one row or the other based on the condition

I have the following query producing the given result.
SELECT
p.BookingID, p.PaxID, p.LeadPax, p.FirstName AS LeadPaxName
FROM
[dbo].[BookingV2_Pax] p
WHERE
p.[LeadPax] = 1
UNION
SELECT
PN.BookingID, PN.PaxID, PN.LeadPax, PN.LeadPaxName
FROM
(SELECT
p.BookingID, p.PaxID, p.LeadPax, p.FirstName AS LeadPaxName,
ROW_NUMBER() OVER (PARTITION BY p.BookingID ORDER BY p.PaxID) AS rn
FROM
[dbo].[BookingV2_Pax] p
WHERE
p.[LeadPax] = 0) PN
WHERE
PN.rn = 1
Results:
What I'm doing here is selecting the row that has LeadPax = true, plus the first of the remaining rows that have LeadPax = false.
What I want is to select either the row with LeadPax = 1 or the row with LeadPax = 0, not both.
I understand that if this were a column-related problem I could use CASE or COALESCE, but how do I do this efficiently with rows?
Also, any pointers on how to optimize the original query would be highly appreciated.

Here is one way.
Generate the row number only for rows where LeadPax = 0, then use AND/OR logic to pick the relevant rows:
SELECT PN.BookingID,
PN.PaxID,
PN.LeadPax,
PN.LeadPaxName
FROM (SELECT p.BookingID,
p.PaxID,
p.LeadPax,
p.FirstName AS LeadPaxName,
case when p.[LeadPax] = 0 then Row_number() OVER (partition BY p.BookingID, p.[LeadPax] ORDER BY p.PaxID)
else 1 end AS rn
FROM [dbo].[BookingV2_Pax] p) PN
WHERE ( PN.rn = 1 AND PN.[LeadPax] = 0 )
OR PN.[LeadPax] = 1
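If the goal is exactly one row per booking (the lead pax when there is one, otherwise the first non-lead pax), one option is to put LeadPax into the ORDER BY of the row number. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (it assumes the bundled SQLite is 3.25 or newer, for window functions), and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BookingV2_Pax (BookingID INT, PaxID INT, LeadPax INT, FirstName TEXT);
    INSERT INTO BookingV2_Pax VALUES
        (1, 10, 0, 'Ann'), (1, 11, 1, 'Bob'),   -- booking 1 has a lead pax
        (2, 20, 0, 'Cid'), (2, 21, 0, 'Dee');   -- booking 2 has none
""")

# LeadPax DESC sorts the lead pax (1) ahead of the others (0),
# so rn = 1 is the lead pax if present, else the lowest PaxID.
rows = conn.execute("""
    SELECT BookingID, PaxID, LeadPax, FirstName AS LeadPaxName
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY p.BookingID
                                    ORDER BY p.LeadPax DESC, p.PaxID) AS rn
          FROM BookingV2_Pax p) AS PN
    WHERE rn = 1
    ORDER BY BookingID
""").fetchall()
print(rows)
```

The same ORDER BY p.LeadPax DESC, p.PaxID should carry over to the SQL Server query unchanged, and it reads the table once instead of twice.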

Related

I just started learning SQL and I couldn't do the query, can you help me?

There is part of this SQL query that I can't work out. A new column needs to be added to the result below, holding the percentage of shelves counted in each cupboard. For example, Cupboard = 1 has 7 shelves, and IsCounted is 1 for 3 of them, so every row with Cupboard = 1 should show 3/7 as a percentage in the new column. If none of a cupboard's shelves are counted, the column should show zero percent. How can I do this?
My SQL code:
SELECT a.RegionName,
a.Cupboard,
a.Shelf,
(CASE WHEN ToplamSayım > 0 THEN 1 ELSE 0 END) AS IsCounted
FROM (SELECT p.RegionName,
r.Shelf,
r.Cupboard,
(SELECT COUNT(*)
FROM FAZIKI.dbo.PM_ProductCountingNew
WHERE RegionCupboardShelfTypeId = r.Id) AS ToplamSayım
FROM FAZIKI.dbo.DF_PMRegionType p
JOIN FAZIKI.dbo.DF_PMRegionCupboardShelfType r ON p.Id = r.RegionTypeId
WHERE p.WarehouseId = 45) a
ORDER BY a.RegionName;
The result is as in the picture below:
It looks like a windowed AVG should do the trick, although it's not entirely clear what the partitioning column should be.
The SELECT COUNT can be simplified to an EXISTS
SELECT a.RegionName,
a.Cupboard,
a.Shelf,
a.IsCounted,
AVG(a.IsCounted * 1.0) OVER (PARTITION BY a.RegionName, a.Cupboard) Percentage
FROM (
SELECT p.RegionName,
r.Shelf,
r.Cupboard,
CASE WHEN EXISTS (SELECT 1
FROM FAZIKI.dbo.PM_ProductCountingNew pcn
WHERE pcn.RegionCupboardShelfTypeId = r.Id
) THEN 1 ELSE 0 END AS IsCounted
FROM FAZIKI.dbo.DF_PMRegionType p
JOIN FAZIKI.dbo.DF_PMRegionCupboardShelfType r ON p.Id = r.RegionTypeId
WHERE p.WarehouseId = 45
) a
ORDER BY a.RegionName;
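To see what the windowed AVG produces, here is a small sketch using Python's sqlite3 module in place of SQL Server (assuming SQLite 3.25+ for window functions). The Shelves table and its rows are hypothetical and simply mirror the 3-out-of-7 example from the question; the result is multiplied by 100 to turn the fraction into a percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shelves (Cupboard INT, Shelf INT, IsCounted INT);
    INSERT INTO Shelves VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 0),
        (1, 5, 0), (1, 6, 0), (1, 7, 0),   -- cupboard 1: 3 of 7 counted
        (2, 1, 0), (2, 2, 0);              -- cupboard 2: nothing counted
""")

# Every row keeps its own IsCounted value and also gets the
# average of IsCounted over all rows of the same cupboard.
rows = conn.execute("""
    SELECT Cupboard, Shelf, IsCounted,
           100.0 * AVG(IsCounted * 1.0) OVER (PARTITION BY Cupboard) AS Percentage
    FROM Shelves
    ORDER BY Cupboard, Shelf
""").fetchall()

pct_by_cupboard = {cupboard: pct for cupboard, _, _, pct in rows}
print(pct_by_cupboard)
```

Because IsCounted is 0 or 1, its average within a partition is the fraction of counted shelves, which is why AVG works here without a separate COUNT.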

ROW_NUMBER() OVER PARTITION skips row number 1 occasionally

I have an SQL Query like this:
SELECT [a].*,
[rp].[TestId],
[r].[Deleted]
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY [RollId] ORDER BY [TimeStamp] DESC) AS row
FROM [RollAction]) a
INNER JOIN [RollPermission] rp ON ([rp].[RoId] = [a].[RollId]
AND [rp].[RoType] = [a].[RoType]
AND [rp].[UserId] = [a].[UserId]
AND [rp].[Deleted] = 0)
INNER JOIN [Roll] r ON ([r].[Id] = [a].[RollId]
AND [r].[RoType] = [a].[RollType]
AND [r].[Deleted] = 0)
WHERE row = 1
AND [a].[Action] = 'Fetched'
AND [a].[RollType] = 'Test'
AND [a].[Deleted] = 0
AND [a].[UserId] = 5
ORDER BY [a].[TimeStamp] DESC
OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY;
What I want to accomplish: fetch the first 3 rows from RollAction, filtered by RollPermission and Roll through the inner joins shown above.
It works, but it skips one partition, so it fetches the first, third and fourth rows. For some reason one partition doesn't have row numbers 1 and 2, so it gets filtered out by the WHERE clause.
Why does that partition skip rows? I've tried removing all the WHERE conditions and it still skips; I tried removing the inner joins too.
My question is: how do I force it not to skip row numbers 1 and 2, or how do I replace row = 1 with a condition that selects the lowest row number that exists in each partition?
Correct answer came from Xanatos.
Move AND [a].[Action] = 'Fetched' AND [a].[RollType] = 'Test' AND [a].[Deleted] = 0 AND [a].[UserId] = 5 inside the subquery (without the [a]. prefix), so the rows are filtered before ROW_NUMBER() numbers them.
RANK() instead of ROW_NUMBER() might be a possibility.
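The effect of moving the filter can be reproduced on a tiny example. This is a sketch only: it runs against SQLite through Python's sqlite3 module (assuming SQLite 3.25+), the RollAction table is cut down to four hypothetical rows, and only the UserId filter is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RollAction (RollId INT, UserId INT, Action TEXT, TimeStamp INT);
    INSERT INTO RollAction VALUES
        (1, 5, 'Fetched', 100),
        (2, 9, 'Fetched', 210),   -- newest row of roll 2 belongs to another user
        (2, 5, 'Fetched', 200),
        (3, 5, 'Fetched', 300);
""")

numbered = """
    SELECT RollId, TimeStamp FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY RollId ORDER BY TimeStamp DESC) AS rn
        FROM RollAction {inner_where}
    ) a
    WHERE rn = 1 {outer_and}
    ORDER BY RollId
"""

# Filter applied after numbering: roll 2's row number 1 is user 9's row,
# so the whole partition disappears for user 5.
filter_after = conn.execute(
    numbered.format(inner_where="", outer_and="AND UserId = 5")).fetchall()

# Filter applied before numbering: user 5's row in roll 2 becomes row number 1.
filter_before = conn.execute(
    numbered.format(inner_where="WHERE UserId = 5", outer_and="")).fetchall()

print(filter_after)
print(filter_before)
```

The partition never really "skips" numbers; the rows that received 1 and 2 simply belong to other users or actions and are removed by the outer WHERE.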

Conditional for a field generated by a SELECT statement

SELECT
Req_PK,
Req_PostDate,
Req_code,
Req_CreateDate,
Req_FillDate,
Req_Canceldate,
Req_Hold,
(
Select Convert(varchar(50),Count(CanReq_PK))
From CanReq
Where CanReq_ReqFK = Req_PK) AS Applications,
Req_PublishstatusFK
FROM Req
WHERE Req_Filled <> 1
AND Req_Cancelled <> 1
AND Req_Template <> 1
AND Req_PublishstatusFK = 1
AND Req_publishstatusfk = 1
How do I modify this query so that it returns only the rows whose Applications value is not '0', instead of every row?
Help is very much appreciated!
Use an explicit JOIN instead:
SELECT r.Req_PK, r.Req_PostDate, r.Req_code, r.Req_CreateDate, r.Req_FillDate,
r.Req_Canceldate, r.Req_Hold,
cr.num_Applications,
r.Req_PublishstatusFK
FROM Req r JOIN
(SELECT cr.CanReq_ReqFK, Count(*) as num_applications
FROM CanReq cr
GROUP BY cr.CanReq_ReqFK
) cr
ON cr.CanReq_ReqFK = r.Req_PK
WHERE r.Req_Filled <> 1 AND r.Req_Cancelled <> 1 AND r.Req_Template <> 1 AND
r.Req_PublishstatusFK = 1 AND r.Req_publishstatusfk = 1;
This will filter out any rows that don't have a record in CanReq -- the ones that would have a count of 0 in your version of the query.
I don't know why you would want the count to be a string. Of course, you can include the conversion if you need it, but it doesn't seem necessary to me.
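As a quick check of that behaviour, the sketch below runs the same join shape against SQLite via Python's sqlite3 module. The tables are reduced to a couple of columns and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Req (Req_PK INT, Req_code TEXT);
    CREATE TABLE CanReq (CanReq_PK INT, CanReq_ReqFK INT);
    INSERT INTO Req VALUES (1, 'A'), (2, 'B');
    INSERT INTO CanReq VALUES (100, 1), (101, 1);  -- Req 2 has no applications
""")

# The inner join drops Req rows with no match in the grouped subquery,
# which are exactly the rows whose count would have been 0.
rows = conn.execute("""
    SELECT r.Req_PK, r.Req_code, cr.num_applications
    FROM Req r
    JOIN (SELECT CanReq_ReqFK, COUNT(*) AS num_applications
          FROM CanReq
          GROUP BY CanReq_ReqFK) cr
      ON cr.CanReq_ReqFK = r.Req_PK
    ORDER BY r.Req_PK
""").fetchall()
print(rows)
```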

SQL Group By and assign highest a value

I am struggling with the SQL (mssql) to manipulate my data the way I need it. I have a table like this:
SOMEID, SOMEFIELD, DATE
5 True 01-01-2010
5 True 01-01-2011
5 False 05-05-2012
7 True 05-05-2011
7 False 06-07-2015
What I am trying to achieve is to add another column that holds 1 if the row is the most recent one for that ID, and 0 if not. So in the above data example the new column values from top to bottom would be 0, 0, 1, 0, 1.
I know I need to group by date but am having trouble assigning the values.
Thanks for any pointers!
You can use row_number() in SQL Server like this:
select *
, case when (row_number() over (partition by SOMEID order by [Date] desc)) = 1 then 1 else 0 end seq
from
yourTable
order by
SOMEID, [Date];
SQL Fiddle Demo
You can use a self join to find the latest row per group, then use a CASE expression in an UPDATE query to assign the value to the new column:
update a
set a.[somecol] = case when b.[SOMEID] is null then 1 else 0 end
from demo a
left join demo b on a.[SOMEID] = b.[SOMEID]
and a.[DATE] < b.[DATE]
DEMO
try this
SELECT SOMEID, SOMEFIELD, DATE
, CASE WHEN (SELECT MAX(SubTab.Date)
FROM myTable SubTab
WHERE SubTab.SOMEID = myTable.SOMEID
) = myTable.DATE
THEN 1 ELSE 0 END
FROM myTable
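The row_number() approach can be checked against the sample data from the question. This sketch uses Python's sqlite3 module rather than SQL Server (assuming SQLite 3.25+); the table name myTable is borrowed from the last answer, the date column is renamed to the hypothetical EventDate, and the dates are rewritten in ISO format so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (SOMEID INT, SOMEFIELD TEXT, EventDate TEXT);
    INSERT INTO myTable VALUES
        (5, 'True',  '2010-01-01'),
        (5, 'True',  '2011-01-01'),
        (5, 'False', '2012-05-05'),
        (7, 'True',  '2011-05-05'),
        (7, 'False', '2015-07-06');
""")

# Number the rows of each SOMEID from newest to oldest; the newest gets 1.
rows = conn.execute("""
    SELECT SOMEID, SOMEFIELD, EventDate,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY SOMEID
                                        ORDER BY EventDate DESC) = 1
                THEN 1 ELSE 0 END AS IsLatest
    FROM myTable
    ORDER BY SOMEID, EventDate
""").fetchall()

flags = [row[3] for row in rows]
print(flags)
```

Note that ROW_NUMBER() flags exactly one row per ID even when two rows share the latest date, whereas the MAX-based subquery would flag both.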

Efficiently pull different columns using a common correlated subquery

I need to pull multiple columns from a subquery which also requires a WHERE filter referencing columns of the FROM table. I have a couple of questions about this:
Is there another solution to this problem besides mine below?
Is another solution even necessary or is this solution efficient enough?
Example:
In the following example I'm writing a view to present test scores, particularly to discover failures that may need to be addressed or retaken.
I cannot simply use JOIN because I need to filter my actual subquery first (notice I'm getting TOP 1 for the "examinee", sorted either by score or date descending)
My goal is to avoid writing (and executing) essentially the same subquery repeatedly.
SELECT ExamineeID, LastName, FirstName, Email,
(SELECT COUNT(examineeTestID)
FROM exam.ExamineeTest tests
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2) Attempts,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY Score DESC) bestTimeCommitted,
(SELECT TOP 1 ExamineeTestID
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentExamineeTestID,
(SELECT TOP 1 Score
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentScore,
(SELECT TOP 1 DateDue
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentDateDue,
(SELECT TOP 1 TimeCommitted
FROM exam.ExamineeTest T
WHERE E.ExamineeID = ExamineeID AND TestRevisionID = 3 AND TestID = 2
ORDER BY DateDue DESC) currentTimeCommitted
FROM exam.Examinee E
To answer your second question first: yes, a better way is in order. The query you're using is hard to understand and hard to maintain, and even if its performance is acceptable now, it's a shame to query the same table multiple times when you don't need to. The performance may also stop being acceptable if your application ever grows to an appreciable size.
To answer your first question, I have a few methods for you. These assume SQL 2005 or up except where noted.
Note that you don't need BestExamineeID and CurrentExamineeID because they will always be the same as ExamineeID unless no tests were taken and they're NULL, which you can tell from the other columns being NULL.
You can think of OUTER/CROSS APPLY as an operator that lets you move correlated subqueries from the WHERE clause into the JOIN clause. They can have an outer reference to a previously-named table, and can return more than one column. This enables you to do the job only once per logical query rather than once for each column.
SELECT
ExamineeID,
LastName,
FirstName,
Email,
B.Attempts,
BestScore = B.Score,
BestDateDue = B.DateDue,
BestTimeCommitted = B.TimeCommitted,
CurrentScore = C.Score,
CurrentDateDue = C.DateDue,
CurrentTimeCommitted = C.TimeCommitted
FROM
exam.Examinee E
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted,
Attempts = Count(*) OVER ()
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY Score DESC
) B
OUTER APPLY ( -- change to CROSS APPLY if you only want examinees who've tested
SELECT TOP 1
Score, DateDue, TimeCommitted
FROM exam.ExamineeTest T
WHERE
E.ExamineeID = T.ExamineeID
AND T.TestRevisionID = 3
AND T.TestID = 2
ORDER BY DateDue DESC
) C
You should experiment to see if my Count(*) OVER () is better than having an additional OUTER APPLY that just gets the count. If you're not restricting the Examinee from the exam.Examinee table, it may be better to just do a normal aggregate in a derived table.
Here's another method that (sort of) goes and gets all the data in one swoop. It conceivably could perform better than the other queries, but in my experience windowing functions can get surprisingly expensive in some situations, so testing is in order.
WITH Data AS (
SELECT
*,
Count(*) OVER (PARTITION BY ExamineeID) Cnt,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) ScoreOrder,
Row_Number() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) DueOrder
FROM
exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
), Vals AS (
SELECT
ExamineeID,
Max(Cnt) Attempts,
Max(CASE WHEN ScoreOrder = 1 THEN Score ELSE NULL END) BestScore,
Max(CASE WHEN ScoreOrder = 1 THEN DateDue ELSE NULL END) BestDateDue,
Max(CASE WHEN ScoreOrder = 1 THEN TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE WHEN DueOrder = 1 THEN Score ELSE NULL END) CurrentScore,
Max(CASE WHEN DueOrder = 1 THEN DateDue ELSE NULL END) CurrentDateDue,
Max(CASE WHEN DueOrder = 1 THEN TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM Data
GROUP BY
ExamineeID
)
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
V.Attempts,
V.BestScore, V.BestDateDue, V.BestTimeCommitted,
V.CurrentScore, V.CurrentDateDue, V.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
-- change join to INNER if you only want examinees who've tested
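Here is a runnable sketch of the same row-number-plus-conditional-aggregation idea, using Python's sqlite3 module in place of SQL Server (assuming SQLite 3.25+). The tables are trimmed to a few columns, the rows are invented, and the Best/Current column names follow the final SELECT above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Examinee (ExamineeID INT, LastName TEXT);
    CREATE TABLE ExamineeTest (ExamineeID INT, Score INT, DateDue TEXT,
                               TestRevisionID INT, TestID INT);
    INSERT INTO Examinee VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO ExamineeTest VALUES
        (1, 70,  '2020-01-01', 3, 2),
        (1, 90,  '2020-02-01', 3, 2),   -- best score
        (1, 80,  '2020-03-01', 3, 2),   -- most recent
        (1, 100, '2020-04-01', 3, 1);   -- different TestID, must be ignored
""")

rows = conn.execute("""
    WITH Data AS (
        SELECT ExamineeID, Score, DateDue,
               COUNT(*) OVER (PARTITION BY ExamineeID) AS Cnt,
               ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY Score DESC) AS ScoreOrder,
               ROW_NUMBER() OVER (PARTITION BY ExamineeID ORDER BY DateDue DESC) AS DueOrder
        FROM ExamineeTest
        WHERE TestRevisionID = 3 AND TestID = 2
    ), Vals AS (
        -- Each CASE sees a non-NULL value on exactly one row per examinee,
        -- so MAX simply picks that value out.
        SELECT ExamineeID,
               MAX(Cnt) AS Attempts,
               MAX(CASE WHEN ScoreOrder = 1 THEN Score END)   AS BestScore,
               MAX(CASE WHEN ScoreOrder = 1 THEN DateDue END) AS BestDateDue,
               MAX(CASE WHEN DueOrder = 1 THEN Score END)     AS CurrentScore,
               MAX(CASE WHEN DueOrder = 1 THEN DateDue END)   AS CurrentDateDue
        FROM Data
        GROUP BY ExamineeID
    )
    SELECT E.ExamineeID, V.Attempts, V.BestScore, V.BestDateDue,
           V.CurrentScore, V.CurrentDateDue
    FROM Examinee E
    LEFT JOIN Vals V ON E.ExamineeID = V.ExamineeID
    ORDER BY E.ExamineeID
""").fetchall()
print(rows)
```

The examinee with no tests still appears, with NULLs (None in Python) in every test column, because of the LEFT JOIN.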
Finally, here's a SQL 2000 method:
SELECT
E.ExamineeID,
E.LastName,
E.FirstName,
E.Email,
Y.Attempts,
Y.BestScore, Y.BestDateDue, Y.BestTimeCommitted,
Y.CurrentScore, Y.CurrentDateDue, Y.CurrentTimeCommitted
FROM
exam.Examinee E
LEFT JOIN ( -- change to inner if you only want examinees who've tested
SELECT
X.ExamineeID,
X.Cnt Attempts,
Max(CASE Y.Which WHEN 1 THEN T.Score ELSE NULL END) BestScore,
Max(CASE Y.Which WHEN 1 THEN T.DateDue ELSE NULL END) BestDateDue,
Max(CASE Y.Which WHEN 1 THEN T.TimeCommitted ELSE NULL END) BestTimeCommitted,
Max(CASE Y.Which WHEN 2 THEN T.Score ELSE NULL END) CurrentScore,
Max(CASE Y.Which WHEN 2 THEN T.DateDue ELSE NULL END) CurrentDateDue,
Max(CASE Y.Which WHEN 2 THEN T.TimeCommitted ELSE NULL END) CurrentTimeCommitted
FROM
(
SELECT ExamineeID, Max(Score) MaxScore, Max(DateDue) MaxDueDate, Count(*) Cnt
FROM exam.ExamineeTest
WHERE
TestRevisionID = 3
AND TestID = 2
GROUP BY ExamineeID
) X
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) Y (Which)
INNER JOIN exam.ExamineeTest T
ON X.ExamineeID = T.ExamineeID
AND (
(Y.Which = 1 AND X.MaxScore = T.Score)
OR (Y.Which = 2 AND X.MaxDueDate = T.DateDue)
)
WHERE
T.TestRevisionID = 3
AND T.TestID = 2
GROUP BY
X.ExamineeID,
X.Cnt
) Y ON E.ExamineeID = Y.ExamineeID
This query will return unexpected results if the combination of (ExamineeID, Score) or (ExamineeID, DateDue) can return multiple rows. That's probably not unlikely with Score. If neither is unique, then you need to use (or add) some additional column that can grant uniqueness so it can be used to select one row. If only Score can be duplicated, then an additional pre-query that gets the max Score first, then dovetails in with the max DateDue, would pull the most recent score that was a tie for the highest at the same time as getting the most recent data. Let me know if you need more SQL 2000 help.
Note: The biggest thing that is going to control whether CROSS APPLY or a ROW_NUMBER() solution is better is whether you have an index on the columns that are being looked up and whether the data is dense or sparse.
Index + you're pulling only a few examinees with lots of tests each = CROSS APPLY wins.
Index + you're pulling a huge number of examinees with only a few tests each = ROW_NUMBER() wins.
No index = string concatenation/value packing method wins (not shown here).
The group by solution that I gave for SQL 2000 will probably perform the worst, but not guaranteed. Like I said, testing is in order.
If any of my queries do give performance problems let me know and I'll see what I can do to help. I'm sure I probably have typos as I didn't work up any DDL to recreate your tables, but I did my best without trying it.
If performance really does become crucial, I would create ExamineeTestBest and ExamineeTestCurrent tables that get pushed to by a trigger on the ExamineeTest table that would always keep them updated. However, this is denormalization and probably not necessary or a good idea unless you've scaled so awfully big that retrieving results becomes unacceptably long.
It's not the same subquery. It's three different subqueries:
count() on all
TOP (1) ORDER BY Score DESC
TOP (1) ORDER BY DateDue DESC
You can't get away with executing fewer than 3 of them.
The question is how to make it execute no more than 3 times.
One option would be to write 3 inline table functions and use them with outer apply. Make sure they are actually inline, otherwise your performance will drop a hundred times. One of these three functions might be:
create function dbo.topexaminee_byscore(@ExamineeID int)
returns table
as
return (
SELECT top (1)
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted
FROM exam.ExamineeTest
WHERE (ExamineeID = @ExamineeID) AND (TestRevisionID = 3) AND (TestID = 2)
ORDER BY Score DESC
)
Another option would be to do essentially the same, but with subqueries. Because you fetch data for all students anyway, there shouldn't be too much of a difference performance-wise. Create three subqueries, for example:
select bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted
from (
SELECT
ExamineeTestID as bestExamineeTestID,
Score as bestScore,
DateDue as bestDateDue,
TimeCommitted as bestTimeCommitted,
row_number() over (partition by ExamineeID order by Score DESC) as takeme
FROM exam.ExamineeTest
WHERE (TestRevisionID = 3) AND (TestID = 2)
) as foo
where foo.takeme = 1
Same for ORDER BY DateDue DESC and for all records, with respective columns being selected.
Join these three on the examineeid.
What is going to be better/more performant/more readable is up to you. Do some testing.
It looks like you can replace the three columns that are based on the alias "bestTest" with a view. All three of those subqueries have the same WHERE clause and the same ORDER BY clause.
Ditto for the subquery aliased "bestNewTest". Ditto ditto for the subquery aliased "currentTest".
If I counted right, that would replace 8 subqueries with 3 views. You can join on the views. I think the joins would be faster, but if I were you, I'd check the execution plan of both versions.
You could use a CTE and OUTER APPLY.
;WITH testScores AS
(
SELECT ExamineeID, ExamineeTestID, Score, DateDue, TimeCommitted
FROM exam.ExamineeTest
WHERE TestRevisionID = 3 AND TestID = 2
)
SELECT exam.Examinee.ExamineeID, LastName, FirstName, Email, total.Attempts,
bestTest.*, currentTest.*
FROM exam.Examinee
LEFT OUTER JOIN
(
SELECT ExamineeID, COUNT(ExamineeTestID) AS Attempts
FROM testScores
GROUP BY ExamineeID
) AS total ON exam.Examinee.ExamineeID = total.ExamineeID
OUTER APPLY
(
SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
FROM testScores t
WHERE exam.Examinee.ExamineeID = t.ExamineeID
ORDER BY Score DESC
) AS bestTest (bestExamineeTestID, bestScore, bestDateDue, bestTimeCommitted)
OUTER APPLY
(
SELECT TOP 1 ExamineeTestID, Score, DateDue, TimeCommitted
FROM testScores t
WHERE exam.Examinee.ExamineeID = t.ExamineeID
ORDER BY DateDue DESC
) AS currentTest (currentExamineeTestID, currentScore, currentDateDue,
currentTimeCommitted)