Cross apply with outer reference forces scan - sql

How do I coerce SQL Server into seeking against my index in this scenario? I have a cross apply which, if fed static values, seeks correctly. If fed input from the outer rows, it fails to generate a plan. What's the difference? Shouldn't it be able to take the rows from the topmost operator and feed them into the cross apply?
select *
from AccessorGrantPermissableAssociations a
cross apply
(
select z.AccessorId, z.PermissableId, max(z.CreatedDate) CreatedDate
-- notice forceseek (cannot generate a query plan when referencing alias 'a';
-- works fine when provided static values
from AccessorGrant z (forceseek)
where
-- works
z.AccessorId = 1 and z.PermissableId = 1
-- doesn't work
--z.AccessorId = a.AccessorId and z.PermissableId = a.PermissableId
and z.CreatedDate <= cast(switchoffset(@asOfMoment, '-00:00') as datetime2)
group by z.AccessorId, z.PermissableId
) b
I can prove that the index works because I can execute the following with a fast seek:
select z.AccessorId, z.PermissableId, max(z.CreatedDate) CreatedDate
from AccessorGrant z (forceseek)
where z.AccessorId = 1 and z.PermissableId = 1
and z.CreatedDate <= cast(switchoffset(@asOfMoment, '-00:00') as datetime2)
group by z.AccessorId, z.PermissableId
For your info, there is an index on AccessorGrant:
(AccessorId, PermissableId, CreatedDate)
To reiterate the question:
Why doesn't the query that works with static values also work inside a cross apply? How can I get the most recent date for every pair of AccessorId and PermissableId with an efficient plan?
Update: plans (Paste the Plan didn't work for me)
Here is a plan using z.AccessorId = 1 and z.PermissableId = 1:
Here is a plan using z.AccessorId = a.AccessorId and z.PermissableId = a.PermissableId:

This looks like a slight variation of a classic top-n-per-group problem.
It can be done with CROSS APPLY or with ROW_NUMBER. The best method depends on your data distribution.
If we keep the CROSS APPLY approach, I would rewrite your query like this:
select *
from
AccessorGrantPermissableAssociations AS a
cross apply
(
select TOP(1)
z.AccessorId, z.PermissableId, z.CreatedDate
from
AccessorGrant AS z
where
z.AccessorId = a.AccessorId
and z.PermissableId = a.PermissableId
and z.CreatedDate <= cast(switchoffset(@asOfMoment, '-00:00') as datetime2)
ORDER BY z.CreatedDate DESC
) AS b
;
It produces the same result, but with an explicit instruction to the server to fetch only one row from AccessorGrant for each row of AccessorGrantPermissableAssociations. It looks like the optimizer is not smart enough to convert MAX into TOP(1) when it is buried inside a correlated subquery. It can do this transformation in the simple query, but not in this case.
If it still doesn't do seek, change the index to match the query exactly: (AccessorId, PermissableId, CreatedDate DESC).
Most likely if you write the query in this form you would not need a FORCESEEK hint.
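For comparison, the ROW_NUMBER variant mentioned above could look roughly like the sketch below. It reuses the table and column names from the question and assumes @asOfMoment is a datetimeoffset variable declared earlier. It returns the columns of a plus the latest CreatedDate, so the column list differs slightly from the CROSS APPLY version.

```sql
-- Sketch only: ROW_NUMBER alternative to the TOP(1) CROSS APPLY.
-- Assumes @asOfMoment is declared earlier as a datetimeoffset.
with ranked as
(
    select
        z.AccessorId,
        z.PermissableId,
        z.CreatedDate,
        -- rn = 1 marks the most recent grant for each pair
        row_number() over (partition by z.AccessorId, z.PermissableId
                           order by z.CreatedDate desc) as rn
    from AccessorGrant as z
    where z.CreatedDate <= cast(switchoffset(@asOfMoment, '-00:00') as datetime2)
)
select a.*, r.CreatedDate
from AccessorGrantPermissableAssociations as a
inner join ranked as r
    on r.AccessorId = a.AccessorId
   and r.PermissableId = a.PermissableId
   and r.rn = 1;
```

This form reads AccessorGrant once and ranks every row, so it tends to suit cases where most pairs are needed. The CROSS APPLY form with TOP(1) tends to suit a small outer table with many grants per pair. Compare the actual plans on your own data before choosing.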

Go back to z.AccessorId = a.AccessorId and z.PermissableId = a.PermissableId in your b subquery, but add WHERE a.AccessorId = 1 and a.PermissableId = 1 in the main query, and get rid of the FORCESEEK table hint.
select *
from AccessorGrantPermissableAssociations a
cross apply
(
select z.AccessorId, z.PermissableId, max(z.CreatedDate) CreatedDate
from AccessorGrant z
where
z.AccessorId = a.AccessorId and z.PermissableId = a.PermissableId
and z.CreatedDate <= cast(switchoffset(@asOfMoment, '-00:00') as datetime2)
group by z.AccessorId, z.PermissableId
) b
WHERE a.AccessorId = 1 and a.PermissableId = 1
Your CROSS APPLY is evaluated for every row in your outer query, so you want to limit the outer query as much as possible.
But really, are you not after just the max created date? It should more properly be this:
select *
from AccessorGrantPermissableAssociations a
cross apply
(
select max(z.CreatedDate) CreatedDate
from AccessorGrant z
where
z.AccessorId = a.AccessorId and z.PermissableId = a.PermissableId
and z.CreatedDate <= cast(switchoffset(@asOfMoment, '-00:00') as datetime2)
) b
WHERE a.AccessorId = 1 and a.PermissableId = 1

Related

CTE self join slow down the execution

I am using the following query in a stored procedure.
DECLARE @DateFrom datetime = '01/01/1753',
@DateTo datetime = '12/31/9999'
BEGIN
WITH tmpTethers
AS
(
SELECT TL.str_systemid AS SystemCode,
ISNULL(ml.name, ml.location) AS [System],
TL.dte_created AS [Date],
TL.str_LengthId AS TetherRegId,
0 AS LengthCut,
ISNULL(TL.dbl_newlength, 0) AS LengthAdded,
CAST(0 AS FLOAT) AS RemainingLength,
1 AS Mode,
UT.description AS UOM
FROM OP_TetherLength AS TL
INNER JOIN master_location AS ML ON ML.location = TL.str_systemid
LEFT JOIN udc_type AS UT ON TL.lng_lengthuom = UT.udc
WHERE (TL.dte_dateadded BETWEEN @DateFrom AND @DateTo)
UNION ALL
SELECT RR.systemcode AS SystemCode,
ISNULL(ML.name, ML.location) AS [System],
RR.datecreated AS [Date],
RR.oms_repairid AS TetherRegId,
ISNULL(RR.cutlength, 0) AS LengthCut,
0 AS LengthAdded,
0 AS RemainingLength,
0 AS Mode,
UT.description AS UOM
FROM Repair_Registration AS RR
INNER JOIN master_location AS ML ON RR.systemcode = ml.location
LEFT JOIN udc_type AS UT ON RR.cutlength_uomid = UT.udc
WHERE --RR.cut_umbilical_tether = 0 AND
RR.cutbackrequired = 1 AND
(RR.datecreated BETWEEN @DateFrom AND @DateTo)
),
tmpOrderedTethers
AS
(
SELECT TOP 1000
SystemCode,
[System],
[Date],
TetherRegId,
LengthCut,
LengthAdded,
RemainingLength,
Mode,
UOM,
ROW_NUMBER() OVER(PARTITION BY SystemCode ORDER BY [Date] ) AS RowNumber
FROM tmpTethers
ORDER BY SystemCode
),
tmpFinalTethers
AS
(
SELECT SystemCode,
[System],
[Date],
TetherRegId,
LengthCut,
LengthAdded,
CASE
WHEN Mode = 1 THEN LengthAdded
ELSE 0 - LengthCut
END AS RemainingLength,
Mode,
UOM,
RowNumber
FROM tmpOrderedTethers
WHERE RowNumber = 1
UNION ALL
SELECT tmpOT.SystemCode,
tmpOT.[System],
tmpOT.[Date],
tmpOT.TetherRegId,
tmpOT.LengthCut,
tmpOT.LengthAdded,
CASE
WHEN tmpOT.Mode = 1 THEN /*tmpFT.RemainingLength +*/ tmpOT.LengthAdded
ELSE tmpFT.RemainingLength - tmpOT.LengthCut
END AS RemainingLength,
CASE
WHEN tmpOT.Mode = 1 OR tmpFT.Mode = 1 THEN 1
ELSE 0
END AS Mode,
tmpOT.UOM,
tmpOT.RowNumber
FROM tmpOrderedTethers AS tmpOT
INNER JOIN tmpFinalTethers AS tmpFT ON tmpFT.SystemCode = tmpOT.SystemCode AND
tmpFT.RowNumber = tmpOT.RowNumber - 1
),
---- FT - Previous
---- OT - Current
SELECT SystemCode,
[System],
[Date],
TetherRegId,
LengthCut,
LengthAdded,
RemainingLength,
UOM,
RowNumber
,ROW_NUMBER() OVER(PARTITION BY SystemCode ORDER BY [Date] desc) AS SortNumber
FROM tmpGetFinalTethers
ORDER BY SystemCode, SortNumber
OPTION (MAXRECURSION 1000)
END
When I comment out the following part of the query above, the execution time drops and the data comes back quickly:
SELECT tmpOT.SystemCode,
tmpOT.[System],
tmpOT.[Date],
tmpOT.TetherRegId,
tmpOT.LengthCut,
tmpOT.LengthAdded,
CASE
WHEN tmpOT.Mode = 1 THEN /*tmpFT.RemainingLength +*/ tmpOT.LengthAdded
ELSE tmpFT.RemainingLength - tmpOT.LengthCut
END AS RemainingLength,
CASE
WHEN tmpOT.Mode = 1 OR tmpFT.Mode = 1 THEN 1
ELSE 0
END AS Mode,
tmpOT.UOM,
tmpOT.RowNumber
FROM tmpOrderedTethers AS tmpOT
INNER JOIN tmpFinalTethers AS tmpFT ON tmpFT.SystemCode = tmpOT.SystemCode AND
tmpFT.RowNumber = tmpOT.RowNumber - 1
Please let me know how I can refine this.
It seems like you have row-by-row processing in your [tmpFinalTethers] and [tmpGetFinalTethers] CTEs.
Each row returned by [tmpFinalTethers] is based on [tmpOrderedTethers], and the data in [tmpOrderedTethers] is based on [tmpTethers]. Therefore the logic contained in [tmpOrderedTethers] and [tmpTethers] will be executed n times, where n is the number of rows returned by [tmpFinalTethers].
The reason is that CTEs are not materialized objects. They are not stored in memory or on disk, so they are executed each time you reference them outside of their declaration.
Loading the result set of [tmpOrderedTethers] into a temp table may help if you really need row-by-row processing for your task and don't have other options.
Also, it seems like your [tmpFinalTethers] and [tmpGetFinalTethers] have the same logic inside. I am not sure what the purpose of that is. Maybe you can do the final select from [tmpFinalTethers] and get rid of [tmpGetFinalTethers].
Edited:
Try something like this:
;WITH tmpTethers AS (...),
tmpOrderedTethers AS (...)
SELECT * INTO #tmpOrderedTethers FROM tmpOrderedTethers
;WITH tmpFinalTethers AS (
SELECT ... FROM #tmpOrderedTethers WHERE ...
UNION ALL
SELECT ... FROM #tmpOrderedTethers tmpOT INNER JOIN ...
)
Edited 2:
As you have OPTION (MAXRECURSION 1000), I suppose you always get at most 1000 rows. For that many rows, your recursive CTE combined with a temp table will probably work. At least it would be better than a cursor, because a cursor consumes some resources on top of the row-by-row processing. But if you need to process, say, 10,000 rows, then row-by-row processing is definitely not an appropriate solution and you should find another one.
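If you go the temp-table route, it may also help to index the temp table on the columns the recursive join uses. The sketch below assumes #tmpOrderedTethers has already been filled by the SELECT ... INTO shown earlier; the index name is hypothetical.

```sql
-- Sketch only. Assumes #tmpOrderedTethers already exists and was filled with
--   SELECT * INTO #tmpOrderedTethers FROM tmpOrderedTethers
-- The index name below is hypothetical.
CREATE CLUSTERED INDEX IX_tmpOrderedTethers_SystemCode_RowNumber
    ON #tmpOrderedTethers (SystemCode, RowNumber);

-- The recursive member joins on exactly these two columns:
--   tmpFT.SystemCode = tmpOT.SystemCode AND tmpFT.RowNumber = tmpOT.RowNumber - 1
-- so each recursion step can seek for the next row instead of scanning.
```

Whether the index pays off for about 1000 rows is something to verify in the actual plan. For a table that small the difference may be negligible.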

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table that contains over 30 million rows. The result set is then selected from using WHERE n = 1. In the query plan, the SORT operation in the ROW_NUMBER() windowed function is 95% of the query's cost and it is taking over 6 minutes to complete execution.
I already have an index on same_voter, eid, country include vid, nid, sid, vote, time_stamp, new to cover the where clause.
Is the most efficient way to correct this to add an index on vid, nid, sid, new DESC, time_stamp DESC or is there an alternative to using the ROW_NUMBER() function for this to achieve the same results in a more efficient manner?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= @EId
AND v.eid > (@EId - 5)
AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= @EId AND
V2.eid > (@EId - 5) AND
V2.country = @Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= @EId AND
V.eid > (@EId - 5) AND
V.country = @Country AND
V2.vid IS NULL
The query gets all rows matching your criteria, then joins each one to any other row that matches the same criteria but would rank higher within the partition based on the new and time_stamp columns. If no such row is found, the current row is the highest-ranked one, and V2.vid will be NULL. I'm assuming that vid can otherwise never be NULL. If it's a nullable column in your table then you'll need to adjust that last line of the query.
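The same anti-join can also be written with NOT EXISTS, which avoids the dependency on vid being non-nullable. This is a sketch under the assumption that @EId and @Country are the parameters from the question:

```sql
-- Sketch only: NOT EXISTS form of the "no higher-ranked row" check.
-- @EId and @Country are assumed to be the parameters from the question.
SELECT V.vid, V.nid, V.sid, V.vote, V.time_stamp, V.new, V.eid
FROM dbo.Votes AS V
WHERE V.same_voter <> 1
  AND V.eid <= @EId
  AND V.eid > (@EId - 5)
  AND V.country = @Country
  AND NOT EXISTS
  (
      -- any row in the same group that would sort ahead of V
      SELECT 1
      FROM dbo.Votes AS V2
      WHERE V2.vid = V.vid
        AND V2.nid = V.nid
        AND V2.sid = V.sid
        AND V2.same_voter <> 1
        AND V2.eid <= @EId
        AND V2.eid > (@EId - 5)
        AND V2.country = @Country
        AND (V2.new > V.new
             OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
  );
```

One behavioural difference from ROW_NUMBER() applies to both the LEFT JOIN and the NOT EXISTS forms: if two rows in a group tie on new and time_stamp, both are returned, whereas WHERE n = 1 keeps exactly one.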

troubles with next and previous query

I have a list and the returned table looks like this. I took the preview of only one car but there are many more.
What I need to do now is check that the current KM value is larger than the previous one and smaller than the next one. Based on that check I need to add a field called Trustworthy and fill it with either 1 or 0 (true/false).
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list so that is why I separated it.
In both of my tries my code does not work.
Here is the code that I have so far.
FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server 2012 and later you could use the lag() and lead() analytic functions to access the previous/next rows, but in earlier versions you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car. Maybe those should be considered valid if they fulfill only one part of the comparison (since the previous/next rows are null)?
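For reference, a lag()/lead() version might look like the sketch below. It needs SQL Server 2012 or later and uses the same table and column names as the query above.

```sql
-- Sketch only: requires SQL Server 2012 or later.
select
    m.FK_CarID,
    m.KM,
    m.KM_Date,
    valid =
        case
            -- lag()/lead() return NULL on the first/last row of each car,
            -- so those rows fall through to 0, as in the query above
            when m.KM > lag(m.KM)  over (partition by m.FK_CarID order by m.KM_Date)
             and m.KM < lead(m.KM) over (partition by m.FK_CarID order by m.KM_Date)
            then 1 else 0
        end
from eMK_Mileage as m
order by m.FK_CarID, m.KM_Date;
```

To treat the first and last readings as valid when only one neighbour exists, add the same "or ... is null" checks that are commented out in the self-join version.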

Reuse subquery result in WHERE-Clause for INSERT

I am using Microsoft SQL Server 2008.
I would like to save the result of a subquery to reuse it in a following subquery.
Is this possible?
What is the best practice for doing this? (I am very new to SQL.)
My query looks like:
INSERT INTO [dbo].[TestTable]
(
[a]
,[b]
)
SELECT
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
)
,(
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = MAT_WS_ID
--(
--SELECT TOP 1 MAT_WS_ID
--FROM #TempTableX AS X_ALIAS
--WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
--)
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
My question is:
Is what I did correct?
I replaced the second SELECT statement in the WHERE clause for [b] (which is commented out and exactly the same as the one for [a]) with the result of the first SELECT statement for [a] (= MAT_WS_ID).
It seems to give the right results.
But I don't understand why!
I mean, MAT_WS_ID is a column of both temporary tables, X_ALIAS and Y_ALIAS.
So in the SELECT statement for [b], in the scope of the [b] select query, MAT_WS_ID could only be known from the Y_ALIAS table. (Or am I wrong? I am more of a C++ person; maybe scoping in SQL and C++ is totally different.)
I just want to know the best way in SQL Server to reuse a scalar select result.
Or should I just not care, copy the select for every column, and let SQL Server optimize it on its own?
One approach would be outer apply:
SELECT mat.MAT_WS_ID
, (
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = mat.MAT_WS_ID
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
OUTER APPLY
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
) as mat
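On the scoping question: an unqualified column name is resolved against the innermost query first, so in the original query the bare MAT_WS_ID inside the [b] subquery refers to #TempTableY's own column. The predicate then compares the column with itself, and the subquery returns an arbitrary row rather than the one matching [a]. A sketch of the full INSERT with two chained OUTER APPLYs follows; the ORDER BY columns are an assumption added only to make TOP (1) deterministic.

```sql
-- Sketch only. The ORDER BY columns are assumptions chosen to make
-- TOP (1) deterministic; pick whatever ordering fits your data.
INSERT INTO dbo.TestTable (a, b)
SELECT x.MAT_WS_ID,
       y.MAT_WS_NAME
FROM dbo.LASERTECHNO AS t
OUTER APPLY
(
    SELECT TOP (1) X_ALIAS.MAT_WS_ID
    FROM #TempTableX AS X_ALIAS
    WHERE X_ALIAS.MAT_RM_NAME = t.LT_ALL_MATERIAL
    ORDER BY X_ALIAS.MAT_WS_ID
) AS x
OUTER APPLY
(
    -- the second APPLY can reference the result of the first one
    SELECT TOP (1) Y_ALIAS.MAT_WS_NAME
    FROM #TempTableY AS Y_ALIAS
    WHERE Y_ALIAS.MAT_WS_ID = x.MAT_WS_ID
    ORDER BY Y_ALIAS.MAT_WS_NAME
) AS y;
```

Qualifying every column with its alias, as done here, makes it obvious which table each MAT_WS_ID comes from.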
You could rank rows in #TempTableX and #TempTableY partitioning them by MAT_RM_NAME in the former and by MAT_WS_ID in the latter, then use normal joins with filtering by rownum = 1 in both tables (rownum being the column containing the ranking numbers in each of the two tables):
WITH x_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_RM_NAME ORDER BY (SELECT 1))
FROM #TempTableX
),
y_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_WS_ID ORDER BY (SELECT 1))
FROM #TempTableY
)
INSERT INTO dbo.TestTable (a, b)
SELECT
x.MAT_WS_ID,
y.MAT_WS_NAME
FROM dbo.LASERTECHNO t
LEFT JOIN x_ranked x ON t.LT_ALL_MATERIAL = x.MAT_RM_NAME AND x.rownum = 1
LEFT JOIN y_ranked y ON x.MAT_WS_ID = y.MAT_WS_ID AND y.rownum = 1
;
The ORDER BY (SELECT 1) bit is a trick to specify an indeterminate ordering, which, accordingly, would result in indeterminate rownum = 1 rows picked by the query. That is to more or less duplicate your TOP 1 without an explicit order, but I would recommend you to specify a more sensible ORDER BY clause to make the results more predictable.

SQL Server : convert sub select query to join

I have two tables, questionpool and question, where question is many-to-one to questionpool. I have created a query using a sub-select which returns the correct random results, but I need to return more than one column from the question table.
The intent of the query is to return a random test from the 'question' table for each 'QuizID' from the 'Question Pool' table.
SELECT QuestionPool.QuestionPoolID,
(
SELECT TOP (1) Question.QuestionPoolID
FROM Question
WHERE Question.GroupID = QuestionPool.QuestionPoolID
ORDER BY NEWID()
)
FROM QuestionPool
WHERE QuestionPool.QuizID = '5'
OUTER APPLY is suited to this:
Select *
FROM QuestionPool
OUTER APPLY
(
SELECT TOP 1 *
FROM Question
WHERE Question.GroupID = QuestionPool.QuestionPoolID
ORDER BY NEWID()
) x
WHERE QuestionPool.QuizID = '5'
Another example of OUTER APPLY use http://www.ienablemuch.com/2012/04/outer-apply-walkthrough.html
Live test: http://www.sqlfiddle.com/#!3/d8afc/1
create table m(i int, o varchar(10));
insert into m values
(1,'alpha'),(2,'beta'),(3,'delta');
create table x(i int, j varchar, k varchar(10));
insert into x values
(1,'a','hello'),
(1,'b','howdy'),
(2,'x','great'),
(2,'y','super'),
(3,'i','uber'),
(3,'j','neat'),
(3,'a','nice');
select m.*, '' as sep, r.*
from m
outer apply
(
select top 1 *
from x
where i = m.i
order by newid()
) r
Not familiar with SQL server, but I hope this would do:
Select QuestionPool.QuestionPoolID, v.QuestionPoolID, v.xxx -- etc
FROM QuestionPool
JOIN
(
SELECT TOP (1) *
FROM Question
WHERE Question.GroupID = QuestionPool.QuestionPoolID
ORDER BY NEWID()
) AS v ON v.QuestionPoolID = QuestionPool.QuestionPoolID
WHERE QuestionPool.QuizID = '5'
Your query appears to be bringing back an arbitrary Question.QuestionPoolId for each QuestionPool.QuestionPoolId subject to the QuizId filter.
I think the following query does this:
select qp.QuestionPoolId, max(q.QuestionPoolId) as any_QuestionPoolId
from Question q join
QuestionPool qp
on q.GroupId = qp.QuestionPoolId
where qp.QuizID = '5'
group by qp.QuestionPoolId
This returns a particular question.
The following query would allow you to get more fields:
select qp.QuestionPoolId, q.*
from (select q.*, row_number() over (partition by GroupId order by (select NULL)) as randrownum
from Question q
) q join
QuestionPool qp
on q.GroupId = qp.QuestionPoolId
where qp.QuizID = '5' and
q.randrownum = 1
This uses row_number() to arbitrarily enumerate the rows. The "select NULL" provides an arbitrary (not truly random) ordering; alternatively, you could use "order by GroupId".
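If the pick has to be random on every execution, as in the question's ORDER BY NEWID(), the same pattern can order the window by NEWID(). A minimal sketch using the table and column names from the question:

```sql
-- Sketch only: random pick per group by ordering the window with NEWID().
select qp.QuestionPoolID, q.*
from QuestionPool as qp
join
(
    select qq.*,
           -- a fresh random order inside each GroupID on every execution
           row_number() over (partition by qq.GroupID order by newid()) as randrownum
    from Question as qq
) as q
    on q.GroupID = qp.QuestionPoolID
where qp.QuizID = '5'
  and q.randrownum = 1;
```

Unlike OUTER APPLY, this inner join drops question pools that have no questions. Use a LEFT JOIN and move the randrownum = 1 test into the ON clause if those pools should still appear.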
Common Table Expressions (CTEs) are rather handy for this type of thing...
http://msdn.microsoft.com/en-us/library/ms175972(v=sql.90).aspx