ROW_NUMBER() Query Plan SORT Optimization - sql

The query below accesses the Votes table, which contains over 30 million rows. The result set is then filtered with WHERE n = 1. In the query plan, the SORT operation feeding the ROW_NUMBER() window function accounts for 95% of the query's cost, and the query takes over 6 minutes to complete.
I already have an index on (same_voter, eid, country) INCLUDE (vid, nid, sid, vote, time_stamp, new) to cover the WHERE clause.
Is the most efficient fix to add an index on (vid, nid, sid, new DESC, time_stamp DESC), or is there an alternative to the ROW_NUMBER() function that achieves the same result more efficiently?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
       ROW_NUMBER() OVER (
           PARTITION BY v.vid, v.nid, v.sid
           ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
  AND v.eid <= #EId
  AND v.eid > (#EId - 5)
  AND v.country = #Country
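For illustration, the index being considered could be declared along these lines (a sketch only - the index name and the INCLUDE list are assumptions, and whether the optimizer actually uses it to avoid the SORT depends on the plan it produces):
CREATE NONCLUSTERED INDEX IX_Votes_vid_nid_sid_new_time    -- hypothetical name
ON dbo.Votes (vid, nid, sid, new DESC, time_stamp DESC)    -- matches the PARTITION BY / ORDER BY
INCLUDE (same_voter, eid, country, vote);                  -- assumed covering columns for the WHERE and SELECT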

One possible alternative to using ROW_NUMBER():
SELECT
    V.vid,
    V.nid,
    V.sid,
    V.vote,
    V.time_stamp,
    V.new,
    V.eid
FROM
    dbo.Votes V
    LEFT OUTER JOIN dbo.Votes V2 ON
        V2.vid = V.vid AND
        V2.nid = V.nid AND
        V2.sid = V.sid AND
        V2.same_voter <> 1 AND
        V2.eid <= #EId AND
        V2.eid > (#EId - 5) AND
        V2.country = #Country AND
        (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
    V.same_voter <> 1 AND
    V.eid <= #EId AND
    V.eid > (#EId - 5) AND
    V.country = #Country AND
    V2.vid IS NULL
The query basically says: get all rows matching your criteria, then join each of them to any other row that matches the same criteria but would rank higher within the partition based on the new and time_stamp columns. If no such row is found, this must be the row you want (it ranks highest), and in that case V2.vid will be NULL. I'm assuming that vid can otherwise never be NULL; if it is a nullable column in your table, you'll need to adjust that last line of the query.
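The same anti-join can also be written with NOT EXISTS, which many people find easier to read; a sketch, reusing the question's #EId and #Country placeholders:
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid
FROM dbo.Votes v
WHERE v.same_voter <> 1
  AND v.eid <= #EId
  AND v.eid > (#EId - 5)
  AND v.country = #Country
  AND NOT EXISTS (SELECT 1
                  FROM dbo.Votes v2
                  WHERE v2.vid = v.vid
                    AND v2.nid = v.nid
                    AND v2.sid = v.sid
                    AND v2.same_voter <> 1
                    AND v2.eid <= #EId
                    AND v2.eid > (#EId - 5)
                    AND v2.country = #Country
                    -- a row that would rank higher within the same partition
                    AND (v2.new > v.new OR (v2.new = v.new AND v2.time_stamp > v.time_stamp)))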

Related

Use of MAX function in SQL query to filter data

The code below joins two tables, and I need to extract only the latest date per account, although the tables hold multiple accounts and history records. I wanted to use the MAX function but am not sure how to incorporate it in this case. I am using MS SQL Server.
Appreciate any help!
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label
from
Property.dbo.PROP
inner join
Property.dbo.PROP_DATA on Property.dbo.PROP.FileID = Property.dbo.PROP_DATA.FileID
where
(PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
and (Prop.FileTime = Max(Prop.FileTime))
order by
PROP.EffDate DESC
Assuming your DBMS supports windowing functions and the with clause, a max windowing function would work:
with all_data as (
select
PROP.FileName,PROP.InsName, PROP.Status,
PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
PROP.EffDate,PROP.ExpDate, PROP.Region,
PROP.Underwriter, PROP_DATA.Data , PROP_DATA.Label,
max (PROP.EffDate) over (partition by PROP.PolNo) as max_date
from Actuarial.dbo.PROP
inner join Actuarial.dbo.PROP_DATA
on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
where (PROP_DATA.Label in ('Occupancy' , 'OccupancyTIV'))
and (PROP.EffDate >= '42278' and PROP.EffDate <= '42643')
and (PROP.Status = 'Bound')
)
select
FileName, InsName, Status, FileTime, SubmissionNo,
PolNo, EffDate, ExpDate, Region, UnderWriter, Data, Label
from all_data
where EffDate = max_date
ORDER BY EffDate DESC
This also presupposes that any given account would not have two records on the same EffDate. If that can happen, and there is no other objective means to determine the latest record, you could also use row_number() to pick a somewhat arbitrary record in the case of a tie.
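If ties on EffDate are possible, a row_number() version might look roughly like this (a sketch only; it assumes PolNo identifies the account and uses FileTime as an arbitrary tie-breaker):
with ranked as (
    select
        PROP.FileName, PROP.InsName, PROP.Status,
        PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
        PROP.EffDate, PROP.ExpDate, PROP.Region,
        PROP.Underwriter, PROP_DATA.Data, PROP_DATA.Label,
        row_number() over (partition by PROP.PolNo
                           order by PROP.EffDate desc, PROP.FileTime desc) as rn
    from Actuarial.dbo.PROP
    inner join Actuarial.dbo.PROP_DATA
        on Actuarial.dbo.PROP.FileID = Actuarial.dbo.PROP_DATA.FileID
    where PROP_DATA.Label in ('Occupancy', 'OccupancyTIV')
      and PROP.EffDate >= '42278' and PROP.EffDate <= '42643'
      and PROP.Status = 'Bound'
)
select FileName, InsName, Status, FileTime, SubmissionNo,
       PolNo, EffDate, ExpDate, Region, Underwriter, Data, Label
from ranked
where rn = 1
order by EffDate desc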
Using straight SQL, you can use a self-join in a subquery in your WHERE clause to eliminate values smaller than the max, or smaller than the top n largest, and so on. Just change the number in the <= 1 comparison to the number of top values you want per group.
Something like the following might do the trick, for example:
select
p.FileName
, p.InsName
, p.Status
, p.FileTime
, p.SubmissionNo
, p.PolNo
, p.EffDate
, p.ExpDate
, p.Region
, p.Underwriter
, pd.Data
, pd.Label
from Actuarial.dbo.PROP p
inner join Actuarial.dbo.PROP_DATA pd
on p.FileID = pd.FileID
where (
select count(*)
from Actuarial.dbo.PROP p2
where p2.FileID = p.FileID
and p2.EffDate >= p.EffDate
) <= 1
and (
pd.Label in ('Occupancy' , 'OccupancyTIV')
and p.Status = 'Bound'
)
ORDER BY p.EffDate DESC
Have a look at this stackoverflow question for a full working example.
Not tested
with temp1 as
(
    select foo
    from bar
    where xy = (select max(xy) from bar)
)
select PROP.FileName, PROP.InsName, PROP.Status,
       PROP.FileTime, PROP.SubmissionNo, PROP.PolNo,
       PROP.EffDate, PROP.ExpDate, PROP.Region,
       PROP.Underwriter, PROP_DATA.Data, PROP_DATA.Label
from Actuarial.dbo.PROP
inner join temp1 t
    on Actuarial.dbo.PROP.FileID = t.FileID
ORDER BY PROP.EffDate DESC

PostgreSQL use case when result in where clause

I use a complex CASE WHEN expression for selecting values. I would like to use its result in the WHERE clause, but Postgres says column 'd' does not exist.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100 OFFSET 100;
Then I thought I could use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100 OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting 100 rows as a result. I could probably (I am not sure) apply LIMIT and OFFSET outside the subquery with the CASE (where the WHERE clause is), but I think (though I am not sure why) this would be a performance hit.
The CASE returns an array or null. What is the best/fastest way to exclude rows where the result of the CASE expression is null? I still need 100 rows (or fewer if they don't exist, of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type,
       ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true
  AND coordinates IS NOT NULL
  AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872)
  AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value
ORDER BY t.id
LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing the rows (1, 'foo'), (2, 'bar'), (3, 'baz'), this will return records 1 and 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 (see also https://stackoverflow.com/a/7950920/101151).
Your LIMIT returns fewer than 100 rows if the 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know how to limit the subselect without your limiting logic (the CASE conditions) being rewritten to work inside the WHERE clause, for example:
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The WHERE version will read differently from the CASE, and it may be annoying to keep the CASE logic in sync with the WHERE conditions, but it lets the LIMIT operate only on matching data.
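Combining both ideas with the earlier case_in_where example, the pattern would look roughly like this (a sketch; the inner WHERE repeats the CASE conditions so that LIMIT only pages over rows that survive the outer filter):
SELECT *
FROM (
    SELECT id, name,
           CASE
               WHEN name = 'foo' THEN true
               WHEN name = 'bar' THEN false
               ELSE NULL
           END AS c
    FROM case_in_where
    WHERE name IN ('foo', 'bar')   -- mirrors the CASE conditions, so no NULL values of c are paged in
    ORDER BY id
    LIMIT 100 OFFSET 0
) t
WHERE c IS NOT NULL;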

troubles with next and previous query

I have a list, and the returned table looks like this (the preview shows only one car, but there are many more).
What I need to do now is check that the current KM value is larger than the previous one and smaller than the next one. If this is not the case, I need to add a field called Trustworthy and fill it with either 1 or 0 (true/false).
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list, so I separated it.
Neither of my attempts works.
Here is the code that I have so far.
;with FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server 2012 and later you could have used the lag() and lead() analytic functions to access the previous/next rows, but in earlier versions you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car - maybe those should be considered valid if they fulfill only one part of the comparison (since the previous/next row is null)?
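On SQL Server 2012 and later, the same check can be written with lag() and lead() instead of the row-number self-joins (a sketch under the same column assumptions as above):
;with data as
(
    select *,
           km_prev = lag(KM)  over (partition by FK_CarID order by KM_Date),  -- previous reading per car
           km_next = lead(KM) over (partition by FK_CarID order by KM_Date)   -- next reading per car
    from eMK_Mileage
)
select FK_CarID, KM, KM_Date,
       Trustworthy = case
                         when (KM > km_prev /* or km_prev is null */)
                          and (KM < km_next /* or km_next is null */)
                         then 1 else 0
                     end
from data
order by FK_CarID, KM_Date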

Select 1st and 2nd Record before record X

SQL Server 2008-12
I have a table:
InteractionKey char(18)
dEventTime datetime
SeqNo int
cEventData1
There will be multiple entries per InteractionKey - dEventTime only goes out to the second, and SeqNo is incremented if two entries occur within the same second.
What I need to do is select the First and Second record BEFORE the record where
cEventData1 = 'Disconnect'
The final product will give me a count of occurrences grouped by cEventData1.
I am currently using a cursor (will update with cursor source momentarily). I would like to use a CTE - but I really struggle with understanding them...
Any ideas would be appreciated!
Update with Data Sample
INTERACTIONKEY dEventTime SeqNo cEventData1
100186322420130722 2013-07-22 11:50:49.000 1 EnterPassword
100186322420130722 2013-07-22 11:50:49.000 2 CheckPassword
100186322420130722 2013-07-22 11:50:49.000 3 Attendant Disconnect
The result of the query would ideally tell me the following. (Note: the Action column here can simply be 'Attendant Disconnect' AS Action.)
cEventData1 Action Count
CheckPassword Attendant Disconnect 1
Here is the query I ended up going with, based upon the answer below:
SELECT DISTINCT t1.InteractionKey,
DisconnectTime = t1.dEventTime,
PreviousEventTime = t2.dEventTime,
PreviousEvent = t2.cEventData1,
t2.SeqNo
FROM IVRHistory t1
OUTER APPLY
( SELECT TOP 1 t2.dEventTime, t2.SeqNo, t2.cEventData1
FROM IVRHistory t2
WHERE t1.InteractionKey = t2.InteractionKey
AND t1.dEventTime >= t2.dEventTime
AND t1.SeqNo > t2.SeqNo
AND t2.cEventData1 <> 'Attendant Disconnect'
ORDER BY t2.dEventTime DESC, t2.SeqNo DESC
) t2
WHERE t1.cEventData1 = 'Attendant Disconnect'
I would approach this using APPLY:
SELECT t1.InteractionKey,
DisconnectTime = t1.dEventTime,
PreviousEventTime = t2.dEventTime,
PreviousEvent = t2.cEventData1,
t2.SeqNo
FROM T t1
OUTER APPLY
( SELECT TOP 2 t2.dEventTime, t2.SeqNo, t2.cEventData1
FROM T t2
WHERE t1.InteractionKey = t2.InteractionKey
AND t1.dEventTime > t2.dEventTime
ORDER BY t2.dEventTime DESC
) t2
WHERE t1.cEventData1 = 'Disconnect';
This will give you the two records immediately preceding the disconnect event. If you need more than two records when there are duplicate times, you can use TOP 2 WITH TIES.
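That variant would look like this (a sketch; only the TOP clause differs from the query above):
SELECT t1.InteractionKey,
       DisconnectTime = t1.dEventTime,
       PreviousEventTime = t2.dEventTime,
       PreviousEvent = t2.cEventData1,
       t2.SeqNo
FROM T t1
OUTER APPLY
( SELECT TOP 2 WITH TIES t2.dEventTime, t2.SeqNo, t2.cEventData1   -- keeps extra rows that tie on dEventTime
  FROM T t2
  WHERE t1.InteractionKey = t2.InteractionKey
    AND t1.dEventTime > t2.dEventTime
  ORDER BY t2.dEventTime DESC
) t2
WHERE t1.cEventData1 = 'Disconnect';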
Without your sample input and output I am guessing a bit, but from what you have said your final aggregate would be:
SELECT t2.cEventData1,
Occurances = COUNT(*)
FROM T t1
OUTER APPLY
( SELECT TOP 2 t2.dEventTime, t2.SeqNo, t2.cEventData1
FROM T t2
WHERE t1.InteractionKey = t2.InteractionKey
AND t1.dEventTime > t2.dEventTime
ORDER BY t2.dEventTime DESC
) t2
WHERE t1.cEventData1 = 'Disconnect'
GROUP BY t2.cEventData1;

Fastest way to check if the most recent result for a patient has a certain value

Mssql < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patients where the most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have solved this in two different ways:
SELECT
COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
                  FROM T_Measurements M
                  WHERE (M.code = 'xxxx' AND result = 'xx')
                    AND datemeasurement = (SELECT MAX(datemeasurement)
                                           FROM T_Measurements
                                           WHERE datemeasurement > '2012-01-04'
                                             AND patid = M.patid
                                           GROUP BY patid)
                  GROUP BY patid)
AND:
SELECT
COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 case when result = 'xx' then 1 else 0 end
FROM T_Measurements M
WHERE (M.code ='xxxx') AND datemeasurement > '2012-01-04' AND patid = P.patid
ORDER by datemeasurement DESC
)
This works just fine, but it makes the query incredibly slow because it has to join the outer table on the subquery (if you know what I mean). The query takes 10 seconds without the most recent check, and 3 minutes with the most recent check.
I'm pretty sure this can be done a lot more efficient, so please enlighten me if you will :).
I tried implementing HAVING datemeasurement = MAX(datemeasurement), but that keeps throwing errors at me.
So my approach would be to write a query that just gets each patient's most recent result since 01-04-2012, and then filter that for your codes and results. Something like:
select count(1)
from T_Measurements M
inner join (
    SELECT PATID, MAX(datemeasurement) as lastMeasuredDate
    from T_Measurements
    where datemeasurement > '01-04-2012'
    group by patID
) lastMeasurements
    on lastMeasurements.lastMeasuredDate = M.datemeasurement
    and lastMeasurements.PatID = M.PatID
where M.Code = 'Xxxx' and M.result = 'XX'
The fastest way may be to use row_number():
SELECT COUNT(m.patid)
from (select m.*,
ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
FROM T_Measurements m
where datemeasurement > '2012-01-04'
) m
where seqnum = 1 and code = 'XXX' and result = 'xx'
ROW_NUMBER() enumerates the records for each patient so that the most recent measurement gets a value of 1; the outer query then just selects those rows.
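If T_Measurements is large, a supporting index on the partition/order columns may help either version (a sketch; the existing indexes and clustered key are unknown, so treat this purely as an assumption):
-- hypothetical supporting index: one seekable range per patient, newest measurement first
CREATE INDEX IX_T_Measurements_patid_datemeasurement
ON T_Measurements (patid, datemeasurement DESC)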