How to calculate the RowTotal of a CTE in less time - SQL

I have a stored procedure with the following structure:
WITH ItemsContact (
IsCostVariantItem
,ItemID
,AttributeSetID
,ItemTypeID
,HidePrice
,HideInRSSFeed
,HideToAnonymous
,IsOutOfStock
,AddedOn
,BaseImage
,AlternateText
,SKU
,[Name]
,DownloadableID
,[Description]
,ShortDescription
,[Weight]
,Quantity
,Price
,ListPrice
,IsFeatured
,IsSpecial
,ViewCount
,SoldItem
,TotalDiscount
,RatedValue
,RowNumber
)
AS (
SELECT ------,
ROW_NUMBER() OVER (
ORDER BY i.[ItemID] DESC
) AS RowNumber
FROM -------
)
,rowTotal (RowTotal)
AS (
SELECT MAX(RowNumber)
FROM ItemsContact
)
SELECT CONVERT(INT, r.RowTotal) AS RowTotal
,c.*
FROM ItemsContact c
,rowTotal r
WHERE RowNumber >= @offset
AND RowNumber <= (@offset + @limit - 1)
ORDER BY ItemID
When I execute this, the execution plan shows:
SQL Server Execution Times:
CPU time = 344 ms, elapsed time = 362 ms.
Now I remove the second CTE, i.e. rowTotal:
WITH ItemsContact (
IsCostVariantItem
,ItemID
,AttributeSetID
,ItemTypeID
,HidePrice
,HideInRSSFeed
,HideToAnonymous
,IsOutOfStock
,AddedOn
,BaseImage
,AlternateText
,SKU
,[Name]
,DownloadableID
,[Description]
,ShortDescription
,[Weight]
,Quantity
,Price
,ListPrice
,IsFeatured
,IsSpecial
,ViewCount
,SoldItem
,TotalDiscount
,RatedValue
,RowNumber
)
AS (
SELECT ------,
ROW_NUMBER() OVER (
ORDER BY i.[ItemID] DESC
) AS RowNumber
FROM -------
)
SELECT c.*
FROM ItemsContact c
,rowTotal r
WHERE RowNumber >= @offset
AND RowNumber <= (@offset + @limit - 1)
ORDER BY ItemID
And it shows the execution plan as:
SQL Server Execution Times:
CPU time = 63 ms, elapsed time = 61 ms.
My first query calculates rowTotal correctly, but it takes more time. My question is: why does MAX(RowNumber) take so much longer, and how can I optimize this code? Thanks in advance for any help.

Since MAX(RowNumber) will always be equal to the total number of rows, try just having:
SELECT ------,
ROW_NUMBER() OVER (
ORDER BY i.[ItemID] DESC
) AS RowNumber,
COUNT(*) OVER () as RowTotal
FROM -------
As your first CTE.
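A minimal sketch of how the full paged query might then look with this approach (dbo.Items and the short column list are placeholders for the elided parts of the question; @offset and @limit are the paging parameters):
DECLARE @offset INT = 1, @limit INT = 20;  -- example paging values

WITH ItemsContact AS (
    SELECT i.ItemID,
           i.[Name],
           i.Price,
           ROW_NUMBER() OVER (ORDER BY i.ItemID DESC) AS RowNumber,
           COUNT(*) OVER () AS RowTotal   -- total row count, no second CTE needed
    FROM dbo.Items AS i
)
SELECT c.*
FROM ItemsContact AS c
WHERE c.RowNumber BETWEEN @offset AND @offset + @limit - 1
ORDER BY c.ItemID;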

The way you have coded it, your SQL has syntax errors. Another thing is that if you removed rowTotal from your second query, it simply wouldn't work because it still has a reference to it. So I don't know where these second execution times are from.
However, if I use the code blocks as templates and remove the errors, the execution plan for this query should be quite simple: you should have a (clustered) index scan on your ------- table and a sort operator, along with some other operators (a sequence projection for the ROW_NUMBER ranking function, some join operator like nested loops, etc.). The clustered index scan and the sort should be the most processor-intensive operations.
SQL Server here has to calculate a row number for each row, find the maximum of them, and constrain the results between the two row numbers calculated from the input variables. Obviously there is paging functionality built on top of this query, and there is a lot about paging in SQL Server on SO, so search for it and you will find plenty of related information.
If there is a known layer built on this query, you should change it. It uses an additional, unnecessary column for max(row_number(ID)) that is constant across all rows (38k?) and logically is just a scalar value. Instead you should return count(*) as @Damien_The_Unbeliever suggested in his solution, but separate it from the resultset. This way you would simplify the query and have something like this instead:
SELECT *
FROM (
SELECT
t.*,
ROW_NUMBER() OVER (ORDER BY ItemID DESC) AS N
FROM YourTable t
) x
WHERE N BETWEEN @offset AND @offset + @limit - 1
ORDER BY ItemID
It should be easy to get the result count in a separate query. And if you have a really big table, you can get an approximate row count from the system metadata instead of counting the rows.
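A sketch of that approach (dbo.Items is a stand-in for the real table behind the CTE; sys.dm_db_partition_stats requires VIEW DATABASE STATE permission):
-- Approximate row count from partition metadata, without scanning the table.
SELECT SUM(ps.row_count) AS ApproxRowCount
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.Items')
  AND ps.index_id IN (0, 1);   -- 0 = heap, 1 = clustered index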
P.S. If you haven't already checked your execution plan for index problems, do it.

Related

SQL Azure query aggregate performance issue

I'm trying to improve our SQL Azure database performance by replacing the use of a CURSOR, since that is (as everybody has told me) something to avoid.
Our table holds GPS information: rows with a clustered index on id, secondary indexes on device and timestamp, and a geography index on location.
I'm trying to compute some statistics, such as minimum speed (Doppler and computed), total distance, and average speed, over a period for a specific device.
I have NO choice on the stats and CAN'T change the table or output because of production.
I have a clear performance issue when running this inline table-valued function on my SQL Azure DB.
ALTER FUNCTION [dbo].[fn_logMetrics_3]
(
@p_device smallint,
@p_from dateTime,
@p_to dateTime,
@p_moveThresold int = 1
)
RETURNS TABLE
AS
RETURN
(
WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY timestamp) AS RowNum,
Timestamp,
Location,
Alt,
Speed
FROM
LogEvents
WHERE
Device = @p_device
AND Timestamp >= @p_from
AND Timestamp <= @p_to),
CTE1 AS
(
SELECT
t1.Speed as Speed,
t1.Alt as Alt,
t2.Alt - t1.Alt as DeltaElevation,
t1.Timestamp as Time0,
t2.Timestamp as Time1,
DATEDIFF(second, t2.Timestamp, t1.Timestamp) as Duration,
t1.Location.STDistance(t2.Location) as Distance
FROM
CTE t1
INNER JOIN
CTE t2 ON t1.RowNum = t2.RowNum + 1),
CTE2 AS
(
SELECT
Speed, Alt,
DeltaElevation,
Time0, Time1,
Duration,
Distance,
CASE
WHEN Duration <> 0
THEN (Distance / Duration) * 3.6
ELSE NULL
END AS CSpeed,
CASE
WHEN DeltaElevation > 0
THEN DeltaElevation
ELSE NULL
END As PositiveAscent,
CASE
WHEN DeltaElevation < 0
THEN DeltaElevation
ELSE NULL
END As NegativeAscent,
CASE
WHEN Distance < @p_moveThresold
THEN Duration
ELSE NULL
END As StopTime,
CASE
WHEN Distance > @p_moveThresold
THEN Duration
ELSE NULL
END As MoveTime
FROM
CTE1 t1
)
SELECT
COUNT(*) as Count,
MIN(Speed) as HSpeedMin, MAX(Speed) as HSpeedMax,
AVG(Speed) as HSpeedAverage,
MIN(CSpeed) as CHSpeedMin, MAX(CSpeed) as CHSpeedMax,
AVG(CSpeed) as CHSpeedAverage,
SUM(Distance) as CumulativeDistance,
MAX(Alt) as AltMin, MIN(Alt) as AltMax,
SUM(PositiveAscent) as PositiveAscent,
SUM(NegativeAscent) as NegativeAscent,
SUM(StopTime) as StopTime,
SUM(MoveTime) as MoveTime
FROM
CTE2 t1
)
The broad idea is:
CTE selects the corresponding rows, according to the parameters.
CTE1 pairs two consecutive rows and aggregates across them, in order to get Duration and Distance (an equivalent LAG-based sketch follows this list).
Then CTE2 performs operations on those Distance and Duration values.
Finally, the last SELECT does the aggregation, such as sums and averages, over each column.
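For reference, the consecutive-row pairing done by CTE1 could also be written with LAG instead of the RowNum self-join (a sketch only, assuming the SQL Azure version in use supports LAG; @p_device, @p_from and @p_to are the function's parameters):
-- Same pairing as CTE/CTE1, expressed with LAG over the previous row.
SELECT
    Speed,
    Alt,
    PrevAlt - Alt                                AS DeltaElevation,
    [Timestamp]                                  AS Time0,
    PrevTimestamp                                AS Time1,
    DATEDIFF(second, PrevTimestamp, [Timestamp]) AS Duration,
    Location.STDistance(PrevLocation)            AS Distance
FROM (
    SELECT Speed, Alt, [Timestamp], Location,
           LAG(Alt)         OVER (ORDER BY [Timestamp]) AS PrevAlt,
           LAG([Timestamp]) OVER (ORDER BY [Timestamp]) AS PrevTimestamp,
           LAG(Location)    OVER (ORDER BY [Timestamp]) AS PrevLocation
    FROM LogEvents
    WHERE Device = @p_device
      AND [Timestamp] >= @p_from
      AND [Timestamp] <= @p_to
) AS s
WHERE PrevTimestamp IS NOT NULL;   -- drop the first row, which has no predecessor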
Everything works pretty well until the last SELECT, where the aggregate functions (which are only a few sums and averages) kill the performance.
This query, selecting 1500 rows against a table with 4M rows, takes 1500 ms.
When replacing the last select with
SELECT COUNT(*) as count FROM CTE2 t1
then it takes only a few ms (down to 2 ms according to SQL Studio statistics).
with
SELECT
COUNT(*) as Count,
SUM(MoveTime) as MoveTime
it's about 125ms
with
SELECT
COUNT(*) as Count,
SUM(StopTime) as StopTime,
SUM(MoveTime) as MoveTime
it's about 250ms
It looks as if each aggregate runs as a consecutive loop operation over all the rows, within the same thread and without being parallelized.
For information, the CURSOR version of this function (which I wrote a couple of years ago) actually runs at least twice as fast...
What is wrong with these aggregates? How can I optimize them?
UPDATE:
The query plans for
SELECT COUNT(*) as Count
The query plans for the full SELECT with aggregates
Following the answer from Joe C, I introduced a #tmp table into the plan and performed the aggregation on it. The result is about twice as fast, which is an interesting fact.
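A guess at what that #tmp variant might look like (a hypothetical sketch only; inline table-valued functions cannot use temp tables, so this would live in a stored procedure or ad-hoc batch, only a subset of the aggregates is shown, and the parameter values are placeholders):
-- Materialize the filtered, row-numbered points first...
DECLARE @p_device smallint = 1,
        @p_from datetime = '2016-01-01',
        @p_to datetime = '2016-01-02',
        @p_moveThresold int = 1;

SELECT ROW_NUMBER() OVER (ORDER BY [Timestamp]) AS RowNum,
       [Timestamp], Location, Alt, Speed
INTO #pts
FROM LogEvents
WHERE Device = @p_device
  AND [Timestamp] >= @p_from
  AND [Timestamp] <= @p_to;

-- ...then aggregate over consecutive pairs read from the temp table.
SELECT COUNT(*)                                 AS [Count],
       MIN(t1.Speed)                            AS HSpeedMin,
       MAX(t1.Speed)                            AS HSpeedMax,
       AVG(t1.Speed)                            AS HSpeedAverage,
       SUM(t1.Location.STDistance(t2.Location)) AS CumulativeDistance,
       SUM(CASE WHEN t1.Location.STDistance(t2.Location) < @p_moveThresold
                THEN DATEDIFF(second, t2.[Timestamp], t1.[Timestamp]) END) AS StopTime,
       SUM(CASE WHEN t1.Location.STDistance(t2.Location) > @p_moveThresold
                THEN DATEDIFF(second, t2.[Timestamp], t1.[Timestamp]) END) AS MoveTime
FROM #pts AS t1
INNER JOIN #pts AS t2 ON t1.RowNum = t2.RowNum + 1;

DROP TABLE #pts;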

TSQL Last Record Efficiency Cursor, SubQuery, or CTE

Consider the following query...
SELECT
*
,CAST(
(CurrentSampleDateTime - PreviousSampleDateTime) AS FLOAT
) * 24.0 * 60.0 AS DeltaMinutes
FROM
(
SELECT
C.SampleDateTime AS CurrentSampleDateTime
,C.Location
,C.CurrentValue
,(
SELECT TOP 1
Previous.SampleDateTime
FROM Samples AS Previous
WHERE
Previous.Location = C.Location
AND Previous.SampleDateTime < C.SampleDateTime
ORDER BY Previous.SampleDateTime DESC
) AS PreviousSampleDateTime
FROM Samples AS C
) AS TempResults
Assuming all other things being equal, such as indexing, is this the most efficient way of achieving the above results? That is, using a subquery to retrieve the previous record?
Would I be better off creating a cursor that orders by Location, SampleDateTime and setting up variables for CurrentSampleDateTime and PreviousSampleDateTime...setting the Previous to the Current at the bottom of the while loop?
I'm not very good with CTEs. Is this something that could be accomplished more efficiently with a CTE? If so, what would that look like?
I'm likely going to have to retrieve PreviousValue along with PreviousSampleDateTime in order to get an average of the two. Does that change the results any?
Long story short, what is the best/most efficient way of holding onto the values of a previous record if you need to use those values in calculations on the current record?
----UPDATE
I should note that I have a clustered index on Location, SampleDateTime, CurrentValue so maybe that is what is affecting the results more than anything.
With 5,591,571 records, my query (the one above) takes 3 minutes and 20 seconds on average.
The CTE from Joachim Isaksson below takes 5 minutes and 15 seconds on average.
Maybe it's taking longer because it's not using the clustered index but is using the rownumber for the joins?
I started testing the cursor method but it's already at 10 minutes...so no go on that one.
I'll give it a day or so but think I will accept the CTE answer provided by Joachim Isaksson just because I found a new method of getting the last row.
Can anyone concur that it's the index on Location, SampleDateTime, CurrentValue that is making the subquery method faster?
I don't have SQL Server 2012 so can't test the LEAD/LAG method. I'd bet that would be quicker than anything I've tried assuming Microsoft implemented that efficiently. Probably just have to swap a pointer to a memory reference at the end of each row.
If you are using SQL Server 2012, you can use the LAG window function that retrieves the value of the specified column from the previous row. It returns null if there is no previous row.
SELECT
a.*,
CAST((a.SampleDateTime - LAG(a.SampleDateTime) OVER(PARTITION BY a.location ORDER BY a.SampleDateTime ASC)) AS FLOAT)
* 24.0 * 60.0 AS DeltaMinutes
FROM samples a
ORDER BY
a.location,
a.SampleDateTime
You'd have to run some tests to see if it's faster. If you're not using SQL Server 2012 then at least this may give others an idea of how it can be done with 2012. I like @Joachim Isaksson's answer using a CTE with ROW_NUMBER()/PARTITION BY for 2008 and 2005.
SQL Fiddle
Have you considered creating a temp table to use instead of a CTE or subquery? You can create indexes on the temp table that are more suited for the join on RowNumber.
CREATE TABLE #tmp (
RowNumber INT,
Location INT,
SampleDateTime DATETIME,
CurrentValue INT)
;
INSERT INTO #tmp
SELECT
ROW_NUMBER() OVER (PARTITION BY Location
ORDER BY SampleDateTime DESC) rn,
Location,
SampleDateTime,
CurrentValue
FROM Samples
;
CREATE INDEX idx_location_row ON #tmp(Location,RowNumber) INCLUDE (SampleDateTime,CurrentValue);
SELECT
a.Location,
a.SampleDateTime,
a.CurrentValue,
CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT) * 24.0 * 60.0 AS DeltaMinutes
FROM #tmp a
LEFT JOIN #tmp b ON
a.Location = b.Location
AND b.RowNumber = a.RowNumber +1
ORDER BY
a.Location,
a.SampleDateTime
SQL Fiddle #2
As always, testing with your real data is king.
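One lightweight way to compare the variants (a sketch; Samples is the table from the question, and the SELECT is just a placeholder for whichever candidate query you are timing):
-- Capture time and I/O statistics around each candidate query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT(*) FROM Samples;   -- placeholder: run the candidate query here

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;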
Here's a CTE version that shows the samples for each location with time deltas from the previous sample. It uses OVER ranking, which usually does well in comparison to subqueries for solving the same problem.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Location
ORDER BY SampleDateTime DESC) rn
FROM Samples
)
SELECT a.*,CAST((a.SampleDateTime - b.SampleDateTime) AS FLOAT)
* 24.0 * 60.0 AS DeltaMinutes
FROM cte a
LEFT JOIN cte b ON a.Location = b.Location AND b.rn = a.rn +1
An SQLfiddle to test with.

How to write this SQL query

I have a SQL Server table with the following structure
cod_turn (PrimaryKey)
taken (bit)
time (datetime)
and several other fields which are irrelevant to the problem. I can't alter the table structure because the app was made by someone else.
Given a numeric variable parameter, which we will assume to be "3" for this example, and a time, I need to create a query which, looking from that time on, finds the first 3 consecutive records which are not marked as "taken". I can't figure out how to write the query in pure SQL, if that is possible.
PS: I accepted the answer because it was correct, but I made a bad description of the problem. I will open another question later. Feeling stupid after seeing the size of the answers =)
SELECT TOP 3 * FROM table WHERE taken = 0 AND time >= @Time ORDER BY time
Where @Time is whatever time you pass in.
Assuming a current version of SQL Server, and assuming you've named your "numeric variable parameter" @top int. Note: the parentheses around @top are required when using a parameterized TOP.
SELECT TOP (@top)
cod_turn,
taken ,
time
FROM yourtable
WHERE Taken = 0 AND time >= @Time
ORDER BY time DESC
You can also do
with cte as
(
SELECT
ROW_NUMBER() over (order by time desc) rn,
cod_turn,
taken ,
time
FROM yourtable
WHERE Taken = 0 AND time >= @Time
)
SELECT
cod_turn,
taken ,
time
FROM CTE
WHERE rn <= @top
ORDER BY time DESC
SELECT TOP 3
*
FROM
table
WHERE
time >= @inserted_time
AND taken = 0
ORDER BY
cod_turn ASC
select MT.*
from
(
select cod_turn, ROW_NUMBER() OVER (ORDER BY cod_turn) [RowNumber] -- or by time
from myTable
where taken = 0
and time >= @myTime
) T
inner join myTable MT on MT.cod_turn = T.cod_turn
where T.RowNumber <= @myNumber
select top 3 * from theTable where taken = 0 and time > theTime order by time

Simple Database Query - Is there a faster way without a cursor?

I have 2 tables that I'm trying to query. The first has a list of meters. The second, has the data for those meters. I want to get the newest reading for each meter.
Originally, this was in a group by statement, but it ended up processing all 7 million rows in our database, and took a little over a second. A subquery and a number of other ways of writing it had the same result.
I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table.
Ultimately, this is what I wrote, which performs in about 20 milliseconds. It seems like SQL should be smart enough to perform the "group by" query in the same time.
Declare @Meters Table
(
MeterId Integer,
LastValue float,
LastTimestamp DateTime
)
Declare MeterCursor Cursor For
Select Id
From MeterDataConfiguration
Declare @MeterId Int
Open MeterCursor
Fetch Next From MeterCursor Into @MeterId
While @@FETCH_STATUS = 0
Begin
Declare @LastValue int
Declare @LastTimestamp DateTime
Select @LastValue = mr.DataValue, @LastTimestamp = mr.EndTime
From MeterRecording mr
Where mr.MeterDataConfigurationId = @MeterId
And mr.EndTime = (Select MAX(EndTime) from MeterRecording mr2 Where mr2.MeterDataConfigurationId = @MeterId)
Insert Into @Meters
Select @MeterId, @LastValue, @LastTimestamp
Fetch Next From MeterCursor Into @MeterId
End
Deallocate MeterCursor
Select *
From @Meters
Here is an example of the same query that performs horribly:
select mdc.id, mr.EndTime
from MeterDataConfiguration mdc
inner join MeterRecording mr on
mr.MeterDataConfigurationId = mdc.Id
and mr.EndTime = (select MAX(EndTime) from MeterRecording mr2 where MeterDataConfigurationId = mdc.Id)
You can try a CTE (Common Table Expression) using ROW_NUMBER:
;WITH Readings AS
(
SELECT
mdc.id, mr.EndTime,
ROW_NUMBER() OVER(PARTITION BY mdc.id ORDER BY mr.EndTime DESC) AS 'RowID'
FROM dbo.MeterDataConfiguration mdc
INNER JOIN dbo.MeterRecording mr ON mr.MeterDataConfigurationId = mdc.Id
)
SELECT
ID, EndTime, RowID
FROM
Readings
WHERE
RowID = 1
This creates "partitions" of data, one for each mdc.id, and numbers them sequentially, descending on mr.EndTime, so for each partition, you get the most recent reading as the RowID = 1 row.
Of course, to get decent performance, you need appropriate indices (a sample DDL sketch follows this list) on:
mr.MeterDataConfigurationId since it's a foreign key into MeterDataConfiguration, right?
mr.EndTime since you do an ORDER BY on it
mdc.Id which I assume is a primary key, so it's indexed already
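A sketch of what that DDL might look like (the index name is made up, and a single composite index on (MeterDataConfigurationId, EndTime) is usually what this kind of most-recent-per-group plan wants):
-- Hypothetical index; adjust the name and columns to the real schema.
CREATE NONCLUSTERED INDEX IX_MeterRecording_Config_EndTime
    ON MeterRecording (MeterDataConfigurationId, EndTime DESC);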
Update: sorry, I missed this tidbit:
I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table.
Quite honestly: I would toss that. Don't you have some other unique ID on the MeterRecordings table that would be suitable as a clustering index? An INT IDENTITY ID or something??
If you have a compound index on (EndTime, MeterDataConfigurationId), this won't be able to be used for both purposes - ordering on EndTime, and joining on MeterDataConfigurationId - one of them will not be doable - pity!
How does this query perform? This one gets all the data in MeterRecording, ignoring the list in MeterDataConfiguration. If that is not safe to do, MeterDataConfiguration can be joined to this query to restrict the output.
SELECT Id, DataValue, EndTime
FROM (
select mr.MeterDataConfigurationId as Id,
mr.DataValue,
mr.EndTime,
RANK() OVER(PARTITION BY mr.MeterDataConfigurationId
ORDER BY mr.EndTime DESC) as r
from MeterRecording mr) as M
WHERE M.r = 1
I would go with marc's answer, but if you ever need to use cursors again (you should try to avoid them) and you need to process a lot of records, I would suggest that you create a temp table (or table variable) that has all the columns from the table you are reading plus an auto-generated identity field (IDENTITY(1,1)), and then just use a while loop to read from the table. Basically, increment an int variable (call it @id) inside the loop and do
select
@col1Value = column1,
@col2Value = column2, ...
from #temp_table
where id = @id
This behaves just like a cursor, but I find it to be much faster.
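A fuller sketch of that pattern (table and column names are illustrative, not from the question):
-- Temp table + WHILE loop as a cursor replacement.
CREATE TABLE #temp_table (
    id      INT IDENTITY(1,1) PRIMARY KEY,
    column1 INT,
    column2 VARCHAR(50)
);

INSERT INTO #temp_table (column1, column2)
SELECT column1, column2
FROM dbo.SourceTable;            -- stand-in for the real table being processed

DECLARE @id INT = 1,
        @maxId INT = (SELECT MAX(id) FROM #temp_table),
        @col1Value INT,
        @col2Value VARCHAR(50);

WHILE @id <= @maxId
BEGIN
    SELECT @col1Value = column1,
           @col2Value = column2
    FROM #temp_table
    WHERE id = @id;

    -- per-row processing goes here

    SET @id = @id + 1;
END

DROP TABLE #temp_table;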

Find the longest sequence of a value in a table

This is a SQL question, and I think it is a difficult one - I'm not sure it is possible to achieve in a simple SQL statement or a stored procedure:
I want to find the length of the longest run of the same (known) number in a column of a table:
example:
TABLE:
DATE SALEDITEMS
1/1/09 4
1/2/09 3
1/3/09 3
1/4/09 4
1/5/09 3
Calling the SP/statement for 4 will give 1; calling the SP/statement for 3 will give 2,
as the number 3 appeared 2 times in a row.
I'm running SQL server 2008.
UPDATE: I generated a million rows of random data, and abandoned the recursive CTE solution, as its query plan didn't make good use of indexes in the optimizer.
But the non-recursive solution I originally posted turned out to work great, as long as there was an additional non-clustered index on (SALEDITEMS, [DATE]). This makes sense, since the query needs to filter in both directions (both by date and by SALEDITEMS). With this additional index, queries on a million rows return in under 2 seconds on my (not very beefy) desktop machine. Without this index, the query was dog-slow.
BTW, this is a great example of how SQL Server's cost-based query optimization totally breaks down in some cases. The recursive CTE solution has a cost (on my PC) of 42 and takes at least several minutes to finish. The non-recursive solution has a cost of 15,446 (!!!) and completes in 1.5 seconds. Moral of the story: when comparing SQL Server query plans, don't assume that cost necessarily correlates to query performance!
Anyway, here's the solution I'd recommend (the same non-recursive CTE I posted earlier):
DECLARE @SALEDITEMS INT = 3;
WITH SalesNoMatch ([DATE], SALEDITEMS, NoMatchDate)
AS
(
SELECT [DATE], SALEDITEMS,
(SELECT MIN([DATE]) FROM Sales s2 WHERE s2.SALEDITEMS <> @SALEDITEMS
AND s2.[DATE] > s1.[DATE]) as NoMatchDate
FROM Sales s1
)
, SalesMatchCount ([DATE], ConsecutiveCount) AS
(
SELECT [DATE], 1+(SELECT COUNT(1) FROM Sales s2 WHERE s2.[DATE] > s1.[DATE] AND s2.[DATE] < NoMatchDate)
FROM SalesNoMatch s1
WHERE s1.SALEDITEMS = @SALEDITEMS
)
SELECT MAX(ConsecutiveCount)
FROM SalesMatchCount;
Here's the DDL I used to test this, including indexes you'll need:
CREATE TABLE [Sales](
[DATE] date NOT NULL,
[SALEDITEMS] int NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX IX_Sales ON Sales ([DATE]);
CREATE UNIQUE NONCLUSTERED INDEX IX_Sales2 ON Sales (SALEDITEMS, [DATE]);
And here's how I created my test data: 1,000,001 rows with ascending dates, with SALEDITEMS randomly set between 1 and 10.
INSERT INTO Sales ([DATE], SALEDITEMS)
VALUES ('1/1/09', 5)
DECLARE @i int = 0;
WHILE (@i < 1000000)
BEGIN
INSERT INTO Sales ([DATE], SALEDITEMS)
SELECT DATEADD (d, 1, (SELECT MAX ([DATE]) FROM Sales)), ABS(CHECKSUM(NEWID())) % 10 + 1
SET @i = @i + 1;
END
Here's the recursive-CTE solution that I abandoned:
DECLARE @SALEDITEMS INT = 3;
-- recursive CTE solution (remember to set MAXRECURSION!)
WITH SalesRowNum ([DATE], SALEDITEMS, RowNum)
AS
(
SELECT [DATE], SALEDITEMS, ROW_NUMBER() OVER (ORDER BY s1.[DATE]) as RowNum
FROM Sales s1
)
, SalesCTE (RowNum, [DATE], ConsecutiveCount)
AS
(
SELECT s1.RowNum, s1.[DATE], 1 AS ConsecutiveCount
FROM SalesRowNum s1
WHERE SALEDITEMS = @SALEDITEMS
UNION ALL
SELECT s1.RowNum, s1.[DATE], ConsecutiveCount + 1 AS ConsecutiveCount
FROM SalesRowNum s1
INNER JOIN SalesCTE s2 ON s1.RowNum = s2.RowNum + 1
WHERE SALEDITEMS = @SALEDITEMS
)
SELECT MAX(ConsecutiveCount)
FROM SalesCTE
OPTION (MAXRECURSION 0);
Untested, because you did not provide DDL and sample data:
DECLARE @SALEDITEMS INT;
SET @SALEDITEMS=3;
SELECT MAX(cnt) FROM(
SELECT COUNT(*) AS cnt FROM YourTable CROSS JOIN (
SELECT y1.[Date] AS d1, y2.[Date] AS d2
FROM YourTable AS y1 JOIN YourTable AS y2
ON y1.SALEDITEMS=@SALEDITEMS AND y2.SALEDITEMS=@SALEDITEMS
AND NOT EXISTS(SELECT 1 FROM YourTable AS y
WHERE y.SALEDITEMS<>@SALEDITEMS
AND y1.[Date] < y.[Date] AND y.[Date] < y2.[Date])
) AS t
WHERE [Date] BETWEEN t.d1 AND t.d2
GROUP BY t.d1, t.d2
) AS t;