Stored proc gives different result set on different server - sql

I have put together a stored procedure on my dev machine, which runs SQL Server 10.50.6220 (Express). It works correctly and returns the expected (and consistent) results.
I then did a full backup and restored it to a test machine running SQL Server 10.50.6000.34. The stored proc on the new server now returns incorrect results; what's more, the results differ each time it is run.
ALTER PROCEDURE [dbo].[Get_Station_Utilisation]
@From NVARCHAR(50),
@To NVARCHAR(50)
AS
IF @From='' SET @From = NULL
IF @To='' SET @To = NULL
SELECT T.StationID As [Station ID], dbo.Stations.StationName As [Station Name], T.StaffWorking As [Workers], T.Mins
FROM
(SELECT StatsID As StationID, [Count] As StaffWorking, SUM(Duration) AS Mins
FROM
(SELECT dbo.Active_Clockings.StationID AS StatsID, COUNT(*) AS [Count], DATEDIFF(Minute, Times2.Time, Times1.Time) AS Duration
FROM
(SELECT TOP 100 PERCENT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
ORDER BY [Time] DESC) AS Times1 JOIN
(SELECT TOP 100 PERCENT ROW_NUMBER() OVER (ORDER BY(SELECT 1)) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
ORDER BY [Time] DESC) AS Times2 ON Times1.rownum = Times2.rownum + 1 JOIN
dbo.Active_Clockings ON Times1.Time > dbo.Active_Clockings.StartTime AND Times2.Time < dbo.Active_Clockings.FinishTime
AND (@From IS NULL OR (dbo.Active_Clockings.FinishTime > CAST(@From as date)))
AND (@To IS NULL OR dbo.Active_Clockings.FinishTime < DATEADD(Day, 1, CAST(@To as date)))
GROUP BY Times1.rownum, Times2.Time, Times1.Time, dbo.Active_Clockings.StationID) AS Totals
GROUP BY [Count], StatsID
) AS T INNER JOIN
dbo.Stations ON T.StationID= dbo.Stations.ID
ORDER BY T.StationID, T.StaffWorking ASC
Each row of underlying data contains a staff member, the station they were working at and their start & finish times. The purpose of the stored proc is to return a list of stations, along with the number of mins that each quantity of workers were at that station, as below:
My question is, what could be causing the incorrect and inconsistent results on the test server? And what can I do to fix it?
I have read this, possibly related, question:
Stored proc gives different result set than tsql, only on some servers
and have tried creating local variables for the parameters but it does not seem to have any effect.

what could be causing the inconsistent results
Non-deterministic ordering
ROW_NUMBER() OVER (ORDER BY(SELECT 1))
By ORDER BY(SELECT 1) you are telling the optimiser here that you don't care in which order the rows will be numbered. I didn't analyse the whole query, but is it really the case?
Another bit that has a strong smell is SELECT TOP 100 PERCENT with some ORDER BY in the inner/subquery. It looks like you think that adding ORDER BY like this in the inner query guarantees something. It doesn't.
If you need your row numbers ordered by [Time] DESC, then put it in ROW_NUMBER:
ROW_NUMBER() OVER (ORDER BY [Time] DESC)
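To make the point concrete, here is a minimal sketch that numbers a set of boundary times inside the window function and pairs each one with its neighbour. It runs the SQL through Python's built-in sqlite3 module only because that is easy to try (SQLite 3.25+ is assumed for window functions); the `clockings` table and its sample times are invented for illustration, and the numbering is ascending here so that each row is joined to the previous boundary. The T-SQL version has the same shape.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for dbo.Active_Clockings, reduced to the two time columns.
con.execute("CREATE TABLE clockings (start_time TEXT, finish_time TEXT)")
con.executemany(
    "INSERT INTO clockings VALUES (?, ?)",
    [("09:00", "11:00"), ("10:00", "12:00"), ("09:00", "10:00")],
)

intervals = con.execute("""
    WITH all_times AS (
        -- UNION already removes duplicate boundary times
        SELECT start_time AS t FROM clockings
        UNION
        SELECT finish_time FROM clockings
    ),
    numbered AS (
        -- the ordering lives in the window function, so it is deterministic
        SELECT ROW_NUMBER() OVER (ORDER BY t ASC) AS rownum, t
        FROM all_times
    )
    SELECT prev.t, cur.t
    FROM numbered AS cur
    JOIN numbered AS prev ON cur.rownum = prev.rownum + 1
    ORDER BY prev.t
""").fetchall()

print(intervals)
```

With the ordering stated in OVER (...), consecutive row numbers always describe adjacent times, which is what the self-join on rownum relies on.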

Thanks to @Vladimir, I have managed to tweak the stored procedure so that it returns the correct results. As suggested, I moved the sorting behavior to the ROW_NUMBER function, rather than the ORDER BY clause (although it actually needed to be ASC, not DESC).
I will mark his answer as correct but thought I would post my final code here for completeness:
ALTER PROCEDURE [dbo].[Get_Station_Utilisation]
@From NVARCHAR(50),
@To NVARCHAR(50)
AS
IF @From='' SET @From = NULL
IF @To='' SET @To = NULL
SELECT T.StationID As [Station ID], dbo.Stations.StationName As [Station Name], T.StaffWorking As [Workers], T.Mins
FROM
(SELECT StatsID As StationID, [Count] As StaffWorking, SUM(Duration) AS Mins
FROM
(SELECT dbo.Active_Clockings.StationID AS StatsID, COUNT(*) AS [Count], DATEDIFF(Minute, Times2.Time, Times1.Time) AS Duration
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [Time] ASC) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
) AS Times1 JOIN
(SELECT ROW_NUMBER() OVER (ORDER BY [Time] ASC) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
) AS Times2 ON Times1.rownum = Times2.rownum + 1 JOIN
dbo.Active_Clockings ON Times1.Time > dbo.Active_Clockings.StartTime AND Times2.Time < dbo.Active_Clockings.FinishTime
AND (@From IS NULL OR (dbo.Active_Clockings.FinishTime > CAST(@From as date)))
AND (@To IS NULL OR dbo.Active_Clockings.FinishTime < DATEADD(Day, 1, CAST(@To as date)))
GROUP BY Times1.rownum, Times2.Time, Times1.Time, dbo.Active_Clockings.StationID) AS Totals
GROUP BY [Count], StatsID
) AS T INNER JOIN
dbo.Stations ON T.StationID= dbo.Stations.ID
ORDER BY T.StationID, T.StaffWorking ASC

Related

Taking most recent values in sum over date range

I have a table which has the following columns: DeskID *, ProductID *, Date *, Amount (where the columns marked with * make the primary key). The products in use vary over time, as represented in the image below.
Table format on the left, and a (hopefully) intuitive representation of the data on the right for one desk
The objective is to have the sum of the latest amounts of products by desk and date, including products which are no longer in use, over a date range.
e.g. using the data above the desired table is:
So on the 1st Jan, the sum is 1 of Product A
On the 2nd Jan, the sum is 2 of A and 5 of B, so 7
On the 4th Jan, the sum is 1 of A (out of use, so take the value from the 3rd), 5 of B, and 2 of C, so 8 in total
etc.
I have tried using a partition on the desk and product, ordered by date, to get the most recent value, and turned the following code into a function (Function1 below) with an @date Date parameter:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum' from (
select @date 'Date', t.DeskID, t.ProductID, t.Amount
, row_number() over (partition by t.DeskID, t.ProductID order by t.Date desc) as roworder
from Table1 t
where 1 = 1
and t.Date <= @date
) t
where t.roworder = 1
group by t.DeskID
And then using a utility calendar table and cross apply to get the required values over a time range, as below
select * from Calendar c
cross apply Function1(c.CalendarDate)
where c.CalendarDate >= '20190101' and c.CalendarDate <= '20191009'
This has the expected results, but is far too slow. Currently each desk uses around 50 products, and the products roll every month, so after just 5 years each desk has a history of ~3000 products, which causes the whole thing to grind to a halt. (Roughly 30 seconds for a range of a single month)
Is there a better approach?
Changing your function to the following should be faster:
select @date 'Date', t.DeskID, SUM(t.Amount) 'Sum'
FROM (SELECT m.DeskID, m.ProductID, MAX(m.[Date]) AS MaxDate
FROM Table1 m
where m.[Date] <= @date
GROUP BY m.DeskID, m.ProductID) d
INNER JOIN Table1 t
ON d.DeskID=t.DeskID
AND d.ProductID=t.ProductID
and t.[Date] = d.MaxDate
group by t.DeskID
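As a sanity check on the "latest row per desk and product, then sum" idea, here is a small sketch that applies it to every day of a range using the sample rows from this thread. It is run through Python's sqlite3 module for convenience, so the recursive `cal` CTE (a stand-in for the Calendar table), the `latest` CTE name and SQLite's `date()` function are assumptions of this sketch rather than T-SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Table1 (DeskID INT, ProductID TEXT, [Date] TEXT, Amount INT)"
)
con.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2019-01-01", 1), (1, "A", "2019-01-02", 2),
        (1, "B", "2019-01-02", 5), (1, "A", "2019-01-03", 1),
        (1, "B", "2019-01-03", 4), (1, "C", "2019-01-03", 3),
        (1, "B", "2019-01-04", 5), (1, "C", "2019-01-04", 2),
        (1, "C", "2019-01-05", 2),
    ],
)

daily_sums = con.execute("""
    WITH RECURSIVE cal(d) AS (
        -- hypothetical calendar: one row per day in the requested range
        SELECT '2019-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM cal WHERE d < '2019-01-05'
    ),
    latest AS (
        -- for each day, the most recent date on or before it per desk/product
        SELECT c.d, t.DeskID, t.ProductID, MAX(t.[Date]) AS MaxDate
        FROM cal AS c
        JOIN Table1 AS t ON t.[Date] <= c.d
        GROUP BY c.d, t.DeskID, t.ProductID
    )
    SELECT l.d, l.DeskID, SUM(t.Amount)
    FROM latest AS l
    JOIN Table1 AS t
      ON t.DeskID = l.DeskID
     AND t.ProductID = l.ProductID
     AND t.[Date] = l.MaxDate
    GROUP BY l.d, l.DeskID
    ORDER BY l.d
""").fetchall()

for row in daily_sums:
    print(row)
```

The sums for 1, 2 and 4 January come out as 1, 7 and 8, matching the figures given in the question. This only checks correctness on a tiny data set; it says nothing about how the plan behaves on years of history.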
Performance usually suffers with a TVF. The following removes the TVF completely:
-- DROP TABLE Table1;
CREATE TABLE Table1 (DeskID int not null, ProductID nvarchar(32) not null, [Date] Date not null, Amount int not null, PRIMARY KEY ([Date],DeskID,ProductID));
INSERT Table1(DeskID,ProductID,[Date],Amount)
VALUES (1,'A','2019-01-01',1),(1,'A','2019-01-02',2),(1,'B','2019-01-02',5),(1,'A','2019-01-03',1)
,(1,'B','2019-01-03',4),(1,'C','2019-01-03',3),(1,'B','2019-01-04',5),(1,'C','2019-01-04',2),(1,'C','2019-01-05',2)
GO
DECLARE @StartDate date=N'2019-01-01';
DECLARE @EndDate date=N'2019-01-05';
;WITH cte_p
AS
(
SELECT DISTINCT DeskID,ProductID
FROM Table1
WHERE [Date] <= @EndDate
),
cte_a
AS
(
SELECT @StartDate AS [Date], p.DeskID, p.ProductID, ISNULL(a.Amount,0) AS Amount
FROM (
SELECT t.DeskID, t.ProductID
, MAX(t.Date) AS FirstDate
FROM Table1 t
WHERE t.Date <= @StartDate
GROUP BY t.DeskID, t.ProductID) f
INNER JOIN Table1 a
ON f.DeskID=a.DeskID
AND f.ProductID=a.ProductID
AND f.[FirstDate]=a.[Date]
RIGHT JOIN cte_p p
ON p.DeskID=a.DeskID
AND p.ProductID=a.ProductID
UNION ALL
SELECT DATEADD(DAY,1,a.[Date]) AS [Date], t.DeskID, t.ProductID, t.Amount
FROM Table1 t
INNER JOIN cte_a a
ON t.DeskID=a.DeskID
AND t.ProductID=a.ProductID
AND t.[Date] > a.[Date]
AND t.[Date] <= DATEADD(DAY,1,a.[Date])
WHERE a.[Date]<@EndDate
UNION ALL
SELECT DATEADD(DAY,1,a.[Date]) AS [Date], a.DeskID, a.ProductID, a.Amount
FROM cte_a a
WHERE NOT EXISTS(SELECT 1 FROM Table1 t
WHERE t.DeskID=a.DeskID
AND t.ProductID=a.ProductID
AND t.[Date] > a.[Date]
AND t.[Date] <= DATEADD(DAY,1,a.[Date]))
AND a.[Date]<@EndDate
)
SELECT [Date], DeskID, SUM(Amount) AS [Sum]
FROM cte_a
GROUP BY [Date], DeskID;

Duplicate rows when trying to grab last row of another table

I am trying to grab the last (latest) blog article's link for each month using a stored procedure but I cannot seem to find a way past my problem.
Currently, my code below repeats the (the latest blog article's) 'LINK' column like so:
SELECT AVG(DATEPART(mm, b.blog_date)) AS MonthNum --CANNOT USE MONTHNUM IN ORDER BY UNLESS WRAPPED WITH AVG() [average], weird but works
, CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24)) AS MONTH
, CAST(DATEPART(YEAR, b.blog_date) AS varchar(4)) AS YEAR
, CAST(count(b.blog_content) AS varchar(24)) as ARTICLES
, (SELECT TOP (1) b.blog_url
FROM Management.Blog
WHERE (website_owner_id = 2)
GROUP BY blog_date
, blog_url
ORDER BY blog_date DESC
) AS LINK
, CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24)) + CAST(DATEPART(YEAR, b.blog_date) AS varchar(4)) AS ID
, blog_date as DATE
FROM Management.Blog b
WHERE b.website_owner_id = 2
GROUP BY CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24))
, CAST(DATEPART(YEAR, b.blog_date) AS varchar(4))
, b.blog_url
, blog_date
, CAST(DateName(month, DateAdd(month, Datepart(MONTH, b.blog_date), -1)) AS varchar(24)) + CAST(DATEPART(YEAR, b.blog_date) AS varchar(4))
ORDER BY DATE DESC
I understand the code is horrible to read (& probably to execute on the SQL server too) but I'm in a position where I am only new to SQL server (coming from MySQL where I've only really had to use a basic select query) and I am open to any suggestions to changing the query and/or table design.
Essentially there should be no duplicates of the ID column (which is only really added in to assist in removing the duplicates and can be omitted if need be).
Without sample data I'm unable to test whether this would work.
After FROM Management.Blog b
add this:
INNER JOIN(
SELECT MonthNum = DATEPART(MONTH, BL.blog_date)
,BL.blog_date
,RN = ROW_NUMBER()OVER(ORDER BY BL.blog_date DESC)
,BL.blog_url
FROM Management.Blog BL
) X ON B.blog_date = X.blog_date
AND X.RN = 1
Replace
(SELECT TOP (1) b.blog_url
FROM Management.Blog
WHERE (website_owner_id = 2)
GROUP BY blog_date
, blog_url
ORDER BY blog_date DESC
) AS LINK
with
X.blog_url AS [LINK]
In the GROUP BY, change this
, b.blog_url
to
, x.blog_url
A non-functioning query usually does not do a very good job of conveying what someone wants. Based on your explanation:
I am trying to grab the last (latest) blog article's link for each month
I would expect something like this:
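SELECT b.*
FROM (SELECT b.*,
-- one partition per owner and calendar month; blog_url must not be part of it,
-- otherwise every distinct URL would get seqnum = 1
ROW_NUMBER() OVER (PARTITION BY b.website_owner_id, YEAR(b.blog_date), MONTH(b.blog_date)
ORDER BY b.blog_date DESC
) as seqnum
FROM Management.Blog b
) b
WHERE b.website_owner_id = 2 AND
seqnum = 1;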
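A quick way to check the "latest article per month" logic is to run it on a handful of rows. The sketch below does that with Python's sqlite3 module; the `blog` table, its sample rows and the `ym` alias are invented for illustration, and `strftime('%Y-%m', ...)` is SQLite's stand-in for grouping by YEAR() and MONTH().

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of Management.Blog.
con.execute("CREATE TABLE blog (blog_url TEXT, blog_date TEXT, website_owner_id INT)")
con.executemany(
    "INSERT INTO blog VALUES (?, ?, ?)",
    [
        ("/jan-first", "2019-01-03", 2),
        ("/jan-last", "2019-01-28", 2),
        ("/feb-only", "2019-02-10", 2),
        ("/other-owner", "2019-02-20", 1),
    ],
)

latest_per_month = con.execute("""
    SELECT ym, blog_url
    FROM (
        SELECT strftime('%Y-%m', blog_date) AS ym,
               blog_url,
               -- number the articles within each month, newest first
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y-%m', blog_date)
                   ORDER BY blog_date DESC
               ) AS seqnum
        FROM blog
        WHERE website_owner_id = 2
    )
    WHERE seqnum = 1
    ORDER BY ym
""").fetchall()

print(latest_per_month)
```

Each month appears exactly once, which is the "no duplicate ID" property the question asks for. The article count per month could be added with COUNT(*) OVER the same partition.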

Counting New Unique Values in Growing Time Window

I have a large table of users (as a guid), some associated values, and a time stamp of when each row was inserted. A user might be associated with many rows in this table.
guid | <other columns> | insertdate
I want to count for each month: how many unique new users were inserted. It's easy to do manually:
select count(distinct guid)
from table
where insertdate >= '20060201' and insertdate < '20060301'
and guid not in (select guid from table where
insertdate >= '20060101' and insertdate < '20060201')
How could this be done for each successive month in sql?
I thought to use a rank function to associate clearly each guid with a month:
select guid
,dense_rank() over ( order by datepart(YYYY, insertdate),
datepart(m, insertdate)) as MonthRank
from table
and then iterate upon each rank value:
declare @no_times int
declare @counter int = 1
set @no_times = (select count(distinct concat(datepart(year, insertdate),
datepart(month, insertdate))) from table)
while @no_times > 0 do
(
select count(*), @counter
where guid not in (select guid from table where rank = @counter)
and rank = @counter + 1
@counter += 1
@no_times -= 1
union all
)
end
I know this strategy is probably the wrong way to go about things.
Ideally, I would like a result set to look like this:
MonthRank | NoNewUsers
I would be extremely interested and grateful if a sql wizard could point me in the right direction.
SELECT
DATEPART(year,t.insertdate) AS YearNum
,DATEPART(mm,t.insertdate) as MonthNum
,COUNT(DISTINCT t.guid) AS NoNewUsers
,DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT t.guid) DESC) AS MonthRank
FROM
table t
LEFT JOIN table t2
ON t.guid = t2.guid
AND t.insertdate > t2.insertdate
WHERE
t2.guid IS NULL
GROUP BY
DATEPART(year,t.insertdate)
,DATEPART(mm,t.insertdate)
Use a left join to check whether the guid was ever inserted at an earlier date; if it was not, count it using aggregation as you normally would. If you want to add a rank to see which month(s) have the highest number of new users, you can use your DENSE_RANK() function, but because you are already grouping by what you want, you do not need a partition clause.
If you want the first time that a guid entered, then your query doesn't exactly work. You can get the first time with two aggregations:
select year(first_insertdate), month(first_insertdate), count(*)
from (select t.guid, min(insertdate) as first_insertdate
from t
group by t.guid
) t
group by year(first_insertdate), month(first_insertdate)
order by year(first_insertdate), month(first_insertdate);
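The two-step aggregation can be tried on a few rows to see what it counts. This is a small sketch run through Python's sqlite3 module; the `inserts` table and its sample rows are invented, and `strftime('%Y-%m', ...)` replaces the year()/month() pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: one row per insert, keyed by user guid.
con.execute("CREATE TABLE inserts (guid TEXT, insertdate TEXT)")
con.executemany(
    "INSERT INTO inserts VALUES (?, ?)",
    [
        ("a", "2006-01-05"), ("a", "2006-02-10"),
        ("b", "2006-02-01"), ("c", "2006-02-15"),
        ("b", "2006-03-01"), ("d", "2006-03-20"),
    ],
)

new_users = con.execute("""
    SELECT strftime('%Y-%m', first_insertdate) AS ym, COUNT(*)
    FROM (
        -- inner aggregation: first time each guid was ever seen
        SELECT guid, MIN(insertdate) AS first_insertdate
        FROM inserts
        GROUP BY guid
    )
    GROUP BY ym   -- outer aggregation: how many firsts fall in each month
    ORDER BY ym
""").fetchall()

print(new_users)
```

Users "a" and "b" are counted only in the month of their first row, even though they reappear later.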
If you are looking for counting guids each time they skip a month, then you can use lag():
select year(insertdate), month(insertdate), count(*)
from (select t.*,
lag(insertdate) over (partition by guid order by insertdate) as prev_insertdate
from t
) t
where prev_insertdate is null or
datediff(month, prev_insertdate, insertdate) >= 2
group by year(insertdate), month(insertdate)
order by year(insertdate), month(insertdate);
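The lag() variant can be illustrated the same way. The sketch below uses Python's sqlite3 module (SQLite 3.25+ assumed for LAG); since SQLite has no datediff(month, ...), it compares a month index computed as year * 12 + month, which is an assumption of this sketch. The `inserts` table and its rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inserts (guid TEXT, insertdate TEXT)")
con.executemany(
    "INSERT INTO inserts VALUES (?, ?)",
    [
        ("a", "2006-01-05"), ("a", "2006-02-10"), ("a", "2006-05-01"),
        ("b", "2006-02-01"),
    ],
)

returning = con.execute("""
    SELECT strftime('%Y-%m', insertdate) AS ym, COUNT(*)
    FROM (
        SELECT insertdate,
               strftime('%Y', insertdate) * 12 + strftime('%m', insertdate) AS month_no,
               -- month index of the same guid's previous row, NULL for the first row
               LAG(strftime('%Y', insertdate) * 12 + strftime('%m', insertdate))
                   OVER (PARTITION BY guid ORDER BY insertdate) AS prev_month_no
        FROM inserts
    )
    WHERE prev_month_no IS NULL            -- first ever appearance
       OR month_no - prev_month_no >= 2    -- came back after skipping a month
    GROUP BY ym
    ORDER BY ym
""").fetchall()

print(returning)
```

User "a" is counted in January (first appearance) and again in May (after a gap), but not in February.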
I solved it with the terrible while loop, then a friend helped me to solve it more efficiently in another way.
The loop version:
--ranked by month
select t.TransactionID
,t.guid
,concat(datepart(year, t.InsertDate), datepart(month,
t.InsertDate)) MonthRankName
,dense_rank() over ( order by datepart(YYYY, t.InsertDate),
datepart(m, t.InsertDate)) as MonthRank
into #ranked
from table t;
--iterate
declare @counter int = 1
declare @no_times int
select @no_times = count(distinct concat(datepart(year, t.InsertDate),
datepart(month, t.InsertDate))) from table t;
select count(distinct r.guid) as NewUnique, r.MonthRank into #results
from #ranked r
where r.MonthRank = 1 group by r.MonthRank;
while @no_times > 1
begin
insert into #results
select count(distinct rt.guid) as NewUnique, @counter + 1 as MonthRank
from #ranked rt
where rt.guid not in
(
select rt2.guid from #ranked rt2
where rt2.MonthRank = @counter
)
and rt.MonthRank = @counter + 1
set @counter = @counter+1
set @no_times = @no_times-1
end
select * from #results r
This turned out to run pretty slowly (as you might expect)
What turned out to be faster by a factor of 10 was this method:
select t.guid,
cast (concat(datepart(year, min(t.InsertDate)),
case when datepart(month, min(t.InsertDate)) < 10 then
'0'+cast( datepart(month, min(t.InsertDate)) as varchar(10))
else cast (datepart(month, min(t.InsertDate)) as varchar(10)) end
) as int) as MonthRankName
into #NewUnique
from table t
group by t.guid;
select count(1) as NewUniques, t.MonthRankName from #NewUnique t
group by t.MonthRankName
order by t.MonthRankName
This simply identifies the very first month in which each guid appears, then counts how many of these first appearances fall in each month, with a bit of a hack to get YearMonth formatted nicely (this seems to be more efficient than format([date], 'yyyyMM'), but I need to experiment more on that).

Selecting 1 row per day closest to 4am? [duplicate]

We're currently working on a query for a report that returns a series of data. The customer has specified that they want to receive 5 rows total, with the data from the previous 5 days (as defined by a start date and an end date variable). For each day, they want the data from the row that's closest to 4am.
I managed to get it to work for a single day, but I certainly don't want to union 5 separate select statements simply to fetch these values. Is there any way to accomplish this via CTEs?
select top 1
'W' as [RecordType]
, [WellIdentifier] as [ProductionPtID]
, t.Name as [Device Name]
, t.RecordDate --convert(varchar, t.RecordDate, 112) as [RecordDate]
, TubingPressure as [Tubing Pressure]
, CasingPressure as [Casing Pressure]
from #tTempData t
Where cast (t.recorddate as time) = '04:00:00.000'
or datediff (hh,'04:00:00.000',cast (t.recorddate as time)) < -1.2
order by Name, RecordDate desc
Assuming that #tTempData only contains the previous 5 days' records:
SELECT *
FROM
(
SELECT *, rn = row_number() over
(
partition by convert(date, recorddate)
order by ABS ( datediff(minute, convert(time, recorddate) , '04:00' ) )
)
FROM #tTempData
) d
WHERE rn = 1
You can use row_number() like this to get, for the five most recent days, the row closest to 04:00:
SELECT TOP 5 * FROM (
select t.* ,
ROW_NUMBER() OVER(PARTITION BY cast(t.recorddate as date)
ORDER BY abs(datediff (minute,'04:00:00.000',cast (t.recorddate as time)))) rnk
from #tTempData t) x
WHERE rnk = 1
ORDER BY recorddate DESC
You can use row_number() for this purpose:
select t.*
from (select t.*,
row_number() over (partition by cast(t.recorddate as date)
order by abs(datediff(ms, '04:00:00.000',
cast(t.recorddate as time)
))
) seqnum
from #tTempData t
) t
where seqnum = 1;
You can add an appropriate where clause in the subquery to get the dates that you are interested in.
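For anyone who wants to see the "one row per day, closest to 04:00" pattern run end to end, here is a small sketch using Python's sqlite3 module. The `readings` table, the `pressure` column and the sample timestamps are invented; the distance to 04:00 is computed in seconds via `strftime('%s', ...)` because SQLite has no datediff.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in for #tTempData with a single measurement column.
con.execute("CREATE TABLE readings (recorddate TEXT, pressure INT)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2014-03-01 03:10:00", 100),
        ("2014-03-01 04:05:00", 101),
        ("2014-03-01 09:00:00", 102),
        ("2014-03-02 03:58:00", 200),
        ("2014-03-02 04:30:00", 201),
    ],
)

closest = con.execute("""
    SELECT recorddate, pressure
    FROM (
        SELECT recorddate, pressure,
               ROW_NUMBER() OVER (
                   PARTITION BY date(recorddate)   -- one group per calendar day
                   ORDER BY ABS(strftime('%s', recorddate)
                              - strftime('%s', date(recorddate) || ' 04:00:00'))
               ) AS seqnum
        FROM readings
    )
    WHERE seqnum = 1
    ORDER BY recorddate
""").fetchall()

print(closest)
```

If two rows on the same day are equally far from 04:00, the choice between them is arbitrary unless a tie-breaker column is added to the ORDER BY.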
Try something like this:
select
'W' as [RecordType]
, [WellIdentifier] as [ProductionPtID]
, t.Name as [Device Name]
, t.RecordDate --convert(varchar, t.RecordDate, 112) as [RecordDate]
, TubingPressure as [Tubing Pressure]
, CasingPressure as [Casing Pressure]
from #tTempData t
Where not exists
(select 1 from #tTempData t1 where
-- another row on the same day that is closer to 04:00 than t
ABS(datediff (minute,'04:00:00.000',cast (t1.recorddate as time))) <
ABS(datediff (minute,'04:00:00.000',cast (t.recorddate as time)))
and cast(t.RecordDate as date) = cast(t1.RecordDate as date)
)
and t.RecordDate between YOURDATERANGE
order by Name, RecordDate desc;

SQL - Replace repeated rows with null values while preserving number of rows

I am trying to get only one instance of a year instead of 12 because I am using this column in a lookup table to provide parameters to a report. Because I am using both monthly and yearly data, I am trying to get them both in the same table.
I have a table like this
--Date--------Year
--------------------
1/2012-------2012
2/2012-------2012
3/2012-------2012
4/2012-------2012
5/2012-------2012
6/2012-------2012
7/2012-------2012
8/2012-------2012
9/2012-------2012
10/2012------2012
11/2012------2012
12/2012------2012
1/2013-------2013
2/2013-------2013
And this is my desired table
--Date--------Year
--------------------
1/2012-------2012
2/2012-------null
3/2012-------null
4/2012-------null
5/2012-------null
6/2012-------null
7/2012-------null
8/2012-------null
9/2012-------null
10/2012------null
11/2012------null
12/2012------null
1/2013-------2013
2/2013-------null
Can someone give me an idea of how to solve a problem like this?
The code I am using right now is
SELECT CAST(MONTH(rmp.EcoDate) AS Varchar(2)) + '/' + CAST(YEAR(rmp.EcoDate) AS varchar(4)) AS Date, Year(rmp.EcoDate) as EcoYear
FROM PhdRpt.ReportCaseList_542 AS rcl INNER JOIN
CaseCases AS cc ON rcl.CaseCaseId = cc.CaseCaseId INNER JOIN
PhdRpt.RptMonthlyProduction_542 AS rmp ON rcl.ReportRunCaseId = rmp.ReportRunCaseId
GROUP BY rmp.EcoDate
You can do this by enumerating the rows within a year. Then update all but the first:
with toupdate as (
select t.*, row_number() over (partition by [year] order by [date]) as seqnum
from t
)
update toupdate
set [year] = NULL
where seqnum > 1;
If you want this as a select statement:
with ts as (
select t.*, row_number() over (partition by [year] order by [date]) as seqnum
from t
)
select [date],
(case when seqnum = 1 then [year] end) as [year]
from ts;
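The select version can be checked on a few rows. The sketch below runs it through Python's sqlite3 module; the `months` table and its columns `ym` and `yr` are invented for illustration. It assumes the month column sorts chronologically (here a 'YYYY-MM' string); a text value such as '10/2012' would sort before '2/2012', so a real date column is the safer thing to order by.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical lookup table: one row per month with its year.
con.execute("CREATE TABLE months (ym TEXT, yr INT)")
con.executemany(
    "INSERT INTO months VALUES (?, ?)",
    [
        ("2012-01", 2012), ("2012-02", 2012), ("2012-03", 2012),
        ("2013-01", 2013), ("2013-02", 2013),
    ],
)

rows = con.execute("""
    SELECT ym,
           -- keep the year only on the first month of each year, NULL elsewhere
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY yr ORDER BY ym) = 1
                THEN yr END AS yr
    FROM months
    ORDER BY ym
""").fetchall()

print(rows)
```

The row count is unchanged; only the repeated year values are replaced by NULL (shown as None in Python).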