Selecting 1 row per day closest to 4am? [duplicate]

This question already has answers here:
Get top 1 row of each group
(19 answers)
Closed 6 years ago.
We're currently working on a query for a report that returns a series of data. The customer has specified that they want to receive 5 rows total, with the data from the previous 5 days (as defined by a start date and an end date variable). For each day, they want the data from the row that's closest to 4am.
I managed to get it to work for a single day, but I certainly don't want to union 5 separate select statements simply to fetch these values. Is there any way to accomplish this via CTEs?
select top 1
'W' as [RecordType]
, [WellIdentifier] as [ProductionPtID]
, t.Name as [Device Name]
, t.RecordDate --convert(varchar, t.RecordDate, 112) as [RecordDate]
, TubingPressure as [Tubing Pressure]
, CasingPressure as [Casing Pressure]
from #tTempData t
Where cast (t.recorddate as time) = '04:00:00.000'
or datediff (hh,'04:00:00.000',cast (t.recorddate as time)) < -1.2
order by Name, RecordDate desc

Assuming #tTempData contains only the previous 5 days' records:
SELECT *
FROM
(
SELECT *, rn = row_number() over
(
partition by convert(date, recorddate)
order by ABS ( datediff(minute, convert(time, recorddate) , '04:00' ) )
)
FROM #tTempData
) AS d
WHERE rn = 1

You can use row_number() like this to get, for the 5 most recent days, the row closest to 04:00 on each day:
SELECT TOP 5 * FROM (
select t.* ,
ROW_NUMBER() OVER(PARTITION BY cast(t.recorddate as date) -- one partition per calendar day
ORDER BY abs(datediff (minute,'04:00:00.000',cast (t.recorddate as time)))) rnk
from #tTempData t) x
WHERE rnk = 1
ORDER BY recorddate DESC

You can use row_number() for this purpose:
select t.*
from (select t.*,
row_number() over (partition by cast(t.recorddate as date)
order by abs(datediff(ms, '04:00:00.000',
cast(t.recorddate as time)
))
) seqnum
from #tTempData t
) t
where seqnum = 1;
You can add an appropriate where clause in the subquery to get the dates that you are interested in.
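For anyone who wants to try the idea without a SQL Server instance, here is a minimal sketch of the same row_number() pattern run through Python's built-in sqlite3 module. The readings table, its pressure column and the sample rows are made up for illustration, and SQLite's date() and strftime() stand in for the T-SQL cast and datediff calls, so treat it as an approximation of the query above rather than a drop-in replacement. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only for illustration.
conn.execute("CREATE TABLE readings (recorddate TEXT, pressure REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2024-01-01 03:10:00", 10.0),
        ("2024-01-01 04:05:00", 11.0),
        ("2024-01-01 09:00:00", 12.0),
        ("2024-01-02 03:58:00", 20.0),
        ("2024-01-02 04:30:00", 21.0),
    ],
)

closest = conn.execute(
    """
    SELECT recorddate, pressure
    FROM (
        SELECT r.*,
               ROW_NUMBER() OVER (
                   PARTITION BY date(recorddate)          -- one group per day
                   ORDER BY ABS(                          -- seconds away from 04:00
                       strftime('%s', recorddate)
                       - strftime('%s', date(recorddate) || ' 04:00:00')
                   )
               ) AS seqnum
        FROM readings AS r
    ) AS ranked
    WHERE seqnum = 1
    ORDER BY recorddate
    """
).fetchall()

for row in closest:
    print(row)
```

With this sample data the 04:05 reading wins on the first day and the 03:58 reading on the second, since the ordering uses the absolute distance from 04:00 in either direction.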

Try something like this:
select
'W' as [RecordType]
, [WellIdentifier] as [ProductionPtID]
, t.Name as [Device Name]
, t.RecordDate --convert(varchar, t.RecordDate, 112) as [RecordDate]
, TubingPressure as [Tubing Pressure]
, CasingPressure as [Casing Pressure]
from #tTempData t
Where not exists -- keep a row only if no other row on the same day is closer to 04:00
(select 1 from #tTempData t1 where
ABS(datediff (hh,'04:00:00.000',cast (t1.recorddate as time))) <
ABS(datediff (hh,'04:00:00.000',cast (t.recorddate as time)))
and cast(t.RecordDate as date) = cast(t1.RecordDate as date)
)
and t.RecordDate between YOURDATERANGE -- placeholder for your start and end dates
order by Name, RecordDate desc;

Related

SQL - Return count of consecutive days where value was unchanged

I have a table like
date          ticker  Action
'2022-03-01'  AAPL    BUY
'2022-03-02'  AAPL    SELL
'2022-03-03'  AAPL    BUY
'2022-03-01'  CMG     SELL
'2022-03-02'  CMG     HOLD
'2022-03-03'  CMG     HOLD
'2022-03-01'  GPS     SELL
'2022-03-02'  GPS     SELL
'2022-03-03'  GPS     SELL
I want to group by ticker and count how many consecutive days, going back from the last date (here 2022-03-03), the Action has held the value it has on that date. For this example table the result would be:
ticker  NumSequentialDaysAction
AAPL    0
CMG     1
GPS     2
Fine to pass in 2022-03-03 as a value, don't need to figure that out on the fly.
Tried something like this
---Table Creation---
CREATE TABLE UserTable
([Date] DATETIME2, [Ticker] varchar(5), [Action] varchar(5))
;
INSERT INTO UserTable
([Date], [Ticker], [Action])
VALUES
('2022-03-01' , 'AAPL' , 'BUY'),
('2022-03-02' , 'AAPL' , 'SELL'),
('2022-03-03' , 'AAPL' , 'BUY'),
('2022-03-01' , 'CMG' , 'SELL'),
('2022-03-02' , 'CMG' , 'HOLD'),
('2022-03-03' , 'CMG' , 'HOLD'),
('2022-03-01' , 'GPS' , 'SELL'),
('2022-03-02' , 'GPS' , 'SELL'),
('2022-03-03' , 'GPS' , 'SELL')
;
---Attempted Solution---
I'm thinking that I need a subquery to get the last value and a self-join to get the matching values, then a window function ordered by date to check that the preceding value is sequential.
WITH CTE AS (SELECT Date, Ticker, Action,
ROW_NUMBER() OVER (PARTITION BY Ticker, Action ORDER BY Date) as row_num
FROM UserTable)
SELECT Ticker, COUNT(DISTINCT Date) as count_of_days
FROM CTE
WHERE row_num = 1
GROUP BY Ticker;
WITH CTE AS (SELECT Date, Ticker, Action,
DENSE_RANK() OVER (PARTITION BY Ticker ORDER BY Action,Date) as rank
FROM table)
SELECT Ticker, COUNT(DISTINCT Date) as count_of_days
FROM CTE
WHERE rank = 1
GROUP BY Ticker;

You can do this with the help of the LEAD function like so. You didn't specify which RDBMS you're using. This solution works in PostgreSQL:
WITH "withSequential" AS (
SELECT
ticker,
(LEAD("Action") OVER (PARTITION BY ticker ORDER BY date ASC) = "Action") AS "nextDayIsSameAction"
FROM UserTable
)
SELECT
ticker,
SUM(
CASE
WHEN "nextDayIsSameAction" IS TRUE THEN 1
ELSE 0
END
) AS "NumSequentialDaysAction"
FROM "withSequential"
GROUP BY ticker

Here is a way to do this using a gaps-and-islands solution.
Thanks for sharing the create and insert scripts, which help to build the solution quickly.
dbfiddle link:
https://dbfiddle.uk/rZLDTrNR
with data
as (
select date
,ticker
,action
,case when lag(action) over(partition by ticker order by date) <> action then
1
else 0
end as marker
from usertable
)
,interim_data
as (
select *
,sum(marker) over(partition by ticker order by date) as grp_val
from data
)
,interim_data2
as (
select *
,count(*) over(partition by ticker,grp_val) as NumSequentialDaysAction
from interim_data
)
select ticker,NumSequentialDaysAction
from interim_data2
where date='2022-03-03'
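To see the marker and running-sum steps in action, here is a small sketch that runs the same three-CTE shape against the question's sample data using Python's sqlite3 module. The trades table and the day/prior_days column names are my own choices for the demo, and the "- 1" is an assumption added so that the numbers line up with the expected output in the question (days before the last date, not the island size). It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows copied from the question's sample data.
conn.execute("CREATE TABLE trades (day TEXT, ticker TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("2022-03-01", "AAPL", "BUY"),
        ("2022-03-02", "AAPL", "SELL"),
        ("2022-03-03", "AAPL", "BUY"),
        ("2022-03-01", "CMG", "SELL"),
        ("2022-03-02", "CMG", "HOLD"),
        ("2022-03-03", "CMG", "HOLD"),
        ("2022-03-01", "GPS", "SELL"),
        ("2022-03-02", "GPS", "SELL"),
        ("2022-03-03", "GPS", "SELL"),
    ],
)

result = conn.execute(
    """
    WITH marked AS (
        -- marker = 1 whenever the action differs from the previous day's action
        SELECT day, ticker, action,
               CASE WHEN LAG(action) OVER (PARTITION BY ticker ORDER BY day) <> action
                    THEN 1 ELSE 0 END AS marker
        FROM trades
    ),
    grouped AS (
        -- the running sum of markers gives each unbroken run its own group id
        SELECT *, SUM(marker) OVER (PARTITION BY ticker ORDER BY day) AS grp
        FROM marked
    ),
    counted AS (
        -- size of the run minus the final day itself (assumption, see lead-in)
        SELECT *, COUNT(*) OVER (PARTITION BY ticker, grp) - 1 AS prior_days
        FROM grouped
    )
    SELECT ticker, prior_days
    FROM counted
    WHERE day = '2022-03-03'
    ORDER BY ticker
    """
).fetchall()

print(result)
```

Because the count covers the whole island, this only matches the question's definition when the table holds no rows after the date being filtered on.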

Another option is the difference-between-two-row_numbers approach, as follows:
select [Ticker], count(*)-1 NumSequentialDaysAction -- you could use (distinct) to remove duplicate rows
from
(
select *,
row_number() over (partition by [Ticker] order by [Date]) -
row_number() over (partition by [Ticker], [Action] order by [Date]) grp
from UserTable
where [date] <= '2022-03-03'
) RN_Groups
/* get only rows where [Action] = last date [Action] */
where [Action] = (select top 1 [Action] from UserTable T
where T.[Ticker] = RN_Groups.[Ticker] and [date] <= '2022-03-03'
order by [Date] desc)
group by [Ticker], [Action], grp

How do I include a days calculation?

We got this to work well, but I want to show a column with the number of days since the last actual_date.
I don't know how to code 'day' as an output column.
WITH
cte_ul_ev AS (
SELECT
ev.full_name,
ev.event_name,
ev.actual_date,
ev.service_provider_name,
datediff(day, actual_date, getdate())
row_num = ROW_NUMBER() OVER (PARTITION BY ev.full_name ORDER BY ev.actual_date DESC) --<<--<<--
FROM
dbo.event_expanded_view ev
WHERE
ev.full_name IS NOT NULL
AND ev.category_code IN ('OTHER_ACT', 'CONTACTS', 'PEOPLEPLANS', 'PEOPLETESTS', 'PERSONREQ')
)
SELECT
ue.full_name,
ue.event_name,
ue.actual_date,
ue.service_provider_name
FROM
cte_ul_ev ue
WHERE
ue.row_num = 1;

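You are just missing a comma after the datediff expression, and that expression needs an alias to become an output column. The version below also writes the row_num alias with AS, and it seems like the DISTINCT is an extra thing you are doing: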
;WITH
cte_ul_ev AS (
SELECT
ev.full_name,
ev.event_name,
ev.actual_date,
ev.service_provider_name,
datediff(day, actual_date, getdate()) as DaysDiff,
ROW_NUMBER() OVER (PARTITION BY ev.full_name ORDER BY ev.actual_date DESC) as row_num --<<--<<--
FROM
dbo.event_expanded_view ev
WHERE
ev.full_name IS NOT NULL
AND ev.category_code IN ('OTHER_ACT', 'CONTACTS', 'PEOPLEPLANS', 'PEOPLETESTS', 'PERSONREQ')
)
SELECT
ue.full_name,
ue.event_name,
ue.actual_date,
ue.service_provider_name,
ue.DaysDiff
FROM
cte_ul_ev ue
WHERE
ue.row_num = 1;

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10

This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
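Looking at the subquery output really does make it click, so here is a rough sketch that prints it for the question's ten sample prices. It uses Python's sqlite3 module (SQLite 3.25 or later), and the column is called day instead of date purely as a naming choice for this demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day INTEGER, price INTEGER)")
prices = [100, 100, 115, 120, 120, 100, 100, 120, 120, 120]
conn.executemany("INSERT INTO t VALUES (?, ?)", list(enumerate(prices, start=1)))

# The inner query from the answer: one row number over everything,
# one row number restarted for each price.
numbered_sql = """
    SELECT day, price,
           ROW_NUMBER() OVER (ORDER BY day) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY price ORDER BY day) AS seqnum2
    FROM t
"""

# Intermediate view: the difference stays constant inside each unbroken run.
steps = conn.execute(
    f"SELECT day, price, seqnum, seqnum2, seqnum - seqnum2 AS diff "
    f"FROM ({numbered_sql}) AS s ORDER BY day"
).fetchall()

# Final view: group by price and that difference to get one row per run.
ranges = conn.execute(
    f"SELECT price, MIN(day), MAX(day) FROM ({numbered_sql}) AS s "
    f"GROUP BY price, seqnum - seqnum2 ORDER BY MIN(day)"
).fetchall()

for row in steps:
    print(row)
print(ranges)
```

Inside a run of equal prices both counters advance by one per row, so their difference does not change. When another price interrupts the run, only the overall counter keeps advancing, so the next run of that price gets a larger difference. Days 1-2 at price 100 have a difference of 0, while days 6-7 at the same price have a difference of 3, which is why they end up in separate groups.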

SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]

Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

How to calculate total hours from multiple in time and out time from below?

The first punch is the in time and the second punch is the out time.
If possible, ignore duplicate punches at the same time (within a minute).
I need to get each in time and out time in a row, with the total hours, in any format.
I tried the query below but can't get my expected output:
WITH Level1
AS (
SELECT A.emp_reader_id,
DT
,A.EventCatId
,A.Belongs_to
,ROW_NUMBER() OVER ( PARTITION BY A.Belongs_to,A.emp_reader_id ORDER BY DT ) AS RowNum
FROM dbo.trnevents A
)
,
LEVEL2
AS (-- find the last and next event type for each row
SELECT A.emp_reader_id,A.DT , A.EventCatId ,COALESCE(LastVal.EventCatId, 10) AS LastEvent,
COALESCE(NextVal.EventCatId, 10) AS NextEvent ,A.Belongs_to
FROM Level1 A
LEFT JOIN Level1 LastVal
ON A.emp_reader_id = LastVal.emp_reader_id and A.Belongs_to=LastVal.Belongs_to
AND A.RowNum - 1 = LastVal.RowNum
LEFT JOIN Level1 NextVal
ON A.emp_reader_id = NextVal.emp_reader_id and A.Belongs_to=NextVal.Belongs_to
AND A.RowNum + 1 = NextVal.RowNum
)
select * from level2 where emp_reader_id=92 order by dt desc
Expected output:

Try the script below. I treated all DT values within the same minute as a single entry for the calculation.
WITH CTE AS
(
SELECT MAX(emp_reader_id) emp_reader_id,
CAST(DT AS DATE) Date_for_Group,
LEFT(CAST(DT AS VARCHAR),16) Time_For_Group,
ROW_NUMBER() OVER(PARTITION BY CAST(DT AS DATE) ORDER BY LEFT(CAST(DT AS VARCHAR),16)) RN,
CASE
WHEN ROW_NUMBER() OVER(PARTITION BY CAST(DT AS DATE) ORDER BY LEFT(CAST(DT AS VARCHAR),16))%2 = 0 THEN 'OUT'
ELSE 'IN'
END In_Out
FROM your_table
GROUP BY CAST(DT AS DATE),LEFT(CAST(DT AS VARCHAR),16)
)
SELECT A.emp_reader_id,A.Date_for_Group,
SUM(DATEDIFF(Minute,CAST(A.Time_For_Group AS DATETIME),CAST(B.Time_For_Group AS DATETIME)))/60 Hr,
SUM(DATEDIFF(Minute,CAST(A.Time_For_Group AS DATETIME),CAST(B.Time_For_Group AS DATETIME)))%60 Min
FROM CTE A
INNER JOIN CTE B
ON A.emp_reader_id = B.emp_reader_id
AND A.RN = B.RN -1
AND A.Date_for_Group = B.Date_for_Group
WHERE A.In_Out = 'IN'
GROUP BY A.emp_reader_id,A.Date_for_Group
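As a way to sanity-check the odd/even pairing idea, here is a small sketch using Python's sqlite3 module. The punches table, the emp_id column and the sample timestamps are invented for the demo, and strftime() replaces the LEFT(CAST(...)) trick for truncating to the minute, so this is an illustration of the approach rather than the script above ported line by line. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical punch data: the 09:00:40 punch duplicates the 09:00:05 one.
conn.execute("CREATE TABLE punches (emp_id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO punches VALUES (?, ?)",
    [
        (92, "2024-01-01 09:00:05"),
        (92, "2024-01-01 09:00:40"),
        (92, "2024-01-01 12:30:00"),
        (92, "2024-01-01 13:15:00"),
        (92, "2024-01-01 17:45:10"),
    ],
)

rows = conn.execute(
    """
    WITH minute_punches AS (
        -- collapse punches that fall within the same minute
        SELECT DISTINCT emp_id, strftime('%Y-%m-%d %H:%M', dt) AS punch_minute
        FROM punches
    ),
    numbered AS (
        SELECT emp_id, punch_minute, date(punch_minute) AS punch_date,
               ROW_NUMBER() OVER (
                   PARTITION BY emp_id, date(punch_minute)
                   ORDER BY punch_minute
               ) AS rn
        FROM minute_punches
    )
    -- odd rows are IN punches; each is paired with the next (even) row as OUT
    SELECT i.emp_id, i.punch_date,
           SUM((strftime('%s', o.punch_minute) - strftime('%s', i.punch_minute)) / 60)
               AS total_minutes
    FROM numbered AS i
    JOIN numbered AS o
      ON o.emp_id = i.emp_id
     AND o.punch_date = i.punch_date
     AND o.rn = i.rn + 1
    WHERE i.rn % 2 = 1
    GROUP BY i.emp_id, i.punch_date
    """
).fetchall()

emp_id, punch_date, total_minutes = rows[0]
hours, minutes = divmod(total_minutes, 60)
print(emp_id, punch_date, f"{hours}h {minutes}m")
```

One difference worth noting: the sketch partitions the row numbers by employee as well as by date, which matters as soon as the table holds more than one employee.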

First assign a row number to the datetime column, then build the same result set again with row number + 1 and inner join the two on the row numbers. After that, select the min and max of the time-in and time-out columns and group by date to get the total work hours for that day. Hope it helps.
select empid
,date
,min(timein) as timein,max (timeout) timeout,convert(nvarchar(20),datediff(hh,min (timein),max(timeout))%24)
+':'+
convert(nvarchar(20),datediff(mi,min (timein),max(timeout))%60) as totalhrs
from(
Select a.empid,cast(a.dt as date) date,b.dt as timein,a.dt as timeout from(
SELECT DT
,[empid]
, id
,row_number() over(order by dt) as inn
FROM [test1].[dbo].[Table_2]
)a
inner join(
SELECT distinct DT
,[empid]
, id
,rank() over(order by dt)+1 as out
FROM [test1].[dbo].[Table_2])b
on FORMAT(a.dt,'hh:mm') <> FORMAT(b.dt,'hh:mm')
and cast(a.dt as date)=cast(b.dt as date)
and a.inn=b.out)b
group by b.empid,b.date

Stored proc gives different result set on different server

I have put together a stored procedure on my dev machine, which runs SQL Server 10.50.6220 (Express). It works correctly and returns the expected (and consistent) results.
I then did a full backup and restored it to a test machine running SQL Server 10.50.6000.34. The stored proc on the new server now returns incorrect results; what's more, the results are different each time it is run.
ALTER PROCEDURE [dbo].[Get_Station_Utilisation]
@From NVARCHAR(50),
@To NVARCHAR(50)
AS
IF @From='' SET @From = NULL
IF @To='' SET @To = NULL
SELECT T.StationID As [Station ID], dbo.Stations.StationName As [Station Name], T.StaffWorking As [Workers], T.Mins
FROM
(SELECT StatsID As StationID, [Count] As StaffWorking, SUM(Duration) AS Mins
FROM
(SELECT dbo.Active_Clockings.StationID AS StatsID, COUNT(*) AS [Count], DATEDIFF(Minute, Times2.Time, Times1.Time) AS Duration
FROM
(SELECT TOP 100 PERCENT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
ORDER BY [Time] DESC) AS Times1 JOIN
(SELECT TOP 100 PERCENT ROW_NUMBER() OVER (ORDER BY(SELECT 1)) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
ORDER BY [Time] DESC) AS Times2 ON Times1.rownum = Times2.rownum + 1 JOIN
dbo.Active_Clockings ON Times1.Time > dbo.Active_Clockings.StartTime AND Times2.Time < dbo.Active_Clockings.FinishTime
AND (@From IS NULL OR (dbo.Active_Clockings.FinishTime > CAST(@From as date)))
AND (@To IS NULL OR dbo.Active_Clockings.FinishTime < DATEADD(Day, 1, CAST(@To as date)))
GROUP BY Times1.rownum, Times2.Time, Times1.Time, dbo.Active_Clockings.StationID) AS Totals
GROUP BY [Count], StatsID
) AS T INNER JOIN
dbo.Stations ON T.StationID= dbo.Stations.ID
ORDER BY T.StationID, T.StaffWorking ASC
Each row of underlying data contains a staff member, the station they were working at and their start & finish times. The purpose of the stored proc is to return a list of stations, along with the number of mins that each quantity of workers were at that station, as below:
My question is, what could be causing the incorrect and inconsistent results on the test server? And what can I do to fix it?
I have read this, possibly related, question:
Stored proc gives different result set than tsql, only on some servers
and have tried creating local variables for the parameters but it does not seem to have any effect.

what could be causing the inconsistent results
Non-deterministic ordering
ROW_NUMBER() OVER (ORDER BY(SELECT 1))
By ORDER BY(SELECT 1) you are telling the optimiser here that you don't care in which order the rows will be numbered. I didn't analyse the whole query, but is it really the case?
Another bit that has a strong smell is SELECT TOP 100 PERCENT with some ORDER BY in the inner/subquery. It looks like you think that adding ORDER BY like this in the inner query guarantees something. It doesn't.
If you need your row numbers ordered by [Time] DESC, then put it in ROW_NUMBER:
ROW_NUMBER() OVER (ORDER BY [Time] DESC)
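To illustrate the point, here is a small sketch of the "number the distinct times, then join each row to its neighbour" step with the ordering placed inside ROW_NUMBER. It runs on Python's sqlite3 module (3.25 or later) instead of SQL Server, and the clockings table and its two sample rows are made up, so take it as a demonstration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for dbo.Active_Clockings.
conn.execute(
    "CREATE TABLE clockings (station_id INTEGER, start_time TEXT, finish_time TEXT)"
)
conn.executemany(
    "INSERT INTO clockings VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 08:00", "2024-01-01 10:00"),
        (1, "2024-01-01 09:00", "2024-01-01 12:00"),
    ],
)

intervals = conn.execute(
    """
    WITH all_times AS (
        SELECT start_time AS t FROM clockings
        UNION
        SELECT finish_time FROM clockings
    ),
    numbered AS (
        -- the ordering lives in the window function, so the numbering is
        -- fully determined by the data
        SELECT ROW_NUMBER() OVER (ORDER BY t ASC) AS rownum, t
        FROM all_times
    )
    SELECT earlier.t, later.t
    FROM numbered AS later
    JOIN numbered AS earlier
      ON later.rownum = earlier.rownum + 1
    ORDER BY earlier.rownum
    """
).fetchall()

for start, end in intervals:
    print(start, "->", end)
```

Since UNION already removes duplicate times, the ORDER BY inside ROW_NUMBER has no ties to break, and the adjacent-row join produces the same intervals on every run.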

Thanks to @Vladimir, I have managed to tweak the stored procedure so that it returns the correct results. As suggested, I moved the sorting behavior to the ROW_NUMBER function, rather than the ORDER BY clause (although it actually needed to be ASC, not DESC).
I will mark his answer as correct but thought I would post my final code here for completeness:
ALTER PROCEDURE [dbo].[Get_Station_Utilisation]
@From NVARCHAR(50),
@To NVARCHAR(50)
AS
IF @From='' SET @From = NULL
IF @To='' SET @To = NULL
SELECT T.StationID As [Station ID], dbo.Stations.StationName As [Station Name], T.StaffWorking As [Workers], T.Mins
FROM
(SELECT StatsID As StationID, [Count] As StaffWorking, SUM(Duration) AS Mins
FROM
(SELECT dbo.Active_Clockings.StationID AS StatsID, COUNT(*) AS [Count], DATEDIFF(Minute, Times2.Time, Times1.Time) AS Duration
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY [Time] ASC) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
) AS Times1 JOIN
(SELECT ROW_NUMBER() OVER (ORDER BY [Time] ASC) AS rownum, [Time]
FROM
(SELECT DISTINCT (dbo.Active_Clockings.StartTime) AS [Time]
FROM dbo.Active_Clockings
UNION
SELECT DISTINCT (dbo.Active_Clockings.FinishTime) AS [Time]
FROM dbo.Active_Clockings) AS AllTimes
) AS Times2 ON Times1.rownum = Times2.rownum + 1 JOIN
dbo.Active_Clockings ON Times1.Time > dbo.Active_Clockings.StartTime AND Times2.Time < dbo.Active_Clockings.FinishTime
AND (@From IS NULL OR (dbo.Active_Clockings.FinishTime > CAST(@From as date)))
AND (@To IS NULL OR dbo.Active_Clockings.FinishTime < DATEADD(Day, 1, CAST(@To as date)))
GROUP BY Times1.rownum, Times2.Time, Times1.Time, dbo.Active_Clockings.StationID) AS Totals
GROUP BY [Count], StatsID
) AS T INNER JOIN
dbo.Stations ON T.StationID= dbo.Stations.ID
ORDER BY T.StationID, T.StaffWorking ASC
