How do I include a days calculation? - sql

We got this to work well, but I want to add a column showing the number of days since the last actual_date.
I don't know how to make the day count an output column.
WITH
cte_ul_ev AS (
SELECT
ev.full_name,
ev.event_name,
ev.actual_date,
ev.service_provider_name,
datediff(day, actual_date, getdate())
row_num = ROW_NUMBER() OVER (PARTITION BY ev.full_name ORDER BY ev.actual_date DESC) --<<--<<--
FROM
dbo.event_expanded_view ev
WHERE
ev.full_name IS NOT NULL
AND ev.category_code IN ('OTHER_ACT', 'CONTACTS', 'PEOPLEPLANS', 'PEOPLETESTS', 'PERSONREQ')
)
SELECT
ue.full_name,
ue.event_name,
ue.actual_date,
ue.service_provider_name
FROM
cte_ul_ev ue
WHERE
ue.row_num = 1;

You are just missing a comma after the datediff(...) expression, and that column needs an alias (the version below uses AS for both computed columns). DISTINCT also seems like an extra thing you are doing.
;WITH
cte_ul_ev AS (
SELECT
ev.full_name,
ev.event_name,
ev.actual_date,
ev.service_provider_name,
datediff(day, actual_date, getdate()) as DaysDiff,
ROW_NUMBER() OVER (PARTITION BY ev.full_name ORDER BY ev.actual_date DESC) as row_num --<<--<<--
FROM
dbo.event_expanded_view ev
WHERE
ev.full_name IS NOT NULL
AND ev.category_code IN ('OTHER_ACT', 'CONTACTS', 'PEOPLEPLANS', 'PEOPLETESTS', 'PERSONREQ')
)
SELECT
ue.full_name,
ue.event_name,
ue.actual_date,
ue.service_provider_name,
ue.DaysDiff
FROM
cte_ul_ev ue
WHERE
ue.row_num = 1;

Related

SQL Server LEAD function

-- FIRST LOGIN DATE
WITH CTE_FIRST_LOGIN AS
(
SELECT
PLAYER_ID, EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE ASC) AS RN
FROM
ACTIVITY
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS
(
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM
ACTIVITY A
JOIN
CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE
NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY
A.PLAYER_ID
)
-- FRACTION
SELECT
NULLIF(ROUND(1.00 * COUNT(CTE_CONSEC.PLAYER_ID) / COUNT(DISTINCT PLAYER_ID), 2), 0) AS FRACTION
FROM
ACTIVITY
JOIN
CTE_CONSEC_PLAYERS CTE_CONSEC ON CTE_CONSEC.PLAYER_ID = ACTIVITY.PLAYER_ID
I am getting the following error when I run this query.
[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'NEXT_DATE'. (207) (SQLExecDirectW)
This is a leetcode medium question 550. Game Play Analysis IV. I wanted to know why it can't identify the column NEXT_DATE here and what am I missing? Thanks!
The problem is in this CTE:
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY A.PLAYER_ID
)
Note that you are creating NEXT_DATE as a column alias in this CTE but also referring to it in the WHERE clause. This is invalid because, by SQL's logical clause ordering, WHERE is evaluated before the SELECT list, so the NEXT_DATE alias does not exist until the ORDER BY clause, which is the last evaluated clause in a SQL query or subquery. You don't have an ORDER BY clause in this subquery, so the NEXT_DATE alias is only visible to [sub]queries that come after and reference your CTE_CONSEC_PLAYERS CTE.
To fix this you'd probably want two CTEs like this (untested):
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS_pre AS (
SELECT
A.PLAYER_ID,
C.RN,
A.EVENT_DATE,
-- partition by player so that LEAD returns that player's next login date
LEAD(A.EVENT_DATE,1) OVER (PARTITION BY A.PLAYER_ID ORDER BY A.EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
-- match each activity row to its own row number (assumes one row per player per day)
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID AND A.EVENT_DATE = C.EVENT_DATE
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
MAX(NEXT_DATE) AS NEXT_DATE
FROM CTE_CONSEC_PLAYERS_pre
WHERE NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE) AND RN = 1
GROUP BY PLAYER_ID
)
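The alias rule described in this answer can be seen in isolation with a small sketch. The sample rows and the lowercase names (activity, with_next, next_date) are made up for illustration and are not the asker's data; the point is only that the window function is named in one CTE and filtered in the next query level.

```sql
-- Hypothetical sample data: (player_id, event_date) pairs.
WITH activity (player_id, event_date) AS (
    SELECT v.player_id, CAST(v.event_date AS date)
    FROM (VALUES (1, '2016-03-01'),
                 (1, '2016-03-02'),
                 (2, '2017-06-25'),
                 (3, '2016-03-02'),
                 (3, '2018-07-03')) AS v (player_id, event_date)
),
with_next AS (
    -- Step 1: compute the window function and give it a name.
    SELECT player_id,
           event_date,
           LEAD(event_date) OVER (PARTITION BY player_id
                                  ORDER BY event_date) AS next_date
    FROM activity
)
-- Step 2: next_date is now an ordinary column, so WHERE may reference it.
SELECT player_id, event_date, next_date
FROM with_next
WHERE next_date = DATEADD(DAY, 1, event_date);
```

With these rows only player 1 qualifies (2016-03-01 followed by 2016-03-02). Moving the WHERE condition into with_next would reproduce the "Invalid column name" error from the question.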
You gave every table an alias (for example JOIN CTE_FIRST_LOGIN C has the alias C), and every column access is via the alias. You need to add the correct alias from the correct table to NEXT_DATE.
Your primary issue is that NEXT_DATE is the result of a window function, and window functions are evaluated after the WHERE clause, so it cannot be referred to there because of SQL's order of operations.
But it seems this query is over-complicated.
The problem to be solved appears to be: how many players logged in the day after they first logged in, as a percentage of all players.
This can be done in a single pass (no joins), by using multiple window functions together:
WITH CTE_FIRST_LOGIN AS (
SELECT
PLAYER_ID,
EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
-- if EVENT_DATE is a datetime and can have multiple per day then group by CAST(EVENT_DATE AS date) first
LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NextDate
FROM ACTIVITY
),
BY_PLAYERS AS (
SELECT
c.PLAYER_ID,
SUM(CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
THEN 1 END) AS IsConsecutive
FROM CTE_FIRST_LOGIN AS c
GROUP BY c.PLAYER_ID
)
SELECT ROUND(
1.00 *
COUNT(c.IsConsecutive) /
NULLIF(COUNT(*), 0)
,2) AS FRACTION
FROM BY_PLAYERS AS c;
You could theoretically merge BY_PLAYERS into the outer query and use COUNT(DISTINCT ...), but splitting them feels cleaner.

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
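To make the subquery results concrete, here is a sketch that runs the two row numbers over the question's ten sample rows. The inline VALUES table and the grp column name are assumptions added for illustration; the data is the data from the question.

```sql
-- The question's sample data, inlined so the query is self-contained.
WITH t ([date], price) AS (
    SELECT v.[date], v.price
    FROM (VALUES (1, 100), (2, 100), (3, 115), (4, 120), (5, 120),
                 (6, 100), (7, 100), (8, 120), (9, 120), (10, 120)
         ) AS v ([date], price)
)
SELECT [date],
       price,
       ROW_NUMBER() OVER (ORDER BY [date]) AS seqnum,
       ROW_NUMBER() OVER (PARTITION BY price ORDER BY [date]) AS seqnum2,
       ROW_NUMBER() OVER (ORDER BY [date])
         - ROW_NUMBER() OVER (PARTITION BY price ORDER BY [date]) AS grp
FROM t
ORDER BY [date];
-- date  price  seqnum  seqnum2  grp
--  1    100      1       1       0
--  2    100      2       2       0
--  3    115      3       1       2
--  4    120      4       1       3
--  5    120      5       2       3
--  6    100      6       3       3
--  7    100      7       4       3
--  8    120      8       3       5
--  9    120      9       4       5
-- 10    120     10       5       5
```

Within a run of consecutive rows with the same price, both counters advance by one per row, so their difference stays constant. When the price changes and later comes back, seqnum has moved ahead while seqnum2 has not, so the difference jumps. Grouping by (price, grp) therefore yields exactly the five ranges asked for; note that grp = 3 appears for both 120 and 100, which is why price must be part of the grouping key.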
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

How to calculate total hours from multiple in time and out time from below?

The first punch is the in time and the second punch is the out time.
If possible, duplicate punches within the same minute should be ignored.
I need all the in times and out times in one row, together with the total hours, in any format.
I tried the query below but can't get my expected output:
WITH Level1
AS (
SELECT A.emp_reader_id,
DT
,A.EventCatId
,A.Belongs_to
,ROW_NUMBER() OVER ( PARTITION BY A.Belongs_to,A.emp_reader_id ORDER BY DT ) AS RowNum
FROM dbo.trnevents A
)
,
LEVEL2
AS (-- find the last and next event type for each row
SELECT A.emp_reader_id,A.DT , A.EventCatId ,COALESCE(LastVal.EventCatId, 10) AS LastEvent,
COALESCE(NextVal.EventCatId, 10) AS NextEvent ,A.Belongs_to
FROM Level1 A
LEFT JOIN Level1 LastVal
ON A.emp_reader_id = LastVal.emp_reader_id and A.Belongs_to=LastVal.Belongs_to
AND A.RowNum - 1 = LastVal.RowNum
LEFT JOIN Level1 NextVal
ON A.emp_reader_id = NextVal.emp_reader_id and A.Belongs_to=NextVal.Belongs_to
AND A.RowNum + 1 = NextVal.RowNum
)
select * from level2 where emp_reader_id=92 order by dt desc
Expected output:
Try the script below. I treated all DT values within the same minute as a single entry for the calculation.
WITH CTE AS
(
-- one row per employee per minute; style 120 yields 'yyyy-mm-dd hh:mi' whatever the DT type is
SELECT emp_reader_id,
CAST(DT AS DATE) Date_for_Group,
CONVERT(VARCHAR(16), DT, 120) Time_For_Group,
ROW_NUMBER() OVER(PARTITION BY emp_reader_id, CAST(DT AS DATE) ORDER BY CONVERT(VARCHAR(16), DT, 120)) RN,
CASE
WHEN ROW_NUMBER() OVER(PARTITION BY emp_reader_id, CAST(DT AS DATE) ORDER BY CONVERT(VARCHAR(16), DT, 120))%2 = 0 THEN 'OUT'
ELSE 'IN'
END In_Out
FROM your_table
GROUP BY emp_reader_id, CAST(DT AS DATE), CONVERT(VARCHAR(16), DT, 120)
)
-- pair each IN row with the next row (its OUT) and add up the minutes per day
SELECT A.emp_reader_id,A.Date_for_Group,
SUM(DATEDIFF(Minute,CONVERT(DATETIME,A.Time_For_Group,120),CONVERT(DATETIME,B.Time_For_Group,120)))/60 Hr,
SUM(DATEDIFF(Minute,CONVERT(DATETIME,A.Time_For_Group,120),CONVERT(DATETIME,B.Time_For_Group,120)))%60 [Min]
FROM CTE A
INNER JOIN CTE B
ON A.emp_reader_id = B.emp_reader_id
AND A.RN = B.RN -1
AND A.Date_for_Group = B.Date_for_Group
WHERE A.In_Out = 'IN'
GROUP BY A.emp_reader_id,A.Date_for_Group
First assign a row number to the datetime column, then build the same result set again with row number + 1.
Then inner join the two on the row numbers. After that, select the min and max of the timein and timeout columns and group by date to get the total work hours for that day. Hope it helps.
select empid
,date
,min(timein) as timein,max (timeout) timeout,convert(nvarchar(20),datediff(hh,min (timein),max(timeout))%24)
+':'+
convert(nvarchar(20),datediff(mi,min (timein),max(timeout))%60) as totalhrs
from(
Select a.empid,cast(a.dt as date) date,b.dt as timein,a.dt as timeout from(
SELECT DT
,[empid]
, id
,row_number() over(order by dt) as inn
FROM [test1].[dbo].[Table_2]
)a
inner join(
SELECT distinct DT
,[empid]
, id
,rank() over(order by dt)+1 as out
FROM [test1].[dbo].[Table_2])b
on FORMAT(a.dt,'hh:mm') <> FORMAT(b.dt,'hh:mm')
and cast(a.dt as date)=cast(b.dt as date)
and a.inn=b.out)b
group by b.empid,b.date
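As a minimal sketch of the pairing idea used in these answers (odd-numbered punches are IN, the following even-numbered punch is the matching OUT), the query below sums the minutes per employee per day. The punches table, its four rows, and the column names are hypothetical; the real trnevents data is not shown in the question, and de-duplicating punches within a minute is left out to keep the sketch short.

```sql
-- Hypothetical punches for one employee on one day.
WITH punches (emp_id, dt) AS (
    SELECT v.emp_id, CONVERT(datetime, v.dt, 120)
    FROM (VALUES (92, '2019-05-01 09:00:00'),
                 (92, '2019-05-01 12:30:00'),
                 (92, '2019-05-01 13:15:00'),
                 (92, '2019-05-01 18:00:00')) AS v (emp_id, dt)
),
numbered AS (
    -- Number the punches per employee per day in time order.
    SELECT emp_id,
           dt,
           CAST(dt AS date) AS work_date,
           ROW_NUMBER() OVER (PARTITION BY emp_id, CAST(dt AS date)
                              ORDER BY dt) AS rn
    FROM punches
)
SELECT i.emp_id,
       i.work_date,
       SUM(DATEDIFF(MINUTE, i.dt, o.dt)) / 60 AS hrs,
       SUM(DATEDIFF(MINUTE, i.dt, o.dt)) % 60 AS mins
FROM numbered AS i
JOIN numbered AS o
  ON  o.emp_id    = i.emp_id
  AND o.work_date = i.work_date
  AND o.rn        = i.rn + 1      -- the punch right after the IN punch
WHERE i.rn % 2 = 1                -- odd row numbers are IN punches
GROUP BY i.emp_id, i.work_date;
-- 09:00-12:30 is 210 minutes, 13:15-18:00 is 285 minutes: hrs = 8, mins = 15
```

Summing minutes and splitting into hours and minutes at the end avoids the rounding that comes from calling DATEDIFF with the hour part, which counts hour boundaries crossed rather than elapsed hours. An unpaired final IN punch simply finds no partner in the join and contributes nothing.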

Selecting 1 row per day closest to 4am? [duplicate]

We're currently working on a query for a report that returns a series of data. The customer has specified that they want to receive 5 rows total, with the data from the previous 5 days (as defined by a start date and an end date variable). For each day, they want the data from the row that's closest to 4am.
I managed to get it to work for a single day, but I certainly don't want to union 5 separate select statements simply to fetch these values. Is there any way to accomplish this via CTEs?
select top 1
'W' as [RecordType]
, [WellIdentifier] as [ProductionPtID]
, t.Name as [Device Name]
, t.RecordDate --convert(varchar, t.RecordDate, 112) as [RecordDate]
, TubingPressure as [Tubing Pressure]
, CasingPressure as [Casing Pressure]
from #tTempData t
Where cast (t.recorddate as time) = '04:00:00.000'
or datediff (hh,'04:00:00.000',cast (t.recorddate as time)) < -1.2
order by Name, RecordDate desc
This assumes that #tTempData only contains the previous 5 days' records:
SELECT *
FROM
(
SELECT *, rn = row_number() over
(
partition by convert(date, recorddate)
order by ABS ( datediff(minute, convert(time, recorddate) , '04:00' ) )
)
FROM #tTempData
) d
WHERE rn = 1
You can use row_number() like this to get, for each of the 5 most recent days, the row closest to 04:00:
SELECT TOP 5 * FROM (
select t.* ,
ROW_NUMBER() OVER(PARTITION BY cast(t.recorddate as date)
ORDER BY abs(datediff (minute,'04:00:00.000',cast (t.recorddate as time)))) rnk
from #tTempData t) d
WHERE rnk = 1
ORDER BY recorddate DESC
You can use row_number() for this purpose:
select t.*
from (select t.*,
row_number() over (partition by cast(t.recorddate as date)
order by abs(datediff(ms, '04:00:00.000',
cast(t.recorddate as time)
))
) seqnum
from #tTempData t
) t
where seqnum = 1;
You can add an appropriate where clause in the subquery to get the dates that you are interested in.
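A sketch of what that might look like with the date filter in place. The readings rows and the @StartDate / @EndDate variables are hypothetical stand-ins for #tTempData and for the report's start and end date variables mentioned in the question.

```sql
-- Hypothetical report window (half-open: start inclusive, end exclusive).
DECLARE @StartDate datetime = '20190501',
        @EndDate   datetime = '20190503';

-- Hypothetical readings standing in for #tTempData.
WITH readings (recorddate) AS (
    SELECT CONVERT(datetime, v.recorddate, 120)
    FROM (VALUES ('2019-05-01 03:40:00'),
                 ('2019-05-01 04:10:00'),
                 ('2019-05-01 16:00:00'),
                 ('2019-05-02 02:00:00'),
                 ('2019-05-02 04:30:00')) AS v (recorddate)
)
SELECT r.recorddate
FROM (SELECT recorddate,
             -- rank rows within each calendar day by distance from 04:00
             ROW_NUMBER() OVER (
                 PARTITION BY CAST(recorddate AS date)
                 ORDER BY ABS(DATEDIFF(MINUTE, '04:00:00',
                                       CAST(recorddate AS time)))
             ) AS seqnum
      FROM readings
      WHERE recorddate >= @StartDate
        AND recorddate <  @EndDate
     ) AS r
WHERE r.seqnum = 1
ORDER BY r.recorddate;
-- 2019-05-01 04:10 (10 minutes away) and 2019-05-02 04:30 (30 minutes away)
```

Filtering inside the subquery keeps the window function from ranking days that the report will never show. If two rows on the same day are equally close to 04:00, ROW_NUMBER picks one of them arbitrarily; adding recorddate as a second ORDER BY key would make the choice deterministic.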
Try something like this:
select
'W' as [RecordType]
, [WellIdentifier] as [ProductionPtID]
, t.Name as [Device Name]
, t.RecordDate --convert(varchar, t.RecordDate, 112) as [RecordDate]
, TubingPressure as [Tubing Pressure]
, CasingPressure as [Casing Pressure]
from #tTempData t
-- keep a row only if no other row on the same day is closer to 04:00
Where not exists
(select 1 from #tTempData t1 where
ABS(datediff (minute,'04:00:00.000',cast (t1.recorddate as time))) <
ABS(datediff (minute,'04:00:00.000',cast (t.recorddate as time)))
and cast(t.RecordDate as date) = cast(t1.RecordDate as date)
)
and t.RecordDate between YOURDATERANGE -- placeholder: substitute your start and end dates
order by Name, RecordDate desc;

SQL Server 2008 - Finding duplicates using ROW_NUMBER

I have the following SQL, which works to find duplicates:
SELECT *
FROM (SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x
WHERE rownum > 1
I would like to see only those rows where, if the value of Rownum > 1, its corresponding row with Rownum = 1 is shown as well.
So basically, if a row has duplicates, I want to see the original row and all its duplicates.
If a row does not have duplicates, then I don't want to see it (it will have rownum = 1).
How would I do this please?
cheers
Use count(*) rather than row_number():
SELECT *
FROM (SELECT id, ShipAddress, ShipZIPPostal,
COUNT(*) OVER (PARTITION BY shipaddress, shipzippostal) as cnt
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())
) x
WHERE cnt > 1;
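For illustration, here is the same pattern over four made-up orders (the rows and addresses are hypothetical, and the date filter is dropped so the sketch is self-contained):

```sql
-- Hypothetical orders: ids 1, 2 and 4 share an address, id 3 is unique.
WITH orders (id, ShipAddress, ShipZIPPostal) AS (
    SELECT v.id, v.ShipAddress, v.ShipZIPPostal
    FROM (VALUES (1, '1 Main St', '10001'),
                 (2, '1 Main St', '10001'),
                 (3, '9 Oak Ave', '20002'),
                 (4, '1 Main St', '10001')
         ) AS v (id, ShipAddress, ShipZIPPostal)
)
SELECT x.id, x.ShipAddress, x.ShipZIPPostal, x.cnt
FROM (SELECT o.id, o.ShipAddress, o.ShipZIPPostal,
             -- every row carries the size of its own address group
             COUNT(*) OVER (PARTITION BY o.ShipAddress, o.ShipZIPPostal) AS cnt
      FROM orders AS o
     ) AS x
WHERE x.cnt > 1
ORDER BY x.id;
-- ids 1, 2 and 4 come back, each with cnt = 3; id 3 (cnt = 1) is filtered out
```

Because the count is attached to every row of the group, the "original" and its duplicates survive the filter together, which is what ROW_NUMBER() > 1 alone cannot do.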
In addition to Gordon's answer, if you want to keep the row_number() approach for some academic reason, you can do this:
SELECT *
FROM (SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x
WHERE EXISTS(
SELECT * FROM x x2
WHERE x.shipaddress=x2.shipaddress
AND x.shipzippostal=x2.shipzippostal
AND x2.ROWNUM>1
)
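Note that the derived-table alias x cannot be referenced again inside the EXISTS subquery (SQL Server reports an invalid object name), so the query above needs a CTE to run. I'd prefer a CTE structure like this anyway: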
WITH cte AS (
SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())
)
SELECT *
FROM cte
WHERE EXISTS(
SELECT * FROM cte x2
WHERE cte.shipaddress=x2.shipaddress
AND cte.shipzippostal=x2.shipzippostal
AND x2.ROWNUM>1
)
You could add a second row_number with the ORDER BY on id reversed, and compare the two row numbers. A row belongs to a duplicated group when either number is greater than 1 (testing ROWNUM1 <> ROWNUM2 would drop the middle row of a group with an odd number of rows):
SELECT
*
FROM
(SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress,shipzippostal ORDER BY id) ROWNUM1,
ROW_NUMBER() OVER (PARTITION BY shipaddress,shipzippostal ORDER BY id DESC) ROWNUM2
FROM
orders
WHERE
CONVERT(DATE,orderdate) = CONVERT(DATE,GETDATE())
) x
WHERE
ROWNUM1 > 1 OR ROWNUM2 > 1