SQL Time Attendance Query

I recently switched from MS Access to SQL Server, and since the switch I have been unable to get one SQL query to work.
This is what the current table looks like in SQL Server.
This is what I am trying to get as result from the query:
Previously I was able to make it work in MS Access with the following query:
SELECT m.UserEnrollNumber, m.Checktime AS TimeIn, (SELECT Min(s.Checktime)
FROM CheckInOut1 s
WHERE s.UserEnrollNumber = m.UserEnrollNumber
AND s.Checktime > m.Checktime
AND s.Checktime <= Int(m.Checktime) + 1) AS TimeOut
FROM CheckInOut1 AS m
WHERE ((((SELECT COUNT(*)
FROM CheckInOut1 s
WHERE s.UserEnrollNumber = m.UserEnrollNumber
AND s.Checktime <= m.Checktime
AND s.Checktime >= INT(m.Checktime)) Mod 2)=1));
The following query, from @GMB's answer:
select
employee_id,
min(time_in_out) check_in,
max(time_in_out) check_out
from (
select t.*, row_number() over(partition by employee_id order by time_in_out) - 1 rn
from mytable t
) t
group by employee_id, floor(rn / 2)
order by employee_id, floor(rn / 2)
from SQL table:
gives me the following result:
Seems like the minimum and maximum rows are shown, but the rows in between are not.
The following query from @Gordon Linoff:
SELECT cio.EmployeeID, cio.TimeInOut AS CheckIn,
cio.TimeInOut as CheckOut
FROM (SELECT cio.*,
ROW_NUMBER() OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as seqnum,
LEAD(cio.TimeInOut) OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as next_TimeInOut
FROM CheckInOut22 cio
) cio
WHERE seqnum % 2 = 1;
gives me the following result:
CheckIn is the same as CheckOut.
Any help would be appreciated.

This is much simpler in SQL Server. Use window functions:
SELECT cio.EmployeeID, cio.TimeInOut AS CheckIn,
cio.next_TimeInOut as CheckOut
FROM (SELECT cio.*,
ROW_NUMBER() OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as seqnum,
LEAD(cio.TimeInOut) OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as next_TimeInOut
FROM CheckInOut cio
) cio
WHERE seqnum % 2 = 1;
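To see how the odd-numbered rows pair up with their LEAD value, here is a minimal sketch that runs the same query shape in SQLite through Python. The sample rows are made up for illustration, and SQLite's date() stands in for SQL Server's CONVERT(date, ...); treat it as an approximation of the SQL Server behaviour rather than a verified reproduction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CheckInOut (EmployeeID INTEGER, TimeInOut TEXT)")
# Hypothetical punches: two in/out pairs on one day, one unmatched check-in the next day.
conn.executemany(
    "INSERT INTO CheckInOut VALUES (?, ?)",
    [
        (1, "2020-11-02 08:00:00"),
        (1, "2020-11-02 12:00:00"),
        (1, "2020-11-02 13:00:00"),
        (1, "2020-11-02 17:00:00"),
        (1, "2020-11-03 08:30:00"),
    ],
)

query = """
SELECT EmployeeID, TimeInOut AS CheckIn, next_TimeInOut AS CheckOut
FROM (SELECT cio.*,
             ROW_NUMBER() OVER (PARTITION BY EmployeeID, date(TimeInOut)
                                ORDER BY TimeInOut) AS seqnum,
             LEAD(TimeInOut) OVER (PARTITION BY EmployeeID, date(TimeInOut)
                                   ORDER BY TimeInOut) AS next_TimeInOut
      FROM CheckInOut cio
     ) cio
WHERE seqnum % 2 = 1          -- keep only the check-in rows (1st, 3rd, ...)
ORDER BY EmployeeID, CheckIn
"""
pairs = conn.execute(query).fetchall()
for pair in pairs:
    print(pair)
```

A check-in with no later punch on the same day comes back with a NULL (None) CheckOut, because LEAD has no following row inside that employee/day partition.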

Related

SQL Server LEAD function

-- FIRST LOGIN DATE
WITH CTE_FIRST_LOGIN AS
(
SELECT
PLAYER_ID, EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE ASC) AS RN
FROM
ACTIVITY
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS
(
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM
ACTIVITY A
JOIN
CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE
NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY
A.PLAYER_ID
)
-- FRACTION
SELECT
NULLIF(ROUND(1.00 * COUNT(CTE_CONSEC.PLAYER_ID) / COUNT(DISTINCT PLAYER_ID), 2), 0) AS FRACTION
FROM
ACTIVITY
JOIN
CTE_CONSEC_PLAYERS CTE_CONSEC ON CTE_CONSEC.PLAYER_ID = ACTIVITY.PLAYER_ID
I am getting the following error when I run this query.
[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'NEXT_DATE'. (207) (SQLExecDirectW)
This is LeetCode medium question 550, Game Play Analysis IV. Why can't SQL Server identify the column NEXT_DATE here, and what am I missing? Thanks!
The problem is in this CTE:
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY A.PLAYER_ID
)
Note that you are creating NEXT_DATE as a column alias in this CTE but also referring to it in the same CTE's WHERE clause. This is invalid: under SQL's logical clause ordering, WHERE is evaluated before the SELECT list, so the NEXT_DATE alias does not exist yet. Within the same query level an alias only becomes visible in the ORDER BY clause, which is the last clause evaluated in a query or subquery. You don't have an ORDER BY clause in this subquery, so the NEXT_DATE alias is only visible to [sub]queries that come after your CTE_CONSEC_PLAYERS CTE and reference it.
To fix this you'd probably want two CTEs like this (untested):
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS_pre AS (
SELECT
A.PLAYER_ID,
C.RN,
A.EVENT_DATE,
-- partition by player so LEAD returns that player's next login date
LEAD(A.EVENT_DATE, 1) OVER (PARTITION BY A.PLAYER_ID ORDER BY A.EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
-- join on the date as well so each activity row matches exactly one numbered row
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID AND A.EVENT_DATE = C.EVENT_DATE
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
MAX(NEXT_DATE) AS NEXT_DATE
FROM CTE_CONSEC_PLAYERS_pre
WHERE NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE) AND RN = 1
GROUP BY PLAYER_ID
)
You gave every table an alias (for example JOIN CTE_FIRST_LOGIN C has the alias C), and every column access is via the alias. You need to add the correct alias from the correct table to NEXT_DATE.
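Your primary issue is that NEXT_DATE is an alias for the result of a window function. Window functions are evaluated after the WHERE clause, so neither the alias nor the window function itself can be referred to in the WHERE of the same query level.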
But it seems this query is over-complicated.
The problem to be solved appears to be: how many players logged in the day after they first logged in, as a percentage of all players.
This can be done in a single pass (no joins), by using multiple window functions together:
WITH CTE_FIRST_LOGIN AS (
SELECT
PLAYER_ID,
EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
-- if EVENT_DATE is a datetime and can have multiple per day then group by CAST(EVENT_DATE AS date) first
LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NextDate
FROM ACTIVITY
),
BY_PLAYERS AS (
SELECT
c.PLAYER_ID,
SUM(CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
THEN 1 END) AS IsConsecutive
FROM CTE_FIRST_LOGIN AS c
GROUP BY c.PLAYER_ID
)
SELECT ROUND(
1.00 *
COUNT(c.IsConsecutive) /
NULLIF(COUNT(*), 0)
,2) AS FRACTION
FROM BY_PLAYERS AS c;
You could theoretically merge BY_PLAYERS into the outer query and use COUNT(DISTINCT ...), but splitting them feels cleaner.
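As a rough check of the single-pass idea, the sketch below runs the same two-CTE shape in SQLite through Python. It assumes each player has at most one row per date, partitions LEAD by player so that it returns that player's next login, and uses SQLite's date(x, '+1 day') in place of DATEADD. The sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (player_id INTEGER, event_date TEXT)")
# Illustrative data: only player 1 logs in again the day after the first login.
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [
        (1, "2016-03-01"),
        (1, "2016-03-02"),
        (2, "2017-06-25"),
        (3, "2016-03-02"),
        (3, "2018-07-03"),
    ],
)

query = """
WITH first_login AS (
    SELECT player_id, event_date,
           ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY event_date) AS rn,
           LEAD(event_date) OVER (PARTITION BY player_id ORDER BY event_date) AS next_date
    FROM activity
),
by_players AS (
    -- SUM over no matching rows is NULL, so COUNT(is_consecutive) skips that player
    SELECT player_id,
           SUM(CASE WHEN rn = 1 AND next_date = date(event_date, '+1 day')
                    THEN 1 END) AS is_consecutive
    FROM first_login
    GROUP BY player_id
)
SELECT ROUND(1.0 * COUNT(is_consecutive) / NULLIF(COUNT(*), 0), 2) AS fraction
FROM by_players
"""
fraction = conn.execute(query).fetchone()[0]
print(fraction)
```

The window functions live in the first CTE and the filter on their results lives in the second, which is what avoids the "Invalid column name" error from the question.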

Lag functions and SUM

I need to get the list of users that have been offline for at least 20 min every day. Here's my data
I have this starting query, but I am stuck on how to sum the offline_mins differences, i.e. I need to add something like "and sum(offline_mins) >= 20" to the where clause.
SELECT
userid,
connected,
LAG(recordeddt) OVER(PARTITION BY userid
ORDER BY userid,
recordeddt) AS offline_period,
DATEDIFF(minute, LAG(recordeddt) OVER(PARTITION BY userid
ORDER BY userid,
recordeddt),recordeddt) offline_mins
FROM device_data where connected=0;
My expected results :
Thanks in advance.
This reads like a gaps-and-islands problem, where you want to group together adjacent rows having the same userid and status.
As a starter, here is a query that computes the islands:
select userid, connected, min(recordeddt) startdt, max(lead_recordeddt) enddt,
datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
select dd.*,
row_number() over(partition by userid order by recordeddt) rn1,
row_number() over(partition by userid, connected order by recordeddt) rn2,
lead(recordeddt) over(partition by userid order by recordeddt) lead_recordeddt
from device_data dd
) dd
group by userid, connected, rn1 - rn2
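The difference between the two row numbers is constant within each run of identical connected values, and that is what makes the group by work. Here is a small sketch of that trick in SQLite through Python. It is simplified: it measures each island from its first to its last row instead of using lead(), the minute arithmetic uses strftime('%s', ...) because SQLite has no DATEDIFF, and the table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_data (userid INTEGER, recordeddt TEXT, connected INTEGER)")
# Hypothetical readings every five minutes for two users.
conn.executemany(
    "INSERT INTO device_data VALUES (?, ?, ?)",
    [
        (1, "2020-11-02 10:00:00", 1),
        (1, "2020-11-02 10:05:00", 0),
        (1, "2020-11-02 10:10:00", 0),
        (1, "2020-11-02 10:15:00", 0),
        (1, "2020-11-02 10:20:00", 0),
        (2, "2020-11-02 10:00:00", 1),
        (2, "2020-11-02 10:05:00", 1),
        (2, "2020-11-02 10:10:00", 0),
        (2, "2020-11-02 10:15:00", 0),
        (2, "2020-11-02 10:20:00", 0),
        (2, "2020-11-02 10:25:00", 0),
        (2, "2020-11-02 10:30:00", 0),
    ],
)

query = """
SELECT userid, connected,
       MIN(recordeddt) AS startdt,
       MAX(recordeddt) AS enddt,
       (strftime('%s', MAX(recordeddt)) - strftime('%s', MIN(recordeddt))) / 60 AS minutes
FROM (SELECT dd.*,
             ROW_NUMBER() OVER (PARTITION BY userid ORDER BY recordeddt) AS rn1,
             ROW_NUMBER() OVER (PARTITION BY userid, connected ORDER BY recordeddt) AS rn2
      FROM device_data dd) dd
GROUP BY userid, connected, rn1 - rn2   -- rn1 - rn2 identifies the island
ORDER BY userid, startdt
"""
islands = conn.execute(query).fetchall()

# Users with at least one offline island of 20 minutes or more.
offline_users = sorted({u for u, c, _, _, m in islands if c == 0 and m >= 20})
print(islands)
print(offline_users)
```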
Now, say you want users that were offline for at least 20 minutes every day. You can break down the islands per day, and use a having clause for filtering:
select userid
from (
select recordedday, userid, connected,
datediff(minute, min(recordeddt), max(lead_recordeddt)) duration
from (
select dd.*, v.*,
row_number() over(partition by v.recordedday, userid order by recordeddt) rn1,
row_number() over(partition by v.recordedday, userid, connected order by recordeddt) rn2,
lead(recordeddt) over(partition by v.recordedday, userid order by recordeddt) lead_recordeddt
from device_data dd
cross apply (values (convert(date, recordeddt))) v(recordedday)
) dd
group by recordedday, userid, connected, rn1 - rn2
) dd
group by userid
having count(distinct case when connected = 0 and duration >= 20 then recordedday end) = count(distinct recordedday)
As noted, this is a gaps-and-islands problem. This is my take on it: use a simple lag function to create groups, filter out the connected rows, and then work on the date ranges.
CREATE TABLE #tmp(ID int, UserID int, dt datetime, connected int)
INSERT INTO #tmp VALUES
(1,1,'11/2/20 10:00:00',1),
(2,1,'11/2/20 10:05:00',0),
(3,1,'11/2/20 10:10:00',0),
(4,1,'11/2/20 10:15:00',0),
(5,1,'11/2/20 10:20:00',0),
(6,2,'11/2/20 10:00:00',1),
(7,2,'11/2/20 10:05:00',1),
(8,2,'11/2/20 10:10:00',0),
(9,2,'11/2/20 10:15:00',0),
(10,2,'11/2/20 10:20:00',0),
(11,2,'11/2/20 10:25:00',0),
(12,2,'11/2/20 10:30:00',0)
SELECT UserID, connected,DATEDIFF(minute,MIN(DT), MAX(DT)) OFFLINE_MINUTES
FROM
(
SELECT *, SUM(CASE WHEN connected <> LG THEN 1 ELSE 0 END) OVER (ORDER BY UserID,dt) grp
FROM
(
select *, LAG(connected,1,connected) OVER(PARTITION BY UserID ORDER BY UserID,dt) LG
from #tmp
) x
) y
WHERE connected <> 1
GROUP BY UserID,grp,connected
HAVING DATEDIFF(minute,MIN(DT), MAX(DT)) >= 20

Getting difference between two invoices by ranking and subtracting one from the other

I am trying to get the difference between invoices.
I attempted to use CTEs for ranks 1 and 2, but each one contains a subquery and I could not get that to work.
The second query looks the same, but with rank = 2.
select *
from (
SELECT i.id, i.subtotal/100 as subtotal, i.created_at, i.paid_at
,RANK() OVER (PARTITION BY i.subscription_id ORDER BY i.created_at DESC) AS Rank
From Invoices i
) as r
where r.rank = 1
order by r.created_at desc;
Following the path that you are on (using row_number()/rank()), you can use conditional aggregation. Assuming you want the difference of the subtotal, then:
select subscription_id,
sum(case when seqnum = 1 then subtotal
else - subtotal
end) as difference
-- select only the needed columns: i.* plus a second "subtotal" would duplicate the column name
from (select i.subscription_id, i.subtotal/100 as subtotal,
row_number() over (partition by i.subscription_id order by i.created_at desc) as seqnum
from Invoices i
) i
where seqnum in (1, 2)
group by subscription_id;
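Here is a minimal sketch of the conditional-aggregation idea, run in SQLite through Python and grouped per subscription. The invoice rows are invented for illustration, the derived column is named amount (a hypothetical name) to keep it distinct from the stored subtotal, and the division uses 100.0 on the assumption that subtotal is stored as an integer number of cents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoices (id INTEGER, subscription_id INTEGER, "
    "subtotal INTEGER, created_at TEXT)"
)
# Hypothetical invoices; subtotal is assumed to be in cents.
conn.executemany(
    "INSERT INTO Invoices VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1000, "2021-01-01"),
        (2, 10, 1500, "2021-02-01"),
        (3, 10, 1200, "2021-03-01"),
        (4, 20, 2000, "2021-01-15"),
        (5, 20, 2500, "2021-02-15"),
    ],
)

query = """
SELECT subscription_id,
       -- latest invoice counts positive, the one before it negative
       SUM(CASE WHEN seqnum = 1 THEN amount ELSE -amount END) AS difference
FROM (SELECT i.subscription_id,
             i.subtotal / 100.0 AS amount,
             ROW_NUMBER() OVER (PARTITION BY i.subscription_id
                                ORDER BY i.created_at DESC) AS seqnum
      FROM Invoices i) i
WHERE seqnum IN (1, 2)
GROUP BY subscription_id
ORDER BY subscription_id
"""
differences = conn.execute(query).fetchall()
print(differences)
```

A subscription with only one invoice would report that invoice's amount as the difference, since there is no second row to subtract; filter on a row count if that case should be excluded.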

Max dates for each sequence within partitions

I would like to see if somebody has an idea how to get the max and min dates within each 'id' using the 'row_num' column as an indicator when the sequence starts/ends in SQL Server 2016.
The screenshot below shows the desired output in columns 'min_date' and 'max_date'.
Any help would be appreciated.
You could use windowed MIN/MAX:
WITH cte AS (
SELECT *,SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
OVER(PARTITION BY id, cat ORDER BY date_col) AS grp
FROM tab
)
SELECT *, MIN(date_col) OVER(PARTITION BY id, cat, grp) AS min_date,
MAX(date_col) OVER(PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat;
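The running SUM above increments every time row_num restarts at 1, so it labels each sequence with its own grp value. Here is a small sketch of that in SQLite through Python, using an invented tab table in which one id/cat pair contains two sequences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER, cat TEXT, date_col TEXT, row_num INTEGER)")
# Hypothetical rows: row_num restarts at 1 when a new sequence begins.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2021-01-01", 1),
        (1, "A", "2021-01-02", 2),
        (1, "A", "2021-01-03", 3),
        (1, "A", "2021-01-10", 1),
        (1, "A", "2021-01-11", 2),
    ],
)

query = """
WITH cte AS (
    SELECT *, SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
              OVER (PARTITION BY id, cat ORDER BY date_col) AS grp
    FROM tab
)
SELECT id, cat, date_col,
       MIN(date_col) OVER (PARTITION BY id, cat, grp) AS min_date,
       MAX(date_col) OVER (PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```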
Try something like
SELECT
Q1.id, Q1.cat,
MIN(Q1.[date]) AS min_dat,
MAX(Q1.[date]) AS max_dat
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY [date]) AS r1,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) AS r2
FROM tab -- placeholder table name; the original snippet omitted the FROM clause
) AS Q1
GROUP BY
Q1.id, Q1.cat, Q1.r2 - Q1.r1

SQL Server 2008 - Finding duplicates using ROW_NUMBER

I have the following SQL, which works to find duplicates:
SELECT *
FROM (SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x
WHERE rownum > 1
For every row where rownum > 1, I would also like to see its corresponding row where rownum = 1.
So basically, if a row has duplicates, I want to see the original row and all its duplicates.
If a row does not have duplicates (it will only have rownum = 1), then I don't want to see it.
How would I do this, please?
Cheers
Use count(*) rather than row_number():
SELECT *
FROM (SELECT id, ShipAddress, ShipZIPPostal,
COUNT(*) OVER (PARTITION BY shipaddress, shipzippostal) as cnt
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())
) x
WHERE cnt > 1;
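Here is a quick sketch of the COUNT(*) OVER approach in SQLite through Python. The orders rows are made up, and the orderdate filter is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ShipAddress TEXT, ShipZIPPostal TEXT)")
# Hypothetical orders: ids 1, 2 and 4 share an address; id 3 is unique.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "1 Main St", "12345"),
        (2, "1 Main St", "12345"),
        (3, "9 Oak Ave", "54321"),
        (4, "1 Main St", "12345"),
    ],
)

query = """
SELECT id, ShipAddress, ShipZIPPostal, cnt
FROM (SELECT id, ShipAddress, ShipZIPPostal,
             COUNT(*) OVER (PARTITION BY ShipAddress, ShipZIPPostal) AS cnt
      FROM orders) x
WHERE cnt > 1      -- every member of a group with duplicates, including the original
ORDER BY id
"""
duplicates = conn.execute(query).fetchall()
duplicate_ids = [row[0] for row in duplicates]
print(duplicate_ids)
```

Because every row in a partition sees the same count, the original row and all of its duplicates pass the filter together, which ROW_NUMBER alone cannot do.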
In addition to Gordon's answer, if you want to keep the row_number() approach for some academic reason, you can do this:
SELECT *
FROM (SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x
WHERE EXISTS(
SELECT * FROM x x2
WHERE x.shipaddress=x2.shipaddress
AND x.shipzippostal=x2.shipzippostal
AND x2.ROWNUM>1
)
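Note that SQL Server does not let a nested subquery select from the derived-table alias x, so the form above fails with an invalid object name error. A CTE can be referenced more than once, so I'd use a cte structure like this instead: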
WITH cte AS (
SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())
)
SELECT *
FROM cte
WHERE EXISTS(
SELECT * FROM cte x2
WHERE cte.shipaddress=x2.shipaddress
AND cte.shipzippostal=x2.shipzippostal
AND x2.ROWNUM>1
)
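You could add a second row_number that orders by ID in the opposite direction and combine the two. Comparing them with <> is not enough, because the middle row of an odd-sized group gets the same number in both directions. Instead, note that ROWNUM1 + ROWNUM2 - 1 is the group size, so ROWNUM1 + ROWNUM2 > 2 keeps every row that has at least one duplicate (assuming id is unique):
SELECT
*
FROM
(SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress,shipzippostal ORDER BY id) ROWNUM1,
ROW_NUMBER() OVER (PARTITION BY shipaddress,shipzippostal ORDER BY id DESC) ROWNUM2
FROM
orders
WHERE
CONVERT(DATE,orderdate) = CONVERT(DATE,GETDATE())
) x
WHERE
ROWNUM1 + ROWNUM2 > 2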