SQL DISTINCT Column with 2nd Criteria as datetime

I have only just started learning SQL in SQL Server Management Studio and am being thrown in at the deep end.
I just need each unique DriverID that has a LogoffTime in the last 3 months, with the headings included below.
What I have so far:
SELECT
Dr.DriverName, Dr.DriverNumber, Dr.DriverID,
DL.DriverID, DL.LogoffTime,
ROW_NUMBER() OVER (PARTITION BY DL.DriverID ORDER BY DL.LogoffTime DESC) AS rn
FROM
Taxihistory.dbo.DriverLogon DL, Taxihistory.dbo.Driver Dr
WHERE
DL.DriverID = Dr.DriverID
AND DL.LogoffTime <= '20180931'
AND rn = 1
ORDER BY
DL.LogoffTime DESC;
I am currently getting this error:
Msg 207, Level 16, State 1, Line 7
Invalid column name 'rn'

If you want to explore the CTE (common table expression) option, you can achieve the same result with a CTE. You can try something like this:
WITH CTE AS (
SELECT dr.drivername,
dr.drivernumber,
dr.driverid,
dl.logofftime,
row_number() OVER (PARTITION BY dl.driverid
ORDER BY dl.logofftime DESC) AS rn
FROM taxihistory.dbo.driverlogon dl
INNER JOIN taxihistory.dbo.driver dr
ON dr.driverid = dl.driverid
WHERE dl.logofftime < CONVERT(datetime, '20181001') )
SELECT tbl.drivername,
tbl.drivernumber,
tbl.driverid,
tbl.logofftime
FROM CTE tbl
WHERE tbl.rn = 1
ORDER BY tbl.logofftime DESC;

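You cannot use column aliases in the WHERE clause, because WHERE is logically evaluated before the SELECT list that defines them. Neither can you use row_number() (or any other window function) there. You have to wrap the query containing row_number() in a subquery and filter on rn in the outer query.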
SELECT x.drivername,
x.drivernumber,
x.driverid,
x.logofftime
FROM (SELECT dr.drivername,
dr.drivernumber,
dr.driverid,
dl.logofftime,
row_number() OVER (PARTITION BY dl.driverid
ORDER BY dl.logofftime DESC) rn
FROM taxihistory.dbo.driverlogon dl
INNER JOIN taxihistory.dbo.driver dr
ON dr.driverid = dl.driverid
WHERE dl.logofftime < '20181001') x
WHERE x.rn = 1
ORDER BY x.logofftime DESC;
It is also advisable to use explicit JOIN syntax instead of listing the tables separated by commas in FROM. And I do hope that driverlogon.logofftime is not an [n][var]char but some date/time type.
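To see the derived-table pattern end to end, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database (an assumption: SQLite 3.25 or later, which supports window functions). The tables and rows are hypothetical stand-ins for Taxihistory.dbo.Driver and DriverLogon; SQLite stores the times as ISO-8601 text, so the string comparison in WHERE orders them chronologically.

```python
import sqlite3

# Hypothetical sample data standing in for Taxihistory.dbo.Driver / DriverLogon.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver (driverid INTEGER PRIMARY KEY, drivername TEXT, drivernumber TEXT);
CREATE TABLE driverlogon (driverid INTEGER, logofftime TEXT);
INSERT INTO driver VALUES (1, 'Ann', 'D01'), (2, 'Bob', 'D02');
INSERT INTO driverlogon VALUES
  (1, '2018-09-10 08:00:00'), (1, '2018-09-28 17:30:00'),
  (2, '2018-08-01 09:00:00'), (2, '2018-10-05 12:00:00');
""")

# rn is defined in the inner query's SELECT list, so it can only be
# filtered on in the outer query.
rows = conn.execute("""
SELECT x.drivername, x.driverid, x.logofftime
FROM (SELECT dr.drivername, dr.driverid, dl.logofftime,
             ROW_NUMBER() OVER (PARTITION BY dl.driverid
                                ORDER BY dl.logofftime DESC) AS rn
      FROM driverlogon dl
      INNER JOIN driver dr ON dr.driverid = dl.driverid
      WHERE dl.logofftime < '2018-10-01') x
WHERE x.rn = 1
ORDER BY x.logofftime DESC
""").fetchall()

for row in rows:
    print(row)
```

Each driver appears once, with the latest logoff before the cutoff; Bob's October logoff is excluded by the inner WHERE before the numbering happens.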

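To get the logons from the last 3 months up to the current date, keep the rows whose LogoffTime is on or after the date 3 months ago. Note that this returns every logon in that window, not one row per driver; combine it with the ROW_NUMBER() filter from the other answers for that:
SELECT Dr.DriverName, Dr.DriverNumber, Dr.DriverID, DL.LogoffTime
FROM Taxihistory.dbo.DriverLogon DL
INNER JOIN Taxihistory.dbo.Driver Dr ON DL.DriverID = Dr.DriverID
WHERE DL.LogoffTime >= DATEADD(MONTH, -3, GETDATE())
ORDER BY DL.LogoffTime DESC;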
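The direction of the date comparison matters: >= keeps logoffs inside the 3-month window, while < keeps only older ones. A minimal sketch with Python's sqlite3 module, assuming SQLite's datetime(..., '-3 months') as a stand-in for DATEADD(MONTH, -3, GETDATE()) and a fixed, hypothetical "now" so the output is reproducible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driverlogon (driverid INTEGER, logofftime TEXT);
INSERT INTO driverlogon VALUES
  (1, '2018-06-01 08:00:00'),
  (1, '2018-09-28 17:30:00'),
  (2, '2018-10-05 12:00:00');
""")

# Hypothetical fixed "now" instead of GETDATE(), for reproducible output.
NOW = "2018-10-15 12:00:00"


def logoffs(op):
    # op is either '>=' (within the last 3 months) or '<' (older than that);
    # it comes from this script only, never from user input.
    sql = f"""
    SELECT driverid, logofftime FROM driverlogon
    WHERE logofftime {op} datetime(?, '-3 months')
    ORDER BY logofftime DESC
    """
    return conn.execute(sql, (NOW,)).fetchall()


recent = logoffs(">=")
older = logoffs("<")
print(recent)
print(older)
```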

Related

SQL Server LEAD function

-- FIRST LOGIN DATE
WITH CTE_FIRST_LOGIN AS
(
SELECT
PLAYER_ID, EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE ASC) AS RN
FROM
ACTIVITY
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS
(
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM
ACTIVITY A
JOIN
CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE
NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY
A.PLAYER_ID
)
-- FRACTION
SELECT
NULLIF(ROUND(1.00 * COUNT(CTE_CONSEC.PLAYER_ID) / COUNT(DISTINCT PLAYER_ID), 2), 0) AS FRACTION
FROM
ACTIVITY
JOIN
CTE_CONSEC_PLAYERS CTE_CONSEC ON CTE_CONSEC.PLAYER_ID = ACTIVITY.PLAYER_ID
I am getting the following error when I run this query.
[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'NEXT_DATE'. (207) (SQLExecDirectW)
This is LeetCode medium problem 550, Game Play Analysis IV. I wanted to know why it can't identify the column NEXT_DATE here. What am I missing? Thanks!
The problem is in this CTE:
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY A.PLAYER_ID
)
Note that you are creating NEXT_DATE as a column alias in this CTE but also referring to it in the WHERE clause. This is invalid because of SQL's logical processing order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY): the WHERE clause is evaluated before the SELECT list that defines NEXT_DATE, so the alias does not exist yet. Within a single query, the only clause that can see a SELECT alias is ORDER BY, which is evaluated last. You don't have an ORDER BY clause in this CTE, so in practice the NEXT_DATE alias is only usable by queries that come after and reference your CTE_CONSEC_PLAYERS CTE.
To fix this you'd probably want two CTEs like this (untested):
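-- CONSECUTIVE LOGINS (prep): every activity row with the player's first login date
CTE_CONSEC_PLAYERS_pre AS (
SELECT
A.PLAYER_ID,
A.EVENT_DATE,
C.EVENT_DATE AS FIRST_DATE,
-- partition by player (not by date) so LEAD returns each player's next login
LEAD(A.EVENT_DATE, 1) OVER (PARTITION BY A.PLAYER_ID ORDER BY A.EVENT_DATE) AS NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID AND C.RN = 1
),
-- CONSECUTIVE LOGINS: NEXT_DATE is an ordinary column here, so WHERE can use it
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
MAX(NEXT_DATE) AS NEXT_DATE
FROM CTE_CONSEC_PLAYERS_pre
WHERE EVENT_DATE = FIRST_DATE
AND NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE)
GROUP BY PLAYER_ID
)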
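A minimal sketch of the two-stage idea run through Python's sqlite3 module, assuming SQLite 3.25+ for window functions, SQLite's date(x, '+1 day') in place of DATEADD, and a small hypothetical Activity table. The first CTE only defines next_date; the second one filters on it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (player_id INTEGER, event_date TEXT);
INSERT INTO activity VALUES
  (1, '2016-03-01'), (1, '2016-03-02'),
  (2, '2017-06-25'),
  (3, '2016-03-02'), (3, '2018-07-03');
""")

query = """
WITH cte_first_login AS (
  SELECT player_id, event_date,
         ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY event_date) AS rn
  FROM activity
),
-- Stage 1: next_date is only an alias at this level
cte_consec_players_pre AS (
  SELECT a.player_id, a.event_date, c.event_date AS first_date,
         LEAD(a.event_date, 1) OVER (PARTITION BY a.player_id
                                     ORDER BY a.event_date) AS next_date
  FROM activity a
  JOIN cte_first_login c ON a.player_id = c.player_id AND c.rn = 1
),
-- Stage 2: next_date is now an ordinary column, so WHERE may use it
cte_consec_players AS (
  SELECT player_id, MAX(next_date) AS next_date
  FROM cte_consec_players_pre
  WHERE event_date = first_date
    AND next_date = date(event_date, '+1 day')
  GROUP BY player_id
)
SELECT player_id, next_date FROM cte_consec_players ORDER BY player_id
"""
consecutive = conn.execute(query).fetchall()
print(consecutive)
```

Only player 1 logged in again on the day after the first login, so only that player survives the second stage.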
You gave every table an alias (for example JOIN CTE_FIRST_LOGIN C has the alias C), and every column access is via the alias. You need to add the correct alias from the correct table to NEXT_DATE.
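Your primary issue is that NEXT_DATE is the alias of a window function, and therefore cannot be referred to in the WHERE clause: in SQL's logical order of operations, window functions are computed in the SELECT phase, after WHERE has already been applied.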
But it seems this query is over-complicated.
The problem to be solved appears to be: how many players logged in the day after they first logged in, as a percentage of all players.
This can be done in a single pass (no joins), by using multiple window functions together:
WITH CTE_FIRST_LOGIN AS (
SELECT
PLAYER_ID,
EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
-- if EVENT_DATE is a datetime and can have multiple per day then group by CAST(EVENT_DATE AS date) first
LEAD(EVENT_DATE, 1) OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS NextDate
FROM ACTIVITY
),
BY_PLAYERS AS (
SELECT
c.PLAYER_ID,
SUM(CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
THEN 1 END) AS IsConsecutive
FROM CTE_FIRST_LOGIN AS c
GROUP BY c.PLAYER_ID
)
SELECT ROUND(
1.00 *
COUNT(c.IsConsecutive) /
NULLIF(COUNT(*), 0)
,2) AS FRACTION
FROM BY_PLAYERS AS c;
You could theoretically merge BY_PLAYERS into the outer query and use COUNT(DISTINCT ...), but splitting them feels cleaner.
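The single-pass version can be checked the same way. This sketch runs it on SQLite through Python's sqlite3 module (an assumption; date(x, '+1 day') replaces DATEADD) against a small hypothetical Activity table in which only player 1 of 3 logs in again the day after the first login:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (player_id INTEGER, event_date TEXT);
INSERT INTO activity VALUES
  (1, '2016-03-01'), (1, '2016-03-02'),
  (2, '2017-06-25'),
  (3, '2016-03-02'), (3, '2018-07-03');
""")

query = """
WITH cte_first_login AS (
  SELECT player_id, event_date,
         ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY event_date) AS rn,
         LEAD(event_date, 1) OVER (PARTITION BY player_id ORDER BY event_date) AS next_date
  FROM activity
),
by_players AS (
  -- NULL for players whose first login is not followed the next day
  SELECT player_id,
         SUM(CASE WHEN rn = 1 AND next_date = date(event_date, '+1 day')
                  THEN 1 END) AS is_consecutive
  FROM cte_first_login
  GROUP BY player_id
)
SELECT ROUND(1.00 * COUNT(is_consecutive) / NULLIF(COUNT(*), 0), 2) AS fraction
FROM by_players
"""
(fraction,) = conn.execute(query).fetchone()
print(fraction)
```

COUNT(is_consecutive) skips the NULLs, so the result is 1 out of 3 players, rounded to two decimals.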

Select only observations with a date more recent than 30/6/2021 (dd/mm/yyyy)

I have the following code:
Select Tbl.Fromdate, Tbl.Por, Tbl.Porname, Tbl.Bmref3
From(
Select
To_Char(P.Fromdate, 'dd-mm-yyyy') As Fromdate, P.Por, P.Porname, W.Bmref3
, RANK() OVER (PARTITION BY P.Por ORDER BY P.fromdate DESC) AS rank
From Tmsdat.Climandatecomps W
Inner Join Tmsdat.Portfolios P On (W.Porik = P.Porik)
Where 1=1
) Tbl
Where 1=1
And Tbl.Rank = 1
;
However, I want to select only the observations whose Fromdate is more recent than June 30, 2021. I tried adding Tbl.Fromdate > '30-06-2021' to the WHERE clause, but I did not get the desired results.
Do you have any suggestions?
Thank you in advance.
Best regards,
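You would put the condition in the inner query, on the original DATE column P.Fromdate. In the outer query, Fromdate has already been converted by To_Char into a 'dd-mm-yyyy' string, so Tbl.Fromdate > '30-06-2021' compares text character by character, not dates: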
Select Tbl.Fromdate, Tbl.Por, Tbl.Porname, Tbl.Bmref3
From (
Select To_Char(P.Fromdate, 'dd-mm-yyyy') As Fromdate, P.Por, P.Porname, W.Bmref3,
RANK() OVER (PARTITION BY P.Por ORDER BY P.Fromdate DESC) AS rank
From Tmsdat.Climandatecomps W inner join
Tmsdat.Portfolios P
On (W.Porik = P.Porik)
Where P.Fromdate > date '2021-06-30'
) Tbl
Where Tbl.rank = 1;
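The difference between comparing the formatted text and comparing the date can be reproduced in a few lines. This sketch uses Python's sqlite3 module, with SQLite's strftime('%d-%m-%Y', ...) standing in for Oracle's To_Char and ISO-8601 text standing in for an Oracle DATE; the portfolio rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portfolios (por TEXT, fromdate TEXT);
INSERT INTO portfolios VALUES
  ('A', '2020-12-31'),
  ('B', '2021-07-01'),
  ('C', '2021-05-15');
""")

# Filtering on the formatted text compares strings character by character:
# '31-12-2020' > '30-06-2021' is true, '01-07-2021' > '30-06-2021' is false.
as_text = conn.execute("""
SELECT por FROM portfolios
WHERE strftime('%d-%m-%Y', fromdate) > '30-06-2021'
ORDER BY por
""").fetchall()

# Filtering on the date value itself (ISO text sorts chronologically).
as_date = conn.execute("""
SELECT por FROM portfolios
WHERE fromdate > '2021-06-30'
ORDER BY por
""").fetchall()

print(as_text)
print(as_date)
```

The text filter keeps the December 2020 row and drops the July 2021 row, which is exactly the "not the desired results" symptom from the question.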

SQL Time Attendance Query

Recently I switched from MS Access to SQL Server, and since the switch I have had trouble getting one SQL query to work.
This is what the current table looks like in SQL Server:
This is the result I am trying to get from the query:
Previously I was able to make it work in MS Access with the following query:
SELECT m.UserEnrollNumber, m.Checktime AS TimeIn, (SELECT Min(s.Checktime)
FROM CheckInOut1 s
WHERE s.UserEnrollNumber = m.UserEnrollNumber
AND s.Checktime > m.Checktime
AND s.Checktime <= Int(m.Checktime) + 1) AS TimeOut
FROM CheckInOut1 AS m
WHERE ((((SELECT COUNT(*)
FROM CheckInOut1 s
WHERE s.UserEnrollNumber = m.UserEnrollNumber
AND s.Checktime <= m.Checktime
AND s.Checktime >= INT(m.Checktime)) Mod 2)=1));
The following query, from an answer by @GMB:
select
employee_id,
min(time_in_out) check_in,
max(time_in_out) check_out
from (
select t.*, row_number() over(partition by employee_id order by time_in_out) - 1 rn
from mytable t
) t
group by employee_id, floor(rn / 2)
order by employee_id, floor(rn / 2)
when run against the SQL table, gives me the following result:
It seems that only the minimum and maximum rows are shown, not the rows in between.
The following query from @Gordon Linoff:
SELECT cio.EmployeeID, cio.TimeInOut AS CheckIn,
cio.TimeInOut as CheckOut
FROM (SELECT cio.*,
ROW_NUMBER() OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as seqnum,
LEAD(cio.TimeInOut) OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as next_TimeInOut
FROM CheckInOut22 cio
) cio
WHERE seqnum % 2 = 1;
Gives me the following result:
CheckIn is always the same as CheckOut.
All help would be appreciated.
This is much simpler in SQL Server. Use window functions, and take CheckOut from next_TimeInOut (the LEAD() value) rather than from TimeInOut itself:
SELECT cio.EmployeeID, cio.TimeInOut AS CheckIn,
cio.next_TimeInOut as CheckOut
FROM (SELECT cio.*,
ROW_NUMBER() OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as seqnum,
LEAD(cio.TimeInOut) OVER (PARTITION BY cio.EmployeeID, CONVERT(date, cio.TimeInOut) ORDER BY cio.TimeInOut) as next_TimeInOut
FROM CheckInOut cio
) cio
WHERE seqnum % 2 = 1;
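A minimal sketch of the corrected pairing on SQLite via Python's sqlite3 module, assuming SQLite 3.25+ for window functions and SQLite's date() in place of CONVERT(date, ...). The checkinout table and its punches are hypothetical; odd-numbered punches of each employee and day become check-ins, and LEAD() supplies the matching check-out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkinout (employeeid INTEGER, timeinout TEXT);
INSERT INTO checkinout VALUES
  (1, '2020-03-02 08:00:00'), (1, '2020-03-02 12:00:00'),
  (1, '2020-03-02 13:00:00'), (1, '2020-03-02 17:00:00'),
  (2, '2020-03-02 09:00:00'), (2, '2020-03-02 18:00:00');
""")

rows = conn.execute("""
SELECT cio.employeeid, cio.timeinout AS checkin, cio.next_timeinout AS checkout
FROM (SELECT c.*,
             -- number the punches per employee and calendar day
             ROW_NUMBER() OVER (PARTITION BY c.employeeid, date(c.timeinout)
                                ORDER BY c.timeinout) AS seqnum,
             -- the punch that follows, within the same employee and day
             LEAD(c.timeinout) OVER (PARTITION BY c.employeeid, date(c.timeinout)
                                     ORDER BY c.timeinout) AS next_timeinout
      FROM checkinout c) cio
WHERE cio.seqnum % 2 = 1
ORDER BY cio.employeeid, cio.timeinout
""").fetchall()

for row in rows:
    print(row)
```

Employee 1 gets two in/out pairs for the day (including the rows in between the first and last punch), and employee 2 gets one.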

"First order by" in Teradata

I have a problem converting an SQL statement from Oracle to Teradata. The Oracle statement is:
SELECT ar.account_no,
MAX (ah.bal_acct) KEEP (DENSE_RANK FIRST ORDER BY ah.created_t desc)
FROM ar
JOIN ah ON ah.obj_id0 = ar.poid_Id0
JOIN acc ON acc.poid_id0 = ar.account_obj_Id0
WHERE acc.account_no = '1234'
AND ah.created_t <= 1434753495
GROUP BY ar.account_no
I need to write a similar statement in Teradata. I tried something like:
QUALIFY ROW_NUMBER() OVER( PARTITION BY max(ah.bal_acct) ORDER BY ah.created_t desc) = 1
But I keep getting the error: Selected non-aggregate values must be part of the associated group.
This is what I got:
Select ar.account_no, ah.created_t, ah.bal_acct
FROM VD_REPLICA_BRM.pi_tp_acct_ar_t ar
JOIN VD_REPLICA_BRM.pi_tp_acct_ar_hist_T ah ON ah.obj_id0 = ar.poid_Id0
JOIN VD_REPLICA_BRM.pi_account_t acc ON acc.poid_id0 = ar.account_obj_Id0
WHERE acc.account_no = '00003095660515'
AND ah.created_t <= CAST('2016-10-31' AS DATE FORMAT 'YYYY-MM-DD')
QUALIFY ROW_NUMBER() OVER( PARTITION BY max(ah.bal_acct) ORDER BY ah.created_t desc) = 1
GROUP BY ar.account_no
Where am I making a mistake?
I'm not sure if you can do this with qualify. An equivalent statement is:
SELECT t.account_no, t.created_t, t.bal_acct
FROM (SELECT ar.account_no, ah.created_t, ah.bal_acct,
ROW_NUMBER() OVER (PARTITION BY ar.account_no ORDER BY ah.created_t DESC) as seqnum
FROM ar JOIN
ah
ON ah.obj_id0 = ar.poid_Id0 JOIN
acc
ON acc.poid_id0 = ar.account_obj_Id0
WHERE acc.account_no = '1234' AND ah.created_t <= 1434753495
) t
WHERE seqnum = 1;
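Duh. You can do this with QUALIFY. The issue is the GROUP BY (together with the aggregate inside PARTITION BY): drop the GROUP BY and partition by the account number instead: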
SELECT ar.account_no, ah.created_t, ah.bal_acct
FROM VD_REPLICA_BRM.pi_tp_acct_ar_t ar JOIN
VD_REPLICA_BRM.pi_tp_acct_ar_hist_T ah
ON ah.obj_id0 = ar.poid_Id0 JOIN
VD_REPLICA_BRM.pi_account_t acc
ON acc.poid_id0 = ar.account_obj_Id0
WHERE acc.account_no = '00003095660515' AND
ah.created_t <= CAST('2016-10-31' AS DATE FORMAT 'YYYY-MM-DD')
QUALIFY ROW_NUMBER() OVER( PARTITION BY ar.account_no ORDER BY ah.created_t desc) = 1;
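SQLite has no QUALIFY, so this sketch (Python's sqlite3 module, hypothetical data) uses the derived-table form of the same ROW_NUMBER() filter to keep the latest balance per account up to a cutoff. The three joined tables are collapsed into one hypothetical acct_hist table for brevity. Note that ROW_NUMBER() picks one arbitrary row when two rows share the latest created_t, whereas Oracle's MAX(...) KEEP (DENSE_RANK FIRST ...) returns the largest bal_acct among those ties:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acct_hist (account_no TEXT, created_t INTEGER, bal_acct REAL);
INSERT INTO acct_hist VALUES
  ('1234', 1434700000, 10.0),
  ('1234', 1434750000, 25.5),
  ('1234', 1434800000, 99.0),   -- after the cutoff, filtered out
  ('5678', 1434600000, 7.0);
""")

rows = conn.execute("""
SELECT t.account_no, t.created_t, t.bal_acct
FROM (SELECT account_no, created_t, bal_acct,
             -- newest row per account gets seqnum = 1
             ROW_NUMBER() OVER (PARTITION BY account_no
                                ORDER BY created_t DESC) AS seqnum
      FROM acct_hist
      WHERE created_t <= ?) t
WHERE t.seqnum = 1
ORDER BY t.account_no
""", (1434753495,)).fetchall()

print(rows)
```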

SQL Server 2008 - Finding duplicates using ROW_NUMBER

I have the following SQL, which works to find duplicates:
SELECT *
FROM (SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x
WHERE rownum > 1
I would like to see only the rows where ROWNUM > 1, together with their corresponding row where ROWNUM = 1.
So basically, if a row has duplicates, I want to see the original row and all its duplicates.
If a row does not have duplicates, then I don't want to see it (it will have ROWNUM = 1).
How would I do this, please?
Cheers
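Use count(*) as a window function rather than row_number(). Every row in a group of duplicates gets the size of the whole group, so filtering on cnt > 1 returns the original row together with all its duplicates: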
SELECT *
FROM (SELECT id, ShipAddress, ShipZIPPostal,
COUNT(*) OVER (PARTITION BY shipaddress, shipzippostal) as cnt
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())
) x
WHERE cnt > 1;
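Here is a minimal sketch of the windowed COUNT(*) on SQLite through Python's sqlite3 module (SQLite 3.25+ assumed), with hypothetical orders and the orderdate filter left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, shipaddress TEXT, shipzippostal TEXT);
INSERT INTO orders VALUES
  (1, '1 Main St', '10001'),
  (2, '1 Main St', '10001'),
  (3, '9 Oak Ave', '20002'),
  (4, '5 Elm Rd',  '30003'),
  (5, '5 Elm Rd',  '30003'),
  (6, '5 Elm Rd',  '30003');
""")

# Every row carries the size of its (shipaddress, shipzippostal) group.
duplicate_ids = [r[0] for r in conn.execute("""
SELECT id
FROM (SELECT id, shipaddress, shipzippostal,
             COUNT(*) OVER (PARTITION BY shipaddress, shipzippostal) AS cnt
      FROM orders) x
WHERE cnt > 1
ORDER BY id
""")]

print(duplicate_ids)
```

Only order 3 has a unique address, so it is the only one missing from the result.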
In addition to Gordon's answer, if you want to keep the row_number() approach for some academic reason, you can do this:
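SELECT *
FROM (SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x
WHERE EXISTS(
-- a derived table alias (x) cannot be reused inside the subquery,
-- so the numbered set is repeated here as x2
SELECT * FROM (SELECT
shipaddress,
shipzippostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())) x2
WHERE x.shipaddress=x2.shipaddress
AND x.shipzippostal=x2.shipzippostal
AND x2.ROWNUM>1
)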
Personally, I'd prefer a CTE structure like this, since a CTE can be referenced twice in the same query:
WITH cte AS (
SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY shipaddress) ROWNUM
FROM orders
WHERE CONVERT(date, orderdate) = CONVERT(date, GETDATE())
)
SELECT *
FROM cte
WHERE EXISTS(
SELECT * FROM cte x2
WHERE cte.shipaddress=x2.shipaddress
AND cte.shipzippostal=x2.shipzippostal
AND x2.ROWNUM>1
)
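You could add a second row_number() that numbers the same group in the opposite order (by id descending) and combine the two. For any row, ROWNUM1 + ROWNUM2 - 1 equals the number of rows in its group, so a row belongs to a group of duplicates exactly when that value exceeds 1. (Testing ROWNUM1 <> ROWNUM2 alone is not enough: in a group with an odd number of rows, the middle row has ROWNUM1 = ROWNUM2 and would be dropped.)
SELECT
*
FROM
(SELECT
id,
ShipAddress,
ShipZIPPostal,
ROW_NUMBER() OVER (PARTITION BY shipaddress,shipzippostal ORDER BY id) ROWNUM1,
ROW_NUMBER() OVER (PARTITION BY shipaddress,shipzippostal ORDER BY id DESC) ROWNUM2
FROM
orders
WHERE
CONVERT(DATE,orderdate) = CONVERT(DATE,GETDATE())
) x
WHERE
-- ROWNUM1 + ROWNUM2 - 1 is the size of the row's group
ROWNUM1 + ROWNUM2 - 1 > 1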
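The odd-group pitfall is easy to demonstrate. This sketch (Python's sqlite3 module, hypothetical orders, SQLite 3.25+ assumed) compares the plain ROWNUM1 <> ROWNUM2 test with the group-size test on data that includes a group of three identical addresses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, shipaddress TEXT, shipzippostal TEXT);
INSERT INTO orders VALUES
  (1, '1 Main St', '10001'),
  (2, '1 Main St', '10001'),
  (3, '9 Oak Ave', '20002'),
  (4, '5 Elm Rd',  '30003'),
  (5, '5 Elm Rd',  '30003'),
  (6, '5 Elm Rd',  '30003');
""")


def dup_ids(condition):
    # condition is one of the two fixed test expressions below
    sql = f"""
    SELECT id FROM (
      SELECT id,
             ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY id) AS rownum1,
             ROW_NUMBER() OVER (PARTITION BY shipaddress, shipzippostal ORDER BY id DESC) AS rownum2
      FROM orders) x
    WHERE {condition}
    ORDER BY id
    """
    return [r[0] for r in conn.execute(sql)]


naive = dup_ids("rownum1 <> rownum2")        # misses id 5 (middle of a group of 3)
fixed = dup_ids("rownum1 + rownum2 - 1 > 1")  # group size > 1
print(naive)
print(fixed)
```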