SQL subquery question - sql

I have the following SQL
SELECT
Seq.UserSessionSequenceID,
Usr.SessionGuid,
Usr.UserSessionID,
Usr.SiteID,
Seq.Timestamp,
Seq.UrlTitle,
Seq.Url
FROM
tblUserSession Usr
INNER JOIN
tblUserSessionSequence Seq ON Usr.UserSessionID = Seq.UserSessionID
WHERE
(Usr.Timestamp > DATEADD(mi, -45, GETDATE())) AND (Usr.SiteID = 15)
ORDER BY Usr.Timestamp DESC
Pretty simple stuff. There are by nature multiple UserSessionIDs rows in tblUserSessionSequence. I ONLY want to return the latest (top 1) row with unique UserSessionID. How do I do that?

You can use the windowing function ROW_NUMBER to number the rows for each user and select only those rows that have row number 1.
SELECT
UserSessionSequenceID,
SessionGuid,
UserSessionID,
SiteID,
Timestamp,
UrlTitle,
Url
FROM (
SELECT
Seq.UserSessionSequenceID,
Usr.SessionGuid,
Usr.UserSessionID,
Usr.SiteID,
Usr.Timestamp AS UsrTimestamp,
Seq.Timestamp,
Seq.UrlTitle,
Seq.Url,
ROW_NUMBER() OVER (PARTITION BY Usr.UserSessionID
ORDER BY Seq.UserSessionSequenceID DESC) AS rn
FROM
tblUserSession Usr
INNER JOIN
tblUserSessionSequence Seq ON Usr.UserSessionID = Seq.UserSessionID
WHERE
(Usr.Timestamp > DATEADD(mi, -45, GETDATE())) AND (Usr.SiteID = 15)
) T1
WHERE rn = 1
ORDER BY UsrTimestamp DESC

If you're looking to return only a single row in your query (i.e., the ID with the latest timestamp), just change
SELECT
to
SELECT TOP 1
If you're looking to obtain a single row for each UserSessionID, but you want to ensure that you get the one with the latest TimeStamp, that's slightly more complex.
You could do something like this:
SELECT
Seq.UserSessionSequenceID,
Usr.SessionGuid,
Usr.UserSessionID,
Usr.SiteID,
Seq.Timestamp,
Seq.UrlTitle,
Seq.Url
FROM
tblUserSession Usr
INNER JOIN
(SELECT
UserSessionSequenceID,
UserSessionID,
Timestamp,
UrlTitle,
Url,
ROW_NUMBER() over (PARTITION BY UserSessionID ORDER BY UserSessionSequenceID) AS nbr
FROM tblUserSessionSequence) Seq ON Usr.UserSessionID = Seq.UserSessionID AND Seq.nbr = 0
WHERE
(Usr.Timestamp > DATEADD(mi, -45, GETDATE())) AND (Usr.SiteID = 15)
ORDER BY Usr.Timestamp DESC

Related

SQL Server LEAD function

-- FIRST LOGIN DATE
WITH CTE_FIRST_LOGIN AS
(
SELECT
PLAYER_ID, EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE ASC) AS RN
FROM
ACTIVITY
),
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS
(
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM
ACTIVITY A
JOIN
CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE
NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY
A.PLAYER_ID
)
-- FRACTION
SELECT
NULLIF(ROUND(1.00 * COUNT(CTE_CONSEC.PLAYER_ID) / COUNT(DISTINCT PLAYER_ID), 2), 0) AS FRACTION
FROM
ACTIVITY
JOIN
CTE_CONSEC_PLAYERS CTE_CONSEC ON CTE_CONSEC.PLAYER_ID = ACTIVITY.PLAYER_ID
I am getting the following error when I run this query.
[42S22] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid column name 'NEXT_DATE'. (207) (SQLExecDirectW)
This is a leetcode medium question 550. Game Play Analysis IV. I wanted to know why it can't identify the column NEXT_DATE here and what am I missing? Thanks!
The problem is in this CTE:
-- CONSECUTIVE LOGINS prep
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
WHERE NEXT_DATE = DATEADD(DAY, 1, A.EVENT_DATE) AND C.RN = 1
GROUP BY A.PLAYER_ID
)
Note that you are creating NEXT_DATE as a column alias in this CTE but also referring to it in the WHERE clause. This is invalid because by SQL clause-ordering rules the NEXT_DATE column alias does not exist until you get to the ORDER BY clause which is the last evaluated clause in a SQL query or subquery. You don't have an ORDER BY clause in this subquery, so technically the NEXT_DATE column alias only exists to [sub]queries that both come after and reference your CTE_CONSEC_PLAYERS CTE.
To fix this you'd probably want two CTEs like this (untested):
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS_pre AS (
SELECT
PLAYER_ID,
RN,
EVENT_DATE,
LEAD(EVENT_DATE,1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) NEXT_DATE
FROM ACTIVITY A
JOIN CTE_FIRST_LOGIN C ON A.PLAYER_ID = C.PLAYER_ID
)
-- CONSECUTIVE LOGINS
CTE_CONSEC_PLAYERS AS (
SELECT
PLAYER_ID,
MAX(NEXT_DATE) AS NEXT_DATE,
FROM CTE_CONSEC_PLAYERS_pre
WHERE NEXT_DATE = DATEADD(DAY, 1, EVENT_DATE) AND RN = 1
GROUP BY PLAYER_ID
)
You gave every table an alias (for example JOIN CTE_FIRST_LOGIN C has the alias C), and every column access is via the alias. You need to add the correct alias from the correct table to NEXT_DATE.
Your primary issue is that NEXT_DATE is a window function, and therefore cannot be referred to in the WHERE because of SQL's order of operations.
But it seems this query is over-complicated.
The problem to be solved appears to be: how many players logged in the day after they first logged in, as a percentage of all players.
This can be done in a single pass (no joins), by using multiple window functions together:
WITH CTE_FIRST_LOGIN AS (
SELECT
PLAYER_ID,
EVENT_DATE,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID ORDER BY EVENT_DATE) AS RN,
-- if EVENT_DATE is a datetime and can have multiple per day then group by CAST(EVENT_DATE AS date) first
LEAD(EVENT_DATE, 1) OVER (PARTITION BY EVENT_DATE ORDER BY EVENT_DATE) AS NextDate
FROM ACTIVITY
),
BY_PLAYERS AS (
SELECT
c.PLAYER_ID,
SUM(CASE WHEN c.RN = 1 AND c.NextDate = DATEADD(DAY, 1, c.EVENT_DATE)
THEN 1 END) AS IsConsecutive
FROM CTE_FIRST_LOGIN AS c
GROUP BY c.PLAYER_ID
)
SELECT ROUND(
1.00 *
COUNT(c.IsConsecutive) /
NULLIF(COUNT(*), 0)
,2) AS FRACTION
FROM BY_PLAYERS AS c;
You could theoretically merge BY_PLAYERS into the outer query and use COUNT(DISTINCT but splitting them feels cleaner

SQL select distinct from a concatenated column

This query almost does what I want
SELECT staging.dbo.ITEM_CODES.ITEM_CODE, MAX(dbo.OC_VDAT_AUX.UDL40) AS SAMPLEDATE,
CONCAT(RTRIM(dbo.OC_VDATA.UDL1), RTRIM(dbo.OC_VDATA.UDL6)) as LinkID
FROM dbo.OC_VDATA
INNER JOIN dbo.OC_VDAT_AUX ON dbo.OC_VDATA.PARTNO = dbo.OC_VDAT_AUX.PARTNOAUX AND dbo.OC_VDATA.DATETIME = dbo.OC_VDAT_AUX.DATETIMEAUX
INNER JOIN stagingPLM.dbo.ITEM_CODES ON LEFT(dbo.OC_VDATA.PARTNO, 12) = staging.dbo.ITEM_CODES.SPEC_NO
AND LEFT(dbo.OC_VDAT_AUX.PARTNOAUX, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
INNER JOIN stagingPLM.dbo.PLANTS ON dbo.OC_VDATA.UDL1 = staging.dbo.PLANTS.PLANT_CODE
WHERE (CONVERT(DATETIME, dbo.OC_VDAT_AUX.UDL40) > DATEADD(day, - 30, GETDATE()))
GROUP BY CONCAT(RTRIM(dbo.OC_VDATA.UDL1), RTRIM(dbo.OC_VDATA.UDL6)),staging.dbo.ITEM_CODES.ITEM_CODE
Sample Table generated by query:
The end result that I am trying to achieve is the latest ITEM_CODE per unique LinkID Note the first and last rows in the table. The last row should not be pulled by the query.
How do I modify this query to make that happen?
I have tried various placements for DISTINCT and sub queries in the select and where statements.
I would do in your case with ROW_NUMBER window function and CTE.
Solution can be like this:
WITH FilterCTE AS
(
SELECT staging.dbo.ITEM_CODES.ITEM_CODE, MAX(dbo.OC_VDAT_AUX.UDL40) AS SAMPLEDATE,
CONCAT(RTRIM(dbo.OC_VDATA.UDL1), RTRIM(dbo.OC_VDATA.UDL6)) AS LinkID,
ROW_NUMBER() OVER (PARTITION BY CONCAT(RTRIM(dbo.OC_VDATA.UDL1), RTRIM(dbo.OC_VDATA.UDL6)) ORDER BY MAX(dbo.OC_VDAT_AUX.UDL40)) AS RowNumber
FROM dbo.OC_VDATA
INNER JOIN dbo.OC_VDAT_AUX ON dbo.OC_VDATA.PARTNO = dbo.OC_VDAT_AUX.PARTNOAUX AND dbo.OC_VDATA.DATETIME = dbo.OC_VDAT_AUX.DATETIMEAUX
INNER JOIN stagingPLM.dbo.ITEM_CODES ON LEFT(dbo.OC_VDATA.PARTNO, 12) = staging.dbo.ITEM_CODES.SPEC_NO
AND LEFT(dbo.OC_VDAT_AUX.PARTNOAUX, 12) = stagingPLM.dbo.ITEM_CODES.SPEC_NO
INNER JOIN stagingPLM.dbo.PLANTS ON dbo.OC_VDATA.UDL1 = staging.dbo.PLANTS.PLANT_CODE
WHERE (CONVERT(DATETIME, dbo.OC_VDAT_AUX.UDL40) > DATEADD(day, - 30, GETDATE()))
GROUP BY CONCAT(RTRIM(dbo.OC_VDATA.UDL1), RTRIM(dbo.OC_VDATA.UDL6)),staging.dbo.ITEM_CODES.ITEM_CODE
)
SELECT *
FROM FilterCTE
WHERE RowNumber = 1

Query to do DATEDIFF between fields on different rows

I need to add a DATEDIFF to a query that gives me the hours between the current row's field, and the previous row's same field.
EDIT: {Should I ORDER the entire query by ROUTED_DTM DESC, as well as making the ORDER BY in the DATEDIFF DESC?
On one row I have a ROUTED_DTM of '2019-05-07 15:36:13.000', the row above has a ROUTED_DTM of '2019-05-01 14:19:52.000'. I would expect AGE_IN_ROLE_DAY, AGE_IN_ROLE_HR, AGE_IN_ROLE_MIN, AGE_IN_ROLE_SEC to be 6, 1, 16, and 21 (in order). However, I get 0, 0, 0, -2.}
SELECT c.ID,
c.PAID_DT,
DATEDIFF(dd,
CASE WHEN c.ID_ADJ_FROM = '' THEN c.RECD_DT ELSE c.INPUT_DT END,
CASE WHEN c.PAID_DT = '1/1/1753' THEN CONVERT(DATE,GETDATE()) ELSE c.PAID_DT END) + 1 AS DAYS_OLD
DATEDIFF(dd, h.ROUTED_DTM, LAG(h.ROUTED_DTM) OVER (ORDER BY h.ROUTED_DTM DESC)) AS AGE_IN_ROLE_DAY,
DATEDIFF(hh, h.ROUTED_DTM, LAG(h.ROUTED_DTM) OVER (ORDER BY h.ROUTED_DTM DESC)) AS AGE_IN_ROLE_HR,
DATEDIFF(MM, h.ROUTED_DTM, LAG(h.ROUTED_DTM) OVER (ORDER BY h.ROUTED_DTM DESC)) AS AGE_IN_ROLE_MIN,
DATEDIFF(ss, h.ROUTED_DTM, LAG(h.ROUTED_DTM) OVER (ORDER BY h.ROUTED_DTM DESC)) AS AGE_IN_ROLE_SEC,
h.QUEUE_ID,
h.QUEUE_DESC,
h.ROLE_ID,
h.ROLE_DESC,
h.ROUTED_DTM
FROM table1 c
LEFT JOIN table2 h
ON h.ID = c.ID
LEFT JOIN table 3 q
ON q.QUEUE_ID = h.QUEUE_ID
LEFT JOIN table4 r
ON r.ROLE_ID = h.ROLE_ID
ORDER BY c.ID, h.ROUTED_DTM DESC
I want to add a DATEDIFF(s) before the h.QUEUE_ID column that gives the difference between the current row's h.ROUTED_DTM, and the previous row's h.ROUTED_DTM
Currently, the query returns the correct results, however, I am not sure how to add the new DATEDIFF to each row.
You can use lag():
datediff(day, routed_dtm, lag(routed_dtm) over (order by routed_dtm))
You might also want partition by c.id in the window clause.

How to select the last 12 months in sql?

I need to select the last 12 months. As you can see on the picture, May occurs two times.
But I only want it to occur once. And it needs to be the newest one.
Plus, the table should stay in this structure, with the latest month on the bottom.
And this is the query:
SELECT Monat2,
Monat,
CASE WHEN NPLAY_IND = '4P'
THEN 'QuadruplePlay'
WHEN NPLAY_IND = '3P'
THEN 'TriplePlay'
WHEN NPLAY_IND = '2P'
THEN 'DoublePlay'
WHEN NPLAY_IND = '1P'
THEN 'SinglePlay'
END AS Series,
Anzahl as Cnt
FROM T_Play_n
where NPLAY_IND != '0P'
order by Series asc ,Monat
This is the new query
SELECT sub.Monat2,sub.Monat,
CASE WHEN NPLAY_IND = '4P'
THEN 'QuadruplePlay'
WHEN NPLAY_IND = '3P'
THEN 'TriplePlay'
WHEN NPLAY_IND = '2P'
THEN 'DoublePlay'
WHEN NPLAY_IND = '1P'
THEN 'SinglePlay'
END
AS Series, Anzahl as Cnt FROM (SELECT ROW_NUMBER () OVER (PARTITION BY Monat2 ORDER BY Monat DESC)rn,
Monat2,
Monat,
Anzahl,
NPLAY_IND
FROM T_Play_n)sub
where sub.rn = 1
It does only show the months once but it doesn't do that for every Series.
So with every Play it should have 12 months.
In Oracle and SQL-Server you can use ROW_NUMBER.
name = month name and num = month number:
SELECT sub.name, sub.num
FROM (SELECT ROW_NUMBER () OVER (PARTITION BY name ORDER BY num DESC) rn,
name,
num
FROM tab) sub
WHERE sub.rn = 1
ORDER BY num DESC;
WITH R(N) AS
(
SELECT 0
UNION ALL
SELECT N+1
FROM R
WHERE N < 12
)
SELECT LEFT(DATENAME(MONTH,DATEADD(MONTH,-N,GETDATE())),3) AS [month]
FROM R
The With R(N) is a Common Table Expression.The R is the name of the result set (or table) that you are generating. And the N is the month number.
In SQL Server you can do It in following:
SELECT DateMonth, DateWithMonth -- Specify columns to select
FROM Tbl -- Source table
WHERE CAST(CAST(DateWithMonth AS INT) * 100 + 1 AS VARCHAR(20)) >= DATEADD(MONTH, -12,GETDATE()) -- Condition to return data for last 12 months
GROUP BY DateMonth, DateWithMonth -- Uniqueness
ORDER BY DateWithMonth -- Sorting to get latest records on the bottom
So it sounds like you want to select rows that contain the last occurrence of months. Something like this should work:
select * from [table_name]
where id in (select max(id) from [table_name] group by [month_column])
The last select in the brackets will get a list of id's for the last occurrence of each month. If the year+month column you have shown is not in descending order already, you might want to max this column instead.
You can use something like this(the table dbo.Nums contains int values from 0 to 11)
SELECT DATEADD(MONTH, DATEDIFF(MONTH, '19991201', CURRENT_TIMESTAMP) + n - 12, '19991201'),
DATENAME(MONTH,DateAdd(Month, DATEDIFF(month, '19991201', CURRENT_TIMESTAMP) + n - 12, '19991201'))
FROM dbo.Nums
I suggest to use a group by for the month name, and a max function for the numeric component. If is not numeric, use to_number().

How to determine if two records are 1 year apart (using a timestamp)

I need to analyze some weblogs and determine if a user has visited once, taken a year break, and visited again. I want to add a flag to every row (Y/N) with a VisitId that meets the above criteria.
How would I go about creating this sql?
Here are the fields I have, that I think need to be used (by analyzing the timestamp of the first page of each visit):
VisitID - each visit has a unique Id (ie. 12356, 12345, 16459)
UserID - each user has one Id (ie. steve = 1, ted = 2, mark = 12345, etc...)
TimeStamp - looks like this: 2010-01-01 00:32:30.000
select VisitID, UserID, TimeStamp from page_view_t where pageNum = 1;
thanks - any help would be greatly appreciated.
You could rank every user's rows, then join the ranked row set to itself to compare adjacent rows:
;
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY TimeStamp)
FROM page_view_t
),
flagged AS (
SELECT
*,
IsReturnVisit = CASE
WHEN EXISTS (
SELECT *
FROM ranked
WHERE UserID = r.UserID
AND rnk = r.rnk - 1
AND TimeStamp <= DATEADD(YEAR, -1, r.TimeStamp)
)
THEN 'Y'
ELSE 'N'
END
FROM ranked r
)
SELECT
VisitID,
UserID,
TimeStamp,
IsReturnVisit
FROM flagged
Note: the above flags only return visits.
UPDATE
To flag the first visits same as return visits, the flagged CTE could be modified as follows:
…
SELECT
*,
IsFirstOrReturnVisit = CASE
WHEN p.UserID IS NULL OR r.TimeStamp >= DATEADD(YEAR, 1, p.TimeStamp)
THEN 'Y'
ELSE 'N'
END
FROM ranked r
LEFT JOIN ranked p ON r.UserID = p.UserID AND r.rnk = p.rnk + 1
…
References that might be useful:
WITH common_table_expression (Transact-SQL)
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
The other guy was faster but since I took time to do it and it's a completely different approach I might as well post It :D.
SELECT pv2.VisitID,
pv2.UserID,
pv2.TimeStamp,
CASE WHEN pv1.VisitID IS NOT NULL
AND pv3.VisitID IS NULL
THEN 'YES' ELSE 'NO' END AS IsReturnVisit
FROM page_view_t pv2
LEFT JOIN page_view_t pv1 ON pv1.UserID = pv2.UserID
AND pv1.VisitID <> pv2.VisitID
AND (pv1.TimeStamp <= DATEADD(YEAR, -1, pv2.TimeStamp)
OR pv2.TimeStamp <= DATEADD(YEAR, -1, pv1.TimeStamp))
AND pv1.pageNum = 1
LEFT JOIN page_view_t pv3 ON pv1.UserID = pv3.UserID
AND (pv3.TimeStamp BETWEEN pv1.TimeStamp AND pv2.TimeStamp
OR pv3.TimeStamp BETWEEN pv2.TimeStamp AND pv1.TimeStamp)
AND pv3.pageNum = 1
WHERE pv2.pageNum = 1
Assuming page_view_t table stores UserID and TimeStamp details of each visit of the user, the following query will return users who have visited taking a break of at least an year (365 days) between two consecutive visits.
select t1.UserID
from page_view_t t1
where (
select datediff(day, max(t2.[TimeStamp]), t1.[TimeStamp])
from page_view_t t2
where t2.UserID = t1.UserID and t2.[TimeStamp] < t1.[TimeStamp]
group by t2.UserID
) >= 365