Get id based on MAX(Date) for each user [duplicate] - sql

I'm trying to query for the last report each user read and the date it was read.
UserReport
UserId, ReportId, DateRead
1, 2, 2018-01-01
1, 1, 2015-02-12
2, 3, 2016-03-11
3, 2, 2017-04-10
1, 3, 2016-01-01
2, 1, 2018-02-02
To get this for a specific user, I can run a query like this:
SELECT TOP 1 *
FROM UserReport
WHERE UserId = 1
ORDER BY DateRead DESC
But I'm having trouble figuring out how to do this for each user. What is throwing me off is TOP 1.
Expected Result:
UserId, ReportId, DateRead
1, 2, 2018-01-01
2, 1, 2018-02-02
3, 2, 2017-04-10

You could use:
SELECT TOP 1 WITH TIES *
FROM UserReport
ORDER BY ROW_NUMBER() OVER(PARTITION BY UserId ORDER BY DateRead DESC)
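To see why this returns one row per user: ROW_NUMBER() restarts at 1 inside each UserId partition, so every user's latest row gets the value 1, and TOP 1 WITH TIES keeps all rows tied for the lowest ORDER BY value. The following is only a plain-Python trace of that logic on the sample data from the question; the function name top_one_with_ties is made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (UserId, ReportId, DateRead)
user_reports = [
    (1, 2, date(2018, 1, 1)),
    (1, 1, date(2015, 2, 12)),
    (2, 3, date(2016, 3, 11)),
    (3, 2, date(2017, 4, 10)),
    (1, 3, date(2016, 1, 1)),
    (2, 1, date(2018, 2, 2)),
]

def top_one_with_ties(rows):
    """Number rows per user by DateRead descending, keep all rows tied at the lowest number."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)          # PARTITION BY UserId

    numbered = []
    for group in partitions.values():
        ordered = sorted(group, key=lambda r: r[2], reverse=True)  # ORDER BY DateRead DESC
        numbered.extend(enumerate(ordered, start=1))               # ROW_NUMBER()

    if not numbered:
        return []
    lowest = min(rn for rn, _ in numbered)      # TOP 1 picks the smallest sort value ...
    return sorted(row for rn, row in numbered if rn == lowest)  # ... WITH TIES keeps every tie

latest = top_one_with_ties(user_reports)
for user_id, report_id, date_read in latest:
    print(user_id, report_id, date_read.isoformat())
```

If a user has two rows with the same latest DateRead, ROW_NUMBER() numbers them 1 and 2 in an arbitrary order, so only one of them is returned; RANK() would keep both.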

Related

Get Running Total of IDs in Presto SQL

I would like to keep a running/cumulative array of new IDs.
Starting with this:
Date  | IDs_Used_Today | New_IDs
Dec 6 | 1, 2, 3        | 1, 2, 3
Dec 7 | 1, 4           | 4
Dec 8 | 2, 3, 4        | 3
Dec 9 | 1, 2, 3, 5     | 5
And getting this:
Date  | IDs_Used_Today | New_IDs | All_IDs_To_Date
Dec 6 | 1, 2, 3        | 1, 2, 3 | 1, 2, 3
Dec 7 | 1, 4           | 4       | 1, 2, 3, 4
Dec 8 | 2, 3, 4        | null    | 1, 2, 3, 4
Dec 9 | 1, 2, 3, 5     | 5       | 1, 2, 3, 4, 5
I need to build "All_IDs_To_Date" from the previous row's "All_IDs_To_Date" plus the current row's "New_IDs". Done that way, the table will always be accurate as long as there is one previous row of data.
So basically a combination of CONCAT(LAG(All_IDs_To_Date), New_IDs) with an IF conditional: when there is no LAG(All_IDs_To_Date), use that date's "New_IDs" value.
It is very important that if old rows are deleted, the most recent rows keep the same data. For example, if I start with 10 stored rows whose last running total is "1,2,3,4,5" and then delete the first 9 rows, my next calculation would be based on that last stored row, so the running total would still build on the "1,2,3,4,5" that was previously stored.
Once you have unnested every element of "New_IDs", you can select the first time each element appears, then use ARRAY_AGG window function to compute a running array aggregation over your date. ARRAY_REMOVE is needed to remove null values, generated by days without new ids.
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY elements ORDER BY date_) AS rn
FROM tab, UNNEST(New_IDs) AS elements
)
SELECT DISTINCT date_, ids_used_today, new_ids,
ARRAY_REMOVE(ARRAY_AGG(CASE WHEN rn = 1 THEN elements END) OVER(ORDER BY date_), NULL) AS All_IDs_To_Date
FROM cte
ORDER BY date_
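The requirement in the question (previous running total plus the current day's new IDs, with the option of resuming from one stored row) can be traced outside SQL. The following is a minimal Python sketch, not Presto code; the function running_totals and its seed parameter are hypothetical names, and the input uses the New_IDs values from the second table in the question.

```python
def running_totals(days, seed=None):
    """Return (date, all_ids_to_date) pairs.

    days: list of (date, new_ids) in date order; new_ids may be None.
    seed: optional running total taken from the last stored row (assumption).
    """
    all_ids = list(seed or [])
    result = []
    for day, new_ids in days:
        for id_ in new_ids or []:      # a day with no new IDs adds nothing
            if id_ not in all_ids:     # keep each ID once, in first-seen order
                all_ids.append(id_)
        result.append((day, list(all_ids)))
    return result

days = [
    ("Dec 6", [1, 2, 3]),
    ("Dec 7", [4]),
    ("Dec 8", None),
    ("Dec 9", [5]),
]

totals = running_totals(days)

# Resuming after older rows were deleted: only the last stored total is needed.
resumed = running_totals([("Dec 9", [5])], seed=[1, 2, 3, 4])

for day, ids in totals:
    print(day, ids)
```

The second call shows the property the question asks for: as long as the last stored running total is available, the next row can be computed without the deleted history.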

Categorize website visitors starting from the first occasion, based on if condition

Could you please help me with a SQL statement, preferably one that works in BigQuery? I have 3 columns: userid, date, hostname. I need to create an additional column, client_type, based on the following condition: once a userid visits hostname = "online-store.com" for the first time, client_type for that userid is always "current_client" from that date on; otherwise it is "visitor".
For example, in the image (link attached) userid = 1 and userid = 4 have become "current client". User 4 was just a visitor, but after visiting hostname = "online-store.com" he will always be classified as "current client".
Below is for BigQuery Standard SQL
#standardSQL
SELECT
userid, date, hostname,
IF(0 = COUNTIF(hostname = 'online-store.com') OVER(
PARTITION BY userid ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), 'visitor', 'current_client') client_type
FROM `project.dataset.table`
You can test and play with the above using the dummy data you provided in your question:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2018-02-01' date, 'online-store.com' hostname UNION ALL
SELECT 2, '2018-02-01', 'other' UNION ALL
SELECT 3, '2018-02-01', 'other' UNION ALL
SELECT 4, '2018-02-01', 'other' UNION ALL
SELECT 1, '2018-02-01', 'other' UNION ALL
SELECT 1, '2018-04-07', 'other' UNION ALL
SELECT 4, '2018-04-08', 'online-store.com' UNION ALL
SELECT 5, '2018-04-08', 'other' UNION ALL
SELECT 6, '2018-04-08', 'other' UNION ALL
SELECT 4, '2018-04-08', 'other' UNION ALL
SELECT 8, '2018-04-08', 'other' UNION ALL
SELECT 1, '2018-07-07', 'other' UNION ALL
SELECT 1, '2018-11-22', 'online-store.com'
)
SELECT
userid, date, hostname,
IF(0 = COUNTIF(hostname = 'online-store.com') OVER(
PARTITION BY userid ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
), 'visitor', 'current_client') client_type
FROM `project.dataset.table`
ORDER BY date
with result
Row userid date hostname client_type
1 1 2018-02-01 online-store.com current_client
2 1 2018-02-01 other current_client
3 2 2018-02-01 other visitor
4 3 2018-02-01 other visitor
5 4 2018-02-01 other visitor
6 1 2018-04-07 other current_client
7 4 2018-04-08 online-store.com current_client
8 4 2018-04-08 other current_client
9 5 2018-04-08 other visitor
10 6 2018-04-08 other visitor
11 8 2018-04-08 other visitor
12 1 2018-07-07 other current_client
13 1 2018-11-22 online-store.com current_client
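The window expression above is a running count per user: once the count of 'online-store.com' visits is above zero it never drops back, which is what makes the label stick. A rough Python trace of the same idea, using a subset of the sample rows (the function name classify is made up for illustration):

```python
from collections import defaultdict

# Subset of the sample data: (userid, date, hostname)
visits = [
    (1, "2018-02-01", "online-store.com"),
    (2, "2018-02-01", "other"),
    (4, "2018-02-01", "other"),
    (1, "2018-04-07", "other"),
    (4, "2018-04-08", "online-store.com"),
    (4, "2018-04-08", "other"),
]

def classify(rows, target="online-store.com"):
    """Label each row using a per-user running count of visits to the target host."""
    store_visits = defaultdict(int)                  # running COUNTIF per userid
    labelled = []
    # sorted() is stable, so rows with the same date keep their input order
    for userid, day, hostname in sorted(rows, key=lambda r: r[1]):
        if hostname == target:
            store_visits[userid] += 1
        label = "visitor" if store_visits[userid] == 0 else "current_client"
        labelled.append((userid, day, hostname, label))
    return labelled

labelled = classify(visits)
for row in labelled:
    print(*row)
```

Note that when a user has several rows on the same date, the order among them is not defined by ORDER BY date alone, so a same-day row can land before or after the store visit.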
This should work:
#standardSQL
with userdates as (
select userid, hostname, min(date) as mindate
from `dataset.table`
where hostname = 'online-store.com'
group by userid, hostname
)
select u.userid, u.date, u.hostname,
case when u.date >= ud.mindate then 'current_client' else 'visitor' end as client_type
from `dataset.table` u
left outer join userdates ud on u.userid = ud.userid
order by 1, 2

Longest streak using Standard SQL

I have a table with fields:
user_id
tracking_date
with values
1, 2017-12-23
2, 2017-12-23
1, 2017-12-24
1, 2017-12-25
2, 2017-12-26
3, 2017-12-26
1, 2017-12-27
2, 2017-12-27
I would like to find the longest streak for all users as of today. The output of the query should look like this:
1, 1
2, 2
3, 0
Is there a way to get this output in a single SQL query?
This is tricky. For each user_id, you want the most recent date overall and the latest date that has no record on the previous day (that is, the start of the current streak):
select user_id,
(case when max(tracking_date) <> current_date then 0
else (current_date -
max(case when prev_td is distinct from tracking_date - interval '1 day'
then tracking_date
end) + 1 -- + 1 so that a streak consisting only of today counts as 1
)
end) as seq
from (select t.*,
lag(tracking_date) over (partition by user_id order by tracking_date) as prev_td
from t
) t
group by user_id;
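As a cross-check of the expected output in the question, the current streak can also be computed by walking backwards from today one day at a time. This is a different technique from the LAG-based query, shown here only as a small Python sketch; it assumes "today" is 2017-12-27, the last date in the sample data, and the function name current_streak is made up.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the question: (user_id, tracking_date)
tracking = [
    (1, date(2017, 12, 23)),
    (2, date(2017, 12, 23)),
    (1, date(2017, 12, 24)),
    (1, date(2017, 12, 25)),
    (2, date(2017, 12, 26)),
    (3, date(2017, 12, 26)),
    (1, date(2017, 12, 27)),
    (2, date(2017, 12, 27)),
]

def current_streak(days, today):
    """Count consecutive tracked days ending today; 0 if today is not tracked."""
    streak = 0
    day = today
    while day in days:
        streak += 1
        day -= timedelta(days=1)
    return streak

today = date(2017, 12, 27)  # assumption: the query runs on the last sample date

days_by_user = defaultdict(set)
for user_id, tracking_date in tracking:
    days_by_user[user_id].add(tracking_date)

streaks = {user: current_streak(days, today) for user, days in sorted(days_by_user.items())}
print(streaks)
```

Under that assumption the result is 1 for user 1, 2 for user 2 and 0 for user 3, the same values the question lists.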

I'd like to group by number of days (+ or -) and use min date

ID Date Count
1, 2014-05-01 1
1, 2014-05-04 1
1, 2014-05-10 1
2, 2014-05-02 1
2, 2014-05-03 1
2, 2014-05-09 1
If I were to group rows where the time difference is within +/- 5 days, this would become
ID Date Count
1, 2014-05-01 2
1, 2014-05-10 1
2, 2014-05-02 2
2, 2014-05-09 1
Is this possible in SQL Server 2012? Any pointers would be greatly appreciated. Thanks
I think you want to start a new group when there is a gap of five days. So, if you had a record with (1, 2014-05-07), then you would have only one group for 1.
If so, the following will work:
select id, min(date), sum(count)
from (select t.*, sum(HasGap) over (partition by id order by date) as grpid
from (select t.*,
(case when datediff(day,
lag(date) over (partition by id order by date),
date) < 5
then 0 else 1
end) as HasGap
from table t
) t
) t
group by id, grpid;
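The query flags each row that starts a new group (no previous row for the id, or a gap of 5 days or more), then turns the running sum of those flags into a group number. A minimal Python sketch of the same gap-based grouping on the sample data; the function name group_by_gap and the max_gap_days parameter are illustrative only.

```python
from datetime import date

# Sample data from the question: (ID, Date, Count)
rows = [
    (1, date(2014, 5, 1), 1),
    (1, date(2014, 5, 4), 1),
    (1, date(2014, 5, 10), 1),
    (2, date(2014, 5, 2), 1),
    (2, date(2014, 5, 3), 1),
    (2, date(2014, 5, 9), 1),
]

def group_by_gap(rows, max_gap_days=5):
    """Merge consecutive rows of the same id while the gap to the previous row is under max_gap_days."""
    groups = []
    prev_id = prev_date = None
    for id_, day, count in sorted(rows):             # order by id, date
        starts_group = id_ != prev_id or (day - prev_date).days >= max_gap_days
        if starts_group:
            groups.append([id_, day, count])          # min(date) is the first date of the group
        else:
            groups[-1][2] += count                    # sum(count) within the group
        prev_id, prev_date = id_, day
    return [tuple(g) for g in groups]

grouped = group_by_gap(rows)
for id_, first_date, total in grouped:
    print(id_, first_date.isoformat(), total)
```

Adding a row (1, 2014-05-07) would leave gaps of 3 days on both sides of it, so all four rows for id 1 would collapse into one group, as described above.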

How to group by Date Range starting from initial date

I have the following table structure
Key int
MemberID int
VisitDate DateTime
How can I group all the dates that fall within a given date range, say 15 days? The first visit for the same member should be considered the starting date.
e.g.
Key ID VisitDate(MM/dd/YY)
1 1 02/01/11
2 1 02/09/11
3 1 02/12/11
4 1 02/17/11
5 2 02/03/11
6 2 02/19/11
In this case the result should be
ID StartDate EndDate
1 02/01/11 02/12/11
1 02/17/11 02/17/11
2 02/03/11 02/03/11
2 02/19/11 02/19/11
One way to do this would be to use a window aggregate. Here's how:
Setup:
DECLARE @data TABLE (
[Key] int, ID int, VisitDate date
);
INSERT INTO @data ([Key], ID, VisitDate)
SELECT 1, 1, '02/01/2011' UNION ALL
SELECT 2, 1, '02/09/2011' UNION ALL
SELECT 3, 1, '02/12/2011' UNION ALL
SELECT 4, 1, '02/17/2011' UNION ALL
SELECT 5, 2, '02/03/2011' UNION ALL
SELECT 6, 2, '02/19/2011';
Query:
WITH marked AS (
SELECT
*,
Grp = DATEDIFF(DAY, MIN(VisitDate) OVER (PARTITION BY ID), VisitDate) / 15
FROM @data
)
)
SELECT
ID,
StartDate = MIN(VisitDate),
EndDate = MAX(VisitDate)
FROM marked
GROUP BY ID, Grp
ORDER BY ID, StartDate
Output:
ID StartDate EndDate
----------- ---------- ----------
1 2011-02-01 2011-02-12
1 2011-02-17 2011-02-17
2 2011-02-03 2011-02-03
2 2011-02-19 2011-02-19
Basically, for each row, the query calculates the difference in days between VisitDate and the first VisitDate for the same ID and divides it by 15. The result is then used as a grouping criterion. Note that SQL Server uses integer division when both operands of the / operator are integers.
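The bucketing arithmetic can be checked by hand with a short Python sketch that uses floor division in place of SQL Server's integer division. The function name bucket_visits and the window_days parameter are made up for illustration; the data is the sample from the question.

```python
from datetime import date

# Sample data from the question: (ID, VisitDate)
visits = [
    (1, date(2011, 2, 1)),
    (1, date(2011, 2, 9)),
    (1, date(2011, 2, 12)),
    (1, date(2011, 2, 17)),
    (2, date(2011, 2, 3)),
    (2, date(2011, 2, 19)),
]

def bucket_visits(rows, window_days=15):
    """Group visits into fixed windows counted from each member's first visit."""
    first_visit = {}
    for member, day in rows:                          # MIN(VisitDate) OVER (PARTITION BY ID)
        first_visit[member] = min(first_visit.get(member, day), day)

    ranges = {}
    for member, day in rows:
        grp = (day - first_visit[member]).days // window_days   # DATEDIFF(...) / 15
        start, end = ranges.get((member, grp), (day, day))
        ranges[(member, grp)] = (min(start, day), max(end, day))

    return sorted((member, start, end) for (member, _), (start, end) in ranges.items())

result = bucket_visits(visits)
for member, start, end in result:
    print(member, start.isoformat(), end.isoformat())
```

One thing to keep in mind: the windows are fixed 15-day slots measured from the member's first visit. They do not restart at the first visit after a gap, so for visits on days 0, 20 and 31 this approach yields three groups, whereas a window restarted at day 20 would also cover day 31.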