Time Between First and Second Records SQL - sql

I am trying to calculate the time between the first and second records. My thought was to add a ranking for each record and then do a calculation on RN 2 - RN 1. I'm struggling to actually get the subquery to do RN2-RN1.
SAMPLE Data:
user_id
date
rn
698998737289929044
2021-04-08 11:27:38
1
698998737289929044
2021-04-08 12:20:25
2
698998737289929044
2021-04-01 13:23:59
3
732850336550572910
2021-03-23 06:13:25
1
598830651911547855
2021-03-11 11:56:53
1
SELECT
user_id,
date,
row_number() over(partition by user_id order by date) as RN
FROM event_table
GROUP BY user_id, date

You can join the result with itself to get the first and second row.
For example:
with
q as (
-- your query here
)
select
f.user_id,
f.date,
s.date - f.date as diff
from q f
left join q s on s.user_id = f.user_id and s.rn = 2
where f.rn = 1

Related

How to fill table with missed dates in PostgreSQL with previous data

I have a table:
date
user_id
state
8/12/2021
1
visit
9/12/2021
1
registered
12/12/2021
1
order
In this table I only have updated of state of users, but I don't see the state by some particular date. How can I add rows with missing dates and fill them with previous value, so that the table will be:
date
user_id
state
8/12/2021
1
visit
9/12/2021
1
registered
10/12/2021
1
registered
11/12/2021
1
registered
12/12/2021
1
order
Here's one attempt. The cte user_dates gets min and max dates for each user that is then fed to generate_series. I.e. each user is associated with all dates between there first and last date.
In the inner select we create a group for each first_value and consecutive null states.
In the outer select we pick the first_value for each such grp.
with user_dates(f, t, user_id) as (
select min(T.dt), max(T.dt), user_id
from T
group by user_id
)
select user_id, dt, grp, first_value(state) over (partition by user_id, grp order by dt)
from (
select ud.user_id
, cal.dt::date
, state
, count(T.state) over (partition by user_id
order by cal.dt) as grp
from user_dates ud
cross join generate_series(ud.f::timestamp, ud.t::timestamp , interval '1 day') cal (dt)
left join T
using (dt, user_id)
) as tmp
order by user_id, dt
;
user_id dt grp first_value
1 2021-12-08 1 visit
1 2021-12-09 2 registered
1 2021-12-10 2 registered
1 2021-12-11 2 registered
1 2021-12-12 3 order
You can remove grp from the select, it's merely there for informative purposes.
Fiddle

SQL: How to create a daily view based on different time intervals using SQL logic?

Here is an example:
Id|price|Date
1|2|2022-05-21
1|3|2022-06-15
1|2.5|2022-06-19
Needs to look like this:
Id|Date|price
1|2022-05-21|2
1|2022-05-22|2
1|2022-05-23|2
...
1|2022-06-15|3
1|2022-06-16|3
1|2022-06-17|3
1|2022-06-18|3
1|2022-06-19|2.5
1|2022-06-20|2.5
...
Until today
1|2022-08-30|2.5
I tried using the lag(price) over (partition by id order by date)
But i can't get it right.
I'm not familiar with Azure, but it looks like you need to use a calendar table, or generate missing dates using a recursive CTE.
To get started with a recursive CTE, you can generate line numbers for each id (assuming multiple id values) in the source data ordered by date. These rows with row number equal to 1 (with the minimum date value for the corresponding id) will be used as the starting point for the recursion. Then you can use the DATEADD function to generate the row for the next day. To use the price values ​​from the original data, you can use a subquery to get the price for this new date, and if there is no such value (no row for this date), use the previous price value from CTE (use the COALESCE function for this).
For SQL Server query can look like this
WITH cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATEADD(d, 1, cte.date),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATEADD(d, 1, cte.date)),
cte.price
)
FROM cte
WHERE DATEADD(d, 1, cte.date) <= GETDATE()
)
SELECT * FROM cte
ORDER BY id, date
OPTION (MAXRECURSION 0)
Note that I added OPTION (MAXRECURSION 0) to make the recursion run through all the steps, since the default value is 100, this is not enough to complete the recursion.
db<>fiddle here
The same approach for MySQL (you need MySQL of version 8.0 to use CTE)
WITH RECURSIVE cte AS (
SELECT
id,
date,
price
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn
FROM tbl
) t
WHERE rn = 1
UNION ALL
SELECT
cte.id,
DATE_ADD(cte.date, interval 1 day),
COALESCE(
(SELECT tbl.price
FROM tbl
WHERE tbl.id = cte.id AND tbl.date = DATE_ADD(cte.date, interval 1 day)),
cte.price
)
FROM cte
WHERE DATE_ADD(cte.date, interval 1 day) <= NOW()
)
SELECT * FROM cte
ORDER BY id, date
db<>fiddle here
Both queries produces the same results, the only difference is the use of the engine's specific date functions.
For MySQL versions below 8.0, you can use a calendar table since you don't have CTE support and can't generate the required date range.
Assuming there is a column in the calendar table to store date values ​​(let's call it date for simplicity) you can use the CROSS JOIN operator to generate date ranges for the id values in your table that will match existing dates. Then you can use a subquery to get the latest price value from the table which is stored for the corresponding date or before it.
So the query would be like this
SELECT
d.id,
d.date,
(SELECT
price
FROM tbl
WHERE tbl.id = d.id AND tbl.date <= d.date
ORDER BY tbl.date DESC
LIMIT 1
) price
FROM (
SELECT
t.id,
c.date
FROM calendar c
CROSS JOIN (SELECT DISTINCT id FROM tbl) t
WHERE c.date BETWEEN (
SELECT
MIN(date) min_date
FROM tbl
WHERE tbl.id = t.id
)
AND NOW()
) d
ORDER BY id, date
Using my pseudo-calendar table with date values ranging from 2022-05-20 to 2022-05-30 and source data in that range, like so
id
price
date
1
2
2022-05-21
1
3
2022-05-25
1
2.5
2022-05-28
2
10
2022-05-25
2
100
2022-05-30
the query produces following results
id
date
price
1
2022-05-21
2
1
2022-05-22
2
1
2022-05-23
2
1
2022-05-24
2
1
2022-05-25
3
1
2022-05-26
3
1
2022-05-27
3
1
2022-05-28
2.5
1
2022-05-29
2.5
1
2022-05-30
2.5
2
2022-05-25
10
2
2022-05-26
10
2
2022-05-27
10
2
2022-05-28
10
2
2022-05-29
10
2
2022-05-30
100
db<>fiddle here

SQL to get most recent value as of each date in the past

I have a table called event_user_fav_color_changed. Every row in the table represents the event that a user changes their favorite color. For every date in a certain range, I'd like to get every user's favorite color as of that date.
Here's a sample event_user_fav_color_changed table:
user_id date updated_at_datetime fav_color
1234 2020-01-01 2020-01-01 12:00:03 blue
1234 2020-01-05 2020-01-05 10:30:00 green
Here's a sample table with the users and dates I'm interested in:
user_id date
1234 2020-01-01
1234 2020-01-04
1234 2020-01-05
1234 2020-01-06
Here's the desired output:
user_id date fav_color
1234 2020-01-01 blue
1234 2020-01-04 blue
1234 2020-01-05 green
1234 2020-01-06 green
One option uses a correlated subquery. Assuming that your user/dates table is called data, you would do:
select
d.*,
(
select e.fav_color
from event_user_fav_color_changed e
where e.user_id = d.user_id and e.date <= d.date
order by e.date desc limit 1
)
from data d
You can use row_number() window function
select * from
(
select user_id, date, updated_at_datetime, fav_color,
row_number() over(partition by user_id,date order by updated_at_datetime desc) as rn
from tablename
)A where rn=1
It doesn't sound like you can constrain your lookup to any particular range. So basically each row has to search for the last occurence of an update.
select d.date,
(
select first_value(fav_color) over (order by updated_at_datetime desc)
from event_user_fav_color_changed
where updated_at_datetime < d.date
) as fav_as_of
from dates d
I don't know anything about Presto in particular but I believe this query ought to work.
One way to express this uses a join and row_number():
select uc.*
from (select ufcc.*,
row_number() over (partition by ufcc.user_id order by ufcc.date desc) as seqnum
from user_dates ud join
event_user_fav_color_changed ufcc
on ud.user_id = ufcc.user_id and
ud.date > ufcc.date
) uc
where seqnum = 1;
That can be inefficient if there are a lot of color changes. A join using lead() might be more efficient:
select ufcc.*
from user_dates ud join
(select ufcc.*,
lead(ufcc.date) over (partition by ufcc.user_id order by ufcc.date) as next_date
from event_user_fav_color_changed ufcc
) ufcc
on ud.user_id = ufcc.user_id and
ud.date > ufcc.date and
(ud.date <= ufcc.next_date or ufcc.next_date is null);
Or a lateral join:
select ufcc.*
from user_dates ud cross join lateral
(select ufcc.*
from event_user_fav_color_changed ufcc
where ud.user_id = ufcc.user_id and
ud.date > ufcc.date
order by ufcc.date desc
limit 1
) ud

Caller whose first and last call was to the same person

I have a phonelog table that has information about callers' call history. I'd like to find out callers whose first and last call was to the same person on a given day.
Callerid Recipientid DateCalled
1 2 2019-01-01 09:00:00.000
1 3 2019-01-01 17:00:00.000
1 4 2019-01-01 23:00:00.000
2 5 2019-07-05 09:00:00.000
2 5 2019-07-05 17:00:00.000
2 3 2019-07-05 23:00:00.000
2 5 2019-07-06 17:00:00.000
2 3 2019-08-01 09:00:00.000
2 3 2019-08-01 17:00:00.000
2 4 2019-08-02 09:00:00.000
2 5 2019-08-02 10:00:00.000
2 4 2019-08-02 11:00:00.000
Expected Output
Callerid Recipientid Datecalled
2 5 2019-07-05
2 3 2019-08-01
2 4 2019-08-02
I wrote the below query but can't get it to return recipientid. Any help on this will be appreciated!
select pl.callerid,cast(pl.datecalled as date) as datecalled
from phonelog pl inner join (select callerid, cast(datecalled as date) as datecalled,
min(datecalled) as firstcall, max(datecalled) as lastcall
from phonelog
group by callerid, cast(datecalled as date)) as x
on pl.callerid = x.callerid and cast(pl.datecalled as date) = x.datecalled
and (pl.datecalled = x.firstcall or pl.datecalled = x.lastcall)
group by pl.callerid, cast(pl.datecalled as date)
having count(distinct recipientid) = 1
Another dbFiddle option
First, my prequery (PQ alias), I am getting for a given client, per day, the min and max time called but also HAVING to make sure person had at least 2 phone calls in a given day. From that, I re-join to the phone log table on the FIRST (MIN) call for the person for the given day. Then I join one more time for the LAST (MAX) call for the same person for the same day and make sure the recipient of the first is same as last.
I do not have to join on the stripped-down "JustDate" column used for the grouping as the MIN/MAX qualifies the FULL date/time.
select
PQ.JustDate,
PQ.CallerID,
pl1.RecipientID
from
( select
callerID,
convert( date, dateCalled ) JustDate,
min( DateCalled ) minDateCall,
max( DateCalled ) maxDateCall
from
PhoneLog pl
group by
callerID,
convert( date, dateCalled )
having
count(*) > 1) PQ
JOIN PhoneLog pl1
on PQ.CallerID = pl1.CallerID
AND PQ.minDateCall = pl1.dateCalled
JOIN PhoneLog pl2
on PQ.CallerID = pl2.CallerID
AND PQ.maxDateCall = pl2.dateCalled
AND pl1.RecipientID = pl2.RecipientID
Its very easy with window function
WITH cte AS (
SELECT *, CAST(DateCalled as DATE) DateCalled
,FIRST_VALUE(Recipientid) OVER (PARTITION BY Callerid ,CAST(DateCalled as date) ORDER BY CAST(DateCalled AS DATE)) f
,LAST_VALUE(Recipientid) OVER (PARTITION BY Callerid ,CAST(DateCalled as date) ORDER BY CAST(DateCalled AS DATE)) l
FROM phonelog
)
SELECT DISTINCT Callerid,Recipientid, DateCalled FROM cte
WHERE f=l
Since SQL Server 2019 you could use the first_value() and last_value() window functions.
SELECT DISTINCT
x1.callerid,
x1.fri,
x1.datecalled
FROM (SELECT pl1.callerid,
pl1.recipientid,
convert(date, pl1.datecalled) datecalled,
first_value(pl1.recipientid) OVER (PARTITION BY pl1.callerid,
convert(date, pl1.datecalled)
ORDER BY pl1.datecalled
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) fri,
last_value(pl1.recipientid) OVER (PARTITION BY pl1.callerid,
convert(date, pl1.datecalled)
ORDER BY pl1.datecalled
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) lri
FROM phonelog pl1) x1
WHERE x1.fri = x1.lri;
In older versions you can use correlated subqueries with TOP 1.
SELECT DISTINCT
x1.callerid,
x1.fri,
x1.datecalled
FROM (SELECT pl1.callerid,
pl1.recipientid,
convert(date, pl1.datecalled) datecalled,
(SELECT TOP 1
pl2.recipientid
FROM phonelog pl2
WHERE pl2.callerid = pl1.callerid
AND pl2.datecalled >= convert(date, pl1.datecalled)
AND pl2.datecalled < dateadd(day, 1, convert(date, pl1.datecalled))
ORDER BY pl2.datecalled ASC) fri,
(SELECT TOP 1
pl2.recipientid
FROM phonelog pl2
WHERE pl2.callerid = pl1.callerid
AND pl2.datecalled >= convert(date, pl1.datecalled)
AND pl2.datecalled < dateadd(day, 1, convert(date, pl1.datecalled))
ORDER BY pl2.datecalled DESC) lri
FROM phonelog pl1) x1
WHERE x1.fri = x1.lri;
db<>fiddle
If you don't want to return log rows where somebody just made one call on a day, which of course means the first and the last call of the day were to the same person, you can use GROUP BY and HAVING count(*) > 1 instead of DISTINCT.
SELECT x1.callerid,
x1.fri,
x1.datecalled
FROM (...) x1
WHERE x1.fri = x1.lri
GROUP BY x1.callerid,
x1.fri,
x1.datecalled
HAVING count(*) > 1;
You can use a CTE to compute the first and last call of each day by Callerid, and then self-JOIN that CTE to find callers whose first and last calls were to the same Recipientid:
WITH CTE AS (
SELECT Callerid, RecipientId, CONVERT(DATE, Datecalled) AS Datecalled,
ROW_NUMBER() OVER (PARTITION BY Callerid, CONVERT(DATE, Datecalled) ORDER BY Datecalled) AS rna,
ROW_NUMBER() OVER (PARTITION BY Callerid, CONVERT(DATE, Datecalled) ORDER BY Datecalled DESC) AS rnb
FROM phonelog
)
SELECT c1.Callerid, c1.RecipientId, c1.Datecalled
FROM CTE c1
JOIN CTE c2 ON c1.Callerid = c2.Callerid AND c1.Recipientid = c2.Recipientid
WHERE c1.rna = 1 AND c2.rnb = 1
Output:
Callerid RecipientId Datecalled
2 5 2019-07-05
2 3 2019-08-01
2 4 2019-08-02
Demo on SQLFiddle
As my understanding, you want to select callerid with each Recipientid with the times greater than 1 to make sure that we have First call and Last call. So you just need to group by 3 columns combine with having count(Recipientid) > 1 Like this
SELECT Callerid, Recipientid, CAST(Datecalled AS DATE) AS Datecalled
FROM phonelog
GROUP BY Callerid, Recipientid, CAST(Datecalled AS DATE)
HAVING COUNT(Recipientid) > 1
Demo on db<>fiddle
As per my understanding we have to rank Caller_id as well as Recipient_id along with the Date.
Below is my solution which is working well for this case.
with CTE as
(select *,
row_number() over (partition by callerid, convert(VARCHAR,datecalled,23) order by convert(VARCHAR,datecalled,23)) as first_recipient_id,
row_number() over (partition by receipientid, convert(VARCHAR,datecalled,23) order by convert(VARCHAR,datecalled,23) desc) as last_recipient_id
from activity
)
select t.callerid,t.receipientid,CONVERT(VARCHAR,t.datecalled) as DateCalled from CTE t
where t.first_recipient_id >1 AND t.last_recipient_id>1;
The result that I was able to get:
Result
I think we need to identify first and last call made by caller on a day and then compare it with first and last call by caller to a recipient for that day. Below code has firstcall and lastcall made by caller on a day. Then it finds first and last call by caller to respective recipient and then compare.
SELECT DISTINCT
callerid,
recipientid,
CONVERT(date,firstcall)
FROM
(
Select
callerid,
recipientid,
MIN(dateCalled) OVER(PARTITION BY callerid,CONVERT(date,DateCalled)) as firstcall,
MAX(DateCalled) OVER(PARTITION BY callerid,CONVERT(date,DateCalled)) as lastcall,
MIN(DateCalled) OVER(PARTITION BY callerid,recipientid,convert(date,DateCalled)) as recipfirstcall,
MAX(call_start_time) OVER(PARTITION BY callerid,recipientid,convert(date,DateCalled)) as reciplastcall
from phonelog
) as A
where A.firstcall=A.recipfirstcall and A.lastcall=A.reciplastcall

Cumulative distinct count with Spark SQL

Using Spark 1.6.2.
Here the data:
day | visitorID
-------------
1 | A
1 | B
2 | A
2 | C
3 | A
4 | A
I want to count how many distinct visitors by day + cumul with the day before (I dont know the exact term for that, sorry).
This should give:
day | visitors
--------------
1 | 2 (A+B)
2 | 3 (A+B+C)
3 | 3
4 | 3
Tried self-join but really too slow
I am sure windowed function is what I am looking for but didnt manage to find it :/
You should be able to do:
select day, max(visitors) as visitors
from (select day,
count(distinct visitorId) over (order by day) as visitors
from t
) d
group by day;
Actually, I think a better approach is to record a visitor only on the first day s/he appears:
select startday, sum(count(*)) over (order by startday) as visitors
from (select visitorId, min(day) as startday
from t
group by visitorId
) t
group by startday
order by startday;
In SQL, you could do this.
select t1.day,sum(max(t.cnt)) over(order by t1.day) as visitors
from tbl t1
left join (select minday,count(*) as cnt
from (select visitorID,min(day) as minday
from tbl
group by visitorID
) t
group by minday
) t
on t1.day=t.minday
group by t1.day
Get the first day a visitorID appears using min.
Count the rows per such minday found above.
Left join this to your original table and get the cumulative sum.
Another approach would be
select t1.day,sum(count(t.visitorid)) over(order by t1.day) as cnt
from tbl t1
left join (select visitorID,min(day) as minday
from tbl
group by visitorID
) t
on t1.day=t.minday and t.visitorid=t1.visitorid
group by t1.day
Try it's
select
day,
count(*),
(
select count(*) from your_table b
where a.day >= b.day
) cumulative
from your_table as a
group by a.day
order by 1