I am trying to calculate the time between each user's first and second records. My thought was to add a ranking (RN) to each record and then subtract the RN 1 date from the RN 2 date, but I'm struggling to get the subquery to compute RN 2 - RN 1.
Sample data:
user_id             date                 rn
698998737289929044  2021-04-08 11:27:38  1
698998737289929044  2021-04-08 12:20:25  2
698998737289929044  2021-04-01 13:23:59  3
732850336550572910  2021-03-23 06:13:25  1
598830651911547855  2021-03-11 11:56:53  1
SELECT
user_id,
date,
row_number() over(partition by user_id order by date) as RN
FROM event_table
GROUP BY user_id, date
You can join the result with itself to get the first and second row.
For example:
with
q as (
-- your query here
)
select
f.user_id,
f.date,
s.date - f.date as diff
from q f
left join q s on s.user_id = f.user_id and s.rn = 2
where f.rn = 1
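Below is a runnable sketch of the self-join. It uses SQLite through Python's sqlite3 module and assumes an SQLite build with window functions (3.25 or later). SQLite cannot subtract timestamps directly, so julianday() is used here to get the gap in seconds. In engines that support timestamp subtraction, s.date - f.date works as written. Ordering by date ranks the 2021-04-01 row first for user 698998737289929044, so the result does not follow the rn column shown in the sample data.

```python
import sqlite3

# In-memory copy of a few sample rows (the question's event_table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_table (user_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO event_table VALUES (?, ?)",
    [
        (698998737289929044, "2021-04-08 11:27:38"),
        (698998737289929044, "2021-04-08 12:20:25"),
        (698998737289929044, "2021-04-01 13:23:59"),
        (732850336550572910, "2021-03-23 06:13:25"),
    ],
)

query = """
WITH q AS (
    SELECT user_id, date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) AS rn
    FROM event_table
    GROUP BY user_id, date
)
SELECT f.user_id,
       f.date,
       -- julianday() returns fractional days; multiply by 86400 for seconds
       CAST(ROUND((julianday(s.date) - julianday(f.date)) * 86400) AS INTEGER) AS diff_seconds
FROM q f
LEFT JOIN q s ON s.user_id = f.user_id AND s.rn = 2  -- second row of the same user
WHERE f.rn = 1
ORDER BY f.user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A user with only one record gets None (NULL), because the LEFT JOIN finds no rn = 2 row.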
I have a table called event_user_fav_color_changed. Every row in the table represents the event that a user changes their favorite color. For every date in a certain range, I'd like to get every user's favorite color as of that date.
Here's a sample event_user_fav_color_changed table:
user_id date updated_at_datetime fav_color
1234 2020-01-01 2020-01-01 12:00:03 blue
1234 2020-01-05 2020-01-05 10:30:00 green
Here's a sample table with the users and dates I'm interested in:
user_id date
1234 2020-01-01
1234 2020-01-04
1234 2020-01-05
1234 2020-01-06
Here's the desired output:
user_id date fav_color
1234 2020-01-01 blue
1234 2020-01-04 blue
1234 2020-01-05 green
1234 2020-01-06 green
One option uses a correlated subquery. Assuming that your user/dates table is called data, you would do:
select
d.*,
(
select e.fav_color
from event_user_fav_color_changed e
where e.user_id = d.user_id and e.date <= d.date
order by e.date desc limit 1
) as fav_color
from data d
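Here is a small SQLite version, run from Python's sqlite3 module, for checking the correlated subquery against the desired output. The table name data comes from the answer. Dates are stored as ISO-8601 text, so <= compares them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_user_fav_color_changed (
    user_id INTEGER, date TEXT, updated_at_datetime TEXT, fav_color TEXT
);
INSERT INTO event_user_fav_color_changed VALUES
    (1234, '2020-01-01', '2020-01-01 12:00:03', 'blue'),
    (1234, '2020-01-05', '2020-01-05 10:30:00', 'green');
CREATE TABLE data (user_id INTEGER, date TEXT);
INSERT INTO data VALUES
    (1234, '2020-01-01'), (1234, '2020-01-04'),
    (1234, '2020-01-05'), (1234, '2020-01-06');
""")

query = """
SELECT d.user_id, d.date,
       -- latest change on or before the requested date
       (SELECT e.fav_color
        FROM event_user_fav_color_changed e
        WHERE e.user_id = d.user_id AND e.date <= d.date
        ORDER BY e.date DESC
        LIMIT 1) AS fav_color
FROM data d
ORDER BY d.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```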
You can use the row_number() window function to keep only the latest change per user and date:
select * from
(
select user_id, date, updated_at_datetime, fav_color,
row_number() over(partition by user_id,date order by updated_at_datetime desc) as rn
from tablename
)A where rn=1
It doesn't sound like you can constrain your lookup to any particular range, so basically each row has to search for the last occurrence of an update.
select d.user_id, d.date,
(
-- first_value() yields the same value on every matching row;
-- DISTINCT collapses them so the scalar subquery returns a single row
select distinct first_value(e.fav_color) over (order by e.updated_at_datetime desc)
from event_user_fav_color_changed e
where e.user_id = d.user_id
and e.date <= d.date
) as fav_as_of
from dates d
I don't know anything about Presto in particular, but I believe this query ought to work.
One way to express this uses a join and row_number():
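select uc.user_id, uc.as_of_date as date, uc.fav_color
from (select ud.date as as_of_date, ufcc.*,
-- rank the changes separately for every (user, requested date) pair
row_number() over (partition by ud.user_id, ud.date order by ufcc.updated_at_datetime desc) as seqnum
from user_dates ud join
event_user_fav_color_changed ufcc
on ud.user_id = ufcc.user_id and
ud.date >= ufcc.date
) uc
where seqnum = 1;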
That can be inefficient if there are a lot of color changes. A join using lead() might be more efficient:
select ud.user_id, ud.date, ufcc.fav_color
from user_dates ud join
(select ufcc.*,
lead(ufcc.date) over (partition by ufcc.user_id order by ufcc.date) as next_date
from event_user_fav_color_changed ufcc
) ufcc
on ud.user_id = ufcc.user_id and
-- a color is valid from its change date (inclusive) to the next change date (exclusive)
ud.date >= ufcc.date and
(ud.date < ufcc.next_date or ufcc.next_date is null);
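Below is a runnable SQLite version of the lead() join, run from Python's sqlite3 module. It assumes SQLite 3.25 or later for LEAD(). It also assumes each color is valid from its change date (inclusive) up to, but not including, the next change date. That is the reading that reproduces the desired output (green on 2020-01-05).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_user_fav_color_changed (
    user_id INTEGER, date TEXT, updated_at_datetime TEXT, fav_color TEXT
);
INSERT INTO event_user_fav_color_changed VALUES
    (1234, '2020-01-01', '2020-01-01 12:00:03', 'blue'),
    (1234, '2020-01-05', '2020-01-05 10:30:00', 'green');
CREATE TABLE user_dates (user_id INTEGER, date TEXT);
INSERT INTO user_dates VALUES
    (1234, '2020-01-01'), (1234, '2020-01-04'),
    (1234, '2020-01-05'), (1234, '2020-01-06');
""")

query = """
SELECT ud.user_id, ud.date, ufcc.fav_color
FROM user_dates ud
JOIN (
    -- each change row learns when the next change for the same user happens
    SELECT e.*,
           LEAD(e.date) OVER (PARTITION BY e.user_id ORDER BY e.date) AS next_date
    FROM event_user_fav_color_changed e
) ufcc
  ON ud.user_id = ufcc.user_id
 AND ud.date >= ufcc.date
 AND (ud.date < ufcc.next_date OR ufcc.next_date IS NULL)
ORDER BY ud.date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each requested date falls into exactly one [date, next_date) interval, so this is a plain equi-join plus a range test rather than a ranking over every earlier change.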
Or a lateral join:
select ud.user_id, ud.date, ufcc.fav_color
from user_dates ud cross join lateral
(select ufcc.*
from event_user_fav_color_changed ufcc
where ud.user_id = ufcc.user_id and
ud.date >= ufcc.date
order by ufcc.date desc
limit 1
) ufcc;
I have a phonelog table with information about callers' call history. I'd like to find the callers whose first and last calls on a given day were to the same person.
Callerid Recipientid DateCalled
1 2 2019-01-01 09:00:00.000
1 3 2019-01-01 17:00:00.000
1 4 2019-01-01 23:00:00.000
2 5 2019-07-05 09:00:00.000
2 5 2019-07-05 17:00:00.000
2 3 2019-07-05 23:00:00.000
2 5 2019-07-06 17:00:00.000
2 3 2019-08-01 09:00:00.000
2 3 2019-08-01 17:00:00.000
2 4 2019-08-02 09:00:00.000
2 5 2019-08-02 10:00:00.000
2 4 2019-08-02 11:00:00.000
Expected Output
Callerid Recipientid Datecalled
2 5 2019-07-05
2 3 2019-08-01
2 4 2019-08-02
I wrote the query below but can't get it to return recipientid. Any help would be appreciated!
select pl.callerid,cast(pl.datecalled as date) as datecalled
from phonelog pl inner join (select callerid, cast(datecalled as date) as datecalled,
min(datecalled) as firstcall, max(datecalled) as lastcall
from phonelog
group by callerid, cast(datecalled as date)) as x
on pl.callerid = x.callerid and cast(pl.datecalled as date) = x.datecalled
and (pl.datecalled = x.firstcall or pl.datecalled = x.lastcall)
group by pl.callerid, cast(pl.datecalled as date)
having count(distinct recipientid) = 1
Another option:
First, in my prequery (alias PQ), I get the earliest and latest call times for each caller and day. I also use HAVING to make sure the caller made at least two calls that day. From that, I re-join to the phone log table on the FIRST (MIN) call of the day for that caller. Then I join one more time on the LAST (MAX) call for the same caller and day, and require that the first and last calls went to the same recipient.
I don't have to join on the stripped-down "JustDate" column used for the grouping, because the MIN/MAX values already carry the full date and time.
select
PQ.JustDate,
PQ.CallerID,
pl1.RecipientID
from
( select
callerID,
convert( date, dateCalled ) JustDate,
min( DateCalled ) minDateCall,
max( DateCalled ) maxDateCall
from
PhoneLog pl
group by
callerID,
convert( date, dateCalled )
having
count(*) > 1) PQ
JOIN PhoneLog pl1
on PQ.CallerID = pl1.CallerID
AND PQ.minDateCall = pl1.dateCalled
JOIN PhoneLog pl2
on PQ.CallerID = pl2.CallerID
AND PQ.maxDateCall = pl2.dateCalled
AND pl1.RecipientID = pl2.RecipientID
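The prequery-and-two-joins approach can be reproduced in SQLite from Python's sqlite3 module. This is an illustrative port: SQL Server's convert(date, ...) becomes SQLite's date(). Timestamps are stored as ISO-8601 text, so MIN/MAX order them chronologically.

```python
import sqlite3

# Sample phonelog rows from the question.
CALLS = [
    (1, 2, "2019-01-01 09:00:00.000"),
    (1, 3, "2019-01-01 17:00:00.000"),
    (1, 4, "2019-01-01 23:00:00.000"),
    (2, 5, "2019-07-05 09:00:00.000"),
    (2, 5, "2019-07-05 17:00:00.000"),
    (2, 3, "2019-07-05 23:00:00.000"),
    (2, 5, "2019-07-06 17:00:00.000"),
    (2, 3, "2019-08-01 09:00:00.000"),
    (2, 3, "2019-08-01 17:00:00.000"),
    (2, 4, "2019-08-02 09:00:00.000"),
    (2, 5, "2019-08-02 10:00:00.000"),
    (2, 4, "2019-08-02 11:00:00.000"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PhoneLog (CallerID INTEGER, RecipientID INTEGER, DateCalled TEXT)")
conn.executemany("INSERT INTO PhoneLog VALUES (?, ?, ?)", CALLS)

query = """
SELECT PQ.JustDate, PQ.CallerID, pl1.RecipientID
FROM (
    -- per caller and day: first and last call, only days with 2+ calls
    SELECT CallerID,
           date(DateCalled) AS JustDate,
           MIN(DateCalled) AS minDateCall,
           MAX(DateCalled) AS maxDateCall
    FROM PhoneLog
    GROUP BY CallerID, date(DateCalled)
    HAVING COUNT(*) > 1
) PQ
JOIN PhoneLog pl1                      -- the first call of the day
  ON PQ.CallerID = pl1.CallerID AND PQ.minDateCall = pl1.DateCalled
JOIN PhoneLog pl2                      -- the last call, same recipient
  ON PQ.CallerID = pl2.CallerID AND PQ.maxDateCall = pl2.DateCalled
 AND pl1.RecipientID = pl2.RecipientID
ORDER BY PQ.JustDate
"""
rows = conn.execute(query).fetchall()
print(rows)
```

On this data, 2019-07-05 is not returned: caller 2's last call that day went to recipient 3, not 5.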
It's very easy with window functions:
WITH cte AS (
SELECT Callerid, CAST(DateCalled AS DATE) AS CallDate
,FIRST_VALUE(Recipientid) OVER (PARTITION BY Callerid, CAST(DateCalled AS DATE) ORDER BY DateCalled) f
-- LAST_VALUE needs a frame covering the whole day; the default frame ends at the current row
,LAST_VALUE(Recipientid) OVER (PARTITION BY Callerid, CAST(DateCalled AS DATE) ORDER BY DateCalled
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) l
FROM phonelog
)
SELECT DISTINCT Callerid, f AS Recipientid, CallDate AS DateCalled FROM cte
WHERE f = l
Since SQL Server 2012 you can use the first_value() and last_value() window functions.
SELECT DISTINCT
x1.callerid,
x1.fri,
x1.datecalled
FROM (SELECT pl1.callerid,
pl1.recipientid,
convert(date, pl1.datecalled) datecalled,
first_value(pl1.recipientid) OVER (PARTITION BY pl1.callerid,
convert(date, pl1.datecalled)
ORDER BY pl1.datecalled
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) fri,
last_value(pl1.recipientid) OVER (PARTITION BY pl1.callerid,
convert(date, pl1.datecalled)
ORDER BY pl1.datecalled
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) lri
FROM phonelog pl1) x1
WHERE x1.fri = x1.lri;
In older versions you can use correlated subqueries with TOP 1.
SELECT DISTINCT
x1.callerid,
x1.fri,
x1.datecalled
FROM (SELECT pl1.callerid,
pl1.recipientid,
convert(date, pl1.datecalled) datecalled,
(SELECT TOP 1
pl2.recipientid
FROM phonelog pl2
WHERE pl2.callerid = pl1.callerid
AND pl2.datecalled >= convert(date, pl1.datecalled)
AND pl2.datecalled < dateadd(day, 1, convert(date, pl1.datecalled))
ORDER BY pl2.datecalled ASC) fri,
(SELECT TOP 1
pl2.recipientid
FROM phonelog pl2
WHERE pl2.callerid = pl1.callerid
AND pl2.datecalled >= convert(date, pl1.datecalled)
AND pl2.datecalled < dateadd(day, 1, convert(date, pl1.datecalled))
ORDER BY pl2.datecalled DESC) lri
FROM phonelog pl1) x1
WHERE x1.fri = x1.lri;
If you don't want to return rows for days on which somebody made only one call (the first and last call of such a day are trivially to the same person), you can use GROUP BY and HAVING count(*) > 1 instead of DISTINCT.
SELECT x1.callerid,
x1.fri,
x1.datecalled
FROM (...) x1
WHERE x1.fri = x1.lri
GROUP BY x1.callerid,
x1.fri,
x1.datecalled
HAVING count(*) > 1;
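Both variants of the first_value()/last_value() approach can be checked in SQLite from Python's sqlite3 module. This assumes SQLite 3.25 or later for window functions and frame clauses. date() stands in for convert(date, ...), and the output column is named call_date to keep it apart from the raw timestamp column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonelog (callerid INTEGER, recipientid INTEGER, datecalled TEXT)")
conn.executemany(
    "INSERT INTO phonelog VALUES (?, ?, ?)",
    [
        (1, 2, "2019-01-01 09:00:00.000"), (1, 3, "2019-01-01 17:00:00.000"),
        (1, 4, "2019-01-01 23:00:00.000"), (2, 5, "2019-07-05 09:00:00.000"),
        (2, 5, "2019-07-05 17:00:00.000"), (2, 3, "2019-07-05 23:00:00.000"),
        (2, 5, "2019-07-06 17:00:00.000"), (2, 3, "2019-08-01 09:00:00.000"),
        (2, 3, "2019-08-01 17:00:00.000"), (2, 4, "2019-08-02 09:00:00.000"),
        (2, 5, "2019-08-02 10:00:00.000"), (2, 4, "2019-08-02 11:00:00.000"),
    ],
)

# Per caller and day: first and last recipient. The explicit frame makes
# last_value() see the whole day, not just the rows up to the current one.
inner = """
SELECT pl1.callerid,
       date(pl1.datecalled) AS call_date,
       first_value(pl1.recipientid) OVER (
           PARTITION BY pl1.callerid, date(pl1.datecalled)
           ORDER BY pl1.datecalled
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS fri,
       last_value(pl1.recipientid) OVER (
           PARTITION BY pl1.callerid, date(pl1.datecalled)
           ORDER BY pl1.datecalled
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lri
FROM phonelog pl1
"""

# DISTINCT variant: also returns days with a single call.
with_distinct = conn.execute(
    f"SELECT DISTINCT callerid, fri, call_date FROM ({inner}) x1 "
    "WHERE fri = lri ORDER BY call_date"
).fetchall()

# GROUP BY ... HAVING count(*) > 1 variant: only days with two or more calls.
with_having = conn.execute(
    f"SELECT callerid, fri, call_date FROM ({inner}) x1 WHERE fri = lri "
    "GROUP BY callerid, fri, call_date HAVING count(*) > 1 ORDER BY call_date"
).fetchall()

print(with_distinct)
print(with_having)
```

The only difference between the two results is 2019-07-06, the day on which caller 2 made a single call.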
You can use a CTE to compute the first and last call of each day by Callerid, and then self-JOIN that CTE to find callers whose first and last calls were to the same Recipientid:
WITH CTE AS (
SELECT Callerid, RecipientId, CONVERT(DATE, Datecalled) AS Datecalled,
ROW_NUMBER() OVER (PARTITION BY Callerid, CONVERT(DATE, Datecalled) ORDER BY Datecalled) AS rna,
ROW_NUMBER() OVER (PARTITION BY Callerid, CONVERT(DATE, Datecalled) ORDER BY Datecalled DESC) AS rnb
FROM phonelog
)
SELECT c1.Callerid, c1.RecipientId, c1.Datecalled
FROM CTE c1
JOIN CTE c2 ON c1.Callerid = c2.Callerid AND c1.Recipientid = c2.Recipientid
WHERE c1.rna = 1 AND c2.rnb = 1
Output:
Callerid RecipientId Datecalled
2 5 2019-07-05
2 3 2019-08-01
2 4 2019-08-02
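In the rna/rnb query above, c1 and c2 are matched on Callerid and Recipientid only, so a first call on one day can pair with a last call on a different day. The sketch below adds a same-day condition; that is my assumption about the intended rule. It runs on SQLite (3.25 or later) from Python's sqlite3 module, with date() standing in for CONVERT(DATE, ...). On the sample data it returns the single-call day 2019-07-06 and does not return 2019-07-05.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonelog (Callerid INTEGER, Recipientid INTEGER, Datecalled TEXT)")
conn.executemany(
    "INSERT INTO phonelog VALUES (?, ?, ?)",
    [
        (1, 2, "2019-01-01 09:00:00.000"), (1, 3, "2019-01-01 17:00:00.000"),
        (1, 4, "2019-01-01 23:00:00.000"), (2, 5, "2019-07-05 09:00:00.000"),
        (2, 5, "2019-07-05 17:00:00.000"), (2, 3, "2019-07-05 23:00:00.000"),
        (2, 5, "2019-07-06 17:00:00.000"), (2, 3, "2019-08-01 09:00:00.000"),
        (2, 3, "2019-08-01 17:00:00.000"), (2, 4, "2019-08-02 09:00:00.000"),
        (2, 5, "2019-08-02 10:00:00.000"), (2, 4, "2019-08-02 11:00:00.000"),
    ],
)

query = """
WITH CTE AS (
    SELECT Callerid, Recipientid, date(Datecalled) AS CallDate,
           -- rna = 1 marks the first call of the day, rnb = 1 the last
           ROW_NUMBER() OVER (PARTITION BY Callerid, date(Datecalled) ORDER BY Datecalled) AS rna,
           ROW_NUMBER() OVER (PARTITION BY Callerid, date(Datecalled) ORDER BY Datecalled DESC) AS rnb
    FROM phonelog
)
SELECT c1.Callerid, c1.Recipientid, c1.CallDate
FROM CTE c1
JOIN CTE c2
  ON c1.Callerid = c2.Callerid
 AND c1.Recipientid = c2.Recipientid
 AND c1.CallDate = c2.CallDate      -- assumption: first and last call on the same day
WHERE c1.rna = 1 AND c2.rnb = 1
ORDER BY c1.CallDate
"""
rows = conn.execute(query).fetchall()
print(rows)
```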
As I understand it, you want each callerid and Recipientid pair that occurs more than once on a given day, which ensures there is both a first call and a last call. So you just need to group by those three columns and add HAVING count(Recipientid) > 1, like this:
SELECT Callerid, Recipientid, CAST(Datecalled AS DATE) AS Datecalled
FROM phonelog
GROUP BY Callerid, Recipientid, CAST(Datecalled AS DATE)
HAVING COUNT(Recipientid) > 1
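The GROUP BY/HAVING query is easy to try in SQLite from Python's sqlite3 module, with date() standing in for CAST(... AS DATE). It reproduces the expected output from the question. Note that it tests for repeated calls to the same recipient on a day rather than comparing the day's first and last calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phonelog (Callerid INTEGER, Recipientid INTEGER, Datecalled TEXT)")
conn.executemany(
    "INSERT INTO phonelog VALUES (?, ?, ?)",
    [
        (1, 2, "2019-01-01 09:00:00.000"), (1, 3, "2019-01-01 17:00:00.000"),
        (1, 4, "2019-01-01 23:00:00.000"), (2, 5, "2019-07-05 09:00:00.000"),
        (2, 5, "2019-07-05 17:00:00.000"), (2, 3, "2019-07-05 23:00:00.000"),
        (2, 5, "2019-07-06 17:00:00.000"), (2, 3, "2019-08-01 09:00:00.000"),
        (2, 3, "2019-08-01 17:00:00.000"), (2, 4, "2019-08-02 09:00:00.000"),
        (2, 5, "2019-08-02 10:00:00.000"), (2, 4, "2019-08-02 11:00:00.000"),
    ],
)

query = """
SELECT Callerid, Recipientid, date(Datecalled) AS call_date
FROM phonelog
GROUP BY Callerid, Recipientid, date(Datecalled)
HAVING COUNT(Recipientid) > 1   -- at least two calls to this recipient that day
ORDER BY call_date
"""
rows = conn.execute(query).fetchall()
print(rows)
```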
As I understand it, we have to rank by Caller_id as well as by Recipient_id, along with the date.
Below is my solution, which works for this case.
with CTE as
(select *,
row_number() over (partition by callerid, convert(VARCHAR,datecalled,23) order by convert(VARCHAR,datecalled,23)) as first_recipient_id,
row_number() over (partition by recipientid, convert(VARCHAR,datecalled,23) order by convert(VARCHAR,datecalled,23) desc) as last_recipient_id
from phonelog
)
select t.callerid,t.recipientid,CONVERT(VARCHAR,t.datecalled,23) as DateCalled from CTE t
where t.first_recipient_id >1 AND t.last_recipient_id>1;
The result that I was able to get:
[Image: Result]
I think we need to identify the first and last calls made by a caller on a day and then compare them with the first and last calls that caller made to each recipient on that day. The code below computes firstcall and lastcall for the caller on each day. It then finds the caller's first and last calls to each recipient and compares the two.
SELECT DISTINCT
callerid,
recipientid,
CONVERT(date,firstcall) AS datecalled
FROM
(
Select
callerid,
recipientid,
MIN(dateCalled) OVER(PARTITION BY callerid,CONVERT(date,DateCalled)) as firstcall,
MAX(DateCalled) OVER(PARTITION BY callerid,CONVERT(date,DateCalled)) as lastcall,
MIN(DateCalled) OVER(PARTITION BY callerid,recipientid,convert(date,DateCalled)) as recipfirstcall,
MAX(DateCalled) OVER(PARTITION BY callerid,recipientid,convert(date,DateCalled)) as reciplastcall
from phonelog
) as A
where A.firstcall=A.recipfirstcall and A.lastcall=A.reciplastcall
I have a table with more than 5 million rows of sales transactions. For each customer, I would like to find the sum of the date intervals between their three most recent purchases.
Suppose my table looks like this:
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the three most recent transactions of each customer (based on ServiceStartDate) and then calculate the intervals between the StartDate and ExpiryDate values of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But since you only sum two differences, that doesn't seem correct to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
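The LEAD() variant can be checked on the sample data with SQLite from Python's sqlite3 module. This assumes SQLite 3.25 or later and uses the table name tab from the answer. SQL Server's datediff(day, a, b) is replaced by the difference of SQLite julianday() values, which is a whole number of days for plain dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (CustomerID TEXT, ProductID TEXT, "
    "ServiceStartDate TEXT, ServiceExpiryDate TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("A", "X1", "2010-01-01", "2010-06-01"),
        ("A", "X2", "2010-08-12", "2010-12-30"),
        ("B", "X4", "2011-10-01", "2012-01-15"),
        ("B", "X3", "2012-04-01", "2012-06-01"),
        ("B", "X7", "2012-08-01", "2013-10-01"),
        ("A", "X5", "2013-01-01", "2015-06-01"),
    ],
)

query = """
WITH cte AS (
    SELECT tab.*,
           ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn,
           -- with DESC ordering, LEAD() looks at the next-older purchase
           LEAD(ServiceExpiryDate) OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS prevEnd
    FROM tab
)
SELECT CustomerID,
       -- stand-in for datediff(day, prevEnd, ServiceStartDate)
       SUM(CAST(julianday(ServiceStartDate) - julianday(prevEnd) AS INTEGER)) AS IntervalDays,
       COUNT(*) AS purchases
FROM cte
WHERE rn <= 3
GROUP BY CustomerID
ORDER BY CustomerID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This gives 805 days for A and 138 for B. Subtracting purchases gives 802 and 135, the figures in the question. The oldest purchase has no prevEnd, so its NULL difference is ignored by SUM.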
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from yourtable t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
WHERE dt.rn <= 3 -- three most recent purchases
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138
Given the following Contract table records:
Id EmployeeId StartDate EndDate
1 5601 2011-01-01 2011-09-01
2 5601 2011-09-02 2012-05-01
3 5601 2012-02-01 2012-08-01
4 5602 2011-01-01 2011-09-01
5 5602 2011-07-01 2012-10-01
Every employee can have multiple contracts.
I'm trying to find, for each employee, the invalid contracts whose StartDate is bigger than the EndDate.
For the given data, Id=3 and Id=5 are invalid.
What I have done is:
SELECT a.Id
FROM Contracts a
GROUP BY a.EmpId
HAVING a.StartDate > a.EndDate
But I get this error :
Column 'Contract.Id' is invalid in the HAVING clause because it is not contained in either an aggregate function or the group by clause.
Any idea ?
If I understood correctly, you want records whose StartDate is not bigger than the previous contract's EndDate?
You can do that using a CTE and the ROW_NUMBER() function, joining each record to the next one for the same employee.
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY StartDate) RN
FROM Contracts
)
SELECT * FROM CTE c1
INNER JOIN CTE c2 ON c1.RN + 1 = c2.RN AND c1.EmployeeID = c2.EmployeeID
WHERE c1.EndDATE > c2.StartDate
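Here is a quick SQLite check of the ROW_NUMBER() self-join, run from Python's sqlite3 module (SQLite 3.25 or later assumed). For brevity, the sketch selects only the Id of the later contract in each overlapping pair instead of *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contracts (Id INTEGER, EmployeeId INTEGER, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO Contracts VALUES (?, ?, ?, ?)",
    [
        (1, 5601, "2011-01-01", "2011-09-01"),
        (2, 5601, "2011-09-02", "2012-05-01"),
        (3, 5601, "2012-02-01", "2012-08-01"),
        (4, 5602, "2011-01-01", "2011-09-01"),
        (5, 5602, "2011-07-01", "2012-10-01"),
    ],
)

query = """
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY EmployeeId ORDER BY StartDate) AS RN
    FROM Contracts
)
SELECT c2.Id
FROM CTE c1
JOIN CTE c2 ON c1.RN + 1 = c2.RN AND c1.EmployeeId = c2.EmployeeId
WHERE c1.EndDate > c2.StartDate   -- next contract starts before the previous one ends
ORDER BY c2.Id
"""
invalid_ids = [row[0] for row in conn.execute(query)]
print(invalid_ids)  # [3, 5]
```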
You may try:
SELECT a.Id, a.EmployeeId
FROM Contracts a
WHERE a.StartDate > a.EndDate
GROUP BY a.Id, a.EmployeeId