Find repeating phone numbers between a 7 day range - sql

I have a phone and a call date field. I need to find all phone and call dates where calls were made more than once (>1) within a 7 day period.
What is the best approach to this?
Example:
ID|Phone|CallDate
-----------------
1|5551212|11/21/2020
2|5551212|11/22/2020
3|5551212|10/9/2020
4|4441212|11/22/2020
5|4441212|11/1/2020
output:
5551212|11/21/2020
5551212|11/22/2020
Here's an example query I tried, but I assume I can do better (besides, it's taking a very long time on over 1 million records):
SELECT A1.Phone
FROM CallDetail A1, CallDetail A2
WHERE (A1.Phone = A2.Phone) AND (A1.ID <> A2.ID)
GROUP BY A1.Phone, A1.CallDate, A2.CallDate
HAVING COUNT(A1.Phone) > 1 AND DATEDIFF(DAY, A1.CallDate, A2.CallDate) <= 7

You seem to want lead() and lag() to compare the calldate on one row to the nearest calldate before or after:
select cd.*
from (select cd.*,
lag(cd.calldate) over (partition by cd.phone order by cd.calldate) as prev_calldate,
lead(cd.calldate) over (partition by cd.phone order by cd.calldate) as next_calldate
from calldetail cd
) cd
where prev_calldate > dateadd(day, -7, calldate) or
next_calldate < dateadd(day, 7, calldate)
order by phone, calldate;
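As a quick sanity check of the lag()/lead() idea, here is a minimal sketch that runs a query of the same shape against the sample rows from the question. It assumes SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) rather than SQL Server, so a julianday() difference stands in for dateadd(), and the dates are rewritten as ISO strings so SQLite can parse them.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (an assumption
# made so that SQLite's julianday() can parse them).
rows = [
    (1, "5551212", "2020-11-21"),
    (2, "5551212", "2020-11-22"),
    (3, "5551212", "2020-10-09"),
    (4, "4441212", "2020-11-22"),
    (5, "4441212", "2020-11-01"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calldetail (id INTEGER, phone TEXT, calldate TEXT)")
conn.executemany("INSERT INTO calldetail VALUES (?, ?, ?)", rows)

# Each row only looks at its nearest earlier and later call for the same phone.
# A NULL neighbour makes that comparison NULL, which the OR treats as "no match".
query = """
SELECT phone, calldate
FROM (SELECT cd.*,
             LAG(calldate)  OVER (PARTITION BY phone ORDER BY calldate) AS prev_calldate,
             LEAD(calldate) OVER (PARTITION BY phone ORDER BY calldate) AS next_calldate
      FROM calldetail cd) AS cd
WHERE julianday(calldate) - julianday(prev_calldate) < 7
   OR julianday(next_calldate) - julianday(calldate) < 7
ORDER BY phone, calldate
"""
repeats = conn.execute(query).fetchall()
print(repeats)
```

On the sample data this should leave only the two 5551212 calls from 11/21 and 11/22, which matches the output asked for in the question.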

Related

Date filtering in SQL

Table below consists of 2 columns: a unique identifier and date. I am trying to build a new column of episodes, where a new episode would be triggered when >= 3 months between dates. This process should occur for each unique EMID. In the table attached, EMID ending in 98 would only have 1 episode, there are no intervals >2 months between each row in the date column. However, EMID ending in 03 would have 2 episodes, as there is almost a 3 year gap between rows 12 and 13. I have tried the following code, which doesn't work.
Table:
SELECT TOP (1000) [EMID],[Date]
CASE
WHEN DATEDIFF(month, Date, LEAD Date) <3
THEN "1"
ELSE IF DATEDIFF(month, Date, LEAD Date) BETWEEN 3 AND 5
THEN "2"
ELSE "3"
END episode
FROM [res_treatment_escalation].[dbo].[cspine42920a]
EDIT: Using Microsoft SQL Server Management Studio.
EDIT 2: I have made some progress but the output is not exactly what I am looking for. Here is the query I used:
SELECT TOP (1000) [EMID],[visit_date_01],
CASE
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,getdate()) OVER (partition by EMID order by EMID)) <= 90 THEN '1'
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,getdate()) OVER (PARTITION BY EMID ORDER BY EMID)) BETWEEN 90 AND 179 THEN '2'
WHEN DATEDIFF(DAY, visit_date_01, LAG(visit_date_01,1,getdate()) OVER (PARTITION BY EMID order by EMID)) > 180 THEN '3'
END AS EPISODE
FROM [res_treatment_escalation].[dbo].['c-spine_full_dataset_4#29#20_wi$']
Here is the actual vs. expected output (image "table2" not included).
The partition by EMID does not seem to be working correctly. Every time there is a new EMID a new episode is triggered. I am using day instead of month as the unit in DATEDIFF, but this does not seem to recognize new episodes within the same EMID.
Hmmm: Use LAG() to get the previous date. Use a date comparison to assign a flag and then a cumulative sum:
select c.*,
sum(case when prev_date > dateadd(month, -3, date) then 0 else 1 end) over
(partition by emid order by date) as episode_number
from (select c.*, lag(date) over (partition by emid order by date) as prev_date
from res_treatment_escalation.dbo.cspine42920a c
) c;
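The flag-plus-cumulative-sum pattern can be tried on a small data set. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 (3.25 or later) instead of SQL Server, so date(x, '-3 months') replaces dateadd(month, -3, x); the table name visits, the column visit_date and the EMID values are hypothetical.

```python
import sqlite3

# Hypothetical visits: A98 has no gap of 3 months or more, B03 has a gap of almost 3 years.
visits = [
    ("A98", "2019-01-10"), ("A98", "2019-02-20"), ("A98", "2019-04-01"),
    ("B03", "2016-05-05"), ("B03", "2016-06-15"), ("B03", "2019-04-02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (emid TEXT, visit_date TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", visits)

# Inner query: previous visit per EMID. Outer query: flag 1 when the previous
# visit is missing or at least 3 months back, then keep a running total of flags.
query = """
SELECT emid, visit_date,
       SUM(CASE WHEN prev_date > date(visit_date, '-3 months') THEN 0 ELSE 1 END)
           OVER (PARTITION BY emid ORDER BY visit_date) AS episode_number
FROM (SELECT c.*,
             LAG(visit_date) OVER (PARTITION BY emid ORDER BY visit_date) AS prev_date
      FROM visits c) AS c
ORDER BY emid, visit_date
"""
episodes = conn.execute(query).fetchall()
for row in episodes:
    print(row)
```

The first row of each EMID has no previous date, so the comparison is NULL and the CASE falls through to 1, which is what starts episode 1 for every EMID.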

To find from a table if there is any nearer date values for the same key

I have a SQL table (t_accountdetails) with an account column called AccountId and an effective date column for that account. An account can have multiple effective dates. I need to get all entries for accounts that have effective dates very close to each other (within an offset of +/-14 days).
Say for eg:
AccountId: 12345 has got 2 entries with effective date 12/11/2017 and 12/18/2017
So my query should return above case where we have an entry of effective dates within offset of +/-14days
Please note I am actually not looking for date +-14 from today. I am looking for effective date which +/- 14 days of another effective date for the same account
You want all records where exists another effective date within 14 days, so use WHERE EXISTS:
select *
from t_accountdetails t
where exists
(
select *
from t_accountdetails other
where other.accountid = t.accountid
and other.id <> t.id
and abs(datediff(day, other.effective_date, t.effective_date)) <= 14
)
order by accountid, effective_date;
You can use the DATEADD function to make it work
select * from t_accountdetails where AccountId = 12345 and effectiveDate >= DATEADD(day, -14, getdate()) and effectiveDate <= DATEADD(day, 14, getdate())
This will return all records with AccountID = 12345 and an effective date between today - 14 days and today + 14 days.
Note: if more than one record match the criteria then all matching records will be returned.
I would be inclined to use lag() and lead():
select ad.*
from (select ad.*,
lag(effective_date) over (partition by accountid order by effective_date) as prev_ed,
lead(effective_date) over (partition by accountid order by effective_date) as next_ed
from t_accountdetails ad
) ad
where effective_date <= dateadd(day, 14, prev_ed) or
effective_date >= dateadd(day, -14, next_ed);
It would be interesting to compare the performance of this version to the exists version with an index on t_accountdetails(accountid, effective_date).
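Before comparing performance, it may be worth confirming that the two formulations select the same rows. The following is a minimal plain-Python sketch of both checks; the rows and ids are hypothetical, and the functions only mirror the logic of the queries, not their execution plans.

```python
from datetime import date
from itertools import groupby

# Hypothetical rows: (id, accountid, effective_date)
rows = [
    (1, 12345, date(2017, 12, 11)),
    (2, 12345, date(2017, 12, 18)),
    (3, 12345, date(2018, 3, 1)),
    (4, 67890, date(2017, 1, 1)),
    (5, 67890, date(2017, 2, 1)),
]

def close_by_exists(rows, days=14):
    """Mirror of the EXISTS query: compare every row with every other row of the account."""
    return sorted(
        r[0] for r in rows
        if any(o[1] == r[1] and o[0] != r[0] and abs((o[2] - r[2]).days) <= days
               for o in rows)
    )

def close_by_neighbours(rows, days=14):
    """Mirror of the lag()/lead() query: compare each row with its sorted neighbours only."""
    keep = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, grp in groupby(ordered, key=lambda r: r[1]):
        grp = list(grp)
        for i, r in enumerate(grp):
            prev_ok = i > 0 and (r[2] - grp[i - 1][2]).days <= days
            next_ok = i + 1 < len(grp) and (grp[i + 1][2] - r[2]).days <= days
            if prev_ok or next_ok:
                keep.append(r[0])
    return sorted(keep)

print(close_by_exists(rows), close_by_neighbours(rows))
```

The neighbour check is sufficient because, once the dates of an account are sorted, the closest other date to any row is always the one directly before or after it.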

Order by decimals for counts of same value

EDIT: I have edited this question to make the query simpler:
ReportTracking:
Userid, ReportId, Duration, CreatedDate
Query:
SELECT t.UserId, COUNT(DISTINCT(t.ReportId)) AS ReportsRead
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
Sample Result:
UserId ReportsRead
1 22
2 13
3 2
4 2
5 2
What I need to do is assign a number value to Reports Read. Essentially because there are 3 users who read swimming and they tie in terms of ranking (they each have 2 read only) I need to order them by who read the report last. I need to assign them all a decimal number value based on order of reading. So the person who read the report last would get .1, the person who read it first would get .3.
I'm not quite sure how to achieve this. The key part is that they each have a decimal value that ranks them, and this decimal should be a few decimal places long, as the records are rather long. My idea was to use DateCreated and convert it to a number which I can subtract from a max. But since there are multiple dates (one for each report), I'm not sure how to grab the latest one and only use that date with my report count.
I'm not sure why you need to assign decimals...
Just order by ReportsRead desc, max(createdDate) (this should be most recent read for a user in the select).
Also, DISTINCT isn't a function, it's a keyword, so there's no need for the ().
SELECT t.UserId
, COUNT(DISTINCT t.ReportId) AS ReportsRead
, max(t.CreatedDate) AS LastRead -- most recent read per user (LastRead is an alias added for readability)
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
ORDER BY ReportsRead DESC, max(createdDate)
If you need the numbers and plan on displaying them:
WITH CTE AS (
SELECT t.UserId
, COUNT(DISTINCT t.ReportId) AS ReportsRead
, row_number() over (partition by count(Distinct t.reportID) order by max(t.CreatedDate) Asc) RN
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId)
SELECT *
FROM CTE
ORDER BY ReportsRead DESC, RN
You can rank your rows within ReportsRead partition to obtain a ranking by ordering on the max(createddate). documentation: SQL Server Rank function
here is an example: http://sqlfiddle.com/#!18/1eefc/11
You may simplify the query by using CTE to reuse column aliases but the concept is:
SELECT t.UserId
, COUNT(DISTINCT( t.ReportId )) AS ReportsRead
, CAST(RANK()
OVER(
partition BY COUNT(DISTINCT( t.ReportId ))
ORDER BY MAX(t.createdDate) DESC) AS DECIMAL) / 10 ranking
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
ORDER BY ReportsRead DESC
, ranking;
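To make the tie-breaking concrete, here is a small plain-Python sketch of ranking within each ReportsRead group by most recent read and dividing by 10. The data is hypothetical, and the position counter behaves like ROW_NUMBER() rather than RANK(), so the two would differ if two users shared the same latest date.

```python
from datetime import date
from itertools import groupby

# Hypothetical GROUP BY output: (user_id, reports_read, last_read)
users = [
    (1, 22, date(2024, 1, 5)),
    (2, 13, date(2024, 1, 9)),
    (3, 2, date(2024, 1, 3)),
    (4, 2, date(2024, 1, 8)),
    (5, 2, date(2024, 1, 6)),
]

def ranked(users):
    # Highest count first; within a tie, the most recent reader first
    ordered = sorted(users, key=lambda u: (-u[1], -u[2].toordinal()))
    out = []
    for _, grp in groupby(ordered, key=lambda u: u[1]):
        # Position restarts at 1 for every distinct count (the PARTITION BY part)
        for position, (user_id, reports_read, _) in enumerate(grp, start=1):
            out.append((user_id, reports_read, position / 10))
    return out

for row in ranked(users):
    print(row)
```

With this data the three users who read 2 reports come out as 0.1, 0.2 and 0.3 in order of most recent read, which is the ordering the question describes.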

How to get value by a range of dates?

I have a table like so
And With this code I get the 5 latest values for each domainId
;WITH grp AS
(
SELECT DomainId, [Date],Passed, DatabasePerformance,ServerPerformance,
rn = ROW_NUMBER() OVER
(PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
FROM grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
WHERE rn < 7 AND t.date != g.[Date]
ORDER BY DomainId, [Date] DESC
What I Want
Well I would like to know how many tickets were sold for each of these 5 latest rows but with the following condition:
Each of these rows come with their own date which differs.
For each date I want to check how many were sold in the last 15 minutes AND how many were sold in the last 30 minutes.
Example:
I get these 5 rows for each domainId
I want to extend the above with two columns, "soldTicketsLast15" and "soldTicketsLast30"
The date column contains all the dates I need, and for each of these dates I want to go back 15 minutes and 30 minutes and get how many tickets were sold.
Example:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -15, '2016-04-12 12:10:28.2270000')
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -30, '2016-04-12 12:10:28.2270000')
How can I accomplish this?
I'd use OUTER APPLY or CROSS APPLY.
;WITH grp AS
(
SELECT
DomainId, [Date], Passed, DatabasePerformance, ServerPerformance,
rn = ROW_NUMBER() OVER (PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT
g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
,A15.SoldTicketsLast15
,A30.SoldTicketsLast30
FROM
grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast15
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -15, g.[Date])
) AS A15
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast30
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -30, g.[Date])
) AS A30
WHERE
rn < 7
AND T.[date] != g.[Date]
ORDER BY DomainId, [Date] DESC;
To make the correlated APPLY queries efficient there should be an appropriate index, like the following:
CREATE NONCLUSTERED INDEX [IX_DomainId_Date] ON [dbo].[DomainDetailDataHistory]
(
[DomainId] ASC,
[Date] ASC
)
INCLUDE ([SoldTickets])
This index may also help to make the main part of your query (grp) efficient.
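What each APPLY block computes per row can be sketched in plain Python. This is an illustration under assumptions: the rows are hypothetical, SoldTickets is treated as a running total (which is what MAX minus MIN implies), and the sketch also caps the window at the row's own date, an upper bound the query above does not include.

```python
from datetime import datetime, timedelta

# Hypothetical history rows: (domain_id, date, sold_tickets as a running total)
history = [
    (1, datetime(2016, 4, 12, 11, 35), 100),
    (1, datetime(2016, 4, 12, 11, 50), 120),
    (1, datetime(2016, 4, 12, 12, 0), 150),
    (1, datetime(2016, 4, 12, 12, 10), 155),
    (2, datetime(2016, 4, 12, 12, 5), 40),
]

def sold_in_window(history, domain_id, at, minutes):
    """MAX(SoldTickets) - MIN(SoldTickets) for one domain over [at - minutes, at]."""
    start = at - timedelta(minutes=minutes)
    values = [sold for dom, ts, sold in history
              if dom == domain_id and start <= ts <= at]
    # OUTER APPLY yields NULL when no row matches; None plays that role here
    return max(values) - min(values) if values else None

at = datetime(2016, 4, 12, 12, 10)
last15 = sold_in_window(history, 1, at, 15)
last30 = sold_in_window(history, 1, at, 30)
print(last15, last30)
```

The filter on domain first and date second is the same access pattern the (DomainId, Date) index with SoldTickets included is meant to serve.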
If I understood your question correctly, you want the tickets sold between one of your dates (in the Date column) and a point 15 or 30 minutes before it. Using the sample date from your question, the following should work:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] BETWEEN DATEADD(minute, -15, '2016-04-12 12:10:28.2270000') AND '2016-04-12 12:10:28.2270000'
The BETWEEN operator returns rows whose value lies between two bounds (inclusive), with the lower bound listed first. MAX is an aggregate function, but no GROUP BY is needed here because no other column is selected; add one (for example on DomainId) only if you want a maximum per group.
The SQL above covers the window from 15 minutes back up to the date. You could do the same for 30 minutes back by changing -15 to -30.

Group by contiguous dates and Count

I have a table which contains information about reports being accessed, along with the date. I need to group the accessed reports by date range and count them.
I'm using T-SQL
Table
EventId ReportId Date
60 4 11/24/2015
59 11 11/23/2015
58 6 11/22/2015
57 11 11/22/2015
56 9 11/21/2015
55 3 11/20/2015
54 5 11/20/2015
53 6 11/19/2015
52 5 11/19/2015
51 4 11/18/2015
50 3 11/17/2015
49 9 11/16/2015
If days' difference is 3 then I need result in the format
StartDate EndDate ReportsAccessed
11/22/2015 11/24/2015 4
11/19/2015 11/21/2015 5
11/16/2015 11/18/2015 3
but the difference between days could change.
Assuming you have values for all the dates, then you can calculate the difference in days between each date and the maximum (or minimum) date. Then divide this by three and use that for aggregation:
select min(date), max(date), count(*) as ReportsAccessed
from (select t.*, max(date) over () as maxd
from table t
) t
group by (datediff(day, date, maxd) / 3)
order by min(date);
"3" is what I think you are referring to as the "difference in days".
Those 2 blocks are simply for added clarity on what parameters you'd have to change
DECLARE @t as TABLE(
id int identity(1,1),
reportId int,
dateAccess date)
DECLARE @NumberOfDays int=3;
And here comes the actual select
Select StartDate, EndDate, COUNT(reportId) from
(
select *,
DATEADD(day, DATEDIFF(DAY, dateAccess, maxdate.maxdate)%@NumberOfDays, dateAccess) as EndDate,
DATEADD(day, DATEDIFF(DAY, dateAccess, maxdate.maxdate)%@NumberOfDays-@NumberOfDays+1, dateAccess) as StartDate
from @t, (select MAX(dateAccess) maxdate from @t t2) maxdate
) results
GROUP BY StartDate, EndDate
ORDER BY StartDate desc
There are a few places I'm unsure if it's optimized or not, for instance cross joining with select max(date) instead of using a subquery, but that returns the exact result from your OP.
Basically, I simply split the entries into groups based on how far they are from the MAX(date), and then use a COUNT. On that note, it might be more useful to use COUNT(distinct ...); otherwise, if someone looks at document #9 3 times, it will tell you that 3 documents were checked when only 1 was truly looked at.
The upside with using MAX(date) over MIN(date) is that your first group will always have the maximal amount of days. This will prove very useful if you want to compare the last few periods to the average. The downside is that you don't have stable data. With every new entry (assuming it's a new day), your query will cycle itself to produce a new set of results. If you wanted to graph the data, you'd be better comparing to MIN(date) that way the first days won't change when you add a new one.
Depending on the usage, it could even be useful to extrapolate the number of accesses done in the last period (in that case MIN(date) is also preferable).
Here's an adaptation of Gordon's answer that's probably much more optimized (it's at the very least much more aesthetic) :
SELECT DateADD(day, -(datediff(day, dateAccess, maxdate)/3)*3, maxdate) as EndDate,
DateADD(day, -(datediff(day, dateAccess, maxdate)/3+1)*3+1, maxdate) as StartDate,
count(reportId)
from (select *, MAX(dateAccess) over() as maxdate from @t) t
GROUP BY datediff(day, dateAccess, maxdate)/3, maxdate
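The bucket arithmetic behind grouping on datediff(...)/3 can be checked outside the database. Below is a minimal plain-Python sketch using the dates from the question's sample table; the function name and the width parameter are illustrative, and each window's end date is taken as the latest date minus a whole number of windows, with the start date width - 1 days earlier.

```python
from collections import Counter
from datetime import date, timedelta

# Dates from the question's sample table (one entry per event)
accesses = [
    date(2015, 11, 24), date(2015, 11, 23), date(2015, 11, 22), date(2015, 11, 22),
    date(2015, 11, 21), date(2015, 11, 20), date(2015, 11, 20), date(2015, 11, 19),
    date(2015, 11, 19), date(2015, 11, 18), date(2015, 11, 17), date(2015, 11, 16),
]

def windows_from_max(accesses, width=3):
    """Count accesses per fixed-width window, with windows anchored on the latest date."""
    maxd = max(accesses)
    # Integer division of the day distance gives the window index (0 = most recent)
    per_bucket = Counter((maxd - d).days // width for d in accesses)
    result = []
    for bucket, n in sorted(per_bucket.items()):
        end = maxd - timedelta(days=bucket * width)
        start = end - timedelta(days=width - 1)
        result.append((start, end, n))
    return result

for start, end, n in windows_from_max(accesses):
    print(start, end, n)
```

For the sample data this gives counts of 4, 5 and 3 for the windows ending 11/24, 11/21 and 11/18, the same figures as the expected result in the question.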
I will insist that the most efficient way of doing this is to use a tally table. That way you get sargable predicates, with all the benefits of indexes on the date column:
declare @c int = 3
;with minmax as(select min(date) as mind, max(date) as maxd from t),
tally as(select @c * (-1 + row_number() over(order by(select null))) as rn
from master..spt_values),
intervals as(select dateadd(dd, rn, mind) as f, dateadd(dd, rn + @c - 1, mind) t
from tally t cross join minmax m where dateadd(dd, rn, mind) <= maxd)
select i.f as [from], i.t as [to], count(*) as reports
from intervals i
join t on t.date >= i.f and t.date <= i.t
group by i.f, i.t
Explanation: minmax selects the minimum and maximum dates from the table.
tally generates day offsets starting at 0 in multiples of the interval length (how many depends on the system, but enough to cover the intervals). intervals selects the resulting intervals. The last part is a simple join on intervals to calculate the counts per interval.
Fiddle http://sqlfiddle.com/#!3/c61d1/5
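The interval-generation idea can also be sketched in plain Python: build fixed-width intervals starting from the minimum date, then count the rows that fall inside each one. The dates below are hypothetical and include a gap, to show that an interval with no rows drops out, as it would with the inner join.

```python
from datetime import date, timedelta

# Hypothetical access dates with a gap between 11/17 and 11/23
dates = [date(2015, 11, 16), date(2015, 11, 17), date(2015, 11, 23), date(2015, 11, 24)]

def interval_counts(dates, width=3):
    mind, maxd = min(dates), max(dates)
    out = []
    start = mind
    while start <= maxd:  # same stop condition as the intervals CTE
        end = start + timedelta(days=width - 1)
        n = sum(1 for d in dates if start <= d <= end)
        if n:  # inner join semantics: intervals without rows disappear
            out.append((start, end, n))
        start += timedelta(days=width)
    return out

for start, end, n in interval_counts(dates):
    print(start, end, n)
```

Because the intervals are anchored on the minimum date, existing windows stay stable when newer rows arrive, which is the trade-off against anchoring on the maximum date discussed earlier in the thread.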