How to take only one entry from a table based on an offset to a date column value - sql

I have a requirement to get values from a table based on an offset condition on a date column.
For example, in the table attached below, if any dates fall within 15 days of each other based on the effectivedate column, I should return only the first one.
So my expected result would be as below:
Here, for policy A1234, it returns the 6/18/16 entry and skips the 6/12/16 entry, as the offset between these two dates is within 15 days and I took the latest one from the list.

If you want to group rows together that are within 15 days of each other, then you have a variant of the gaps-and-islands problem. I would recommend lag() and cumulative sum for this version:
select polno, min(effectivedate), max(expirationdate)
from (select t.*,
             -- start a new group unless the previous row falls within 15 days of this one
             sum(case when prev_ed >= dateadd(day, -15, effectivedate)
                      then 0 else 1
                 end) over (partition by polno order by effectivedate) as grp
      from (select t.*,
                   lag(expirationdate) over (partition by polno order by effectivedate) as prev_ed
            from t
           ) t
     ) t
group by polno, grp;
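A minimal way to sanity-check this kind of grouping is to run it against a few rows. The sketch below does that with Python's built-in sqlite3 module, assuming a SQLite build of 3.25 or later (needed for window functions). The sample rows are invented, `julianday()` stands in for `dateadd()`, and, following the question's wording, each effectivedate is compared with the previous effectivedate rather than the previous expirationdate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (polno text, effectivedate text)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("A1234", "2016-06-12"),
        ("A1234", "2016-06-18"),  # 6 days after the previous row: same group
        ("A1234", "2016-09-01"),  # more than 15 days later: new group
        ("B5678", "2016-06-15"),  # hypothetical second policy
    ],
)

query = """
select polno, min(effectivedate) as first_ed, max(effectivedate) as last_ed
from (select t2.*,
             -- running count of group starts acts as the group id
             sum(case when prev_ed is not null
                       and julianday(effectivedate) - julianday(prev_ed) <= 15
                      then 0 else 1 end)
                 over (partition by polno order by effectivedate) as grp
      from (select t.*,
                   lag(effectivedate)
                       over (partition by polno order by effectivedate) as prev_ed
            from t) t2) t3
group by polno, grp
order by polno, first_ed
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row is one island; `max(effectivedate)` picks the latest entry in it, which for A1234 is the 6/18/16 row the question expects.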

Related

SQL - calculating hours since the earliest date in a partition

I have the following SQL code:
select
survey.ContactId,
survey.CommId,
survey.CommCreatedDate,
survey.CommIdStatus,
br.[Value],
null as HoursPastSinceFirstActiveSurvey,
row_number() over (partition by survey.ContactId order by survey.CommCreatedDate desc) as [row]
from
Survey_Completed survey
inner join
Business_Rules br on br.Name = 'OPT_OUT_TIME'
where
survey.CommIdStatus = 'Active'
Which produces the following result set:
What I need help with is filling out HoursPastSinceFirstActiveSurvey. The logic here should be as follows:
Calculate the total number of hours that has passed since the earliest (by CommCreatedDate) record in the partition for consecutive (by day) records. In order to address the "consecutive" part, I was thinking perhaps it might be possible to add to the partitioning logic to only partition if the days are consecutive. I'm not entirely sure if that's possible though. So for example, look at the last two records. They are grouped as a partition and the dates are consecutive and the earliest date/time on this partition is Nov 11 2020 12:00 AM. So I would want to perform the following in order to populate HoursPastSinceFirstActiveSurvey for these two records:
Today's date minus Nov 11 2020 12:00 AM.
This would be the value for those two records in the partition for HoursPastSinceFirstActiveSurvey. I am not sure where to even start with this!! Thank you all.
I was able to solve for this by the following query. Feedback is entirely WELCOME!
select
    Q2.ContactId,
    min(Q2.CommCreatedDate) as MinDate,
    max(Q2.CommCreatedDate) as MaxDate,
    Q2.Consecutive,
    datediff(hour, min(Q2.CommCreatedDate), max(Q2.CommCreatedDate)) as HoursPassed
from
    (select
         Q1.ContactId,
         Q1.CommId,
         Q1.CommCreatedDate,
         Q1.CommIdStatus,
         Q1.[Value],
         Q1.Consecutive,
         Q1.[row],
         Q1.countOfPartition
     from
         (select
              survey.ContactId,
              survey.CommId,
              survey.CommCreatedDate,
              survey.CommIdStatus,
              br.[Value],
              -- date minus row number is constant within a run of consecutive days
              cast(dateadd(day, -row_number() over (partition by survey.ContactId order by survey.CommCreatedDate), survey.CommCreatedDate) as Date) as Consecutive,
              row_number() over (partition by survey.ContactId order by survey.CommCreatedDate desc) as [row],
              count(*) over (partition by survey.ContactId) as countOfPartition
          from
              Survey_Completed survey
          inner join
              Business_Rules br on br.Name = 'OPT_OUT_TIME'
          where
              survey.CommIdStatus = 'Active') Q1
     where
         Q1.countOfPartition <> 1) Q2
group by
    Q2.ContactId, Q2.Consecutive, Q2.[Value]
having
    datediff(hour, min(Q2.CommCreatedDate), max(Q2.CommCreatedDate)) > Q2.[Value]
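The core of this query is the "date minus row number" trick: within a run of consecutive days, subtracting the row number from the date yields the same value for every row. Here is a small sketch of just that part, run through Python's sqlite3 module (assuming SQLite 3.25+). The table, the sample rows and the `island` alias are invented for illustration, the `Business_Rules` join and the `having` filter are left out, and the trick assumes at most one record per contact per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table survey (ContactId integer, CommCreatedDate text)")
conn.executemany(
    "insert into survey values (?, ?)",
    [
        (1, "2020-11-05 09:00:00"),  # isolated day
        (1, "2020-11-11 00:00:00"),  # start of a two-day run
        (1, "2020-11-12 06:00:00"),
    ],
)

query = """
select ContactId, island,
       min(CommCreatedDate) as MinDate,
       max(CommCreatedDate) as MaxDate,
       -- whole hours between the first and last record of the run
       (cast(strftime('%s', max(CommCreatedDate)) as integer)
        - cast(strftime('%s', min(CommCreatedDate)) as integer)) / 3600 as HoursPassed
from (select ContactId, CommCreatedDate,
             -- same value for every row of a consecutive-day run
             date(CommCreatedDate, '-' || rn || ' days') as island
      from (select ContactId, CommCreatedDate,
                   row_number() over (partition by ContactId
                                      order by CommCreatedDate) as rn
            from survey) q1) q2
group by ContactId, island
order by MinDate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

To measure against today's date instead of the last record in the run, as the question describes, the `max(CommCreatedDate)` term in the hour calculation could be replaced with the current timestamp.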

Teradata get row counts for previous two days and compare

I'm trying to setup a data check, where we get the row count from a table for today and prior date. Since it isn't loaded on weekends or holidays, I can't say DATE-1.
I came-up with the following, to get the previous date:
SELECT
LOAD_DATE
,COUNT(LOAD_DATE) RW_COUNT
,ROW_NUMBER() OVER (ORDER BY LOAD_DATE ) AS LOAD_ROWNUM
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
This produces the dates, counts and assigns a row number.
LOAD_DATE RW_COUNT LOAD_ROWNUM
2019-10-16 8259 1
2019-10-15 8253 2
2019-10-11 8256 3
2019-10-10 8243 4
I want to take the two most current dates and compare them. The most current would be "current" and the second most current would be "prior". Then I would like to have something like this as the result set:
CURRENT_COUNT PRIOR_COUNT DIFF_PERCENT
8259 8253 .9927
My issue is, how do I reference the first two rows and compare them to each other? Unless I'm over-thinking this, I need two additional SELECT statements: one with the WHERE clause referencing row 1 and another with a WHERE referencing row 2.
How do I do that? Do I have two CTEs?
Eventually, I'll need a third SELECT dividing the two rows and checking for 10% tolerance. Help, I'm in analysis paralysis.
You can filter the result of an OLAP-function using QUALIFY:
SELECT
    LOAD_DATE
   ,COUNT(LOAD_DATE) AS CURRENT_COUNT
    -- previous load date's count
   ,LEAD(CURRENT_COUNT)
        OVER (ORDER BY LOAD_DATE DESC) AS PRIOR_COUNT
    -- if your TD version doesn't support LAG/LEAD (i.e. < 16.10)
    --,MIN(CURRENT_COUNT)
    --     OVER (ORDER BY LOAD_DATE DESC
    --           ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS PRIOR_COUNT
   ,CAST(CURRENT_COUNT AS DECIMAL(18,4)) / PRIOR_COUNT AS DIFF_PERCENT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1
-- return the latest row only
QUALIFY ROW_NUMBER() OVER (ORDER BY LOAD_DATE DESC) = 1
To check for the 10% tolerance:
DIFF_PERCENT BETWEEN 0.9 AND 1.1
Add this either ANDed to the QUALIFY or within a CASE.
I don't know what you want for your result set. But you can use LAG() with aggregation to get the previous value.
SELECT LOAD_DATE, COUNT(*) as RW_COUNT,
LAG(COUNT(*)) OVER (ORDER BY LOAD_DATE) as PREV_RW_COUNT
FROM DATABASE1.TABLE1
WHERE LOAD_DATE >= DATE-6
GROUP BY 1;
You may just want a difference of the two counts.
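As a rough, runnable illustration of the whole check (counts per load date, the prior date's count via LAG, the ratio, and the 10% tolerance), here is a sketch using Python's sqlite3 module, assuming SQLite 3.25+. SQLite has no QUALIFY, so `order by ... limit 1` stands in for it; the table name `table1` and the `within_tolerance` column are assumptions, and the row counts are taken from the question's sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (load_date text)")
# Row counts per load date, copied from the question's sample output.
counts = {
    "2019-10-10": 8243,
    "2019-10-11": 8256,
    "2019-10-15": 8253,
    "2019-10-16": 8259,
}
for load_date, n in counts.items():
    conn.executemany("insert into table1 values (?)", [(load_date,)] * n)

query = """
select load_date, current_count, prior_count,
       round(1.0 * current_count / prior_count, 4) as diff_percent,
       -- 1 when the ratio lies inside the 10% tolerance band, else 0
       (1.0 * current_count / prior_count) between 0.9 and 1.1 as within_tolerance
from (select load_date, current_count,
             -- count of the previous load date, skipping days with no load
             lag(current_count) over (order by load_date) as prior_count
      from (select load_date, count(*) as current_count
            from table1
            group by load_date) c) p
order by load_date desc
limit 1
"""
row = conn.execute(query).fetchone()
print(row)
```

Because LAG works on the rows that actually exist, weekends and holidays with no load are skipped automatically, which is why no calendar arithmetic such as DATE-1 is needed.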
If your TD version (16.0+?) doesn't support LEAD/LAG, give this a try:
SELECT
    load_date,
    RW_COUNT,
    MAX(RW_COUNT) OVER(
        ORDER BY load_date DESC
        ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING -- next row in descending order = previous load date
    ) AS RW_COUNT_prev
FROM (
    SELECT load_date, COUNT(LOAD_DATE) RW_COUNT
    FROM DATABASE1.TABLE1
    WHERE LOAD_DATE >= DATE-6
    GROUP BY 1
) src

Creating one record for a continuous sequence of dates to a new table

We have a table in Microsoft SQL Server 2014, shown below, with the columns Id, LogId, AccountId, StateCode, Number and LastSentDate.
Our goal is to move the data to a new table. When we move it, we need to keep the first and last record of each series. Based on our data, LastSentDate starts on 5/1 and continues through 5/5, so we should create a new row as shown below (we set FirstSentDate to 5/1 and the log ID to the first log ID that appeared, 28369; since the series ended on 5/5, we set LastSentDate to 5/5 and LastSentLogId to 28752).
If there are some dates with a difference in time, the desired output will be
Since our date series continues, the last row in the new table will be
We were trying to achieve this by grouping by date:
WITH t
AS (SELECT LastSentDate d,
ROW_NUMBER() OVER(
ORDER BY LastSentDate) i
FROM [dbo].[RegistrationActivity]
GROUP BY LastSentDate)
SELECT MIN(d),
MAX(d)
FROM t
GROUP BY DATEDIFF(day, i, d);
Use lag() to define where a group begins. Then use a cumulative sum to assign a group id to each group. And finally, extract the data you want. I'm not sure what data you actually want, but here is the idea:
select accountid, min(lastsentdate), max(lastsentdate)
from (select t.*,
             -- start a new group when the previous date is more than one day earlier
             sum(case when prev_lsd >= dateadd(day, -1, lastsentdate) then 0 else 1 end)
                 over (partition by accountid order by lastsentdate) as grp
      from (select t.*,
                   lag(lastsentdate) over (partition by accountid order by lastsentdate) as prev_lsd
            from t
           ) t
     ) t
group by accountid, grp;
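To show how the first and last log IDs of each series could be pulled out with the same lag-plus-cumulative-sum idea, here is a sketch run through Python's sqlite3 module (assuming SQLite 3.25+). The `activity` table, its rows, the year and the output aliases are all invented. It compares calendar days via `date()` so that differing times of day do not break a series, and it assumes LogId increases with LastSentDate, so `min`/`max` of LogId give the first and last log of a series.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table activity (LogId integer, AccountId integer, LastSentDate text)"
)
conn.executemany(
    "insert into activity values (?, ?, ?)",
    [
        (101, 1, "2017-05-01 08:00:00"),
        (102, 1, "2017-05-02 09:30:00"),
        (103, 1, "2017-05-03 07:15:00"),
        (201, 1, "2017-05-10 08:00:00"),  # gap of several days: new series
    ],
)

query = """
select AccountId,
       min(LastSentDate) as FirstSent, min(LogId) as FirstLogId,
       max(LastSentDate) as LastSent,  max(LogId) as LastLogId
from (select a2.*,
             -- a row starts a new series unless the previous row is on the
             -- same or the immediately preceding calendar day
             sum(case when prev_lsd is not null
                       and julianday(date(LastSentDate))
                           - julianday(date(prev_lsd)) <= 1
                      then 0 else 1 end)
                 over (partition by AccountId order by LastSentDate) as grp
      from (select a.*,
                   lag(LastSentDate)
                       over (partition by AccountId order by LastSentDate) as prev_lsd
            from activity a) a2) a3
group by AccountId, grp
order by FirstSent
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If log IDs are not guaranteed to increase with the date, the first and last IDs would have to be picked by date order instead, for example with `first_value()` over the same partition.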

SQL Ranking by consecutive date blocks

I'm trying to rank consecutive date blocks, but what is the best way to do this? The example below shows the first three blocks being consecutive; the fourth has a month's gap before it, so the counting begins again.
Data I'm trying to order:
StartDate | EndDate |Rank
----------+-----------+----
01/01/2016| 01/02/2016| 1
01/02/2016| 01/03/2016| 2
01/03/2016| 01/04/2016| 3
01/05/2016| 01/06/2016| 1
You can do this by identifying where a grouping begins, doing a cumulative sum to identify the group, and then a row number:
select t.*,
       row_number() over (partition by grp order by startdate) as rank
from (select t.*,
             -- a row with no block ending on its start date begins a new group
             sum(case when tprev.startdate is null then 1 else 0 end) over (order by t.startdate) as grp
      from t left join
           t tprev
           on t.startdate = tprev.enddate
     ) t;
This particular SQL works for the data you have presented. It will not handle data that overlaps by more than one day, nor multiple records that start on the same day. These can be handled. If your data is more like that, then ask another question with appropriate data in it.
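The query can be tried against the question's four rows with Python's sqlite3 module (assuming SQLite 3.25+). This sketch reads the sample dates as dd/mm/yyyy, stores them in ISO format, and renames the output column to `rnk`; those choices are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (startdate text, enddate text)")
conn.executemany(
    "insert into t values (?, ?)",
    [
        ("2016-01-01", "2016-02-01"),
        ("2016-02-01", "2016-03-01"),
        ("2016-03-01", "2016-04-01"),
        ("2016-05-01", "2016-06-01"),  # one-month gap: ranking restarts
    ],
)

query = """
select startdate, enddate,
       row_number() over (partition by grp order by startdate) as rnk
from (select t.startdate, t.enddate,
             -- no block ends where this one starts: begin a new group
             sum(case when tprev.startdate is null then 1 else 0 end)
                 over (order by t.startdate) as grp
      from t left join t tprev on t.startdate = tprev.enddate) g
order by startdate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The ranks come out as 1, 2, 3, 1, matching the Rank column in the question.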

SQL Rolling Summary Statistics For Set Timeframe

I have a table that contains information about log-in events. Every time a user logs in, a record is added containing the user and the date. I want to calculate a new column in that table that holds the number of times that user has logged in in the past 31 days (including the current attempt). This is a simplified version of what my table looks like, including the column I want to add:
UserID Date LoginsInPast31Days
-------- ------------- --------------------
1 01-01-2012 1
2 02-01-2012 1
2 10-01-2012 2
1 25-01-2012 2
2 03-02-2012 2
2 22-03-2012 1
I know how to calculate a total amount of login attempts: I'd use COUNT(*) OVER (PARTITION BY UserId ORDER BY Date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). However, I want to limit the timeframe to the last 31 days. My guess is that I have to change the UNBOUNDED PRECEDING, but how do I alter it in such a way that it select the right amount of rows?
One pretty efficient way is to add a record with a -1 increment 31 days after each login, so that a running sum counts only the logins still inside the window. It looks like this:
select userid, dte,
       sum(inc) over (partition by userid order by dte) as LoginsInPast31Days
from ((select distinct userid, logindate as dte, 1 as inc from logins) union all
      (select distinct userid, dateadd(day, 31, logindate), -1 as inc from logins)
     ) l;
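Here is a sketch of the +1/-1 event idea on the question's sample data, run through Python's sqlite3 module (assuming SQLite 3.25+). The sample dates are read as dd-mm-yyyy and stored in ISO format, `date(..., '+31 days')` stands in for `dateadd()`, and the outer `where inc = 1` filter, which hides the synthetic expiry rows, is an addition for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table logins (userid integer, logindate text)")
conn.executemany(
    "insert into logins values (?, ?)",
    [
        (1, "2012-01-01"),
        (2, "2012-01-02"),
        (2, "2012-01-10"),
        (1, "2012-01-25"),
        (2, "2012-02-03"),
        (2, "2012-03-22"),
    ],
)

query = """
select userid, dte, logins_in_past_31_days
from (select userid, dte, inc,
             -- running sum of +1 (login) and -1 (login leaves the window)
             sum(inc) over (partition by userid order by dte)
                 as logins_in_past_31_days
      from (select userid, logindate as dte, 1 as inc from logins
            union all
            select userid, date(logindate, '+31 days') as dte, -1 as inc
            from logins) e) r
where inc = 1          -- keep real logins, drop the expiry rows
order by userid, dte
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A login on day d is counted on days d through d+30 and removed on day d+31, which gives the inclusive 31-day window from the question.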
You're almost there; two adjustments are needed:
First, group by user and date so you know how many rows to select.
Second, use 'ROWS BETWEEN CURRENT ROW AND 31 FOLLOWING', since you cannot limit the number of preceding records to use. With a descending sort order, you'll get the required result.
Combine these tips and you'll get:
SELECT SUM(COUNT(*)) OVER (
           PARTITION BY t.userid
           ORDER BY CAST(t.login_ts AS DATE) DESC
           ROWS BETWEEN CURRENT ROW AND 31 FOLLOWING
       )
FROM table AS t
GROUP BY t.userid, CAST(t.login_ts AS DATE)