Identify if date is the last date for any given group? - sql

I have a table that is structured like the below - this contains details about all customer subscriptions and when they start/end.
SubKey CustomerID Status StartDate EndDate
29333 102 7 01 jan 2013 1 Jan 2014
29334 102 6 7 Jun 2013 15 Jun 2022
29335 144 6 10 jun 2021 17 jun 2022
29336 144 2 8 oct 2023 10 oct 2025
I am trying to add an indicator flag to this table (either "Yes" or "No") which shows, for each row, whether the [EndDate] of that SubKey is the last one for its CustomerID. For the above example:
SubKey CustomerID Status StartDate EndDate IsLast
29333 102 7 01 jan 2013 1 Jan 2014 No
29334 102 6 7 Jun 2013 15 Jun 2022 Yes
29335 144 6 10 jun 2021 17 jun 2022 Yes
29336 144 2 8 oct 2023 10 oct 2025 Yes
The flag is set to "No" for the first row because, on 1 Jan 2014, CustomerID 102 had another SubKey (29334) still active, which didn't end until 15 Jun 2022.
The rest of the rows are set to "Yes" because these were the last active subscriptions per CustomerID.
I have been reading about the LAG function which may be able to help. I am just not sure how to make it fit in this scenario.

Probably the easiest method would be to use EXISTS with a correlated subquery. The following flags a row as "No" only when another subscription for the same customer overlaps it and ends later; rows without such an overlap get "Yes":
select *,
    case when exists (
        select * from t t2
        where t2.customerId = t.customerId
          and t2.enddate > t.enddate
          and t2.startDate < t.Enddate
    ) then 'No' else 'Yes' end as IsLast
from t;
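To check the query against the sample rows from the question, here is a minimal sketch that runs the same EXISTS logic in an in-memory SQLite database through Python's `sqlite3` module. SQLite is only a stand-in for the real engine, and the dates are rewritten as ISO strings (an assumption made here so that plain string comparison orders them correctly).

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (SubKey INTEGER, CustomerID INTEGER, Status INTEGER, "
    "StartDate TEXT, EndDate TEXT)"
)
# Sample rows from the question, with dates rewritten in ISO format.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (29333, 102, 7, "2013-01-01", "2014-01-01"),
        (29334, 102, 6, "2013-06-07", "2022-06-15"),
        (29335, 144, 6, "2021-06-10", "2022-06-17"),
        (29336, 144, 2, "2023-10-08", "2025-10-10"),
    ],
)

# A row is 'No' when another subscription of the same customer
# started before this one ended and ends after it.
query = """
SELECT t.SubKey,
       CASE WHEN EXISTS (
           SELECT 1 FROM t AS t2
           WHERE t2.CustomerID = t.CustomerID
             AND t2.EndDate > t.EndDate
             AND t2.StartDate < t.EndDate
       ) THEN 'No' ELSE 'Yes' END AS IsLast
FROM t
ORDER BY t.SubKey
"""
is_last = dict(conn.execute(query).fetchall())
print(is_last)
```

Only SubKey 29333 has an overlapping subscription that ends later, so it is the only row flagged "No".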

Related

Select row with most recent date per location and increment recent date by 1 for each row by location using MariaDB

I have a locations table with a Date column. For each locationID I need to find the most recent date; for example, locationID 1 has the most recent date '31 May 2022'. I then need to add 14 days to that most recent date and store the result in the NewDate column for the first row of the group, adding one more day for each further row in the same locationID group.
My table is:
id locationID Date NewDate
1 1 31 May 2022
2 1 16 May 2022
3 1 28 Apr 2021
4 2 29 Mar 2022
5 2 22 Feb 2022
6 3 14 Jun 2022
7 3 27 Oct 2021
8 4 01 Feb 2022
9 4 04 May 2022
10 4 14 Jun 2021
11 5 01 Jun 2022
12 5 29 May 2022
13 5 20 Sep 2022
14 5 11 Aug 2022
15 5 03 Aug 2022
Answer should be as below:
For e.g. for locationID = 1
id locationID Date NewDate
1 1 31 May 2022 14 Jun 2022 // Recent Date + 14 Days - 31 May + 14 Days
2 1 16 May 2022 15 Jun 2022 // Recent Date + 15 Days - 31 May + 15 Days
3 1 28 Apr 2021 16 Jun 2022 // Recent Date + 16 Days - 31 May + 16 Days
I have come across a few similar posts and was able to find the most recent date like this:
SELECT L.*
FROM Locations L
INNER JOIN
(SELECT locationID, MAX(Date) AS MAXdate
FROM Locations
GROUP BY locationID) groupedL
ON L.locationID = groupedL.locationID
AND L.Date = groupedL.MAXdate
Using the above code I am able to find the most recent date per location, but how do I add and increment the required days and store the result in the NewDate column? I am new to MariaDB, so please suggest links to similar posts, reference documents or blogs. Should I write a function to perform this logic and call it to store the required dates in the NewDate column? I am not sure, so please advise. Thank you.
RESULT SHOULD LOOK LIKE BELOW:
id locationID Date NewDate
1 1 31 May 2022 14 Jun 2022 // Recent Date for locationid 1 + 14 Days - 31 May + 14 Days
2 1 16 May 2022 15 Jun 2022 // Recent Date for locationid 1 + 15 Days - 31 May + 15 Days
3 1 28 Apr 2021 16 Jun 2022 // Recent Date for locationid 1 + 16 Days - 31 May + 16 Days
4 2 29 Mar 2022 12 APR 2022 // Recent Date for locationid 2 + 14 Days
5 2 22 Feb 2022 13 APR 2022 // Recent Date for locationid 2 + 15 Days
6 3 14 Jun 2022 28 JUN 2022 // Recent Date for locationid 3 + 14 Days
7 3 27 Oct 2021 29 JUN 2022 // Recent Date for locationid 3 + 15 Days
8 4 01 Feb 2022 18 MAY 2022 // Recent Date for locationid 4 + 14 Days
9 4 04 May 2022 19 MAY 2022 // Recent Date for locationid 4 + 15 Days
10 4 14 Jun 2021 20 MAY 2022 // Recent Date for locationid 4 + 16 Days
11 5 01 Jun 2022 04 OCT 2022 // Recent Date for locationid 5 + 14 Days
12 5 29 May 2022 05 OCT 2022 // Recent Date for locationid 5 + 15 Days
13 5 20 Sep 2022 06 OCT 2022 // Recent Date for locationid 5 + 16 Days
14 5 11 Aug 2022 07 OCT 2022 // Recent Date for locationid 5 + 17 Days
15 5 03 Aug 2022 08 OCT 2022 // Recent Date for locationid 5 + 18 Days
You can use a CTE (here the table's date column is named dt):
with cte as (
    select l1.*, l2.m,
           -- number of rows with a smaller id in the same location
           (select sum(l4.id < l1.id and l4.locationid = l1.locationid) from locations l4) inc
    from locations l1
    join (select l3.locationid, max(l3.dt) m from locations l3 group by l3.locationid) l2
      on l1.locationid = l2.locationid
)
select c.id, c.locationid, c.dt, c.m + interval (14 + c.inc) day as NewDate
from cte c;
You could also use analytic window functions and update the original table by joining to a subquery (works for MariaDB). The row number is ordered by id here, because the expected output increments the days in id order rather than in date order:
update t
join (
    select Id,
           Date_Add(First_Value(date) over(partition by locationId order by date desc),
                    interval (13 + row_number() over(partition by locationId order by id)) day
           ) NewDate
    from t
) nd on t.id = nd.id
set t.NewDate = nd.NewDate;
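As a rough cross-check of the window-function approach, the sketch below computes NewDate for the question's sample rows in an in-memory SQLite database via Python's `sqlite3`. SQLite (3.25 or newer, for window functions) is only a stand-in for MariaDB; the column name `dt`, the ISO date strings and SQLite's `date()` modifier syntax are assumptions of this sketch, not part of the original answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (id INTEGER PRIMARY KEY, locationID INTEGER, dt TEXT)"
)
# Sample rows from the question, with dates rewritten in ISO format.
conn.executemany(
    "INSERT INTO locations VALUES (?, ?, ?)",
    [
        (1, 1, "2022-05-31"), (2, 1, "2022-05-16"), (3, 1, "2021-04-28"),
        (4, 2, "2022-03-29"), (5, 2, "2022-02-22"),
        (6, 3, "2022-06-14"), (7, 3, "2021-10-27"),
        (8, 4, "2022-02-01"), (9, 4, "2022-05-04"), (10, 4, "2021-06-14"),
        (11, 5, "2022-06-01"), (12, 5, "2022-05-29"), (13, 5, "2022-09-20"),
        (14, 5, "2022-08-11"), (15, 5, "2022-08-03"),
    ],
)

# Most recent date per location, plus 14 days for the first row (by id),
# 15 for the second, and so on.
query = """
SELECT id,
       date(MAX(dt) OVER (PARTITION BY locationID),
            '+' || (13 + ROW_NUMBER() OVER (PARTITION BY locationID ORDER BY id))
                || ' days') AS NewDate
FROM locations
ORDER BY id
"""
new_dates = dict(conn.execute(query).fetchall())
for row_id, new_date in new_dates.items():
    print(row_id, new_date)
```

Numbering the rows by id reproduces the expected output in the question; numbering them by date descending would swap the results for ids 8 and 9, for example.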

big query SQL - repeatedly/recursively change a row's column in the select statement based on the values in previous row

I have a table like the one below:
customer | date | end date
1 | jan 1 2021 | jan 30 2021
1 | jan 2 2021 | jan 31 2021
1 | jan 3 2021 | feb 1 2021
1 | jan 27 2021 | feb 26 2021
1 | feb 3 2021 | mar 5 2021
2 | jan 2 2021 | jan 31 2021
2 | jan 10 2021 | feb 9 2021
2 | feb 10 2021 | mar 12 2021
Now I want to update the value in the 'end date' column of a row based on the 'end date' of the previous row and the 'date' of the current row.
If the date in the current row < the end date of the previous row, the end date of the current row should be set to the end date of the previous row.
I want to do this repeatedly for all rows (grouped by customer).
I want the output below. I just need it in a select statement rather than updating or inserting into a table.
Note: in the output below, the end date of the second row is updated with the value from the first row (jan 30 2021), so the third row's date (jan 3 2021) is evaluated against the updated value in the second row (jan 30 2021), not against the second row's value before the update (jan 31 2021).
customer | date | end date
1 | jan 1 2021 | jan 30 2021
1 | jan 2 2021 | jan 30 2021 [updated because current date < previous end date]
1 | jan 3 2021 | jan 30 2021 [updated because current date < previous end date]
1 | jan 27 2021 | jan 30 2021 [updated because current date < previous end date]
1 | feb 3 2021 | mar 5 2021
2 | jan 2 2021 | jan 31 2021
2 | jan 10 2021 | jan 31 2021 [updated because current date < previous end date]
2 | feb 10 2021 | mar 12 2021
I think I would go this way. I use the data source twice so that the operation can be performed without updating or inserting into the table.
input table:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12
code:
with num_raw_data as (
    -- number the rows per customer in date order
    select row_number() over(partition by customer order by date_init) as num,
           customer, date_init, date_end
    from `project-id.data-set.table`
), analyzed_data as (
    -- validation = 1 when a row starts before the previous row's end date (same month)
    select r.num,
           r.customer,
           r.date_init,
           r.date_end,
           case when r.date_init < (select p.date_end
                                    from num_raw_data p
                                    where p.num = r.num - 1
                                      and p.customer = r.customer
                                      and extract(month from r.date_init) = extract(month from p.date_init))
                then 1 else 0 end as validation
    from num_raw_data r
)
select ad.customer,
       ad.date_init,
       -- flagged rows take the earliest end date of an unflagged row that started before they end
       case when ad.validation != 0
            then (select min(a.date_end) from analyzed_data a
                  where a.validation = 0 and a.customer = ad.customer and a.date_init < ad.date_end)
            else ad.date_end end as date_end
from analyzed_data ad
order by ad.customer, ad.num
output:
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30
1|2021-01-27|2021-01-30
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31
2|2021-02-10|2021-03-12
I use the validation column from analyzed_data to find the rows that need changing. I'm not sure it is fast (probably not), but it works for the scenario in your question.
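The rule described in the question (each row is compared with the already updated end date of the previous row) is easiest to see as a single ordered pass per customer. The following is a small Python sketch of that rule only, not a BigQuery solution; the function name `carry_forward_end_dates` and the tuple layout are hypothetical.

```python
from datetime import date
from itertools import groupby


def carry_forward_end_dates(rows):
    """rows: iterable of (customer, start_date, end_date) tuples.

    Within each customer, in start-date order, a row that starts before the
    previous row's (already adjusted) end date inherits that end date.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for customer, group in groupby(ordered, key=lambda r: r[0]):
        prev_end = None
        for _, start, end in group:
            if prev_end is not None and start < prev_end:
                end = prev_end  # compare against the updated value, not the raw one
            result.append((customer, start, end))
            prev_end = end
    return result


# Sample data from the question.
rows = [
    (1, date(2021, 1, 1), date(2021, 1, 30)),
    (1, date(2021, 1, 2), date(2021, 1, 31)),
    (1, date(2021, 1, 3), date(2021, 2, 1)),
    (1, date(2021, 1, 27), date(2021, 2, 26)),
    (1, date(2021, 2, 3), date(2021, 3, 5)),
    (2, date(2021, 1, 2), date(2021, 1, 31)),
    (2, date(2021, 1, 10), date(2021, 2, 9)),
    (2, date(2021, 2, 10), date(2021, 3, 12)),
]
adjusted = carry_forward_end_dates(rows)
for customer, start, end in adjusted:
    print(customer, start, end)
```

Because every row depends on the adjusted value of the row before it, the rule is inherently sequential, which is why it is awkward to express with a plain LAG over the original column.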

group by a substring in field

I have a table which looks like this:
column 1 = timestamp : string, column 2 = numOfentites : int
Please note that I am using HiveQL.
Fri, 10 Aug 2001 274
Fri, 10 Dec 1999 39
Fri, 10 Mar 2000 107
Fri, 10 May 2002 26
Fri, 10 Nov 2000 351
Fri, 10 Sep 1999 22
Fri, 11 Aug 2000 189
Fri, 11 Dec 1998 1
Fri, 11 Feb 2000 84
Fri, 11 Jan 2002 580
Fri, 11 Jun 1999 12
Fri, 11 May 2001 571
Fri, 12 Apr 2002 41
I retrieved the frequency per year from this table and found that some year XXXX had the largest number of entities.
My aim now is to go one level deeper and extract the frequency per month for the year XXXX.
I tried using the group by clause on the substring indicating the month, but it doesn't work.
Can you please give me a direction on how to proceed?
I just need a hint, not the answer :P I am trying to learn HiveQL here.
EDIT
Here is the query that I used to extract the frequency of entities on a yearly basis.
Note that timestamp is the first column of the input.
select dates , count(dates) as numEmails
from (select split(timestamp," ")[3] as dates , count(timestamp)
from dataset
group by timestamp
) mailfreq
group by dates
order by numEmails desc;
I know that HiveQL has strange limitations, but won't this work?
select split(timestamp," ")[3] as yr, split(timestamp," ")[2] as mon, count(timestamp)
from dataset
group by split(timestamp," ")[3], split(timestamp," ")[2];
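To see which array positions the split produces, here is a tiny Python sketch that mimics splitting the timestamp string on a space and counting rows per month for one year. It only illustrates the indexing (position 2 is the month, position 3 is the year); the helper name `month_counts_for_year` is hypothetical and the sample list is a subset of the rows in the question.

```python
from collections import Counter


def month_counts_for_year(timestamps, year):
    """Count rows per month for one year, using the same split positions as the query."""
    counts = Counter()
    for ts in timestamps:
        parts = ts.split(" ")  # e.g. ['Fri,', '10', 'Aug', '2001']
        if parts[3] == year:   # index 3 holds the year
            counts[parts[2]] += 1  # index 2 holds the month
    return counts


sample = [
    "Fri, 10 Aug 2001",
    "Fri, 11 May 2001",
    "Fri, 10 Dec 1999",
    "Fri, 11 Jun 1999",
    "Fri, 10 Sep 1999",
]
monthly = month_counts_for_year(sample, "1999")
print(dict(monthly))
```

In the query itself, the equivalent of the year check would be a where clause on the year part before grouping by the month part.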

Trying to pull the required rows from the single table with applying conditional statements on columns in sql server?

I have tried many ways to solve this, but unfortunately I got stuck every time.
source table
id year jan feb mar apr may jun jul aug sep oct nov dec
1234 2014 05 06 12 15 16 17 18 19 20 21 22 23
1234 2013 05 06 12 15 16 17 18 19 20 21 22 23
Task: Assume that the current month is March 2014 and we need the data for the previous 12 months (i.e., from Mar 2013 to Feb 2014); the remaining values need to be zero, except for year and id.
Solution:
id year jan feb mar apr may jun jul aug sep oct nov dec
1234 2014 05 06 0 0 0 0 0 0 0 0 0 0
1234 2013 0 0 12 15 16 17 18 19 20 21 22 23
This needs a code solution for SQL Server 2008. I would be very happy if anybody can solve this.
Note:
I got stuck trying to pull the column names dynamically.
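You can try this (keep a month's value only when that month is between 1 and 12 months before the current date):
select id, year,
    case when datediff(month, convert(datetime, cast(year as varchar(4)) + '0101'), getdate()) between 1 and 12 then jan else 0 end as jan,
    case when datediff(month, convert(datetime, cast(year as varchar(4)) + '0201'), getdate()) between 1 and 12 then feb else 0 end as feb ....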
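The month arithmetic behind that CASE expression can be checked with a short Python sketch. It assumes the rule "keep a value only if its month lies 1 to 12 months before the reference date", which is how the task in the question reads; the names `months_between` and `mask_row` are hypothetical.

```python
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (like DATEDIFF(month, ...))."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def mask_row(row, today):
    """Zero every month column that is not within the 12 months before `today`."""
    masked = {"id": row["id"], "year": row["year"]}
    for number, name in enumerate(MONTHS, start=1):
        age = months_between(date(row["year"], number, 1), today)
        masked[name] = row[name] if 1 <= age <= 12 else 0
    return masked


values = [5, 6, 12, 15, 16, 17, 18, 19, 20, 21, 22, 23]
source = [
    {"id": 1234, "year": 2014, **dict(zip(MONTHS, values))},
    {"id": 1234, "year": 2013, **dict(zip(MONTHS, values))},
]
today = date(2014, 3, 15)  # "currently at March 2014", as in the question
result = [mask_row(row, today) for row in source]
for row in result:
    print(row)
```

With March 2014 as the reference month, the 2014 row keeps only jan and feb and the 2013 row keeps mar through dec, which matches the expected table in the question.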

How to change start date in a table to a pair of start date and end date using SQL

The title must be confusing, but the thing I am trying to do is very easy to understand with an example. I have a table like this:
Code Date_ Ratio
73245 Jan 1 1975 12:00AM 10
73245 Apr 18 2006 12:00AM 4
73245 Dec 26 2007 12:00AM 10
73245 Jan 30 2009 12:00AM 4
73245 Apr 21 2011 12:00AM 2
Basically for each security it gives some ratio for it with a date when the ratio starts to be effective. This table will be much easier to use if instead of just having a start date, it has a pair of start date and end date, like the following:
Code StartDate_ EndDate_ Ratio
73245 Jan 1 1975 12:00AM Apr 18 2006 12:00AM 10
73245 Apr 18 2006 12:00AM Dec 26 2007 12:00AM 4
73245 Dec 26 2007 12:00AM Jan 30 2009 12:00AM 10
73245 Jan 30 2009 12:00AM Apr 21 2011 12:00AM 4
73245 Apr 21 2011 12:00AM Dec 31 2049 12:00AM(or some random date in far future) 2
How do I transform the original table to the table I want using SQL statements? I have little experience with SQL and I could not figure how.
Please help! Thanks!
In SQL Server 2012:
SELECT code,
date_ AS startDate,
LEAD(date_) OVER (PARTITION BY code ORDER BY date_) AS endDate,
ratio
FROM mytable
In SQL Server 2005 and 2008:
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY code ORDER BY date_) AS rn
FROM mytable
)
SELECT q1.code, q1.date_ AS startDate, q2.date_ AS endDate, q1.ratio
FROM q q1
LEFT JOIN
q q2
ON q2.code = q1.code
AND q2.rn = q1.rn + 1
Maybe it would also be possible to use OUTER APPLY, something like:
SELECT t1o.Code, t1o.Date_ AS StartDate_, ISNULL(t2.EndDate_, CAST('20491231' AS DATETIME)) AS EndDate_, t1o.Ratio
FROM t1 AS t1o
OUTER APPLY
(
    SELECT TOP 1 t1.Date_ AS EndDate_
    FROM t1
    WHERE t1.Code = t1o.Code AND t1.Date_ > t1o.Date_
    ORDER BY t1.Date_ ASC
) AS t2
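For a quick look at what the LEAD approach returns on the sample data, here is a sketch that runs it in an in-memory SQLite database through Python's `sqlite3`. SQLite (3.25 or newer) is only a stand-in for SQL Server; the ISO date strings and the COALESCE to a far-future date of 2049-12-31 for the last row are assumptions taken from the desired output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (code INTEGER, date_ TEXT, ratio INTEGER)")
# Sample rows from the question, with dates rewritten in ISO format.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (73245, "1975-01-01", 10),
        (73245, "2006-04-18", 4),
        (73245, "2007-12-26", 10),
        (73245, "2009-01-30", 4),
        (73245, "2011-04-21", 2),
    ],
)

# LEAD returns the next start date per code; the last row has no successor,
# so COALESCE fills in the far-future placeholder.
query = """
SELECT code,
       date_ AS startDate,
       COALESCE(LEAD(date_) OVER (PARTITION BY code ORDER BY date_),
                '2049-12-31') AS endDate,
       ratio
FROM mytable
ORDER BY code, date_
"""
periods = conn.execute(query).fetchall()
for period in periods:
    print(period)
```

Without the COALESCE, the last period of each code would have a NULL end date, which is also what the LEAD and self-join queries above return.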