For multiple rows with some identical fields, keep the one with updated values, and mark the others


For multiple rows that share the same ID, I want to add a few marks (new columns) to the original table.
The original table is as below:
ID Start_date End_Date Amount
1 2005-01-01 2010-01-01 5
1 2000-07-01 2009-06-01 10
1 2017-08-01 2018-03-01 30
I wish to keep one record with the earliest start date, the latest end date, the summed amount and an indicator telling me to use this record. For the others, I just want the indicator to tell me not to use them.
The updated table should be as below:
ID Start_date End_Date Amount Amount_new Usable Start End
1 2005-01-01 2010-01-01 5 45 0 2000-07-01 2018-03-01
1 2000-07-01 2009-06-01 10 1
1 2017-08-01 2018-03-01 30 1
It does not matter which row to keep, as long as there is one row with Usable=0, and Amount_new, Start and End are updated.
Ignoring the end date, I was thinking of grouping by ID and Start_date and then updating the Usable and Amount_new columns of the first row. However, I still don't know how to select the first row of each group. Taking End_Date into account makes it even messier!
Could anyone help to shed some light upon this issue?

You seem to want something like this:
alter table original
add amount_new int,
    usable bit,
    new_start date,
    new_end date;
Then, in a separate batch, you can update it using window functions:
with toupdate as (
      select o.*,
             sum(amount) over (partition by id) as x_amount,
             -- the first row per id (earliest start) is the one to use: usable = 0
             (case when row_number() over (partition by id order by start_date) = 1
                   then 0 else 1
              end) as x_usable,
             min(start_date) over (partition by id) as x_start_date,
             max(end_date) over (partition by id) as x_end_date
      from original o
     )
update toupdate
    set amount_new = x_amount,
        usable = x_usable,
        new_start = x_start_date,
        new_end = x_end_date;
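The window expressions in the CTE above can be checked outside SQL Server. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in (assumption: SQLite 3.25 or later, which added window functions). Because SQLite cannot UPDATE through a CTE, it runs the same expressions as a plain SELECT to show the values the UPDATE would write.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumes SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE original (id INTEGER, start_date TEXT, end_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO original VALUES (?, ?, ?, ?)",
    [(1, "2005-01-01", "2010-01-01", 5),
     (1, "2000-07-01", "2009-06-01", 10),
     (1, "2017-08-01", "2018-03-01", 30)],
)

# Same per-id window expressions as the CTE; ISO date strings sort correctly as text.
rows = conn.execute("""
    SELECT id, start_date, amount,
           SUM(amount)     OVER (PARTITION BY id) AS x_amount,
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY id ORDER BY start_date) = 1
                THEN 0 ELSE 1 END                 AS x_usable,
           MIN(start_date) OVER (PARTITION BY id) AS x_start_date,
           MAX(end_date)   OVER (PARTITION BY id) AS x_end_date
    FROM original
    ORDER BY start_date
""").fetchall()

for row in rows:
    print(row)
```

Only the row with the earliest start date gets usable = 0; every row carries the per-id total (45) and the overall start/end dates.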

The following query should do what you want:
CREATE TABLE #temp (ID INT, [Start_date] DATE, End_Date DATE, Amount NUMERIC(28,0),
                    Amount_new NUMERIC(28,0), Usable BIT, Start DATE, [End] DATE);

INSERT INTO #temp (ID, [Start_date], End_Date, Amount) VALUES
(1,'2005-01-01','2010-01-01',5),
(1,'2000-07-01','2009-06-01',10),
(1,'2017-08-01','2018-03-01',30),
(2,'2001-07-01','2009-06-01',5),
(2,'2017-08-01','2019-03-01',35);

-- ORDER BY (SELECT 1) numbers rows in arbitrary order; whichever row gets RNO = 1
-- per ID receives the per-ID totals from t2, which is all the question requires.
UPDATE t1
SET Amount_new = t2.[Amount_new],
    Usable = 1,
    Start = t2.[Start],
    [End] = t2.[End]
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) AS RNO FROM #temp) t1
INNER JOIN
(
    SELECT ID, [Start_date], [End_Date], [Amount]
          ,SUM(Amount) OVER (PARTITION BY ID) AS [Amount_new]
          ,MIN([Start_date]) OVER (PARTITION BY ID) AS [Start]
          ,MAX(End_Date) OVER (PARTITION BY ID) AS [End]
          ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT 1)) AS RNO
    FROM #temp
) t2 ON t1.ID = t2.ID AND t2.RNO = t1.RNO AND t2.RNO = 1;

SELECT * FROM #temp;
The result is as below,
ID Start_date End_Date Amount Amount_new Usable Start End
1 2005-01-01 2010-01-01 5 45 1 2000-07-01 2018-03-01
1 2000-07-01 2009-06-01 10 NULL NULL NULL NULL
1 2017-08-01 2018-03-01 30 NULL NULL NULL NULL
2 2001-07-01 2009-06-01 5 40 1 2001-07-01 2019-03-01
2 2017-08-01 2019-03-01 35 NULL NULL NULL NULL

Related

How to get the row for the current date?

Assume today is 2022-10-24.
Case 1
id productCode version startDate endDate
1 AAA 1 2022-10-01 2022-10-28
2 AAA 2 2022-10-29 NULL
For case 1, based on the table above, I want to return only the row with id 1, because today (2022-10-24) falls between its startDate and endDate.
Case 2
id productCode version startDate endDate
1 AAA 1 2022-10-01 2022-10-28
2 AAA 2 2022-10-01 NULL
For case 2, based on the table above, I want to return only the row with id 2, because when id 1 and id 2 have the same startDate, the row whose endDate is NULL should be chosen.
I am still confused about how to implement this in a query.
I want a single query whose logic returns id 1 for case 1 and id 2 for case 2.

As I mentioned in the comments, it seems you just need some simple >= and <= logic (handling NULL end dates) plus a "top 1 per group":
WITH CTE AS(
SELECT id,
productCode,
version,
startDate,
endDate,
ROW_NUMBER() OVER (PARTITION BY productCode ORDER BY Version DESC) AS RN --Guessed the required partition and order clauses
FROM dbo.YourTable
WHERE startDate <= CONVERT(date,GETDATE())
AND (endDate >= CONVERT(date,GETDATE()) OR endDate IS NULL))
SELECT id,
productCode,
version,
startDate,
endDate
FROM CTE
WHERE RN = 1;
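To check the logic against both cases from the question, this minimal sketch runs the same query in SQLite through Python's sqlite3 module (assumption: SQLite 3.25+ for ROW_NUMBER). The fixed date parameter replaces CONVERT(date, GETDATE()) so the result is reproducible, and the table name your_table is a placeholder, as in the answer.

```python
import sqlite3

# The two sample tables from the question: (id, productCode, version, startDate, endDate).
CASE_1 = [(1, "AAA", 1, "2022-10-01", "2022-10-28"),
          (2, "AAA", 2, "2022-10-29", None)]
CASE_2 = [(1, "AAA", 1, "2022-10-01", "2022-10-28"),
          (2, "AAA", 2, "2022-10-01", None)]


def current_ids(rows, today):
    """Return the id of the current row per productCode on the given date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, productCode TEXT, version INTEGER,"
                 " startDate TEXT, endDate TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute("""
        WITH cte AS (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY productCode
                                      ORDER BY version DESC) AS rn
            FROM your_table
            WHERE startDate <= :today                      -- already started
              AND (endDate >= :today OR endDate IS NULL)   -- not ended (NULL = open)
        )
        SELECT id FROM cte WHERE rn = 1
    """, {"today": today}).fetchall()
    conn.close()
    return [row[0] for row in result]


print(current_ids(CASE_1, "2022-10-24"))  # [1]
print(current_ids(CASE_2, "2022-10-24"))  # [2]
```

In case 1 the version 2 row has not started yet and is filtered out; in case 2 both rows qualify, and ordering by version DESC picks id 2.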

How to combine date ranges in SQL with small gaps

I have a dataset where each row has a date range. I want to combine records into single date ranges if they overlap or there's a gap of less than 30 days and they share the same ID number. If it's more than 30 days, I want them to remain separate. I can figure out how to do it if they are overlapping, and I can figure out how to do it no matter the size of the gap, but I can't figure out how to do it with a limited gap allowance.
So, for example, if my data looks like this:
ID Date1 Date2
ABC 2018-01-01 2018-02-14
ABC 2018-02-13 2018-03-17
ABC 2018-04-01 2018-07-24
DEF 2017-01-01 2017-06-30
DEF 2017-10-01 2017-12-01
I want it to come out like this:
ID Date1 Date2
ABC 2018-01-01 2018-07-24
DEF 2017-01-01 2017-06-30
DEF 2017-10-01 2017-12-01
The three date ranges for ABC are combined, because they either overlap or the gaps are less than 30 days. The two date ranges for DEF stay separate, because the gap between them is larger than 30 days.
I'm using Microsoft SSMS.

You can identify where the new periods begin: a row starts a new period when no earlier row for the same id ends within 30 days before its start. For a general problem, I would go with not exists. Then you can assign a group using cumulative sums:
select t.*, sum(is_start) over (partition by id order by date1) as grp
from (select t.*,
             (case when not exists (select 1
                                    from t t2
                                    where t2.id = t.id and
                                          t2.date1 < t.date1 and
                                          t2.date2 >= dateadd(day, -30, t.date1)
                                   )
                   then 1 else 0
              end) as is_start
      from t
     ) t;
The final step is aggregation:
with g as (
      select t.*, sum(is_start) over (partition by id order by date1) as grp
      from (select t.*,
                   (case when not exists (select 1
                                          from t t2
                                          where t2.id = t.id and
                                                t2.date1 < t.date1 and
                                                t2.date2 >= dateadd(day, -30, t.date1)
                                         )
                         then 1 else 0
                    end) as is_start
            from t
           ) t
     )
select id, min(date1) as date1, max(date2) as date2
from g
group by id, grp;
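A minimal sketch of the same not exists / cumulative-sum idea, run against the question's sample data with Python's sqlite3 module (assumption: SQLite 3.25+). SQLite has no DATEADD, so date(t.date1, '-30 days') stands in for dateadd(day, -30, t.date1); the table name t matches the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, date1 TEXT, date2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("ABC", "2018-01-01", "2018-02-14"),
    ("ABC", "2018-02-13", "2018-03-17"),
    ("ABC", "2018-04-01", "2018-07-24"),
    ("DEF", "2017-01-01", "2017-06-30"),
    ("DEF", "2017-10-01", "2017-12-01"),
])

merged = conn.execute("""
    WITH s AS (
        -- is_start = 1 when no earlier row for the id ends within 30 days before this start
        SELECT t.*,
               CASE WHEN NOT EXISTS (
                        SELECT 1 FROM t AS t2
                        WHERE t2.id = t.id
                          AND t2.date1 < t.date1
                          AND t2.date2 >= date(t.date1, '-30 days'))
                    THEN 1 ELSE 0 END AS is_start
        FROM t
    ),
    g AS (
        -- running count of period starts = group number within the id
        SELECT s.*, SUM(is_start) OVER (PARTITION BY id ORDER BY date1) AS grp
        FROM s
    )
    SELECT id, MIN(date1), MAX(date2)
    FROM g
    GROUP BY id, grp
    ORDER BY id, MIN(date1)
""").fetchall()

for row in merged:
    print(row)
```

Checking every earlier row (not just the previous one) means a long range that covers later short ones still keeps them in the same period.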

SQL - Find if column dates include at least partially a date range

I need to create a report and I am struggling with the SQL script.
The table I want to query is a company_status_history table which has entries like the following (the ones that I can't figure out)
Table company_status_history
Columns:
| id | company_id | status_id | effective_date |
Data:
| 1 | 10 | 1 | 2016-12-30 00:00:00.000 |
| 2 | 10 | 5 | 2017-02-04 00:00:00.000 |
| 3 | 11 | 5 | 2017-06-05 00:00:00.000 |
| 4 | 11 | 1 | 2018-04-30 00:00:00.000 |
I want to answer the question "Get all companies that have been in status 1 at some point inside the time period 01/01/2017 - 31/12/2017".
The rows above are the cases I don't know how to handle, since I need to add some logic of this type:
"If this row is status 1 and its date is before the date range, check whether the next row has a date inside the date range."
"If this row is status 1 and its date is after the date range, check whether the row before has a date inside the date range."

I think this can be handled as a gaps and islands problem. Consider the following input data (the OP's sample data plus two additional rows):
id company_id status_id effective_date
-------------------------------------------
1 10 1 2016-12-15
2 10 1 2016-12-30
3 10 5 2017-02-04
4 10 4 2017-02-08
5 11 5 2017-06-05
6 11 1 2018-04-30
You can use the following query:
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
ORDER BY company_id, effective_date
to get:
id company_id status_id effective_date cnt
-----------------------------------------------
1 10 1 2016-12-15 0
2 10 1 2016-12-30 1
3 10 5 2017-02-04 2
4 10 4 2017-02-08 2
5 11 5 2017-06-05 0
6 11 1 2018-04-30 0
Now you can identify status = 1 islands using:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
)
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
Output:
id company_id status_id effective_date grp
-----------------------------------------------
1 10 1 2016-12-15 1
2 10 1 2016-12-30 1
3 10 5 2017-02-04 1
4 10 4 2017-02-08 2
5 11 5 2017-06-05 1
6 11 1 2018-04-30 2
Calculated field grp will help us identify those islands:
;WITH CTE AS
(
SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY
(
SELECT COUNT(*) AS cnt
FROM company_status_history AS c
WHERE c.status_id = 1
AND c.company_id = t.company_id
AND c.effective_date < t.effective_date
) AS x
), CTE2 AS
(
SELECT id, company_id, status_id, effective_date,
ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) -
cnt AS grp
FROM CTE
)
SELECT company_id,
MIN(effective_date) AS start_date,
CASE
WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(effective_date))
ELSE MIN(effective_date)
END AS end_date
FROM CTE2
GROUP BY company_id, grp
HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0
Output:
company_id start_date end_date
-----------------------------------
10 2016-12-15 2017-02-03
11 2018-04-30 2018-04-30
All you need now is to keep those islands from above that overlap with the specified interval.
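The last step is only described above, so here is a minimal sketch of the whole pipeline plus the overlap filter, run in SQLite via Python's sqlite3 module (assumption: SQLite 3.25+). SQLite has no OUTER APPLY, so a correlated scalar subquery computes cnt; the standard overlap test (start_date <= range end AND end_date >= range start) is my addition, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company_status_history "
             "(id INTEGER, company_id INTEGER, status_id INTEGER, effective_date TEXT)")
conn.executemany("INSERT INTO company_status_history VALUES (?, ?, ?, ?)", [
    (1, 10, 1, "2016-12-15"), (2, 10, 1, "2016-12-30"),
    (3, 10, 5, "2017-02-04"), (4, 10, 4, "2017-02-08"),
    (5, 11, 5, "2017-06-05"), (6, 11, 1, "2018-04-30"),
])

companies = conn.execute("""
    WITH cte AS (
        -- cnt = number of earlier status-1 rows for the same company
        SELECT t.*,
               (SELECT COUNT(*) FROM company_status_history AS c
                WHERE c.status_id = 1
                  AND c.company_id = t.company_id
                  AND c.effective_date < t.effective_date) AS cnt
        FROM company_status_history AS t
    ),
    cte2 AS (
        SELECT cte.*,
               ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) - cnt AS grp
        FROM cte
    ),
    islands AS (
        SELECT company_id,
               MIN(effective_date) AS start_date,
               CASE WHEN COUNT(*) > 1 THEN date(MAX(effective_date), '-1 day')
                    ELSE MIN(effective_date) END AS end_date
        FROM cte2
        GROUP BY company_id, grp
        HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0
    )
    -- keep islands that overlap the requested interval
    SELECT DISTINCT company_id
    FROM islands
    WHERE start_date <= :range_end AND end_date >= :range_start
    ORDER BY company_id
""", {"range_start": "2017-01-01", "range_end": "2017-12-31"}).fetchall()

print(companies)  # [(10,)]
```

As in the answer, an island whose last row is still status 1 ends on its own start date; if "still in status 1" should extend to today, that end_date would need adjusting.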

Maybe this is what you are looking for? For this kind of question, you need to join two instances of your table. Here I am simply joining each row with the next record by Id, which is probably not entirely correct. To do it better, you can create a new Id using a window function such as row_number, ordering the table by your criteria.
If this row is status 1 and it's date is before the date range check
the next row if it has a date inside the date range
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
-- csh2 is the next row by id; T-SQL has no true/false literals, so 1/0 are returned
select csh1.*,
case
    when csh1.status_id = 1 and csh1.effective_date < @range_st
    then
        case
            when csh2.effective_date between @range_st and @range_en then 1
            else 0
        end
    else NULL
end as in_range
from company_status_history csh1
left join company_status_history csh2
    on csh2.id = csh1.id + 1
Implementing the second criterion:
"If this row is status 1 and it's date is after the date range check
the row before if it has a date inside the date range."
declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'
-- csh2 = next row by id, csh3 = previous row by id
select csh1.*,
case
    when csh1.status_id = 1 and csh1.effective_date < @range_st
    then
        case
            when csh2.effective_date between @range_st and @range_en then 1
            else 0
        end
    when csh1.status_id = 1 and csh1.effective_date > @range_en
    then
        case
            when csh3.effective_date between @range_st and @range_en then 1
            else 0
        end
    else null -- ¿?
end as in_range
from company_status_history csh1
left join company_status_history csh2
    on csh2.id = csh1.id + 1
left join company_status_history csh3
    on csh3.id = csh1.id - 1

I would suggest using a CTE and the window function ROW_NUMBER. With these you can find the desired records. An example:
DECLARE @t TABLE(
id INT
,company_id INT
,status_id INT
,effective_date DATETIME
)
INSERT INTO @t VALUES
(1, 10, 1, '2016-12-30 00:00:00.000')
,(2, 10, 5, '2017-02-04 00:00:00.000')
,(3, 11, 5, '2017-06-05 00:00:00.000')
,(4, 11, 1, '2018-04-30 00:00:00.000')
DECLARE @StartDate DATETIME = '2017-01-01';
DECLARE @EndDate DATETIME = '2017-12-31';
WITH cte AS(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) AS rn
FROM @t
),
cteLeadLag AS(
SELECT c.*, ISNULL(c2.effective_date, c.effective_date) LagEffective, ISNULL(c3.effective_date, c.effective_date) LeadEffective
FROM cte c
LEFT JOIN cte c2 ON c2.company_id = c.company_id AND c2.rn = c.rn-1
LEFT JOIN cte c3 ON c3.company_id = c.company_id AND c3.rn = c.rn+1
)
SELECT 'Included' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Following' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date > @EndDate
AND LagEffective BETWEEN @StartDate AND @EndDate
UNION ALL
-- status 1 started before the range and the next status change falls inside it
SELECT 'Trailing' AS RangeStatus, *
FROM cteLeadLag
WHERE status_id = 1
AND effective_date < @StartDate
AND LeadEffective BETWEEN @StartDate AND @EndDate
I first select all records with their lagging and leading dates, and then I perform your checks for inclusion in the desired timespan.
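The two self-joins on rn-1 and rn+1 compute the previous and next effective_date, which is exactly what the LAG and LEAD window functions return. This minimal sketch swaps in LAG/LEAD and runs in SQLite through Python's sqlite3 module (assumption: SQLite 3.25+); the table t stands in for the @t table variable, and a single CASE replaces the UNION ALL because the three conditions (inside, after, before the range) are mutually exclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, company_id INTEGER, status_id INTEGER, effective_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, 10, 1, "2016-12-30"),
    (2, 10, 5, "2017-02-04"),
    (3, 11, 5, "2017-06-05"),
    (4, 11, 1, "2018-04-30"),
])

matches = conn.execute("""
    WITH lead_lag AS (
        -- previous/next date per company; fall back to the row's own date, like ISNULL
        SELECT t.*,
               COALESCE(LAG(effective_date) OVER (PARTITION BY company_id ORDER BY effective_date),
                        effective_date) AS lag_effective,
               COALESCE(LEAD(effective_date) OVER (PARTITION BY company_id ORDER BY effective_date),
                        effective_date) AS lead_effective
        FROM t
    ),
    labelled AS (
        SELECT CASE
                 WHEN effective_date BETWEEN :range_start AND :range_end THEN 'Included'
                 WHEN effective_date > :range_end
                      AND lag_effective BETWEEN :range_start AND :range_end THEN 'Following'
                 WHEN effective_date < :range_start
                      AND lead_effective BETWEEN :range_start AND :range_end THEN 'Trailing'
               END AS range_status,
               id, company_id
        FROM lead_lag
        WHERE status_id = 1
    )
    SELECT range_status, id, company_id
    FROM labelled
    WHERE range_status IS NOT NULL
    ORDER BY id
""", {"range_start": "2017-01-01", "range_end": "2017-12-31"}).fetchall()

print(matches)
```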

Try this; it should be self-explanatory. It responds to this part of your question:
I want to answer to the question "Get all companies that have been at
least for some point in status 1 inside the time period 01/01/2017 -
31/12/2017"
If you want the companies that have been in status 1 at any moment and have records in the requested period:
SELECT *
FROM company_status_history
WHERE company_id IN
( SELECT company_id
FROM company_status_history
WHERE status_id=1 )
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'
If you want only the status 1 records inside the period:
SELECT *
FROM company_status_history
WHERE status_id=1
AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'

start date end date combine rows

In Redshift, I want an SQL script that consolidates monthly records into a single record whenever the gap between the end date of one record and the start date of the next is 32 days or less (<= 32). The output start date should be the minimum start date of the continuous months, and the output end date the maximum end date.
The input data below is the table's data, followed by the expected output. The input is sorted by ID, STARTDT, ENDDT ascending.
For example, consider ID 100 in the table below: the gap between the end of the first record and the start of the next record is <= 32 days, but the gap between the second record's end date and the third record's start date is more than 32 days. Hence the first two records are consolidated into one record, i.e. (ID), MIN(STARTDT), MAX(ENDDT), which corresponds to the first record in the expected output. Similarly, the gap between the 3rd and 4th input records is within 32 days, so these two records are consolidated into a single record, which corresponds to the second record in the expected output.
INPUT DATA:
ID STARTDT ENDDT
100 2000-01-01 2000-01-31
100 2000-02-01 2000-02-29
100 2000-05-01 2000-05-31
100 2000-06-01 2000-06-30
100 2000-09-01 2000-09-30
100 2000-10-01 2000-10-31
101 2012-06-01 2012-06-30
101 2012-07-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31
EXPECTED OUTPUT:
ID MIN_STARTDT MAX_END_DT
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-03-31
103 2013-05-01 2013-05-31

You can do this in steps:
Use a join to identify where two adjacent records should be combined.
Then do a cumulative sum to assign all such adjacent records a grouping identifier.
Aggregate.
It looks like:
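select id, min(startdt) as startdt, max(enddt) as enddt
from (select t.*,
             -- a row with no qualifying predecessor starts a new group
             sum(case when tprev.id is null then 1 else 0 end) over
                 (partition by t.id
                  order by t.startdt
                  rows between unbounded preceding and current row
                 ) as grp
      from t left join
           t tprev
           -- tprev: an earlier record of the same id ending at most 32 days before t starts
           on t.id = tprev.id and
              t.startdt > tprev.enddt and
              datediff(day, tprev.enddt, t.startdt) <= 32
     ) t
group by id, grp;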
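A minimal sketch of the join plus running-sum approach, run against the question's input with Python's sqlite3 module rather than Redshift (assumption: SQLite 3.25+). julianday() differences replace Redshift's DATEDIFF; everything else mirrors the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, startdt TEXT, enddt TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (100, "2000-01-01", "2000-01-31"), (100, "2000-02-01", "2000-02-29"),
    (100, "2000-05-01", "2000-05-31"), (100, "2000-06-01", "2000-06-30"),
    (100, "2000-09-01", "2000-09-30"), (100, "2000-10-01", "2000-10-31"),
    (101, "2012-06-01", "2012-06-30"), (101, "2012-07-01", "2012-07-31"),
    (102, "2000-01-01", "2000-01-31"),
    (103, "2013-03-01", "2013-03-31"), (103, "2013-05-01", "2013-05-31"),
])

merged = conn.execute("""
    SELECT id, MIN(startdt), MAX(enddt)
    FROM (SELECT t.id AS id, t.startdt AS startdt, t.enddt AS enddt,
                 -- running count of rows that have no predecessor within 32 days
                 SUM(CASE WHEN tprev.id IS NULL THEN 1 ELSE 0 END)
                     OVER (PARTITION BY t.id ORDER BY t.startdt
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
          FROM t
          LEFT JOIN t AS tprev
            ON t.id = tprev.id
           AND t.startdt > tprev.enddt
           AND julianday(t.startdt) - julianday(tprev.enddt) <= 32) AS x
    GROUP BY id, grp
    ORDER BY id, MIN(startdt)
""").fetchall()

for row in merged:
    print(row)
```

With the 32-day rule applied literally, the two ID 103 rows (31 days apart) merge into one period, which differs from the question's expected output.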

The question is very similar to this one and my answer is also similar: Fetch rows based on condition
The gist of the idea is to use window functions to identify transitions between periods (consecutive records more than 32 days apart), then filter out the rows inside each period, and then apply window functions again.
Complete solution:
SELECT
id,
startdt AS period_start,
period_end
FROM (
SELECT
id,
startdt,
enddt,
lead(enddt, 1)
OVER (PARTITION BY id
ORDER BY enddt) AS period_end,
period_boundary
FROM (
SELECT
id,
startdt,
enddt,
CASE WHEN period_switch = 0 AND reverse_period_switch = 1
THEN 'start'
ELSE 'end' END AS period_boundary
FROM (
SELECT
id,
startdt,
enddt,
CASE WHEN datediff(days, enddt, lead(startdt, 1)
OVER (PARTITION BY id
ORDER BY enddt ASC)) > 32
THEN 1
ELSE 0 END AS period_switch,
CASE WHEN datediff(days, lead(enddt, 1)
OVER (PARTITION BY id
ORDER BY enddt DESC), startdt) > 32
THEN 1
ELSE 0 END AS reverse_period_switch
FROM date_test
)
AS sessioned
WHERE period_switch != 0 OR reverse_period_switch != 0
UNION
SELECT -- adding start rows without transition
id,
startdt,
enddt,
'start'
FROM (
SELECT
id,
startdt,
enddt,
row_number()
OVER (PARTITION BY id
ORDER BY enddt ASC) AS row_num
FROM date_test
) AS with_row_number
WHERE row_num = 1
UNION
SELECT -- adding end rows without transition
id,
startdt,
enddt,
'end'
FROM (
SELECT
id,
startdt,
enddt,
row_number()
OVER (PARTITION BY id
ORDER BY enddt desc) AS row_num
FROM date_test
) AS with_row_number
WHERE row_num = 1
) AS with_boundary -- data set containing start/end boundaries
) AS with_end -- data set where end date is propagated into the start row of the period
WHERE period_boundary = 'start'
ORDER BY id, startdt ASC;
Note that in your expected output you had a row for 103 2013-05-01 2013-05-31; however, its start date is only 31 days after the end date of the previous row, so according to your requirements this row should be merged with the previous row for id 103.
So the output that I get looks like this:
id period_start period_end
100 2000-01-01 2000-02-29
100 2000-05-01 2000-06-30
100 2000-09-01 2000-10-31
101 2012-06-01 2012-07-31
102 2000-01-01 2000-01-31
103 2013-03-01 2013-05-31

SQL Server - MyCTE query based on 24 hour period (next day)

I have this bit of code:
;WITH MyCTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY CardUser ORDER BY CardTableID) AS NewVariation
FROM CardChecker
)
UPDATE MyCTE
SET Status = NewVariation
which currently updates the Status column. However, what I want is for the numbering to restart at 1 each day (a 24-hour period) and count again per CardUser, as specified above:
Current data and what happens:
2 aaa 1 2015-06-25 08:00:00.000 123 1 NULL
3 ccc 1 2015-06-25 00:00:00.000 124 1 NULL
4 aaa 1 2015-06-25 17:30:00.000 125 2 NULL
5 aaa 1 2015-06-26 17:30:00.000 125 *3* NULL
what I want to happen:
2 aaa 1 2015-06-25 08:00:00.000 123 1 NULL
3 ccc 1 2015-06-25 00:00:00.000 124 1 NULL
4 aaa 1 2015-06-25 17:30:00.000 125 2 NULL
5 aaa 1 2015-06-26 17:30:00.000 125 *1* NULL
I'm not quite sure how to add this to the above query; could someone point me in the right direction?
The main problem is that the EventTime field contains both the date and the time, so adding it as a PARTITION column means the status would always be 1, because of the time part of the field.
Thanks for the help.
Current CardTable structure:
CREATE TABLE CardTable (CardTableID INT IDENTITY (1,1) NOT NULL,
CardUser VARCHAR(50),
CardNumber VARCHAR(50),
EventTime DATETIME,
Status INT)

You can CONVERT() the EventTime to DATE type and then PARTITION:
;WITH MyCTE AS
(
SELECT Status,
ROW_NUMBER() OVER(PARTITION BY CardUser, CONVERT(DATE, EventTime)
ORDER BY CardTableID) AS NewVariation
FROM CardChecker
)
UPDATE MyCTE
SET Status = NewVariation
Your query unnecessarily updates the entire table every time. If EventTime is the system date/time at which each row is inserted, restricting the update to rows that still need numbering (using Status as the flag) should improve performance.
;WITH MyCTE AS
(
SELECT Status,
ROW_NUMBER() OVER(PARTITION BY CardUser, CONVERT(DATE, EventTime)
ORDER BY CardTableID) AS NewVariation
FROM CardChecker
-- renumber whole days that still contain unnumbered rows, so existing
-- numbers on those days stay consistent with the new ones
WHERE CONVERT(DATE, EventTime) IN (SELECT CONVERT(DATE, EventTime)
                                   FROM CardChecker
                                   WHERE Status IS NULL)
)
UPDATE MyCTE
SET Status = NewVariation
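The key idea is partitioning by the calendar date instead of the full timestamp. This minimal sketch reproduces the question's sample with Python's sqlite3 module (assumption: SQLite 3.25+), where date(EventTime) plays the role of CONVERT(DATE, EventTime). It only SELECTs the numbering rather than updating, and the CardNumber column is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CardChecker "
             "(CardTableID INTEGER PRIMARY KEY, CardUser TEXT, EventTime TEXT, Status INTEGER)")
conn.executemany("INSERT INTO CardChecker VALUES (?, ?, ?, NULL)", [
    (2, "aaa", "2015-06-25 08:00:00"),
    (3, "ccc", "2015-06-25 00:00:00"),
    (4, "aaa", "2015-06-25 17:30:00"),
    (5, "aaa", "2015-06-26 17:30:00"),
])

# date(EventTime) drops the time part, so numbering restarts per user per day.
numbered = conn.execute("""
    SELECT CardTableID,
           ROW_NUMBER() OVER (PARTITION BY CardUser, date(EventTime)
                              ORDER BY CardTableID) AS NewVariation
    FROM CardChecker
    ORDER BY CardTableID
""").fetchall()

print(numbered)  # [(2, 1), (3, 1), (4, 2), (5, 1)]
```

Row 5 falls on 2015-06-26, so it starts again at 1, matching the "what I want to happen" data.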
