Update SQL table date based on column in another table - sql

I have a table like this:
ID  start_date  end_date
--  ----------  --------
1   09/01/2022
1   09/04/2022
2   09/01/2022
I have another reference table like this:
ID  date        owner
--  ----------  -----
1   09/01/2022  null
1   09/02/2022  null
1   09/03/2022  Joe
1   09/04/2022  null
1   09/05/2022  Jack
2   09/01/2022  null
2   09/02/2022  John
2   09/03/2022  John
2   09/04/2022  John
For every ID and start_date in the first table, I need to find the first row in the reference table that occurs after start_date and has a non-null owner. Then I need to write that row's date into end_date in the first table.
Below is the output that I want:
ID  date        end_date
--  ----------  ----------
1   09/01/2022  09/03/2022
1   09/04/2022  09/05/2022
2   09/01/2022  09/02/2022
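One way to approach this is a correlated update that takes the earliest qualifying reference date per row. The following is only a minimal sketch using Python's built-in sqlite3 module so that it can run standalone. The table names main_table and reference_table are hypothetical, and the dates are assumed to be MM/DD/YYYY and are stored as ISO strings so that they compare correctly as text. Other databases may need different date handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE reference_table (id INTEGER, date TEXT, owner TEXT);
INSERT INTO main_table (id, start_date) VALUES
  (1, '2022-09-01'), (1, '2022-09-04'), (2, '2022-09-01');
INSERT INTO reference_table VALUES
  (1, '2022-09-01', NULL), (1, '2022-09-02', NULL), (1, '2022-09-03', 'Joe'),
  (1, '2022-09-04', NULL), (1, '2022-09-05', 'Jack'),
  (2, '2022-09-01', NULL), (2, '2022-09-02', 'John'),
  (2, '2022-09-03', 'John'), (2, '2022-09-04', 'John');
""")

# For each row, take the earliest reference date that is after start_date
# and has a non-null owner. Rows with no such date get NULL.
conn.execute("""
UPDATE main_table
SET end_date = (
    SELECT MIN(r.date)
    FROM reference_table AS r
    WHERE r.id = main_table.id
      AND r.date > main_table.start_date
      AND r.owner IS NOT NULL
)
""")

rows = conn.execute(
    "SELECT id, start_date, end_date FROM main_table ORDER BY id, start_date"
).fetchall()
for row in rows:
    print(row)
```

With the sample data this reproduces the desired output from the question.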

Related

How to identify invalid records from a dimension table?

This is my sample data. It's a slowly changing dimension (type 2).
iddim  idperson  name   role        IsActive  start       end
-----  --------  -----  ----------  --------  ----------  ----------
1      1234      jim    driver      1         2022-01-01  2022-02-03
2      1234      jim    driver      0         2022-02-03  9999-12-31
3      3456      tom    accountant  1         2022-01-01  2022-08-30
4      4567      patty  assistant   1         2022-01-01  9999-12-31
Due to a server error, one of my SSIS packages performed some unexpected actions, and there are now idperson values without a row ending on 9999-12-31 (i.e. Tom).
I need to identify them so I can correct this manually, so that my resulting table will be:
iddim  idperson  name   role        IsActive  start       end
-----  --------  -----  ----------  --------  ----------  ----------
1      1234      jim    driver      1         2022-01-01  2022-02-03
2      1234      jim    driver      0         2022-02-04  9999-12-31
3      3456      tom    accountant  1         2022-01-01  2022-08-30
4      4567      patty  assistant   1         2022-01-01  9999-12-31
5      3456      tom    accountant  0         2022-08-31  9999-12-31
So, as I understand your requirements, you need to generate records to fill the gaps between the latest end date (per person) and '9999-12-31'. The filler records should have IsActive = 0 and should inherit the latest prior name and role for that idperson.
Perhaps something like the following:
SELECT
    idperson,
    name,
    role,
    IsActive = 0,
    start = DATEADD(day, 1, [end]),
    [end] = '9999-12-31'
FROM (
    SELECT *, Recency = ROW_NUMBER() OVER(PARTITION BY idperson ORDER BY [End] DESC)
    FROM #Data
) D
WHERE Recency = 1 AND [end] < '9999-12-31'
ORDER BY iddim
The Recency value calculated above will be 1 for the latest record per idperson and 2, 3, etc. for records with older end dates. If the latest record isn't end-of-time, a filler record is generated.
See this db<>fiddle for a working example (which includes a few additional test data records).
Note: The two existing jim records in your original posted data have different idperson values, so they are treated as different persons and the first triggers a gap record.
UPDATE: The above was revised to allow for possible name change over time for a given idperson.
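As a rough illustration of the same idea outside SQL Server, here is a minimal sketch using Python's sqlite3 module with the sample data from the question. The table name dim is hypothetical, and the start/end columns are renamed to start_date/end_date in this sketch to avoid keyword quoting. It assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim (iddim INTEGER, idperson INTEGER, name TEXT, role TEXT,
                  IsActive INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO dim VALUES
  (1, 1234, 'jim',   'driver',     1, '2022-01-01', '2022-02-03'),
  (2, 1234, 'jim',   'driver',     0, '2022-02-03', '9999-12-31'),
  (3, 3456, 'tom',   'accountant', 1, '2022-01-01', '2022-08-30'),
  (4, 4567, 'patty', 'assistant',  1, '2022-01-01', '9999-12-31');
""")

# recency = 1 marks the row with the latest end_date per idperson.
# A filler row is produced only when that latest row is not end-of-time.
rows = conn.execute("""
    SELECT d.idperson, d.name, d.role,
           0 AS IsActive,
           date(d.end_date, '+1 day') AS new_start,
           '9999-12-31' AS new_end
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY idperson ORDER BY end_date DESC
               ) AS recency
        FROM dim
    ) AS d
    WHERE d.recency = 1 AND d.end_date < '9999-12-31'
    ORDER BY d.iddim
""").fetchall()

for row in rows:
    print(row)
```

With this data only Tom gets a filler row, starting the day after his last end date.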

Oracle Query to find the Nth oldest visit of a person

I have the following Oracle table
PersonID  VisitedOn
--------  ---------
1         1/1/2017
1         1/1/2018
1         1/1/2019
1         1/1/2020
1         2/1/2020
1         3/1/2020
1         5/1/2021
1         6/1/2022
2         1/1/2015
2         1/1/2017
2         1/1/2018
2         1/1/2019
2         1/1/2020
2         2/1/2020
3         1/1/2017
3         1/1/2018
3         1/1/2019
3         1/1/2020
3         2/1/2020
3         3/1/2020
3         5/1/2021
I am trying to write a query that returns the Nth oldest visit of each person.
For instance, if I want to return the 5th oldest visit (N=5), the result would be:
PersonID  VisitDate
--------  ---------
1         1/1/2020
2         1/1/2017
3         1/1/2019
I think this will work. I ran a test with this data:
create table test (PersonID number, VisitedOn date);
insert into test values(1,'01-JAN-2000');
insert into test values(1,'01-JAN-2001');
insert into test values(1,'01-JAN-2002');
insert into test values(1,'01-JAN-2003');
insert into test values(2,'01-JAN-2000');
insert into test values(2,'01-JAN-2001');
and this query:
select personid, visitedon
from (
    select personid,
           visitedon,
           row_number() over ( partition by personid order by visitedon ) rn
    from test
)
where rn=5
What this does is use an analytic function to assign a row number within each group of records partitioned by person id, with the rows in each group sorted by date, and then pick the Nth row from each group. Note that the expected result in the question counts from the most recent visit, so reproducing it would need order by visitedon desc. If you run the inner query by itself, you will see how the row_number is assigned:
PERSONID VISITEDON RN
1 01-JAN-00 1
1 01-JAN-01 2
1 01-JAN-02 3
1 01-JAN-03 4
2 01-JAN-00 1
2 01-JAN-01 2
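To check the approach against the data in the question, here is a minimal sketch using Python's sqlite3 module rather than Oracle. The table name visits and the helper nth_visit are hypothetical, the dates are assumed to be M/D/YYYY and are stored as ISO strings, and a SQLite build with window functions (3.25 or later) is assumed. The newest_first flag is there because the expected output in the question matches counting from the most recent visit.

```python
import sqlite3

visits = {
    1: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01", "2022-06-01"],
    2: ["2015-01-01", "2017-01-01", "2018-01-01", "2019-01-01",
        "2020-01-01", "2020-02-01"],
    3: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01"],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (PersonID INTEGER, VisitedOn TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(pid, d) for pid, dates in visits.items() for d in dates],
)


def nth_visit(n, newest_first=True):
    """Return the Nth visit per person, numbering rows within each person."""
    direction = "DESC" if newest_first else "ASC"
    sql = f"""
        SELECT PersonID, VisitedOn
        FROM (
            SELECT PersonID, VisitedOn,
                   ROW_NUMBER() OVER (
                       PARTITION BY PersonID ORDER BY VisitedOn {direction}
                   ) AS rn
            FROM visits
        )
        WHERE rn = ?
        ORDER BY PersonID
    """
    return conn.execute(sql, (n,)).fetchall()


print(nth_visit(5))                      # counting from the most recent visit
print(nth_visit(5, newest_first=False))  # counting from the earliest visit
```

Counting from the most recent visit gives the rows listed in the question. Counting from the earliest visit gives different dates, so the sort direction is the part to double-check.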

SQL - dynamic sum based on dynamic date range

I'm new to SQL and I'm not even sure if what I am trying to achieve is possible.
I have two tables. The first gives an account number, a 'from' date and a 'to' date. The second table shows monthly volume for each account.
Table 1 - Dates
Account# Date_from Date_to
-------- --------- -------
123 2018-01-01 2018-12-10
456 2018-06-01 2018-12-10
789 2018-04-23 2018-11-01
Table 2 - Monthly_Volume
Account# Date Volume
--------- ---------- ------
123 2017-12-01 5
123 2018-01-15 5
123 2018-02-05 5
456 2018-01-01 10
456 2018-10-01 15
789 2017-06-01 5
789 2018-01-15 10
789 2018-06-20 7
I would like to merge the two tables in such a way that each account in Table 1 has a fourth column that gives the sum of Volume between Date_from and Date_to.
Desired Result:
Account# Date_from Date_to Sum(Volume)
-------- --------- ------- -----------
123 2018-01-01 2018-12-10 10
456 2018-06-01 2018-12-10 15
789 2018-04-23 2018-11-01 7
I believe that this would be possible to achieve for each account individually by doing something like the following and joining the result to the Dates table:
SELECT
    Account#,
    SUM(Volume)
FROM Monthly_Volume
WHERE
    Account# = '123'
    AND Date >= TO_DATE('2018-01-01', 'YYYY-MM-DD')
    AND Date <= TO_DATE('2018-12-10', 'YYYY-MM-DD')
GROUP BY Account#
What I'd like to know is whether it is possible to achieve this without having to individually fill in the Account#, Date_from and Date_to for each account (there are ~1,000 accounts), but have it be done automatically for each entry in the Dates table.
Thank you!
You should be able to use join and group by:
select d.account#, d.Date_from, d.Date_to, sum(mv.volume)
from dates d
left join monthly_volume mv
    on mv.account# = d.account#
    and mv.date between d.Date_from and d.Date_to
group by d.account#, d.Date_from, d.Date_to;
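The join and group by approach can be tried against the sample data with a minimal sketch like the one below, which uses Python's sqlite3 module instead of Oracle. The column Account# is renamed to account here (an assumption made only to keep the identifiers simple), and dates are stored as ISO strings so that BETWEEN compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (account TEXT, date_from TEXT, date_to TEXT);
CREATE TABLE monthly_volume (account TEXT, date TEXT, volume INTEGER);
INSERT INTO dates VALUES
  ('123', '2018-01-01', '2018-12-10'),
  ('456', '2018-06-01', '2018-12-10'),
  ('789', '2018-04-23', '2018-11-01');
INSERT INTO monthly_volume VALUES
  ('123', '2017-12-01', 5), ('123', '2018-01-15', 5), ('123', '2018-02-05', 5),
  ('456', '2018-01-01', 10), ('456', '2018-10-01', 15),
  ('789', '2017-06-01', 5), ('789', '2018-01-15', 10), ('789', '2018-06-20', 7);
""")

# The date range condition lives in the ON clause so that accounts with
# no volume inside their range still appear (with a NULL sum).
rows = conn.execute("""
    SELECT d.account, d.date_from, d.date_to, SUM(mv.volume) AS total_volume
    FROM dates AS d
    LEFT JOIN monthly_volume AS mv
      ON mv.account = d.account
     AND mv.date BETWEEN d.date_from AND d.date_to
    GROUP BY d.account, d.date_from, d.date_to
    ORDER BY d.account
""").fetchall()

for row in rows:
    print(row)
```

If a zero is preferred over NULL for accounts without any volume in range, wrapping the sum in COALESCE(..., 0) is one option.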

Use Calendar table to generate historical view of the data

I have a created_date (timestamp) column on one of my tables, which also has the duration column of a project, and I need to join it with another table that has a month_start_date column holding the first day of each month, along with other relevant information.
Table 1
id project_id created_date duration
1 12345 01/01/2015 10
2 12345 20/10/2015 11
3 12345 10/04/2016 13
4 12345 10/08/2016 15
Table 2
project_id month_start_date
12345 01/01/2015
12345 01/02/2015
12345 01/03/2015
12345 01/04/2015
...
12345 01/08/2016
Expected result
project_id month_start_date duration
12345 01/01/2015 10
12345 01/02/2015 10
...
12345 01/10/2015 11
12345 01/11/2015 11
...
12345 01/04/2016 13
12345 01/05/2016 13
12345 01/06/2016 13
...
12345 01/08/2016 15
I want to be able to present the data listed in my second table historically. So, basically, I want the query to return the duration that applies to each month_start_date, so that a value repeats until the next month where dateadd(month,datediff(month,0,created_date),0) = month_start_date is met, and so forth.
This is my query:
select table2.project_name,
table2.month_start_date,
table1.duration,
table1.created_date
from table1 left outer join table2
on table1.project_id=table2.project_id
where dateadd(month,datediff(month,0,table1.created_date),0)<=table2.month_start_date
group by table2.project_name,table2.month_start_date,table1.duration,table1.created_date
order by table2.month_start_date asc
but I get repeated records on this:
Result I'm getting
project_id month_start_date duration
12345 01/01/2015 10
12345 01/02/2015 10
...
12345 01/10/2015 10
12345 01/10/2015 11
...
12345 01/04/2016 10
12345 01/04/2016 11
12345 01/04/2016 13
...
12345 01/08/2016 10
12345 01/08/2016 11
12345 01/08/2016 13
12345 01/08/2016 15
Can anyone help?
Thank you!
I'd use CROSS/OUTER APPLY operator.
Here is one possible variant. For each row in your calendar table Table2 (that is, for each month), the correlated subquery inside the CROSS APPLY finds one row from Table1: the most recent row with the same project_id whose created_date is earlier than month_start_date plus one month.
SELECT
    Table2.project_id
    ,Table2.month_start_date
    ,Durations.duration
FROM
    Table2
    CROSS APPLY
    (
        SELECT TOP(1) Table1.duration
        FROM Table1
        WHERE
            Table1.project_id = Table2.project_id
            AND Table1.created_date < DATEADD(month, 1, Table2.month_start_date)
        ORDER BY Table1.created_date DESC
    ) AS Durations
;
Make sure that Table1 has an index on (project_id, created_date) include (duration). Otherwise, performance would be poor.
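The "latest row before the end of the month" lookup can also be sketched outside SQL Server. The example below uses Python's sqlite3 module, which has no CROSS APPLY, so a correlated scalar subquery with ORDER BY ... LIMIT 1 is swapped in. That behaves like OUTER APPLY: months with no matching row return NULL instead of being dropped. Dates are assumed to be DD/MM/YYYY and stored as ISO strings, and only a handful of months from the calendar table are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, project_id INTEGER,
                     created_date TEXT, duration INTEGER);
CREATE TABLE table2 (project_id INTEGER, month_start_date TEXT);
INSERT INTO table1 VALUES
  (1, 12345, '2015-01-01', 10), (2, 12345, '2015-10-20', 11),
  (3, 12345, '2016-04-10', 13), (4, 12345, '2016-08-10', 15);
INSERT INTO table2 VALUES
  (12345, '2015-01-01'), (12345, '2015-09-01'), (12345, '2015-10-01'),
  (12345, '2016-03-01'), (12345, '2016-04-01'), (12345, '2016-08-01');
""")

# For each month, pick the duration of the most recent table1 row created
# before the start of the following month.
rows = conn.execute("""
    SELECT t2.project_id, t2.month_start_date,
           (SELECT t1.duration
            FROM table1 AS t1
            WHERE t1.project_id = t2.project_id
              AND t1.created_date < date(t2.month_start_date, '+1 month')
            ORDER BY t1.created_date DESC
            LIMIT 1) AS duration
    FROM table2 AS t2
    ORDER BY t2.month_start_date
""").fetchall()

for row in rows:
    print(row)
```

Each month appears exactly once, carrying the duration that was current in that month, which avoids the repeated rows produced by the plain join.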

Records with overlapping dates

I have an addresses table, say:
address_id person_id start_date stop_date address
1 123 01-JAN-15 01-JUN-15 india
2 123 01-MAY-15 null russia
3 321 01-JAN-15 01-JUN-15 us
4 321 10-MAY-15 null india
I want to find all records (address_id values) which have overlapping dates for the same person_id. In this example that would find address_id 2 and 4, as May lies between Jan and Jun.
I then want to update the stop_date to start_date - 1 of the subsequent row belonging to the same person so that the overlap is removed. For instance, updating stop_date to 09-MAY-2015 at the row with address_id 3.
So I want to end up with:
address_id person_id start_date stop_date address
1 123 01-JAN-15 30-APR-15 india
2 123 01-MAY-15 null russia
3 321 01-JAN-15 09-MAY-15 us
4 321 10-MAY-15 null india
I have tried:
update (
select * from addresses a1,addresses a2
where a1.person_id = a2.person_id
and a2.start_date > a1.start_date and a2.start_date <a1.stop_date
)
set a1.stop_date = a2.start_date - 1;
This worked fine in Microsoft Access, but in Oracle it gives an invalid identifier error for a2.start_date.
How can I perform this update?
You can use a correlated update:
update addresses a
set stop_date = (
    select min(start_date) - 1
    from addresses
    where person_id = a.person_id
    and start_date > a.start_date
    and start_date <= a.stop_date
)
where exists (
    select null
    from addresses
    where person_id = a.person_id
    and start_date > a.start_date
    and start_date <= a.stop_date
);
2 rows updated.
select * from addresses;
ADDRESS_ID PERSON_ID START_DATE STOP_DATE ADDRESS
---------- ---------- ---------- --------- ----------
1 123 01-JAN-15 30-APR-15 india
2 123 01-MAY-15 russia
3 321 01-JAN-15 09-MAY-15 us
4 321 10-MAY-15 india
Both the set subquery and the exists subquery look for a row for the same person whose start date falls after the start date of the current row and on or before its stop date (the reference to the current row is the correlated part). The exists means only rows which have a match are updated; without it, any rows which don't have an overlap would be updated to null. (You wouldn't see any difference with the sample data, but would if you had more data.)
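For readers who want to try the correlated update without an Oracle instance, here is a minimal sketch using Python's sqlite3 module. It assumes the dates are stored as ISO strings and uses SQLite's date(..., '-1 day') in place of Oracle's date arithmetic. The inner copy of the table is aliased as b, so addresses.column refers to the row being updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (address_id INTEGER, person_id INTEGER,
                        start_date TEXT, stop_date TEXT, address TEXT);
INSERT INTO addresses VALUES
  (1, 123, '2015-01-01', '2015-06-01', 'india'),
  (2, 123, '2015-05-01', NULL,         'russia'),
  (3, 321, '2015-01-01', '2015-06-01', 'us'),
  (4, 321, '2015-05-10', NULL,         'india');
""")

# Close each overlapping row the day before the next address for the same
# person starts. The EXISTS filter leaves non-overlapping rows untouched.
cur = conn.execute("""
    UPDATE addresses
    SET stop_date = (
        SELECT date(MIN(b.start_date), '-1 day')
        FROM addresses AS b
        WHERE b.person_id = addresses.person_id
          AND b.start_date > addresses.start_date
          AND b.start_date <= addresses.stop_date
    )
    WHERE EXISTS (
        SELECT 1
        FROM addresses AS b
        WHERE b.person_id = addresses.person_id
          AND b.start_date > addresses.start_date
          AND b.start_date <= addresses.stop_date
    )
""")
updated = cur.rowcount

rows = conn.execute(
    "SELECT address_id, start_date, stop_date FROM addresses ORDER BY address_id"
).fetchall()
print(updated)
for row in rows:
    print(row)
```

Rows with a null stop_date never satisfy the range comparison, so they are skipped by the EXISTS filter and keep their open-ended stop date.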