How to identify invalid records from a dimension table? - sql

This is my sample data. It's a slowly changing dimension (type 2).
iddim | idperson | name | role | IsActive | start | end
1 | 1234 | jim | driver | 1 | 2022-01-01 | 2022-02-03
2 | 1234 | jim | driver | 0 | 2022-02-03 | 9999-12-31
3 | 3456 | tom | accountant | 1 | 2022-01-01 | 2022-08-30
4 | 4567 | patty | assistant | 1 | 2022-01-01 | 9999-12-31
Due to a server error, one of my SSIS packages performed some unexpected actions, and there are now idperson values without a 9999-12-31 end date (i.e., Tom).
I need to identify them so I can fix them manually, so that my resulting table becomes:
iddim | idperson | name | role | IsActive | start | end
1 | 1234 | jim | driver | 1 | 2022-01-01 | 2022-02-03
2 | 1234 | jim | driver | 0 | 2022-02-04 | 9999-12-31
3 | 3456 | tom | accountant | 1 | 2022-01-01 | 2022-08-30
4 | 4567 | patty | assistant | 1 | 2022-01-01 | 9999-12-31
5 | 3456 | tom | accountant | 0 | 2022-08-31 | 9999-12-31

So, as I understand your requirements, you need to generate records that fill the gap between the latest end date (per person) and '9999-12-31'. The filler records should have IsActive = 0 and should inherit the latest prior name and role for that idperson.
Perhaps something like the following:
SELECT
    idperson,
    name,
    role,
    IsActive = 0,
    start = DATEADD(day, 1, [end]),
    [end] = '9999-12-31'
FROM (
    SELECT *, Recency = ROW_NUMBER() OVER(PARTITION BY idperson ORDER BY [end] DESC)
    FROM #Data
) D
WHERE Recency = 1 AND [end] < '9999-12-31'
ORDER BY iddim
The Recency value calculated above will be 1 for the latest record per idperson and 2, 3, etc. for records with older end dates. If the latest record does not end at end-of-time ('9999-12-31'), a filler record is generated.
See this db<>fiddle for a working example (which includes a few additional test data records).
Note: The two existing jim records in your original posted data have different idperson values, so they are treated as different persons and the first triggers a gap record.
UPDATE: The above was revised to allow for possible name change over time for a given idperson.
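To try the Recency idea outside SQL Server, here is a minimal sketch that runs the same ROW_NUMBER() filter in SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It is an illustration under assumptions, not the answer's exact query: the table name dim and the column names start_date/end_date are hypothetical (END is a keyword in SQLite), DATEADD is replaced by SQLite's date(..., '+1 day'), and dates are stored as ISO-8601 text so that string comparison follows date order.

```python
import sqlite3

# Sample rows from the question. Table and column names are hypothetical:
# start_date/end_date replace start/end to avoid SQL keyword clashes.
ROWS = [
    (1, 1234, "jim", "driver", 1, "2022-01-01", "2022-02-03"),
    (2, 1234, "jim", "driver", 0, "2022-02-03", "9999-12-31"),
    (3, 3456, "tom", "accountant", 1, "2022-01-01", "2022-08-30"),
    (4, 4567, "patty", "assistant", 1, "2022-01-01", "9999-12-31"),
]

# Keep only each person's most recent row (recency = 1); if it does not end
# at 9999-12-31, emit an inactive filler row starting the following day.
FILLER_SQL = """
SELECT idperson, name, role,
       0 AS new_is_active,
       date(end_date, '+1 day') AS new_start,
       '9999-12-31' AS new_end
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY idperson ORDER BY end_date DESC) AS recency
    FROM dim
) AS d
WHERE recency = 1 AND end_date < '9999-12-31'
ORDER BY iddim
"""


def filler_rows(rows):
    """Return the gap-filling rows the query generates for `rows`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE dim (iddim INTEGER, idperson INTEGER, name TEXT, role TEXT,"
            " IsActive INTEGER, start_date TEXT, end_date TEXT)"
        )
        conn.executemany("INSERT INTO dim VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(FILLER_SQL).fetchall()
    finally:
        conn.close()


print(filler_rows(ROWS))
# [(3456, 'tom', 'accountant', 0, '2022-08-31', '9999-12-31')]
```

Only Tom's latest row stops short of 9999-12-31, so he is the only person who gets a filler row; Jim's and Patty's latest rows already end at end-of-time.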

Related

Update SQL table date based on column in another table

I have a table like this:
ID | start_date | end_date
1 | 09/01/2022 | -
1 | 09/04/2022 | -
2 | 09/01/2022 | -
I have another reference table like this:
ID | date | owner
1 | 09/01/2022 | null
1 | 09/02/2022 | null
1 | 09/03/2022 | Joe
1 | 09/04/2022 | null
1 | 09/05/2022 | Jack
2 | 09/01/2022 | null
2 | 09/02/2022 | John
2 | 09/03/2022 | John
2 | 09/04/2022 | John
For every ID and start_date in the first table, I need to find the rows in the reference table with the same ID that occur after start_date and have a non-null owner, and then write the earliest such date into end_date of the first table.
Below is the output that I want:
ID | date | end_date
1 | 09/01/2022 | 09/03/2022
1 | 09/04/2022 | 09/05/2022
2 | 09/01/2022 | 09/02/2022
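No answer is included for this question here. One possible approach, sketched below, is an UPDATE whose SET uses a correlated subquery taking MIN(date) over the later reference rows that have an owner; picking the earliest match is inferred from the desired output. The sketch runs in SQLite through Python's sqlite3 module. Assumptions: the table names periods and reference are hypothetical, and the dates are read as MM/DD/YYYY and stored as ISO-8601 text so they compare correctly as strings.

```python
import sqlite3

# Question data with dates converted to ISO-8601 (assuming MM/DD/YYYY).
PERIODS = [(1, "2022-09-01"), (1, "2022-09-04"), (2, "2022-09-01")]
REFERENCE = [
    (1, "2022-09-01", None),
    (1, "2022-09-02", None),
    (1, "2022-09-03", "Joe"),
    (1, "2022-09-04", None),
    (1, "2022-09-05", "Jack"),
    (2, "2022-09-01", None),
    (2, "2022-09-02", "John"),
    (2, "2022-09-03", "John"),
    (2, "2022-09-04", "John"),
]

# For each period row, take the earliest later reference date that has an owner.
UPDATE_SQL = """
UPDATE periods
SET end_date = (
    SELECT MIN(r.date)
    FROM reference AS r
    WHERE r.ID = periods.ID
      AND r.date > periods.start_date
      AND r.owner IS NOT NULL
)
"""


def fill_end_dates(periods, reference):
    """Load both tables, run the UPDATE, and return the updated periods."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE periods (ID INTEGER, start_date TEXT, end_date TEXT)")
        conn.execute("CREATE TABLE reference (ID INTEGER, date TEXT, owner TEXT)")
        conn.executemany("INSERT INTO periods (ID, start_date) VALUES (?, ?)", periods)
        conn.executemany("INSERT INTO reference VALUES (?, ?, ?)", reference)
        conn.execute(UPDATE_SQL)
        return conn.execute(
            "SELECT ID, start_date, end_date FROM periods ORDER BY ID, start_date"
        ).fetchall()
    finally:
        conn.close()


print(fill_end_dates(PERIODS, REFERENCE))
# [(1, '2022-09-01', '2022-09-03'), (1, '2022-09-04', '2022-09-05'), (2, '2022-09-01', '2022-09-02')]
```

A period with no later owned date is left with a NULL end_date, because MIN over zero rows returns NULL.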

How to filter out multiple downtime events in SQL Server?

I need to write a query that filters out duplicates of the same downtime event. These records are created at exactly the same time with multiple different TimeStealer values, which I don't need. Also, when a downtime event has multiple timestealers, I need the TimeStealer to be NULL instead.
Example table:
Id | TimeStealer | Start | End | Is_Downtime | Downtime_Event
1 | Machine 1 | 2022-01-01 01:00:00 | 2022-01-01 01:01:00 | 1 | Malfunction
2 | Machine 2 | 2022-01-01 01:00:00 | 2022-01-01 01:01:00 | 1 | Malfunction
3 | NULL | 2022-01-01 00:01:00 | 2022-01-01 00:59:59 | 0 | Operating
What I need the query to return:
Id | TimeStealer | Start | End | Is_Downtime | Downtime_Event
1 | NULL | 2022-01-01 01:00:00 | 2022-01-01 01:01:00 | 1 | Malfunction
2 | NULL | 2022-01-01 00:01:00 | 2022-01-01 00:59:59 | 0 | Operating
This looks like a top-1-row-per-group problem, with the added logic of making a column NULL when a group has multiple rows. You can achieve that by also using a windowed COUNT, and then a CASE expression in the outer SELECT that returns TimeStealer only when the group contains exactly one event:
WITH CTE AS (
    SELECT V.Id,
           V.TimeStealer,
           V.Start,
           V.[End],
           V.Is_Downtime,
           V.Downtime_Event,
           ROW_NUMBER() OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime, V.Downtime_Event ORDER BY V.Id) AS RN,
           COUNT(V.Id) OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime, V.Downtime_Event) AS Events
    FROM (VALUES ('1', 'Machine 1', CONVERT(datetime2(0), '2022-01-01 01:00:00'), CONVERT(datetime2(0), '2022-01-01 01:01:00'), '1', 'Malfunction'),
                 ('2', 'Machine 2', CONVERT(datetime2(0), '2022-01-01 01:00:00'), CONVERT(datetime2(0), '2022-01-01 01:01:00'), '1', 'Malfunction'),
                 ('3', NULL, CONVERT(datetime2(0), '2022-01-01 00:01:00'), CONVERT(datetime2(0), '2022-01-01 00:59:59'), '0', 'Operating')
         ) V (Id, TimeStealer, [Start], [End], Is_Downtime, Downtime_Event)
)
SELECT ROW_NUMBER() OVER (ORDER BY C.Id) AS ID,
       CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,
       C.Start,
       C.[End],
       C.Is_Downtime,
       C.Downtime_Event
FROM CTE C
WHERE C.RN = 1;
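The same pattern (ROW_NUMBER() to keep one row per group, plus a windowed COUNT() to detect groups with several rows) can be tried in SQLite through Python's sqlite3 module; window functions need SQLite 3.25 or later. This is a sketch under assumptions: the table name downtime is hypothetical, Id and Is_Downtime are stored as integers, timestamps are ISO text, and End is double-quoted because it is a keyword in SQLite.

```python
import sqlite3

# Example rows from the question (table name "downtime" is hypothetical).
EVENTS = [
    (1, "Machine 1", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
    (2, "Machine 2", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
    (3, None, "2022-01-01 00:01:00", "2022-01-01 00:59:59", 0, "Operating"),
]

# rn keeps the first row of each identical event; events counts the rows in
# that group so TimeStealer can be blanked when there was more than one.
DEDUP_SQL = """
WITH cte AS (
    SELECT Id, TimeStealer, Start, "End", Is_Downtime, Downtime_Event,
           ROW_NUMBER() OVER (PARTITION BY Start, "End", Is_Downtime, Downtime_Event
                              ORDER BY Id) AS rn,
           COUNT(Id) OVER (PARTITION BY Start, "End", Is_Downtime, Downtime_Event) AS events
    FROM downtime
)
SELECT ROW_NUMBER() OVER (ORDER BY cte.Id) AS Id,
       CASE WHEN events = 1 THEN TimeStealer END AS TimeStealer,
       Start, "End", Is_Downtime, Downtime_Event
FROM cte
WHERE rn = 1
ORDER BY cte.Id
"""


def dedupe(events):
    """Return one row per distinct event, renumbered, with shared stealers nulled."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            'CREATE TABLE downtime (Id INTEGER, TimeStealer TEXT, Start TEXT, "End" TEXT,'
            " Is_Downtime INTEGER, Downtime_Event TEXT)"
        )
        conn.executemany("INSERT INTO downtime VALUES (?, ?, ?, ?, ?, ?)", events)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


for row in dedupe(EVENTS):
    print(row)
```

A group with a single row keeps its TimeStealer, because its windowed count is 1.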

Oracle Query to find the Nth oldest visit of a person

I have the following Oracle table
PersonID | VisitedOn
1 | 1/1/2017
1 | 1/1/2018
1 | 1/1/2019
1 | 1/1/2020
1 | 2/1/2020
1 | 3/1/2020
1 | 5/1/2021
1 | 6/1/2022
2 | 1/1/2015
2 | 1/1/2017
2 | 1/1/2018
2 | 1/1/2019
2 | 1/1/2020
2 | 2/1/2020
3 | 1/1/2017
3 | 1/1/2018
3 | 1/1/2019
3 | 1/1/2020
3 | 2/1/2020
3 | 3/1/2020
3 | 5/1/2021
I am trying to write a query that returns the Nth oldest visit of each person.
For instance, if I want to return the 5th oldest visit (N=5), the result would be:
PersonID | VisitDate
1 | 1/1/2020
2 | 1/1/2017
3 | 1/1/2019
I think this will work:
I ran a test with this data:
create table test (PersonID number, VisitedOn date);
insert into test values (1, DATE '2000-01-01');
insert into test values (1, DATE '2001-01-01');
insert into test values (1, DATE '2002-01-01');
insert into test values (1, DATE '2003-01-01');
insert into test values (2, DATE '2000-01-01');
insert into test values (2, DATE '2001-01-01');
select personid, visitedon
from (
    select personid,
           visitedon,
           row_number() over (partition by personid order by visitedon) rn
    from test
)
where rn = 5;
What this does is use an analytic function to assign a row number to each set of records partitioned by the person id, then pick the Nth row from each partitioned group, where the rows in each group are sorted by date. If you run the inner query by itself, you will see where the row_number is assigned:
PERSONID VISITEDON RN
1 01-JAN-00 1
1 01-JAN-01 2
1 01-JAN-02 3
1 01-JAN-03 4
2 01-JAN-00 1
2 01-JAN-01 2
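Here is a sketch of the same ROW_NUMBER() approach run against the question's own visit data, in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). Assumptions: the table name visits is hypothetical and the dates are read as M/D/YYYY and stored as ISO-8601 text. Running it shows that the expected output in the question (1/1/2020, 1/1/2017, 1/1/2019 for N=5) corresponds to counting from the most recent visit, i.e. ORDER BY VisitedOn DESC; the ascending order used above returns the 5th visit counted from the oldest.

```python
import sqlite3

# The question's visits as ISO dates (assuming M/D/YYYY in the original).
VISITS = {
    1: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01", "2022-06-01"],
    2: ["2015-01-01", "2017-01-01", "2018-01-01", "2019-01-01",
        "2020-01-01", "2020-02-01"],
    3: ["2017-01-01", "2018-01-01", "2019-01-01", "2020-01-01",
        "2020-02-01", "2020-03-01", "2021-05-01"],
}


def nth_visit(n, newest_first=False):
    """Return (PersonID, VisitedOn) for each person's n-th visit."""
    direction = "DESC" if newest_first else "ASC"  # fixed literal, never user input
    sql = f"""
        SELECT PersonID, VisitedOn
        FROM (
            SELECT PersonID, VisitedOn,
                   ROW_NUMBER() OVER (PARTITION BY PersonID
                                      ORDER BY VisitedOn {direction}) AS rn
            FROM visits
        ) AS v
        WHERE rn = ?
        ORDER BY PersonID
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (PersonID INTEGER, VisitedOn TEXT)")
        conn.executemany(
            "INSERT INTO visits VALUES (?, ?)",
            [(pid, d) for pid, dates in VISITS.items() for d in dates],
        )
        return conn.execute(sql, (n,)).fetchall()
    finally:
        conn.close()


print(nth_visit(5, newest_first=True))
# [(1, '2020-01-01'), (2, '2017-01-01'), (3, '2019-01-01')]
print(nth_visit(5))
# [(1, '2020-02-01'), (2, '2020-01-01'), (3, '2020-02-01')]
```

People with fewer than N visits simply drop out of the result, since no row gets rn = N.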

Report on a point in time

I am about to create what I assume will be 2 new tables in SQL. The idea is for one to hold the "live" data and the second to hold all the changes. Dates are in DD/MM/YYYY format.
Active
ID | Name | Start Date | End Date
1 | Zac | 1/1/2016 | -
2 | John | 1/5/2016 | -
3 | Sam | 1/6/2016 | -
4 | Joel | 1/7/2016 | -
Changes
CID | UID | Name | Start Date | End Date
1 | 1 | Zac | 1/1/2016 | -
2 | 4 | Joel | 1/1/2016 | -
3 | 4 | Joel | - | 1/4/2016
4 | 2 | John | 1/5/2016 | -
5 | 3 | Sam | 1/6/2016 | -
6 | 4 | Joel | 1/7/2016 | -
In the above situation you can see that Joel worked from 1/1/2016 until 1/4/2016, took 3 months off, and then worked again from 1/7/2016.
I need to build a query whereby I can pick a point in time and report on who was working at that time. The tables above list only the name, but there will be many more columns to report on for a point in time.
What would be the best way to structure the tables to support this query?
I started writing this last night and am finally coming back to it. Basically, you would use your change table to build a Slowly Changing Dimension and then generate a row number to pair each start with its end. This does assume, however, that your DB never gets out of sync by recording two start records or two end records in a row for the same person.
This also assumes you are using an RDBMS that supports common table expressions and window functions, such as SQL Server, Oracle, PostgreSQL, or DB2.
WITH cte AS (
    SELECT
        c.*
        ,ROW_NUMBER() OVER (PARTITION BY UID ORDER BY COALESCE(StartDate, EndDate)) AS RowNum
    FROM
        Changes c
)
SELECT
    s.UID
    ,s.Name
    ,s.StartDate
    ,COALESCE(e.EndDate, CURRENT_TIMESTAMP) AS EndDate
FROM
    cte s
    LEFT JOIN cte e
        ON s.UID = e.UID
        AND s.RowNum + 1 = e.RowNum
WHERE
    s.StartDate IS NOT NULL
    AND '2016-05-05' BETWEEN s.StartDate AND COALESCE(e.EndDate, CURRENT_TIMESTAMP)
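Below is a minimal sketch of the start/end pairing run in SQLite through Python's sqlite3 module, using the Changes data from the question (window functions need SQLite 3.25 or later). Assumptions: the table and column names changes, uid, StartDate and EndDate are stand-ins for the question's headers, dates are converted from DD/MM/YYYY to ISO-8601 text, and SQLite's date('now') stands in for the "still working" end date.

```python
import sqlite3

# Changes table from the question, dates converted from DD/MM/YYYY to ISO.
CHANGES = [
    (1, 1, "Zac", "2016-01-01", None),
    (2, 4, "Joel", "2016-01-01", None),
    (3, 4, "Joel", None, "2016-04-01"),
    (4, 2, "John", "2016-05-01", None),
    (5, 3, "Sam", "2016-06-01", None),
    (6, 4, "Joel", "2016-07-01", None),
]

# Number each person's start/end events in date order, then pair every start
# row (s) with the following row (e); a missing end means "still working".
POINT_IN_TIME_SQL = """
WITH cte AS (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY uid
                              ORDER BY COALESCE(StartDate, EndDate)) AS RowNum
    FROM changes AS c
)
SELECT s.Name
FROM cte AS s
LEFT JOIN cte AS e
       ON s.uid = e.uid
      AND s.RowNum + 1 = e.RowNum
WHERE s.StartDate IS NOT NULL
  AND ? BETWEEN s.StartDate AND COALESCE(e.EndDate, date('now'))
ORDER BY s.uid
"""


def working_on(day):
    """Return the names of everyone working on `day` (ISO date string)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE changes (cid INTEGER, uid INTEGER, Name TEXT,"
            " StartDate TEXT, EndDate TEXT)"
        )
        conn.executemany("INSERT INTO changes VALUES (?, ?, ?, ?, ?)", CHANGES)
        return [name for (name,) in conn.execute(POINT_IN_TIME_SQL, (day,))]
    finally:
        conn.close()


print(working_on("2016-05-05"))  # ['Zac', 'John']
print(working_on("2016-02-01"))  # ['Zac', 'Joel']
```

Joel's end-only row (RowNum 2) is never used as a start because of the StartDate IS NOT NULL filter, so his three-month break is correctly excluded.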

Recursive query with time difference

This is my first post here even though I am a daily reader. :)
I need to produce an MS SQL Server 2014 report that shows the clients who come back to do business with me within 3 days (less than or equal to 3 days). I tried with INNER JOINs but wasn't successful.
My idea for the solution uses the following logic:
If product is same
and if userId is same
and if action was donedeal but now is new
and if date diff <= 3 days
and if type is NOT same
then show results
Example of my data:
id orderId userId type product date action
1 1001 654 ordered apple 01/05/2016 new
2 1002 889 ordered peach 01/05/2016 new
3 1001 654 paid apple 01/05/2016 donedeal
4 1002 889 paid peach 03/05/2016 donedeal
5 1003 654 ordered apple 03/05/2016 new
6 1004 889 ordered peach 04/05/2016 new
7 1005 122 ordered apple 04/05/2016 new
8 1006 978 ordered peach 04/05/2016 new
9 1005 122 paid apple 04/05/2016 donedeal
10 1007 122 ordered apple 10/05/2016 new
Desired results:
id orderId userId type product date Diff
3 1001 654 paid apple 01/05/2016 2 days
4 1002 889 paid peach 03/05/2016 1 day
5 1003 654 ordered apple 03/05/2016 2 days
6 1004 889 ordered peach 04/05/2016 1 day
Could you please point me to the functions that could help me solve this?
Thanks in advance.
Update
Gordon Linoff suggested the code below, but since the type had to be different, I duplicated the query for each type combination, ran it as follows, and it worked:
select t.*
from (select t.*,
             max(case when action = 'donedeal' and type = 'paid' then date end) over
                 (partition by userId, product order by date) as last_donedealdate
      from t
     ) t
where action = 'new' and type = 'ordered' and date <= dateadd(day, 3, last_donedealdate)
UNION ALL
select t.*
from (select t.*,
             max(case when action = 'donedeal' and type = 'ordered' then date end) over
                 (partition by userId, product order by date) as last_donedealdate
      from t
     ) t
where action = 'new' and type = 'paid' and date <= dateadd(day, 3, last_donedealdate)
You can use window functions for this. To get the last done deal date, use max() with partition by and order by. The rest is just where clause logic:
select t.*
from (select t.*,
             max(case when action = 'donedeal' then date end) over
                 (partition by userId, product order by date) as last_donedealdate
      from t
     ) t
where action = 'new' and date <= dateadd(day, 3, last_donedealdate);
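For comparison, the question's first attempt used INNER JOINs, and a self-join can in fact produce the exact desired output, including the earlier paid rows. This is a different technique from the window-function answer above, shown as a sketch in SQLite through Python's sqlite3 module. Assumptions: the table name orders is hypothetical, dates are converted from DD/MM/YYYY to ISO-8601 text, and id increases in chronological order, which is what lets a 'new' row pair only with a 'donedeal' row recorded before it.

```python
import sqlite3

# (id, orderId, userId, type, product, date, action); dates DD/MM/YYYY -> ISO.
ORDERS = [
    (1, 1001, 654, "ordered", "apple", "2016-05-01", "new"),
    (2, 1002, 889, "ordered", "peach", "2016-05-01", "new"),
    (3, 1001, 654, "paid", "apple", "2016-05-01", "donedeal"),
    (4, 1002, 889, "paid", "peach", "2016-05-03", "donedeal"),
    (5, 1003, 654, "ordered", "apple", "2016-05-03", "new"),
    (6, 1004, 889, "ordered", "peach", "2016-05-04", "new"),
    (7, 1005, 122, "ordered", "apple", "2016-05-04", "new"),
    (8, 1006, 978, "ordered", "peach", "2016-05-04", "new"),
    (9, 1005, 122, "paid", "apple", "2016-05-04", "donedeal"),
    (10, 1007, 122, "ordered", "apple", "2016-05-10", "new"),
]

# Pair each 'donedeal' row (d) with a later 'new' row (n) for the same user and
# product, with a different type, 0 to 3 days apart; then list both rows.
RETURNING_SQL = """
WITH pairs AS (
    SELECT d.id AS deal_id,
           n.id AS new_id,
           CAST(julianday(n.date) - julianday(d.date) AS INTEGER) AS diff
    FROM orders AS d
    JOIN orders AS n
      ON n.userId = d.userId
     AND n.product = d.product
     AND n.type <> d.type
     AND n.id > d.id
    WHERE d."action" = 'donedeal'
      AND n."action" = 'new'
      AND julianday(n.date) - julianday(d.date) BETWEEN 0 AND 3
)
SELECT o.id, o.orderId, o.userId, o.type, o.product, o.date, p.diff
FROM pairs AS p
JOIN orders AS o ON o.id IN (p.deal_id, p.new_id)
ORDER BY o.id
"""


def returning_clients(rows):
    """Return both rows of every deal/new pair with their day difference."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (id INTEGER, orderId INTEGER, userId INTEGER,"
            ' type TEXT, product TEXT, date TEXT, "action" TEXT)'
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(RETURNING_SQL).fetchall()
    finally:
        conn.close()


for row in returning_clients(ORDERS):
    print(row)
```

If one paid row is followed by several qualifying new rows, it appears once per pair, so you may want to pick a single pair per deal in that case.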