Denormalize column - SQL

I have data in my database like this:
Code  meta   meta_ID  date
A     1,2    1        01/01/2022 08:08:08
B     1,2    2        01/01/2022 02:00:00
B     null   2        01/01/1900 02:00:00
C     null   3        01/01/2022 02:00:00
D     8      8        01/01/2022 02:00:00
E     5,6,7  5        01/01/2022 02:00:00
F     1,2    2        01/01/2022 02:00:00
I want to get the following, keeping only the row with the latest date (comparing by day, month, and year):
Code  meta   meta_ID  list_Code  date
A     2,3    1        A,B,F      01/01/2022 08:08:08
B     1,3    2        A,B,F      01/01/2022 02:00:00
C     null   3        C          01/01/2022 02:00:00
D     8      8        D          01/01/2022 02:00:00
E     5,6,7  5        E          01/01/2022 02:00:00
F     1,2    3        A,B,F      01/01/2022 02:00:00
I want to get the list of codes that share the same meta group. Do you know how to do this in SQL Server?

The code below takes the first table as input and outputs the second table exactly. The Meta and Date columns had duplicate values, so in the CTE I took the MAX of both fields; different logic can be applied if needed.
It uses FOR XML PATH to merge all rows into one column to build the List_Code column. The STUFF function removes the leading comma (,) delimiter.
CREATE TABLE MetaTable
(
    Code    VARCHAR(5),
    Meta    VARCHAR(100),
    Meta_ID INT,
    Date    DATETIME
)
GO
INSERT INTO MetaTable
VALUES
    ('A', '1,2',   '1', '01/01/2022 08:08:08'),
    ('B', '1,2',   '2', '01/01/2022 02:00:00'),
    ('B', NULL,    '2', '01/01/1900 02:00:00'),
    ('C', NULL,    '3', '01/01/2022 02:00:00'),
    ('D', '8',     '8', '01/01/2022 02:00:00'),
    ('E', '5,6,7', '5', '01/01/2022 02:00:00'),
    ('F', '1,2',   '2', '01/01/2022 02:00:00')
GO
WITH CTE_Meta
AS
(
    SELECT
        Code,
        MAX(Meta) AS 'Meta',
        Meta_ID,
        MAX(Date) AS 'Date'
    FROM MetaTable
    GROUP BY
        Code,
        Meta_ID
)
SELECT
    T1.Code,
    T1.Meta,
    T1.Meta_ID,
    STUFF
    (
        (
            SELECT ',' + Code
            FROM CTE_Meta T2
            WHERE ISNULL(T1.Meta, '') = ISNULL(T2.Meta, '')
            FOR XML PATH('')
        ), 1, 1, ''
    ) AS 'List_Code',
    T1.Date
FROM CTE_Meta T1
ORDER BY 1
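As a side note (not part of the original answer): on SQL Server 2017 or later, the same List_Code column could be built with STRING_AGG instead of the FOR XML PATH/STUFF pattern. A minimal sketch, assuming the same CTE_Meta grouping logic as above:

WITH CTE_Meta
AS
(
    SELECT
        Code,
        MAX(Meta) AS 'Meta',
        Meta_ID,
        MAX(Date) AS 'Date'
    FROM MetaTable
    GROUP BY Code, Meta_ID
)
SELECT
    T1.Code,
    T1.Meta,
    T1.Meta_ID,
    (
        -- STRING_AGG adds the delimiter only between values, so no STUFF is needed
        SELECT STRING_AGG(T2.Code, ',') WITHIN GROUP (ORDER BY T2.Code)
        FROM CTE_Meta T2
        WHERE ISNULL(T1.Meta, '') = ISNULL(T2.Meta, '')
    ) AS 'List_Code',
    T1.Date
FROM CTE_Meta T1
ORDER BY 1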

I like the first answer using XML; it's very concise. This one is more verbose, but it might be more flexible if the data can have different meta values spread across different records. The CAST to varchar(12) in various places is just for display. I use STRING_AGG and STRING_SPLIT instead of XML.
WITH TestData as (
SELECT t.*
FROM (
Values
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2', '2', '01/01/2022 02:00:00'),
('B', null, '2', '01/01/1900 02:00:00'),
('C', null, '3', '01/01/2022 02:00:00'),
('D', '8', '8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2', '2', '01/01/2022 02:00:00'),
('G', '16', '17', '01/01/2022 02:00:00'),
('G', null, '17', '01/02/2022 03:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '20', '19', '01/04/2022 05:00:00'),
('G', '20', '20', '01/05/2022 06:00:00')
) t (Code, meta, meta_ID, date)
), CodeLookup as ( -- used to find the Code from the meta_ID
SELECT DISTINCT meta_ID, Code
FROM TestData
), Normalized as ( -- split out the meta values, one per row
SELECT t.Code, s.Value as [meta], meta_ID, [date]
FROM TestData t
OUTER APPLY STRING_SPLIT(t.meta, ',') s
), MetaLookup as ( -- used to find the distinct list of meta values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta, ',') WITHIN GROUP ( ORDER BY n.meta ASC ) as varchar(12)) as [meta]
FROM (
SELECT DISTINCT Code, meta
FROM Normalized
WHERE meta is not NULL
) n
GROUP BY n.Code
), MetaIdLookup as ( -- used to find the distinct list of meta_ID values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta_ID, ',') WITHIN GROUP ( ORDER BY n.meta_ID ASC ) as varchar(12)) as [meta_ID]
FROM (
SELECT DISTINCT Code, meta_ID
FROM Normalized
) n
GROUP BY n.Code
), ListCodeLookup as ( -- for every code, get all codes for the meta values
SELECT l.Code, CAST(STRING_AGG(l.lookupCode, ',') WITHIN GROUP ( ORDER BY l.lookupCode ASC ) as varchar(12)) as [list_Code]
FROM (
SELECT DISTINCT n.Code, c.Code as [lookupCode]
FROM Normalized n
INNER JOIN CodeLookup c
ON c.meta_ID = n.meta
UNION -- every record needs its own code in the list_code?
SELECT DISTINCT n.Code, n.Code as [lookupCode]
FROM Normalized n
) l
GROUP BY l.Code
)
SELECT t.Code, m.meta, mi.meta_ID, lc.list_Code, t.[date]
FROM (
SELECT Code, MAX([date]) as [date]
FROM TestData
GROUP BY Code
) t
LEFT JOIN MetaLookup m
ON m.Code = t.Code
LEFT JOIN MetaIdLookup mi
ON mi.Code = t.Code
LEFT JOIN ListCodeLookup lc
ON lc.Code = t.Code
Code meta meta_ID list_Code date
---- ------------ ------------ ------------ -------------------
A 1,2 1 A,B,F 01/01/2022 08:08:08
B 1,2 2 A,B,F 01/01/2022 02:00:00
C NULL 3 C 01/01/2022 02:00:00
D 8 8 D 01/01/2022 02:00:00
E 5,6,7 5 E 01/01/2022 02:00:00
F 1,2 2 A,B,F 01/01/2022 02:00:00
G 16,19,20 17,18,19,20 G 01/05/2022 06:00:00

Related

Cumulative sum reset by change of year (SQL Server)

I'm using SQL Server 2017. I would like to build a running total of the monthly budget per factory within each year.
This cumulative sum should reset with each new year.
Table schema:
CREATE TABLE [TABLE_1]
(
    FACTORY varchar(50) NULL,
    DATE_YM int NULL,
    BUDGET int NULL
);
INSERT INTO TABLE_1 (FACTORY, DATE_YM, BUDGET)
VALUES ('A', 202111, 1),
('A', 202112, 1),
('A', 202201, 10),
('A', 202202, 100),
('A', 202203, 1000),
('B', 202111, 2),
('B', 202112, 2),
('B', 202201, 20),
('B', 202202, 200),
('B', 202203, 2000),
('C', 202111, 3),
('C', 202112, 3),
('C', 202201, 30),
('C', 202202, 300),
('C', 202203, 3000);
LINK TO db<>fiddle
Desired result
FACTORY  DATE_YM  C_BUDGET_SUM
A        202111   1
A        202112   2
A        202201   10
A        202202   110
A        202203   1110
B        202111   2
B        202112   4
B        202201   20
B        202202   220
B        202203   2220
C        202111   3
C        202112   6
C        202201   30
C        202202   330
C        202203   3330
My approach:
WITH data AS
(
SELECT
T1.FACTORY,
T1.DATE_YM,
T1.BUDGET
FROM
TABLE_1 AS T1
)
SELECT
FACTORY,
DATE_YM,
SUM(BUDGET) OVER (ORDER BY FACTORY, DATE_YM ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS 'C_BUDGET_SUM'
FROM
data
This query keeps accumulating across year boundaries. How can the reset at each new year be implemented?
The CTE is not necessary, but I'm assuming this is a simplified version.
To expand on my comment:
with data as (
select
T1.FACTORY,
T1.DATE_YM,
T1.BUDGET
from TABLE_1 as T1
)
select
FACTORY,
DATE_YM,
sum(BUDGET) over (partition by Factory, left(Date_YM, 4)
                  order by DATE_YM asc
                  rows between unbounded preceding and current row) as 'C_BUDGET_SUM'
from data
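A small variation on the same idea (my note, assuming DATE_YM always stays in YYYYMM form): since DATE_YM is an int, LEFT relies on an implicit conversion to a string, and integer division by 100 extracts the year while staying numeric:

select
    FACTORY,
    DATE_YM,
    sum(BUDGET) over (partition by FACTORY, DATE_YM / 100
                      order by DATE_YM asc
                      rows between unbounded preceding and current row) as 'C_BUDGET_SUM'
from TABLE_1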

Identify rows subsequent to other rows based on criteria?

I am fairly new to DB2 and SQL. There is a table of customers and their visits. I need to write a query that finds visits by the same customer that occur after, and within 24 hours of, a visit where Sale = 'Y'.
Based on this example data:
CustomerId  VisitID  Sale  DateTime
1           1        Y     2021-04-23 20:16:00.000000
2           2        N     2021-04-24 20:16:00.000000
1           3        N     2021-04-23 21:16:00.000000
2           4        Y     2021-04-25 20:16:00.000000
3           5        Y     2021-04-23 20:16:00.000000
2           6        N     2021-04-25 24:16:00.000000
3           7        N     2021-05-23 20:16:00.000000
The query results should return:
VisitID
3
6
How do I do this?
Try this. You may uncomment the commented-out block to run this statement as is.
/*
WITH MYTAB (CustomerId, VisitID, Sale, DateTime) AS
(
VALUES
(1, 1, 'Y', '2021-04-23 20:16:00'::TIMESTAMP)
, (1, 3, 'N', '2021-04-23 21:16:00'::TIMESTAMP)
, (2, 2, 'N', '2021-04-24 20:16:00'::TIMESTAMP)
, (2, 4, 'Y', '2021-04-25 20:16:00'::TIMESTAMP)
, (2, 6, 'N', '2021-04-25 23:16:00'::TIMESTAMP)
, (3, 5, 'Y', '2021-04-23 20:16:00'::TIMESTAMP)
, (3, 7, 'N', '2021-05-23 20:16:00'::TIMESTAMP)
)
*/
SELECT VisitID
FROM MYTAB A
WHERE EXISTS
(
SELECT 1
FROM MYTAB B
WHERE B.CustomerID = A.CustomerID
AND B.Sale = 'Y'
AND B.VisitID <> A.VisitID
AND A.DateTime BETWEEN B.DateTime AND B.DateTime + 24 HOUR
)
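A possible variation (an assumption about the intent, not part of the original answer): since both rows in the expected output have Sale = 'N', the outer query could be restricted to non-sale visits, which also makes the VisitID comparison unnecessary:

SELECT VisitID
FROM MYTAB A
WHERE A.Sale = 'N'
  AND EXISTS
  (
      SELECT 1
      FROM MYTAB B
      WHERE B.CustomerID = A.CustomerID
        AND B.Sale = 'Y'
        AND A.DateTime BETWEEN B.DateTime AND B.DateTime + 24 HOUR
  )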

oracle track the history on a table with timestamp columns

I have two tables in an Oracle 12c database with the structure below. Table A holds the incoming data from an application, with modified-date timestamps; each day we may get around 50,000 rows in table A. The goal is to use table A's data as the driving data set and insert it into the final target table B (which usually has billions of rows).
A record needs to be inserted/merged into table B only when there is a change in the incoming dataset's attributes.
Basically, the purpose is to track the history/journey of a given product with valid timestamps, adding rows only when there are changes in attributes such as state and zip_cd.
See the table structures below.
Table A ( PRODUCT_ID, STATE, ZIP_CD, Modified_dt)
'abc', 'MN', '123', '3/5/2020 12:01:00 AM'
'abc', 'MN', '123', '3/5/2020 6:01:13 PM'
'abc', 'IL', '223', '3/5/2020 7:01:15 PM'
'abc', 'OH', '333', '3/5/2020 6:01:16 PM'
'abc', 'NY', '722', '3/5/2020 4:29:00 PM'
'abc', 'KS', '444', '3/5/2020 4:31:41 PM'
'bbc', 'MN', '123', '3/19/2020 2:47:08 PM'
'bbc', 'IL', '223', '3/19/2020 2:50:37 PM'
'ccb', 'MN', '123', '3/21/2020 2:56:24 PM'
'dbd', 'KS', '444', '6/20/2020 12:00:00 AM'
Target Table B (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To, LATEST_FLAG)
'1', 'abc', 'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM', 'N'
'2', 'abc', 'AR', '555', '3/3/2020 6:01:14 PM', '3/3/2020 6:01:14 PM', 'N'
'3', 'abc', 'CA', '565', '3/3/2020 6:01:15 PM', '3/4/2020 4:28:59 PM', 'N'
'4', 'abc', 'CA', '777', '3/4/2020 4:29:00 PM', '12/31/2099', 'Y'
'5', 'bbc', 'MN', '123', '3/4/2020 4:31:41 PM', '3/19/2020 2:47:07 PM', 'N'
'6', 'bbc', 'MN', '666', '3/18/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
'7', 'bbc', 'MN', '777', '3/18/2020 2:50:37 PM', '12/31/2099', 'Y'
'8', 'ccb', 'MN', '123', '3/20/2020 2:56:24 PM', '12/31/2099', 'Y'
Rules for populating data into table B:
1. The primary key on the target table is (product_id, valid_from).
2. The incoming data from table A will always have modified_dt timestamps greater than those already in the target table.
3. To insert data, compare the latest_flag = 'Y' record from target table B with the incoming data from table A; only when there is a change in the attributes state and zip_cd should a record from table A be inserted into table B. The valid_to column is a calculated field that is always 1 second lower than the next row's valid_from date, and for the latest row it defaults to '12/31/2099'. Similarly, latest_flag is a calculated column that indicates the current row of a given product_id.
4. If the incoming dataset has multiple rows without any changes compared to the previous row, or to the existing data in table B (latest_flag = 'Y'), those should be ignored as well. For example, rows 2 and 9 from table A are ignored because there are no changes in the attributes state and zip_cd compared to their previous rows for that product.
Based on the above rules, I need to merge the table A data into table B, and the final output should look like below.
Table B (SEQUENCE_KEY, PRODUCT_ID, STATE, ZIP_CD, VALID_FROM, VALID_TO, LATEST_FLAG)
'1', 'abc', 'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM', 'N'
'2', 'abc', 'AR', '555', '3/3/2020 6:01:14 PM', '3/3/2020 6:01:14 PM', 'N'
'3', 'abc', 'CA', '565', '3/3/2020 6:01:15 PM', '3/4/2020 4:28:59 PM', 'N'
'4', 'abc', 'CA', '777', '3/4/2020 4:29:00 PM', '3/5/2020 12:00:00 AM', 'N'
'5', 'abc', 'MN', '123', '3/5/2020 12:01:00 AM', '3/5/2020 7:01:14 PM', 'N'
'6', 'abc', 'IL', '223', '3/5/2020 7:01:15 PM', '3/5/2020 6:01:15 PM', 'N'
'7', 'abc', 'OH', '333', '3/5/2020 6:01:16 PM', '3/5/2020 4:28:59 PM', 'N'
'8', 'abc', 'NY', '722', '3/5/2020 4:29:00 PM', '3/5/2020 4:31:40 PM', 'N'
'9', 'abc', 'KS', '444', '3/5/2020 4:31:41 PM', '12/31/2099', 'Y'
'10', 'bbc', 'MN', '123', '3/4/2020 4:31:41 PM', '3/19/2020 2:47:07 PM', 'N'
'11', 'bbc', 'MN', '666', '3/18/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
'12', 'bbc', 'MN', '777', '3/18/2020 2:50:37 PM', '3/19/2020 2:47:07 PM', 'N'
'13', 'bbc', 'MN', '123', '3/19/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
'14', 'bbc', 'IL', '223', '3/19/2020 2:50:37 PM', '12/31/2099', 'Y'
'15', 'ccb', 'MN', '123', '3/20/2020 2:56:24 PM', '12/31/2099', 'Y'
'16', 'dbd', 'KS', '444', '6/20/2020 12:00:00 AM', '12/31/2099', 'Y'
Looking for suggestions to solve this problem.
LIVE SQL link:
https://livesql.oracle.com/apex/livesql/s/kfbx7dwzr3zz28v6eigv0ars0
Thank you.
I tried to work out how to do this in plain SQL, but it was not feasible for me because of the logic and also the sequence_key reset in your desired output.
So here is my suggestion in PL/SQL.
SQL> select * from table_a ;
PRODUCT_ID STATE ZIP_CD MODIFIED_
------------------------------ ------------------------------ ------------------------------ ---------
abc MN 123 05-MAR-20
abc MN 123 05-MAR-20
abc IL 223 05-MAR-20
abc OH 333 05-MAR-20
abc NY 722 05-MAR-20
abc KS 444 05-MAR-20
bbc MN 123 19-MAR-20
bbc IL 223 19-MAR-20
ccb MN 123 19-MAR-20
dbd KS 444 19-MAR-20
10 rows selected.
SQL> select * from table_b ;
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FRO VALID_TO L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
1 abc AR 999 05-MAR-20 05-MAR-20 N
2 abc AR 555 05-MAR-20 05-MAR-20 N
3 abc CA 565 05-MAR-20 05-MAR-20 N
4 abc CA 777 05-MAR-20 31-DEC-99 Y
5 bbc MN 123 05-MAR-20 05-MAR-20 N
6 bbc MN 666 05-MAR-20 05-MAR-20 N
7 bbc MN 777 19-MAR-20 31-DEC-99 Y
8 ccb MN 123 19-MAR-20 31-DEC-99 Y
8 rows selected.
Now, I used this piece of PL/SQL code:
declare
type typ_rec_set IS RECORD
(
PRODUCT_ID VARCHAR2(30 CHAR),
STATE VARCHAR2(30 CHAR),
ZIP_CD VARCHAR2(30 CHAR),
VALID_FROM DATE ,
VALID_TO DATE ,
LATEST_FLAG VARCHAR2(1 CHAR)
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab typ_rec_tab;
begin
SELECT product_id
,state
,zip_cd
,valid_from
,valid_to
,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
,a.state
,a.zip_cd
,a.modified_dt valid_from
,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt)) - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
,CASE
WHEN ( ( b.product_id IS NOT NULL
AND a.state != b.state
AND a.zip_cd != b.zip_cd)
OR b.product_id IS NULL
) THEN
1
ELSE
0
END insert_flag
FROM table_a a
LEFT OUTER JOIN table_b b
ON a.product_id = b.product_id
AND b.latest_flag = 'Y'
WHERE (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0 ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last
LOOP
-- begin block
begin
insert into table_b
(
sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG
)
values
(
( select max(sequence_key)+1 from table_b ),
l_hdr_tab(i).product_id ,
l_hdr_tab(i).state ,
l_hdr_tab(i).zip_cd ,
l_hdr_tab(i).valid_from ,
l_hdr_tab(i).valid_to ,
l_hdr_tab(i).latest_flag
);
end;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG ,
row_number() over ( order by product_id,valid_from ) as new_seq
from table_b ) s
on ( s.rowid = t.rowid )
when matched then
update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/
Then I ran it:
SQL> #proc.sql
PL/SQL procedure successfully completed.
SQL> select * from table_b order by sequence_key ;
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FRO VALID_TO L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
1 abc AR 999 05-MAR-20 05-MAR-20 N
2 abc NY 722 05-MAR-20 05-MAR-20 N
3 abc CA 777 05-MAR-20 31-DEC-99 Y
4 abc KS 444 05-MAR-20 05-MAR-20 N
5 abc MN 123 05-MAR-20 05-MAR-20 N
6 abc AR 555 05-MAR-20 05-MAR-20 N
7 abc CA 565 05-MAR-20 05-MAR-20 N
8 abc OH 333 05-MAR-20 05-MAR-20 N
9 abc IL 223 05-MAR-20 31-DEC-99 Y
10 bbc MN 666 05-MAR-20 05-MAR-20 N
11 bbc MN 123 05-MAR-20 05-MAR-20 N
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FRO VALID_TO L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
12 bbc MN 777 19-MAR-20 31-DEC-99 Y
13 bbc IL 223 19-MAR-20 31-DEC-99 Y
14 ccb MN 123 19-MAR-20 31-DEC-99 Y
15 dbd KS 444 19-MAR-20 31-DEC-99 Y
15 rows selected.
SQL>
Just let me know about any doubts you might have. I know for sure I've missed something ;)
UPDATE
I realized that I had a useless operation in the loop: the calculation of the max value for the SEQUENCE_KEY field. Here is a better version of the procedure:
declare
type typ_rec_set IS RECORD
(
PRODUCT_ID VARCHAR2(30 CHAR),
STATE VARCHAR2(30 CHAR),
ZIP_CD VARCHAR2(30 CHAR),
VALID_FROM DATE ,
VALID_TO DATE ,
LATEST_FLAG VARCHAR2(1 CHAR)
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab typ_rec_tab;
r pls_integer := 1;
vseq pls_integer;
begin
-- calculate value sequence
select max(sequence_key) into vseq from table_b ;
SELECT product_id
,state
,zip_cd
,valid_from
,valid_to
,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
,a.state
,a.zip_cd
,a.modified_dt valid_from
,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt)) - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
,CASE
WHEN ( ( b.product_id IS NOT NULL
AND a.state != b.state
AND a.zip_cd != b.zip_cd)
OR b.product_id IS NULL
) THEN
1
ELSE
0
END insert_flag
FROM table_a a
LEFT OUTER JOIN table_b b
ON a.product_id = b.product_id
AND b.latest_flag = 'Y'
WHERE (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0 ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last
LOOP
-- begin block
vseq := vseq + r ;
begin
insert into table_b
(
sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG
)
values
(
vseq ,
l_hdr_tab(i).product_id ,
l_hdr_tab(i).state ,
l_hdr_tab(i).zip_cd ,
l_hdr_tab(i).valid_from ,
l_hdr_tab(i).valid_to ,
l_hdr_tab(i).latest_flag
);
end;
r := r + 1;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG ,
row_number() over ( order by product_id,valid_from ) as new_seq
from table_b ) s
on ( s.rowid = t.rowid )
when matched then
update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/
I'll give my first try with the understanding I have. The cursor used as the source for inserting into Table B would look like this:
SELECT product_id
,state
,zip_cd
,valid_from
,valid_to
,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
FROM
(
SELECT a.product_id
,a.state
,a.zip_cd
,a.modified_dt valid_from
,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt)) - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
,CASE
WHEN ( ( b.product_id IS NOT NULL
AND a.state != b.state
AND a.zip_cd != b.zip_cd)
OR b.product_id IS NULL
) THEN
1
ELSE
0
END insert_flag
FROM table_a a
LEFT OUTER JOIN table_b b
ON a.product_id = b.product_id
AND b.latest_flag = 'Y'
WHERE (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0;
The LEFT OUTER JOIN checks whether the record already exists in Table B, and the WHERE clause checks for a modified_dt greater than the valid_from of the latest_flag = 'Y' row.
The inner CASE expression tells us whether the attributes have changed; when the product_id is not present at all, it also treats the row as a first entry, so insert_flag is 1.
The outer query sets valid_to of the last record (per the modified-date column) to 31-12-2099 via the NVL around LEAD, and derives latest_flag from it.
I'm not completely clear on point 3, but I believe the CASE expression is what it calls for.
I haven't considered the performance problem here; you could convert this to a PL/SQL block and use collection methods to process the data in chunks, as in the sketch below.
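A minimal sketch of that chunked approach, assuming the same table_a/table_b structures as above and reusing the driving query from this answer. The chunk size of 10,000 is an arbitrary assumption, and expiring the previous latest_flag = 'Y' rows is left out:

DECLARE
    CURSOR c_changed IS
        SELECT product_id, state, zip_cd, valid_from, valid_to,
               CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END AS latest_flag
        FROM (
            SELECT a.product_id, a.state, a.zip_cd,
                   a.modified_dt AS valid_from,
                   NVL(LEAD(a.modified_dt) OVER (PARTITION BY a.product_id
                                                 ORDER BY a.modified_dt)
                       - INTERVAL '1' SECOND, DATE '2099-12-31') AS valid_to,
                   CASE WHEN b.product_id IS NULL
                          OR (a.state != b.state AND a.zip_cd != b.zip_cd)
                        THEN 1 ELSE 0 END AS insert_flag
            FROM table_a a
            LEFT OUTER JOIN table_b b
              ON a.product_id = b.product_id
             AND b.latest_flag = 'Y'
            WHERE a.modified_dt >= b.valid_from OR b.product_id IS NULL
        )
        WHERE insert_flag != 0;
    TYPE typ_rec_tab IS TABLE OF c_changed%ROWTYPE;
    l_rows typ_rec_tab;
    v_seq  PLS_INTEGER;
BEGIN
    SELECT NVL(MAX(sequence_key), 0) INTO v_seq FROM table_b;
    OPEN c_changed;
    LOOP
        -- fetch at most 10,000 rows into the collection per iteration
        FETCH c_changed BULK COLLECT INTO l_rows LIMIT 10000;
        EXIT WHEN l_rows.COUNT = 0;
        FOR i IN 1 .. l_rows.COUNT LOOP
            v_seq := v_seq + 1;
            INSERT INTO table_b
                (sequence_key, product_id, state, zip_cd, valid_from, valid_to, latest_flag)
            VALUES
                (v_seq, l_rows(i).product_id, l_rows(i).state, l_rows(i).zip_cd,
                 l_rows(i).valid_from, l_rows(i).valid_to, l_rows(i).latest_flag);
        END LOOP;
    END LOOP;
    CLOSE c_changed;
    COMMIT;
END;
/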
Also, I have one question: what happens to a record with product_id 'dbd' (which is a new entry and doesn't exist in Table B) if it appears multiple times in table A?
This is the Slowly Changing Dimension (SCD) Type 2 problem in data warehousing (Kimball approach). You can find a short definition here:
https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/slowlychangingdimensions.htm
Support for SCD Type 2 is only available in the Enterprise ETL option of OWB 10gR2, as described in the above link. If that's not available and you have to use PL/SQL, you can check out the following approach. Unfortunately, Oracle PL/SQL does not offer a straightforward solution, unlike MS SQL.
Implementing Type 2 SCD in Oracle

need to get a subsequent record with a specific value

I have the following table:
Dt Status
05.23.2019 10:00:00 A
05.23.2019 11:00:00 B
05.23.2019 12:00:00 B
05.23.2019 13:00:00 D
05.23.2019 14:00:00 A
05.23.2019 15:00:00 B
05.23.2019 16:00:00 C
05.23.2019 17:00:00 D
05.23.2019 18:00:00 A
For each status A I need to get the next status D. The result should look like this:
Status1 Status2 Dt1 Dt2
A D 05.23.2019 10:00:00 05.23.2019 13:00:00
A D 05.23.2019 14:00:00 05.23.2019 17:00:00
A null 05.23.2019 18:00:00 null
I have my own solution based on CROSS/OUTER APPLY. For performance reasons, I need a solution without CROSS/OUTER APPLY.
We can try using ROW_NUMBER here along with some pivot logic:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Dt) rn
FROM yourTable
WHERE Status IN ('A', 'D')
)
SELECT
MAX(CASE WHEN Status = 'A' THEN Status END) AS Status1,
MAX(CASE WHEN Status = 'D' THEN Status END) AS Status2,
MAX(CASE WHEN Status = 'A' THEN Dt END) AS Dt1,
MAX(CASE WHEN Status = 'D' THEN Dt END) AS Dt2
FROM cte
GROUP BY rn
ORDER BY rn;
Demo
The idea here is to generate a row number sequence along your entire table, for each separate Status value (A or D). Then, aggregate by that row number sequence to bring the A and D records together.
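A possible alternative sketch (my assumption, not part of the original answer) on SQL Server 2012+ uses LEAD to pair each 'A' directly with the row that follows it among the A/D rows. It matches the sample data, though it behaves differently from the ROW_NUMBER approach when several 'A' rows occur consecutively:

WITH ad AS (
    SELECT Dt, Status,
           LEAD(Status) OVER (ORDER BY Dt) AS NextStatus,
           LEAD(Dt)     OVER (ORDER BY Dt) AS NextDt
    FROM yourTable
    WHERE Status IN ('A', 'D')
)
SELECT
    Status AS Status1,
    CASE WHEN NextStatus = 'D' THEN NextStatus END AS Status2,
    Dt AS Dt1,
    CASE WHEN NextStatus = 'D' THEN NextDt END AS Dt2
FROM ad
WHERE Status = 'A'
ORDER BY Dt;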
Since the result columns Status1 and Status2 always seem to be "A" and "D" respectively, I omitted them from my result.
DECLARE @Data TABLE
(
[Dt] SMALLDATETIME,
[Status] CHAR(1)
);
INSERT INTO @Data ([Dt], [Status]) VALUES
('2019-05-23 10:00:00', 'A'),
('2019-05-23 11:00:00', 'B'),
('2019-05-23 12:00:00', 'B'),
('2019-05-23 13:00:00', 'D'),
('2019-05-23 14:00:00', 'A'),
('2019-05-23 15:00:00', 'B'),
('2019-05-23 16:00:00', 'C'),
('2019-05-23 17:00:00', 'D'),
('2019-05-23 18:00:00', 'A'),
('2019-05-23 19:00:00', 'D'),
('2019-05-23 20:00:00', 'D'),
('2019-05-23 21:00:00', 'A'),
('2019-05-23 22:00:00', 'A'),
('2019-05-23 23:00:00', 'A');
SELECT
D.[Dt] AS [Dt1],
[LastDBeforeNextA].[Dt] AS [Dt2]
FROM
@Data AS D
OUTER APPLY (SELECT TOP (1) [Dt]
FROM @Data
WHERE [Status] = 'A' AND [Dt] > D.[Dt]
ORDER BY [Dt]) AS [NextA]
OUTER APPLY (SELECT TOP (1) [Dt]
FROM @Data
WHERE [Status] = 'D' AND [Dt] < [NextA].[Dt] AND [Dt] > D.[Dt]
ORDER BY [Dt] DESC) AS [LastDBeforeNextA]
WHERE
D.[Status] = 'A' AND
([NextA].[Dt] > [LastDBeforeNextA].[Dt] OR ([LastDBeforeNextA].[Dt] IS NULL AND [NextA].[Dt] IS NULL))
It initially gets all records from the table where the status is 'A' (using the expression D.[Status] = 'A' in the WHERE clause).
For each record found, it joins the date of the next record with status A (table expression aliased NextA) and the date of the last record with status D that comes right before that next A-record but after the current A-record (table expression aliased LastDBeforeNextA).
Results are valid when a D-record is found (expression [NextA].[Dt] > [LastDBeforeNextA].[Dt] in the WHERE clause) or when there is no D-record yet (expression [LastDBeforeNextA].[Dt] IS NULL in the WHERE clause). In the latter case, however, you need only the latest A-record (expression [NextA].[Dt] IS NULL in the WHERE clause), since there can be multiple A-records after the last D-record.

postgresql timeclock entry with no matching pair

I have a table:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logtime TIME
, timetype VARCHAR(1)
);
INSERT INTO timeclock VALUES
(1, '2013-01-01', '07:00', 'I'),
(1, '2013-01-01', '07:01', 'I'),
(1, '2013-01-01', '16:00', 'O'),
(1, '2013-01-01', '16:01', 'O'),
(2, '2013-01-01', '07:00', 'I'),
(2, '2013-01-01', '16:00', 'O'),
(1, '2013-01-02', '07:00', 'I'),
(1, '2013-01-02', '16:30', 'O'),
(2, '2013-01-02', '06:30', 'I'),
(2, '2013-01-02', '15:30', 'O'),
(2, '2013-01-02', '16:30', 'I'),
(2, '2013-01-02', '23:30', 'O'),
(3, '2013-01-01', '06:30', 'I'),
(3, '2013-01-02', '16:30', 'O'),
(4, '2013-01-01', '20:30', 'I'),
(4, '2013-01-02', '05:30', 'O'),
(5, '2013-01-01', '20:30', 'O'),
(5, '2013-01-02', '05:30', 'I');
I need to get the time IN and OUT of each employee, disregarding duplicate entries
and identifying orphan entries (those without a matching IN or OUT), so that I can put them in a separate list for notification of missing entries.
So far I have this SQL, which I adapted from Peter Larsson's Islands and Gaps solution (link):
WITH cteIslands ( employeeid, timetype, logdate, logtime, grp)
AS ( SELECT employeeid, timetype, logdate, logtime,
ROW_NUMBER()
OVER ( ORDER BY employeeid, logdate, logtime )
- ROW_NUMBER()
OVER ( ORDER BY timetype, employeeid,
logdate, logtime ) AS grp
FROM timeclock
),
cteGrouped ( employeeid, timetype, logdate, logtime )
AS ( SELECT employeeid, MIN(timetype), logdate,
CASE WHEN MIN(timetype) = 'I'
THEN MIN(logtime)
ELSE MAX(logtime)
END AS logtime
FROM cteIslands
GROUP BY employeeid, logdate, grp
)
select * from cteIslands
order by employeeid, logdate, logtime
The above works fine for removing duplicate entries, but now I can't seem to get the orphan entries. I think LEAD or LAG could be used for this, but I am new to PostgreSQL. I hope someone here can help me with this.
Edit:
I somehow need to add a new field that I can use to tell which records are orphaned,
something like the table below:
EMPID TYPE LOGDATE LOGTIME ORPHAN_FLAG
1 I 2013-01-01 07:00:00 0
1 O 2013-01-01 16:01:00 0
1 I 2013-01-02 07:00:00 0
1 O 2013-01-02 16:30:00 0
2 I 2013-01-01 07:00:00 0
2 O 2013-01-01 16:00:00 0
2 I 2013-01-02 06:30:00 0
2 O 2013-01-02 15:30:00 0
2 I 2013-01-02 16:30:00 0
2 O 2013-01-02 23:30:00 0
3 I 2013-01-01 06:30:00 0
3 O 2013-01-02 16:30:00 0
4 I 2013-01-01 20:30:00 0
4 O 2013-01-02 05:30:00 0
5 O 2013-01-01 20:30:00 1 <--- NO MATCHING IN
5 I 2013-01-02 05:30:00 1 <--- NO MATCHING OUT
First, I think you should rethink your design a little bit. It makes little sense to record a clock-out entry without having clocked in, and you can use things like partial indexes to ensure that clock-in entries without a matching clock-out entry are easy to look up.
So I would start by considering moving your storage tables to something like:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logintime TIME
, logouttime time
, timetype VARCHAR(1)
);
The bad news is that if you can't do that, your orphan report will be quite difficult to make perform well, because you are doing a self-join in which you hope every row in a large table has a corresponding other entry. This will require, at best, two sequential scans on the table and, at worst, a sequential scan with a nested-loop index scan (assuming proper indexes; the alternative, a nested-loop sequential scan, would be even worse).
Handling rollover between dates (clock in at 11 pm, clock out at 2 am) makes this problem very hard to avoid.
Now, since you have your CTEs working fine except for the orphaned records, my recommendation is to UNION them with another query on the same table that looks for the rows which have no proper match in your current query.
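As a rough sketch of the LEAD/LAG idea the question mentions (my assumption, not the UNION approach above): once duplicates have been collapsed, punches should alternate I/O per employee, so an 'I' whose next entry is not an 'O', or an 'O' whose previous entry is not an 'I', can be flagged as orphaned. Replacing the final select * from cteIslands in the question's query with a SELECT over cteGrouped, something like:

-- Sketch only: assumes that, after de-duplication, entries alternate I/O per employee,
-- so an 'I' not followed by an 'O' (or an 'O' not preceded by an 'I') is orphaned.
SELECT employeeid,
       timetype,
       logdate,
       logtime,
       CASE
         WHEN timetype = 'I'
              AND COALESCE(LEAD(timetype) OVER w, 'X') <> 'O' THEN 1
         WHEN timetype = 'O'
              AND COALESCE(LAG(timetype) OVER w, 'X') <> 'I' THEN 1
         ELSE 0
       END AS orphan_flag
FROM cteGrouped
WINDOW w AS (PARTITION BY employeeid ORDER BY logdate, logtime)
ORDER BY employeeid, logdate, logtime;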