What I'd like to do is extend each RxEndDate until there is no more overlap between prescriptions, and make sure the newly extended ranges do not overlap either.
Context: If Amy takes Humera daily and gets a refill before her current prescription runs out, then the DaySupply of the second prescription should be added to the first prescription.
Sample data:
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 5 3/4/2017 <--Overlap with below
Amy Humera 3/3/2017 5 3/7/2017 <--Overlap with above, need to combine
Amy Humera 3/8/2017 2 3/9/2017
Amy Humera 3/10/2017 7 3/16/2017
Amy Humera 3/17/2017 30 4/15/2017 <--Overlap with all below, combine
Amy Humera 3/22/2017 2 3/23/2017 <--Overlap
Amy Humera 3/24/2017 2 3/25/2017 <--Overlap
Amy Humera 3/31/2017 3 4/2/2017 <--Overlap
Amy Humera 4/7/2017 5 4/11/2017 <--Overlap
Amy Humera 4/13/2017 30 5/12/2017 <--Overlap
So after we combine, we get
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 10 3/9/2017 <-- Combined from above, new overlap
Amy Humera 3/8/2017 2 3/9/2017 <-- Now this overlaps with above
Amy Humera 3/10/2017 7 3/16/2017
Amy Humera 3/17/2017 72 5/27/2017
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 12 3/11/2017 <-- Combined, again, new overlap
Amy Humera 3/10/2017 7 3/16/2017 <-- Now this overlaps with above
Amy Humera 3/17/2017 72 5/27/2017
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 19 3/18/2017 <-- Combined, again, new overlap
Amy Humera 3/17/2017 72 5/27/2017 <-- Now this overlaps with above
User Drug RxStartDate DaySupply RxEndDate
Amy Humera 2/12/2017 7 2/18/2017
Amy Humera 2/28/2017 91 5/29/2017
There is no more overlap…finished!
Is there a way to do this automatically, in a loop or otherwise? Any ideas?
I think this can only be solved with recursion: the accumulated DaySupply has to be recalculated in a loop, and I see no way of doing that with non-recursive lookups. You can do this with a recursive CTE, which, according to the official documentation, is available starting with SQL Server 2008.
A possible implementation (I added some test data to challenge it):
CREATE TABLE #test (
[User] VARCHAR(100),
Drug VARCHAR(100),
RxStartDate DATE,
DaySupply INT,
RxEndDate DATE
)
INSERT #test
VALUES
('Amy', 'Humera', '2/12/2017', '7', '2/18/2017'),
('Amy', 'Humera', '2/28/2017', '5', '3/4/2017'),
('Amy', 'Humera', '3/3/2017', '5', '3/7/2017'),
('Amy', 'Humera', '3/8/2017', '2', '3/9/2017'),
('Amy', 'Humera', '3/10/2017', '7', '3/16/2017'),
('Amy', 'Humera', '3/17/2017', '30', '4/15/2017'),
('Amy', 'Humera', '3/22/2017', '2', '3/23/2017'),
('Amy', 'Humera', '3/24/2017', '2', '3/25/2017'),
('Amy', 'Humera', '3/31/2017', '3', '4/2/2017'),
('Amy', 'Humera', '4/7/2017', '5', '4/11/2017'),
('Amy', 'Humera', '4/13/2017', '30', '5/12/2017'),
('Amy', 'Other', '3/24/2017', '7', '3/30/2017'),
('Amy', 'Other', '3/31/2017', '3', '4/2/2017'),
('Amy', 'Other', '4/7/2017', '5', '4/11/2017'),
('Amy', 'Other', '4/13/2017', '30', '5/12/2017'),
('Joe', 'Humera', '3/24/2017', '8', '3/31/2017'),
('Joe', 'Humera', '3/31/2017', '3', '4/2/2017'),
('Joe', 'Humera', '4/12/2017', '5', '4/16/2017'),
('Joe', 'Humera', '4/23/2017', '30', '5/22/2017'),
('Joe', 'Other', '3/24/2017', '60', '5/23/2017'),
('Joe', 'Other', '3/31/2017', '3', '4/2/2017'),
('Joe', 'Other', '4/7/2017', '5', '4/11/2017'),
('Joe', 'Other', '4/13/2017', '30', '5/12/2017')
-- You can comment this out, it is just to show progress:
SELECT * FROM #test ORDER BY [User], Drug, RxStartDate
CREATE TABLE #test_2 (
[User] VARCHAR(100),
Drug VARCHAR(100),
RxStartDate_base DATE,
DaySupplyCumulative INT
)
;WITH CTE_RxEndDateExtended as (
SELECT [User], Drug, RxStartDate, DaySupply, DaySupply as DaySupplyCumulative, RxStartDate as RxStartDate_base, RxStartDate as RxStartDateExtended, dateadd (dd, DaySupply, RxStartDate) as RxEndDateExtended
FROM #test
-- WHERE [User] = 'Amy' and Drug = 'Humera' and RxStartDate = '2/28/2017'
UNION ALL
SELECT t.[User], t.Drug, t.RxStartDate, t.DaySupply, c.DaySupplyCumulative + t.DaySupply as DaySupplyCumulative, c.RxStartDate_base, t.RxStartDate as RxStartDateExtended, dateadd (dd, t.DaySupply, c.RxEndDateExtended) as RxEndDateExtended
FROM CTE_RxEndDateExtended as c INNER JOIN #test as t
on c.[User] = t.[User] and c.Drug = t.Drug
and c.RxEndDateExtended >= t.RxStartDate and c.RxStartDateExtended < t.RxStartDate
)
INSERT #test_2
SELECT [User], Drug, RxStartDate_base, MAX (DaySupplyCumulative) as DaySupplyCumulative -- comment this out and use this for debugging: SELECT *
FROM CTE_RxEndDateExtended
GROUP BY [User], Drug, RxStartDate_base -- comment this out for debugging
OPTION (MAXRECURSION 0) -- comment this out and use this for debugging (to avoid infinite loops): OPTION (MAXRECURSION 1000)
-- You can comment this out, it is just to show progress:
SELECT * FROM #test_2
ORDER BY [User], Drug, RxStartDate_base -- comment this out and use this for debugging: ORDER BY [User], Drug, RxStartDate_base, RxStartDate, DaySupplyCumulative
SELECT base.*, dateadd (dd, base.DaySupplyCumulative - 1, base.RxStartDate_base) as RxEndDateCumulative
FROM #test_2 as base LEFT OUTER JOIN #test_2 as filter
on base.[User] = filter.[User] and base.Drug = filter.Drug
and base.RxStartDate_base > filter.RxStartDate_base
and dateadd (dd, base.DaySupplyCumulative, base.RxStartDate_base) <= dateadd (dd, filter.DaySupplyCumulative, filter.RxStartDate_base)
WHERE filter.[User] IS NULL
ORDER BY [User], Drug, RxStartDate_base
You may want to optimize it by simplifying the logic, but be careful not to create an infinite loop. When debugging, use OPTION (MAXRECURSION N) with N other than zero.
PS: This also works if I add the row ('Amy', 'Humera', '2/15/2017', '11', '2/25/2017'), the case I used to criticize the other solutions. I am curious whether it works as you expect, so please test it!
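To make the expected accumulation easier to check by hand, here is a minimal Python sketch of the same idea outside SQL. It is not the recursive CTE above: it assumes the rows for a single user/drug pair can be sorted by start date and processed in one pass, and the function name `extend_prescriptions` is made up for this illustration.

```python
from datetime import date, timedelta


def extend_prescriptions(rows):
    """Merge overlapping prescriptions for ONE user/drug pair.

    rows: iterable of (start_date, day_supply) tuples.
    Returns a list of (start_date, total_day_supply, extended_end_date).
    """
    merged = []
    for start, supply in sorted(rows):
        if merged:
            base_start, total, end = merged[-1]
            if start <= end:  # refill starts before the running range ends
                total += supply
                new_end = base_start + timedelta(days=total - 1)
                merged[-1] = (base_start, total, new_end)
                continue
        # No overlap: this prescription opens a new range.
        merged.append((start, supply, start + timedelta(days=supply - 1)))
    return merged


# Amy / Humera rows from the question: (RxStartDate, DaySupply)
amy_humera = [
    (date(2017, 2, 12), 7),
    (date(2017, 2, 28), 5),
    (date(2017, 3, 3), 5),
    (date(2017, 3, 8), 2),
    (date(2017, 3, 10), 7),
    (date(2017, 3, 17), 30),
    (date(2017, 3, 22), 2),
    (date(2017, 3, 24), 2),
    (date(2017, 3, 31), 3),
    (date(2017, 4, 7), 5),
    (date(2017, 4, 13), 30),
]

result = extend_prescriptions(amy_humera)
for row in result:
    print(row)
```

For the question's sample this yields two ranges, 2/12/2017 with 7 days and 2/28/2017 with 91 days ending 5/29/2017, which matches the final table in the question. It can serve as a reference when testing the SQL against other data.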
You can identify where a group starts using not exists. Then do a cumulative sum of those flags to assign a group number, and aggregate. The following assumes a unique id, which is sort of needed to handle duplicates:
select [user], drug, grp, sum(daysupply), min(RxStartDate), max(RxEndDate)
from (select t.*, sum(flg) over (partition by [user], drug order by RxStartDate) as grp
from (select t.*,
(case when exists (select 1
from #test t2
where t2.[user] = t.[user] and t2.drug = t.drug and
t2.RxStartDate < t.RxStartDate and
t2.RxEndDate >= dateadd(day, -1, t.RxStartDate)
)
then 0 else 1
end) as flg
from #test t
) t
) t
group by [user], drug, grp;
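The flag-then-cumulative-sum pattern in this query can be traced with a small Python sketch. It is only an illustration for one user/drug pair; the function name `group_by_islands` is hypothetical, and the overlap test copies the query's condition (an earlier-starting row whose end date is on or after the day before this start).

```python
from datetime import date, timedelta
from itertools import accumulate


def group_by_islands(rows):
    """rows: (start, day_supply, end) tuples for ONE user/drug pair.

    Returns a list of (grp, total_day_supply, min_start, max_end).
    """
    rows = sorted(rows)
    one_day = timedelta(days=1)
    # flg = 1 when no earlier-starting row reaches the day before this start
    flags = [
        0 if any(s2 < s and e2 >= s - one_day for s2, _, e2 in rows) else 1
        for s, _, _ in rows
    ]
    groups = list(accumulate(flags))  # running sum of flags = group number
    summary = {}
    for grp, (s, supply, e) in zip(groups, rows):
        total, lo, hi = summary.get(grp, (0, s, e))
        summary[grp] = (total + supply, min(lo, s), max(hi, e))
    return [(grp, *vals) for grp, vals in sorted(summary.items())]


# Amy / Humera rows from the question: (RxStartDate, DaySupply, RxEndDate)
amy_humera = [
    (date(2017, 2, 12), 7, date(2017, 2, 18)),
    (date(2017, 2, 28), 5, date(2017, 3, 4)),
    (date(2017, 3, 3), 5, date(2017, 3, 7)),
    (date(2017, 3, 8), 2, date(2017, 3, 9)),
    (date(2017, 3, 10), 7, date(2017, 3, 16)),
    (date(2017, 3, 17), 30, date(2017, 4, 15)),
    (date(2017, 3, 22), 2, date(2017, 3, 23)),
    (date(2017, 3, 24), 2, date(2017, 3, 25)),
    (date(2017, 3, 31), 3, date(2017, 4, 2)),
    (date(2017, 4, 7), 5, date(2017, 4, 11)),
    (date(2017, 4, 13), 30, date(2017, 5, 12)),
]

islands = group_by_islands(amy_humera)
for row in islands:
    print(row)
```

On the sample data this gives a second group with a total DaySupply of 91, but its max(RxEndDate) is 5/12/2017, the latest original end date. If the extended end date from the question is needed, it would have to be derived separately, for example as the group's start date plus the summed supply minus one day.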
I used a common table expression (CTE) to perform the grouping. Since some of the date ranges are adjacent rather than strictly overlapping, I created an alternate end date [RxEndDate_ALT] by adding 1 day to [RxEndDate] in source_data. I was then able to group the dates using NOT EXISTS in source_data_grouped. After that, I join back to source_data_raw to SUM the [DaySupply].
SQL
WITH
source_data_raw
AS
(
SELECT tbl.* FROM (VALUES
( 'Amy', 'Humera', 7, CAST('12-Feb-2017' AS DATE), CAST('18-Feb-2017' AS DATE))
, ( 'Amy', 'Humera', 5, '28-Feb-2017', '04-Mar-2017')
, ( 'Amy', 'Humera', 5, '03-Mar-2017', '07-Mar-2017')
, ( 'Amy', 'Humera', 2, '08-Mar-2017', '09-Mar-2017')
, ( 'Amy', 'Humera', 7, '10-Mar-2017', '16-Mar-2017')
, ( 'Amy', 'Humera', 30, '17-Mar-2017', '15-Apr-2017')
, ( 'Amy', 'Humera', 2, '22-Mar-2017', '23-Mar-2017')
, ( 'Amy', 'Humera', 2, '24-Mar-2017', '25-Mar-2017')
, ( 'Amy', 'Humera', 3, '31-Mar-2017', '15-Apr-2017')
, ( 'Amy', 'Humera', 5, '07-Apr-2017', '16-Apr-2017')
, ( 'Amy', 'Humera', 30, '13-Apr-2017', '27-May-2017')
) tbl ([User], [Drug], [DaySupply], [RxStartDate], [RxEndDate])
)
,
source_data
AS
(
SELECT
sdr.[User]
, sdr.[Drug]
, sdr.[RxStartDate]
, sdr.[RxEndDate]
, [RxEndDate_ALT] = DATEADD(DAY, 1, sdr.[RxEndDate])
FROM
source_data_raw AS sdr
)
,
source_data_grouped
AS
(
SELECT
s1.[User]
, s1.[Drug]
, s1.[RxStartDate]
, [RxEndDate] = MIN(t1.[RxEndDate])
FROM
source_data AS s1
INNER JOIN source_data AS t1 ON s1.[User] = t1.[User] AND s1.[Drug] = t1.[Drug] AND s1.[RxStartDate] <= t1.[RxEndDate_ALT]
AND NOT EXISTS
(
SELECT 1
FROM source_data AS t2
WHERE
1=1
AND t1.[User] = t2.[User]
AND t1.[Drug] = t2.[Drug]
AND t1.[RxEndDate_ALT] >= t2.[RxStartDate]
AND t1.[RxEndDate_ALT] < t2.[RxEndDate_ALT]
)
WHERE
1=1
AND NOT EXISTS
(
SELECT 1
FROM source_data AS s2
WHERE
1=1
AND s1.[User] = s2.[User]
AND s1.[Drug] = s2.[Drug]
AND s1.[RxStartDate] > s2.[RxStartDate]
AND s1.[RxStartDate] <= s2.[RxEndDate_ALT]
)
GROUP BY
s1.[User]
, s1.[Drug]
, s1.[RxStartDate]
)
SELECT
sdg.[User]
, sdg.[Drug]
, [DaySupply] = SUM(sdr.[DaySupply])
, sdg.[RxStartDate]
, sdg.[RxEndDate]
FROM
source_data_grouped AS sdg
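INNER JOIN source_data_raw AS sdr ON sdr.[User] = sdg.[User] AND sdr.[Drug] = sdg.[Drug] -- keep other users and drugs out of the sum
AND sdr.[RxStartDate] BETWEEN sdg.[RxStartDate] AND sdg.[RxEndDate]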
GROUP BY
sdg.[User]
, sdg.[Drug]
, sdg.[RxStartDate]
, sdg.[RxEndDate]
I have data in my database like this:
Code meta  meta_ID date
---- ----- ------- -------------------
A    1,2   1       01/01/2022 08:08:08
B    1,2   2       01/01/2022 02:00:00
B    null  2       01/01/1900 02:00:00
C    null  3       01/01/2022 02:00:00
D    8     8       01/01/2022 02:00:00
E    5,6,7 5       01/01/2022 02:00:00
F    1,2   2       01/01/2022 02:00:00
I want the following result, keeping the latest date for each code (comparing day, month and year):
Code meta  meta_ID list_Code date
---- ----- ------- --------- -------------------
A    2,3   1       A,B,F     01/01/2022 08:08:08
B    1,3   2       A,B,F     01/01/2022 02:00:00
C    null  3       C         01/01/2022 02:00:00
D    8     8       D         01/01/2022 02:00:00
E    5,6,7 5       E         01/01/2022 02:00:00
F    1,2   3       A,B,F     01/01/2022 02:00:00
For each row, I want the list of codes that belong to the same meta group. Do you know how to do this with SQL Server?
The code below inputs the 1st table and outputs the 2nd table exactly. The Meta and Date columns had duplicate values, so in the CTE I took the MAX for both fields. Different logic can be applied if needed.
It uses FOR XML PATH to merge the matching rows into a single value for the List_Code column. The STUFF function removes the leading comma (,) delimiter.
CREATE TABLE MetaTable
(
Code VARCHAR(5),
Meta VARCHAR(100),
Meta_ID INT,
Date DATETIME
)
GO
INSERT INTO MetaTable
VALUES
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2','2', '01/01/2022 02:00:00'),
('B', NULL,'2', '01/01/1900 02:00:00'),
('C', NULL,'3', '01/01/2022 02:00:00'),
('D', '8','8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2','2', '01/01/2022 02:00:00')
GO
WITH CTE_Meta
AS
(
SELECT
Code,
MAX(Meta) AS 'Meta',
Meta_ID,
MAX(Date) AS 'Date'
FROM MetaTable
GROUP BY
Code,
Meta_ID
)
SELECT
T1.Code,
T1.Meta,
T1.Meta_ID,
STUFF
(
(
SELECT ',' + Code
FROM CTE_Meta T2
WHERE ISNULL(T1.Meta, '') = ISNULL(T2.Meta, '')
FOR XML PATH('')
), 1, 1, ''
) AS 'List_Code',
T1.Date
FROM CTE_Meta T1
ORDER BY 1
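For readers who want to trace what the query does, the two steps (collapse to one row per Code and Meta_ID, then list every code sharing the same Meta value) can be sketched in Python. This is only an illustration: the function name `list_codes_by_meta` is made up, the Date column is left out for brevity, and the codes are sorted to get a stable order, which FOR XML PATH without an ORDER BY does not guarantee.

```python
from collections import defaultdict


def list_codes_by_meta(rows):
    """rows: (code, meta, meta_id) tuples; meta may be None.

    Returns a list of (code, meta, meta_id, list_code) sorted by code.
    """
    # Step 1 (the CTE): one row per (code, meta_id), keeping the MAX non-null meta.
    collapsed = {}
    for code, meta, meta_id in rows:
        key = (code, meta_id)
        if key not in collapsed:
            collapsed[key] = meta
        elif meta is not None and (collapsed[key] is None or meta > collapsed[key]):
            collapsed[key] = meta

    # Step 2 (the correlated subquery): group codes by meta,
    # treating NULL as '' in the same way ISNULL(Meta, '') does.
    by_meta = defaultdict(list)
    for (code, _), meta in collapsed.items():
        by_meta[meta or ""].append(code)

    return [
        (code, meta, meta_id, ",".join(sorted(by_meta[meta or ""])))
        for (code, meta_id), meta in sorted(collapsed.items())
    ]


sample = [
    ("A", "1,2", 1),
    ("B", "1,2", 2),
    ("B", None, 2),
    ("C", None, 3),
    ("D", "8", 8),
    ("E", "5,6,7", 5),
    ("F", "1,2", 2),
]

listed = list_codes_by_meta(sample)
for row in listed:
    print(row)
```

Note that the grouping key is the whole Meta string, so '1,2' and '2,1' would count as different groups. The second answer addresses that by splitting the values.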
I like the first answer using XML. It's very concise. This is more verbose, but might be more flexible if the data can have different meta values spread about in different records. The CAST to varchar(12) in various places is just for the display. I use STRING_AGG and STRING_SPLIT instead of XML.
WITH TestData as (
SELECT t.*
FROM (
Values
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2', '2', '01/01/2022 02:00:00'),
('B', null, '2', '01/01/1900 02:00:00'),
('C', null, '3', '01/01/2022 02:00:00'),
('D', '8', '8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2', '2', '01/01/2022 02:00:00'),
('G', '16', '17', '01/01/2022 02:00:00'),
('G', null, '17', '01/02/2022 03:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '20', '19', '01/04/2022 05:00:00'),
('G', '20', '20', '01/05/2022 06:00:00')
) t (Code, meta, meta_ID, date)
), CodeLookup as ( -- used to find the Code from the meta_ID
SELECT DISTINCT meta_ID, Code
FROM TestData
), Normalized as ( -- split out the meta values, one per row
SELECT t.Code, s.Value as [meta], meta_ID, [date]
FROM TestData t
OUTER APPLY STRING_SPLIT(t.meta, ',') s
), MetaLookup as ( -- used to find the distinct list of meta values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta, ',') WITHIN GROUP ( ORDER BY n.meta ASC ) as varchar(12)) as [meta]
FROM (
SELECT DISTINCT Code, meta
FROM Normalized
WHERE meta is not NULL
) n
GROUP BY n.Code
), MetaIdLookup as ( -- used to find the distinct list of meta_ID values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta_ID, ',') WITHIN GROUP ( ORDER BY n.meta_ID ASC ) as varchar(12)) as [meta_ID]
FROM (
SELECT DISTINCT Code, meta_ID
FROM Normalized
) n
GROUP BY n.Code
), ListCodeLookup as ( -- for every code, get all codes for the meta values
SELECT l.Code, CAST(STRING_AGG(l.lookupCode, ',') WITHIN GROUP ( ORDER BY l.lookupCode ASC ) as varchar(12)) as [list_Code]
FROM (
SELECT DISTINCT n.Code, c.Code as [lookupCode]
FROM Normalized n
INNER JOIN CodeLookup c
ON c.meta_ID = n.meta
UNION -- every record needs its own code in the list_code?
SELECT DISTINCT n.Code, n.Code as [lookupCode]
FROM Normalized n
) l
GROUP BY l.Code
)
SELECT t.Code, m.meta, mi.meta_ID, lc.list_Code, t.[date]
FROM (
SELECT Code, MAX([date]) as [date]
FROM TestData
GROUP BY Code
) t
LEFT JOIN MetaLookup m
ON m.Code = t.Code
LEFT JOIN MetaIdLookup mi
ON mi.Code = t.Code
LEFT JOIN ListCodeLookup lc
ON lc.Code = t.Code
Code meta meta_ID list_Code date
---- ------------ ------------ ------------ -------------------
A 1,2 1 A,B,F 01/01/2022 08:08:08
B 1,2 2 A,B,F 01/01/2022 02:00:00
C NULL 3 C 01/01/2022 02:00:00
D 8 8 D 01/01/2022 02:00:00
E 5,6,7 5 E 01/01/2022 02:00:00
F 1,2 2 A,B,F 01/01/2022 02:00:00
G 16,19,20 17,18,19,20 G 01/05/2022 06:00:00
I have 2 tables in an Oracle 12c database with the structure below. Table A holds the incoming data from an application, with modified-date timestamps;
each day we may get around 50,000 rows in table A. The goal is to insert table A's data into the final target table B (which usually has billions of rows),
using table A's data as the driving data set.
A record needs to be inserted/merged into table B only when there is a change in the incoming dataset's attributes.
Basically, the purpose is to track the history/journey of a given product with valid timestamps, recording a row only when its attributes, such as state and zip_cd, change.
See the table structures below.
Table A ( PRODUCT_ID, STATE, ZIP_CD, Modified_dt)
'abc', 'MN', '123', '3/5/2020 12:01:00 AM'
'abc', 'MN', '123', '3/5/2020 6:01:13 PM'
'abc', 'IL', '223', '3/5/2020 7:01:15 PM'
'abc', 'OH', '333', '3/5/2020 6:01:16 PM'
'abc', 'NY', '722', '3/5/2020 4:29:00 PM'
'abc', 'KS', '444', '3/5/2020 4:31:41 PM'
'bbc', 'MN', '123', '3/19/2020 2:47:08 PM'
'bbc', 'IL', '223', '3/19/2020 2:50:37 PM'
'ccb', 'MN', '123', '3/21/2020 2:56:24 PM'
'dbd', 'KS', '444', '6/20/2020 12:00:00 AM'
Target Table B (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To, LATEST_FLAG)
'1', 'abc', 'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM', 'N'
'2', 'abc', 'AR', '555', '3/3/2020 6:01:14 PM', '3/3/2020 6:01:14 PM', 'N'
'3', 'abc', 'CA', '565', '3/3/2020 6:01:15 PM', '3/4/2020 4:28:59 PM', 'N'
'4', 'abc', 'CA', '777', '3/4/2020 4:29:00 PM', '12/31/2099', 'Y'
'5', 'bbc', 'MN', '123', '3/4/2020 4:31:41 PM', '3/19/2020 2:47:07 PM', 'N'
'6', 'bbc', 'MN', '666', '3/18/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
'7', 'bbc', 'MN', '777', '3/18/2020 2:50:37 PM', '12/31/2099', 'Y'
'8', 'ccb', 'MN', '123', '3/20/2020 2:56:24 PM', '12/31/2099', 'Y'
Rules for populating data into table B:
The primary key on the output table is product_id plus the valid_from field.
The incoming data from table A will always have modified_dt timestamps greater than those in the existing table.
In order to insert data, we have to compare the latest_flag = 'Y' record from target table B with the incoming data from table A; only when there is a change
in the attributes state and zip_cd does a record need to be inserted into table B from table A. The valid_to column is a calculated field that is always 1 second earlier than the
next row's valid_from date, and for the latest row it defaults to '12/31/2099'. Similarly, latest_flag is a calculated column that indicates the current row of a given product_id.
If the incoming dataset contains multiple rows without any changes compared to the previous row or to the existing data in table B (latest_flag = 'Y'), then
those should be ignored as well. For example, row 2 and row 9 from Table A are ignored because there are no changes in the attributes state and zip_cd compared to their previous rows for that product.
Based on the above rules, I need to merge the table A data into table B so that the final output looks like this:
Table B (SEQUENCE_KEY,PRODUCT_ID,STATE, ZIP_CD, Valid_From, Valid_To, LATEST_FLAG)
'1', 'abc', 'AR', '999', '3/3/2020 12:00:00 AM', '3/3/2020 6:01:13 PM', 'N'
'2', 'abc', 'AR', '555', '3/3/2020 6:01:14 PM', '3/3/2020 6:01:14 PM', 'N'
'3', 'abc', 'CA', '565', '3/3/2020 6:01:15 PM', '3/4/2020 4:28:59 PM', 'N'
'4', 'abc', 'CA', '777', '3/4/2020 4:29:00 PM', '3/5/2020 12:00:00 AM', 'N'
'5', 'abc', 'MN', '123', '3/5/2020 12:01:00 AM', '3/5/2020 7:01:14 PM', 'N'
'6', 'abc', 'IL', '223', '3/5/2020 7:01:15 PM', '3/5/2020 6:01:15 PM', 'N'
'7', 'abc', 'OH', '333', '3/5/2020 6:01:16 PM', '3/5/2020 4:28:59 PM', 'N'
'8', 'abc', 'NY', '722', '3/5/2020 4:29:00 PM', '3/5/2020 4:31:40 PM', 'N'
'9', 'abc', 'KS', '444', '3/5/2020 4:31:41 PM', '12/31/2099', 'Y'
'10', 'bbc', 'MN', '123', '3/4/2020 4:31:41 PM', '3/19/2020 2:47:07 PM', 'N'
'11', 'bbc', 'MN', '666', '3/18/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
'12', 'bbc', 'MN', '777', '3/18/2020 2:50:37 PM', '3/19/2020 2:47:07 PM', 'N'
'13', 'bbc', 'MN', '123', '3/19/2020 2:47:08 PM', '3/19/2020 2:50:36 PM', 'N'
'14', 'bbc', 'IL', '223', '3/19/2020 2:50:37 PM', '12/31/2099', 'Y'
'15', 'ccb', 'MN', '123', '3/20/2020 2:56:24 PM', '12/31/2099', 'Y'
'16', 'dbd', 'KS', '444', '6/20/2020 12:00:00 AM', '12/31/2099', 'Y'
Looking for suggestions to solve this problem.
LIVE SQL link:
https://livesql.oracle.com/apex/livesql/s/kfbx7dwzr3zz28v6eigv0ars0
Thank you.
I tried to do this in plain SQL, but I could not manage it because of the logic and the sequence_key reset in your desired output.
So here is my suggestion in PL/SQL:
SQL> select * from table_a ;
PRODUCT_ID STATE ZIP_CD MODIFIED_
------------------------------ ------------------------------ ------------------------------ ---------
abc MN 123 05-MAR-20
abc MN 123 05-MAR-20
abc IL 223 05-MAR-20
abc OH 333 05-MAR-20
abc NY 722 05-MAR-20
abc KS 444 05-MAR-20
bbc MN 123 19-MAR-20
bbc IL 223 19-MAR-20
ccb MN 123 19-MAR-20
dbd KS 444 19-MAR-20
10 rows selected.
SQL> select * from table_b ;
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FRO VALID_TO L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
1 abc AR 999 05-MAR-20 05-MAR-20 N
2 abc AR 555 05-MAR-20 05-MAR-20 N
3 abc CA 565 05-MAR-20 05-MAR-20 N
4 abc CA 777 05-MAR-20 31-DEC-99 Y
5 bbc MN 123 05-MAR-20 05-MAR-20 N
6 bbc MN 666 05-MAR-20 05-MAR-20 N
7 bbc MN 777 19-MAR-20 31-DEC-99 Y
8 ccb MN 123 19-MAR-20 31-DEC-99 Y
8 rows selected.
Now, I used this piece of PL/SQL code:
declare
type typ_rec_set IS RECORD
(
PRODUCT_ID VARCHAR2(30 CHAR),
STATE VARCHAR2(30 CHAR),
ZIP_CD VARCHAR2(30 CHAR),
VALID_FROM DATE ,
VALID_TO DATE ,
LATEST_FLAG VARCHAR2(1 CHAR)
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab typ_rec_tab;
begin
SELECT product_id
,state
,zip_cd
,valid_from
,valid_to
,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
,a.state
,a.zip_cd
,a.modified_dt valid_from
,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt)) - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
,CASE
WHEN ( ( b.product_id IS NOT NULL
AND a.state != b.state
AND a.zip_cd != b.zip_cd)
OR b.product_id IS NULL
) THEN
1
ELSE
0
END insert_flag
FROM table_a a
LEFT OUTER JOIN table_b b
ON a.product_id = b.product_id
AND b.latest_flag = 'Y'
WHERE (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0 ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last
LOOP
-- begin block
begin
insert into table_b
(
sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG
)
values
(
( select max(sequence_key)+1 from table_b ),
l_hdr_tab(i).product_id ,
l_hdr_tab(i).state ,
l_hdr_tab(i).zip_cd ,
l_hdr_tab(i).valid_from ,
l_hdr_tab(i).valid_to ,
l_hdr_tab(i).latest_flag
);
end;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG ,
row_number() over ( order by product_id,valid_from ) as new_seq
from table_b ) s
on ( s.rowid = t.rowid )
when matched then
update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/
Then I ran it:
SQL> @proc.sql
PL/SQL procedure successfully completed.
SQL> select * from table_b order by sequence_key ;
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FRO VALID_TO L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
1 abc AR 999 05-MAR-20 05-MAR-20 N
2 abc NY 722 05-MAR-20 05-MAR-20 N
3 abc CA 777 05-MAR-20 31-DEC-99 Y
4 abc KS 444 05-MAR-20 05-MAR-20 N
5 abc MN 123 05-MAR-20 05-MAR-20 N
6 abc AR 555 05-MAR-20 05-MAR-20 N
7 abc CA 565 05-MAR-20 05-MAR-20 N
8 abc OH 333 05-MAR-20 05-MAR-20 N
9 abc IL 223 05-MAR-20 31-DEC-99 Y
10 bbc MN 666 05-MAR-20 05-MAR-20 N
11 bbc MN 123 05-MAR-20 05-MAR-20 N
SEQUENCE_KEY PRODUCT_ID STATE ZIP_CD VALID_FRO VALID_TO L
------------ ------------------------------ ------------------------------ ------------------------------ --------- --------- -
12 bbc MN 777 19-MAR-20 31-DEC-99 Y
13 bbc IL 223 19-MAR-20 31-DEC-99 Y
14 ccb MN 123 19-MAR-20 31-DEC-99 Y
15 dbd KS 444 19-MAR-20 31-DEC-99 Y
15 rows selected.
SQL>
Let me know if you have any doubts. I am sure I have missed something ;)
UPDATE
I realized that the loop contained a useless operation: recalculating the maximum value of SEQUENCE_KEY on every iteration. Here is a better version of the procedure:
declare
type typ_rec_set IS RECORD
(
PRODUCT_ID VARCHAR2(30 CHAR),
STATE VARCHAR2(30 CHAR),
ZIP_CD VARCHAR2(30 CHAR),
VALID_FROM DATE ,
VALID_TO DATE ,
LATEST_FLAG VARCHAR2(1 CHAR)
);
type typ_rec_tab is TABLE OF typ_rec_set;
l_hdr_tab typ_rec_tab;
r pls_integer := 1;
vseq pls_integer;
begin
-- calculate value sequence
select max(sequence_key) into vseq from table_b ;
SELECT product_id
,state
,zip_cd
,valid_from
,valid_to
,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
BULK COLLECT INTO l_hdr_tab
FROM
(
SELECT a.product_id
,a.state
,a.zip_cd
,a.modified_dt valid_from
,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt)) - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
,CASE
WHEN ( ( b.product_id IS NOT NULL
AND a.state != b.state
AND a.zip_cd != b.zip_cd)
OR b.product_id IS NULL
) THEN
1
ELSE
0
END insert_flag
FROM table_a a
LEFT OUTER JOIN table_b b
ON a.product_id = b.product_id
AND b.latest_flag = 'Y'
WHERE (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0 ;
--loop
FOR i IN l_hdr_tab.first .. l_hdr_tab.last
LOOP
-- begin block
vseq := vseq + r ;
begin
insert into table_b
(
sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG
)
values
(
vseq ,
l_hdr_tab(i).product_id ,
l_hdr_tab(i).state ,
l_hdr_tab(i).zip_cd ,
l_hdr_tab(i).valid_from ,
l_hdr_tab(i).valid_to ,
l_hdr_tab(i).latest_flag
);
end;
r := r + 1;
end loop;-- reset sequence base of row_number over product_id valid_from
commit;
-- reset sequence
merge into table_b t
using ( select sequence_key ,
PRODUCT_ID ,
STATE ,
ZIP_CD ,
VALID_FROM ,
VALID_TO ,
LATEST_FLAG ,
row_number() over ( order by product_id,valid_from ) as new_seq
from table_b ) s
on ( s.rowid = t.rowid )
when matched then
update set t.sequence_key = s.new_seq where t.sequence_key != s.new_seq ;
commit;
exception when others then raise;
end;
/
Here is my first attempt, based on my understanding. The cursor used as the source for inserting into TableB would look like this:
SELECT product_id
,state
,zip_cd
,valid_from
,valid_to
,CASE WHEN valid_to = DATE '2099-12-31' THEN 'Y' ELSE 'N' END latest_flag
FROM
(
SELECT a.product_id
,a.state
,a.zip_cd
,a.modified_dt valid_from
,NVL(((LEAD (a.modified_dt,1) OVER (PARTITION BY a.product_id ORDER BY a.modified_dt)) - INTERVAL '1' SECOND),DATE '2099-12-31' )valid_to
,CASE
WHEN ( ( b.product_id IS NOT NULL
AND a.state != b.state
AND a.zip_cd != b.zip_cd)
OR b.product_id IS NULL
) THEN
1
ELSE
0
END insert_flag
FROM table_a a
LEFT OUTER JOIN table_b b
ON a.product_id = b.product_id
AND b.latest_flag = 'Y'
WHERE (a.modified_dt >= b.valid_from OR b.product_id IS NULL)
ORDER BY a.product_id,a.modified_dt
)
WHERE insert_flag != 0;
The LEFT OUTER JOIN checks whether the record exists in TableB, and the WHERE clause checks that modified_dt is not earlier than the valid_from of the row with latest_flag = 'Y'.
The inner CASE expression tells us whether the attributes have changed; if the product_id is not present in TableB, the row is treated as a first entry and insert_flag is set to 1.
The outer CASE expression flags the last record by modified date, that is, the one whose valid_to has been defaulted to 31-12-2099.
I am not completely clear about point 3, but I believe the CASE expression is what we need for it.
Finally, I did not consider performance here. You could convert this to a PL/SQL block and use collection methods to process the data in chunks.
I also have one question: what happens to the record with product id "dbd" (a new entry that does not exist in TableB) if it is present multiple times in TableA?
This is a Slowly Changing Dimension (SCD) Type 2 problem in data warehousing (the Kimball approach). You can find a short definition here:
https://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/slowlychangingdimensions.htm
Support for SCD Type 2 is available only in the Enterprise ETL option of OWB 10gR2, as described in the above link. If that is not available and you have to use PL/SQL, you can check out the following approach. Unfortunately, unlike MS SQL, Oracle PL/SQL does not offer a straightforward solution.
Implementing Type 2 SCD in Oracle
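To make the Type 2 rules from the question concrete, here is a small Python sketch of the row-selection logic only. It is an illustration under stated assumptions, not an Oracle implementation: the function name `scd2_new_rows` and the in-memory inputs are hypothetical, a row is kept when state or zip_cd differs from the previous version, and closing out the existing latest row in the target table is not covered.

```python
from datetime import datetime, timedelta
from itertools import groupby

HIGH_DATE = datetime(2099, 12, 31)  # open-ended valid_to used in the question


def scd2_new_rows(current, incoming):
    """Pick the incoming rows that start a new version of a product.

    current:  {product_id: (state, zip_cd)} taken from rows with latest_flag = 'Y'
    incoming: list of (product_id, state, zip_cd, modified_dt)
    Returns (product_id, state, zip_cd, valid_from, valid_to, latest_flag) tuples.
    """
    out = []
    ordered = sorted(incoming, key=lambda r: (r[0], r[3]))
    for product_id, group in groupby(ordered, key=lambda r: r[0]):
        prev = current.get(product_id)  # None for a brand-new product
        kept = []
        for _, state, zip_cd, modified_dt in group:
            if (state, zip_cd) != prev:  # attributes changed: keep this row
                kept.append((state, zip_cd, modified_dt))
                prev = (state, zip_cd)
        for i, (state, zip_cd, valid_from) in enumerate(kept):
            is_last = i == len(kept) - 1
            # valid_to is one second before the next version starts
            valid_to = HIGH_DATE if is_last else kept[i + 1][2] - timedelta(seconds=1)
            out.append((product_id, state, zip_cd, valid_from, valid_to,
                        "Y" if is_last else "N"))
    return out


current = {"bbc": ("MN", "777"), "ccb": ("MN", "123")}
incoming = [
    ("bbc", "MN", "123", datetime(2020, 3, 19, 14, 47, 8)),
    ("bbc", "IL", "223", datetime(2020, 3, 19, 14, 50, 37)),
    ("ccb", "MN", "123", datetime(2020, 3, 21, 14, 56, 24)),  # unchanged, skipped
    ("dbd", "KS", "444", datetime(2020, 6, 20)),
]

new_rows = scd2_new_rows(current, incoming)
for row in new_rows:
    print(row)
```

In SQL the same comparison against the previous row is usually written with LAG over the product partition, and valid_to with LEAD, as the earlier answers do.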
---Table A:
CREATE TABLE TABLE_A (
BadgeNum varchar(10), -- This is a person's unique number.
Gender varchar(2), -- Gender 'F' or 'M'
Date_Sent date, -- Date of questionnaire
Type_Status varchar(10), -- A person can be Single or Married (widened so 'Married' fits)
Living_State varchar(2), -- Person's living state
S_Type_Type int, -- Value can be 1 or 0
Recipient_num int, -- Key to create grouping to put each person in a category. Using a CASE expression for this: 'Billing_Transaction' or 'Online_Transaction'
[MONTH] varchar(2), -- type not given originally; assumed from the sample data
[YEAR] varchar(4) -- type not given originally; assumed from the sample data
);
-- Table B:
CREATE TABLE TABLE_B
(
BadgeNumber varchar(10), -- This is a person's unique number
RespondedYear int, -- Year a person responded (originally 'smalldate', which is not a SQL Server type; int is assumed)
RespondedMonth int, -- Month a person responded (same assumption)
Date_Process varchar(6), -- This is in YYMMWW (Year, Month, Week); type assumed
Value money, -- Cost of the purchase
State varchar(2) -- State where the person resides; type assumed
);
----Sample Data for TABLE_A---
INSERT INTO TABLE_A
VALUES ('11E2', 'F', '07/20/2020', 'Single', 'NV' , '1' , '00001', '07', '2020'),
('11E3', 'M', '06/30/2020', 'Married', 'AZ' , '1' , '00001', '06', '2020'),
('11E4', 'F', '05/22/2019', 'Single', 'TN', '1' , '00001', '05', '2019'),
('11E5', 'M', '05/30/2018', 'Married', 'NY' , '1' , '00001', '05', '2018' ),
('11E6', 'F', '03/25/2017', 'Single', 'CA' , '1' , '00001', '03', '2017'),
('11E7', 'M', '02/27/2017', 'Married', 'VT' , '1' , '00002', '02', '2017'),
('11E8', 'F', '03/01/2018', 'Single', 'AL', '1' , '00002', '03', '2018'),
('11E2', 'F', '07/20/2020', 'Single', 'NV' , '1' , '00001', '07', '2020'),
('11E3', 'M', '06/30/2020', 'Married', 'AZ' , '1' , '00001', '06', '2020'),
('11E4', 'F', '05/22/2019', 'Single', 'TN', '1' , '00001', '05', '2019'),
('11E5', 'M', '05/30/2018', 'Married', 'NY' , '1' , '00001', '05', '2018' ),
('11E6', 'F', '03/25/2017', 'Single', 'CA' , '1' , '00001', '03', '2017'),
('11E7', 'M', '02/27/2017', 'Married', 'VT' , '1' , '00002', '02', '2017'),
('11E8', 'F', '03/01/2018', 'Single', 'AL', '1' , '00002', '03', '2018');
---SampleDate for TABLE_B
INSERT INTO TABLE_B
VALUES ('11E3', '2020', '7', '200208', '200', 'AZ' ),
('11E2', '2018', '5', '180520', '300', 'NV' ),
('11E4', '2018', '3', '180311', '200', 'TN' ),
('11E5', '2020', '6', '200416', '800', 'NY' ),
('11E6', '2019', ' 5', '191250', '500', 'CA' ),
('11E7', '2018', '3', '180313', '100', 'VT' ),
('11E8', '2019', ' 5', '191251', '1000', 'AL' );
----Queries for Table A---
SELECT
Month,
Year,
BadgeNum,
Gender,
Date_Sent,
Type_Status,
Living_State,
S_Type_Type
,CASE WHEN Recipient_num IN ('00001') THEN 'Billing_Transaction'
WHEN Recipient_num IN ('00002') THEN 'Online_Transaction'
END as [Category]
,COUNT(DISTINCT case when [S_Type_Type] = '1' THEN 'BadgeNum' ELSE NULL END) as [Sent_Questions]
,COUNT([BadgeNum]) as [NumberOfBadges]
FROM [TABLE_A]
GROUP bY
Month,
Year,
BadgeNum,
Gender,
Date_Sent,
Type_Status,
Living_State,
S_Type_Type,
Recipient_num -- needed because the CASE expression above uses it
--- Queries for Table B----
Table B-
=========
SELECT
COUNT([BadgeNumber]) as [Total_Number_Answered] ,
RespondedYear,
RespondedMonth,
Date_Process,
Value,
State
FROM TABLE_B
GROUP BY
RespondedYear,
RespondedMonth,
Date_Process,
Value,
State
---- Output From Two Table:
Result of the TABLE_A
===================
YEAR MONTH Sent_Questions
===== ======= =====================
2017 03 2
2017 02 0
2018 03 0
2018 05 2
2019 05 2
2020 07 2
2020 06 2
Result of the TABLE_B
========================
YEAR MONTH Total_Number_Answered
===== ======= =====================
2017 03 0
2017 02 0
2018 03 1
2018 05 1
2019 05 2
2020 07 1
2020 06 1
---This is the result I need -----
Expected Result:
================
YEAR   MONTH   Total_Number_Answered   Sent_Questions
=====  ======  =====================   ==============
2017   03      0                       2
2017   02      0                       0
2018   03      1                       0
2018   05      1                       2
2019   05      2                       2
2020   07      1                       2
2020   06      1                       2
This is where I am stuck: I want to get the same numbers as in the Expected Result table. I would like to join Table A and Table B on Year and Month without losing any rows from either table, and I am having trouble with it.
Here is the query that I have started to work with:
SELECT
MOnth,
Year,
BadgeNum,
Gender,
Date_Sent,
Type_Status,
Living_State,
S_Type_Type,
,CASE WHEN Recipient_num IN ('00001') THEN 'Billing_Transaction'
WHEN Recipient_num IN ('00002') THEN 'Online_Transaction'
END as [Category]
,COUNT(DISTINCT case when [S_Type_Type] = '1' THEN 'BadgeNum' ELSE NULL END) as [Sent_Questions]
,COUNT([BadgeNum]) as [NumberOfBadges]
t2.[Counts_Display]
FROM [TABLE_A] as t1
FULL OUTER JOIN
(
SELECT
COUNT ([BadgeNumber]) as [Counts_Display]
,LEFT([RespondedYear],4)+LEFT([RespondedMonth],2) as [CombinedDates]
,VALUE
,State
) as t2
ON (t2.[BadgeNumber] = t1.[BadgeNum])
and t2.[RespondedYear] = t1.[Year]
and t1.[RespondedMonth] = t2.[Month]
GROUP bY
MOnth,
Year,
BadgeNum,
Gender,
Date_Sent,
Type_Status,
Living_State,
S_Type_Type,
t2.[Counts_Display]
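One possible way to approach this, offered only as a sketch: aggregate each table down to one row per year and month first, then FULL OUTER JOIN the two small results so that a month present in only one table still shows up. The column names are taken from the queries above; the CTE names a and b, the aliases yr and mon, and the casts to int (used so that '03' and '3' compare equal) are assumptions, and the sketch has not been checked against the Expected Result table.

```sql
-- Sketch (T-SQL): one row per year/month from each table, then a full outer join.
WITH a AS (
    SELECT CAST([Year]  AS int) AS yr,
           CAST([Month] AS int) AS mon,
           COUNT(DISTINCT CASE WHEN [S_Type_Type] = '1' THEN [BadgeNum] END) AS Sent_Questions
    FROM TABLE_A
    GROUP BY CAST([Year] AS int), CAST([Month] AS int)
),
b AS (
    SELECT CAST(RespondedYear  AS int) AS yr,
           CAST(RespondedMonth AS int) AS mon,
           COUNT(BadgeNumber) AS Total_Number_Answered
    FROM TABLE_B
    GROUP BY CAST(RespondedYear AS int), CAST(RespondedMonth AS int)
)
SELECT COALESCE(a.yr,  b.yr)  AS [Year],
       COALESCE(a.mon, b.mon) AS [Month],
       COALESCE(b.Total_Number_Answered, 0) AS Total_Number_Answered,
       COALESCE(a.Sent_Questions, 0)        AS Sent_Questions
FROM a
FULL OUTER JOIN b
    ON a.yr = b.yr AND a.mon = b.mon   -- join on the period only, not on the badge
ORDER BY 1, 2;
```

Joining after aggregation keeps the detail columns (Gender, Date_Sent and so on) out of the GROUP BY, which is what otherwise splits each month into many rows.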
I need to be able to create a Trailing Twelve Month report using SQL (PostgreSQL) - essentially a window/rolling 12 month sum that sums up the current month's totals + the previous 11 months for each month.
I have this table:
CREATE TABLE order_test(
order_id text,
sale_date date,
delivery_date date,
customer_id text,
vendor_id text,
order_total float);
with these values:
insert into order_test
values ('1', '2016-06-01', '2016-06-10', '2', '3', 200.10),
('2', '2016-06-02', '2016-06-11', '2', '4', 150.50),
('3', '2016-07-02', '2016-07-11', '5', '4', 100.50),
('4', '2016-07-02', '2016-07-11', '1', '4', 150.50),
('5', '2016-07-02', '2016-07-11', '1', '4', 150.50),
('6', '2016-08-02', '2016-08-11', '6', '4', 300.50),
('7', '2016-08-02', '2016-08-11', '6', '4', 150.50),
('8', '2016-09-02', '2016-09-11', '1', '4', 150.50),
('9', '2016-10-02', '2016-10-11', '1', '4', 150.50),
('10', '2016-11-02', '2016-11-11', '1', '4', 150.50),
('11', '2016-12-02', '2016-12-11', '6', '4', 150.50),
('12', '2017-01-02', '2017-01-11', '7', '4', 150.50),
('13', '2017-01-02', '2017-01-11', '1', '4', 150.50),
('14', '2017-01-02', '2017-01-11', '1', '4', 100.50),
('15', '2017-02-02', '2017-02-11', '1', '4', 150.50),
('16', '2017-02-02', '2017-02-11', '1', '4', 150.50),
('17', '2017-03-02', '2017-03-11', '2', '4', 150.50),
('18', '2017-03-02', '2017-03-11', '2', '4', 150.50),
('19', '2017-04-02', '2017-04-11', '6', '4', 120.50),
('20', '2017-05-02', '2017-05-11', '1', '4', 150.50),
('21', '2017-06-02', '2017-06-11', '2', '4', 150.50),
('22', '2017-06-02', '2017-06-11', '1', '4', 130.50),
('23', '2017-07-02', '2017-07-11', '1', '4', 150.50),
('24', '2017-07-02', '2017-07-11', '5', '4', 200.50),
('25', '2017-08-02', '2017-08-11', '1', '4', 150.50),
('26', '2017-09-02', '2017-09-11', '2', '4', 100.50),
('27', '2017-09-02', '2017-10-11', '1', '4', 150.50);
These are individual sales. For each month, I need the previous 11 months + that month's total (sale month).
I've tried a window calculation like this:
select date_trunc('month', sale_date) as sale_month,
sum(order_total) over w as total_sales
from order_test
where (delivery_date < current_date) and
(sale_date >= (date_trunc('month', current_date) - interval '1 year'))
window w as (Partition by date_trunc('month', sale_date)
order by sale_date
rows between current row and 11 following)
but it's giving me this:
sale_month total_sales
1 01.09.2016 00:00:00 150,5
2 01.10.2016 00:00:00 150,5
3 01.11.2016 00:00:00 150,5
4 01.12.2016 00:00:00 150,5
5 01.01.2017 00:00:00 401,5
6 01.01.2017 00:00:00 251
7 01.01.2017 00:00:00 100,5
8 01.02.2017 00:00:00 301
9 01.02.2017 00:00:00 150,5
10 01.03.2017 00:00:00 301
11 01.03.2017 00:00:00 150,5
12 01.04.2017 00:00:00 120,5
13 01.05.2017 00:00:00 150,5
14 01.06.2017 00:00:00 281
15 01.06.2017 00:00:00 130,5
16 01.07.2017 00:00:00 351
17 01.07.2017 00:00:00 200,5
18 01.08.2017 00:00:00 150,5
19 01.09.2017 00:00:00 100,5
where there should only be one row per month.
In the inner query (a derived table), truncate the sale_date column to month precision with date_trunc and group by the result to get the total sales per month. Then, in the outer query, apply a windowed sum over those monthly totals, ordered by sale_month and framed to the current row plus the 11 preceding rows, to get the desired result as below.
SELECT sale_Month
,month_total
,sum(month_total) OVER (
ORDER BY sale_Month ASC rows BETWEEN 11 preceding
AND CURRENT row
) AS Sum_Series
FROM (
SELECT date_trunc('month', Sale_Date) AS Sale_Month
,sum(Order_Total) AS Month_Total
FROM order_test
GROUP BY 1
ORDER BY 1
) t
Note that BETWEEN ... AND CURRENT ROW is optional: when only the frame start is given, the frame end defaults to the current row, so ROWS 11 PRECEDING means the same thing and the query can be rewritten as below.
SELECT sale_Month
,month_total
,sum(month_total) OVER (
ORDER BY sale_Month ASC rows 11 preceding
) AS Sum_Series
FROM (
SELECT date_trunc('month', Sale_Date) AS Sale_Month
,sum(Order_Total) AS Month_Total
FROM order_test
GROUP BY 1
ORDER BY 1
) t
Result:
sale_month             month_total  sum_series
----------------------------------------------
2016-06-01T00:00:00Z   350.6        350.6
2016-07-01T00:00:00Z   401.5        752.1
2016-08-01T00:00:00Z   451          1203.1
2016-09-01T00:00:00Z   150.5        1353.6
2016-10-01T00:00:00Z   150.5        1504.1
2016-11-01T00:00:00Z   150.5        1654.6
2016-12-01T00:00:00Z   150.5        1805.1
2017-01-01T00:00:00Z   401.5        2206.6
2017-02-01T00:00:00Z   301          2507.6
2017-03-01T00:00:00Z   301          2808.6
2017-04-01T00:00:00Z   120.5        2929.1
2017-05-01T00:00:00Z   150.5        3079.6
2017-06-01T00:00:00Z   281          3010
2017-07-01T00:00:00Z   351          2959.5
2017-08-01T00:00:00Z   150.5        2659
2017-09-01T00:00:00Z   251          2759.5
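One caveat worth sketching: a ROWS frame counts rows, not calendar months, so if some month had no sales at all, the 11 preceding rows would reach back further than 11 months. The sample data has at least one sale in every month, so it makes no difference here. A possible guard, assuming PostgreSQL and using the hypothetical names months, monthly and ttm_total, is to build the list of months with generate_series and LEFT JOIN the orders onto it:

```sql
-- Sketch: make sure every calendar month has a row before applying the ROWS frame.
WITH months AS (
    SELECT gs AS sale_month
    FROM generate_series(
             (SELECT date_trunc('month', min(sale_date)) FROM order_test),
             (SELECT date_trunc('month', max(sale_date)) FROM order_test),
             interval '1 month') AS gs
),
monthly AS (
    SELECT m.sale_month,
           coalesce(sum(o.order_total), 0) AS month_total  -- 0 for months without sales
    FROM months m
    LEFT JOIN order_test o
           ON date_trunc('month', o.sale_date) = m.sale_month
    GROUP BY m.sale_month
)
SELECT sale_month,
       month_total,
       sum(month_total) OVER (ORDER BY sale_month
                              ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS ttm_total
FROM monthly
ORDER BY sale_month;
```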
If I understand correctly, you want every month to show the total of that month plus the previous 11 months. The first 11 months in the data won't have 11 preceding rows to sum over, but since you said every month should have a total, those rows will simply sum whatever months are available.
So I believe you are looking for something like this.
with x as (
select date_trunc('month', sale_date) as sale_month,sum(order_total) as monthly_order_total from order_test
group by 1 order by 1 asc)
select sale_month, monthly_order_total,
sum(monthly_order_total ) over (order by sale_month asc rows between 11 preceding and current row)
from x
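If partial windows should instead be left empty, one option (a sketch only, with the hypothetical names monthly and ttm_total) is to count the rows in the same frame and return the sum only once it covers 12 rows. This assumes every month in the range has at least one sale, as in the sample data.

```sql
-- Sketch: show the trailing total only when the frame holds a full 12 months.
WITH monthly AS (
    SELECT date_trunc('month', sale_date) AS sale_month,
           sum(order_total) AS month_total
    FROM order_test
    GROUP BY 1
)
SELECT sale_month,
       month_total,
       CASE WHEN count(*) OVER w = 12
            THEN sum(month_total) OVER w
       END AS ttm_total            -- NULL while fewer than 12 rows are in the frame
FROM monthly
WINDOW w AS (ORDER BY sale_month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW)
ORDER BY sale_month;
```

With the sample rows, 2017-05 would be the first month whose frame reaches back a full 12 months (to 2016-06).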
I have a table:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logtime TIME
, timetype VARCHAR(1)
);
INSERT INTO timeclock VALUES
(1, '2013-01-01', '07:00', 'I'),
(1, '2013-01-01', '07:01', 'I'),
(1, '2013-01-01', '16:00', 'O'),
(1, '2013-01-01', '16:01', 'O'),
(2, '2013-01-01', '07:00', 'I'),
(2, '2013-01-01', '16:00', 'O'),
(1, '2013-01-02', '07:00', 'I'),
(1, '2013-01-02', '16:30', 'O'),
(2, '2013-01-02', '06:30', 'I'),
(2, '2013-01-02', '15:30', 'O'),
(2, '2013-01-02', '16:30', 'I'),
(2, '2013-01-02', '23:30', 'O'),
(3, '2013-01-01', '06:30', 'I'),
(3, '2013-01-02', '16:30', 'O'),
(4, '2013-01-01', '20:30', 'I'),
(4, '2013-01-02', '05:30', 'O'),
(5, '2013-01-01', '20:30', 'O'),
(5, '2013-01-02', '05:30', 'I');
I need to get the time IN and OUT of each employee, disregarding duplicate entries
and identifying orphan entries (those without a matching IN or OUT), so that I can put them in a separate list for notification of missing entries.
So far I have this SQL, which I adapted from Peter Larsson's islands-and-gaps solution:
WITH cteIslands ( employeeid, timetype, logdate, logtime, grp)
AS ( SELECT employeeid, timetype, logdate, logtime,
ROW_NUMBER()
OVER ( ORDER BY employeeid, logdate, logtime )
- ROW_NUMBER()
OVER ( ORDER BY timetype, employeeid,
logdate, logtime ) AS grp
FROM timeclock
),
cteGrouped ( employeeid, timetype, logdate, logtime )
AS ( SELECT employeeid, MIN(timetype), logdate,
CASE WHEN MIN(timetype) = 'I'
THEN MIN(logtime)
ELSE MAX(logtime)
END AS logtime
FROM cteIslands
GROUP BY employeeid, logdate, grp
)
select * from cteGrouped
order by employeeid, logdate, logtime
The above works fine for removing the duplicate entries, but I can't seem to get the orphan entries. I think LEAD or LAG could be used for this, but I am new to PostgreSQL. I hope someone here can help me with this.
Edit:
I need to add a new field that tells me which records are orphaned,
something like the table below:
EMPID  TYPE  LOGDATE     LOGTIME   ORPHAN_FLAG
1      I     2013-01-01  07:00:00  0
1      O     2013-01-01  16:01:00  0
1      I     2013-01-02  07:00:00  0
1      O     2013-01-02  16:30:00  0
2      I     2013-01-01  07:00:00  0
2      O     2013-01-01  16:00:00  0
2      I     2013-01-02  06:30:00  0
2      O     2013-01-02  15:30:00  0
2      I     2013-01-02  16:30:00  0
2      O     2013-01-02  23:30:00  0
3      I     2013-01-01  06:30:00  0
3      O     2013-01-02  16:30:00  0
4      I     2013-01-01  20:30:00  0
4      O     2013-01-02  05:30:00  0
5      O     2013-01-01  20:30:00  1   <--- NO MATCHING IN
5      I     2013-01-02  05:30:00  1   <--- NO MATCHING OUT
First, I think you should rethink your design a little bit. It makes little sense to record a clock-out entry without having clocked in, and you can use things like partial indexes to ensure that clocked in entries are easy to look up when they don't have a clock out entry.
So I would start by considering moving your storage tables to something like:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logintime TIME
, logouttime time
, timetype VARCHAR(1)
);
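As a small illustration of the partial-index idea mentioned above, and assuming the redesigned table with a nullable logouttime column, an index restricted to rows that have no clock-out yet might look like the sketch below. The index name is hypothetical.

```sql
-- Sketch: index only the rows that are still "clocked in" (no clock-out recorded).
CREATE INDEX timeclock_open_entries_idx
    ON timeclock (employeeid, logdate)
    WHERE logouttime IS NULL;
```

Because the index contains only open entries, it stays small, and a query such as "which employees have not clocked out" can be answered from it as long as the query repeats the logouttime IS NULL condition.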
The bad news is that if you can't do that, your orphaned-entries report will be quite difficult to make perform well, because you are doing a self join in which you hope every row in a large table has a corresponding partner row. This requires, at best, two sequential scans of the table and, at worst, a sequential scan with a nested-loop index scan (assuming proper indexes; the alternative, a nested-loop sequential scan, would be even worse).
Handling this where you have rollover between dates (clock in at 11pm, clock out at 2am) will make this problem very hard to avoid.
Now, since your CTEs work fine except for orphaned records, my recommendation is to UNION your current query with another query on the same table that finds the rows your current query does not pair up properly.
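For comparison, here is a rough sketch of the LAG/LEAD route that the question asks about. It is a different technique from the union described above, not an implementation of it. It assumes the original single-event timeclock table, treats adjacent rows with the same employee, date and type as duplicates (keeping the first I and the last O), and then flags an I whose next event is not an O, or an O whose previous event is not an I. The CTE names ordered and deduped are hypothetical.

```sql
-- Sketch: de-duplicate with LAG/LEAD, then flag events that have no partner.
WITH ordered AS (
    SELECT employeeid, timetype, logdate, logtime,
           LAG(timetype)  OVER w AS prev_type,
           LAG(logdate)   OVER w AS prev_date,
           LEAD(timetype) OVER w AS next_type,
           LEAD(logdate)  OVER w AS next_date
    FROM timeclock
    WINDOW w AS (PARTITION BY employeeid ORDER BY logdate, logtime)
),
deduped AS (
    SELECT employeeid, timetype, logdate, logtime
    FROM ordered
    WHERE (timetype = 'I'     -- keep the first I of a same-day run of I's
           AND (prev_type = 'I' AND prev_date = logdate) IS NOT TRUE)
       OR (timetype = 'O'     -- keep the last O of a same-day run of O's
           AND (next_type = 'O' AND next_date = logdate) IS NOT TRUE)
)
SELECT employeeid, timetype, logdate, logtime,
       CASE
           WHEN timetype = 'I' AND LEAD(timetype) OVER w = 'O' THEN 0
           WHEN timetype = 'O' AND LAG(timetype)  OVER w = 'I' THEN 0
           ELSE 1                                  -- no matching partner
       END AS orphan_flag
FROM deduped
WINDOW w AS (PARTITION BY employeeid ORDER BY logdate, logtime)
ORDER BY employeeid, logdate, logtime;
```

Ordering by logdate and logtime within each employee lets an IN on one day pair with an OUT on the next, which covers overnight shifts such as employee 4's in the sample data.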