Get rows for the last 10 dates - sql

I have a scenario in a Postgres 9.3 database where I have to get the last 10 dates when books were sold. Consider below example:
Store Book
---------- ----------------------
Id Name Id Name Sid Count Date
1 ABC 1 XYZ 1 20 11/11/2015
2 DEF 2 JHG 1 10 11/11/2015
3 UYH 1 10 15/11/2015
4 TRE 1 50 17/11/2015
There is currently no UNIQUE constraint on (name, sid, date) in table book, but we have a service in place that inserts only one count per day.
I have to get results based on store.id. When I pass the ID, the report should be generated with bookname, sold date, and the count of sold copies.
Desired output:
BookName 11/11/2015 15/11/2015 17/11/2015
XYZ 20 -- --
JHG 10 -- --
UYH -- 10 --
TRE -- -- 50

This looks unsuspicious, but it's a hell of a question.
Assumptions
Your counts are integer.
All columns in table book are defined NOT NULL.
The composite (name, sid, date) is unique in table book. You should have a UNIQUE constraint, preferably (for performance) with columns in this order:
UNIQUE(sid, date, name)
This provides the index needed for performance automatically. (Else create one.) See:
Multicolumn index and performance
Is a composite index also good for queries on the first field?
crosstab() queries
To get top performance and short query strings (especially if you run this query often) I suggest the additional module tablefunc providing various crosstab() functions. Basic instructions:
PostgreSQL Crosstab Query
Basic queries
You need to get these right first.
The last 10 days:
SELECT DISTINCT date
FROM book
WHERE sid = 1
ORDER BY date DESC
LIMIT 10;
Numbers for last 10 days using the window function dense_rank():
SELECT *
FROM (
SELECT name
, dense_rank() OVER (ORDER BY date DESC) AS date_rnk
, count
FROM book
WHERE sid = 1
) sub
WHERE date_rnk < 11
ORDER BY name, date_rnk DESC;
(Not including actual dates in this query.)
Column names for output columns (for full solution):
SELECT 'bookname, "' || string_agg(to_char(date, 'DD/MM/YYYY'), '", "' ORDER BY date) || '"'
FROM (
SELECT DISTINCT date
FROM book
WHERE sid = 1
ORDER BY date DESC
LIMIT 10
) sub;
Simple result with static column names
This may be good enough for you - but we don't see actual dates in the result:
SELECT * FROM crosstab(
'SELECT *
FROM (
SELECT name
, dense_rank() OVER (ORDER BY date DESC) AS date_rnk
, count
FROM book
WHERE sid = 1
) sub
WHERE date_rnk < 11
ORDER BY name, date_rnk DESC'
, 'SELECT generate_series(10, 1, -1)'
) AS (bookname text
, date1 int, date2 int, date3 int, date4 int, date5 int
, date6 int, date7 int, date8 int, date9 int, date10 int);
For repeated use I suggest you create this (very fast) generic C function for 10 integer columns once, to simplify things a bit:
CREATE OR REPLACE FUNCTION crosstab_int10(text, text)
RETURNS TABLE (bookname text
, date1 int, date2 int, date3 int, date4 int, date5 int
, date6 int, date7 int, date8 int, date9 int, date10 int)
LANGUAGE C STABLE STRICT AS
'$libdir/tablefunc','crosstab_hash';
Details in this related answer:
Dynamically generate columns for crosstab in PostgreSQL
Then your call becomes:
SELECT * FROM crosstab(
'SELECT *
FROM (
SELECT name
, dense_rank() OVER (ORDER BY date DESC) AS date_rnk
, count
FROM book
WHERE sid = 1
) sub
WHERE date_rnk < 11
ORDER BY name, date_rnk DESC'
, 'SELECT generate_series(10, 1, -1)'
); -- no column definition list required!
Full solution with dynamic column names
Your actual question is more complicated, you also want dynamic column names.
For a given table, the resulting query could look like this then:
SELECT * FROM crosstab_int10(
'SELECT *
FROM (
SELECT name
, dense_rank() OVER (ORDER BY date DESC) AS date_rnk
, count
FROM book
WHERE sid = 1
) sub
WHERE date_rnk < 11
ORDER BY name, date_rnk DESC'
, 'SELECT generate_series(10, 1, -1)'
) AS t(bookname
, "04/11/2015", "05/11/2015", "06/11/2015", "07/11/2015", "08/11/2015"
, "09/11/2015", "10/11/2015", "11/11/2015", "15/11/2015", "17/11/2015");
The difficulty is to distill dynamic column names. Either assemble the query string by hand, or (much rather) let this function do it for you:
CREATE OR REPLACE FUNCTION f_generate_date10_sql(_sid int = 1)
RETURNS text
LANGUAGE sql AS
$func$
SELECT format(
$$SELECT * FROM crosstab_int10(
'SELECT *
FROM (
SELECT name
, dense_rank() OVER (ORDER BY date DESC) AS date_rnk
, count
FROM book
WHERE sid = %1$s
) sub
WHERE date_rnk < 11
ORDER BY name, date_rnk DESC'
, 'SELECT generate_series(10, 1, -1)'
) AS ct(bookname, "$$
|| string_agg(to_char(date, 'DD/MM/YYYY'), '", "' ORDER BY date) || '")'
, _sid)
FROM (
SELECT DISTINCT date
FROM book
WHERE sid = 1
ORDER BY date DESC
LIMIT 10
) sub
$func$;
Call:
SELECT f_generate_date10_sql(1);
This generates the desired query, which you execute in turn.
db<>fiddle here

Related

How to show only the latest record in SQL

I have this issue where I want to show only the latest record (Col 1). I deleted the date column thinking that it might not work if it has different values. but if that's the case, then the record itself has a different name (Col 1) because it has a different date in the name of it.
Is it possible to fetch one record in this case?
The code:
SELECT distinct p.ID,
max(at.Date) as date,
at.[RAPID3 Name] as COL1,
at.[DLQI Name] AS COL2,
at.[HAQ-DI Name] AS COL3,
phy.name as phyi,
at.State_ID
FROM dbo.[Assessment Tool] as at
Inner join dbo.patient as p on p.[ID] = at.[Owner (Patient)_Patient_ID]
Inner join dbo.[Physician] as phy on phy.ID = p.Physician_ID
where (at.State_ID in (162, 165,168) and p.ID = 5580)
group by
at.[RAPID3 Name],
at.[DLQI Name],
at.[HAQ-DI Name],
p.ID, phy.name,
at.State_ID
SS:
In this SS I want to show only the latest record (COL 1) of this ID "5580". Means the first row for this ID.
Thank you
The Most Accurate way to handle this.
Extract The Date.
Than use Top and Order.
create table #Temp(
ID int,
Col1 Varchar(50) null,
Col2 Varchar(50) null,
Col3 Varchar(50) null,
Phyi Varchar(50) null,
State_ID int)
Insert Into #Temp values(5580,'[9/29/2021]-[9.0]High Severity',null,null,'Eman Elshorpagy',168)
Insert Into #Temp values(5580,'[10/3/2021]-[9.3]High Severity',null,null,'Eman Elshorpagy',168)
select top 1 * from #Temp as t
order by cast((Select REPLACE((SELECT REPLACE((SELECT top 1 Value FROM STRING_SPLIT(t.Col1,'-')),'[','')),']','')) as date) desc
This is close to ANSI standard, and it also caters for the newest row per id.
The principle is to use ROW_NUMBER() using a descending order on the date/timestamp (using a DATE type instead of a DATETIME and avoiding the keyword DATE for a column name) in one query, then to select from that query using the result of row number for the filter.
-- your input, but 2 id-s to show how it works with many ..
indata(id,dt,col1,phyi,state_id) AS (
SELECT 5580,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5580,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-10-03','[10/3/2021] - [9,3] High Severity','Eman Elshorpagy',168
UNION ALL SELECT 5581,DATE '2021-09-29','[9/29/2021] - [9,0] High Severity','Eman Elshorpagy',168
)
-- real query starts here, replace following comman with "WITH" ...
,
with_rank AS (
SELECT
*
, ROW_NUMBER() OVER(PARTITION BY id ORDER BY dt DESC) AS rank_id
FROM indata
)
SELECT
id
, dt
, col1
, phyi
, state_id
FROM with_rank
WHERE rank_id=1
;
id | dt | col1 | phyi | state_id
------+------------+-----------------------------------+-----------------+----------
5580 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168
5581 | 2021-10-03 | [10/3/2021] - [9,3] High Severity | Eman Elshorpagy | 168

Update with Incrementing column

Example table:
Sample
------
id (primary key)
secondaryId (optional secondary key)
crtdate (created date)
... (other fields)
Some users use secondaryId for identifying rows (i.e., should be unique)
When the rows were created, secondaryId did not get a value, and was defaulted to 0.
Subsequently, rows were given a secondaryId value as they were used.
I need to update all rows with value 0 to be the next available number.
Desired result (w/ simplified values):
From: To:
id secondaryId id secondaryId
1 0 1 7 // this is the max(secondaryId)+1
2 0 2 8 // incrementing as we fill in the zeroes
3 5 3 5 // already assigned
4 0 4 9 // keep incrementing...
5 6 5 6 // already assigned
This query would accomplish what I want to do; but alas, CTE + UPDATE is not supported:
with temp(primary, rownumber) as (
values (
select
id,
row_number() over (partition by secondaryId order by crtdate)+6 --6 is max secondaryId
from Sample
where secondaryId='0'
)
update Sample set secondaryId=temp.rownumber where Sample.id=temp.id
Does anyone have suggestions for a different way to approach this problem? I'm now suffering from tunnel vision...
You can use MERGE statement as id is primary key and there won't be any duplicates.
MERGE INTO Sample as trgt
Using (
select id,
row_number() over (partition by secondaryId order by crtdate)+6 secondaryId
--6 is max secondaryId
from Sample where secondaryId='0'
) as src
ON( src.id= trgt.id)
WHEN MATCHED THEN UPDATE SET trgt.secondaryid = src.secondaryId
You can also UPDATE a SELECT (well "fullselect"), which is probably the neatest solution here
create table sample (
id int not null primary key
, secondaryId int not null
, crtdate date not null
)
;
INSERT INTO sample VALUES
( 1 , 0 , current_date)
,( 2 , 0 , current_date)
,( 3 , 5 , current_date)
,( 4 , 0 , current_date)
,( 5 , 6 , current_date)
;
UPDATE (
SELECT id, secondaryId
, ROW_NUMBER() OVER(ORDER BY secondaryId Desc)
+ (SELECT MAX(id) from sample) as new_secondaryId
FROM
sample s
WHERE secondaryId = 0
)
SET secondaryId = new_secondaryId
;
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0001022.html
For a Template method (not necessary to know max value), you can try this:
create table tmptable as (
select f1.id, row_number() over(order by f1.crtdate) + ifnull((select max(f2.secondaryId) from Sample f2), 0) newsecondary
from Sample f1 where f1.secondaryId='0'
) with data;
update Sample f1
set f1.secondaryId=(
select f2.newsecondary
from tmptable f2
where f2.id=f1.id
)
where exists
(
select * from tmptable f2
where f2.id=f1.id
);
drop table tmptable;

Joining a table on itself with aggregation

I am trying to (efficiently) fetch rows from the connections table, where the startdate is the latest within cpid - for selected cpid's.
Here's an example of the data in the connections table with rows I want marked with <<<
connid cpid startdate
1 20 7/17/16
2 20 8/23/16
3 20 9/12/16 <<<
4 30 6/17/16
5 30 8/23/16 <<<
6 40 2/24/16
7 40 3/17/16
8 40 5/18/16 <<<
etc...
This query returns the latest startdate and cpid, but I'm not sure how to join it with itself to get the result that I need:
select cpid, max(startdate)
from connections
where cpid in (
20,
30,
40
)
group by cpid
The result I'm looking for is as follows:
connid cpid startdate
3 20 9/12/16
5 30 8/23/16
8 40 5/18/16
Any help would be appreciated!
robm
Something like this:
WITH Numbered AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY cpid ORDER BY startdate DESC) AS Nr
,*
FROM connections
)
SELECT *
FROM Numbered
WHERE Nr=1;
The function ROW_NUMBER() will add a running number to the row. PARTITION BY allows you to re-start the running number for groups and ORDER BY allows you to define the order for the numbering. With DESC you will get the latest on top, hence Nr=1.
UPDATE: old-fashioned...
If you need this on other systems than SQL-Server you might go the old-fashioned way:
SET DATEFORMAT mdy;
DECLARE #tbl TABLE(connid INT, cpid INT, startdate DATE);
INSERT INTO #tbl VALUES
( 1,20,'7/17/16')
,(2,20,'8/23/16')
,(3,20,'9/12/16')
,(4,30,'6/17/16')
,(5,30,'8/23/16')
,(6,40,'2/24/16')
,(7,40,'3/17/16')
,(8,40,'5/18/16') ;
SELECT *
FROM #tbl AS tbl
WHERE tbl.startdate IN(SELECT MAX(x.startdate) FROM #tbl AS x WHERE x.cpid=tbl.cpid)
Here's an alternative solution using CROSS APPLY rather than a CTE, the execution plan is different, once you try it on your environment it could be more efficient.
DROP TABLE #connections
CREATE TABLE #connections(connid INT, cpid INT,startdate datetime)
INSERT INTO #connections(connid,cpid,startdate)
VALUES
(1,'20','20160717')
,(2,'20','20160823')
,(3,'20','20160912')
,(4,'30','20160617')
,(5,'30','20160823')
,(6,'40','20160224')
,(7,'40','20160317')
,(8,'40','20160518')
SELECT
c.connid,c.cpid,c.startdate
FROM
#connections c
CROSS APPLY (
SELECT
cpid
,MAX(startdate) startdate
FROM
#connections
GROUP BY
cpid
) a
WHERE
c.cpid = a.cpid
AND c.startdate = a.startdate

PostgreSQL 9.3: Pivot table query

I want to show the pivot table(crosstab) for the given below table.
Table: Employee
CREATE TABLE Employee
(
Employee_Number varchar(10),
Employee_Role varchar(50),
Group_Name varchar(10)
);
Insertion:
INSERT INTO Employee VALUES('EMP101','C# Developer','Group_1'),
('EMP102','ASP Developer','Group_1'),
('EMP103','SQL Developer','Group_2'),
('EMP104','PLSQL Developer','Group_2'),
('EMP101','Java Developer',''),
('EMP102','Web Developer','');
Now I want to show the pivot table for the above data as shown below:
Expected Result:
Employee_Number TotalRoles TotalGroups Available Others Group_1 Group_2
---------------------------------------------------------------------------------------------------
EMP101 2 2 1 1 1 0
EMP102 2 2 1 1 1 0
EMP103 1 2 1 0 0 1
EMP104 1 2 1 0 0 1
Explanation: I want to show the Employee_Number, the TotalRoles which each employee has,
the TotalGroups which are present to all employees, the Available shows the employee available
in how many groups, the Others have to show the employee is available in other's also for which
the group_name have not assigned and finally the Group_Names must be shown in the pivot format.
SELECT * FROM crosstab(
$$SELECT grp.*, e.group_name
, CASE WHEN e.employee_number IS NULL THEN 0 ELSE 1 END AS val
FROM (
SELECT employee_number
, count(employee_role)::int AS total_roles
, (SELECT count(DISTINCT group_name)::int
FROM employee
WHERE group_name <> '') AS total_groups
, count(group_name <> '' OR NULL)::int AS available
, count(group_name = '' OR NULL)::int AS others
FROM employee
GROUP BY 1
) grp
LEFT JOIN employee e ON e.employee_number = grp.employee_number
AND e.group_name <> ''
ORDER BY grp.employee_number, e.group_name$$
,$$VALUES ('Group_1'::text), ('Group_2')$$
) AS ct (employee_number text
, total_roles int
, total_groups int
, available int
, others int
, "Group_1" int
, "Group_2" int);
SQL Fiddle demonstrating the base query, but not the crosstab step, which is not installed on sqlfiddle.com
Basics for crosstab:
PostgreSQL Crosstab Query
Special in this crosstab: all the "extra" columns. Those columns are placed in the middle, after the "row name" but before "category" and "value":
Pivot on Multiple Columns using Tablefunc
Once again, if you have a dynamic set of groups, you need to build this statement dynamically and execute it in a second call:
Selecting multiple max() values using a single SQL statement
You can use the crosstab function for this.
First of all you need to add the tablefunc extension if you haven't already:
CREATE EXTENSION tablefunc;
The crosstab functions require you to pass it a query returning the data you need to pivot, then a list of the columns in the output. (In other ways "tell me the input and the output format you want"). The sort order is important!
In your case, the input query is quite complicated - I think you need to do a load of separate queries, then UNION ALL them to get the desired data. I'm not entirely sure how you calculate the values "TotalGroups" and "Available", but you can modify the below in the relevant place to get what you need.
SELECT * FROM crosstab(
'SELECT employee_number, attribute, value::integer AS value FROM (with allemployees AS (SELECT distinct employee_number FROM employee) -- use a CTE to get distinct employees
SELECT employee_number,''attr_0'' AS attribute,COUNT(distinct employee_role) AS value FROM employee GROUP BY employee_number -- Roles by employee
UNION ALL
SELECT employee_number,''attr_1'' AS attribute,value from allemployees, (select count (distinct group_name) as value from employee where group_name <> '''') a
UNION ALL
SELECT employee_number,''attr_2'' AS attribute, COUNT(distinct group_name) AS value FROM employee where group_name <> '''' GROUP BY employee_number -- Available, do not know how this is calculate
UNION ALL
SELECT a.employee_number, ''attr_3'' AS attribute,coalesce(value,0) AS value FROM allemployees a LEFT JOIN -- other groups. Use a LEFT JOIN to avoid nulls in the output
(SELECT employee_number,COUNT(*) AS value FROM employee WHERE group_name ='''' GROUP BY employee_number) b on a.employee_number = b.employee_number
UNION ALL
SELECT a.employee_number, ''attr_4'' AS attribute,coalesce(value,0) AS value FROM allemployees a LEFT JOIN -- group 1
(SELECT employee_number,COUNT(*) AS value FROM employee WHERE group_name =''Group_1'' GROUP BY employee_number) b on a.employee_number = b.employee_number
UNION ALL
SELECT a.employee_number, ''attr_5'' AS attribute,coalesce(value,0) AS value FROM allemployees a LEFT JOIN -- group 2
(SELECT employee_number,COUNT(*) AS value FROM employee WHERE group_name =''Group_2'' GROUP BY employee_number) b on a.employee_number = b.employee_number) a order by 1,2')
AS ct(employee_number varchar,"TotalRoles" integer,"TotalGroups" integer,"Available" integer, "Others" integer,"Group_1" integer, "Group_2" integer)

Create Historical Table from Dates with Ranked Contracts (Gaps and Islands?)

I have an issue in Teradata where I am trying to build a historical contract table that lists a system, it's corresponding contracts and the start and end dates of each contract. This table would then be queried for reporting as a point in time table. Here is some code to better explain.
CREATE TABLE TMP_WORK_DB.SOLD_SYSTEMS
(
SYSTEM_ID varchar(5),
CONTRACT_TYPE varchar(10),
CONTRACT_RANK int,
CONTRACT_STRT_DT date,
CONTRACT_END_DT date
);
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('AAA', 'BEST', 10, '2012-01-01', '2012-06-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('AAA', 'BEST', 9, '2012-01-01', '2012-06-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('AAA', 'OK', 1, '2012-08-01', '2012-12-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('BBB', 'BEST', 10, '2013-12-01', '2014-03-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('BBB', 'BETTER', 7, '2013-12-01', '2017-03-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('BBB', 'GOOD', 4, '2016-12-02', '2017-12-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('CCC', 'BEST', 10, '2009-10-13', '2014-10-14');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('CCC', 'BETTER', 7, '2009-10-13', '2016-10-14');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('CCC', 'OK', 2, '2008-10-13', '2017-10-14');
The required output would be:
SYSTEM_ID CONTRACT_TYPE CONTRACT_STRT_DT CONTARCT_END_DT CONTRACT_RANK
AAA BEST 01/01/2012 06/30/2012 10
AAA OK 08/01/2012 12/30/2012 1
BBB BEST 12/01/2013 03/02/2014 10
BBB BETTER 03/03/2014 03/02/2017 7
BBB GOOD 03/03/2017 12/02/2017 4
CCC OK 10/13/2008 10/12/2009 2
CCC BEST 10/13/2009 10/14/2014 10
CCC BETTER 10/15/2014 10/14/2016 7
CCC OK 10/15/2016 10/14/2017 2
I'm not necessarily looking to reduce rows but am looking to get the correct state of the system_id at any given point in time. Note that when a higher ranked contract ends and a lower ranked contract is still active the lower ranked picks up where the higher one left off.
We are using TD 14 and I have been able to get the easy records where the dates flow sequentially and are of higher rank but am having trouble with the overlaps where two different ranked contracts cover multiple date spans.
I found this blog post (Sharpening Stones) and got it working for the most part but am still having trouble setting the new start dates for the overlapping contracts.
Any help would be appreciated. Thanks.
*UPDATE 04/04/2014 *
I came up with the following code which gives me exactly what I want but I'm not sure of the performance. It works on smaller data sets of a few hundred rows but I havent tested it on several million:
*UPDATE 04/07/2014 *
Updated the date subquery due to spool issues. This query explodes all days where the contract is possibly active and then uses the ROW_NUMBER function to get the highest ranked CONTRACT_TYPE per day. The MIN/MAX functions are then partitioned over the system and contract type to pick up when the highest ranked contract type changes.
*UPDATE - 2 - 04/07/2014 *
I cleaned up the query and it seems to be perform a little better.
SELECT
SYSTEM_ID
, CONTRACT_TYPE
, MIN(CALENDAR_DATE) NEW_START_DATE
, MAX(CALENDAR_DATE) NEW_END_DATE
, CONTRACT_RANK
FROM (
SELECT
CALENDAR_DATE
, SYSTEM_ID
, CONTRACT_TYPE
, CONTRACT_RANK
, ROW_NUMBER() OVER (PARTITION BY SYSTEM_ID, CALENDAR_DATE ORDER BY CONTRACT_RANK DESC, CONTRACT_STRT_DT DESC, CONTRACT_END_DT DESC) AS RNK
FROM SOLD_SYSTEMS t1
JOIN (
SELECT CALENDAR_DATE
FROM FULL_CALENDAR_TABLE ia
WHERE CALENDAR_DATE > DATE'2013-01-01'
)dt
ON CALENDAR_DATE BETWEEN CONTRACT_STRT_DT AND CONTRACT_END_DT
QUALIFY RNK = 1
)z1
GROUP BY 1,2,5
Following approach uses the new PERIOD functions in TD13.10.
-- 1. TD_SEQUENCED_COUNT can't be used in joins, so create a Volatile Table
-- 2. TD_SEQUENCED_COUNT can't use additional columns (e.g. CONTRACT_RANK),
-- so simply create a new row whenever a period starts or ends without
-- considering CONTRACT_RANK
CREATE VOLATILE TABLE vt AS
(
WITH cte
(
SYSTEM_ID
,pd
)
AS
(
SELECT
SYSTEM_ID
-- PERIODs can easily be constructed on-the-fly, but the end date is not inclusive,
-- so I had to adjust to your implementation, CONTRACT_END_DT +/- 1:
,PERIOD(CONTRACT_STRT_DT, CONTRACT_END_DT + 1) AS pd
FROM SOLD_SYSTEMS
)
SELECT
SYSTEM_ID
,BEGIN(pd) AS CONTRACT_STRT_DT
,END(pd) - 1 AS CONTRACT_END_DT
FROM
TABLE (TD_SEQUENCED_COUNT
(NEW VARIANT_TYPE(cte.SYSTEM_ID)
,cte.pd)
RETURNS (SYSTEM_ID VARCHAR(5)
,Policy_Count INTEGER
,pd PERIOD(DATE))
HASH BY SYSTEM_ID
LOCAL ORDER BY SYSTEM_ID ,pd) AS dt
)
WITH DATA
PRIMARY INDEX (SYSTEM_ID)
ON COMMIT PRESERVE ROWS
;
-- Find the matching CONTRACT_RANK
SELECT
vt.SYSTEM_ID
,t.CONTRACT_TYPE
,vt.CONTRACT_STRT_DT
,vt.CONTRACT_END_DT
,t.CONTRACT_RANK
FROM vt
-- If both vt and SOLD_SYSTEMS have a NUPI on SYSTEM_ID this join should be
-- quite efficient
JOIN SOLD_SYSTEMS AS t
ON vt.SYSTEM_ID = t.SYSTEM_ID
AND ( t.CONTRACT_STRT_DT, t.CONTRACT_END_DT)
OVERLAPS (vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT)
QUALIFY
-- As multiple contracts for the same period are possible:
-- find the row with the highest rank
ROW_NUMBER()
OVER (PARTITION BY vt.SYSTEM_ID,vt.CONTRACT_STRT_DT
ORDER BY t.CONTRACT_RANK DESC, vt.CONTRACT_END_DT DESC) = 1
ORDER BY 1,3
;
-- Previous query might return consecutive rows with the same CONTRACT_RANK, e.g.
-- BBB BETTER 2014-03-03 2016-12-01 7
-- BBB BETTER 2016-12-02 2017-03-02 7
-- If you don't want that you have to normalize the data:
WITH cte
(
SYSTEM_ID
,CONTRACT_STRT_DT
,CONTRACT_END_DT
,CONTRACT_RANK
,CONTRACT_TYPE
,pd
)
AS
(
SELECT
vt.SYSTEM_ID
,vt.CONTRACT_STRT_DT
,vt.CONTRACT_END_DT
,t.CONTRACT_RANK
,t.CONTRACT_TYPE
,PERIOD(vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT + 1) AS pd
FROM vt
JOIN SOLD_SYSTEMS AS t
ON vt.SYSTEM_ID = t.SYSTEM_ID
AND ( t.CONTRACT_STRT_DT, t.CONTRACT_END_DT)
OVERLAPS (vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT)
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY vt.SYSTEM_ID,vt.CONTRACT_STRT_DT
ORDER BY t.CONTRACT_RANK DESC, vt.CONTRACT_END_DT DESC) = 1
)
SELECT
SYSTEM_ID
,CONTRACT_TYPE
,BEGIN(pd) AS CONTRACT_STRT_DT
,END(pd) - 1 AS CONTRACT_END_DT
,CONTRACT_RANK
FROM
TABLE (TD_NORMALIZE_MEET
(NEW VARIANT_TYPE(cte.SYSTEM_ID
,cte.CONTRACT_RANK
,cte.CONTRACT_TYPE)
,cte.pd)
RETURNS (SYSTEM_ID VARCHAR(5)
,CONTRACT_RANK INT
,CONTRACT_TYPE VARCHAR(10)
,pd PERIOD(DATE))
HASH BY SYSTEM_ID
LOCAL ORDER BY SYSTEM_ID, CONTRACT_RANK, CONTRACT_TYPE, pd ) A
ORDER BY 1, 3;
Edit: This is another way to get the result of the 2nd query without Volatile Table and TD_SEQUENCED_COUNT:
SELECT
t.SYSTEM_ID
,t.CONTRACT_TYPE
,BEGIN(CONTRACT_PERIOD) AS CONTRACT_STRT_DT
,END(CONTRACT_PERIOD)- 1 AS CONTRACT_END_DT
,t.CONTRACT_RANK
,dt.p P_INTERSECT PERIOD(t.CONTRACT_STRT_DT,t.CONTRACT_END_DT + 1) AS CONTRACT_PERIOD
FROM
(
SELECT
dt.SYSTEM_ID
,PERIOD(d, MIN(d)
OVER (PARTITION BY dt.SYSTEM_ID
ORDER BY d
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) AS p
FROM
(
SELECT
SYSTEM_ID
,CONTRACT_STRT_DT AS d
FROM SOLD_SYSTEMS
UNION
SELECT
SYSTEM_ID
,CONTRACT_END_DT + 1 AS d
FROM SOLD_SYSTEMS
) AS dt
QUALIFY p IS NOT NULL
) AS dt
JOIN SOLD_SYSTEMS AS t
ON dt.SYSTEM_ID = t.SYSTEM_ID
WHERE CONTRACT_PERIOD IS NOT NULL
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY dt.SYSTEM_ID,p
ORDER BY t.CONTRACT_RANK DESC, t.CONTRACT_END_DT DESC) = 1
ORDER BY 1,3
And based on that you can also include the normalization in a single query:
WITH cte
(
SYSTEM_ID
,CONTRACT_TYPE
,CONTRACT_STRT_DT
,CONTRACT_END_DT
,CONTRACT_RANK
,pd
)
AS
(
SELECT
t.SYSTEM_ID
,t.CONTRACT_TYPE
,BEGIN(CONTRACT_PERIOD) AS CONTRACT_STRT_DT
,END(CONTRACT_PERIOD)- 1 AS CONTRACT_END_DT
,t.CONTRACT_RANK
,dt.p P_INTERSECT PERIOD(t.CONTRACT_STRT_DT,t.CONTRACT_END_DT + 1) AS CONTRACT_PERIOD
FROM
(
SELECT
dt.SYSTEM_ID
,PERIOD(d, MIN(d)
OVER (PARTITION BY dt.SYSTEM_ID
ORDER BY d
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) AS p
FROM
(
SELECT
SYSTEM_ID
,CONTRACT_STRT_DT AS d
FROM SOLD_SYSTEMS
UNION
SELECT
SYSTEM_ID
,CONTRACT_END_DT + 1 AS d
FROM SOLD_SYSTEMS
) AS dt
QUALIFY p IS NOT NULL
) AS dt
JOIN SOLD_SYSTEMS AS t
ON dt.SYSTEM_ID = t.SYSTEM_ID
WHERE CONTRACT_PERIOD IS NOT NULL
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY dt.SYSTEM_ID,p
ORDER BY t.CONTRACT_RANK DESC, t.CONTRACT_END_DT DESC) = 1
)
SELECT
SYSTEM_ID
,CONTRACT_TYPE
,BEGIN(pd) AS CONTRACT_STRT_DT
,END(pd) - 1 AS CONTRACT_END_DT
,CONTRACT_RANK
FROM
TABLE (TD_NORMALIZE_MEET
(NEW VARIANT_TYPE(cte.SYSTEM_ID
,cte.CONTRACT_RANK
,cte.CONTRACT_TYPE)
,cte.pd)
RETURNS (SYSTEM_ID VARCHAR(5)
,CONTRACT_RANK INT
,CONTRACT_TYPE VARCHAR(10)
,pd PERIOD(DATE))
HASH BY SYSTEM_ID
LOCAL ORDER BY SYSTEM_ID, CONTRACT_RANK, CONTRACT_TYPE, pd ) A
ORDER BY 1, 3;
SEL system_id,contract_type,MAX(contract_rank),
CASE WHEN contract_strt_dt<prev_end_dt THEN prev_end_dt+1
ELSE contract_strt_dt
END AS new_start ,contract_strt_dt,contract_end_dt,
MIN(contract_end_dt) OVER (PARTITION BY system_id
ORDER BY contract_strt_dt,contract_end_dt ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING) prev_end_dt
FROM sold_systems
GROUP BY system_id,contract_type,contract_strt_dt,contract_end_dt
ORDER BY contract_strt_dt,contract_end_dt,prev_end_dt
I think I'v got it....
try this
select SYSTEM_ID, CONTRACT_TYPE,CONTRACT_RANK,
case
when CONTRACT_STRT_DT<NEW_START_DATE then NEW_START_DATE /*if new_star_date overlap startdate then get new_Start_date */
else CONTRACT_STRT_DT
end as new_contract_str_dt,
CONTRACT_END_DT
from
(select t1.SYSTEM_ID,t1.CONTRACT_TYPE,t1.CONTRACT_RANK,t1.CONTRACT_STRT_DT,t1.CONTRACT_END_DT,
coalesce(max(t1.CONTRACT_END_DT) over (partition by t1.SYSTEM_ID order by t1.CONTRACT_RANK desc rows between UNBOUNDED PRECEDING and 1 preceding ), t1.CONTRACT_STRT_DT) NEW_START_DATE
from SOLD_SYSTEMS t1
) as temp1
/*you may remove fully overlapped contracts*/
where NEW_START_DATE<=CONTRACT_END_DT
It's simpler and have a good execution plan... You can work with large tables (don't forget to collect statistics )