Group Consecutive Records - sql

I am trying to group some records to find the first one for a particular site for a given client. The problem is that the records go back and forth between sites, so I need to keep non-consecutive site date ranges separate.
Given the sample data, I want to end up with 3 records - one for site 1 starting 7/3/18, a second for site 2 starting on 9/3/18 and the third for site 1 again starting 11/3/18.
SELECT 9999 AS CLIENT_ID, 1 AS SITE_NUM, '2018-07-03' AS START_DATE, '2018-08-05' AS CREATED_DATE, 1 AS RECORD_ID
INTO #TEMP
UNION
SELECT 9999 AS MEMBER_ID, 1 AS SITE_NUM, '2018-08-01' AS CONSENT_SIGN_DATE, '2018-10-05' AS CREATED_DATE, 2
UNION
SELECT 9999 AS MEMBER_ID, 1 AS SITE_NUM, '2018-07-03' AS CONSENT_SIGN_DATE, '2018-09-22' AS CREATED_DATE, 3
UNION
SELECT 9999 AS MEMBER_ID, 2 AS SITE_NUM, '2018-09-03' AS CONSENT_SIGN_DATE, '2018-09-05' AS CREATED_DATE, 4
UNION
SELECT 9999 AS MEMBER_ID, 2 AS SITE_NUM, '2018-10-03' AS CONSENT_SIGN_DATE, '2018-10-05' AS CREATED_DATE, 5
UNION
SELECT 9999 AS MEMBER_ID, 1 AS SITE_NUM, '2018-11-03' AS CONSENT_SIGN_DATE, '2018-11-05' AS CREATED_DATE, 6
UNION
SELECT 9999 AS MEMBER_ID, 1 AS SITE_NUM, '2018-12-01' AS CONSENT_SIGN_DATE, '2018-12-05' AS CREATED_DATE, 7
I've been playing with ROW_NUMBER() but haven't been able to figure out how to separate the two sets of dates for Site 1.
SELECT *, ROW_NUMBER()OVER(PARTITION BY T.CLIENT_ID, T.SITE_NUM ORDER BY T.START_DATE, T.RECORD_ID)
FROM #TEMP T
LEFT JOIN #TEMP T2 ON T2.CLIENT_ID = T.CLIENT_ID AND T2.RECORD_ID = T.RECORD_ID - 1
ORDER BY T.RECORD_ID
How can I group the results by client and consecutive dates for a single site?

This is a gaps-and-islands problem. For this, the difference of row numbers is the best approach:
select t.client_id, t.site_num, min(t.start_date), max(t.start_date)
from (select t.*,
row_number() over (partition by t.client_id order by T.START_DATE, T.RECORD_ID) as seqnum_c,
row_number() over (partition by t.client_id, t.site_num order by T.START_DATE, T.RECORD_ID) as seqnum_cs
from #temp t
) t
group by client_id, site_num, (seqnum_c - seqnum_cs)
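To see why the difference of the two row numbers identifies each island, here is a minimal sketch that runs the same query against the question's sample rows. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in for SQL Server; the table name temp_sites and the column aliases first_start and last_start are made up for the illustration.

```python
import sqlite3

# Sample rows from the question: (client_id, site_num, start_date, record_id).
rows = [
    (9999, 1, "2018-07-03", 1),
    (9999, 1, "2018-08-01", 2),
    (9999, 1, "2018-07-03", 3),
    (9999, 2, "2018-09-03", 4),
    (9999, 2, "2018-10-03", 5),
    (9999, 1, "2018-11-03", 6),
    (9999, 1, "2018-12-01", 7),
]

conn = sqlite3.connect(":memory:")
# temp_sites is a hypothetical stand-in for #TEMP.
conn.execute(
    "CREATE TABLE temp_sites (client_id INT, site_num INT, start_date TEXT, record_id INT)"
)
conn.executemany("INSERT INTO temp_sites VALUES (?, ?, ?, ?)", rows)

# seqnum_c counts rows per client, seqnum_cs counts rows per client and site.
# Within an unbroken run of one site both counters advance together, so their
# difference stays constant; it changes whenever another site interrupts the run.
islands = conn.execute("""
    SELECT client_id, site_num,
           MIN(start_date) AS first_start,
           MAX(start_date) AS last_start
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY client_id
                                  ORDER BY start_date, record_id) AS seqnum_c,
               ROW_NUMBER() OVER (PARTITION BY client_id, site_num
                                  ORDER BY start_date, record_id) AS seqnum_cs
        FROM temp_sites t
    )
    GROUP BY client_id, site_num, seqnum_c - seqnum_cs
    ORDER BY first_start
""").fetchall()

for row in islands:
    print(row)
```

With this data the differences are 0 for the first run of site 1, 3 for site 2, and 2 for the second run of site 1, which yields the three groups the question asks for.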

What you want is to keep only the rows whose SITE_NUM differs from that of the preceding row. All you need to do is add a WHERE clause to your query, before the ORDER BY:
SELECT *, ROW_NUMBER()OVER(PARTITION BY T.CLIENT_ID, T.SITE_NUM ORDER BY T.START_DATE, T.RECORD_ID)
FROM #TEMP T
LEFT JOIN #TEMP T2 ON T2.CLIENT_ID = T.CLIENT_ID AND T2.RECORD_ID = T.RECORD_ID - 1
WHERE T.SITE_NUM <> T2.SITE_NUM OR T2.SITE_NUM IS NULL
ORDER BY T.RECORD_ID
EDIT: As suggested by @SteveB, I added T2.SITE_NUM IS NULL so that the first record, which has no preceding row to join to, is returned too.

Related

How to make a SQL query for the last transaction of every account where the last transaction amount is 500?

Say I have a table "transactions" that has columns "acct_id" "trans_date" and "trans_type" and I want to filter this table so that I have just the last transaction for each account. Clearly, I could do something like
SELECT acct_id, max(trans_date) as trans_date , max(time) as trans_time
FROM transactions
where transactions_amount = 500
GROUP BY acct_id ;
Is there a way to do this with a single query, hopefully, a generic method that would work with oracle?
That's typically done using window functions:
select acct_id, trans_date, trans_time
from (
select acct_id, trans_date, trans_time,
max(trans_date) over (partition by acct_id) as max_trans_date,
max(trans_time) over (partition by acct_id) as max_trans_time
from transactions
where ...
) t
where trans_date = max_trans_date
and trans_time = max_trans_time;
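The above uses only standard SQL window functions, which Oracle and the other major DBMSs support. Be aware that the two maxima are computed independently, so an account is returned only if its latest trans_time falls on its latest trans_date.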
Use the row_number analytic function as follows:
row_number() over (partition by acct_id order by trans_date desc) rn
In PARTITION BY, list all columns that identify the entity (in your case the account id).
In ORDER BY, list the columns that order the transactions, in descending order, so that the last transaction gets RN = 1.
Keep in mind that if several transactions can share, e.g., an identical timestamp, you should extend the ORDER BY columns so that, together with the PARTITION BY columns, they form a unique key. This is important to get a deterministic and well-defined result.
The final query selects only the rows with RN = 1.
The whole query with sample data:
with trans as (
select 1 acct_id, date'2020-01-01' trans_date, 'CR' trans_type from dual union all
select 1 acct_id, date'2020-02-11' trans_date, 'DB' trans_type from dual union all
select 1 acct_id, date'2020-03-21' trans_date, 'CR' trans_type from dual union all
select 2 acct_id, date'2020-01-10' trans_date, 'DB' trans_type from dual union all
select 2 acct_id, date'2020-04-01' trans_date, 'CR' trans_type from dual),
trans2 as (
select t.*,
row_number() over (partition by acct_id order by trans_date desc) rn
from trans t)
select ACCT_ID, TRANS_DATE, TRANS_TYPE
from trans2
where rn = 1
ACCT_ID TRANS_DATE TR
---------- ------------------- --
1 21.03.2020 00:00:00 CR
2 01.04.2020 00:00:00 CR
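The point about ties can be sketched as follows. This is only an illustration, assuming SQLite (3.25 or later) through Python's sqlite3 module instead of Oracle, and assuming a hypothetical trans_id column that is unique per transaction; neither the column nor the tied rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# trans_id is a hypothetical unique column used only as a tie-breaker.
conn.execute(
    "CREATE TABLE trans (trans_id INT, acct_id INT, trans_date TEXT, trans_type TEXT)"
)
conn.executemany("INSERT INTO trans VALUES (?, ?, ?, ?)", [
    (1, 1, "2020-01-01", "CR"),
    (2, 1, "2020-03-21", "DB"),
    (3, 1, "2020-03-21", "CR"),  # same date as trans_id 2: a tie on trans_date
    (4, 2, "2020-04-01", "CR"),
])

# Ordering by trans_date alone would leave the winner of the tie unspecified.
# Adding trans_id makes (acct_id, trans_date, trans_id) unique, so RN = 1 is
# always the same row.
latest = conn.execute("""
    SELECT acct_id, trans_date, trans_type
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY acct_id
                                  ORDER BY trans_date DESC, trans_id DESC) AS rn
        FROM trans t
    )
    WHERE rn = 1
    ORDER BY acct_id
""").fetchall()

for row in latest:
    print(row)
```

Here account 1 always resolves to the row with trans_id 3, because the higher trans_id wins the tie on the date.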

Oracle SQL Hierarchy Summation

I have a table TRANS that contains the following records:
TRANS_ID TRANS_DT QTY
1 01-Aug-2020 5
1 01-Aug-2020 1
1 03-Aug-2020 2
2 02-Aug-2020 1
The expected output:
TRANS_ID TRANS_DT BEGBAL TOTAL END_BAL
1 01-Aug-2020 0 6 6
1 02-Aug-2020 6 0 6
1 03-Aug-2020 6 2 8
2 01-Aug-2020 0 0 0
2 02-Aug-2020 0 1 1
2 03-Aug-2020 1 0 1
Each trans_id starts with a beginning balance of 0 (01-Aug-2020). For succeeding days, the beginning balance is the ending balance of the previous day and so on.
I can create PL/SQL block to create the output. Is it possible to get the output in 1 SQL statement?
Thanks.
Try the following script using CTEs:
WITH CTE
AS
(
SELECT DISTINCT A.TRANS_ID,B.TRANS_DT
FROM your_table A
CROSS JOIN (SELECT DISTINCT TRANS_DT FROM your_table) B
),
CTE2
AS
(
SELECT C.TRANS_ID,C.TRANS_DT,SUM(D.QTY) QTY
FROM CTE C
LEFT JOIN your_table D
ON C.TRANS_ID = D.TRANS_ID
AND C.TRANS_DT = D.TRANS_DT
GROUP BY C.TRANS_ID,C.TRANS_DT
ORDER BY C.TRANS_ID,C.TRANS_DT
)
SELECT F.TRANS_ID,F.TRANS_DT,
(
SELECT COALESCE (SUM(QTY), 0) FROM CTE2 E
WHERE E.TRANS_ID = F.TRANS_ID AND E.TRANS_DT < F.TRANS_DT
) BEGBAL,
(
SELECT COALESCE (SUM(QTY), 0) FROM CTE2 E
WHERE E.TRANS_ID = F.TRANS_ID AND E.TRANS_DT = F.TRANS_DT
) TOTAL ,
(
SELECT COALESCE (SUM(QTY), 0) FROM CTE2 E
WHERE E.TRANS_ID = F.TRANS_ID AND E.TRANS_DT <= F.TRANS_DT
) END_BAL
FROM CTE2 F
You can also do it like this (I would assume it's a bit faster):
with
dt_between as (
select mindt + level - 1 as trans_dt
from (select min(trans_dt) as mindt, max(trans_dt) as maxdt from t)
connect by level <= maxdt - mindt + 1
),
dt_for_trans_id as (
select *
from dt_between, (select distinct trans_id from t)
),
qty_change as (
select distinct trans_id, trans_dt,
sum(qty) over (partition by trans_id, trans_dt) as total,
sum(qty) over (partition by trans_id order by trans_dt) as end_bal
from t
right outer join dt_for_trans_id using (trans_id, trans_dt)
)
select
trans_id,
to_char(trans_dt, 'DD-Mon-YYYY') as trans_dt,
nvl(lag(end_bal) over (partition by trans_id order by trans_dt), 0) as beg_bal,
nvl(total, 0) as total,
nvl(end_bal, 0) as end_bal
from qty_change q
order by trans_id, q.trans_dt -- sort by the date column, not by the formatted string alias
dt_between returns every day between min(trans_dt) and max(trans_dt) in your data.
dt_for_trans_id returns all of these days for each trans_id in your data.
qty_change computes the change for each day (TOTAL in your example) and the cumulative sum up to and including that day (END_BAL in your example).
The main select takes END_BAL from the previous day and calls it BEG_BAL; it also formats the final output.
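The steps described above can be tried end to end with a small sketch. It is not the Oracle query itself: it assumes SQLite (3.25 or later) through Python's sqlite3 module, swaps CONNECT BY for a recursive CTE and SQLite's date() function, and derives the beginning balance as the running total minus the day's total rather than with LAG. The CTE names dates and daily are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans (trans_id INT, trans_dt TEXT, qty INT)")
# Rows from the question, with dates written in ISO format.
conn.executemany("INSERT INTO trans VALUES (?, ?, ?)", [
    (1, "2020-08-01", 5),
    (1, "2020-08-01", 1),
    (1, "2020-08-03", 2),
    (2, "2020-08-02", 1),
])

balances = conn.execute("""
    WITH RECURSIVE dates(dt) AS (
        -- every day between the earliest and latest trans_dt
        SELECT MIN(trans_dt) FROM trans
        UNION ALL
        SELECT date(dt, '+1 day') FROM dates
        WHERE dt < (SELECT MAX(trans_dt) FROM trans)
    ),
    daily AS (
        -- one row per trans_id and day, with 0 where nothing happened
        SELECT i.trans_id, d.dt, COALESCE(SUM(t.qty), 0) AS total
        FROM dates d
        CROSS JOIN (SELECT DISTINCT trans_id FROM trans) i
        LEFT JOIN trans t ON t.trans_id = i.trans_id AND t.trans_dt = d.dt
        GROUP BY i.trans_id, d.dt
    )
    SELECT trans_id, dt,
           SUM(total) OVER (PARTITION BY trans_id ORDER BY dt) - total AS begbal,
           total,
           SUM(total) OVER (PARTITION BY trans_id ORDER BY dt) AS end_bal
    FROM daily
    ORDER BY trans_id, dt
""").fetchall()

for row in balances:
    print(row)
```

The six rows printed correspond to the expected output in the question: three days for each of the two trans_id values.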
First of all, you need to generate the dates, then aggregate your values by TRANS_DT, and then left join the aggregated data to the dates. The easiest way to get the required sums is to use analytic window functions:
with dates(dt) as ( -- generating dates between min(TRANS_DT) and max(TRANS_DT) from TRANS
select min(trans_dt) from trans
union all
select dt+1 from dates
where dt+1<=(select max(trans_dt) from trans)
)
,trans_agg as ( -- aggregating QTY in TRANS
select TRANS_ID,TRANS_DT,sum(QTY) as QTY
from trans
group by TRANS_ID,TRANS_DT
)
select -- using left join partition by to get data on daily basis for each trans_id:
dt,
trans_id,
nvl(sum(qty) over(partition by trans_id order by dates.dt range between unbounded preceding and 1 preceding),0) as BEGBAL,
nvl(qty,0) as TOTAL,
nvl(sum(qty) over(partition by trans_id order by dates.dt),0) as END_BAL
from dates
left join trans_agg tr
partition by (trans_id)
on tr.trans_dt=dates.dt;
Full example with sample data:
alter session set nls_date_format='dd-mon-yyyy';
with trans(TRANS_ID,TRANS_DT,QTY) as (
select 1,to_date('01-Aug-2020'), 5 from dual union all
select 1,to_date('01-Aug-2020'), 1 from dual union all
select 1,to_date('03-Aug-2020'), 2 from dual union all
select 2,to_date('02-Aug-2020'), 1 from dual
)
,dates(dt) as ( -- generating dates between min(TRANS_DT) and max(TRANS_DT) from TRANS
select min(trans_dt) from trans
union all
select dt+1 from dates
where dt+1<=(select max(trans_dt) from trans)
)
,trans_agg as ( -- aggregating QTY in TRANS
select TRANS_ID,TRANS_DT,sum(QTY) as QTY
from trans
group by TRANS_ID,TRANS_DT
)
select
dt,
trans_id,
nvl(sum(qty) over(partition by trans_id order by dates.dt range between unbounded preceding and 1 preceding),0) as BEGBAL,
nvl(qty,0) as TOTAL,
nvl(sum(qty) over(partition by trans_id order by dates.dt),0) as END_BAL
from dates
left join trans_agg tr
partition by (trans_id)
on tr.trans_dt=dates.dt;
You can use a recursive query to generate the overall date range, cross join it with the list of distinct trans_id values, then bring in the table with a left join. The last step is aggregation and window functions:
with all_dates (trans_dt, max_dt) as (
select min(trans_dt), max(trans_dt) from trans -- no GROUP BY: one overall date range
union all
select trans_dt + interval '1' day, max_dt from all_dates where trans_dt < max_dt
)
select
i.trans_id,
d.trans_dt,
coalesce(sum(sum(t.qty)) over(partition by i.trans_id order by d.trans_dt), 0) - coalesce(sum(t.qty), 0) begbal,
coalesce(sum(t.qty), 0) total,
coalesce(sum(sum(t.qty)) over(partition by i.trans_id order by d.trans_dt), 0) endbal
from all_dates d
cross join (select distinct trans_id from trans) i
left join trans t on t.trans_id = i.trans_id and t.trans_dt = d.trans_dt
group by i.trans_id, d.trans_dt
order by i.trans_id, d.trans_dt

Grouping duplicate rows and calculating effective and end dates

As per the attached sample, I have data of repeated rows with different date values. I would like to combine the duplicate records to reduce the number of rows and at the same time would like to calculate the end date of record.
“CountryCode” column should be used to combine the records and value changes in “CountryRiskLevel” or “RegionRiskLevel” columns should be used to define the start and end date ranges.
Database - SQL Server.
Try this query. I used slightly different sample data, but the query will work for you as well:
;with SampleData as(
select 1 CountryCode,
1 RegionCode,
5 CountryRiskLevel,
5 RegionRiskLevel,
CONVERT(date, '2018-01-01') EffectiveDate
union all
select 1,1,5,5,CONVERT(date, '2018-01-02')
union all
select 1,1,5,5,CONVERT(date, '2018-01-03')
union all
select 1,1,5,5,CONVERT(date, '2018-01-04')
union all
select 1,1,2,2,CONVERT(date, '2018-01-05')
union all
select 1,1,5,5,CONVERT(date, '2018-01-06')
union all
select 1,1,5,5,CONVERT(date, '2018-01-07')
union all
select 1,1,5,3,CONVERT(date, '2018-01-08')
union all
select 1,1,5,5,CONVERT(date, '2018-01-09')
union all
select 1,1,5,5,CONVERT(date, '2018-01-10')
union all
select 1,1,5,5,CONVERT(date, '2018-01-11')
)
select CountryCode,
RegionCode,
CountryRiskLevel,
RegionRiskLevel,
MIN(effectiveDate) EffectiveStartDate,
case when MAX(effectiveDate) = MIN(effectiveDate) then MAX(dt) else MAX(effectiveDate) end EffectiveEndDate
from (
select *,
ROW_NUMBER() over (partition by CountryCode, RegionCode, CountryRiskLevel, RegionRiskLevel order by EffectiveDate) rn1,
ROW_NUMBER() over (order by EffectiveDate) rn2,
case when COUNT(*) over (partition by countrycode, RegionCode, CountryRiskLevel, RegionRiskLevel) = 1
then LEAD(effectivedate) over (order by effectivedate) end dt
from SampleData
) a group by CountryCode, RegionCode, CountryRiskLevel, RegionRiskLevel, rn2 - rn1

How to take last two maximum dates from multiple tables Oracle

I have 4 tables and I am trying to get Last two maximum dates from these 4 tables. I have listed my query below:
WITH LAST_ATT_DATE AS
(
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE1
WHERE NUM_REF='E1'
UNION
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE2
WHERE NUM_REF='E1'
UNION
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE3
WHERE NUMREF='E1'
UNION
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE4
WHERE NUMREF='E1'
)
SELECT MAX(decode(RANK,1,LAST_DATE)),MAX(decode(RANK,2,LAST_DATE))
FROM (SELECT NUMBER1,LAST_DATE,Row_Number() OVER(PARTITION BY NUMBER1
ORDER BY LAST_DATE DESC) AS RANK
FROM LAST_ATT_DATE) WHERE RANK <= 2
GROUP BY NUMBER1 ORDER BY NUMBER1;
For some records, it's working properly and for many records, it's showing the same date(only the first maximum date) even though it's having a 2nd maximum date.
Someone please correct this code or suggest any other alternative method.
I hope I understood the question correctly. Please check the query below.
WITH LAST_ATT_DATE AS
(
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE1
WHERE NUM_REF='E1'
UNION
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE2
WHERE NUM_REF='E1'
UNION
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE3
WHERE NUMREF='E1'
UNION
SELECT NUMREF AS NUMBER1,DATE AS LAST_DATE
FROM TABLE4
WHERE NUMREF='E1'
)
SELECT NUMBER1 , MAX(CASE WHEN RN=1 THEN LAST_DATE END) LAST_DATE_1,
MAX(CASE WHEN RN=2 THEN LAST_DATE END) LAST_DATE_2
FROM (SELECT NUMBER1,LAST_DATE,Row_Number() OVER(
ORDER BY LAST_DATE DESC) AS RN
FROM LAST_ATT_DATE) WHERE RN <= 2
GROUP BY NUMBER1 ORDER BY NUMBER1;
You need to use dense_rank() rather than row_number():
SELECT NUMBER1,
MAX(CASE WHEN seqnum = 1 THEN LAST_DATE END),
MAX(CASE WHEN seqnum = 2 THEN LAST_DATE END)
FROM (SELECT NUMBER1, LAST_DATE,
DENSE_RANK() OVER (PARTITION BY NUMBER1
ORDER BY LAST_DATE DESC
) AS seqnum
FROM LAST_ATT_DATE
) lad
WHERE seqnum <= 2
GROUP BY NUMBER1
ORDER BY NUMBER1;
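The difference between the two ranking functions shows up as soon as the same date occurs more than once. The sketch below is an illustration only: it assumes SQLite (3.25 or later) through Python's sqlite3 module rather than Oracle, uses a plain table named last_att_date in place of the CTE, and the three dates are invented sample values. The helper top_two is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE last_att_date (number1 TEXT, last_date TEXT)")
# Invented sample data: the latest date appears twice.
conn.executemany("INSERT INTO last_att_date VALUES (?, ?)", [
    ("E1", "2021-05-03"),
    ("E1", "2021-05-03"),
    ("E1", "2021-04-20"),
])

def top_two(ranking_function):
    """Pivot the rows ranked 1 and 2 into two columns, using the given function."""
    sql = f"""
        SELECT number1,
               MAX(CASE WHEN seqnum = 1 THEN last_date END),
               MAX(CASE WHEN seqnum = 2 THEN last_date END)
        FROM (
            SELECT number1, last_date,
                   {ranking_function}() OVER (PARTITION BY number1
                                              ORDER BY last_date DESC) AS seqnum
            FROM last_att_date
        )
        WHERE seqnum <= 2
        GROUP BY number1
    """
    return conn.execute(sql).fetchall()

# ROW_NUMBER gives the two duplicate rows the numbers 1 and 2.
with_row_number = top_two("ROW_NUMBER")
# DENSE_RANK gives both duplicates rank 1 and the next distinct date rank 2.
with_dense_rank = top_two("DENSE_RANK")

print(with_row_number)
print(with_dense_rank)
```

Note that DENSE_RANK only treats values as ties when they compare equal, so dates that differ in their time component still receive different ranks.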
Thanks all for your replies.
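I found a solution to my question: I truncated the date column, and that worked. Presumably the dates carried different time components, so rows for the same day were treated as distinct values.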
SELECT NUMREF AS NUMBER1,trunc(DATE) AS LAST_DATE
FROM TABLE1 WHERE NUMREF='E1';
Again thanks all.

Oracle SQL: Show entries from component tables once apiece

My objective is to produce a dataset that shows a boatload of data from, in total, just shy of 50 tables, all in the same Oracle SQL database schema. Each table except the first consists of, as far as the report I'm building cares, two elements:
A foreign-key identifier that matches a row on the first table
A date
There may be many rows on one of these tables corresponding to one case, and it will NOT be the same number of rows from table to table.
My objective is to have each row in the first table show up as many times as needed to display all the results from the other tables once. So, something like this (except on a lot more tables):
CASE_FILE_ID INITIATED_DATE INSPECTION_DATE PAYMENT_DATE ACTION_DATE
------------ -------------- --------------- ------------ -----------
1000 10-JUL-1986 14-JUL-1987 10-JUL-1986
1000 14-JUL-1988 10-JUL-1987
1000 14-JUL-1989 10-JUL-1988
1000 10-JUL-1989
My current SQL code (shrunk down to five tables, but the rest all follow the same format as T1-T4):
SELECT DISTINCT
A.CASE_FILE_ID,
T1.DATE AS INITIATED_DATE,
T2.DATE AS INSPECTION_DATE,
T3.DATE AS PAYMENT_DATE,
T4.DATE AS ACTION_DATE
FROM
RECORDS.CASE_FILE A
LEFT OUTER JOIN RECORDS.INITIATE T1 ON A.CASE_FILE_ID = T1.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.INSPECTION T2 ON A.CASE_FILE_ID = T2.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.PAYMENT T3 ON A.CASE_FILE_ID = T3.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.ACTION T4 ON A.CASE_FILE_ID = T4.CASE_FILE_ID
ORDER BY
A.CASE_FILE_ID
The problem is, the output this produces results in distinct combinations; so in the above example (where I added a 'WHERE' clause of A.CASE_FILE_ID = '1000'), instead of four rows for case 1000, it'd show twelve (1 Initiated Date * 3 Inspection Dates * 4 Payment Dates = 12 rows). Suffice it to say, as the number of tables increases, this would get very prohibitive in both display and runtime, very quickly.
What is the best way to get an output loosely akin to the ideal above, where any one date is only shown once? Failing that, is there a way to get it to only show as many lines for one CASE_FILE as it needs to show all the dates, even if some dates repeat within that?
There isn't a good way, but there are two ways. One method involves subqueries for each table and complex outer joins. The second involves subqueries and union all. Let's go with that one:
SELECT CASE_FILE_ID,
MAX(INITIATED_DATE) as INITIATED_DATE,
MAX(INSPECTION_DATE) as INSPECTION_DATE,
MAX(PAYMENT_DATE) as PAYMENT_DATE,
MAX(ACTION_DATE) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT CASE_FILE_ID, DATE as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, DATE as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
DATE as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT
) UNION ALL
(SELECT CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, DATE as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION
)
) a
GROUP BY CASE_FILE_ID, seqnum;
Hmmm, a closely related solution is easier to maintain:
SELECT CASE_FILE_ID,
MAX(CASE WHEN type = 'INITIATED' THEN DATE END) as INITIATED_DATE,
MAX(CASE WHEN type = 'INSPECTION' THEN DATE END) as INSPECTION_DATE,
MAX(CASE WHEN type = 'PAYMENT' THEN DATE END) as PAYMENT_DATE,
MAX(CASE WHEN type = 'ACTION' THEN DATE END) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as TYPE, NULL as DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT CASE_FILE_ID, 'INITIATED', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE
) UNION ALL
(SELECT CASE_FILE_ID, 'INSPECTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION
) UNION ALL
(SELECT CASE_FILE_ID, 'PAYMENT', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT
) UNION ALL
(SELECT CASE_FILE_ID, 'ACTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION
)
) a
GROUP BY CASE_FILE_ID, seqnum;
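A reduced version of this approach can be checked against the example rows from the question. The sketch below is an illustration under several assumptions: it uses SQLite (3.25 or later) through Python's sqlite3 module rather than Oracle, only three of the detail tables, a column named event_date in place of DATE, and event_type in place of type. It also leaves out the branch for the CASE_FILE table, so a case with no detail rows at all would not appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detail tables: a foreign key plus one date each.
for table in ("initiate", "inspection", "payment"):
    conn.execute(f"CREATE TABLE {table} (case_file_id INT, event_date TEXT)")

conn.execute("INSERT INTO initiate VALUES (1000, '1986-07-10')")
conn.executemany(
    "INSERT INTO inspection VALUES (1000, ?)",
    [("1987-07-14",), ("1988-07-14",), ("1989-07-14",)],
)
conn.executemany(
    "INSERT INTO payment VALUES (1000, ?)",
    [("1986-07-10",), ("1987-07-10",), ("1988-07-10",), ("1989-07-10",)],
)

# Each table numbers its own rows per case. Grouping by (case, seqnum) lines up
# the n-th row of every table on the same output row, so the row count per case
# is the largest per-table count rather than the product of all counts.
report = conn.execute("""
    SELECT case_file_id,
           MAX(CASE WHEN event_type = 'INITIATED'  THEN event_date END) AS initiated_date,
           MAX(CASE WHEN event_type = 'INSPECTION' THEN event_date END) AS inspection_date,
           MAX(CASE WHEN event_type = 'PAYMENT'    THEN event_date END) AS payment_date
    FROM (
        SELECT case_file_id, 'INITIATED' AS event_type, event_date,
               ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY event_date) AS seqnum
        FROM initiate
        UNION ALL
        SELECT case_file_id, 'INSPECTION', event_date,
               ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY event_date)
        FROM inspection
        UNION ALL
        SELECT case_file_id, 'PAYMENT', event_date,
               ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY event_date)
        FROM payment
    )
    GROUP BY case_file_id, seqnum
    ORDER BY case_file_id, seqnum
""").fetchall()

for row in report:
    print(row)
```

Case 1000 comes back as four rows, one per payment date, instead of the twelve combinations produced by the chained outer joins.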