Query database of events to find only events meeting parameters - sql

I have a dataset [Table_1] that records each event on a new row, so there are multiple entries for each customer_id. The structure is:
customer_id | recorded_at | event_type | value
:---------- | :---------- | :--------- | :----
123-456-789 | 2022-05-28 | status | open
123-456-789 | 2022-06-01 | attribute | order_placed
123-456-789 | 2022-06-02 | attribute | order_fulfilled
123-456-789 | 2022-06-04 | status | closed
123-456-789 | 2022-06-05 | attribute | order_placed
123-456-789 | 2022-06-07 | attribute | order_fulfilled
123-456-789 | 2022-06-10 | status | open
123-456-789 | 2022-06-11 | attribute | order_placed
123-456-789 | 2022-06-12 | attribute | order_fulfilled
123-456-789 | 2022-06-15 | attribute | order_placed
123-456-789 | 2022-06-17 | attribute | order_fulfilled
987-654-321 | 2022-06-12 | status | open
987-654-321 | 2022-06-15 | attribute | order_placed
987-654-321 | 2022-06-17 | attribute | order_fulfilled
987-654-321 | 2022-06-17 | status | closed
What I'm trying to do is write a query that returns the dates of the two attributes, order_placed and order_fulfilled, after the last time the status went open. My approach is to query the dataset three times: first for all customers who went open, then for the dates when the attribute is order_placed and when it is order_fulfilled. However, I'm having trouble returning every instance where the attributes are order_placed and order_fulfilled, not just the most recent one.
With d1 as (Select customer_id,recorded_at as open_time from Table_1 where event_type = 'status' and value = 'open')
Select d1.customer_id,
d1.open_time,
order_placed.order_placed_time,
order_filled.order_fulfilled_time as order_filled_time
from d1
left join (Select customer_id,max(recorded_at) as order_placed_time from Table_1 where event_type = 'attribute' and value = 'order_placed' group by customer_id) order_placed
on d1.customer_id = order_placed.customer_id and order_placed.order_placed_time > d1.open_time
left join (Select customer_id,max(recorded_at) as order_fulfilled_time from Table_1 where event_type = 'attribute' and value = 'order_fulfilled' group by customer_id) order_filled
on d1.customer_id = order_filled.customer_id and order_filled.order_fulfilled_time > d1.open_time
where order_filled.order_fulfilled_time > order_placed.order_placed_time
However, this only returns the last time an order was placed and fulfilled after the status = open, not every instance where that happened. The output I am going for would look like:
customer_id | open_time | order_placed_time | order_filled_time
:---------- | :-------- | :---------------- | :----------------
123-456-789 | 2022-05-28 | 2022-06-01 | 2022-06-02
123-456-789 | 2022-06-10 | 2022-06-11 | 2022-06-12
123-456-789 | 2022-06-10 | 2022-06-15 | 2022-06-17
987-654-321 | 2022-06-12 | 2022-06-15 | 2022-06-17

Consider the query below:
WITH orders AS (
SELECT *, SUM(IF(value IN ('open', 'closed'), 1, 0)) OVER w AS order_group
FROM sample
WINDOW w AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
)
SELECT customer_id, open_time, pre_recorded_at AS order_placed_time, recorded_at AS order_filled_time
FROM (
SELECT *, FIRST_VALUE(IF(value = 'open', recorded_at, NULL)) OVER w AS open_time,
LAG(recorded_at) OVER w AS pre_recorded_at,
FROM orders
WINDOW w AS (PARTITION BY customer_id, order_group ORDER BY recorded_at)
)
WHERE open_time IS NOT NULL AND value = 'order_fulfilled'
;
Note: Because of the two rows below in your dataset, which share the same recorded_at, the orders CTE has an unusual event_type column in its ORDER BY clause. Since 'attribute' sorts before 'status', it keeps the order_fulfilled row ahead of the closed row. If recorded_at were a more precise timestamp, event_type could be removed. I'll leave that to you.
WINDOW w AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
987-654-321 | 2022-06-17 | attribute | order_fulfilled
987-654-321 | 2022-06-17 | status | closed
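To try the running-count idea outside BigQuery, here is a minimal sketch that replays the question's sample rows in an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25+ (for window functions), stores dates as ISO-8601 text, and uses a hypothetical table name events; IF(...) is written as CASE, and a final ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Sample rows from the question (ISO dates sort correctly as text).
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH orders AS (
  SELECT *,
         -- running count of status rows: every open/closed starts a new group;
         -- event_type breaks same-day ties ('attribute' < 'status')
         SUM(CASE WHEN value IN ('open', 'closed') THEN 1 ELSE 0 END)
           OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type)
           AS order_group
  FROM events
)
SELECT customer_id, open_time,
       pre_recorded_at AS order_placed_time,
       recorded_at     AS order_filled_time
FROM (
  SELECT *,
         -- NULL when the group starts with a 'closed' row
         FIRST_VALUE(CASE WHEN value = 'open' THEN recorded_at END) OVER w AS open_time,
         -- for an order_fulfilled row, the previous row is its order_placed
         LAG(recorded_at) OVER w AS pre_recorded_at
  FROM orders
  WINDOW w AS (PARTITION BY customer_id, order_group ORDER BY recorded_at)
) t
WHERE open_time IS NOT NULL AND value = 'order_fulfilled'
ORDER BY customer_id, order_placed_time
"""


def order_pairs(rows):
    """Return (customer_id, open_time, order_placed_time, order_filled_time) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE events (customer_id TEXT, recorded_at TEXT,"
            " event_type TEXT, value TEXT)"
        )
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in order_pairs(EVENTS):
    print(row)
```

The LAG step assumes every order_fulfilled row directly follows its order_placed row inside the group, which holds for the sample data.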

One option to solve this problem is to follow these steps:
1. keep only the rows between an "open" and the next "closed", removing the "closed" row and everything up to the next "open"
2. assign a unique id to each ("order_placed", "order_fulfilled") pair
3. extract the "open_time", "order_placed_time" and "order_fulfilled_time" values into three separate fields with CASE expressions
4. aggregate "open_time" and "order_placed/fulfilled_time" separately, since each "open_time" can have multiple order pairs.
These four steps are implemented in two CTEs.
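The first CTE includes:
the first COUNT, which yields odd values for the rows from an "open" up to the next "closed" (the open/order_placed/order_fulfilled values, i.e. orders following an open) and even values for the rows from a "closed" up to the next "open" (the closed/order_placed/order_fulfilled values, i.e. orders following a closed); since the CASE returns 1 or 0, both non-NULL, COUNT increments on every status row
the second COUNT, which yields a distinct value for each ("order_placed", "order_fulfilled") pair, because it increments on every row except "order_fulfilled"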
SELECT *,
COUNT(CASE WHEN value = 'open' THEN 1
WHEN value = 'closed' THEN 0 END) OVER (
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS status_value,
COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END) OVER(
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS order_value
FROM tab
The second CTE includes:
a WHERE clause that filters out all rows between a "closed" and the next "open" (the "closed" row included, the "open" row excluded)
the first MAX window function, which partitions by customer and by the first COUNT to extract the "open_time" value
the second MAX window function, which partitions by customer and by the second COUNT to extract the "order_placed_time" value
the third MAX window function, which partitions by customer and by the second COUNT to extract the "order_fulfilled_time" value
SELECT customer_id,
MAX(CASE WHEN value = 'open' THEN recorded_at END) OVER(
PARTITION BY customer_id, status_value
) AS open_time,
MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_placed_time,
MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_fulfilled_time
FROM cte
WHERE MOD(status_value, 2) = 1
Note that these MAX functions cannot be computed as ordinary aggregates under a single GROUP BY clause, because the first MAX groups by status_value while the other two group by order_value.
The final query uses both CTEs and adds:
a selection of DISTINCT rows (window functions return one row per input row, so each pair appears more than once)
a filter that drops rows with a NULL "order_fulfilled_time" (these correspond to the original "open" rows).
WITH cte AS (
SELECT *,
COUNT(CASE WHEN value = 'open' THEN 1
WHEN value = 'closed' THEN 0 END) OVER (
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS status_value,
COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END) OVER(
PARTITION BY customer_id
ORDER BY recorded_at, event_type
) AS order_value
FROM tab
), cte2 AS(
SELECT customer_id,
MAX(CASE WHEN value = 'open' THEN recorded_at END) OVER(
PARTITION BY customer_id, status_value
) AS open_time,
MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_placed_time,
MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) OVER(
PARTITION BY customer_id, order_value
) AS order_fulfilled_time
FROM cte
WHERE MOD(status_value, 2) = 1
)
SELECT DISTINCT *
FROM cte2
WHERE order_fulfilled_time IS NOT NULL
I'd recommend checking the intermediate output of each step to fully understand this solution.
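The two-CTE query can be tried end to end with the minimal sketch below, which loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25+ for window functions, uses a hypothetical table name events instead of tab, and replaces MOD(status_value, 2) with the % operator because MOD is not available in every SQLite build; the ORDER BY only makes the output deterministic.

```python
import sqlite3

# Sample rows from the question (ISO dates sort correctly as text).
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH cte AS (
  SELECT *,
         -- 1 and 0 are both non-NULL, so COUNT increments on every status row:
         -- odd = inside an open period, even = inside a closed period
         COUNT(CASE WHEN value = 'open' THEN 1 WHEN value = 'closed' THEN 0 END)
           OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS status_value,
         -- increments on every row except order_fulfilled, so a placed/fulfilled
         -- pair shares one value
         COUNT(CASE WHEN value <> 'order_fulfilled' THEN 1 END)
           OVER (PARTITION BY customer_id ORDER BY recorded_at, event_type) AS order_value
  FROM events
), cte2 AS (
  SELECT customer_id,
         MAX(CASE WHEN value = 'open' THEN recorded_at END)
           OVER (PARTITION BY customer_id, status_value) AS open_time,
         MAX(CASE WHEN value = 'order_placed' THEN recorded_at END)
           OVER (PARTITION BY customer_id, order_value) AS order_placed_time,
         MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END)
           OVER (PARTITION BY customer_id, order_value) AS order_fulfilled_time
  FROM cte
  WHERE status_value % 2 = 1
)
SELECT DISTINCT * FROM cte2
WHERE order_fulfilled_time IS NOT NULL
ORDER BY customer_id, order_placed_time
"""


def order_pairs_two_counters(rows):
    """Return (customer_id, open_time, order_placed_time, order_fulfilled_time) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE events (customer_id TEXT, recorded_at TEXT,"
            " event_type TEXT, value TEXT)"
        )
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in order_pairs_two_counters(EVENTS):
    print(row)
```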

Consider yet another option:
with order_groups as (
select *,
countif(value in ('open', 'closed')) over order_group_sorted as group_num,
countif(value = 'order_placed') over order_group_sorted as subgroup_num,
from your_table
window order_group_sorted as (partition by customer_id order by recorded_at, event_type)
)
select * except(subgroup_num) from (
select customer_id, recorded_at, value, subgroup_num,
max(if(value = 'open', recorded_at, null)) over order_group as open_time
from order_groups
window order_group as (partition by customer_id, group_num)
)
pivot (any_value(recorded_at) for value in ('order_placed', 'order_fulfilled'))
where open_time is not null and order_placed is not null
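SQLite has no PIVOT operator, so the minimal sketch below emulates the same idea with conditional aggregation (MAX(CASE ...) grouped by customer_id, subgroup_num and open_time); that is a swapped-in technique, not the answer's exact BigQuery query. COUNTIF is replaced by SUM over a boolean expression, which SQLite evaluates to 0 or 1. Because each subgroup holds at most one order_placed and one order_fulfilled row here, MAX returns the same value ANY_VALUE would. It assumes SQLite 3.25+, a hypothetical events table and ISO-8601 date strings.

```python
import sqlite3

# Sample rows from the question (ISO dates sort correctly as text).
EVENTS = [
    ("123-456-789", "2022-05-28", "status", "open"),
    ("123-456-789", "2022-06-01", "attribute", "order_placed"),
    ("123-456-789", "2022-06-02", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-04", "status", "closed"),
    ("123-456-789", "2022-06-05", "attribute", "order_placed"),
    ("123-456-789", "2022-06-07", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-10", "status", "open"),
    ("123-456-789", "2022-06-11", "attribute", "order_placed"),
    ("123-456-789", "2022-06-12", "attribute", "order_fulfilled"),
    ("123-456-789", "2022-06-15", "attribute", "order_placed"),
    ("123-456-789", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-12", "status", "open"),
    ("987-654-321", "2022-06-15", "attribute", "order_placed"),
    ("987-654-321", "2022-06-17", "attribute", "order_fulfilled"),
    ("987-654-321", "2022-06-17", "status", "closed"),
]

QUERY = """
WITH order_groups AS (
  SELECT *,
         SUM(value IN ('open', 'closed')) OVER s AS group_num,   -- status periods
         SUM(value = 'order_placed') OVER s AS subgroup_num      -- order pairs
  FROM events
  WINDOW s AS (PARTITION BY customer_id ORDER BY recorded_at, event_type)
), with_open AS (
  SELECT customer_id, recorded_at, value, subgroup_num,
         -- NULL for periods that start with 'closed'
         MAX(CASE WHEN value = 'open' THEN recorded_at END)
           OVER (PARTITION BY customer_id, group_num) AS open_time
  FROM order_groups
), pivoted AS (
  -- conditional aggregation standing in for BigQuery's PIVOT
  SELECT customer_id, open_time,
         MAX(CASE WHEN value = 'order_placed' THEN recorded_at END) AS order_placed,
         MAX(CASE WHEN value = 'order_fulfilled' THEN recorded_at END) AS order_fulfilled
  FROM with_open
  GROUP BY customer_id, subgroup_num, open_time
)
SELECT * FROM pivoted
WHERE open_time IS NOT NULL AND order_placed IS NOT NULL
ORDER BY customer_id, order_placed
"""


def pivoted_pairs(rows):
    """Return (customer_id, open_time, order_placed, order_fulfilled) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE events (customer_id TEXT, recorded_at TEXT,"
            " event_type TEXT, value TEXT)"
        )
        con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in pivoted_pairs(EVENTS):
    print(row)
```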

with data as (
select *, sum(case when value = 'open' then 1 end) over (partition by customer_id order by recorded_at) as grp
from T
)
select customer_id,
min(case when value = 'open' then recorded_at end) as open_time,
...
from data
group by customer_id, grp

Related

SQL - Find the min(date) since a category has its most recent value

I need some help with this problem.
Assume I have the following table:
contract_id | tariff_id | product_category | date (DD.MM.YYYY) | month (YYYYMM)
:---------- | :-------- | :--------------- | :---------------- | :-------------
123456 | ABC | small | 01.01.2021 | 202101
123456 | ABC | medium | 01.02.2021 | 202102
123456 | DEF | small | 01.03.2021 | 202103
123456 | DEF | small | 01.04.2021 | 202104
123456 | ABC | big | 01.05.2021 | 202105
123456 | DEF | small | 01.06.2021 | 202106
123456 | DEF | medium | 02.06.2021 | 202106
123456 | DEF | medium | 01.07.2021 | 202107
The table is partitioned by month.
This is only part of my table, which contains multiple contract_ids.
For every contract_id, I'm trying to figure out since when it has had its most recent tariff_id, and since when it has had product_category = 'small' (if its current product category is not small, the value should be NULL).
The results will be written into a table which gets updated every month.
So for the table above my latest results should look like this:
contract_id | same_tariff_id_since | product_category_small_since
:---------- | :------------------- | :---------------------------
123456 | 01.06.2021 | NULL
I'm using Hive.
So far, I have only come up with a solution for same_tariff_id_since.
The problem is that it gives me the absolute min(date) for the tariff_id, not the min(date) since the contract most recently switched to that tariff_id.
I think the code for product_category_small_since will follow mostly the same logic.
My current code is:
SELECT q2.contract_id
, q3.tariff_id
, q2.date
FROM (
SELECT contract_id
, max(date_2) AS date
FROM (
SELECT contract_id
, date
, min(date) OVER (PARTITION BY tariff_id ORDER BY date) AS date_2
FROM given_table
)q1
WHERE date=date_2
GROUP BY contract_id
)q2
JOIN given_table AS q3
ON q2.contract_id=q3.contract_id
AND q2.date=q3.date
Thanks in advance.
One approach for solving this type of query is to do a grouping of the sequences you want to track. For the tariff_id sequence grouping, you want a new "sequence grouping id" for each time that the tariff id changes for a given contract id. Since the product_category can change independently, you need to do a sequence grouping id for that change as well.
Here's code to accomplish the task. This only returns the latest version of each contract and the specific columns you described in your latest results table. This was done against PostgreSQL 9.6, but the syntax and data types can probably be modified to be compatible with Hive.
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/8
select q3.contract_id
, q3.same_tariff_id_since
, q3.product_category_small_since
from (
select q2.contract_id
, q2.latest_record_per_contract
, to_char(min(q2."date (DD.MM.YYYY)")
over (partition by q2.contract_id, q2.contract_tariff_sequence_id), 'DD.MM.YYYY') as same_tariff_id_since
, to_char(min(case when q2.product_category = 'small' then q2."date (DD.MM.YYYY)" else null end)
over (partition by q2.contract_id, q2.contract_product_category_sequence_id), 'DD.MM.YYYY') as product_category_small_since
from(
select q1.*
, sum(case when q1.tariff_id = q1.prior_tariff_id then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_tariff_sequence_id
, sum(case when q1.product_category = q1.prior_product_category then 0 else 1 end)
over (partition by q1.contract_id order by q1."date (DD.MM.YYYY)" rows unbounded preceding) as contract_product_category_sequence_id
from (
select *
, lag(tariff_id) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_tariff_id
, lag(product_category) over (partition by contract_id order by "date (DD.MM.YYYY)") as prior_product_category
, row_number() over (partition by contract_id order by "date (DD.MM.YYYY)" desc) latest_record_per_contract
from contract_tariffs
) q1
) q2
) q3
-- filter only after the window MINs have seen every row of the run
where q3.latest_record_per_contract = 1
If you want to see all the rows and columns so you can examine how this works with the sequence grouping ids etc., you can modify the outer query slightly:
https://www.db-fiddle.com/f/qSk3Mb9Xfp1NDo5VeA1qHh/10
If this works for you, please mark as correct answer.
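The sequence-grouping idea is not tied to PostgreSQL. The minimal sketch below runs it in an in-memory SQLite database (3.25+ for window functions) from Python. Assumptions: the date column is replaced by a hypothetical contract_date column holding ISO-8601 text, the to_char formatting is left out, and contract 999999 is invented test data whose current category is 'small'. The MIN windows are computed in a subquery so that they see every row of a run before the rn = 1 filter is applied.

```python
import sqlite3

# Rows from the question with ISO dates; contract 999999 is hypothetical.
ROWS = [
    ("123456", "ABC", "small", "2021-01-01"),
    ("123456", "ABC", "medium", "2021-02-01"),
    ("123456", "DEF", "small", "2021-03-01"),
    ("123456", "DEF", "small", "2021-04-01"),
    ("123456", "ABC", "big", "2021-05-01"),
    ("123456", "DEF", "small", "2021-06-01"),
    ("123456", "DEF", "medium", "2021-06-02"),
    ("123456", "DEF", "medium", "2021-07-01"),
    ("999999", "XYZ", "small", "2021-01-01"),
    ("999999", "XYZ", "small", "2021-02-01"),
    ("999999", "UVW", "small", "2021-03-01"),
]

QUERY = """
SELECT contract_id, same_tariff_id_since, product_category_small_since
FROM (
  SELECT q1.*,
         -- earliest date of the run that contains this row
         MIN(contract_date) OVER (PARTITION BY contract_id, tariff_seq)
           AS same_tariff_id_since,
         MIN(CASE WHEN product_category = 'small' THEN contract_date END)
           OVER (PARTITION BY contract_id, category_seq)
           AS product_category_small_since
  FROM (
    SELECT q0.*,
           -- a new sequence id starts whenever the value differs from the prior row
           SUM(CASE WHEN tariff_id = prior_tariff_id THEN 0 ELSE 1 END)
             OVER (PARTITION BY contract_id ORDER BY contract_date
                   ROWS UNBOUNDED PRECEDING) AS tariff_seq,
           SUM(CASE WHEN product_category = prior_category THEN 0 ELSE 1 END)
             OVER (PARTITION BY contract_id ORDER BY contract_date
                   ROWS UNBOUNDED PRECEDING) AS category_seq
    FROM (
      SELECT *,
             LAG(tariff_id) OVER w AS prior_tariff_id,
             LAG(product_category) OVER w AS prior_category,
             ROW_NUMBER() OVER (PARTITION BY contract_id
                                ORDER BY contract_date DESC) AS rn
      FROM contract_tariffs
      WINDOW w AS (PARTITION BY contract_id ORDER BY contract_date)
    ) q0
  ) q1
) q2
WHERE rn = 1
ORDER BY contract_id
"""


def latest_since(rows):
    """Return (contract_id, same_tariff_id_since, product_category_small_since)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE contract_tariffs (contract_id TEXT, tariff_id TEXT,"
            " product_category TEXT, contract_date TEXT)"
        )
        con.executemany("INSERT INTO contract_tariffs VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in latest_since(ROWS):
    print(row)
```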

Find the true start end dates for customers that have multiple accounts in SQL Server 2014

I have a checking account table that contains columns Cust_id (customer id), Open_Date (start date), and Closed_Date (end date). There is one row for each account. A customer can open multiple accounts at any given point. I would like to know how long the person has been a customer.
eg 1:
CREATE TABLE [Cust]
(
[Cust_id] [varchar](10) NULL,
[Open_Date] [date] NULL,
[Closed_Date] [date] NULL
)
insert into [Cust] values ('a123', '10/01/2019', '10/15/2019')
insert into [Cust] values ('a123', '10/12/2019', '11/01/2019')
Ideally I would like to insert this into a table as just one row, saying that this person has been a customer from 10/01/2019 to 11/01/2019 (because he opened his second account before closing the first one).
Similarly eg 2:
insert into [Cust] values ('b245', '07/01/2019', '09/15/2019')
insert into [Cust] values ('b245', '10/12/2019', '12/01/2019')
I would like to see 2 rows in this case- one that shows he was a customer from 07/01 to 09/15 and then again from 10/12 to 12/01.
Can you point me to the best way to get this?
I would approach this as a gaps-and-islands problem: you want to group together adjacent rows whose periods overlap.
Here is one way to solve it using lag() and a cumulative sum(). Every time the open date is greater than the closed date of the previous record, a new group starts.
select
cust_id,
min(open_date) open_date,
max(closed_date) closed_date
from (
select
t.*,
sum(case when not open_date <= lag_closed_date then 1 else 0 end)
over(partition by cust_id order by open_date) grp
from (
select
t.*,
lag(closed_date) over (partition by cust_id order by open_date) lag_closed_date
from cust t
) t
) t
group by cust_id, grp
Run against your sample data in a db fiddle, the query produces:
cust_id | open_date | closed_date
:------ | :--------- | :----------
a123 | 2019-10-01 | 2019-11-01
b245 | 2019-07-01 | 2019-09-15
b245 | 2019-10-12 | 2019-12-01
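The minimal sketch below runs the gaps-and-islands idea in an in-memory SQLite database (3.25+) from Python, storing dates as ISO-8601 text. It is a variant of the lag() query, not the answer's exact code: instead of comparing each open date with only the previous row's closed date, it compares it with the latest closed date seen so far (a running MAX over the preceding rows), so an account that falls inside an earlier, longer one does not wrongly start a new group. Customer c999 is hypothetical data added to show that case.

```python
import sqlite3

# Sample accounts from the question; c999 is a hypothetical nested case.
ACCOUNTS = [
    ("a123", "2019-10-01", "2019-10-15"),
    ("a123", "2019-10-12", "2019-11-01"),
    ("b245", "2019-07-01", "2019-09-15"),
    ("b245", "2019-10-12", "2019-12-01"),
    ("c999", "2019-01-01", "2019-12-31"),
    ("c999", "2019-02-01", "2019-03-01"),
    ("c999", "2019-06-01", "2019-07-01"),
]

QUERY = """
SELECT cust_id,
       MIN(open_date)   AS first_open_date,
       MAX(closed_date) AS last_closed_date
FROM (
  SELECT t.*,
         -- a new island starts when this account opens after every earlier one closed
         SUM(CASE WHEN prev_max_closed IS NULL OR open_date > prev_max_closed
                  THEN 1 ELSE 0 END)
           OVER (PARTITION BY cust_id ORDER BY open_date, closed_date
                 ROWS UNBOUNDED PRECEDING) AS grp
  FROM (
    SELECT c.*,
           -- latest closing date among all earlier accounts of the customer
           MAX(closed_date) OVER (PARTITION BY cust_id ORDER BY open_date, closed_date
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
             AS prev_max_closed
    FROM cust c
  ) t
) g
GROUP BY cust_id, grp
ORDER BY cust_id, first_open_date
"""


def customer_periods(rows):
    """Return (cust_id, first_open_date, last_closed_date) per island."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cust (cust_id TEXT, open_date TEXT, closed_date TEXT)")
        con.executemany("INSERT INTO cust VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in customer_periods(ACCOUNTS):
    print(row)
```

With the plain lag() comparison, c999's third account would start a second group, because it opens after the second account closes even though the first account is still open.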
I would solve this with recursion. While this is certainly very heavy, it should accommodate even the most complex account timings (assuming your data has such). However, if the sample data provided is as complex as you need to solve for, I highly recommend sticking with the solution provided above. It is much more concise and clear.
WITH x (cust_id, open_date, closed_date, lvl, grp) AS (
SELECT cust_id, open_date, closed_date, 1, 1
FROM (
SELECT cust_id
, open_date
, closed_date
, row_number()
OVER (PARTITION BY cust_id ORDER BY closed_date DESC, open_date) AS rn
FROM cust
) AS t
WHERE rn = 1
UNION ALL
SELECT cust_id, open_date, closed_date, lvl, grp
FROM (
SELECT c.cust_id
, c.open_date
, c.closed_date
, x.lvl + 1 AS lvl
, x.grp + CASE WHEN c.closed_date < x.open_date THEN 1 ELSE 0 END AS grp
, row_number() OVER (PARTITION BY c.cust_id ORDER BY c.closed_date DESC) AS rn
FROM cust c
JOIN x
ON x.cust_id = c.cust_id
AND c.open_date < x.open_date
) AS t
WHERE t.rn = 1
)
SELECT cust_id, min(open_date) AS first_open_date, max(closed_date) AS last_closed_date
FROM x
GROUP BY cust_id, grp
ORDER BY cust_id, grp
I would also add the caveat that I don't run on SQL Server, so there could be syntax differences that I didn't account for. Hopefully they are minor, if present.
You can try something like this:
select distinct
cust_id,
(select min(Open_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
) AS first_open_date,
(select max(Closed_Date)
from Cust as b
where b.cust_id = a.cust_id and
a.Open_Date <= b.Closed_Date and
a.Closed_Date >= b.Open_Date
) AS last_closed_date
from Cust as a
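For every row, this selects the minimum open date and the maximum closed date among all ranges that overlap it; DISTINCT then filters out the duplicates. Note that only ranges overlapping that row directly are considered, so a chain of three or more accounts may not collapse into a single row.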

How to get the difference between (multiple) two different rows?

I have a set of data containing some fields: month, customer_id, row_num (RANK), and verified_date.
The rank field indicates each customer's first (1) and second (2) purchase. I would like to know the time difference between the first and second purchase for each customer, showing only the customer's first month (the month where row_num = 1).
https://i.ibb.co/PjJk5Y0/Capture.png
My expected result looks like the image below:
https://i.ibb.co/y5Mww7k/Capture-2.png
I'm using StandardSQL in Google Bigquery.
row_num, verified_date
from table
GROUP BY 1, 2
We can try using a pivot query here, aggregating by the customer_id:
SELECT
MAX(CASE WHEN row_num = 1 THEN month END) AS month,
customer_id,
1 AS row_num,
DATE_DIFF(MAX(CASE WHEN row_num = 2 THEN verified_date END),
MAX(CASE WHEN row_num = 1 THEN verified_date END), DAY) AS difference
FROM yourTable
GROUP BY
customer_id;
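To see the pivot in action, the minimal sketch below runs the same conditional aggregation in an in-memory SQLite database from Python. SQLite has no DATE_DIFF, so the day difference is computed with julianday() and cast to an integer; the purchases table and its rows are hypothetical, since the original data is only available as an image. A customer without a second purchase gets a NULL difference.

```python
import sqlite3

# Hypothetical purchases: (month, customer_id, row_num, verified_date).
# c3 never made a second purchase.
PURCHASES = [
    ("2020-01", "c1", 1, "2020-01-05"),
    ("2020-02", "c1", 2, "2020-02-04"),
    ("2020-03", "c2", 1, "2020-03-10"),
    ("2020-03", "c2", 2, "2020-03-15"),
    ("2020-04", "c3", 1, "2020-04-01"),
]

QUERY = """
SELECT MAX(CASE WHEN row_num = 1 THEN month END) AS month,
       customer_id,
       1 AS row_num,
       -- julianday() returns days as a float; midnight dates differ by whole days
       CAST(julianday(MAX(CASE WHEN row_num = 2 THEN verified_date END))
          - julianday(MAX(CASE WHEN row_num = 1 THEN verified_date END)) AS INTEGER)
         AS difference
FROM purchases
GROUP BY customer_id
ORDER BY customer_id
"""


def purchase_gaps(rows):
    """Return (month, customer_id, row_num, difference_in_days) per customer."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE purchases (month TEXT, customer_id TEXT,"
            " row_num INTEGER, verified_date TEXT)"
        )
        con.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in purchase_gaps(PURCHASES):
    print(row)
```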

How to return all the rows in the yellow census blocks?

The schema is as follows: for the whole dataset, we should order by machine_id first, then by ss2k. After that, for each machine, we should find all rows that belong to a run of at least 5 consecutive rows with flag = 'census'. In this dataset, the result should be all the yellow rows.
Using the query below, I cannot return the last 4 rows of each yellow block:
drop table if exists qz_panel_census_228_rank;
create table qz_panel_census_228_rank as
select t.*
from (select t.*,
count(*) filter (where flag = 'census') over (partition by machine_id, date order by ss2k rows between current row and 4 following) as census_cnt5,
count(*) filter (where flag = 'census') over (partition by machine_id, date) as count_census,
row_number() over (partition by machine_id, date order by ss2k) as seqnum,
count(*) over (partition by machine_id, date) as cnt
from qz_panel_census_228 t
) t
where census_cnt5 = 5
group by 1,2,3,4,5,6,7,8,9,10,11
DISTRIBUTED BY (machine_id);
You were close, but you need to search in both directions:
select t.*
from (select t.*,
case when count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between 4 preceding and current row) = 5
or count(*) filter (where flag = 'census')
over (partition by machine_id, date
order by ss2k
rows between current row and 4 following) = 5
then 1
else 0
end as in_run
from qz_panel_census_228 t
) t
where in_run = 1
Edit:
This approach will not work unless you add an extra count for every other position of the 5-row window, e.g. 3 preceding and 1 following, 2 preceding and 2 following, etc. That results in ugly code and is not very flexible.
The common way to solve this gaps & islands problem is to assign consecutive rows to a common group first:
select *
from
(
select t2.*,
count(*) over (partition by machine_id, date, grp) as cnt
from
(
select t1.*
from (select t.*,
-- keep the same number for 'census' rows
sum(case when flag = 'census' then 0 else 1 end)
over (partition by machine_id, date
order by ss2k
rows unbounded preceding) as grp
from qz_panel_census_228 t
) t1
where flag = 'census' -- only census rows
) as t2
) t3
where cnt >= 5 -- only groups of at least 5 census rows
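The grouping trick can be checked with the minimal sketch below, which runs the same query in an in-memory SQLite database (3.25+) from Python. The panel table and its rows are hypothetical: machine m1 has a run of 5 census rows, one non-census row, then a run of only 3, and machine m2 has a single run of 6, so only m1's first run and all of m2 should come back.

```python
import sqlite3

# Hypothetical panel rows: (machine_id, date, ss2k, flag).
M1_FLAGS = ["census"] * 5 + ["other"] + ["census"] * 3
ROWS = [("m1", "2020-02-28", ss2k, flag) for ss2k, flag in enumerate(M1_FLAGS, start=1)]
ROWS += [("m2", "2020-02-28", ss2k, "census") for ss2k in range(1, 7)]

QUERY = """
SELECT machine_id, ss2k
FROM (
  SELECT t1.*,
         COUNT(*) OVER (PARTITION BY machine_id, date, grp) AS cnt
  FROM (
    SELECT t.*,
           -- grp only grows on non-census rows, so a run of census rows shares it
           SUM(CASE WHEN flag = 'census' THEN 0 ELSE 1 END)
             OVER (PARTITION BY machine_id, date ORDER BY ss2k
                   ROWS UNBOUNDED PRECEDING) AS grp
    FROM panel t
  ) t1
  WHERE flag = 'census'
) t2
WHERE cnt >= 5  -- keep only runs of at least 5 census rows
ORDER BY machine_id, ss2k
"""


def census_runs(rows):
    """Return (machine_id, ss2k) for rows inside runs of 5+ census rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE panel (machine_id TEXT, date TEXT, ss2k INTEGER, flag TEXT)"
        )
        con.executemany("INSERT INTO panel VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(census_runs(ROWS))
```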
Wow, there has to be a better way of doing this, but the only way I could figure out was to create blocks of consecutive 'census' values. This looks awful but might be a catalyst to a better idea.
with q1 as (
select
machine_id, recorded, ss2k, flag, date,
case
when flag = 'census' and
coalesce(lag(flag) over (partition by machine_id order by ss2k), '') <> 'census'
then 1
else 0
end as block
from foo
),
q2 as (
select
machine_id, recorded, ss2k, flag, date,
sum (block) over (order by machine_id, ss2k) as group_id,
case when flag = 'census' then 1 else 0 end as census
from q1
),
q3 as (
select
machine_id, recorded, ss2k, flag, date, group_id,
sum (census) over (partition by group_id order by ss2k) as max_count
from q2
),
groups as (
select group_id
from q3
group by group_id
having max (max_count) >= 5
)
select
q2.machine_id, q2.recorded, q2.ss2k, q2.flag, q2.date
from
q2
join groups g on q2.group_id = g.group_id
where
q2.flag = 'census'
If you run each query in the WITH clause in isolation, you should be able to see how the result evolves.

Oracle SQL: Show entries from component tables once apiece

My objective is to produce a dataset that shows a boatload of data from just shy of 50 tables in total, all in the same Oracle SQL database schema. As far as the report I'm building is concerned, each table except the first consists of two elements:
A foreign-key identifier that matches a row on the first table
A date
There may be many rows on one of these tables corresponding to one case, and it will NOT be the same number of rows from table to table.
My objective is to have each row in the first table show up as many times as needed to display all the results from the other tables once. So, something like this (except on a lot more tables):
CASE_FILE_ID | INITIATED_DATE | INSPECTION_DATE | PAYMENT_DATE | ACTION_DATE
:----------- | :------------- | :-------------- | :----------- | :----------
1000 | 10-JUL-1986 | 14-JUL-1987 | 10-JUL-1986 |
1000 | | 14-JUL-1988 | 10-JUL-1987 |
1000 | | 14-JUL-1989 | 10-JUL-1988 |
1000 | | | 10-JUL-1989 |
My current SQL code (shrunk down to five tables, but the rest all follow the same format as T1-T4):
SELECT DISTINCT
A.CASE_FILE_ID,
T1.DATE AS INITIATED_DATE,
T2.DATE AS INSPECTION_DATE,
T3.DATE AS PAYMENT_DATE,
T4.DATE AS ACTION_DATE
FROM
RECORDS.CASE_FILE A
LEFT OUTER JOIN RECORDS.INITIATE T1 ON A.CASE_FILE_ID = T1.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.INSPECTION T2 ON A.CASE_FILE_ID = T2.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.PAYMENT T3 ON A.CASE_FILE_ID = T3.CASE_FILE_ID
LEFT OUTER JOIN RECORDS.ACTION T4 ON A.CASE_FILE_ID = T4.CASE_FILE_ID
ORDER BY
A.CASE_FILE_ID
The problem is that this produces every distinct combination of dates. In the example above (where I added a WHERE clause of A.CASE_FILE_ID = '1000'), instead of four rows for case 1000 it would show twelve (1 initiated date * 3 inspection dates * 4 payment dates = 12 rows). Suffice it to say, as the number of tables increases, this quickly becomes prohibitive in both display and runtime.
What is the best way to get an output loosely akin to the ideal above, where any one date is only shown once? Failing that, is there a way to get it to only show as many lines for one CASE_FILE as it needs to show all the dates, even if some dates repeat within that?
There isn't a good way, but there are two ways. One method involves subqueries for each table and complex outer joins. The second involves subqueries and union all. Let's go with that one:
SELECT CASE_FILE_ID,
MAX(INITIATED_DATE) as INITIATED_DATE,
MAX(INSPECTION_DATE) as INSPECTION_DATE,
MAX(PAYMENT_DATE) as PAYMENT_DATE,
MAX(ACTION_DATE) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT T1.CASE_FILE_ID, DATE as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE T1
) UNION ALL
(SELECT T2.CASE_FILE_ID, NULL as INITIATED_DATE, DATE as INSPECTION_DATE,
NULL as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION T2
) UNION ALL
(SELECT T3.CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
DATE as PAYMENT_DATE, NULL as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT T3
) UNION ALL
(SELECT T4.CASE_FILE_ID, NULL as INITIATED_DATE, NULL as INSPECTION_DATE,
NULL as PAYMENT_DATE, DATE as ACTION_DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION T4
)
) a
GROUP BY CASE_FILE_ID, seqnum;
Hmmm, a closely related solution is easier to maintain:
SELECT CASE_FILE_ID,
MAX(CASE WHEN type = 'INITIATED' THEN DATE END) as INITIATED_DATE,
MAX(CASE WHEN type = 'INSPECTION' THEN DATE END) as INSPECTION_DATE,
MAX(CASE WHEN type = 'PAYMENT' THEN DATE END) as PAYMENT_DATE,
MAX(CASE WHEN type = 'ACTION' THEN DATE END) as ACTION_DATE
FROM ((SELECT A.CASE_FILE_ID, NULL as TYPE, NULL as DATE,
1 as seqnum
FROM RECORDS.CASE_FILE A
) UNION ALL
(SELECT T1.CASE_FILE_ID, 'INITIATED', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INITIATE T1
) UNION ALL
(SELECT T2.CASE_FILE_ID, 'INSPECTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.INSPECTION T2
) UNION ALL
(SELECT T3.CASE_FILE_ID, 'PAYMENT', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.PAYMENT T3
) UNION ALL
(SELECT T4.CASE_FILE_ID, 'ACTION', DATE,
ROW_NUMBER() OVER (PARTITION BY CASE_FILE_ID ORDER BY DATE) as seqnum
FROM RECORDS.ACTION T4
)
) a
GROUP BY CASE_FILE_ID, seqnum;
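The type-column version is easy to try in SQLite as well. The minimal sketch below uses hypothetical lowercase table names, renames the DATE column to event_date (DATE is a reserved word in Oracle), leaves out the ACTION table for brevity, and loads invented rows that reproduce the dates of the example in the question, plus a case 2000 with no child rows. Each child table numbers its own dates with ROW_NUMBER(), and grouping by seqnum lines up the n-th date of every table on the same output row.

```python
import sqlite3

# Hypothetical child rows shaped like case 1000 in the question.
CHILD_ROWS = {
    "initiate": [(1000, "1986-07-10")],
    "inspection": [(1000, "1987-07-14"), (1000, "1988-07-14"), (1000, "1989-07-14")],
    "payment": [(1000, "1986-07-10"), (1000, "1987-07-10"),
                (1000, "1988-07-10"), (1000, "1989-07-10")],
}

QUERY = """
SELECT case_file_id,
       MAX(CASE WHEN type = 'INITIATED'  THEN event_date END) AS initiated_date,
       MAX(CASE WHEN type = 'INSPECTION' THEN event_date END) AS inspection_date,
       MAX(CASE WHEN type = 'PAYMENT'    THEN event_date END) AS payment_date
FROM (
  -- one placeholder row per case file guarantees every case appears
  SELECT case_file_id, NULL AS type, NULL AS event_date, 1 AS seqnum FROM case_file
  UNION ALL
  SELECT case_file_id, 'INITIATED', event_date,
         ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY event_date) FROM initiate
  UNION ALL
  SELECT case_file_id, 'INSPECTION', event_date,
         ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY event_date) FROM inspection
  UNION ALL
  SELECT case_file_id, 'PAYMENT', event_date,
         ROW_NUMBER() OVER (PARTITION BY case_file_id ORDER BY event_date) FROM payment
) a
-- the n-th date of every table lands on the same output row
GROUP BY case_file_id, seqnum
ORDER BY case_file_id, seqnum
"""


def aligned_dates():
    """Return (case_file_id, initiated, inspection, payment) rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE case_file (case_file_id INTEGER)")
        con.executemany("INSERT INTO case_file VALUES (?)", [(1000,), (2000,)])
        for table, rows in CHILD_ROWS.items():
            con.execute(f"CREATE TABLE {table} (case_file_id INTEGER, event_date TEXT)")
            con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in aligned_dates():
    print(row)
```

Adding another table means adding one more UNION ALL branch and one more MAX(CASE ...) column, which is why this form is easier to maintain than the wide NULL-padded version.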