Oracle SQL Left join same table unknown amount of times - sql

I have this table
| old | new |
|------|-------|
| a | b |
| b | c |
| d | e |
| ... | ... |
| aa | bb |
| bb | ff |
| ... | ... |
| 11 | 33 |
| 33 | 523 |
| 523 | 4444 |
| 4444 | 21444 |
The result I want to achieve is
| old | newest |
|------|--------|
| a | e |
| b | e |
| d | e |
| ... | |
| aa | ff |
| bb | ff |
| ... | |
| 11 | 21444 |
| 33 | 21444 |
| 523 | 21444 |
| 4444 | 21444 |
I can hard code the query to get the result that I want.
SELECT
older.old,
older.new,
newer.new firstcol,
newer1.new secondcol,
…
newerX-1.new secondlastcol,
newerX.new lastcol
from Table older
Left join Table newer
on older.old = newer.new
Left join Table newer1
on newer.new = newer1.old
…
Left join Table newerX-1
on newerX-2.new = newerX-1.old
Left join Table newerX
on newerX-1.new = newerX.old;
and then just take the first value from the right that is not null.
Illustrated here:
| old | new | firstcol | secondcol | thirdcol | fourthcol | | lastcol |
|------|-------|----------|-----------|----------|-----------|-----|---------|
| a | b | c | e | null | null | ... | null |
| b | c | e | null | null | null | ... | null |
| d | e | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| aa | bb | ff | null | null | null | ... | null |
| bb | ff | null | null | null | null | ... | null |
| ... | ... | ... | ... | ... | ... | ... | null |
| 11 | 33 | 523 | 4444 | 21444 | null | ... | null |
| 33 | 523 | 4444 | 21444 | null | null | ... | null |
| 523 | 4444 | 21444 | null | null | null | ... | null |
| 4444 | 21444 | null | null | null | null | ... | null |
The problem is that the length of "the replacement chain" is always changing (Can vary from 10 to 100).
There must be a better way to do this?

What you are looking for is a recursive query. Something like this:
with cte (old, new, lev) as
(
select old, new, 1 as lev from mytable
union all
select m.old, cte.new, cte.lev + 1
from mytable m
join cte on cte.old = m.new
)
select old, max(new) keep (dense_rank last order by lev) as new
from cte
group by old
order by old;
The recursive CTE creates all iterations (you can see this by replacing the query by select * from cte). And in the final query we get the last new per old with Oracle's KEEP LAST.
Rextester demo: http://rextester.com/CHTG34988

I'm trying to understand how you group your rows to determine different "newest" values. Are these the groupings you want based on the old field?
Group 1 - one letter (a, b, d)
Group 2 - two letters (aa, bb)
Group 3 - any number (11, 33, 523, 4444)
Is this correct? If so, you just need to group them by an expression and then use a window function MAX(). Something like this:
SELECT
"old",
MAX() OVER(PARTITION BY MyGrouping) AS newest
FROM (
SELECT
"old",
CASE
WHEN NOT IS_NUMERIC("old") THEN 'string' || CHAR_LENGTH("old") -- If string, group by string length
ELSE 'number' -- Otherwise, group as a number
END AS MyGrouping
FROM MyTable
) src
I don't know if Oracle has equivalents of the IS_NUMERIC and CHAR_LENGTH functions, so you need to check on that. If not, replace that expression with something similar, like this:
https://www.techonthenet.com/oracle/questions/isnumeric.php

Related

Replace nulls of a column with column value from another table

I have data flowing from two tables, table A and table B. I'm doing an inner join on a common column from both the tables and creating two more new columns based on different conditions. Below is a sample dataset:
Table A
| Id | StartDate |
|-----|------------|
| 119 | 01-01-2018 |
| 120 | 01-02-2019 |
| 121 | 03-05-2018 |
| 123 | 05-08-2021 |
TABLE B
| Id | CodeId | Code | RedemptionDate |
|-----|--------|------|----------------|
| 119 | 1 | abc | null |
| 119 | 2 | abc | null |
| 119 | 3 | def | null |
| 119 | 4 | def | 2/3/2019 |
| 120 | 5 | ghi | 04/7/2018 |
| 120 | 6 | ghi | 4/5/2018 |
| 121 | 7 | jkl | null |
| 121 | 8 | jkl | 4/4/2019 |
| 121 | 9 | mno | 3/18/2020 |
| 123 | 10 | pqr | null |
What I'm basically doing is joining the tables on column 'Id' when StartDate>2018 and create two new columns - 'unlock' by counting CodeId when RedemptionDate is null and 'Redeem' by counting CodeId when RedmeptionDate is not null. Below is the SQL query:
WITH cte1 AS (
SELECT a.id, COUNT(b.CodeId) AS 'Unlock'
FROM TableA AS a
JOIN TableB AS b ON a.Id=b.Id
WHERE YEAR(a.StartDate) >= 2018 AND b.RedemptionDate IS NULL
GROUP BY a.id
), cte2 AS (
SELECT a.id, COUNT(b.CodeId) AS 'Redeem'
FROM TableA AS a
JOIN TableB AS b ON a.Id=b.Id
WHERE YEAR(a.StartDate) >= 2018 AND b.RedemptionDate IS NOT NULL
GROUP BY a.id
)
SELECT cte1.Id, cte1.Unlocked, cte2.Redeemed
FROM cte1
FULL OUTER JOIN cte2 ON cte1.Id = cte2.Id
If I break down the output of this query, result from cte1 will look like below:
| Id | Unlock |
|-----|--------|
| 119 | 3 |
| 121 | 1 |
| 123 | 1 |
And from cte2 will look like below:
| Id | Redeem |
|-----|--------|
| 119 | 1 |
| 120 | 2 |
| 121 | 2 |
The last select query will produce the following result:
| Id | Unlock | Redeem |
|------|--------|--------|
| 119 | 3 | 1 |
| null | null | 2 |
| 121 | 1 | 2 |
| 123 | 1 | null |
How can I replace the null value from Id with values from 'b.Id'? If I try coalesce or a case statement, they create new columns. I don't want to create additional columns, rather replace the null values from the column values coming from another table.
My final output should like:
| Id | Unlock | Redeem |
|-----|--------|--------|
| 119 | 3 | 1 |
| 120 | null | 2 |
| 121 | 1 | 2 |
| 123 | 1 | null |
If I'm following correctly, you can use apply with aggregation:
select a.*, b.*
from a cross apply
(select count(RedemptionDate) as num_redeemed,
count(*) - count(RedemptionDate) as num_unlock
from b
where b.id = a.id
) b;
However, the answer to your question is to use coalesce(cte1.id, cte2.id) as id.

Calculate a column value backwards over a series of previous rows/RECURSIVE/CONNECTED BY

need your help. I guess/hope there is a function for that. I found "CONNECT DBY" and "WITH RECURSIVE AS ..." but it doesn't seem to solve my problem.
GIVEN TABLES:
Table A
+------+------------+----------+
| id | prev_id | date |
+------------------------------+
| 1 | | 20200101 |
| 23 | 1 | 20200104 |
| 34 | 23 | 20200112 |
| 41 | 34 | 20200130 |
+------------------------------+
Table B
+------+-----------+
| ref_id | key |
+------------------+
| 41 | abc |
+------------------+
(points always to the lates entry in table "A". Update, no history)
Join Statement:
SELECT
id, prev_id, key, date
FROM A
LEFT OUTER JOIN B ON B.ref_id = A.id
GIVEN psql result set:
+------+------------+----------+-----------+
| id | prev_id | key | date |
+------------------------------+-----------+
| 1 | | | 20200101 |
| 23 | 1 | | 20200104 |
| 34 | 23 | | 20200112 |
| 41 | 34 | abc | 20200130 |
+------------------------------+-----------+
DESIRED output:
+------+------------+----------+-----------+
| id | prev_id | key | date |
+------------------------------+-----------+
| 1 | | abc | 20200101 |
| 23 | 1 | abc | 20200104 |
| 34 | 23 | abc | 20200112 |
| 41 | 34 | abc | 20200130 |
+------------------------------+-----------+
The rows of the result set are connected by columns 'id' and 'prev_id'.
I want to calculate the "key" column in a reasonable time.
Keep in mind, this is a very simplified example. Normally there are a lot of more rows and different keys and id's
I understand that you want to bring the hierarchy of each row in tableb. Here is one approach using a recursive query:
with recursive cte as (
select a.id, a.prev_id, a.date, b.key
from tablea a
inner join tableb b on b.ref_id = a.id
union all
select a.id, a.prev_id, a.date, c.key
from cte c
inner join tablea a on a.id = c.prev_id
)
select * from cte

How to fill forward time series data in Postgres

I am looking to join three tables together and fill forward null values on the resulting table.
Three tables:
Table 1 (raw.fb_historical_data) - this is the main table on which I would like to join the other two on to. Each row of this table is related to one or more rows in the other two tables through a combination of columns id, clk and timestamp (mkt_id and row_id in the other tables).
+---------------------+-----+-----+--------------+
| timestamp | clk | id | some_columns |
+---------------------+-----+-----+--------------+
| 2016-06-19 06:11:13 | 123 | 126 | a |
| 2016-06-19 06:16:13 | 124 | 127 | b |
| 2016-06-19 06:21:13 | 234 | 126 | c |
| 2016-06-19 06:41:13 | 456 | 127 | d |
| ... | ... | ... | ... |
+---------------------+-----+-----+--------------+
Table 2 (raw.fb_runner_changes) - this table essentially gives price changes for a wide range of different markets
+---------------------+--------+--------+-------+
| timestamp | row_id | mkt_id | price |
+---------------------+--------+--------+-------+
| 2016-06-19 06:11:13 | 123 | 126 | 1 |
| 2016-06-19 06:21:13 | 123 | 126 | 2 |
| 2016-06-19 06:41:13 | 123 | 126 | 3 |
| 2016-06-06 18:54:06 | 124 | 127 | 1 |
| 2016-06-06 18:56:06 | 124 | 127 | 2 |
| 2016-06-06 18:57:06 | 124 | 127 | 3 |
| ... | ... | ... | ... |
+---------------------+--------+--------+-------+
Table 3 (raw.fb_runners) - a table with extra information about market changes that I would like to join
+---------------------+--------+--------+---------------+
| timestamp | row_id | mkt_id | other_columns |
+---------------------+--------+--------+---------------+
| 2016-06-19 06:15:13 | 234 | 126 | ab |
| 2016-06-19 06:31:13 | 234 | 126 | cd |
| 2016-06-19 06:56:13 | 234 | 126 | ef |
| 2016-06-06 18:54:06 | 456 | 127 | gh |
| 2016-06-06 18:56:06 | 456 | 127 | jk |
| 2016-06-06 18:57:06 | 456 | 127 | lm |
| ... | ... | ... | ... |
+---------------------+--------+--------+---------------+
Essentially what I want to do is fill NULL information forward (ordered by timestamp) while grouping by market id.
So far, I have tried to join the tables together using
SELECT *
FROM raw.fb_historical_data AS h
LEFT JOIN raw.fb_runner_changes AS rc
ON rc.row_id = h.clk
AND rc.timestamp = h.timestamp
AND rc.mkt_id = h.id
LEFT JOIN raw.fb_runners AS r
ON r.row_id = h.clk
AND r.timestamp = h.timestamp
AND r.mkt_id = h.id
Which has worked as intended, though now there are nulls in the resulting dataset which i'd like to fill in with the last available value for that market.
With some of the other SQL dialects, fill forward could be done using the window function last_value in combination with the instruction ignore nulls.
Since this is not supported in PostgreSQL (check the note at the bottom of this page), we are using a 2 steps work-around.
select ts, val, val_seq, min(val) over (partition by val_seq) val_fill_fw
from (select ts, val, count(val) over(order by ts) as val_seq
from t
) t
-
+----+----------+---------+-------------+
| ts | val | val_seq | val_fill_fw |
+----+----------+---------+-------------+
| 1 | (null) | 0 | (null) |
| 2 | (null) | 0 | (null) |
| 3 | hello | 1 | hello |
| 4 | (null) | 1 | hello |
| 5 | (null) | 1 | hello |
| 6 | darkness | 2 | darkness |
| 7 | my | 3 | my |
| 8 | (null) | 3 | my |
| 9 | old | 4 | old |
| 10 | (null) | 4 | old |
| 11 | (null) | 4 | old |
| 12 | (null) | 4 | old |
| 13 | friend | 5 | friend |
| 14 | (null) | 5 | friend |
+----+----------+---------+-------------+
SQL Fiddle
This seems to correctly do 'forward fill' in postgres. However I am a postgres newbie so I would appreciate feedback if it's wrong.
DROP TABLE IF EXISTS example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', null),
(1, null, 1),
(2, 'b', 2),
(2,null ,null );
select * from example
select id, (case
when str is null
then lag(str,1) over (order by id)
else str
end) as str,
(case
when val is null
then lag(val,1) over (order by id)
else val
end) as val
from example

db2, roll up unknown number of rows from case statement result

I am trying to write a query where I can concatenate some rows into a single column based on the result of the case statement in DB2 v9.5
The contractId can be a variable number of rows as well.
Given I have the following table structure
Table1
+------------+------------+------+
| ContractId | Reference | Code |
+------------+------------+------+
| 12 | P123456789 | A |
| 12 | A987654321 | B |
| 12 | 9995559971 | C |
| 12 | 3215654778 | D |
| 13 | abcdef | A |
| 15 | asdfa | B |
| 37 | 282jd | B |
| 89 | asdf82 | C |
+------------+------------+------+
I would like to get the output of the result like so
+-------------+-----------------------+------------------------------------+
| ContractId | Reference with Code A | Other References |
+-------------+-----------------------+------------------------------------+
| 12 | P123456789 | A987654321, 9995559971, 3215654778 |
| 13 | abcdef | asdfa, 282jd, asdf82 |
+-------------+-----------------------+------------------------------------+
I've tried queries like
select t1.contract_id,
max(case when t1.code = A then t1.reference end) as "reference with code a",
max(case when t1.code in ('B','C','D') then t1.reference end) as 'other references
from table t1
group by t1.contractId
however, this is still giving me an output like
+-------------+-----------------------+------------------+
| ContractId | Reference with Code A | Other References |
+-------------+-----------------------+------------------+
| 12 | P123456789 | null |
| 12 | null | A987654321 |
| 12 | null | 9995559971 |
| 12 | null | 3215654778 |
+-------------+-----------------------+------------------+
I've also attempted using some of the XML Agg functions but can't seem to get it to format the way I want it too.

Joining two tables and calculating divide-SUM from the resulting table in SQL Server

I have one table that looks like this:
+---------------+---------------+-----------+-------+------+
| id_instrument | id_data_label | Date | Value | Note |
+---------------+---------------+-----------+-------+------+
| 1 | 57 | 1.10.2010 | 200 | NULL |
| 1 | 57 | 2.10.2010 | 190 | NULL |
| 1 | 57 | 3.10.2010 | 202 | NULL |
| | | | | |
+---------------+---------------+-----------+-------+------+
And the other that looks like this:
+----------------+---------------+---------------+--------------+-------+-----------+------+
| id_fundamental | id_instrument | id_data_label | quarter_code | value | AnnDate | Note |
+----------------+---------------+---------------+--------------+-------+-----------+------+
| 1 | 1 | 20 | 20101 | 3 | 28.2.2010 | NULL |
| 2 | 1 | 20 | 20102 | 4 | 1.8.2010 | NULL |
| 3 | 1 | 20 | 20103 | 5 | 2.11.2010 | NULL |
| | | | | | | |
+----------------+---------------+---------------+--------------+-------+-----------+------+
What I would like to do is to merge/join these two tables in one in a way that I get something like this:
+------------+--------------+--------------+----------+--------------+
| Date | Table1.Value | Table2.Value | AnnDate | quarter_code |
+------------+--------------+--------------+----------+--------------+
| 1.10.2010. | 200 | 3 | 1.8.2010 | 20102 |
| 2.10.2010. | 190 | 3 | 1.8.2010 | 20102 |
| 3.10.2010. | 202 | 3 | 1.8.2010 | 20102 |
| | | | | |
+------------+--------------+--------------+----------+--------------+
So the idea is to order them by Date from Table1 and since Table2 Values only change on the change of AnnDate we populate the Resulting table with same values from Table2.
After that I would like to go through the resulting table and create another (Final table) with the following.
On Date 1.10.2010. take last 4 AnnDates (so it would be 1.8.2010. and f.e. 20.3.2010. 30.1.2010. 15.11.2009) and Table2 values on those AnnDate. Make SUM of those 4 values and then divide the Table1 Value with that SUM.
So we would get something like:
+-----------+---------------------------------------------------------------+
| Date | FinalValue |
+-----------+---------------------------------------------------------------+
| 1.10.2010 | 200/(Table2.Value on 1.8.2010+Table2.Value on 20.3.2010 +...) |
| | |
+-----------+---------------------------------------------------------------+
Is there any way this can be done?
EDIT:
Hmm yes now I see that I really didn't do a good job explaining it.
What I wanted to say is
I try INNER JOIN like this:
SELECT TableOne.Date, TableOne.Value, TableTwo.Value, TableTwo.AnnDate, TableTwo.quarter_code
FROM TableOne
INNER JOIN TableTwo ON TableOne.id_intrument=TableTwo.id_instrument WHERE TableOne.id_data_label = somevalue AND TableTwo.id_data_label = somevalue AND date > xxx AND date < yyy
And this inner join returns 2620*40 rows which means for every AnnDate from table2 it returns all Date from table1.
What I want is to return 2620 values with Dates from Table1
Values from table1 on that date and Values from table2 that respond to that period of dates
f.e.
Table1:
+-------+-------+
| Date | Value |
+-------+-------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+-------+-------+
Table2
+-------+---------+
| Value | AnnDate |
+-------+---------+
| x | 1 |
| y | 4 |
+-------+---------+
Resulting table:
+-------+---------+---------+
| Date | ValueT1 | ValueT2 |
+-------+---------+---------+
| 1 | a | x |
| 2 | b | x |
| 3 | c | x |
| 4 | d | y |
+-------+---------+---------+
You need a JOIN statement for your first query. Try:
SELECT TableOne.Date, TableOne.Value, TableTwo.Value, TableTwo.AnnDate, TableTwo.quarter_code FROM TableOne
INNER JOIN TableTwo
ON TableOne.id_intrument=TableTwo.id_instrument;