Recursive calculation in SQL (Oracle) - sql

I'm having a tough time finding a solution to ETL some data into my resulting table. I think I cannot accomplish this using pure SQL and need to use PL-SQL due to the looping. Could the sql gurus help me go towards the right direction or provide some pointers to solve this problem?
Here's the scenario:
Tables: TABLEA and TABLEB.
Steps:
Group records in TABLEA by A_CD and SUM the A_AMT FIELD. (Lets assume A_FLAG is always same for any A_CD.). Lets call the grouped resultset as TABLEA_GRP (This is not a table, it is a grouped query).
Pick a row from TABLEB and if B_FLG is 'N' then pick all rows in TABLEA_GRP where A_FLG is 'N'. If the B_FLG is 'Y' then pick all rows in TABLEA_GRP.
Starting first record of rows picked in step 2, calculate the ratio of its TOTAL_AMT to SUM of ALL TOTAL_AMT for the selected rows. Multiply the ratio to B_AMT and add resulting amount to the rows TOTAL_AMT and store in RESULTING_AMT. Repeat this calculation for all rows picked in step 2.
Repeat step 2 and 3, now using the starting TOTAL_AMT VALUE from the RESULTING_AMT value from previous calculation of the same A_CD.
RESULTING _RATIO field is not needed to be saved, it is just given for demo purpose. How would you do this?
Basically I want to get data in RESULTING_TABLE from TABLEA and TABLEB
Could anyone help? Thanks a lot in advance for any guidance.
EDIT: I added A_DATE and B_DATE for supporting join between the two tables. For simplicity you can just do A.A_DATE = B.B_DATE, example this basic join:
SELECT
A.A_CD,
SUM(A.A_AMT) AS TOTAL_AMT,
A.A_FLAG,
A.A_DATE,
B.B_ID,
B.B_AMT,
B.B_FLAG
FROM
TABLEA A
JOIN TABLEB B
ON A.A_DATE = B.B_DATE
GROUP BY
A.A_CD,
A.A_FLAG,
A.A_DATE,
B.B_ID,
B.B_AMT,
B.B_FLAG
;

Okay I think I've got the solution. The numbers are a bit different to yours, but I'm fairly sure mine is doing what you want. We can do everything in steps 1 & 2 using a single query (main_sql). 3 and 4 have to be done using a recursive statement (recur_sql).
with main_sql as (
select a.*,
b.*,
sum(a_amt) over (partition by b_id) as cd_amt,
rank() over (partition by a_cd order by b_id) as rnk
from (select a_cd, a_flag, sum(a_amt) as a_amt
from tablea
group by a_cd, a_flag) a,
tableb b
where a.a_flag = case when b.b_flag = 'Y' then a.a_flag else b.b_flag end
order by b_id, a_cd
),
recur_sql (a_cd, b_id, total_amt, cd_amt, resulting_ratio, resulting_amt, rnk) as (
select m.a_cd,
m.b_id,
m.a_amt as total_amt,
m.cd_amt, m.a_amt / m.cd_amt as resulting_ratio,
m.a_amt + (m.a_amt / m.cd_amt * m.b_amt) as resulting_amt,
rnk
from main_sql m
where rnk = 1
union all
select m.a_cd,
m.b_id,
r.resulting_amt as total_amt,
m.cd_amt,
r.resulting_amt / m.cd_amt as resulting_ratio,
r.resulting_amt + (r.resulting_amt / m.cd_amt * m.b_amt) as resulting_amt,
m.rnk
from recur_sql r,
main_sql m
where m.rnk > 1
and r.a_cd = m.a_cd
and m.rnk - 1 = r.rnk
)
select a_cd, b_id, total_amt, resulting_ratio, resulting_amt
from recur_sql
order by 2, 1

Related

SQL query to join smae table multiple times

I have a scenario to join the same table multiple times to get the desired output. For ex I have two tables TABLE A and TABLE B.
Step 1: I want to take the all the parties from TABLE A which have
lowest Idate. Lowest idate will be fetched based partyid and idate
column.
Step 2: Then based on CID which is fetched from TABLE A in step 1,
we need to fetch the corresponding MID from TABLE B which have
MIDTYPE=130300.
Step 3: Then based on the MID fetched in step 2 we need to traverse
the same table and find out the latest record for the same MID based
on idate in TABLE B and fetch the corresponding CID for the MID.
Step 4: Now for that CID we need to fetch MID value for MIDTYPE
130307 in the same table(TABLEB). And my final output should be combination of MID
which we fetched for step 3 and MID fetched for 130307 in step 4.
I write a query like this ..but its taking lot of time for the query to run as we are going through the same table(TABLEB) multiple times and TABLEB have millions of rows. Is there anyway we can rewrite this query in different way. Could some one can help with this me.
SELECT
ident.mid mid1,
b.mid mid2
FROM
(
SELECT
*
FROM
tableb
WHERE
midtype = 130307
) ident
INNER JOIN (
SELECT
s.cid,
s.mid,
s.midtype
FROM
(
SELECT
cid,
partyid,
admin_sys_tp_cd,
mid,
ilast
FROM
(
SELECT
cq.cid,
RANK() OVER(
PARTITION BY cq.partyid
ORDER BY
cq.idate ASC
) rnk,
cq.idate,
cq.partyid,
i.mid,
i.idate AS ilast
FROM
tablea cq
INNER JOIN tableb i ON cq.cid = i.cid
INNER JOIN tablec ON i.cid = c.cid
WHERE
i.midtype = 130300
)
WHERE
rnk = 1
) a
INNER JOIN (
SELECT
*
FROM
(
SELECT
cid,
mid,
midtype,
RANK() OVER(
PARTITION BY mid
ORDER BY
idate DESC
) rnk_mpid
FROM
tableb
)
WHERE
rnk_mpid = 1
) s ON a.mid = s.mid
AND s.midtype = 130300
) b ON ident.cid = b.cid
AND ident.midtype = 130307
not what you asked, but before others and I, spent time trying to get different approaches for you, let's make sure the basics are covered.
No matter how different you can write an SQL query, they will never perform fast, in a MILLION base table if you don't have the proper indexes for it. Specially in your case, as you have to access it 3 times at least.
Just by looking at your detailed steps. I would say that you should have at least 3 different indexes created to support this query.
TableA_Index1 ( PARTYID, LDATE, INCLUDES CID)
TableB_Index1 (CID, MIDTYPE, INCLUDES MID )
TableB_Index2 (MID, LDATE, INCLUDES CID )
Do you have them ?
Have you ever tried to run this query on db2-advisor (db2advis) to get recommended indexes for it ?

Recursive subtraction from two separate tables to fill in historical data

I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally - I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, until the minimum date of the follower_gains table, to fill in the follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS paid_follower_gain,
null AS organic_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through in SQL because its not a procedural language. The operation you define in the query is performed for all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account.id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
You could do something like this, instead of using the variable you can just wrap it another bracket and write at end ) AS FollowerGrowth
DECLARE #FollowerGrowth INT =
( SELECT total_followers_count
FROM follower_gains
WHERE AccountID = xx )
-
( SELECT TOP 1 follower_count
FROM follower_counts
WHERE AccountID = xx
ORDER BY date ASCENDING )

Find duplicates in MS SQL table

I know that this question has been asked several times but I still cannot figure out why my query is returning values which are not duplicates. I want my query to return only the records which have identical value in the column Credit. The query executes without any errors but values which are not duplicated are also being returned. This is my query:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
That's because you're matching on the credit field back to your table, which contains duplicates. You need to isolate the rows that are duplicated with ROW_NUMBER:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CREDIT ORDER BY (SELECT NULL)) AS RN
FROM _bvGLTransactionsFull)
Select
CTE.AccountDesc,
_bvGLAccountsFinancial.Description,
CTE.TxDate,
CTE.Description,
CTE.Credit,
CTE.Reference,
CTE.UserName
From
_bvGLAccountsFinancial Inner Join
CTE On _bvGLAccountsFinancial.AccountLink = CTE.AccountLink
WHERE CTE.RN > 1
Group By
CTE.AccountDesc, _bvGLAccountsFinancial.Description,
CTE.TxDate, CTE.Description,
CTE.Credit, CTE.Reference,
CTE.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(CTE.Reference), CTE.TrCode
Having
CTE.TxDate > 01 / 11 / 2014 And
CTE.Reference Like '5_____' And
CTE.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Just as a side note, I would consider using aliases to shorten your queries and make them more readable. Prefixing the table name before each column in a join is very difficult to read.
I trust your code in terms of extracting all data per your criteria. With this, let me have a different approach and see your script "as-is". So then, lets keep first all the records in a temp.
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
-- temp table
INTO #tmpTable
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > 01 / 11 / 2014 And
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Then remove the "single occurrence" data by creating a row index and remove all those 1 time indexes.
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
You can change your preference through PARTITION BY <column name>.
Should you have any concerns, please raise it first as these are so far how I understood your case.
EDIT : To include those credits that has duplicates.
SELECT
tmp1.*
FROM #tmpTable tmp1
RIGHT JOIN (
SELECT
Credit
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
) AS tmp2
ON tmp1.Credit = tmp2.Credit

Compare and find differences in two tables in Oracle

I have 2 tables:
account: ID, ACC, AE_CCY, DRCR_IND, AMOUNT, MODULE
flex: ID, ACC, AE_CCY, DRCR_IND, AMOUNT, MODULE
I want to show differences comparing only by: AE_CCY, DRCR_IND, AMOUNT, MODULE and ACC by first 4 characters
Example:
ID ACC AE_CCY DRCR_IND AMOUNT MODULE
-- --------- ------ -------- ------ ------
1 734647674 USD D 100 OP
and in flex:
ID ACC AE_CCY DRCR_IND AMOUNT MODULE
-- --------- ------ -------- ------ ------
1 734647654 USD D 100 OP
2 734665474 USD D 100 OP
9 734611111 USD D 100 OP
ID's 2 and 9 should be shown as differences.
If I use FULL JOIN I'll get no differences as substr(account.ACC,1,4) = substr(flex.ACC,1,4) are equal and others are equal and MINUS doesn't work because ID's different.
Do you mean you want to group by the first 4 characters of ACC, then diff them?
And, if not, why is Flex:ID=1 NOT a difference to account:ID=1, if ID=2 and ID=9 are, especially since it reads that ID is not a comparison field?
a brute-force set theory answer:
SELECT * FROM ID
UNION
SELECT * FROM FLEX
MINUS
(SELECT * FROM ID
INTERSECT
SELECT * FROM FLEX)
I think what you want is the full join with an additional condition. Something like:
select F.ID, F.AE_CCY, F.DRCR_IND, F.AMOUNT, F.MODULE, F.ACC
from account a join flex f
on substr(a.ACC,1,4) = substr(f.ACC,1,4)
where a.AE_CCY <> f.AE_CCY
or a.DRCR_IND <> f.DRCR_IND
or a.AMOUNT <> f.AMOUNT
or a.MODULE <> f.MODULE
or a.ACC <> f.ACC
This way, the join is still performed on the first 4 characters, but the where condition checks the entire field (as well as the other four).
Revised solution: This is something of a stab-in-the-dark, by I'm wondering if what you're really looking for is a list of records that don't have a match in the other table. In that case, a full outer join might be the answer:
select coalesce(F.ID,a.ID) as ID,
coalesce(F.AE_CCY,a.AE_CCY) as AE_CCY,
coalesce(F.DRCR_IND,a.DRCR_IND) as DRCR_IND,
coalesce(F.AMOUNT,a.AMOUNT) as AMOUNT,
coalesce(F.MODULE,a.MODULE) as MODULE,
coalesce(F.ACC,a.ACC) as ACC
from account a full outer join flex f
on substr(a.ACC,1,4) = substr(f.ACC,1,4)
and a.AE_CCY = f.AE_CCY
and a.DRCR_IND = f.DRCR_IND
and a.AMOUNT = f.AMOUNT
and a.MODULE = f.MODULE
where a.id is null
or f.id is null
Third attempted solution: Thinking about it further, I think you're saying that you want each record from the first table to match to exactly one record in the second table (and vice-versa). That's a difficult problem because relational databases aren't really design work that way.
The solution below uses the full outer join again, to get only rows that don't appear in the other table. This time, we're adding ROW_NUMBER to assign a unique number to each member of a set of duplicate values found in either table. In the example from your comment, with 5 identical rows in one table and 1 of the same row in another, the first table will be numbered 1-5 and the second will be 1. Therefore, by adding that as a join condition, we assure that each row has only one match. The one flaw in this design is that a perfect match on ACC is not guaranteed to take precedence over another value. Making that work would be quite a bit more difficult.
select coalesce(F.ID,a.ID) as ID,
coalesce(F.AE_CCY,a.AE_CCY) as AE_CCY,
coalesce(F.DRCR_IND,a.DRCR_IND) as DRCR_IND,
coalesce(F.AMOUNT,a.AMOUNT) as AMOUNT,
coalesce(F.MODULE,a.MODULE) as MODULE,
coalesce(F.ACC,a.ACC) as ACC
from (select a.*,
row_number()
over (partition by AE_CCY,DRCR_IND,AMOUNT,MODULE,substr(ACC,1,4)
order by acc) as rn
from account a) a
full outer join
(select f.*,
row_number()
over (partition by AE_CCY,DRCR_IND,AMOUNT,MODULE,substr(ACC,1,4)
order by acc) as rn
from flex f) f
on substr(a.ACC,1,4) = substr(f.ACC,1,4)
and a.AE_CCY = f.AE_CCY
and a.DRCR_IND = f.DRCR_IND
and a.AMOUNT = f.AMOUNT
and a.MODULE = f.MODULE
and a.RN = f.RN
where a.id is null
or f.id is null
I like to use:
SELECT min(which) which, id, ae_ccy, drcr_ind, amount, module, acc
FROM (SELECT DISTINCT 'account' which, id, ae_ccy, drcr_ind, amount, module,
substr(acc, 1, 4) acc
FROM ACCOUNT
UNION ALL
SELECT DISTINCT 'flex' which, id, ae_ccy, drcr_ind, amount, module,
substr(acc, 1, 4) acc
FROM flex)
GROUP BY id, ae_ccy, drcr_ind, amount, module, acc
HAVING COUNT(*) != 2
ORDER BY id, 1
It will show both the new rows, the old missing rows and any difference.

Selecting a single (random) row for an SQL join

I've got an sql query that selects data from several tables, but I only want to match a single(randomly selected) row from another table.
Easier to show some code, I guess ;)
Table K is (k_id, selected)
Table C is (c_id, image)
Table S is (c_id, date)
Table M is (c_id, k_id, score)
All ID-columns are primary keys, with appropriate FK constraints.
What I want, in english, is for eack row in K that has selected = 1 to get a random row from C where there exists a row in M with (K_id, C_id), where the score is higher than a given value, and where c.image is not null and there is a row in s with c_id
Something like:
select k.k_id, c.c_id, m.score
from k,c,m,s
where k.selected = 1
and m.score > some_value
and m.k_id = k.k_id
and m.c_id = c.c_id
and c.image is not null
and s.c_id = c.c_id;
The only problem is this returns all the rows in C that match the criteria - I only want one...
I can see how to do it using PL/SQL to select all relevent rows into a collection and then select a random one, but I'm stuck as to how to select a random one.
you can use the 'order by dbms_random.random' instruction with your query.
i.e.:
SELECT column FROM
(
SELECT column FROM table
ORDER BY dbms_random.value
)
WHERE rownum = 1
References:
http://awads.net/wp/2005/08/09/order-by-no-order/
http://www.petefreitag.com/item/466.cfm
with analytics:
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score,
row_number() over(PARTITION BY k.k_id ORDER BY NULL) rk
FROM k, c, m, s
WHERE k.selected = 1
AND m.score > some_value
AND m.k_id = k.k_id
AND m.c_id = c.c_id
AND c.image IS NOT NULL
AND s.c_id = c.c_id)
WHERE rk = 1
This will select one row that satisfies your criteria per k_id. This will likely select the same set of rows if you run the query several times. If you want more randomness (each run produces a different set of rows), you would replace ORDER BY NULL by ORDER BY dbms_random.value
I'm not too familiar with oracle SQL, but try using LIMIT random(), if there is such a function available.