Join 2 tables on multiple case conditions

I am using pgAdmin on a Postgres db. I am trying to achieve the following result (amounts are random):
To do that, I need to query two tables: accounts and transactions.
I am not sure how to get the sum(amount) results into one column. I have tried the following:
select SUM(
CASE WHEN debit_account_id = 1 then amount
when credit_account_id = 1 then amount * (-1) else 0 end),
SUM(
CASE WHEN debit_account_id = 2 then amount
when credit_account_id = 2 then amount * (-1) else 0 end)
from transactions
where entity_id = 1
and so on up to account_id 6. This gives me the correct sum for each account, but each result is in a new column. How can I combine this so the results look like the example above?

You can use UNION ALL.
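select account_id, sum(amount) as balance
from (select debit_account_id as account_id, amount from transactions
union all
select credit_account_id, -amount from transactions) t
group by account_id;
Now the data for every account is in one column, and GROUP BY account_id gives one balance per account, with debits positive and credits negative as in your CASE expressions.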
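A minimal runnable sketch of the UNION ALL approach plus the GROUP BY step, using SQLite through Python's sqlite3 module as a stand-in for Postgres. The rows are hypothetical; the column names follow the question, and debits are signed positive, credits negative.

```python
import sqlite3

# Hypothetical sample data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    debit_account_id INTEGER,
    credit_account_id INTEGER,
    amount INTEGER
);
INSERT INTO transactions (debit_account_id, credit_account_id, amount) VALUES
    (1, 2, 100),
    (2, 3, 40),
    (1, 3, 25);
""")

# Unpivot: each transaction becomes a debit row (+amount) and a credit row (-amount),
# so all account ids land in one column that GROUP BY can aggregate.
rows = conn.execute("""
    SELECT account_id, SUM(amount) AS balance
    FROM (SELECT debit_account_id AS account_id, amount FROM transactions
          UNION ALL
          SELECT credit_account_id, -amount FROM transactions) AS t
    GROUP BY account_id
    ORDER BY account_id
""").fetchall()

for account_id, balance in rows:
    print(account_id, balance)
```

UNION ALL (not UNION) matters here: UNION would silently drop two identical legs, such as two equal payments into the same account.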

I'd sum the debits and the credits for each account in separate subqueries and join them to the accounts table. Use LEFT JOIN so that accounts with only debits, only credits, or no transactions are kept:
SELECT a.account_name, COALESCE(d.sum_debit, 0) - COALESCE(c.sum_credit, 0) AS balance
FROM accounts a
LEFT JOIN (SELECT credit_account_id, SUM(amount) AS sum_credit
FROM transactions
GROUP BY credit_account_id) c ON a.id = c.credit_account_id
LEFT JOIN (SELECT debit_account_id, SUM(amount) AS sum_debit
FROM transactions
GROUP BY debit_account_id) d ON a.id = d.debit_account_id;
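A sketch comparing LEFT JOIN and plain JOIN for the two-aggregate approach, using SQLite via Python's sqlite3 with hypothetical accounts and rows. It shows why the outer join matters: with inner joins, any account missing from either aggregate disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_name TEXT);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    debit_account_id INTEGER,
    credit_account_id INTEGER,
    amount INTEGER
);
INSERT INTO accounts VALUES (1, 'Cash'), (2, 'Sales'), (3, 'Bank'), (4, 'Unused');
INSERT INTO transactions (debit_account_id, credit_account_id, amount) VALUES
    (1, 2, 100), (2, 3, 40), (1, 3, 25);
""")

# {join} is filled in below so the same query can be run with both join types.
QUERY = """
    SELECT a.account_name,
           COALESCE(d.sum_debit, 0) - COALESCE(c.sum_credit, 0) AS balance
    FROM accounts a
    {join} (SELECT credit_account_id, SUM(amount) AS sum_credit
            FROM transactions GROUP BY credit_account_id) c
         ON a.id = c.credit_account_id
    {join} (SELECT debit_account_id, SUM(amount) AS sum_debit
            FROM transactions GROUP BY debit_account_id) d
         ON a.id = d.debit_account_id
    ORDER BY a.id
"""

# LEFT JOIN keeps accounts with only debits, only credits, or no transactions.
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
# Plain JOIN keeps only accounts that appear on both the debit and credit side.
inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()

print(left_rows)
print(inner_rows)
```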

I would recommend a lateral join for this:
select a.account_name,
sum(v.signed_amount) as total_amount
from transactions t cross join lateral
(values (t.debit_account_id, t.amount),
(t.credit_account_id, - t.amount)
) v(account_id, signed_amount) join
accounts a
on a.id = v.account_id
group by a.account_name;
I don't see entity_id in any of the tables, so I don't know where that comes from.

Related

Cross Selling Matrix - In Snowflake

I am trying to build a cross-selling matrix, pivoted as shown below, where X is the percentage frequency with which a product appears in a basket with the other product:
I need to pivot this data in Excel or another tool afterwards, so I assume the Snowflake query needs to output a tabular dataset ready for pivoting, and I am struggling with its logic.
This is what I have so far:
SELECT FCT.TRANSACTION_ID,
PRD.PRODUCT_TYPE,
COUNT(DISTINCT FCT.PRODUCT_ID),
COUNT(DISTINCT FCT1.PRODUCT_ID)
FROM TRANSACTION_ORDERS FCT
INNER JOIN DIM_PRODUCT PRD ON FCT.PRODUCT_ID = PRD.PRODUCT_ID
LEFT JOIN FACT_TRANSACTION_ORDERS FCT1 ON FCT.TRANSACTION_ID = FCT1.TRANSACTION_ID
AND FCT.PRODUCT_ID != FCT1.PRODUCT_ID
GROUP BY FCT.TRANSACTION_ID, FCT.PRODUCT_ID, FCT1.PRODUCT_ID
Is the join even correct, or should I be doing a cross join? Also, how do I capture the percentage frequency of both products appearing in the same basket?
Many thanks!
EDIT: I am trying to capture the frequency of different product types appearing in the same basket.
The values are the same for combinations in both directions: row ProductType1, column ProductType2 has the same value as row ProductType2, column ProductType1.
In a basket cross-analysis they should differ by direction. In other words, baskets with ProductType1 may contain ProductType2 X% of the time, but baskets with ProductType2 may contain ProductType1 Y% of the time.

You want a self join. I would expect the products to be in the same order, but you seem to be using the same transaction. In any case, this is the structure of the query:
WITH TP AS (
SELECT T.*, P.PRODUCT_TYPE
FROM TRANSACTION_ORDERS T JOIN
DIM_PRODUCT P
ON T.PRODUCT_ID = P.PRODUCT_ID
)
SELECT TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE,
COUNT(DISTINCT TP.TRANSACTION_ID) as NUM_ORDERS
FROM TP JOIN
TP TP2
ON TP2.TRANSACTION_ID = TP.TRANSACTION_ID
GROUP BY TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE;
If this were per order, you would just change the ON clause in the outer query to use the order id.
Note that this uses COUNT(DISTINCT) rather than COUNT(*) because a transaction/order could have multiple products of the same type. Presumably, you want that counted only once.
EDIT:
If you want to divide by the number of transactions that have either product type (which makes sense to me), then I would approach this as:
WITH TP AS (
SELECT DISTINCT T.TRANSACTION_ID, P.PRODUCT_TYPE
FROM TRANSACTION_ORDERS T JOIN
DIM_PRODUCT P
ON T.PRODUCT_ID = P.PRODUCT_ID
)
SELECT TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE,
COUNT(*) as NUM_ORDERS,
( MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP.PRODUCT_TYPE) +
MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP2.PRODUCT_TYPE) -
COUNT(*)
) as Num_Orders_Either,
( COUNT(*) * 1.0 /
( MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP.PRODUCT_TYPE) +
MAX(CASE WHEN TP.PRODUCT_TYPE = TP2.PRODUCT_TYPE THEN COUNT(*) END) OVER (PARTITION BY TP2.PRODUCT_TYPE) -
COUNT(*)
)) as ratio
FROM TP JOIN
TP TP2
ON TP2.TRANSACTION_ID = TP.TRANSACTION_ID
GROUP BY TP.PRODUCT_TYPE, TP2.PRODUCT_TYPE;
This calculates the number of orders containing either product type as the orders with the first type plus the orders with the second type, minus the orders with both.
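The asker's EDIT asks for a directional figure (the share of baskets with type 1 that also contain type 2), while the query above divides by baskets with either type, which is symmetric. A minimal sketch of the directional variant, swapping in a different denominator: the number of baskets containing the row's product type. It runs on SQLite through Python's sqlite3 as a stand-in for Snowflake; the tp table mirrors the answer's TP CTE (distinct transaction/product type pairs), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical basket data, already reduced to distinct (transaction, product type)
# pairs like the TP CTE in the answer.
conn.executescript("""
CREATE TABLE tp (transaction_id INTEGER, product_type TEXT);
INSERT INTO tp VALUES
    (1, 'A'), (1, 'B'),
    (2, 'A'),
    (3, 'A'), (3, 'B'),
    (4, 'B'),
    (5, 'A'), (5, 'C');
""")

rows = conn.execute("""
    WITH type_counts AS (          -- baskets containing each product type
        SELECT product_type, COUNT(*) AS n_baskets
        FROM tp GROUP BY product_type
    ),
    pair_counts AS (               -- baskets containing both types (self join)
        SELECT a.product_type AS type_1, b.product_type AS type_2,
               COUNT(*) AS n_both
        FROM tp a JOIN tp b ON a.transaction_id = b.transaction_id
        GROUP BY a.product_type, b.product_type
    )
    SELECT p.type_1, p.type_2, p.n_both, tc.n_baskets,
           ROUND(100.0 * p.n_both / tc.n_baskets, 1) AS pct
    FROM pair_counts p
    JOIN type_counts tc ON tc.product_type = p.type_1
    WHERE p.type_1 <> p.type_2
    ORDER BY p.type_1, p.type_2
""").fetchall()

for row in rows:
    print(row)
```

With this data, 50% of A baskets contain B, but about 66.7% of B baskets contain A, which is the asymmetry the EDIT describes. The output is already long-format (type_1, type_2, pct), ready for pivoting in Excel.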

SQL joined tables are causing duplicates

Table A is an overall table of policy_id information, while table B contains policy_ids with claims attached. Not all of the IDs in A exist in B, but I want to join the two tables and sum the total claims.
The issue is that the sum is much higher than the actual sum within the table itself.
Here is what I've tried so far:
select a.policy_id, coalesce(sum(b.claim_amt), 0)
from database.table1 as a
left join database2.table2 as b on a.policy_id = b.policy_id
where product_code = 'CI'
group by a.policy_id
The IDs that don't exist in B show up fine with a 0 next to them, but for the ones that do exist, the claim_amt values seem to be duplicated heavily in the sum.

I suspect the policy_id values in table1 are not unique, which leads to the doubled, tripled, etc. amounts.
You could aggregate the sums from table2 in a CTE to get around this.
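WITH CTE AS (
SELECT
policy_id,
sum(claim_amt) as sum_amt
FROM database2.table2
group by policy_id
)
-- DISTINCT collapses repeated policy_id rows from table1; COALESCE gives 0 for policies with no claims
select distinct a.policy_id, coalesce(b.sum_amt, 0) as sum_amt
from database.table1 as a
left join CTE as b on a.policy_id = b.policy_id
where product_code = 'CI'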
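A runnable sketch that reproduces the inflated sum and the pre-aggregation fix, using SQLite via Python's sqlite3. The tables table1 and table2 are hypothetical stand-ins for database.table1 and database2.table2, with policy 1 deliberately repeated in table1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins: policy 1 appears twice in table1, which inflates the naive sum.
conn.executescript("""
CREATE TABLE table1 (policy_id INTEGER, product_code TEXT);
CREATE TABLE table2 (policy_id INTEGER, claim_amt INTEGER);
INSERT INTO table1 VALUES (1, 'CI'), (1, 'CI'), (2, 'CI'), (3, 'CI');
INSERT INTO table2 VALUES (1, 100), (1, 50), (2, 30);
""")

# Each table1 row for policy 1 matches both claims: 2 x 2 = 4 joined rows.
naive = conn.execute("""
    SELECT a.policy_id, COALESCE(SUM(b.claim_amt), 0)
    FROM table1 AS a
    LEFT JOIN table2 AS b ON a.policy_id = b.policy_id
    WHERE a.product_code = 'CI'
    GROUP BY a.policy_id
    ORDER BY a.policy_id
""").fetchall()

# Aggregate table2 first, so each policy_id contributes exactly one row to the join.
fixed = conn.execute("""
    WITH cte AS (
        SELECT policy_id, SUM(claim_amt) AS sum_amt
        FROM table2
        GROUP BY policy_id
    )
    SELECT DISTINCT a.policy_id, COALESCE(b.sum_amt, 0) AS sum_amt
    FROM table1 AS a
    LEFT JOIN cte AS b ON a.policy_id = b.policy_id
    WHERE a.product_code = 'CI'
    ORDER BY a.policy_id
""").fetchall()

print(naive)
print(fixed)
```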

Select first row from join with grouped data

I have two separate queries which I am trying to join efficiently.
Query 1:
Select Id From Accounts Where Status='OPEN' and Product = 'Product A'
Query 2:
Select AccountId From Transactions Group By AccountId Having Count(*) > 20;
The transactions table holds millions of rows.
What I want to achieve is to return the first account with status OPEN and product A that has more than 20 transactions.
So far I have this, but it is not very efficient because of a full table scan of the transactions table:
Select A.Id From Accounts A
Left Outer Join (Select AccountId From Transactions Group By AccountId Having Count(*) > 20 ) T on A.Id=T.AccountId
Where A.Status='OPEN' and A.Product = 'Product A'
And rownum = 1
How do I optimize this query?

Would it be better to filter to the specific accounts first and then group?
Select AccountId From Transactions where AccountId in (Select id From Accounts Where Status='OPEN' and Product = 'Product A') Group By AccountId Having Count(*) > 20;
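A sketch of the filter-then-group query, plus an index on the transactions' account column, using SQLite via Python's sqlite3 with hypothetical data. SQLite has no Oracle ROWNUM, so LIMIT 1 stands in for it, and "first" is assumed to mean the lowest account id. The index name ix_transactions_account is made up; whether the planner actually uses such an index instead of scanning depends on the database and its statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (Id INTEGER PRIMARY KEY, Status TEXT, Product TEXT);
CREATE TABLE Transactions (Id INTEGER PRIMARY KEY, AccountId INTEGER);
-- Hypothetical index so per-account counts can be read from the index
-- for just the candidate accounts.
CREATE INDEX ix_transactions_account ON Transactions (AccountId);
INSERT INTO Accounts VALUES
    (1, 'OPEN', 'Product A'), (2, 'CLOSED', 'Product A'),
    (3, 'OPEN', 'Product B'), (4, 'OPEN', 'Product A'), (5, 'OPEN', 'Product A');
""")
# Hypothetical transaction counts per account.
counts = {1: 5, 2: 25, 3: 30, 4: 21, 5: 40}
conn.executemany(
    "INSERT INTO Transactions (AccountId) VALUES (?)",
    [(acct,) for acct, n in counts.items() for _ in range(n)],
)

QUERY = """
    SELECT AccountId
    FROM Transactions
    WHERE AccountId IN (SELECT Id FROM Accounts
                        WHERE Status = 'OPEN' AND Product = 'Product A')
    GROUP BY AccountId
    HAVING COUNT(*) > 20
    ORDER BY AccountId
"""
qualifying = [r[0] for r in conn.execute(QUERY)]
first = conn.execute(QUERY + " LIMIT 1").fetchone()[0]
print(qualifying, first)
```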

Select sum and inner join

I have two tables
Bills: id amount reference
Transactions: id reference amount
The following SQL query
SELECT
*,
(SELECT SUM(amount)
FROM transactions
WHERE transactions.reference = bills.reference) AS paid
FROM bills
GROUP BY id HAVING paid<amount
was meant to select some rows from the bills table, adding a column paid with the sum of the amounts of the related transactions.
However, it only works when there is at least one transaction for each bill. Otherwise, no row is returned for a bill without transactions.
Probably, that's because I should have done an inner join!
So I tried the following:
SELECT
*,
(SELECT SUM(transactions.amount)
FROM transactions
INNER JOIN bills ON transactions.reference = bills.reference) AS paid
FROM bills
GROUP BY id
HAVING paid < amount
However, this returns the same value of paid for all rows! What am I doing wrong?

Use a left join instead of a subquery:
select b.id, b.amount, sum(t.amount) as transactionamount
from bills b
left join transactions t on t.reference = b.reference
group by b.id, b.amount
having sum(t.amount) < b.amount
Edit:
To compare the sum of transactions to the amount, handle the NULL that you get when there are no transactions (COALESCE is the portable form of SQL Server's two-argument ISNULL):
having coalesce(sum(t.amount), 0) < b.amount
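A runnable check of the left-join answer with the NULL handling in HAVING, using SQLite via Python's sqlite3 and hypothetical bills. It uses COALESCE, the portable equivalent of the two-argument ISNULL from SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: bill 1 is fully paid, bill 2 partly paid, bill 3 has no transactions.
conn.executescript("""
CREATE TABLE bills (id INTEGER PRIMARY KEY, amount INTEGER, reference TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, reference TEXT, amount INTEGER);
INSERT INTO bills VALUES (1, 100, 'R1'), (2, 50, 'R2'), (3, 80, 'R3');
INSERT INTO transactions (reference, amount) VALUES ('R1', 60), ('R1', 40), ('R2', 20);
""")

rows = conn.execute("""
    SELECT b.id, b.amount, COALESCE(SUM(t.amount), 0) AS paid
    FROM bills b
    LEFT JOIN transactions t ON t.reference = b.reference
    GROUP BY b.id, b.amount
    -- Without COALESCE, SUM over zero matched rows is NULL, and NULL < amount
    -- is never true, so bill 3 would be filtered out.
    HAVING COALESCE(SUM(t.amount), 0) < b.amount
    ORDER BY b.id
""").fetchall()
print(rows)
```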

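The inner join inside your subquery disconnects it from the outer bills row, so it sums every matched transaction and returns the same paid value for every bill. Keep the subquery correlated instead, use COALESCE so a bill without transactions gets 0 rather than NULL, and wrap it in a derived table so the filter can use the paid alias.
EDIT
So the final query will be
SELECT *
FROM (
SELECT
bills.*,
COALESCE((SELECT SUM(transactions.amount)
FROM transactions
WHERE transactions.reference = bills.reference), 0) AS paid
FROM bills
) AS b
WHERE paid < amount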

I know this thread is old, but I came here today because I encountered the same problem.
Please see another post with the same question:
Sum on a left join SQL
As the answer says, use GROUP BY on the left table. This way you get all the records from the left table, and it sums the corresponding rows from the right table.
Try this:
SELECT
bills.*,
COALESCE(SUM(transactions.amount), 0) AS paid
FROM
bills
LEFT JOIN
transactions
ON
bills.reference = transactions.reference
AND transactions.amount > 0
GROUP BY
bills.id

Multiple COUNT in sql issue

I have a slightly weird issue.
I need to select two counts, likes and collects, in one query, but when I add the second join, instead of 2 likes and 10 collects I get 10 likes and 10 collects.
What am I doing wrong here?
select COUNT(il.ItemLikeId) as a, COUNT(iip.PacketId) as b
from Items i
left join ItemLikes il
on il.ItemId = i.ItemId
left join ItemsInPackets iip
on iip.ItemId = i.ItemId
where i.ItemId = 14591

Try SELECT COUNT(DISTINCT il.ItemLikeId) AS a, COUNT(DISTINCT iip.PacketId) AS b.
Your join gives you ten rows, so you have ten IDs from each table. However, not all of the IDs are unique. You're looking for unique IDs.

COUNT(column) returns the number of rows in which that column is not NULL. It does not return the number of distinct values, so every row the joins repeat is counted again.
To count the rows with a value explicitly (this is equivalent to COUNT(column), so it still counts the repeated rows):
select SUM(CASE WHEN il.ItemLikeId IS NOT NULL THEN 1 ELSE 0 END) as a,
SUM(CASE WHEN iip.PacketId IS NOT NULL THEN 1 ELSE 0 END) as b
To get the number of distinct values, do what zimdanen suggested and use COUNT(DISTINCT):
select COUNT(DISTINCT il.ItemLikeId) as a, COUNT(DISTINCT iip.PacketId) as b
Another approach, if you only use ItemLikes and ItemsInPackets for the counts, is to use correlated subqueries:
select
(
SELECT COUNT(ItemLikeId)
FROM ItemLikes
WHERE ItemId = i.ItemId
) as a,
(
SELECT COUNT(PacketId)
FROM ItemsInPackets
WHERE ItemId = i.ItemId
) as b
from Items i
where i.ItemId = 14591
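A runnable sketch of the fan-out and both fixes, using SQLite via Python's sqlite3. The data is hypothetical: item 14591 gets 2 likes and 5 packets, so the two independent one-to-many joins produce 2 x 5 = 10 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: item 14591 has 2 likes and 5 packets.
conn.executescript("""
CREATE TABLE Items (ItemId INTEGER PRIMARY KEY);
CREATE TABLE ItemLikes (ItemLikeId INTEGER PRIMARY KEY, ItemId INTEGER);
CREATE TABLE ItemsInPackets (PacketId INTEGER, ItemId INTEGER);
INSERT INTO Items VALUES (14591);
INSERT INTO ItemLikes VALUES (1, 14591), (2, 14591);
INSERT INTO ItemsInPackets VALUES
    (10, 14591), (11, 14591), (12, 14591), (13, 14591), (14, 14591);
""")

JOINS = """
    FROM Items i
    LEFT JOIN ItemLikes il ON il.ItemId = i.ItemId
    LEFT JOIN ItemsInPackets iip ON iip.ItemId = i.ItemId
    WHERE i.ItemId = 14591
"""
# Every like is paired with every packet, so both plain counts see 10 non-NULL values.
plain = conn.execute("SELECT COUNT(il.ItemLikeId), COUNT(iip.PacketId)" + JOINS).fetchone()
distinct = conn.execute(
    "SELECT COUNT(DISTINCT il.ItemLikeId), COUNT(DISTINCT iip.PacketId)" + JOINS
).fetchone()
# Correlated subqueries never join the two child tables to each other.
subqueries = conn.execute("""
    SELECT (SELECT COUNT(ItemLikeId) FROM ItemLikes WHERE ItemId = i.ItemId),
           (SELECT COUNT(PacketId) FROM ItemsInPackets WHERE ItemId = i.ItemId)
    FROM Items i
    WHERE i.ItemId = 14591
""").fetchone()
print(plain, distinct, subqueries)
```

COUNT(DISTINCT) only gives the right answer when the counted column uniquely identifies a like or a packet; the correlated subqueries do not depend on that.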
