Nested self join and creating multiple sums


I have a basic parent/child schema for expenditures:
The underlying data is the same, so I just added a category column and a parent_id. These have child records:
I am trying to aggregate the totals from the orders, the related orders, and the difference between the two, like this:
That is grouped by order overall; I am also looking for something like this:
I can get the order_amount without any problem either way. That's a simple JOIN and SUM.
I am stuck on the secondary JOINs, given that I have to JOIN the invoice expenditures to the orders, then JOIN the invoice expenditure items and SUM those up.
I am looking for direction on the correct JOIN, or on whether there is a better way to approach this with some sort of subquery, etc.

To sum up by order, one solution is to use a conditional aggregate query. The trick is to check the category to decide whether to use expenditures.id or expenditures.parent_id as the grouping criterion: orders are grouped by their own id, and invoices by the id of their parent order.
SELECT
    CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END expenditure_id,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) order_amount,
    SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) invoice_amount,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
        - SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) balance
FROM expenditures e
LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
GROUP BY CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END
ORDER BY expenditure_id;
Demo on DB Fiddle:
| expenditure_id | order_amount | invoice_amount | balance |
| -------------- | ------------ | -------------- | ------- |
| 1 | 3740 | 0 | 3740 |
| 2 | 11000 | 9350 | 1650 |
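To see the GROUP BY CASE trick run end to end, here is a minimal sketch that executes the same query against an in-memory SQLite database through Python's built-in sqlite3 module (not MySQL). The table definitions are inferred from the query and the rows are hypothetical sample data, not the asker's data.

```python
import sqlite3

# In-memory SQLite database; schema inferred from the query above (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expenditures (
    id INTEGER PRIMARY KEY,
    category TEXT NOT NULL,      -- 'order' or 'invoice'
    parent_id INTEGER            -- for invoices: id of the parent order
);
CREATE TABLE expenditure_items (
    expenditure_id INTEGER NOT NULL,
    code TEXT NOT NULL,
    amount INTEGER NOT NULL
);
-- Hypothetical data: order 1 has no invoice, order 2 is partly invoiced by invoice 3.
INSERT INTO expenditures VALUES (1, 'order', NULL), (2, 'order', NULL), (3, 'invoice', 2);
INSERT INTO expenditure_items VALUES
    (1, 'a', 100), (1, 'b', 10),
    (2, 'a', 200), (2, 'b', 20),
    (3, 'a', 150), (3, 'b', 15);
""")

# Invoices are folded into their parent order by grouping on parent_id instead of id.
rows = conn.execute("""
SELECT
    CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END AS expenditure_id,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) AS order_amount,
    SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) AS invoice_amount,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
      - SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) AS balance
FROM expenditures e
LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
GROUP BY CASE WHEN e.category = 'order' THEN e.id ELSE e.parent_id END
ORDER BY expenditure_id
""").fetchall()

for row in rows:
    print(row)
```

With this data, order 1 shows its full amount as balance, while order 2's invoice amount is subtracted from its order amount.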
The second query, which sums up by item code, follows the same logic but groups by item code instead:
SELECT
    i.code,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) order_amount,
    SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) invoice_amount,
    SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
        - SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) balance
FROM expenditures e
LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
GROUP BY i.code
ORDER BY i.code;
Demo:
| code | order_amount | invoice_amount | balance |
| ---- | ------------ | -------------- | ------- |
| a | 13400 | 8500 | 4900 |
| b | 1340 | 850 | 490 |
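A minimal sketch of the per-code variant, again run through Python's sqlite3 module against hypothetical data (the schema is inferred from the query). Only the grouping key changes, so the per-code balances add up to the same grand total as the per-order balances would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expenditures (id INTEGER PRIMARY KEY, category TEXT, parent_id INTEGER);
CREATE TABLE expenditure_items (expenditure_id INTEGER, code TEXT, amount INTEGER);
-- Hypothetical data: order 1 not invoiced, order 2 partly invoiced by invoice 3.
INSERT INTO expenditures VALUES (1, 'order', NULL), (2, 'order', NULL), (3, 'invoice', 2);
INSERT INTO expenditure_items VALUES
    (1, 'a', 100), (1, 'b', 10),
    (2, 'a', 200), (2, 'b', 20),
    (3, 'a', 150), (3, 'b', 15);
""")

# Same conditional sums; the grouping key is the item code instead of the order id.
rows = conn.execute("""
SELECT i.code,
       SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END) AS order_amount,
       SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) AS invoice_amount,
       SUM(CASE WHEN e.category = 'order' THEN i.amount ELSE 0 END)
         - SUM(CASE WHEN e.category = 'invoice' THEN i.amount ELSE 0 END) AS balance
FROM expenditures e
LEFT JOIN expenditure_items i ON e.id = i.expenditure_id
GROUP BY i.code
ORDER BY i.code
""").fetchall()

for row in rows:
    print(row)
```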

Related

SQL two different Aggregate functions with LEFT Join

How can I return two aggregate functions with different conditions using a LEFT JOIN?
I already have this:
SELECT VehicleType.vehicleTypeName, COUNT(*) as SALE
FROM Transactions
LEFT JOIN VehicleType
ON Transactions.VehicleTypeID = VehicleType.vehicleTypeID
WHERE Transactions.isRefund = 0
GROUP BY VehicleType.vehicleTypeName
This returns the gross vehicle count:
Name | Sale
---------------
vehicle1 | 10
vehicle2 | 15
I want to know how to get the net count per vehicle (the count of vehicles sold minus the count of vehicles refunded), if possible:
Name | NetCount
---------------
vehicle1 | 8
vehicle2 | 10
If not, something like this:
Name | Sale | Refund
-------------------------
vehicle1 | 10 | 2
vehicle2 | 15 | 5

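You can calculate it by summing a conditional expression. Note that the WHERE Transactions.isRefund = 0 filter is removed, so that refunds are counted too: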
SELECT VehicleType.vehicleTypeName,
sum(case when Transactions.isRefund = 0 then 1 else 0 end) as Sale,
sum(case when Transactions.isRefund = 1 then 1 else 0 end) as Refund
FROM Transactions
LEFT JOIN VehicleType ON Transactions.VehicleTypeID = VehicleType.vehicleTypeID
GROUP BY VehicleType.vehicleTypeName
And your first result (the net count, where each refund counts as -1) would be:
SELECT VehicleType.vehicleTypeName,
sum(case when Transactions.isRefund = 0 then 1 else -1 end) as NetCount
FROM Transactions
LEFT JOIN VehicleType ON Transactions.VehicleTypeID = VehicleType.vehicleTypeID
GROUP BY VehicleType.vehicleTypeName
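A minimal sketch that runs both queries from this answer through Python's sqlite3 module. The table definitions follow the column names in the question, and the transaction rows are hypothetical; it also checks that the net count equals Sale minus Refund, which holds only if isRefund is always 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VehicleType (vehicleTypeID INTEGER PRIMARY KEY, vehicleTypeName TEXT);
CREATE TABLE Transactions (id INTEGER PRIMARY KEY, VehicleTypeID INTEGER, isRefund INTEGER);
INSERT INTO VehicleType VALUES (1, 'vehicle1'), (2, 'vehicle2');
-- Hypothetical rows: vehicle1 has 3 sales and 1 refund; vehicle2 has 2 sales, no refund.
INSERT INTO Transactions (VehicleTypeID, isRefund) VALUES
    (1, 0), (1, 0), (1, 0), (1, 1),
    (2, 0), (2, 0);
""")

# No WHERE isRefund = 0: the condition lives inside each CASE instead.
sale_refund = conn.execute("""
SELECT VehicleType.vehicleTypeName,
       SUM(CASE WHEN Transactions.isRefund = 0 THEN 1 ELSE 0 END) AS Sale,
       SUM(CASE WHEN Transactions.isRefund = 1 THEN 1 ELSE 0 END) AS Refund
FROM Transactions
LEFT JOIN VehicleType ON Transactions.VehicleTypeID = VehicleType.vehicleTypeID
GROUP BY VehicleType.vehicleTypeName
ORDER BY VehicleType.vehicleTypeName
""").fetchall()

# Net count: each sale adds 1, anything else (a refund) subtracts 1.
net = conn.execute("""
SELECT VehicleType.vehicleTypeName,
       SUM(CASE WHEN Transactions.isRefund = 0 THEN 1 ELSE -1 END) AS NetCount
FROM Transactions
LEFT JOIN VehicleType ON Transactions.VehicleTypeID = VehicleType.vehicleTypeID
GROUP BY VehicleType.vehicleTypeName
ORDER BY VehicleType.vehicleTypeName
""").fetchall()

print(sale_refund)
print(net)
```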

new vs old customers sql

I am trying to get new customers vs. returning customers, and for this I have to create multiple tables. Is there a better way to aggregate the data into the format shown below?
My SQL code looks like this:
---- ALL INDIVIDUALS WHO PURCHASED IN CURRENT WEEK---------
CREATE TABLE PURCHASES_FEB_WK2 AS (Select DISTINCT INDIVIDUAL_ID
from DM_OWNER.TRANSACTION_DETAIL_MV
WHERE BRAND_ORG_CODE = 'BRAND'
and is_merch = 1
and currency_code = 'USD'
AND LINE_ITEM_AMT_TYPE_CD = 'S'
AND TRUNC(TXN_DATE) BETWEEN '10-FEB-19' AND '16-FEB-19')
----------MINIMUM PURCHASE DATE OF ALL CUSTOMERS------------
Create table feb_wk2_min as
Select distinct Individual_ID, MIN(TRANSACTION_DATE) as FIRST_TRANSACTION
from dm_owner.transaction_mv
WHERE BRAND_ORG_CODE = 'BRAND'
and transaction_type_code in ('PR','EP')
group by individual_ID;
------- NEW CUSTOMERS FOR THE WEEK---------
Select Count(distinct B.INDIVIDUAL_ID)
from PURCHASES_FEB_WK2 A
JOIN FEB_WK2_MIN B ON A.INDIVIDUAL_ID = B.INDIVIDUAL_ID
where FIRST_TRANSACTION between '10-FEB-19' and '16-FEB-19'
---- ALL RETURNING CUSTOMERS
SELECT COUNT (DISTINCT INDIVIDUAL_ID)
FROM PURCHASES_FEB_WK2
WHERE INDIVIDUAL_ID IN (SELECT INDIVIDUAL_ID FROM DM_OWNER.TRANSACTION_DETAIL_MV WHERE TRUNC(TXN_DATE) < '10-FEB-19' AND BRAND_ORG_CODE = 'BRAND' AND IS_MERCH = 1 AND line_item_amt_type_cd = 'S' AND STATUS = 'A')
-------NEW CUSTOMERS DOLLAR_VALUE_US------
SELECT SUM(DOLLAR_VALUE_US) FROM DM_OWNER.TRANSACTION_DETAIL_MV
WHERE INDIVIDUAL_ID IN (Select distinct B.INDIVIDUAL_ID
from PURCHASES_FEB_WK2 A
JOIN FEB_WK2_MIN B ON A.INDIVIDUAL_ID = B.INDIVIDUAL_ID
where FIRST_TRANSACTION between '10-FEB-19' and '16-FEB-19')
AND BRAND_ORG_CODE = 'BRAND'
and is_merch = 1
and currency_code = 'USD'
AND LINE_ITEM_AMT_TYPE_CD = 'S'
AND TRUNC(TXN_DATE) BETWEEN '10-FEB-19' AND '16-FEB-19'
-------RETURNING CUSTOMERS DOLLAR_VALUE_US------
SELECT SUM(DOLLAR_VALUE_US) FROM DM_OWNER.TRANSACTION_DETAIL_MV
WHERE INDIVIDUAL_ID IN (SELECT DISTINCT INDIVIDUAL_ID
FROM PURCHASES_FEB_WK2
WHERE INDIVIDUAL_ID IN (SELECT INDIVIDUAL_ID FROM DM_OWNER.TRANSACTION_DETAIL_MV WHERE TRUNC(TXN_DATE) < '10-FEB-19' AND BRAND_ORG_CODE = 'BRAND' AND IS_MERCH = 1 AND line_item_amt_type_cd = 'S' AND STATUS = 'A'))
AND BRAND_ORG_CODE = 'BRAND'
and is_merch = 1
and currency_code = 'USD'
AND LINE_ITEM_AMT_TYPE_CD = 'S'
AND TRUNC(TXN_DATE) BETWEEN '10-FEB-19' AND '16-FEB-19'
To get the quantity and the count of orders, I replace SUM(DOLLAR_VALUE_US) with a count of distinct orders and a sum of quantity. Is there an easy way to pivot and combine this code so that I can just copy and paste the data in the format I have provided (picture attached)?

Based on the comments, I understand that you want to split the customers into two groups: customers whose first transaction falls within the period should be separated from those who had transactions before it. For each group, you want to count the customers and sum the value of their transactions.
NB: your SQL code does not show how to compute qty and count_of_orders, so I left them out (but they will likely follow the same logic).
Given this sample data:
INDIVIDUAL_ID | DOLLAR_VALUE_US | TXN_DATE | BRAND_ORG_CODE | IS_MERCH | CURRENCY_CODE | LINE_ITEM_AMT_TYPE_CD
------------: | --------------: | :-------- | :------------ | -------: | :------------ | :--------------------
1 | 10 | 01-FEB-19 | BRAND | 1 | USD | S
1 | 10 | 10-FEB-19 | BRAND | 1 | USD | S
1 | 10 | 15-FEB-19 | BRAND | 1 | USD | S
1 | 10 | 28-FEB-19 | BRAND | 1 | USD | S
2 | 11 | 11-FEB-19 | BRAND | 1 | USD | S
2 | 11 | 12-FEB-19 | BRAND | 1 | USD | S
3 | 11 | 12-FEB-19 | BRAND | 1 | USD | S
Considering the week from February 10th to 16th inclusive, customer 1 is a returning customer with 2 transactions in the window, and customers 2 and 3 are new customers with 2 and 1 transactions respectively. You would expect the following output:
TYPE_OF_CUSTOMER | COUNT_OF_CUSTOMERS | SUM_DOLLAR_VALUE_US
:------------------ | -----------------: | ------------------:
New Customers | 2 | 33
Returning Customers | 1 | 20
To solve this, you need several levels of aggregation. First, use the window function MIN() OVER() to recover the date of each customer's first transaction. Then, filter on the analysis period, flag each customer as new or returning, and aggregate the money spent per customer. Finally, aggregate the per-customer results by group.
Query:
SELECT
    DECODE(is_new, 1, 'New Customers', 'Returning Customers') type_of_customer,
    COUNT(individual_id) count_of_customers,
    SUM(dollar_value_us) sum_dollar_value_us
FROM (
    SELECT
        individual_id,
        SUM(dollar_value_us) dollar_value_us,
        CASE WHEN MIN(txn_date) = min_txn_date THEN 1 ELSE 0 END is_new
    FROM (
        SELECT
            individual_id,
            dollar_value_us,
            txn_date,
            MIN(txn_date) OVER(PARTITION BY individual_id) min_txn_date
        FROM transaction_detail_mv
        WHERE
            brand_org_code = 'BRAND'
            AND is_merch = 1
            AND currency_code = 'USD'
            AND line_item_amt_type_cd = 'S'
    ) t
    WHERE
        txn_date >= TO_DATE('10-02-2019', 'DD-MM-YYYY')
        AND txn_date < TO_DATE('17-02-2019', 'DD-MM-YYYY')
    GROUP BY
        individual_id,
        min_txn_date
) x GROUP BY is_new
This demo on DB Fiddle demonstrates each step of the computation.
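The same three-level query can be sketched in SQLite through Python's sqlite3 module, using the sample rows from this answer. Assumptions: DECODE and TO_DATE are Oracle-specific, so this sketch swaps in a CASE expression and ISO 'YYYY-MM-DD' date strings, and it needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transaction_detail_mv (
    individual_id INTEGER,
    dollar_value_us INTEGER,
    txn_date TEXT,               -- ISO 'YYYY-MM-DD' strings stand in for Oracle DATEs
    brand_org_code TEXT,
    is_merch INTEGER,
    currency_code TEXT,
    line_item_amt_type_cd TEXT
);
INSERT INTO transaction_detail_mv VALUES
    (1, 10, '2019-02-01', 'BRAND', 1, 'USD', 'S'),
    (1, 10, '2019-02-10', 'BRAND', 1, 'USD', 'S'),
    (1, 10, '2019-02-15', 'BRAND', 1, 'USD', 'S'),
    (1, 10, '2019-02-28', 'BRAND', 1, 'USD', 'S'),
    (2, 11, '2019-02-11', 'BRAND', 1, 'USD', 'S'),
    (2, 11, '2019-02-12', 'BRAND', 1, 'USD', 'S'),
    (3, 11, '2019-02-12', 'BRAND', 1, 'USD', 'S');
""")

rows = conn.execute("""
SELECT
    CASE is_new WHEN 1 THEN 'New Customers' ELSE 'Returning Customers' END AS type_of_customer,
    COUNT(individual_id) AS count_of_customers,
    SUM(dollar_value_us) AS sum_dollar_value_us
FROM (
    -- Step 2: per customer, spend inside the window plus a new/returning flag.
    SELECT individual_id,
           SUM(dollar_value_us) AS dollar_value_us,
           CASE WHEN MIN(txn_date) = min_txn_date THEN 1 ELSE 0 END AS is_new
    FROM (
        -- Step 1: the window runs before the date filter, so it sees full history.
        SELECT individual_id, dollar_value_us, txn_date,
               MIN(txn_date) OVER (PARTITION BY individual_id) AS min_txn_date
        FROM transaction_detail_mv
        WHERE brand_org_code = 'BRAND' AND is_merch = 1
          AND currency_code = 'USD' AND line_item_amt_type_cd = 'S'
    ) t
    WHERE txn_date >= '2019-02-10' AND txn_date < '2019-02-17'
    GROUP BY individual_id, min_txn_date
) x
GROUP BY is_new
ORDER BY type_of_customer
""").fetchall()

print(rows)
```

The key design point is that the MIN() OVER() window sits in the innermost query, before the date-range filter; filtering first would make every customer look new.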

How to GROUP BY CASE with aggregate function [duplicate]

This question already has answers here: Count based on condition in SQL Server (4 answers). Closed 4 years ago.
I am attempting to get the count of a.clmNo for each s.stat value, displayed on one row. I am currently getting 3 rows back, because my 5 a.clmNo values are spread over 3 different s.stat values and I am grouping by s.stat, which makes sense. How can I change my query so that I don't have to group by s.stat and get the results on a single row instead?
Current results:
+-------------+--------------+-----------------+-----------+---------------+
| pend_claims | assnd_claims | qa_ready_claims | qa_claims | closed_claims |
+-------------+--------------+-----------------+-----------+---------------+
| 0 | 3 | 0 | 0 | 0 |
+-------------+--------------+-----------------+-----------+---------------+
| 0 | 0 | 0 | 1 | 0 |
+-------------+--------------+-----------------+-----------+---------------+
| 1 | 0 | 0 | 0 | 0 |
+-------------+--------------+-----------------+-----------+---------------+
Desired results:
+------------+--------------+-----------------+-----------+--------------+
|pend_claims | assnd_claims | qa_ready_claims | qa_claims | closed_claims|
+------------+--------------+-----------------+-----------+--------------+
| 1 | 3 | 0 | 1 | 0 |
+------------+--------------+-----------------+-----------+--------------+
Current query:
SELECT ISNULL(case when s.stat = 'Pending Assignment' then count(a.clmNo) end,0) as pend_claims,
ISNULL(case when s.stat = 'Assigned' then count(a.clmNo) end,0) as assnd_claims,
ISNULL(case when s.stat = 'QA Ready' then count(a.clmNo) end,0) as qa_ready_claims,
ISNULL(case when s.stat = 'In QA' then count(a.clmNo) end,0) as qa_claims,
ISNULL(case when s.stat = 'Closed' then count(a.clmNo) end,0) as closed_claims
FROM assnmts a
inner join assnmtStats astats
on a.assnmtIdPk = astats.assnmtIdFk
inner join stats s
on astats.aStatId = s.statIdPk
inner join repAssnmts ra
on a.assnmtIdPk = ra.assnmtIdFk
inner join aspnetusers anu
on ra.repId = anu.Id
inner join clients c
on a.clientIdFk = c.clientIdPk
inner join carrs
on a.carrierId = carrs.carrIdPk
inner join (SELECT a2.assnmtIdPk, MAX(astats2.asCrtdDt) as MaxDate
FROM assnmts a2
INNER JOIN assnmtStats astats2
on a2.assnmtIdPk = astats2.assnmtIdFk
GROUP BY a2.assnmtIdPk
) mdt
on a.assnmtIdPk = mdt.assnmtIdPk
and astats.asCrtdDt = mdt.MaxDate
inner join (select a3.assnmtIdPk, MAX(ra2.raCrtdDt) as MaxRepDate
from assnmts a3
inner join repAssnmts ra2
on a3.assnmtIdPk = ra2.assnmtIdFk
group by a3.assnmtIdPk
) mrepdt
on a.assnmtIdPk = mrepdt.assnmtIdPk
and ra.raCrtdDt = mrepdt.MaxRepDate
group by s.stat

You want conditional aggregation. Remove the GROUP BY and rephrase the SELECT as follows, keeping the FROM clause and all the joins unchanged:
SELECT SUM(case when s.stat = 'Pending Assignment' then 1 else 0 end) as pend_claims,
SUM(case when s.stat = 'Assigned' then 1 else 0 end) as assnd_claims,
SUM(case when s.stat = 'QA Ready' then 1 else 0 end) as qa_ready_claims,
SUM(case when s.stat = 'In QA' then 1 else 0 end) as qa_claims,
SUM(case when s.stat = 'Closed' then 1 else 0 end) as closed_claims
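A minimal sketch of why this returns a single row: without GROUP BY, the whole result set is one group, and each SUM(CASE ...) counts only the rows matching its status. It runs in SQLite through Python's sqlite3 module; claim_status is a hypothetical, flattened stand-in for the question's joined tables, with rows chosen to reproduce the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the joined data: one row per claim with its latest status.
CREATE TABLE claim_status (clmNo TEXT, stat TEXT);
INSERT INTO claim_status VALUES
    ('C1', 'Pending Assignment'),
    ('C2', 'Assigned'), ('C3', 'Assigned'), ('C4', 'Assigned'),
    ('C5', 'In QA');
""")

# No GROUP BY: all rows form one group, so exactly one row comes back.
rows = conn.execute("""
SELECT SUM(CASE WHEN stat = 'Pending Assignment' THEN 1 ELSE 0 END) AS pend_claims,
       SUM(CASE WHEN stat = 'Assigned' THEN 1 ELSE 0 END) AS assnd_claims,
       SUM(CASE WHEN stat = 'QA Ready' THEN 1 ELSE 0 END) AS qa_ready_claims,
       SUM(CASE WHEN stat = 'In QA' THEN 1 ELSE 0 END) AS qa_claims,
       SUM(CASE WHEN stat = 'Closed' THEN 1 ELSE 0 END) AS closed_claims
FROM claim_status
""").fetchall()

print(rows)
```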

Sql Count Where Groupby SubQuery

I have this query:
SELECT item_type.id, item_type.item_type,
(SELECT COUNT(*) FROM item WHERE item.sale_transaction_id IS NULL) as stock_qty,
(SELECT COUNT(*) FROM item WHERE item.sale_transaction_id IS NOT NULL) as sold_qty
FROM item
JOIN item_type ON item.item_type_id = item_type.id
GROUP BY item.item_type_id
This gives me a result:
| id | item_type | stock_qty | sold_qty|
----------------------------------------
| 1 | Book | 12 | 12 |
| 2 | Pencil | 12 | 12 |
| ........... # etc
But this does not work as intended; to make it work, I need to write it like this:
SELECT item_type.id, item_type.item_type,
COUNT(item.purchase_transaction_id) - COUNT(item.sale_transaction_id) as stock_qty,
COUNT(item.sale_transaction_id) as sold_qty
FROM item
JOIN item_type ON item.item_type_id = item_type.id
GROUP BY item.item_type_id
This gives the correct, expected output:
| id | item_type | stock_qty | sold_qty|
----------------------------------------
| 1 | Book | 1 | 0 |
| 2 | Pencil | 0 | 5 |
| ........... # etc
In my table structure, each item that has a sale_transaction_id is marked as sold.
My question is: why does the first query not work as intended, and how can I make it work like the second one? Is it actually possible to use a subquery for this type of query?

SELECT item_type.id, item_type.item_type,
SUM(case when item.sale_transaction_id IS NULL then 1 else 0 end) as stock_qty,
SUM(case when item.sale_transaction_id IS NOT NULL then 1 else 0 end) as sold_qty
FROM item
JOIN item_type ON item.item_type_id = item_type.id
GROUP BY item_type.id, item_type.item_type
Is this what you need?

You need to add correlation to the subqueries:
SELECT item_type.id, item_type.item_type,
(SELECT COUNT(item.purchase_transaction_id) - COUNT(item.sale_transaction_id)
FROM item
WHERE item.item_type_id = i.item_type_id) as stock_qty,
(SELECT COUNT(item.sale_transaction_id)
FROM item
WHERE item.item_type_id = i.item_type_id ) as sold_qty
FROM item AS i
JOIN item_type ON i.item_type_id = item_type.id
GROUP BY i.item_type_id
The subqueries are now correlated: they are executed for each item_type_id of the outer query and return results for this exact value each time.
But this seems like overkill, since you can get the same result by aggregating in the outer query, just as you do in the second query of your question.
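A minimal sketch contrasting the three forms discussed in this thread, run in SQLite through Python's sqlite3 module. The item rows are hypothetical, chosen so the expected output matches the question's (one unsold book, five sold pencils). It shows that the uncorrelated subqueries repeat the table-wide totals on every row, while the correlated subqueries and plain conditional aggregation agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_type (id INTEGER PRIMARY KEY, item_type TEXT);
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    item_type_id INTEGER,
    purchase_transaction_id INTEGER,
    sale_transaction_id INTEGER      -- NULL means the item is still in stock
);
INSERT INTO item_type VALUES (1, 'Book'), (2, 'Pencil');
-- Hypothetical rows: one unsold book, five sold pencils.
INSERT INTO item (item_type_id, purchase_transaction_id, sale_transaction_id) VALUES
    (1, 100, NULL),
    (2, 101, 200), (2, 101, 201), (2, 101, 202), (2, 101, 203), (2, 101, 204);
""")

# Uncorrelated: each subquery scans the WHOLE item table, so every row gets the same totals.
uncorrelated = conn.execute("""
SELECT item_type.id, item_type.item_type,
       (SELECT COUNT(*) FROM item WHERE item.sale_transaction_id IS NULL) AS stock_qty,
       (SELECT COUNT(*) FROM item WHERE item.sale_transaction_id IS NOT NULL) AS sold_qty
FROM item
JOIN item_type ON item.item_type_id = item_type.id
GROUP BY item.item_type_id
ORDER BY item_type.id
""").fetchall()

# Correlated: each subquery is restricted to the outer row's item_type_id.
correlated = conn.execute("""
SELECT item_type.id, item_type.item_type,
       (SELECT COUNT(item.purchase_transaction_id) - COUNT(item.sale_transaction_id)
          FROM item WHERE item.item_type_id = i.item_type_id) AS stock_qty,
       (SELECT COUNT(item.sale_transaction_id)
          FROM item WHERE item.item_type_id = i.item_type_id) AS sold_qty
FROM item AS i
JOIN item_type ON i.item_type_id = item_type.id
GROUP BY i.item_type_id
ORDER BY item_type.id
""").fetchall()

# Conditional aggregation: a single pass, no subqueries.
aggregated = conn.execute("""
SELECT item_type.id, item_type.item_type,
       SUM(CASE WHEN item.sale_transaction_id IS NULL THEN 1 ELSE 0 END) AS stock_qty,
       SUM(CASE WHEN item.sale_transaction_id IS NOT NULL THEN 1 ELSE 0 END) AS sold_qty
FROM item
JOIN item_type ON item.item_type_id = item_type.id
GROUP BY item_type.id, item_type.item_type
ORDER BY item_type.id
""").fetchall()

print(uncorrelated)
print(correlated)
print(aggregated)
```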

Start from the item_type table instead of the item table, and use a LEFT JOIN; otherwise an item type that has no items will never appear in the query result.
SELECT
item_type.id,
item_type.item_type,
SUM(CASE WHEN item.id IS NOT NULL AND item.sale_transaction_id IS NULL THEN 1 ELSE 0 END) AS stock_qty,
SUM(CASE WHEN item.id IS NOT NULL AND item.sale_transaction_id IS NOT NULL THEN 1 ELSE 0 END) AS sold_qty
FROM
item_type
LEFT JOIN
item
ON
item.item_type_id = item_type.id
GROUP BY
item_type.id, item_type.item_type
Avoid subselects where a join will do. A correlated subselect may be evaluated once for each outer row, which can slow the query down considerably. You can run EXPLAIN on both queries (the subselect version and the join version) to compare the plans.
It would help if you posted some example data from the initial tables.
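A minimal sketch of the difference the LEFT JOIN makes, run in SQLite through Python's sqlite3 module. The 'Eraser' type and all item rows are hypothetical. The same select list is used with both FROM clauses; the item.id IS NOT NULL guard keeps the NULL-extended row of an empty type from being counted as stock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_type (id INTEGER PRIMARY KEY, item_type TEXT);
CREATE TABLE item (id INTEGER PRIMARY KEY, item_type_id INTEGER, sale_transaction_id INTEGER);
INSERT INTO item_type VALUES (1, 'Book'), (2, 'Pencil'), (3, 'Eraser');
-- Hypothetical rows: one unsold book, two sold pencils, no erasers at all.
INSERT INTO item (item_type_id, sale_transaction_id) VALUES (1, NULL), (2, 200), (2, 201);
""")

# Shared select list; "item.id IS NOT NULL" ignores the NULL row produced for an empty type.
select_list = """
    item_type.id, item_type.item_type,
    SUM(CASE WHEN item.id IS NOT NULL AND item.sale_transaction_id IS NULL
             THEN 1 ELSE 0 END) AS stock_qty,
    SUM(CASE WHEN item.id IS NOT NULL AND item.sale_transaction_id IS NOT NULL
             THEN 1 ELSE 0 END) AS sold_qty
"""
tail = "GROUP BY item_type.id, item_type.item_type ORDER BY item_type.id"

# Starting from item with an inner join: types without items are silently dropped.
from_item = conn.execute(
    f"SELECT {select_list} FROM item JOIN item_type ON item.item_type_id = item_type.id {tail}"
).fetchall()

# Starting from item_type with a LEFT JOIN: every type appears, empty ones with zeros.
from_type = conn.execute(
    f"SELECT {select_list} FROM item_type LEFT JOIN item ON item.item_type_id = item_type.id {tail}"
).fetchall()

print(from_item)
print(from_type)
```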

SQL query: self join on subquery necessitates creating a separate (non-temporary) table?

I'm working on what is for me a complicated query, and I've managed to get the information I need, but seem to be forced to create a table to accomplish it. I'm using MySQL, so I can't use WITH, I can't use a view because my SELECT contains a subquery in the FROM clause, and I can't use a temporary table because I need to self-join. Am I missing something?
Background:
A reservation can have one or more reservation_detail rows (foreign-key relationship on reservation_id).
A reservation_detail row has a quantity and a ticket_type (foreign-key relationship on ticket_type).
Here's the first part of my current solution:
CREATE TABLE
tmp
SELECT
t.reservation_id,
t.ticket_type,
COALESCE(rd.quantity,0) AS qty
FROM (
SELECT *
FROM
(ticket_type tt, reservation r)
) t
LEFT JOIN
reservation_detail rd
ON
t.reservation_id = rd.reservation_id
AND
t.ticket_type = rd.ticket_type;
This gives me a table that looks like the following, where for each combination of a reservation_id and a ticket_type, I have a qty.
+----------------+-------------+------+
| reservation_id | ticket_type | qty |
+----------------+-------------+------+
| 1 | ADULT | 2 |
| 1 | CHILD | 2 |
| 1 | INFANT | 0 |
| 2 | ADULT | 1 |
| 2 | CHILD | 0 |
| 2 | INFANT | 0 |
| 3 | ADULT | 1 |
| 3 | CHILD | 0 |
| 3 | INFANT | 0 |
+----------------+-------------+------+
Now I can self join thrice on this table to get what I'm really looking for...
SELECT
t1.reservation_id,
t1.qty AS num_adults,
t2.qty AS num_children,
t3.qty AS num_infants
FROM
tmp t1
LEFT JOIN
tmp t2
ON
t1.reservation_id = t2.reservation_id
LEFT JOIN
tmp t3
ON
t2.reservation_id = t3.reservation_id
WHERE
t1.ticket_type = 'ADULT'
AND
t2.ticket_type = 'CHILD'
AND
t3.ticket_type = 'INFANT';
...which is one row for each reservation showing the qty for each of the three ticket types.
+----------------+------------+--------------+-------------+
| reservation_id | num_adults | num_children | num_infants |
+----------------+------------+--------------+-------------+
| 1 | 2 | 2 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 1 | 0 | 0 |
+----------------+------------+--------------+-------------+
I hope this is enough information. Please leave a comment if it's not.

If your query only considers these three types (ADULT, CHILD, INFANT), you don't need the ticket_type table:
SELECT
r.reservation_id,
COALESCE(rd_adult.quantity,0) AS num_adults,
COALESCE(rd_child.quantity,0) AS num_children,
COALESCE(rd_infant.quantity,0) AS num_infants
FROM
reservation r
LEFT JOIN
reservation_detail rd_adult
ON r.reservation_id = rd_adult.reservation_id
and rd_adult.ticket_type = 'ADULT'
LEFT JOIN
reservation_detail rd_child
ON r.reservation_id = rd_child.reservation_id
and rd_child.ticket_type = 'CHILD'
LEFT JOIN
reservation_detail rd_infant
ON r.reservation_id = rd_infant.reservation_id
and rd_infant.ticket_type = 'INFANT'
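A minimal sketch of this three-LEFT-JOIN approach, run in SQLite through Python's sqlite3 module. The table definitions are assumed from the question, and the detail rows correspond to the non-zero quantities in the question's tmp table. It assumes at most one reservation_detail row per (reservation_id, ticket_type); with duplicates, the joins would multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY);
CREATE TABLE reservation_detail (reservation_id INTEGER, ticket_type TEXT, quantity INTEGER);
INSERT INTO reservation VALUES (1), (2), (3);
-- Detail rows matching the non-zero quantities in the question's tmp table.
INSERT INTO reservation_detail VALUES
    (1, 'ADULT', 2), (1, 'CHILD', 2),
    (2, 'ADULT', 1),
    (3, 'ADULT', 1);
""")

# The ticket_type filter sits in ON, not WHERE, so a missing type keeps the row (as NULL -> 0).
rows = conn.execute("""
SELECT r.reservation_id,
       COALESCE(rd_adult.quantity, 0)  AS num_adults,
       COALESCE(rd_child.quantity, 0)  AS num_children,
       COALESCE(rd_infant.quantity, 0) AS num_infants
FROM reservation r
LEFT JOIN reservation_detail rd_adult
       ON r.reservation_id = rd_adult.reservation_id AND rd_adult.ticket_type = 'ADULT'
LEFT JOIN reservation_detail rd_child
       ON r.reservation_id = rd_child.reservation_id AND rd_child.ticket_type = 'CHILD'
LEFT JOIN reservation_detail rd_infant
       ON r.reservation_id = rd_infant.reservation_id AND rd_infant.ticket_type = 'INFANT'
ORDER BY r.reservation_id
""").fetchall()

print(rows)
```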

Since the reservation_detail table contains all the fields you need, you don't need to join the ticket_type table or create a temp table.
Try this:
SELECT distinct
t.reservation_id,
COALESCE(t1.quantity,0) AS num_adults,
COALESCE(t2.quantity,0) AS num_children,
COALESCE(t3.quantity,0) AS num_infants
FROM reservation t
LEFT JOIN reservation_detail t1 ON t.reservation_id = t1.reservation_id AND t1.ticket_type = 'ADULT'
LEFT JOIN reservation_detail t2 ON t.reservation_id = t2.reservation_id AND t2.ticket_type = 'CHILD'
LEFT JOIN reservation_detail t3 ON t.reservation_id = t3.reservation_id AND t3.ticket_type = 'INFANT';

If you want to stick with your first query, you can substitute this for the second one:
SELECT reservation_id,
SUM(CASE WHEN ticket_type='ADULT' THEN qty ELSE 0 END) AS adults,
SUM(CASE WHEN ticket_type='CHILD' THEN qty ELSE 0 END) AS children,
SUM(CASE WHEN ticket_type='INFANT' THEN qty ELSE 0 END) AS infants
FROM tmp
GROUP BY reservation_id;
However, I'm wondering a bit about your schema. You are storing qty, a calculated value. Have you considered just having one row per ticket instead? If you do that, no tmp table is required, though you would do the pivot similarly to the above.
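A minimal sketch of the tmp-table route with the SUM(CASE ...) pivot replacing the three-way self-join, run in SQLite through Python's sqlite3 module. The table definitions are assumed from the question. SQLite requires CREATE TABLE ... AS SELECT (MySQL also accepts the form without AS), and the sketch uses an explicit CROSS JOIN in place of the question's parenthesized comma join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket_type (ticket_type TEXT PRIMARY KEY);
CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY);
CREATE TABLE reservation_detail (reservation_id INTEGER, ticket_type TEXT, quantity INTEGER);
INSERT INTO ticket_type VALUES ('ADULT'), ('CHILD'), ('INFANT');
INSERT INTO reservation VALUES (1), (2), (3);
INSERT INTO reservation_detail VALUES
    (1, 'ADULT', 2), (1, 'CHILD', 2), (2, 'ADULT', 1), (3, 'ADULT', 1);

-- The question's tmp table: every reservation x ticket_type pair, qty 0 when absent.
CREATE TABLE tmp AS
SELECT t.reservation_id, t.ticket_type, COALESCE(rd.quantity, 0) AS qty
FROM (SELECT r.reservation_id, tt.ticket_type
      FROM ticket_type tt CROSS JOIN reservation r) t
LEFT JOIN reservation_detail rd
       ON t.reservation_id = rd.reservation_id AND t.ticket_type = rd.ticket_type;
""")

# One SUM(CASE ...) per ticket type turns the tmp rows into one row per reservation.
rows = conn.execute("""
SELECT reservation_id,
       SUM(CASE WHEN ticket_type = 'ADULT'  THEN qty ELSE 0 END) AS adults,
       SUM(CASE WHEN ticket_type = 'CHILD'  THEN qty ELSE 0 END) AS children,
       SUM(CASE WHEN ticket_type = 'INFANT' THEN qty ELSE 0 END) AS infants
FROM tmp
GROUP BY reservation_id
ORDER BY reservation_id
""").fetchall()

print(rows)
```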

A simple GROUP BY should be enough (ticket_type must be qualified as t.ticket_type, since rd also has a ticket_type column):
SELECT t.reservation_id,
    SUM(CASE WHEN t.ticket_type = 'ADULT' THEN COALESCE(rd.quantity, 0) ELSE 0 END) num_adults,
    SUM(CASE WHEN t.ticket_type = 'CHILD' THEN COALESCE(rd.quantity, 0) ELSE 0 END) num_children,
    SUM(CASE WHEN t.ticket_type = 'INFANT' THEN COALESCE(rd.quantity, 0) ELSE 0 END) num_infants
FROM (SELECT * FROM (ticket_type tt, reservation r)) t
LEFT JOIN reservation_detail rd ON t.reservation_id = rd.reservation_id
    AND t.ticket_type = rd.ticket_type
GROUP BY t.reservation_id
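A minimal sketch of this single-query version, run in SQLite through Python's sqlite3 module, with table definitions assumed from the question. Reservation 4 is a hypothetical extra with no detail rows: because every reservation is first paired with every ticket type and only then LEFT JOINed to the details, it still shows up with zeros. An unqualified ticket_type in the CASE would be rejected as ambiguous, since both t and rd carry that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket_type (ticket_type TEXT PRIMARY KEY);
CREATE TABLE reservation (reservation_id INTEGER PRIMARY KEY);
CREATE TABLE reservation_detail (reservation_id INTEGER, ticket_type TEXT, quantity INTEGER);
INSERT INTO ticket_type VALUES ('ADULT'), ('CHILD'), ('INFANT');
-- Reservation 4 is a hypothetical extra with no detail rows at all.
INSERT INTO reservation VALUES (1), (2), (3), (4);
INSERT INTO reservation_detail VALUES
    (1, 'ADULT', 2), (1, 'CHILD', 2), (2, 'ADULT', 1), (3, 'ADULT', 1);
""")

rows = conn.execute("""
SELECT t.reservation_id,
       SUM(CASE WHEN t.ticket_type = 'ADULT'  THEN COALESCE(rd.quantity, 0) ELSE 0 END) AS num_adults,
       SUM(CASE WHEN t.ticket_type = 'CHILD'  THEN COALESCE(rd.quantity, 0) ELSE 0 END) AS num_children,
       SUM(CASE WHEN t.ticket_type = 'INFANT' THEN COALESCE(rd.quantity, 0) ELSE 0 END) AS num_infants
FROM (SELECT r.reservation_id, tt.ticket_type
      FROM ticket_type tt CROSS JOIN reservation r) t   -- every reservation x ticket type
LEFT JOIN reservation_detail rd
       ON t.reservation_id = rd.reservation_id AND t.ticket_type = rd.ticket_type
GROUP BY t.reservation_id
ORDER BY t.reservation_id
""").fetchall()

print(rows)
```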
