Joining tables and aggregation/sub queries

I have 2 tables as shown below.
Table1
Order_ID  Item_code  Sales_Price  Qty_ordered  Total  Qty_shipped
1000      111        10           5            $50    1
1000      222        20           10           $200   2
1000      333        30           15           $450   0
I have another table that stores only the details of how much was invoiced (i.e. how much we shipped)
Table2 (because we shipped only 10x1 and 20x2 = $50)
Order_ID  Invoice_total
1000      $50
I wrote the following query,
select T1.Order_ID,
sum(T1.Qty_Ordered) as Qty_Ordered,
sum(T1.Total) as Total_Amt_ordered,
sum(T1.Qty_shipped) as Qty_Shipped,
sum(T2.Invoice_total)
from T1 join
T2 on T1.Order_ID = T2.Order_ID
This query gives me the following output (it adds the $50 once for every T1 row of the order):
Order_ID  Qty_ordered  Total  Qty_shipped  Invoice_total
1000      30           $700   3            $150
Whereas I want my output to be:
Order_ID  Qty_ordered  Total  Qty_shipped  Invoice_total
1000      30           $700   3            $50
(because we shipped only $50)
What changes should I make to my query?
I know I can just hard-code it, but my database has thousands of orders and thousands of half-shipped orders. I want to keep track of the shipped $ (invoiced $) for all the orders.

If I understand correctly, you want:
select T2.Order_ID, T2.Invoice_total,
sum(T1.Qty_Ordered) as Qty_Ordered,
sum(T1.Total) as Total_Amt_ordered,
sum(T1.Qty_shipped) as Qty_Shipped
from T2 join
T1
on T1.Order_ID = T2.Order_ID
group by T2.Order_ID, T2.Invoice_total;
That is, you don't want to aggregate Invoice_total. You just want it to be a group by key.
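To see the difference concretely, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (SQLite and the INT column types are assumptions made only for illustration; the question does not name its database). It runs the original sum-after-join next to the group-by-key version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Order_ID INT, Item_code INT, Sales_Price INT,
                     Qty_ordered INT, Total INT, Qty_shipped INT);
    CREATE TABLE T2 (Order_ID INT, Invoice_total INT);
    INSERT INTO T1 VALUES (1000, 111, 10,  5,  50, 1),
                          (1000, 222, 20, 10, 200, 2),
                          (1000, 333, 30, 15, 450, 0);
    INSERT INTO T2 VALUES (1000, 50);
""")

# Summing after the join counts the single invoice row once per T1 row.
naive = conn.execute("""
    SELECT T1.Order_ID, SUM(T1.Qty_ordered), SUM(T1.Total),
           SUM(T1.Qty_shipped), SUM(T2.Invoice_total)
    FROM T1 JOIN T2 ON T1.Order_ID = T2.Order_ID
    GROUP BY T1.Order_ID
""").fetchall()

# Using Invoice_total as a grouping key returns it unchanged.
fixed = conn.execute("""
    SELECT T2.Order_ID, T2.Invoice_total, SUM(T1.Qty_ordered),
           SUM(T1.Total), SUM(T1.Qty_shipped)
    FROM T2 JOIN T1 ON T1.Order_ID = T2.Order_ID
    GROUP BY T2.Order_ID, T2.Invoice_total
""").fetchall()

print(naive)  # [(1000, 30, 700, 3, 150)]
print(fixed)  # [(1000, 50, 30, 700, 3)]
```

This relies on T2 holding one row per order, as in the sample data. If an order could have several invoice rows, T2 would need to be summed per order in a subquery before joining.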

Related

Joining a transaction fact table to a periodic snapshot table in SQL using the nearest date

I am using Redshift on AWS and I have two tables, the first is a list of transactions like so:
cust_ID  order_date  product
100      2022/05/01  A
101      2022/05/01  A
100      2022/05/05  B
101      2022/05/07  B
The second is a snapshot table which has customer attributes for each customer at a specific point in time. Though the second table has rows for most dates, it doesn't have rows for every customer at every date.
cust_ID  as_of_date  favourite_colour
100      2022/05/01  blue
100      2022/05/02  red
100      2022/05/05  green
100      2022/05/07  red
101      2022/05/01  blue
101      2022/05/04  red
101      2022/05/05  green
101      2022/05/08  yellow
How can I join the tables such that the transaction table has the customer attributes either on the date of the order itself, or if the transaction date is not available in table 2, at the nearest available date before the transaction?
An example of the desired output would be:
cust_ID  order_date  product  favourite_colour  as_of_date
100      2022/05/01  A        blue              2022/05/01
101      2022/05/01  A        blue              2022/05/01
100      2022/05/05  B        green             2022/05/05
101      2022/05/07  B        green             2022/05/05
Joining by cust_ID and order_date = as_of_date doesn't work due to edge cases where the order_date/id combination is not in the second table.
I've also tried something like:
with snapshot as (
SELECT
row_number() OVER(PARTITION BY cust_ID ORDER BY as_of_date DESC) as row_number,
cust_ID,
favourite_color,
as_of_date
FROM table2 t2
INNER JOIN table1 t1
ON t1.cust_ID = t2.cust_ID
AND t2.as_of_date <= t1.order_date
)
SELECT * FROM snapshot
WHERE row_number = 1
However, this doesn't handle cases where the same customer has multiple transactions in table 1. When I check the count of the resulting table, the number of distinct cust_IDs is the same as count(*) so it seems like the resulting table is only retaining one transaction per customer.
Any help would be appreciated.

Using your provided table inputs, I tested this solution in DB Fiddle and it works for your desired output.
with my_cte AS (
select *,
row_number() OVER(PARTITION BY cust_id, order_date ORDER BY as_of_date desc) ranked
from transactions
left join attribs using (cust_id)
where as_of_date <= order_date
)
select cust_id, order_date, product, favourite_colour, as_of_date
from my_cte
where ranked = 1
order by order_date, cust_id;
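The same ranking idea can be checked against the question's sample rows. The sketch below uses an in-memory SQLite database rather than Redshift (an assumption for illustration; it needs SQLite 3.25 or later for window functions), keeps the answer's table names transactions and attribs, and stores dates as text in the question's yyyy/mm/dd form, which sorts correctly as a string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (cust_id INT, order_date TEXT, product TEXT);
    CREATE TABLE attribs (cust_id INT, as_of_date TEXT, favourite_colour TEXT);
    INSERT INTO transactions VALUES
        (100, '2022/05/01', 'A'), (101, '2022/05/01', 'A'),
        (100, '2022/05/05', 'B'), (101, '2022/05/07', 'B');
    INSERT INTO attribs VALUES
        (100, '2022/05/01', 'blue'),  (100, '2022/05/02', 'red'),
        (100, '2022/05/05', 'green'), (100, '2022/05/07', 'red'),
        (101, '2022/05/01', 'blue'),  (101, '2022/05/04', 'red'),
        (101, '2022/05/05', 'green'), (101, '2022/05/08', 'yellow');
""")

rows = conn.execute("""
    WITH ranked AS (
        SELECT t.cust_id, t.order_date, t.product,
               a.favourite_colour, a.as_of_date,
               -- rank snapshots per transaction, newest first
               ROW_NUMBER() OVER (
                   PARTITION BY t.cust_id, t.order_date
                   ORDER BY a.as_of_date DESC
               ) AS rn
        FROM transactions t
        JOIN attribs a
          ON a.cust_id = t.cust_id
         AND a.as_of_date <= t.order_date
    )
    SELECT cust_id, order_date, product, favourite_colour, as_of_date
    FROM ranked
    WHERE rn = 1
    ORDER BY order_date, cust_id
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by cust_id and order_date is what fixes the original attempt, which partitioned by cust_ID alone and so kept only one row per customer. If a customer can place several orders on the same date, the partition would also need a column that identifies the individual transaction.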

Access sql Moving Average of Top N With 2 criterias

I have been searching the forum and found a single post that is a little similar to my problem: Calculate average for Top n combined with SQL Group By.
My situation is:
I have a table tblWEIGHT that contains: ID, Date, idPONR, Weight
I have a second table tblSALES that contains: ID, Date, Sales, idPONR
I have a third table tblPONR that contains: ID, PONR, idProduct
And a fourth table tblPRODUCT that contains: ID, Product
The linking:
tblWEIGHT.idPONR = tblPONR.ID
tblSALES.idPONR = tblPONR.ID
tblPONR.idProduct = tblPRODUCT.ID
The main table of my query is tblSALES. I want all my sales listed, with the moving average of the top 5 weights of the PRODUCT where the date of the weight is less than the sales date, and the product is the same as the sold product. It's IMPORTANT that the result isn't grouped by the date. I need all the records of tblSALES.
I have gotten as far as getting the top 1 weight, but I'm not able to get the moving average instead.
The query that gets the top 1 is the following, and I am guessing that the query I need is going to look a lot like it.
SELECT tblSALES.ID, tblSALES.Date, tblPONR.idPRODUCT,
(
SELECT top 1 Weight FROM tblWEIGHT INNER JOIN tblPONR ON tblWeight.idPONR = tblPONR.ID
WHERE tblPONR.idPRODUCT = idPRODUCT AND
tblSALES.Date > tblWEIGHT.Date
ORDER BY tblWEIGHT.Date desc
) AS LatestWeight
FROM tblSALES INNER JOIN tblPONR ON tblSALES.idPONR = tblPONR.ID
This is not my exact query, since I'm Danish and the original names wouldn't make sense. I know I'm not supposed to use Date as a field name.
I imagine the final query would be something like:
SELECT tblSALES.ID..... avg(SELECT TOP 5 weight .........)
but doing this I keep getting the error "At most one record can be returned by this subquery".
Final question:
How do I make a query that creates a moving average of the top 5 weights of my sold product, where the date of the weight is earlier than the date I sold the product?
EDIT: Sample data
Date format: dd-mm-yyyy
tblWEIGHT
ID Date idPONR Weight
1 01-01-2020 1 100
2 02-01-2020 2 200
3 03-01-2020 3 200
4 04-01-2020 3 400
5 05-01-2020 2 250
6 06-01-2020 1 150
7 07-01-2020 2 200
tblSALES
ID Date Sales(amt) idPONR
1 05-01-2020 30 1
2 06-01-2020 15 2
3 10-01-2020 20 3
tblPONR
ID PONR(production Number) idProduct
1 2521 1
2 1548 1
3 5484 2
tblPRODUCT
ID Product
1 Bricks
2 Tiles
Desired outcome (read the comments for AvgWeight):
tblSALES.ID tblSALES.Date tblSALES.Sales(amt) AvgWeight
1 05-01-2020 30 123 --> avg(top 5 newest weights of both idPONR 1 and 2, because they are the same product, where tblWeight.Date < 05-01-2020)
2 06-01-2020 15 123 --> avg(top 5 newest weights of both idPONR 1 and 2, because they are the same product, where tblWeight.Date < 06-01-2020)
3 10-01-2020 20 123 --> avg(top 5 newest weights of idPONR 3, since that's the only idPONR with that product, where tblWeight.Date < 10-01-2020)

Consider:
Query1
SELECT tblWeight.ID AS WeightID, tblWeight.Date AS WtDate,
tblWeight.idPONR, tblPONR.PONR, tblPONR.idProduct, tblWeight.Weight, tblSales.SalesAmt,
tblSales.ID AS SalesID, tblSales.Date AS SalesDate
FROM (tblPONR INNER JOIN tblWeight ON tblPONR.ID = tblWeight.idPONR)
INNER JOIN tblSales ON tblPONR.ID = tblSales.idPONR;
Query2
SELECT * FROM Query1 WHERE WeightID IN (
SELECT TOP 5 WeightID FROM Query1 AS Dupe WHERE Dupe.idProduct = Query1.idProduct
AND Dupe.WtDate<Query1.SalesDate ORDER BY Dupe.WtDate DESC);
Query3
SELECT Query2.SalesID, Query2.SalesDate, Query2.SalesAmt,
First(DAvg("Weight","Query2","idProduct=" & [idProduct] & " AND WtDate<#" & [SalesDate] & "#")) AS AvgWt
FROM Query2
GROUP BY Query2.SalesID, Query2.SalesDate, Query2.SalesAmt;
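For comparison, the intended result can be worked out on the sample data outside Access. The sketch below is not Access SQL: it assumes an in-memory SQLite database (3.25 or later, for window functions), renames the Date fields to the hypothetical WeightDate and SaleDate, and stores dates as ISO text so that string comparison orders them correctly. The helper name moving_avg is also made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPONR   (ID INT, PONR INT, idProduct INT);
    CREATE TABLE tblWEIGHT (ID INT, WeightDate TEXT, idPONR INT, Weight REAL);
    CREATE TABLE tblSALES  (ID INT, SaleDate TEXT, Sales INT, idPONR INT);
    INSERT INTO tblPONR VALUES (1, 2521, 1), (2, 1548, 1), (3, 5484, 2);
    INSERT INTO tblWEIGHT VALUES
        (1, '2020-01-01', 1, 100), (2, '2020-01-02', 2, 200),
        (3, '2020-01-03', 3, 200), (4, '2020-01-04', 3, 400),
        (5, '2020-01-05', 2, 250), (6, '2020-01-06', 1, 150),
        (7, '2020-01-07', 2, 200);
    INSERT INTO tblSALES VALUES
        (1, '2020-01-05', 30, 1), (2, '2020-01-06', 15, 2),
        (3, '2020-01-10', 20, 3);
""")

def moving_avg(n):
    """Average of the n newest weights of the sold product before each sale."""
    return conn.execute("""
        WITH ranked AS (
            SELECT s.ID AS SalesID, s.SaleDate, s.Sales, w.Weight,
                   -- newest weight first, ranked separately for every sale
                   ROW_NUMBER() OVER (
                       PARTITION BY s.ID ORDER BY w.WeightDate DESC
                   ) AS rn
            FROM tblSALES s
            JOIN tblPONR ps ON ps.ID = s.idPONR
            JOIN tblPONR pw ON pw.idProduct = ps.idProduct
            JOIN tblWEIGHT w ON w.idPONR = pw.ID
                            AND w.WeightDate < s.SaleDate
        )
        SELECT SalesID, SaleDate, Sales, AVG(Weight) AS AvgWeight
        FROM ranked
        WHERE rn <= ?
        GROUP BY SalesID, SaleDate, Sales
        ORDER BY SalesID
    """, (n,)).fetchall()

for row in moving_avg(5):
    print(row)
```

With the sample rows no product has more than three earlier weights, so the top-5 limit never cuts anything off; calling moving_avg(2) shows the limit taking effect. Because the joins are inner joins, a sale with no earlier weight for its product would be dropped, so a LEFT JOIN would be needed if every tblSALES row must appear.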

SELECT with LEFT JOIN performing math operation twice?

The general idea of what I'm trying to do is this:
Select all planned prices for an order, then subtract from that total all actual prices on that order.
The planned price and actual price are on different tables. When I have a single planned price and a single actual price, this works fine. However, when I have multiple planned prices or multiple actual prices it is giving me odd results as if the algebra is happening multiple times.
Query:
SELECT PL.orderid, (SUM(PL.lineprice) - NVL(SUM(AC.lineprice),0)) AS Difference
FROM plans PL
LEFT JOIN actuals AC ON PL.orderid = AC.orderid
WHERE PL.customer IN (SELECT customer FROM ...)
GROUP BY PL.orderid
ORDER BY PL.orderid;
The results of the query:
Orderid Difference
X-1224 100
X-1226 80
X-1345 70000
X-1351 125000
X-1352 10000
Y-2403 190000
My Plan table looks like this:
Orderid Planned_Price
X-1224 100
X-1226 100
X-1345 105000
X-1351 100000
X-1352 10000
X-1352 50000
Y-2403 25000
Y-2403 100000
And my Actual table this:
Orderid Actual_Price
X-1226 20
X-1345 35000
X-1351 25000
X-1351 50000
X-1352 25000
Y-2403 25000
Y-2403 5000
So it seems to work when I have only a single row in each table, or a single row in plans and no rows in actuals i.e., X-1224, X-1226 and X-1345.
However, the results are too high or too low when I have multiple rows with the same OrderID in either table, i.e., all the rest.
I'm stumped as to why this is the case. Any insights are appreciated.
edit: Results I'd like, taking Y-2403 as example: (25000 + 100000) - (25000 + 5000) = 95000. What I'm getting is double that at 190000.

Why is this the case?
Because that is how join works. If you have data like this:
a
1
1
2
2
And b:
b
1
1
1
2
Then the result of a join will have six "1"s and two "2"s.
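The row multiplication is easy to confirm. This small sketch assumes an in-memory SQLite database and a made-up column name val for the single column of each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (val INT);
    CREATE TABLE b (val INT);
    INSERT INTO a VALUES (1), (1), (2), (2);
    INSERT INTO b VALUES (1), (1), (1), (2);
""")

# Every matching pair of rows produces one output row:
# 2 x 3 rows for value 1, and 2 x 1 rows for value 2.
counts = conn.execute("""
    SELECT a.val, COUNT(*)
    FROM a JOIN b ON a.val = b.val
    GROUP BY a.val
    ORDER BY a.val
""").fetchall()

print(counts)  # [(1, 6), (2, 2)]
```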
Your question doesn't say what you want for results, but a typical approach is to aggregate before doing the joins.
EDIT:
You seem to want:
select p.orderid,
(p.lineprice - coalesce(a.lineprice, 0)) as Difference
from (select orderid, sum(lineprice) as lineprice
from plans
where customer in (SELECT customer FROM ...)
group by orderid
) p left join
(select orderid, sum(lineprice) as lineprice
from actuals
group by orderid
) a
on p.orderid = a.orderid
order by p.orderid;
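Running both versions on the question's sample rows shows the effect of aggregating first. The sketch below assumes an in-memory SQLite database and stores both price columns as lineprice, the name used in the queries. It leaves out the customer filter because the question elides that subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plans   (orderid TEXT, lineprice INT);
    CREATE TABLE actuals (orderid TEXT, lineprice INT);
    INSERT INTO plans VALUES
        ('X-1224', 100),    ('X-1226', 100),   ('X-1345', 105000),
        ('X-1351', 100000), ('X-1352', 10000), ('X-1352', 50000),
        ('Y-2403', 25000),  ('Y-2403', 100000);
    INSERT INTO actuals VALUES
        ('X-1226', 20),    ('X-1345', 35000), ('X-1351', 25000),
        ('X-1351', 50000), ('X-1352', 25000), ('Y-2403', 25000),
        ('Y-2403', 5000);
""")

# Join first, then sum: each plan row repeats once per matching actual row
# (and vice versa), so both sums are inflated.
naive = dict(conn.execute("""
    SELECT PL.orderid,
           SUM(PL.lineprice) - COALESCE(SUM(AC.lineprice), 0)
    FROM plans PL
    LEFT JOIN actuals AC ON PL.orderid = AC.orderid
    GROUP BY PL.orderid
""").fetchall())

# Sum each table per order first, then join one row to at most one row.
fixed = dict(conn.execute("""
    SELECT p.orderid, p.lineprice - COALESCE(a.lineprice, 0)
    FROM (SELECT orderid, SUM(lineprice) AS lineprice
          FROM plans GROUP BY orderid) p
    LEFT JOIN (SELECT orderid, SUM(lineprice) AS lineprice
               FROM actuals GROUP BY orderid) a
           ON p.orderid = a.orderid
""").fetchall())

print(naive["Y-2403"], fixed["Y-2403"])  # 190000 95000
```

The pre-aggregated version also keeps X-1224, which has a plan but no actuals, because the plans side drives the LEFT JOIN.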

I suppose you want to compare the summed prices per order ID in the plan table with the summed prices per order ID in the actual table?
If so, the following ensures there are no duplicate entries per order:
select a.orderid
,NVL(max(b.summed_up),0) - sum(a.actual_price) as difference
from actual_table a
left join (select pt.orderid
,sum(pt.planned_price) as summed_up
from planned_table pt
group by pt.orderid
)b
on a.orderid=b.orderid
group by a.orderid
+---------+------------+
| ORDERID | DIFFERENCE |
+---------+------------+
| X-1226 | 80 |
| Y-2403 | 95000 |
| X-1351 | 25000 |
| X-1345 | 70000 |
| X-1352 | 35000 |
+---------+------------+
Here is the dbfiddle link with the data
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=3cacffd19b39ecaf7ad752dff262ac47

Joining to another table only on the first occurrence of a field

Note: I have tried to simplify the below to make it easier both for me and for anyone else to understand. The tables I reference below are in fact sub-queries joining a lot of different data together from different sources.
I have a table of purchased items:
Items
ItemSaleID CustomerID ItemCode
1 100 A
2 100 B
3 100 C
4 200 A
5 200 C
I also have transaction header and detail tables coming from a till system:
TranDetail
TranDetailID TranHeaderID ItemSaleID Cost
11 51 1 $10
12 51 2 $10
13 51 3 $10
14 52 4 $20
15 52 5 $10
TranHeader
TranHeaderID CustomerID Payment Time
51 100 $100 11:00
52 200 $50 12:00
53 100 $20 13:00
I want to get to a point where I have a table like:
ItemSaleID CustomerID ItemCode Cost Payment Time
1 100 A $10 $120 11:00
2 100 B $10 11:00
3 100 C $10 11:00
4 200 D $20 $50 12:00
5 200 E $10 12:00
I have a query which produces the results, but when I add the ROW_NUMBER() CASE expression, the run time goes from 2 minutes to 30+ minutes.
The query is further complicated because I need to supply the earliest date relating to the list of transactions and the total price paid (there could be many transactions throughout the day for upgrades etc.).
Query below:
SELECT ItemSaleID
, CustomerID
, ItemCode
, Cost
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY TranHeaderID ORDER BY ItemSaleID) = 1
THEN TRN.Payment ELSE NULL END AS Payment
FROM Items I
OUTER APPLY (
SELECT TOP 1 SUB.Payment, Time
FROM TranHeader H
INNER JOIN TranDetail D ON H.TranHeaderID = D.TranHeaderID
OUTER APPLY (SELECT SUM(Payment) AS Payment
FROM TranHeader H2
WHERE H2.CustomerID = Items.CustomerID
) SUB
WHERE D.CustomerID = I.CustomerID
) TRN
WHERE ...
Is there a way to show the payment only on the first occurrence of each customer ID whilst maintaining performance?

Show account balance from multiple tables

I have the following two tables, which store information about credit and debit records.
voucherCr table contains
voucherType voucherPrefix voucherNo crparty cramount
SALES S 1 1 43000
SALES S 2 1 10000
voucherDr table contains
voucherType voucherPrefix voucherNo drparty dramount
SALES S 1 5 43000
SALES S 2 5 10000
Here, in SALES voucher S/1, party 1 has been credited with an amount of 43000 against party 5 for the same amount. The same goes for SALES voucher S/2, where party 1 has been credited with 10000 against party 5 for the same amount.
Now I want to display the results as follows if I query party 1:
PARTY  CREDIT  DEBIT  DEBITPARTY  voucherType  voucherPrefix  voucherNo
1      43000          5           SALES        S              1
1      10000          5           SALES        S              2
Please help.

Try this query. Is it possible in your case that one dramount is split across several rows in voucherDr, for example 43000 -> 40000 + 3000?
select
vc.crparty as Party, vc.cramount, vd.dramount, vd.drparty,
vc.voucherType, vc.voucherPrefix, vc.voucherNo
from voucherCr vc
left join voucherDr vd on (vc.voucherType=vd.voucherType)
and (vc.voucherPrefix=vd.voucherPrefix)
and (vc.voucherNo=vd.voucherNo)
where vc.crparty=1
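As a quick check of the join on the full voucher key, here is a sketch using the two sample vouchers in an in-memory SQLite database (the database choice and column types are assumptions; the question does not state them). The output aliases party, credit and debitparty are chosen to mirror the desired result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voucherCr (voucherType TEXT, voucherPrefix TEXT,
                            voucherNo INT, crparty INT, cramount INT);
    CREATE TABLE voucherDr (voucherType TEXT, voucherPrefix TEXT,
                            voucherNo INT, drparty INT, dramount INT);
    INSERT INTO voucherCr VALUES ('SALES', 'S', 1, 1, 43000),
                                 ('SALES', 'S', 2, 1, 10000);
    INSERT INTO voucherDr VALUES ('SALES', 'S', 1, 5, 43000),
                                 ('SALES', 'S', 2, 5, 10000);
""")

# Match credit and debit rows on the whole voucher key (type, prefix, number).
rows = conn.execute("""
    SELECT vc.crparty AS party, vc.cramount AS credit,
           vd.drparty AS debitparty,
           vc.voucherType, vc.voucherPrefix, vc.voucherNo
    FROM voucherCr vc
    LEFT JOIN voucherDr vd
           ON vc.voucherType   = vd.voucherType
          AND vc.voucherPrefix = vd.voucherPrefix
          AND vc.voucherNo     = vd.voucherNo
    WHERE vc.crparty = ?
    ORDER BY vc.voucherNo
""", (1,)).fetchall()

for row in rows:
    print(row)
```

If one voucher is split across several voucherDr rows, this join returns one output row per debit row, so the credit amount would repeat.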

If I have understood your question properly, then this is what you are looking for:
Select c.crParty as Party, d.dramount as credit , null as debit,
d.drParty as DEBITPARTY,c.voucherType as voucherType,
d.voucherPrefix,d.voucherNo
from VoucherCr as c inner join VoucherDr as d
on c.voucherNo=d.VoucherNo and c.voucherPrefix=d.voucherPrefix
where c.crparty=1
group by d.dramount,c.cramount,d.voucherPrefix,d.voucherNo,c.crParty,
c.voucherType,d.drParty
order by d.dramount desc
Try SQLFIDDLE
