Query to join tables based on two criteria - sql

First, I'm not sure the title adequately describes what I am trying to achieve, so please amend it as you see fit.
I have a table in an SQL database which records budget allocations and transfers.
Each allocation and transfer is recorded against a combination of two details - the year_ID and program_ID. Allocations can come from nowhere or from other year_id & program_id combinations - these are the transfers.
For example, year_ID 1 & program_ID 2 was allocated $1000, then year_ID 1 & program_ID 2 transferred $100 to year_ID 2 & program_ID 2.
This is stored in the database like
From_year_ID  From_program_ID  To_year_ID  To_program_ID  Budget
null          null             1           2              1000
1             2                2           2              100
The query needs to summarise these budget allocations based on the year_id + program_id combination, so the results would display:
year_ID  program_ID  Budget_Allocations  Budget_Transfers
1        2           1000                100
2        2           100                 0
I've spent two days trying to put this query together and am officially stuck - could someone help me out or point me in the right direction? I've tried what feels like every combination of left, right, inner, union joins, with etc - but haven't got the outcome I'm looking for.
Here is a sqlfiddle with sample data: http://sqlfiddle.com/#!3/9c1ec/1/0 and one of the queries that doesn't quite work.

I would sum the Budget by Program_ID and Year_ID in some CTEs and join those to the Program and Year tables to avoid summing Budget values more than once.
WITH
bt AS
(SELECT
    To_Year_ID AS Year_ID,
    To_Program_ID AS Program_ID,
    SUM(Budget) AS Budget_Allocation
 FROM T_Budget
 GROUP BY
    To_Year_ID,
    To_Program_ID),
bf AS
(SELECT
    From_Year_ID AS Year_ID,
    From_Program_ID AS Program_ID,
    SUM(Budget) AS Budget_Transfer
 FROM T_Budget
 GROUP BY
    From_Year_ID,
    From_Program_ID)
SELECT
    y.Year_ID,
    p.Program_id,
    bt.Budget_Allocation,
    bf.Budget_Transfer,
    y.Short_Name + ' ' + p.Short_Name AS Year_Program,
    isnull(bt.Budget_Allocation, 0) -
        isnull(bf.Budget_Transfer, 0) AS Budget_Balance
FROM T_Programs p
CROSS JOIN T_Years y
INNER JOIN bt
    ON bt.Program_ID = p.Program_ID
    AND bt.Year_ID = y.Year_ID
LEFT JOIN bf
    ON bf.Program_ID = p.Program_ID
    AND bf.Year_ID = y.Year_ID
ORDER BY
    y.Year_ID,
    p.Program_ID
http://sqlfiddle.com/#!3/9c1ec/13
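To try the idea without the fiddle, here is a minimal sketch that runs the two-CTE approach against the two sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so COALESCE replaces isnull. Because T_Programs and T_Years are not shown in the question, this sketch drives the result from the allocation CTE instead of the CROSS JOIN; that substitution is an assumption, not the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Budget (
    From_Year_ID INTEGER, From_Program_ID INTEGER,
    To_Year_ID INTEGER, To_Program_ID INTEGER, Budget INTEGER
);
-- The two sample rows from the question.
INSERT INTO T_Budget VALUES (NULL, NULL, 1, 2, 1000);
INSERT INTO T_Budget VALUES (1, 2, 2, 2, 100);
""")

QUERY = """
WITH bt AS (  -- money received per year/program
    SELECT To_Year_ID AS Year_ID, To_Program_ID AS Program_ID,
           SUM(Budget) AS Budget_Allocation
    FROM T_Budget
    GROUP BY To_Year_ID, To_Program_ID
),
bf AS (       -- money transferred out per year/program
    SELECT From_Year_ID AS Year_ID, From_Program_ID AS Program_ID,
           SUM(Budget) AS Budget_Transfer
    FROM T_Budget
    WHERE From_Year_ID IS NOT NULL  -- allocations "from nowhere" are not transfers
    GROUP BY From_Year_ID, From_Program_ID
)
SELECT bt.Year_ID, bt.Program_ID, bt.Budget_Allocation,
       COALESCE(bf.Budget_Transfer, 0) AS Budget_Transfer
FROM bt
LEFT JOIN bf
    ON bf.Year_ID = bt.Year_ID AND bf.Program_ID = bt.Program_ID
ORDER BY bt.Year_ID, bt.Program_ID
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each CTE aggregates T_Budget once before the join, which is what keeps a Budget value from being counted more than once.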

Related

Inner join + group by - select common columns and aggregate functions

Let's say i have two tables
Customer
---
Id Name
1 Foo
2 Bar
and
CustomerPurchase
---
CustomerId, Amount, AmountVAT, Accountable(bit)
1 10 11 1
1 20 22 0
2 5 6 0
2 2 3 0
I need a single record for every joined and grouped Customer and CustomerPurchase group.
Every record would contain
columns from table Customer
some aggregation functions like SUM
a 'calculated' column. For example difference of other columns
result of subquery to CustomerPurchase table
An example of result i would like to get
CustomerPurchases
---
Name Total TotalVAT VAT TotalAccountable
Foo 30 33 3 10
Bar 7 9 2 0
I was able to get a single row only by grouping by all the common columns, which I don't think is the right way to do it. Plus, I have no idea how to do the 'VAT' column and the 'TotalAccountable' column, which filters out only certain rows of CustomerPurchase and then runs some kind of aggregate function on the result. The following example doesn't work, of course, but I wanted to show what I would like to achieve:
select C.Name,
SUM(CP.Amount) as 'Total',
SUM(CP.AmountVAT) as 'TotalVAT',
diff? as 'VAT',
subquery? as 'TotalAccountable'
from Customer C
inner join CustomerPurchase CR
on C.Id = CR.CustomerId
group by C.Id
I would suggest you just need the following slight changes to your query. For clarity, I would also consider using the terms net and gross if you can, as these are typical for prices excluding and including VAT.
select c.[Name],
  Sum(cp.Amount) as Total,
  Sum(cp.AmountVAT) as TotalVAT,
  Sum(cp.AmountVAT) - Sum(cp.Amount) as VAT,
  -- else 0 so customers with no accountable purchases show 0 rather than NULL
  Sum(case when cp.Accountable = 1 then cp.Amount else 0 end) as TotalAccountable
from Customer c
join CustomerPurchase cp on cp.CustomerId = c.Id
group by c.[Name];
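As a quick check of the conditional-aggregation idea, the sketch below loads the sample rows from the question into an in-memory SQLite database (a stand-in for the asker's DBMS, via Python's sqlite3 module) and runs the same shape of query. Grouping by both Id and Name is a small variation, added on the assumption that two customers could share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CustomerPurchase (
    CustomerId INTEGER, Amount INTEGER, AmountVAT INTEGER, Accountable INTEGER
);
INSERT INTO Customer VALUES (1, 'Foo'), (2, 'Bar');
INSERT INTO CustomerPurchase VALUES
    (1, 10, 11, 1), (1, 20, 22, 0), (2, 5, 6, 0), (2, 2, 3, 0);
""")

QUERY = """
SELECT c.Name,
       SUM(cp.Amount)                      AS Total,
       SUM(cp.AmountVAT)                   AS TotalVAT,
       SUM(cp.AmountVAT) - SUM(cp.Amount)  AS VAT,
       -- only rows flagged Accountable contribute to this sum
       SUM(CASE WHEN cp.Accountable = 1 THEN cp.Amount ELSE 0 END)
                                           AS TotalAccountable
FROM Customer c
JOIN CustomerPurchase cp ON cp.CustomerId = c.Id
GROUP BY c.Id, c.Name
ORDER BY c.Id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The CASE expression inside SUM is what replaces the subquery the question was reaching for: the filter is applied per row, inside the aggregate, so no second pass over CustomerPurchase is needed.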

Left Join Is Returning One value for entire column

I am doing a left join on another table in the subquery and the commission column I want to return from the left join is only bringing one value for the entire commission column which is wrong (see first query result below). Now if I do the left join outside of table A (query 2) then I get the desired results (see second result set). The question is why isn't the left join working within table A in the first query.
I have tried left joining outside of the subquery/table A (query 2) and it works fine, but I want to learn why the left join isn't working within table A.
The query (query 1) is below which is giving the duplicate values in the commission column
Position_A1 table
Sector Short_Side
------------------------
Engineering -2
Financial -5
Industry -10
Corporate -36
Energy -52
Financial -26
Order table
Sector Commission
------------------------
Engineering 10
Financial 100
Industry 36
Corporate 91
Energy 10
Financial 25
Query 1
SELECT *
FROM
(SELECT POS.SECTOR,
SUM(ABS(POS.SHORT_SIDE)) AS Short_Expo,
COM.COMMISSION
FROM Position_A1 POS
LEFT JOIN (SELECT SECTOR, sum(COMMISSION) AS COMMISSION
FROM ORDER
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR
)COM
ON POS.SECTOR = COM.SECTOR
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR ) A
However, if I try the below, I get the correct results in the commission column.
Query 2
SELECT A.*, COM.COMMISSION
FROM
(SELECT POS.SECTOR,
SUM(ABS(POS.SHORT_SIDE)) AS Short_Expo
FROM Position_A1 POS
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR ) A
LEFT JOIN (SELECT SECTOR, sum(COMMISSION) AS COMMISSION
FROM ORDER
WHERE TRADE_DATE = TO_DATE('2019-11-01','YYYY-MM-DD')
GROUP BY SECTOR
)COM
ON POS.SECTOR = COM.SECTOR
As per the first query the result I get is:
Sector Short_Expo Commission
Energy 256 125
Industry 236 125
Financial 125 125
As per the second query the result I get (which is correct) is:
Sector Short_Expo Commission
Energy 256 128
Industry 236 325
Financial 125 186
The question is why query 1 isn't giving the ideal result whereas query 2 does. What am I doing wrong in the first query that results in the duplicate commission?
Using the first it seems that the commission for only one sector (Financial) is being returned for all sectors.
In the first query, the WHERE clause in the outer query (WHERE TRADE_DATE = TO_DATE('2019-11-01', 'YYYY-MM-DD')) is turning the LEFT JOIN into an INNER JOIN: for rows with no match, the joined table's columns are NULL, a NULL never satisfies the comparison, and so those rows are filtered out.
The normal solution is to put filters on the joined (right-hand) table in the ON clause rather than in WHERE.
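The difference between filtering in WHERE and filtering in ON is easy to see on a tiny example. The sketch below uses Python's sqlite3 module with two hypothetical tables, positions and orders (named orders because ORDER is a reserved word), and made-up rows; it illustrates the general rule and is not a reconstruction of the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (sector TEXT, short_side INTEGER);
CREATE TABLE orders (sector TEXT, commission INTEGER, trade_date TEXT);
INSERT INTO positions VALUES ('Energy', -52), ('Financial', -5);
-- Only Energy has an order on the date we filter for.
INSERT INTO orders VALUES
    ('Energy', 10, '2019-11-01'),
    ('Financial', 100, '2019-10-31');
""")

# Filter on the right-hand table in WHERE: unmatched rows are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
where_rows = conn.execute("""
    SELECT p.sector, o.commission
    FROM positions p
    LEFT JOIN orders o ON o.sector = p.sector
    WHERE o.trade_date = '2019-11-01'
    ORDER BY p.sector
""").fetchall()

# Same filter moved into ON: every position is kept,
# with NULL commission where no order matches.
on_rows = conn.execute("""
    SELECT p.sector, o.commission
    FROM positions p
    LEFT JOIN orders o
        ON o.sector = p.sector
       AND o.trade_date = '2019-11-01'
    ORDER BY p.sector
""").fetchall()

print(where_rows)
print(on_rows)
```

In the first result the Financial row disappears; in the second it is kept with a NULL (None in Python) commission.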

How to merge two tables with one to many relationship

I have two tables: a main orders table and an ordered products table.
Table 1: ORDERS
"CREATE TABLE IF NOT EXISTS ORDERS("
"id_order INTEGER PRIMARY KEY AUTOINCREMENT,"
"o_date TEXT,"
"o_seller TEXT,"
"o_buyer TEXT,"
"o_shipping INTEGER,"
"d_amount INTEGER,"
"d_comm INTEGER,"
"d_netAmount INTEGER)"
Table 2: ORDERED_PRODUCTS
"CREATE TABLE IF NOT EXISTS dispatch_products("
"id_order INTEGER NOT NULL REFERENCES ORDERS(id_order),"
"product_name INTEGER,"
"quantity INTEGER,"
"rate INTEGER)"
I tried to join these two tables using following query:
SELECT *
FROM ORDERS a
INNER JOIN ORDERED_PRODUCTS b
ON a.id_order = b.id_order
WHERE a.buyer = 'abc'
The issue is with the entries with multiple products in table 2.
The output I'm getting is like below:
order_ID date seller buyer Ship amt comm nAmt Prod Qty Rate
1 A x 5 100 5 115 Scale 10 10
2 B abc 10 100 5 115 pen 5 10
2 B abc 10 100 5 115 paper 10 5
3 C xyz 10 100 5 220 book 5 20
3 C xyz 10 100 5 220 stapl 10 10
expected output:
order_ID date seller buyer Ship amt comm nAmt Prod Qty Rate
1 A x 5 100 5 115 Scale 10 10
2 B abc 10 100 5 115 pen 5 10
Paper 10 5
3 C xyz 10 100 5 220 Book 5 20
Stapl 10 10
Databases don't really work like that; you got what you asked for, and with no duplicates (all rows are different). You're looking at the columns of data that came from orders and saying "oh, the data is duplicated" but it isn't - it's joined "in context"
Imagine I gave you just one of your sample rows from your expected output:
Paper 10 5
Promise I just copy pasted that.
What order is it from?
No idea. You've lost the context, so it could be from any order. Rows are individual entities that stand alone, without reference to any other row, as a set of data. This is why the same order info needs to appear on each row. A database could be made to produce the expected output you asked for, but it would be really quite complex in a low-end database like SQLite. More important to me is to point out why there's a difference between what you thought the query would give you and what it gave you, as I think that's the real problem: the query gave you what it was supposed to and there's no fault in it; it's more a faulty assumption about what you'd get.
If you're trying to prepare a report that uses the order as some kind of header, select them individually in the front end app. Select ALL the orders, then one by one (order by order) pull all the item detail out, building the report as you go:
myorders = dbquery("SELECT * FROM ORDERS")
for each(order o in myorders)
  print(o.header)
  details = dbquery("SELECT * FROM dispatch_products where id_order = ?", o.id)
  for each(detail d in details)
    print(d.prod, d.qty, d.rate)
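For a concrete version of that pseudocode, here is a minimal sketch in Python using the built-in sqlite3 module. The column names follow the CREATE TABLE statements in the question (trimmed to a few columns); the sample rows, including the dates and seller names, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
CREATE TABLE ORDERS (
    id_order INTEGER PRIMARY KEY AUTOINCREMENT,
    o_date TEXT, o_seller TEXT, o_buyer TEXT
);
CREATE TABLE dispatch_products (
    id_order INTEGER NOT NULL REFERENCES ORDERS(id_order),
    product_name TEXT, quantity INTEGER, rate INTEGER
);
-- Illustrative rows only.
INSERT INTO ORDERS (o_date, o_seller, o_buyer) VALUES
    ('2020-01-01', 'A', 'x'),
    ('2020-01-02', 'B', 'abc'),
    ('2020-01-03', 'C', 'xyz');
INSERT INTO dispatch_products VALUES
    (1, 'Scale', 10, 10),
    (2, 'pen', 5, 10), (2, 'paper', 10, 5),
    (3, 'book', 5, 20), (3, 'stapl', 10, 10);
""")

report = []
orders = conn.execute(
    "SELECT id_order, o_date, o_seller, o_buyer FROM ORDERS ORDER BY id_order"
).fetchall()
for order in orders:
    # One header line per order...
    report.append(
        f"{order['id_order']} {order['o_date']} {order['o_seller']} {order['o_buyer']}"
    )
    details = conn.execute(
        "SELECT product_name, quantity, rate FROM dispatch_products "
        "WHERE id_order = ? ORDER BY rowid",
        (order["id_order"],),
    ).fetchall()
    # ...followed by one indented line per product on that order.
    for d in details:
        report.append(f"    {d['product_name']} {d['quantity']} {d['rate']}")

print("\n".join(report))
```

This issues one detail query per order, which is fine for a small report; for many orders, a single joined query with the header suppressed in the front end would avoid the repeated round trips.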
Here's a way to make the DB do it, but you'll need a version of SQLite that supports window functions (they were added in 3.25; 3.10 doesn't have them) or another db (SQLS > 2008, Oracle > 9, or other big-name db from the last 10 or so years, or a very recent MySQL):
SELECT
  CASE WHEN d.rn = 1 THEN d.o_date END as o_date,
  CASE WHEN d.rn = 1 THEN d.o_seller END as o_seller,
  CASE WHEN d.rn = 1 THEN d.o_buyer END as o_buyer,
  CASE WHEN d.rn = 1 THEN d.o_shipping END as o_shipping,
  CASE WHEN d.rn = 1 THEN d.d_amount END as d_amount,
  CASE WHEN d.rn = 1 THEN d.d_comm END as d_comm,
  CASE WHEN d.rn = 1 THEN d.d_netAmount END as d_netAmount,
  d.product_name,
  d.quantity,
  d.rate
FROM (
  -- column names follow the CREATE TABLE statements in the question
  SELECT o.*, op.product_name, op.quantity, op.rate,
    row_number() over(partition by o.id_order order by op.product_name, op.quantity, op.rate) rn
  FROM ORDERS o
  INNER JOIN ORDERED_PRODUCTS op
    ON o.id_order = op.id_order
  WHERE o.o_buyer = 'abc'
) d
ORDER BY d.id_order, d.rn
We basically take your query, add on a row number that restarts every time the order id changes, and only show data from the orders table where the row number is 1. If your SQLite doesn't have row_number you can fake it: see "How to use ROW_NUMBER in sqlite", which I'll leave as an exercise for the reader :)

Query optimization to yield computational results

Table Structure
It is very difficult to add tables in this posting, or at least I don't know how. I tried using HTML table tags, but they don't render well, so I am posting the table structure as an image.
Considering the 3 tables seen in the image (Projects, BC, Actual Spend) as a sample, I'm looking for an optimal query that returns the Report as the result. As you can see, BC has some computation, Actual Spend has
SELECT ProjectId, Name, Budget
, (SELECT b.[BC] FROM [BC] b
WHERE b.[BC] IN
(SELECT SUM(mx.[BC]) FROM [BC] mx
WHERE ProjectId=p.ProjectId)) AS 'BC'
, (SELECT sp.[ActualSpendAmount] FROM [ActualSpend] sp
WHERE sp.[DateSpent] IN
(SELECT MAX(as.[DateSpent]) FROM [ActualSpend] as
WHERE ProjectId=p.ProjectId)) AS 'Actual Spend'
, t.[Budget] - ((SELECT b.[BC] FROM [BC] b
WHERE b.[BC] IN
(SELECT SUM(mx.[BC]) FROM [BC] mx
WHERE ProjectId=p.ProjectId))
+
(SELECT sp.[ActualSpendAmount] FROM [ActualSpend] sp
WHERE sp.[DateSpent] IN
(SELECT MAX(as.[DateSpent]) FROM [ActualSpend] as
WHERE ProjectId=p.ProjectId)))
FROM Projects p;
As you can see, the SELECTs for BC and Actual Spend are each run twice. I have several other tables like BC and Actual Spend that yield some computation. Even if I put them in a function, it would be the same: the function would need to be called more than once.
Is there a way to optimize this query?
Pasting the table structure below:
Projects Table:
ProjectId Name Budget
1 DeadRock 500000
2 HardRock 300000
BC Table:
ProjectId  BCId  BC     ApprovalDate
1          1     5000   2015/02/01
1          2     3000   2015/03/10
1          3     15000  2015/05/01
1          4     5000   2015/07/01
2          5     2000   2015/03/19
2          6     6000   2015/05/20
2          7     25000  2015/08/01
Actual Spend Table:
ProjectId  ActualSpendId  ActualSpendAmount  DateSpent
1          1              15000              2015/03/01
1          2              33000              2015/05/12
1          3              45000              2015/06/03
1          4              75000              2015/07/11
2          5              5000               2015/04/20
2          6              19000              2015/05/29
2          7              42000              2015/06/23
2          8              85000              2015/07/15
Report:
ProjectId  Name      Budget   BC      Actual Spend  ETC
1          DeadRock  500,000  28,000  75,000        397,000
2          HardRock  300,000  33,000  85,000        182,000
ETC = Budget-(BC+ActualSpend)
Based on your expected result your query is way too complicated (and will not run without errors).
Assuming your DBMS supports Windowed Aggregates:
SELECT p.ProjectId, p.Name, p.Budget,
       BC.BC,
       Act.ActualSpendAmount,
       p.Budget - (BC.BC + Act.ActualSpendAmount) AS ETC
FROM Projects AS p
LEFT JOIN
 ( -- sum of BC per project
   SELECT ProjectId, SUM(BC) AS BC
   FROM BC
   GROUP BY ProjectId
 ) AS BC
  ON BC.ProjectId = p.ProjectId
JOIN
 ( -- latest amount per project
   SELECT ProjectId, ActualSpendAmount,
          ROW_NUMBER()
          OVER (PARTITION BY ProjectId
                ORDER BY DateSpent DESC) AS rn
   FROM ActualSpend
 ) AS Act
  ON Act.ProjectId = p.ProjectId
 AND Act.rn = 1
Your correlated subquery for BC does not make any sense:
, (SELECT b.[BC] FROM [BC] b
WHERE b.[BC] IN
(SELECT SUM(mx.[BC]) FROM [BC] mx
WHERE ProjectId=p.ProjectId)) AS 'BC'
If we concentrate on projectID 1, you have the data here:
ProjectId BCId BC ApprovalDate
--------------------------------------------
1 1 5000 2015/02/01
1 2 3000 2015/03/10
1 3 15000 2015/05/01
1 4 5000 2015/07/01
Therefore this part of the query:
(SELECT SUM(mx.[BC]) FROM [BC] mx
WHERE ProjectId=p.ProjectId)
Will return 28,000. Then what you essentially have is:
, (SELECT b.[BC] FROM [BC] b
WHERE b.[BC] IN (28000)) AS 'BC'
Which will return null, unless there happens to be 1 and only 1 record in the table for any project with that particular amount in BC. If there is more than one you will get an error because more than one record is returned in the subquery.
I suspect, based on the report data you simply want the sum, so you can simply use:
SELECT  p.ProjectId,
        p.Name,
        p.Budget,
        bc.BC
FROM    Projects p
        LEFT JOIN
        (   SELECT  bc.ProjectID, SUM(bc.BC) AS bc
            FROM    BC
            GROUP BY bc.ProjectID
        ) AS bc
            ON bc.ProjectID = p.ProjectID;
To get the SUM of BC for each project.
I have made an assumption here that you are using SQL Server based on the use of square brackets for object names, and syntax in your previous questions. In which case I would use OUTER APPLY to get the latest actual spend row, giving a final query of:
SELECT  p.ProjectId,
        p.Name,
        p.Budget,
        bc.BC,
        sp.ActualSpendAmount AS [Actual Spend],
        -- parentheses matter: ETC = Budget - (BC + ActualSpend)
        p.Budget - (bc.BC + sp.ActualSpendAmount) AS ETC
FROM    Projects p
        LEFT JOIN
        (   SELECT  bc.ProjectID, SUM(bc.BC) AS bc
            FROM    BC
            GROUP BY bc.ProjectID
        ) AS bc
            ON bc.ProjectID = p.ProjectID
        OUTER APPLY
        (   SELECT  TOP 1 sp.ActualSpendAmount
            FROM    [ActualSpend] AS sp
            WHERE   sp.ProjectID = p.ProjectID
            ORDER BY sp.DateSpent DESC
        ) AS sp;
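To check the arithmetic against the Report in the question, the sketch below loads the sample data into SQLite through Python's sqlite3 module. OUTER APPLY and TOP are SQL Server features, so this version swaps in a "latest row per project" CTE built on MAX(DateSpent); that is a substitute technique, and it assumes each project has at most one spend row on its latest date. COALESCE is added so that a project with no BC or spend rows would still get a numeric ETC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (ProjectId INTEGER PRIMARY KEY, Name TEXT, Budget INTEGER);
CREATE TABLE BC (ProjectId INTEGER, BCId INTEGER, BC INTEGER, ApprovalDate TEXT);
CREATE TABLE ActualSpend (
    ProjectId INTEGER, ActualSpendId INTEGER,
    ActualSpendAmount INTEGER, DateSpent TEXT
);
INSERT INTO Projects VALUES (1, 'DeadRock', 500000), (2, 'HardRock', 300000);
INSERT INTO BC VALUES
    (1, 1, 5000, '2015/02/01'), (1, 2, 3000, '2015/03/10'),
    (1, 3, 15000, '2015/05/01'), (1, 4, 5000, '2015/07/01'),
    (2, 5, 2000, '2015/03/19'), (2, 6, 6000, '2015/05/20'),
    (2, 7, 25000, '2015/08/01');
INSERT INTO ActualSpend VALUES
    (1, 1, 15000, '2015/03/01'), (1, 2, 33000, '2015/05/12'),
    (1, 3, 45000, '2015/06/03'), (1, 4, 75000, '2015/07/11'),
    (2, 5, 5000, '2015/04/20'), (2, 6, 19000, '2015/05/29'),
    (2, 7, 42000, '2015/06/23'), (2, 8, 85000, '2015/07/15');
""")

QUERY = """
WITH bc_total AS (      -- sum of BC per project
    SELECT ProjectId, SUM(BC) AS TotalBC
    FROM BC
    GROUP BY ProjectId
),
latest_spend AS (       -- the spend row with the latest DateSpent per project
    SELECT s.ProjectId, s.ActualSpendAmount
    FROM ActualSpend s
    WHERE s.DateSpent = (SELECT MAX(s2.DateSpent)
                         FROM ActualSpend s2
                         WHERE s2.ProjectId = s.ProjectId)
)
SELECT p.ProjectId, p.Name, p.Budget,
       COALESCE(b.TotalBC, 0)           AS BC,
       COALESCE(l.ActualSpendAmount, 0) AS ActualSpend,
       p.Budget - (COALESCE(b.TotalBC, 0)
                   + COALESCE(l.ActualSpendAmount, 0)) AS ETC
FROM Projects p
LEFT JOIN bc_total b     ON b.ProjectId = p.ProjectId
LEFT JOIN latest_spend l ON l.ProjectId = p.ProjectId
ORDER BY p.ProjectId
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each derived value is computed once in its own CTE and then reused in the ETC expression, which addresses the original complaint about running the same subquery twice.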

SQL Inner Join query

I have following table structures,
cust_info
  cust_id
  cust_name
bill_info
  bill_id
  cust_id
  bill_amount
  bill_date
paid_info
  paid_id
  bill_id
  paid_amount
  paid_date
Now my output should display the records whose bill_date falls between two dates (1 Jan 2013 to 1 Feb 2013), one row per bill, as follows:
cust_name | bill_id | bill_amount | tpaid_amount | bill_date | balance
where tpaid_amount is total paid for particular bill_id
For example,
for bill id abcd, bill_amount is 10000 and user pays 2000 one time and 3000 second time
means, paid_info table contains two entries for same bill_id
bill_id | paid_amount
abcd 2000
abcd 3000
so, tpaid_amount = 2000 + 3000 = 5000 and balance = 10000 - tpaid_amount = 10000 - 5000 = 5000
Is there any way to do this with single query (inner joins)?
You'd want to join the 3 tables, then group them by bill ids and other relevant data, like so.
-- the select line, as well as getting your columns to display, is where you'll work
-- out your computed columns, or what are called aggregate functions, such as tpaid and balance
SELECT c.cust_name, p.bill_id, b.bill_amount, SUM(p.paid_amount) AS tpaid, b.bill_date, b.bill_amount - SUM(p.paid_amount) AS balance
-- joining up the 3 tables here on the id columns that point to the other tables
FROM cust_info c INNER JOIN bill_info b ON c.cust_id = b.cust_id
INNER JOIN paid_info p ON p.bill_id = b.bill_id
-- between pretty much does what it says
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
-- in group by, we not only need to join rows together based on which bill they're for
-- (bill_id), but also any column we want to select in SELECT.
GROUP BY c.cust_name, p.bill_id, b.bill_amount, b.bill_date
A quick overview of group by: It will take your result set and smoosh rows together, based on where they have the same data in the columns you give it. Since each bill will have the same customer name, amount, date, etc, we are fine to group by those as well as the bill id, and we'll get a record for each bill. If we wanted to group it by p.paid_amount, though, since each payment would have a different one of those (possibly), you'd get a record for each payment as opposed to for each bill, which isn't what you'd want. Once group by has smooshed these rows together, you can run aggregate functions such as SUM(column). In this example, SUM(p.paid_amount) totals up all the payments that have that bill_id to work out how much has been paid. For more information, please look at W3Schools chapter on group by in their SQL tutorials.
Hope I've understood this correctly and that this helps you.
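Here is a minimal runnable sketch of the join-then-group idea, using Python's sqlite3 module and the abcd example from the question. The customer name and the two extra bills are made up. One deliberate variation, flagged as an assumption about what is wanted: paid_info is LEFT JOINed and the sum is wrapped in COALESCE, so a bill with no payments yet still appears with tpaid_amount 0, whereas an INNER JOIN would drop it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust_info (cust_id INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE bill_info (
    bill_id TEXT PRIMARY KEY, cust_id INTEGER,
    bill_amount INTEGER, bill_date TEXT
);
CREATE TABLE paid_info (
    paid_id INTEGER PRIMARY KEY, bill_id TEXT,
    paid_amount INTEGER, paid_date TEXT
);
INSERT INTO cust_info VALUES (1, 'Alice');          -- hypothetical customer
INSERT INTO bill_info VALUES
    ('abcd', 1, 10000, '2013-01-15'),               -- two payments below
    ('efgh', 1, 4000,  '2013-01-20'),               -- no payments yet
    ('old1', 1, 500,   '2012-12-01');               -- outside the date range
INSERT INTO paid_info VALUES
    (1, 'abcd', 2000, '2013-01-16'),
    (2, 'abcd', 3000, '2013-01-25');
""")

QUERY = """
SELECT c.cust_name,
       b.bill_id,
       b.bill_amount,
       COALESCE(SUM(p.paid_amount), 0)                 AS tpaid_amount,
       b.bill_date,
       b.bill_amount - COALESCE(SUM(p.paid_amount), 0) AS balance
FROM cust_info c
INNER JOIN bill_info b ON b.cust_id = c.cust_id
LEFT JOIN paid_info p  ON p.bill_id = b.bill_id
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
GROUP BY c.cust_name, b.bill_id, b.bill_amount, b.bill_date
ORDER BY b.bill_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The date filter sits on bill_info, the inner-joined table, so keeping it in WHERE does not interfere with the LEFT JOIN to paid_info.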
This will do the trick;
select
  cust_info.cust_name,
  bill_info.bill_id,
  bill_info.bill_amount,
  sum(paid_info.paid_amount),
  bill_info.bill_date,
  bill_info.bill_amount - sum(paid_info.paid_amount)
from
  cust_info
  left outer join bill_info
    left outer join paid_info
    on bill_info.bill_id = paid_info.bill_id
  on cust_info.cust_id = bill_info.cust_id
where
  bill_info.bill_date between X and Y
group by
  cust_info.cust_name,
  bill_info.bill_id,
  bill_info.bill_amount,
  bill_info.bill_date