SQL: Creating an INNER JOIN for Two Columns inside the Same Table

I am attempting to work out a practice problem from my book.
The problem goes like this:
Find all the vendors who have invoices that have not been paid yet.
(Hint: the invoice_total will be different than the payment_total).
Rewrite the above query in a total of 3 ways:
Using equijoins, using INNER JOIN and using NATURAL JOIN.
I completed the first step by doing,
SELECT DISTINCT VENDOR_ID
FROM INVOICES
WHERE Invoice_Total != payment_total;
However, when I try to do the inner joins, I keep getting errors.
Both Invoice_Total and Payment_Total are columns inside of the same "INVOICES" table.
How would I be able to show the discrepancies while pulling the vendor IDs?
This is a picture of the practice database that I am working with.

It seems silly to inner join a table to itself to solve this particular problem (there are plenty of good reasons to self-join, but this isn't one of them), but I suppose from a "practice problem" standpoint it's reasonable.
I would think here it would be best to pre-aggregate the invoices before the join to cut down on the processing time (unless there is an index in place to help the join):
SELECT t1.vendor_id
FROM (SELECT vendor_id, sum(invoice_total) sum_invoice_total FROM INVOICES GROUP BY vendor_id) t1
INNER JOIN (SELECT vendor_id, sum(payment_total) sum_payment_total FROM INVOICES GROUP BY vendor_id) t2
ON t1.vendor_id = t2.vendor_id
WHERE
t1.sum_invoice_total != t2.sum_payment_total
There is a chance this could break down, though, if an invoice can be overpaid, because an overpayment on one invoice can cancel out an unpaid balance on another invoice for the same vendor. Consider:
+------------+-----------+---------------+---------------+
| invoice_id | vendor_id | invoice_total | payment_total |
+------------+-----------+---------------+---------------+
| 1 | a | 10 | 20 |
| 2 | a | 10 | 0 |
+------------+-----------+---------------+---------------+
Without pre-aggregating (again this makes no sense, but it will work):
SELECT DISTINCT t1.vendor_id
FROM invoices t1
INNER JOIN invoices t2
ON t1.invoice_id = t2.invoice_id
WHERE
t1.invoice_total != t2.payment_total
This is nearly identical to your original query, but adds in a superfluous inner join. I'm just guessing at your primary key as invoice_id here. Edit as needed.
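To make the overpayment caveat concrete, here is a minimal sketch that loads the two sample rows above into an in-memory SQLite database through Python's built-in sqlite3 module and runs both queries from this answer. The column types, the invoice_id primary key, and the choice of SQLite are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoices (
        invoice_id    INTEGER PRIMARY KEY,  -- assumed primary key
        vendor_id     TEXT,
        invoice_total NUMERIC,
        payment_total NUMERIC
    )
""")
# The two sample rows: invoice 1 is overpaid, invoice 2 is unpaid.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [(1, "a", 10, 20), (2, "a", 10, 0)],
)

# Pre-aggregated self-join: compares per-vendor sums.
pre_aggregated = conn.execute("""
    SELECT t1.vendor_id
    FROM (SELECT vendor_id, SUM(invoice_total) AS sum_invoice_total
          FROM invoices GROUP BY vendor_id) t1
    INNER JOIN (SELECT vendor_id, SUM(payment_total) AS sum_payment_total
                FROM invoices GROUP BY vendor_id) t2
        ON t1.vendor_id = t2.vendor_id
    WHERE t1.sum_invoice_total != t2.sum_payment_total
""").fetchall()

# Row-level self-join: compares each invoice with itself.
row_level = conn.execute("""
    SELECT DISTINCT t1.vendor_id
    FROM invoices t1
    INNER JOIN invoices t2
        ON t1.invoice_id = t2.invoice_id
    WHERE t1.invoice_total != t2.payment_total
""").fetchall()

print(pre_aggregated)  # [] because the sums are 20 and 20
print(row_level)       # [('a',)] because each invoice row differs
```

With this data the pre-aggregated version misses vendor a even though invoice 2 is unpaid, while the row-level version finds it.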

Related

Select rows in left join which depend on sum of a field in the other table?

I'm trying to write a SQL left outer join query where the left rows are selected based on the sum of a field in rows in the other (right) table. The other table has an id field that links back to the left table, and there is a one-to-many relationship between the left and right tables. The tables (simplified to relevant fields only) look like this:
left_table:
+--------+
| id |
| amount |
+--------+
right_table:
+-------------------+
| id |
| amount |
| left_table_row_id |
+-------------------+
Basically the right table's rows' amount fields have fractions of the amounts in the left table and are associated back to the left_table, so several right_table rows might be linked to a single left_table row.
I'm trying to select only left_table rows where left_table.id = right_table.left_table_row_id and where the sum of the amounts in the linked right_table rows equals left_table.amount. We can't use an aggregate in a WHERE clause, and I had no luck with HAVING. I hope that makes sense.
You can filter with a correlated subquery:
select l.*
from left_table l
where l.amount = (select sum(r.amount) from right_table r where r.left_table_row_id = l.id)
This should be possible with the following query:
with agg as
(
select left_table_row_id,sum(amount) as amount
from right_table
group by left_table_row_id
)
select *
from left_table lt
where exists (select 1 from agg where lt.id=agg.left_table_row_id and lt.amount = agg.amount)
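As a quick check of both approaches, here is a small sketch using Python's sqlite3 module with made-up sample data. The rows and integer amounts are assumptions; the subquery is correlated on left_table_row_id, which is the linking column described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE right_table (
        id INTEGER PRIMARY KEY,
        amount INTEGER,
        left_table_row_id INTEGER REFERENCES left_table(id)
    );
    -- Row 1 is fully covered (60 + 40), row 2 is partly covered, row 3 has no parts.
    INSERT INTO left_table VALUES (1, 100), (2, 50), (3, 30);
    INSERT INTO right_table VALUES (1, 60, 1), (2, 40, 1), (3, 20, 2);
""")

# Correlated subquery: sum the linked right_table rows for each left row.
correlated = conn.execute("""
    SELECT l.id, l.amount
    FROM left_table l
    WHERE l.amount = (SELECT SUM(r.amount)
                      FROM right_table r
                      WHERE r.left_table_row_id = l.id)
""").fetchall()

# CTE: aggregate once, then test for a matching total with EXISTS.
with_cte = conn.execute("""
    WITH agg AS (
        SELECT left_table_row_id, SUM(amount) AS amount
        FROM right_table
        GROUP BY left_table_row_id
    )
    SELECT lt.id, lt.amount
    FROM left_table lt
    WHERE EXISTS (SELECT 1 FROM agg
                  WHERE lt.id = agg.left_table_row_id
                    AND lt.amount = agg.amount)
""").fetchall()

print(correlated)  # [(1, 100)]
print(with_cte)    # [(1, 100)]
```

Row 3 drops out of the correlated version because SUM over no rows is NULL, and a comparison with NULL is never true.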

Select MAX and RIGHT OUTER JOIN

I have a platform to extract data from SQL tables, and so far all queries were generated by a simple drag-and-drop tool. Now I am trying to change a query manually, but it's not working as expected...
Can you take a look?
Query delivered by generator:
SELECT
repo.MAT.MAT_A_COD,
inventory.INV.MRP_RQMT_DT,
SUM(inventory.INV.MRP_AVL_QTY)
FROM
repo.MAT RIGHT OUTER JOIN inventory.INV ON (inventory.INV.MRP_MAT_A_FK=repo.MAT.MAT_A_PK)
WHERE
( inventory.INV.MRP_COMPANY_COD IN ('01','02') )
GROUP BY
1,
2
Results:
Material A | 2020.01.01 | 100
Material A | 2020.01.02 | 200
Material A | 2020.01.03 | 300
Material B | 2020.01.01 | 10
Material B | 2020.01.02 | 0
What I am looking for: only values for the latest date for each material.
Material A | 2020.01.03 | 300
Material B | 2020.01.02 | 0
I tried with MAX(inventory.INV.MRP_RQMT_DT), but no success. Any help is appreciated!
You can try the below -
SELECT
repo.MAT.MAT_A_COD,
inventory.INV.MRP_RQMT_DT,
SUM(inventory.INV.MRP_AVL_QTY)
FROM
repo.MAT RIGHT OUTER JOIN inventory.INV ON inventory.INV.MRP_MAT_A_FK=repo.MAT.MAT_A_PK
WHERE
inventory.INV.MRP_COMPANY_COD IN ('01','02') and inventory.INV.MRP_RQMT_DT=(select max(inv1.MRP_RQMT_DT) from inventory.INV inv1 where inventory.INV.MRP_MAT_A_FK=inv1.MRP_MAT_A_FK)
GROUP BY 1, 2
You did not specify the database engine, but the RANK window function works in many major ones (I will use T-SQL syntax). Note the DESC in the ORDER BY, so that rank 1 is the latest date.
SELECT * FROM (
SELECT
repo.MAT.MAT_A_COD,
inventory.INV.MRP_RQMT_DT,
SUM(inventory.INV.MRP_AVL_QTY) AS total_avl_qty,
RANK () OVER (PARTITION BY repo.MAT.MAT_A_COD ORDER BY inventory.INV.MRP_RQMT_DT DESC) rn
FROM repo.MAT RIGHT OUTER JOIN inventory.INV ON (inventory.INV.MRP_MAT_A_FK=repo.MAT.MAT_A_PK)
WHERE inventory.INV.MRP_COMPANY_COD IN ('01','02')
GROUP BY repo.MAT.MAT_A_COD, inventory.INV.MRP_RQMT_DT
) ranked
WHERE rn = 1
You can use window functions:
SELECT m.MAT_A_COD, i.MRP_RQMT_DT,
SUM(i.MRP_AVL_QTY)
FROM repo.MAT m LEFT JOIN
(SELECT i.*,
MAX(MRP_RQMT_DT) OVER (PARTITION BY MRP_MAT_A_FK) as max_MRP_RQMT_DT
FROM inventory.INV i
) i
ON i.MRP_MAT_A_FK = m.MAT_A_PK AND
i.MRP_RQMT_DT = i.max_MRP_RQMT_DT
WHERE i.MRP_COMPANY_COD IN ('01', '02')
GROUP BY 1, 2;
Note other changes to the query:
Table aliases make the query easier to write and to read.
An outer join does not seem necessary at all. But if you do use one, it is probably on the MAT table, not the inventory table.
If you use an outer join, you should try to take the columns from the table where you are keeping all the rows -- the first table in a LEFT JOIN. I don't recommend RIGHT JOINs in general.
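For anyone who wants to try the "latest date per material" idea locally, here is a sketch using Python's sqlite3 module. It is not the asker's environment: the tables are created without the repo and inventory schema prefixes, the sample rows are made up to match the results shown in the question, dates are stored as ISO text, and a plain inner join is used. It compares a RANK-based query with a correlated MAX subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MAT (MAT_A_PK INTEGER PRIMARY KEY, MAT_A_COD TEXT);
    CREATE TABLE INV (
        MRP_MAT_A_FK INTEGER,
        MRP_RQMT_DT TEXT,       -- ISO dates sort correctly as text
        MRP_AVL_QTY INTEGER,
        MRP_COMPANY_COD TEXT
    );
    INSERT INTO MAT VALUES (1, 'Material A'), (2, 'Material B');
    INSERT INTO INV VALUES
        (1, '2020-01-01', 100, '01'),
        (1, '2020-01-02', 200, '01'),
        (1, '2020-01-03', 100, '01'),
        (1, '2020-01-03', 200, '02'),
        (2, '2020-01-01', 10, '01'),
        (2, '2020-01-02', 0, '02');
""")

# Aggregate first, then rank the dates within each material, newest first.
ranked = conn.execute("""
    SELECT mat_a_cod, mrp_rqmt_dt, total_qty
    FROM (
        SELECT agg.*,
               RANK() OVER (PARTITION BY mat_a_cod
                            ORDER BY mrp_rqmt_dt DESC) AS rn
        FROM (
            SELECT m.MAT_A_COD AS mat_a_cod,
                   i.MRP_RQMT_DT AS mrp_rqmt_dt,
                   SUM(i.MRP_AVL_QTY) AS total_qty
            FROM MAT m
            JOIN INV i ON i.MRP_MAT_A_FK = m.MAT_A_PK
            WHERE i.MRP_COMPANY_COD IN ('01', '02')
            GROUP BY m.MAT_A_COD, i.MRP_RQMT_DT
        ) agg
    ) ranked
    WHERE rn = 1
    ORDER BY mat_a_cod
""").fetchall()

# Same result with a correlated subquery that finds each material's latest date.
correlated = conn.execute("""
    SELECT m.MAT_A_COD, i.MRP_RQMT_DT, SUM(i.MRP_AVL_QTY)
    FROM MAT m
    JOIN INV i ON i.MRP_MAT_A_FK = m.MAT_A_PK
    WHERE i.MRP_COMPANY_COD IN ('01', '02')
      AND i.MRP_RQMT_DT = (SELECT MAX(i2.MRP_RQMT_DT)
                           FROM INV i2
                           WHERE i2.MRP_MAT_A_FK = i.MRP_MAT_A_FK)
    GROUP BY m.MAT_A_COD, i.MRP_RQMT_DT
    ORDER BY m.MAT_A_COD
""").fetchall()

print(ranked)
print(correlated)
```

Both queries return one row per material for its latest date. The window function needs SQLite 3.25 or newer, which recent Python builds bundle.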

Database query- find most expensive part

Here is my schema
Suppliers(sid, sname, address)
Cata(sid,pid,cost)
Parts(pid,pname,color)
bolded are primary keys
I am trying to write a query
"Find the pids of the most expensive parts"
I am using set difference. Here is my query; however, it is returning all the pids in the catalogue, not just the one with the highest cost:
select Cata.pid
from Cata
where pid not in(
select c.pid
from Cata c, Cata f
where c.sid=f.sid AND c.pid=f.pid AND c.cost<f.cost
);
Try this one:
select c1.pid
from Cata c1
where not exists (
select c2.pid
from Cata c2
where c2.cost > c1.cost
);
If you are wondering what is wrong with your query: the inner SELECT returns 0 rows. Joining on both c.sid=f.sid and c.pid=f.pid pairs each row only with itself, so c.cost is always equal to f.cost and the < comparison never succeeds. With an empty inner result, the "not in" condition is true for every row.
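The explanation above can be checked with a small sketch using Python's sqlite3 module. The sample rows are made up, and (sid, pid) is assumed to be the primary key of Cata, which is what makes each row pair only with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cata (
        sid INTEGER,
        pid TEXT,
        cost INTEGER,
        PRIMARY KEY (sid, pid)  -- assumed key
    );
    INSERT INTO Cata VALUES (1, 'p1', 10), (1, 'p2', 40),
                            (2, 'p1', 100), (2, 'p2', 50);
""")

# The inner SELECT from the question: each row only ever meets itself.
inner_rows = conn.execute("""
    SELECT c.pid
    FROM Cata c, Cata f
    WHERE c.sid = f.sid AND c.pid = f.pid AND c.cost < f.cost
""").fetchall()

# The full original query: NOT IN against an empty set keeps every row.
original = conn.execute("""
    SELECT Cata.pid
    FROM Cata
    WHERE pid NOT IN (
        SELECT c.pid
        FROM Cata c, Cata f
        WHERE c.sid = f.sid AND c.pid = f.pid AND c.cost < f.cost
    )
""").fetchall()

# NOT EXISTS version: keep a row only if no other row costs more.
fixed = conn.execute("""
    SELECT c1.pid
    FROM Cata c1
    WHERE NOT EXISTS (
        SELECT c2.pid FROM Cata c2 WHERE c2.cost > c1.cost
    )
""").fetchall()

print(inner_rows)     # []
print(len(original))  # 4, every catalogue row comes back
print(fixed)          # [('p1',)]
```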
If you want the pid with the single highest cost:
SELECT TOP 1 WITH TIES
c.pid,
c.cost
FROM
Cata AS c
ORDER BY
c.cost DESC
If you want the five highest cost pids, change the first line of that to:
SELECT TOP 5 WITH TIES
I think this is what you are looking for
SELECT
p.PID
, MAX(c.COST)
FROM
Parts p
LEFT JOIN
Cata c
ON p.PID = c.PID
GROUP BY
p.PID
ORDER BY
MAX(c.COST)
This returns the highest cost for each PID across all suppliers, sorted by that cost.
Good luck!
You want to
Find the pids of the most expensive parts
As the requirement is not stated clearly,
I have given 2 solutions:
Solution 1: find the most expensive parts for each supplier
Solution 2: find the most expensive part amongst all the parts
Use whichever suits you.
Solution 1:
SELECT Cata.pid FROM Cata
INNER JOIN (SELECT Cata.sid, MAX(Cata.cost) cost FROM Cata GROUP BY Cata.sid) MostExpensive
ON Cata.sid = MostExpensive.sid AND Cata.cost = MostExpensive.cost
Query explanation:
First, find the highest cost for each sid.
Then, once you have that derived table, find the pids whose cost matches it within the same sid.
If there is a tie between 2 parts for being most expensive, this query will return pids for both the parts.
Solution 2:
If you are looking for the most expensive parts across all the suppliers, then the query can be simplified as below.
SELECT Cata.pid FROM Cata
WHERE Cata.cost = (SELECT MAX(cost) cost FROM Cata)
Find the pids of the most expensive parts
The most expensive parts are the ones where the minimum cost is highest, i.e. you can get all other parts cheaper than these ones where you must at least pay xxx $. You get these with a top query.
select top(1) with ties
pid
from cata
group by pid
order by min(cost) desc;
Illustration:
pid | supplier A | supplier B | supplier C
----+------------+------------+-----------
p1 | 10$ | 10$ | 100$
p2 | 40$ | 50$ | 60$
Which part is the more expensive one? I can buy p1 for 10$. For p2 I must pay at least 40$. So p2 is more expensive. That supplier C wants an outrageous 100$ for p1 doesn't matter, because who would pay 100$ for something you can get for 10$? So it's the minimum prices we must compare.
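The difference between "highest cost anywhere" and "highest minimum cost" can be seen with the illustration data above. This is a sketch using Python's sqlite3 module; the sid values 1, 2 and 3 stand in for suppliers A, B and C. SQLite has no TOP ... WITH TIES, so a HAVING comparison against the largest minimum is used here as a tie-preserving substitute, which is my own swap and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cata (sid INTEGER, pid TEXT, cost INTEGER);
    -- sid 1, 2, 3 stand in for suppliers A, B, C from the illustration.
    INSERT INTO cata VALUES
        (1, 'p1', 10), (2, 'p1', 10), (3, 'p1', 100),
        (1, 'p2', 40), (2, 'p2', 50), (3, 'p2', 60);
""")

# Interpretation 1: the part carrying the single highest price tag.
max_cost_pids = conn.execute("""
    SELECT DISTINCT pid
    FROM cata
    WHERE cost = (SELECT MAX(cost) FROM cata)
""").fetchall()

# Interpretation 2: the part whose cheapest offer is the highest.
highest_min_pids = conn.execute("""
    SELECT pid
    FROM cata
    GROUP BY pid
    HAVING MIN(cost) = (
        SELECT MAX(min_cost)
        FROM (SELECT MIN(cost) AS min_cost FROM cata GROUP BY pid)
    )
""").fetchall()

print(max_cost_pids)     # [('p1',)] because of the 100$ offer
print(highest_min_pids)  # [('p2',)] because its cheapest offer is 40$
```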

WHERE using a temporary table

I have three tables: customers, orders and refunds. Some customers (not all) placed orders and for some orders (not all) there were refunds.
When I join the three tables like this (the details are not that important):
SELECT ...
FROM customers LEFT JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id
-- WHERE orders.order_id IS NOT NULL -- uncomment to filter out customers that have no orders
;
I get a big table in which all customers are listed (even the ones that have not placed any orders and they have NULL in the 'order_id' column), with all their orders and the orders' refunds (even if not all orders have refunds):
NAME ORDER_ID ORDER AMOUNT REFUND
------------------------------------------------------------
Natalie 2 12.50 NULL
Natalie 3 18.00 18.00
Brenda 4 20.00 NULL
Adam NULL NULL NULL
Since I only want to see customers that have placed orders, i.e., in this case I want to filter Adam out of the table, I uncomment the 'WHERE' row in the SQL query above.
This yields the desired result.
My question is:
On which table is the WHERE executed - on the original 'orders' table (which has no order_id that is NULL) or on the table that is result of the JOINs?
Apparently it is the latter, but just want to make sure, since it is not very obvious from the SQL syntax and it is a very important point.
Thank you
In this case, you're making the database work harder than it has to. Logically, the WHERE clause is applied to the result of the JOINs, not to the original orders table, which is why rows with a NULL order_id exist to be filtered out.
There's a chance the optimizer realizes what you're doing and changes the plan to an INNER JOIN for you. But I can't be certain, and the plan it picks can change over time.
If you only want customers that have an order, use an INNER JOIN instead. It states the intent directly and is typically at least as efficient.
SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
You can change the first LEFT JOIN to an INNER JOIN to eliminate customers that don't have any orders:
SELECT ...
FROM customers INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
It's because you're using LEFT JOIN, which returns all rows from the left-hand table (in your case, the customers table) and NULL where no corresponding values appear in the right-hand tables.
Rewrite the join to orders as an inner join, so only customers with a matching order are returned. Keep the LEFT JOIN to refunds; an inner join there would also drop orders that have no refund.
SELECT ...
FROM customers
INNER JOIN orders
ON customers.customer_id=orders.customer_id
LEFT JOIN refunds
ON orders.order_id=refunds.order_id;
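To confirm where the WHERE is applied, here is a sketch using Python's sqlite3 module with the sample rows from the question. The customer_id values and column names other than customer_id and order_id are assumptions. It runs the joins four ways and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    CREATE TABLE refunds (order_id INTEGER, refund REAL);
    INSERT INTO customers VALUES (1, 'Natalie'), (2, 'Brenda'), (3, 'Adam');
    INSERT INTO orders VALUES (2, 1, 12.50), (3, 1, 18.00), (4, 2, 20.00);
    INSERT INTO refunds VALUES (3, 18.00);
""")

base = """
    SELECT customers.name, orders.order_id, refunds.refund
    FROM customers {first_join} orders
        ON customers.customer_id = orders.customer_id
    {second_join} refunds
        ON orders.order_id = refunds.order_id
    {where}
    ORDER BY customers.customer_id, orders.order_id
"""

def run(first_join, second_join, where=""):
    """Run the three-table join with the given join types and filter."""
    sql = base.format(first_join=first_join, second_join=second_join, where=where)
    return conn.execute(sql).fetchall()

all_rows = run("LEFT JOIN", "LEFT JOIN")
filtered = run("LEFT JOIN", "LEFT JOIN", "WHERE orders.order_id IS NOT NULL")
inner_then_left = run("INNER JOIN", "LEFT JOIN")
all_inner = run("INNER JOIN", "INNER JOIN")

print(all_rows)         # includes ('Adam', None, None)
print(filtered)         # Adam is gone: WHERE saw the joined rows
print(inner_then_left)  # same rows as filtered
print(all_inner)        # only the refunded order survives
```

The NULL order_id that the WHERE removes only exists in the joined result, never in the orders table itself, so the filter must be acting on the output of the joins.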

SQL Select statement which shows row-specific count information?

I need some SQL help ..
Suppose I have 2 tables: Customers and Products.
Now, I want to see a SQL statement that will show me these two columns:
Customer | Number of orders placed
How do I do that ?
The second column is a non-existent column, which is showing a number which states how many orders that customer has placed.
For example:
Customer | Number of orders placed
-------- | -----------------------
John | 23
Jack | 5
Mary | 12
etc ..
What's the SQL for this kind of a select ?
I guess that the Product table contains a foreign key CustomerID which references the Customer. The resulting query would be
select Customers.Name, Count(*)
from Customers join Products
on Customers.CustomerID = Products.CustomerID
group by Customers.Name
However, this is just a guess as you forgot to inform us about the relation between the two tables, i.e. how the Products know to which Customer they belong.
Also, but this is a bit picky, you want the number of orders but only have a 'Product' table...
JOIN. This is just making up your column names since tables weren't given.
SELECT
c.Name,
myOrders = COUNT(o.id)
FROM Customers c
INNER JOIN Orders o
ON c.id = o.customerId
GROUP BY c.Name
Some quick reading: JOINS. GROUP BY
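A runnable sketch of the same idea, using Python's sqlite3 module. The Customers and Orders tables, their columns, and the sample rows are all made up, just as in the answers above. It also shows a LEFT JOIN variant, since an inner join leaves out customers who have no orders at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: each order row points at its customer.
    CREATE TABLE Customers (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, customerId INTEGER);
    INSERT INTO Customers VALUES (1, 'John'), (2, 'Jack'), (3, 'Mary'), (4, 'Ann');
    INSERT INTO Orders (customerId) VALUES (1), (1), (1), (2), (3), (3);
""")

# Inner join: one output row per customer that has at least one order.
inner_counts = conn.execute("""
    SELECT c.Name, COUNT(o.id) AS myOrders
    FROM Customers c
    INNER JOIN Orders o ON c.id = o.customerId
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()

# Left join: customers without orders appear with a count of 0,
# because COUNT(o.id) ignores the NULL produced by the outer join.
left_counts = conn.execute("""
    SELECT c.Name, COUNT(o.id) AS myOrders
    FROM Customers c
    LEFT JOIN Orders o ON c.id = o.customerId
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()

print(inner_counts)  # [('Jack', 1), ('John', 3), ('Mary', 2)]
print(left_counts)   # [('Ann', 0), ('Jack', 1), ('John', 3), ('Mary', 2)]
```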