Database query- find most expensive part - sql

Here is my schema
Suppliers(sid, sname, address)
Cata(sid,pid,cost)
Parts(pid,pname,color)
bolded are primary keys
I am trying to write a query
"Find the pids of the most expensive parts"
I am using set difference. Here is my query; however, it returns all the pids in the catalogue, not just the one with the highest cost:
select Cata.pid
from Cata
where pid not in(
select c.pid
from Cata c, Cata f
where c.sid=f.sid AND c.pid=f.pid AND c.cost<f.cost
);

Try this one:
select c1.pid
from Cata c1
where not exists (
select c2.pid
from Cata c2
where c2.cost > c1.cost
);
If you are wondering what is wrong with your query: the inner SELECT returns 0 rows. Joining on c.sid = f.sid AND c.pid = f.pid pairs each row only with itself, so c.cost always equals f.cost and the < comparison never succeeds. With an empty subquery result, the "not in" condition is true for every row, so all pids are returned.
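To see this concretely, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and it assumes (sid, pid) is the primary key of Cata, which the question does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: (sid, pid) identifies a catalogue row.
conn.execute(
    "CREATE TABLE Cata (sid INTEGER, pid INTEGER, cost REAL, PRIMARY KEY (sid, pid))"
)
conn.executemany(
    "INSERT INTO Cata VALUES (?, ?, ?)",
    [(1, 10, 5.0), (1, 11, 9.0), (2, 10, 7.0), (2, 12, 9.0)],  # made-up data
)

# Inner SELECT from the question: c and f are always the same row,
# so c.cost < f.cost never holds and nothing is returned.
inner_rows = conn.execute(
    "SELECT c.pid FROM Cata c, Cata f "
    "WHERE c.sid = f.sid AND c.pid = f.pid AND c.cost < f.cost"
).fetchall()

# Comparing each row against every other row instead.
most_expensive = conn.execute(
    "SELECT DISTINCT c1.pid FROM Cata c1 "
    "WHERE NOT EXISTS (SELECT 1 FROM Cata c2 WHERE c2.cost > c1.cost) "
    "ORDER BY c1.pid"
).fetchall()

print(inner_rows)      # []
print(most_expensive)  # [(11,), (12,)]
```

Both pid 11 and pid 12 come back because they tie at the highest cost.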

If you want the pid with the single highest cost:
SELECT TOP 1 WITH TIES
c.pid,
c.cost
FROM
Cata AS c
ORDER BY
c.cost DESC
If you want the five highest cost pids, change the first line of that to:
SELECT TOP 5 WITH TIES

I think this is what you are looking for
SELECT
p.PID
, MAX(c.COST)
FROM
Parts p
LEFT JOIN
Cata c
ON p.PID = c.PID
GROUP BY
p.PID
ORDER BY
MAX(c.COST)
This returns the highest cost listed for each pid (one row per part), sorted by that cost in ascending order.
Good luck!

You want to
Find the pids of the most expensive parts
As you have not mentioned your requirement clearly,
I have given 2 solutions
Solution 1 : to find the most expensive parts for each supplier
Solution 2 : to find most expensive part amongst all the parts
Use which ever suits you.
Solution 1:
SELECT Cata.pid FROM Cata
INNER JOIN (SELECT Cata.sid, MAX(Cata.cost) cost FROM Cata GROUP BY Cata.sid) MostExpensive
ON Cata.sid = MostExpensive.sid AND Cata.cost = MostExpensive.cost
Query explanation:
First, find the highest cost for each sid.
Then, once you have that derived "most expensive" table, find the pids whose cost matches it within the same sid.
If there is a tie between 2 parts for being most expensive, this query will return pids for both the parts.
Solution 2:
If you are looking for the most expensive parts across all suppliers, the query can be simplified as below.
SELECT Cata.pid FROM Cata
WHERE Cata.cost = (SELECT MAX(cost) cost FROM Cata)
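A quick way to compare the two interpretations is to run both against a small in-memory table. This is only a sketch with Python's sqlite3 and made-up rows; the per-supplier query here uses an inner join, since an outer join would keep every catalogue row whether or not it matched the supplier's maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cata (sid INTEGER, pid INTEGER, cost REAL)")
conn.executemany(
    "INSERT INTO Cata VALUES (?, ?, ?)",
    [(1, 10, 5.0), (1, 11, 9.0), (2, 10, 7.0), (2, 12, 6.0)],  # made-up data
)

# Interpretation 1: most expensive part(s) for each supplier.
per_supplier = conn.execute(
    """
    SELECT Cata.sid, Cata.pid
    FROM Cata
    INNER JOIN (SELECT sid, MAX(cost) AS cost FROM Cata GROUP BY sid) AS MostExpensive
      ON Cata.sid = MostExpensive.sid AND Cata.cost = MostExpensive.cost
    ORDER BY Cata.sid, Cata.pid
    """
).fetchall()

# Interpretation 2: most expensive part(s) across all suppliers.
overall = conn.execute(
    "SELECT pid FROM Cata WHERE cost = (SELECT MAX(cost) FROM Cata) ORDER BY pid"
).fetchall()

print(per_supplier)  # [(1, 11), (2, 10)]
print(overall)       # [(11,)]
```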

Find the pids of the most expensive parts
The most expensive parts are the ones where the minimum cost is highest, i.e. you can get all other parts cheaper than these ones where you must at least pay xxx $. You get these with a top query.
select top(1) with ties
pid
from cata
group by pid
order by min(cost) desc;
Illustration:
pid | supplier A | supplier B | supplier C
----+------------+------------+-----------
p1 | 10$ | 10$ | 100$
p2 | 40$ | 50$ | 60$
Which part is the more expensive one? I can buy p1 for 10$. For p2 I must pay at least 40$. So p2 is more expensive. That supplier C wants an outrageous 100$ for p1 doesn't matter, because who would pay 100$ for something you can get for 10$? So it's the minimum prices we must compare.
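The illustration can be checked with a small sketch in Python's sqlite3. SQLite has no TOP ... WITH TIES, so this version swaps in a HAVING clause that keeps every pid whose minimum cost equals the highest minimum; the supplier ids A, B and C are taken from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cata (sid TEXT, pid TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO cata VALUES (?, ?, ?)",
    [
        ("A", "p1", 10), ("B", "p1", 10), ("C", "p1", 100),
        ("A", "p2", 40), ("B", "p2", 50), ("C", "p2", 60),
    ],
)

# Cheapest available price per part.
min_costs = conn.execute(
    "SELECT pid, MIN(cost) FROM cata GROUP BY pid ORDER BY pid"
).fetchall()

# Parts whose cheapest price is the highest (ties are kept).
most_expensive = conn.execute(
    """
    SELECT pid
    FROM cata
    GROUP BY pid
    HAVING MIN(cost) = (
        SELECT MAX(m) FROM (SELECT MIN(cost) AS m FROM cata GROUP BY pid)
    )
    ORDER BY pid
    """
).fetchall()

print(min_costs)       # [('p1', 10), ('p2', 40)]
print(most_expensive)  # [('p2',)]
```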

Related

SQL- Creating an Inner JOIN for Two Columns inside Same Table

I am attempting to work out a practice problem from my book.
The problem goes like this:
Find all the vendors who have invoices that have not been paid yet.
(Hint: the invoice_total will be different than the payment_total).
Rewrite the above query in a total of 3 ways:
Using equijoins, using INNER JOIN and using NATURAL JOIN.
I completed the first step by doing,
SELECT DISTINCT VENDOR_ID
FROM INVOICES
WHERE Invoice_Total != payment_total;
However, when I try to do the inner joins, I keep getting errors.
Both Invoice_Total and Payment_Total are columns inside of the same "INVOICES" table.
How would I be able to show the discrepancies whilst pulling the vendor ID's?
This is a picture of the practice database that I am working with.
It seems silly to inner join a table to itself to solve this particular problem (there are plenty of good reasons to self-join, but this isn't one of them), but I suppose from a "practice problem" standpoint it's reasonable.
I would think here it would be best to pre-aggregate the invoices before the join to cut down on the processing time (unless there is an index in place to help the join):
SELECT t1.vendor_id
FROM (SELECT vendor_id, sum(invoice_total) sum_invoice_total FROM INVOICES GROUP BY vendor_id) t1
INNER JOIN (SELECT vendor_id, sum(payment_total) sum_payment_total FROM INVOICES GROUP BY vendor_id) t2
ON t1.vendor_id = t2.vendor_id
WHERE
t1.sum_invoice_total != t2.sum_payment_total
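This could break down, though, if an invoice can be overpaid: an overpayment on one invoice can cancel out an unpaid balance on another, so the vendor's two sums match even though an invoice is still open. Consider: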
+------------+-----------+---------------+---------------+
| invoice_id | vendor_id | invoice_total | payment_total |
+------------+-----------+---------------+---------------+
| 1 | a | 10 | 20 |
| 2 | a | 10 | 0 |
+------------+-----------+---------------+---------------+
Without pre-aggregating (again this makes no sense, but it will work):
SELECT DISTINCT t1.vendor_id
FROM invoices t1
INNER JOIN invoices t2
ON t1.invoice_id = t2.invoice_id
WHERE
t1.invoice_total != t2.payment_total
This is nearly identical to your original query, but adds in a superfluous inner join. I'm just guessing at your primary key as invoice_id here. Edit as needed.
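To show how the two queries differ on the overpayment example, here is a rough sketch using Python's sqlite3. The table definition is a guess based on the columns used in the queries, with invoice_id assumed to be the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: only the columns the queries touch.
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, vendor_id TEXT, "
    "invoice_total INTEGER, payment_total INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [(1, "a", 10, 20), (2, "a", 10, 0)],  # the overpayment example
)

# Pre-aggregated version: both sums are 20, so vendor 'a' is missed.
aggregated = conn.execute(
    """
    SELECT t1.vendor_id
    FROM (SELECT vendor_id, SUM(invoice_total) AS sum_invoice_total
          FROM invoices GROUP BY vendor_id) t1
    INNER JOIN (SELECT vendor_id, SUM(payment_total) AS sum_payment_total
                FROM invoices GROUP BY vendor_id) t2
      ON t1.vendor_id = t2.vendor_id
    WHERE t1.sum_invoice_total != t2.sum_payment_total
    """
).fetchall()

# Row-level self-join: each invoice is compared on its own.
row_level = conn.execute(
    """
    SELECT DISTINCT t1.vendor_id
    FROM invoices t1
    INNER JOIN invoices t2 ON t1.invoice_id = t2.invoice_id
    WHERE t1.invoice_total != t2.payment_total
    """
).fetchall()

print(aggregated)  # []
print(row_level)   # [('a',)]
```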

Take one condition & show result from two different tables

I have these 3 data tables (quick version):
Payments:
id | datepaid | amountpaid
---------------------------
112|03/5/2017 |9000
115|03/21/2017| 800
Individuals:
id|name|lastDatePaid
--------------------
112|bob|03/2/2017
114|kary|2/3/2016
Business:
id|name|lastDatePaid
--------------------
115| Bakery Love | 05/20/2017
My question is: how do I get both the business name and the individual name for each payment that was made on or after March 2, 2017?
I was able to get a result for one of the tables, but I don't understand how to get both individuals and businesses to show up.
Want result:
112|bob
115|Bakery Love
This is my version, which only returns results for one table, for example Individuals:
SELECT DISTINCT p1.id, i1.name
FROM Payments p1, Individuals i1
WHERE p1.datePaid >= DATE '2017-03-01' AND p1.id = i1.id;
This code returns only:
112|bob
I've also read in a book that this code is equivalent, but I don't think it is:
SELECT DISTINCT p1.id, i1.name
FROM Individuals i1, Payments p1
WHERE i1.id IN (SELECT DISTINCT p1.id
FROM Payments p1
WHERE p1.datePaid >= DATE '2017-03-01');
When I rewrote it to try to get the same result, I ended up getting unwanted data. Apparently it checks the dates (lastDatePaid) in the Individuals and Businesses tables and prints rows without the AND condition requiring that the id come from Payments (this is the full version of the data).
One approach you could take is to put the 'name' tables on top of each other.
Select * from individuals
Union all
Select * from business
Then you make that a sub query and join it to the payment table...
Select p.id, n.name
From
Payments p
Join
(
Select * from individuals
Union all
Select * from business) as n
On n.id = p.id
It looks like you have figured out how to get the date filter to work, so I won't bother adding that.
I figured it out.
I didn't need the JOIN statement; this works instead:
SELECT DISTINCT p1.ID, N.name
FROM Payments p1, (SELECT i1.name, i1.ID
FROM Individuals i1
UNION ALL
SELECT b1.name, b1.ID
FROM Businesses b1) AS N
WHERE p1.datePaid >= DATE '2017-03-01' AND p1.ID = N.ID;
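For anyone who wants to try the UNION ALL approach on the sample data, here is a small sketch with Python's sqlite3. It makes a few assumptions: the dates are stored as ISO-8601 text (so a plain string comparison stands in for the DATE literal, which SQLite does not support), and the second table is named Businesses as in the final query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Payments (id INTEGER, datepaid TEXT, amountpaid INTEGER);
    CREATE TABLE Individuals (id INTEGER, name TEXT, lastDatePaid TEXT);
    CREATE TABLE Businesses (id INTEGER, name TEXT, lastDatePaid TEXT);
    INSERT INTO Payments VALUES (112, '2017-03-05', 9000), (115, '2017-03-21', 800);
    INSERT INTO Individuals VALUES (112, 'bob', '2017-03-02'), (114, 'kary', '2016-02-03');
    INSERT INTO Businesses VALUES (115, 'Bakery Love', '2017-05-20');
    """
)

# Stack the two name tables, then join the result to Payments by id.
names = conn.execute(
    """
    SELECT DISTINCT p.id, n.name
    FROM Payments p
    JOIN (SELECT id, name FROM Individuals
          UNION ALL
          SELECT id, name FROM Businesses) AS n
      ON n.id = p.id
    WHERE p.datepaid >= '2017-03-01'
    ORDER BY p.id
    """
).fetchall()

print(names)  # [(112, 'bob'), (115, 'Bakery Love')]
```

Listing the columns explicitly in each half of the UNION ALL avoids depending on both tables having the same column order.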

Limiting join results

Apologies for the vague question and if this has been asked before; I had a hard time figuring out how to articulate it.
I have three tables: Lot, Salesorder, and OrderLot:
Lot SKU CreationDate
-------------------------
1000-a 1000 2017-04-12
1000-b 1000 2017-04-13
2000-a 2000 2017-04-12
2000-b 2000 2017-04-13
SalesorderID Revenue
-----------------------------
1 $500
2 $250
3 $125
OrderLotID SalesorderID Lot
------------------------------
1 1 1000-a
2 1 2000-a
3 2 1000-b
4 2 2000-b
5 3 1000-a
I'd like to do a join which gives me the total revenue generated given the creation date of the lots in the SalesOrder.
For example, I'd like to use the CreationDate of 2017-04-12 and get the result of $625 (Lots 1000-a and 2000-a were created on this date, and they were used to "fill" SalesorderIDs 1 and 3). But the joins I'm currently using return two rows in the Salesorder 1 and the one row for Salesorder 3, and the result is $1125.
How do I limit the rows returned from the OrderLot so that only unique Salesorder revenue is counted?
Thanks,
jeff
edit. current query is:
select sum(so.revenue)
from salesorder so
inner join orderlot ol on so.salesorderid = ol.salesorderid
inner join lot l on ol.lot = l.lot
where l.creationdate = '2017-04-12'
SELECT SUM(s.Revenue)
FROM SalesOrder s
INNER JOIN (
SELECT DISTINCT SalesOrderID
FROM Lot l
INNER JOIN OrderLot ol on ol.Lot = l.Lot
WHERE l.CreationDate = @CreationDate
) t ON t.SalesOrderID = s.SalesOrderID
Or:
SELECT SUM(s.Revenue)
FROM SalesOrder s
WHERE s.SalesOrderID IN (
SELECT DISTINCT SalesOrderID
FROM Lot l
INNER JOIN OrderLot ol on ol.Lot = l.Lot
WHERE l.CreationDate = @CreationDate
)
I find the second option with the IN() condition slightly easier to understand, but I tend to lean towards JOIN when possible, as it tends to perform a little better in my experience and it's easier to adapt it for something more complicated. And as always, if the performance matters that much you should actually profile the query and look at the execution plan. The optimizer can always surprise you.
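Here is a minimal sketch of the double-counting problem and the IN() fix, run against the sample data with Python's sqlite3. Revenue is stored as plain integers and the creation date is passed as a bound parameter; both are choices made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Lot (Lot TEXT PRIMARY KEY, SKU INTEGER, CreationDate TEXT);
    CREATE TABLE SalesOrder (SalesOrderID INTEGER PRIMARY KEY, Revenue INTEGER);
    CREATE TABLE OrderLot (OrderLotID INTEGER PRIMARY KEY, SalesOrderID INTEGER, Lot TEXT);
    INSERT INTO Lot VALUES ('1000-a', 1000, '2017-04-12'), ('1000-b', 1000, '2017-04-13'),
                           ('2000-a', 2000, '2017-04-12'), ('2000-b', 2000, '2017-04-13');
    INSERT INTO SalesOrder VALUES (1, 500), (2, 250), (3, 125);
    INSERT INTO OrderLot VALUES (1, 1, '1000-a'), (2, 1, '2000-a'), (3, 2, '1000-b'),
                                (4, 2, '2000-b'), (5, 3, '1000-a');
    """
)
creation_date = "2017-04-12"

# Plain joins: order 1 matches two lots, so its revenue is counted twice.
naive_total = conn.execute(
    """
    SELECT SUM(so.Revenue)
    FROM SalesOrder so
    INNER JOIN OrderLot ol ON so.SalesOrderID = ol.SalesOrderID
    INNER JOIN Lot l ON ol.Lot = l.Lot
    WHERE l.CreationDate = ?
    """,
    (creation_date,),
).fetchone()[0]

# Reduce to distinct order ids first, then sum each order once.
dedup_total = conn.execute(
    """
    SELECT SUM(s.Revenue)
    FROM SalesOrder s
    WHERE s.SalesOrderID IN (
        SELECT DISTINCT ol.SalesOrderID
        FROM Lot l
        INNER JOIN OrderLot ol ON ol.Lot = l.Lot
        WHERE l.CreationDate = ?
    )
    """,
    (creation_date,),
).fetchone()[0]

print(naive_total)  # 1125
print(dedup_total)  # 625
```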

SQL - Max Vs Inner Join

I have a question on which is a better method in terms of speed.
I have a database with 2 tables that looks like this:
Table2
UniqueID Price
1 100
2 200
3 300
4 400
5 500
Table1
UniqueID User
1 Tom
2 Tom
3 Jerry
4 Jerry
5 Jerry
I would like to get the max price for each user, and I am now faced with 2 choices:
Use MAX, or use the INNER JOIN approach suggested in the following post: Getting max value from rows and joining to another table
Which method is more efficient?
The answer to your question is to try both methods, and see which performs faster on your data in your environment. Unless you have a large amount of data, the difference is probably not important.
In this case, the traditional method of group by is probably better:
select u.user, max(p.price)
from table1 u join
table2 p
on u.uniqueid = p.uniqueid
group by u.user;
For such a query, you want an index on table2(uniqueid, price), and perhaps on table1(uniqueid, user) as well. This depends on the database engine.
Instead of a join, I would suggest not exists:
select u.user, p.price
from table1 u join
table2 p
on u.uniqueid = p.uniqueid
where not exists (select 1
from table1 u2 join
table2 p2
on u2.uniqueid = p2.uniqueid
where u2.user = u.user and p2.price > p.price
);
Do note that these do not do exactly the same things. The first will return one row per user, no matter what. This version can return multiple rows, if there are multiple rows with the same price. On the other hand, it can return other columns from the rows with the maximum price, which is convenient.
Because your data structure requires a join in the subquery, I think you should stick with the group by approach.
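As a rough check of both approaches on the sample data, here is a sketch using Python's sqlite3. It also runs the NOT EXISTS query without the per-user condition in the subquery, to show that the correlation on the user is what makes the result per-user; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (uniqueid INTEGER PRIMARY KEY, user TEXT);
    CREATE TABLE table2 (uniqueid INTEGER PRIMARY KEY, price INTEGER);
    INSERT INTO table1 VALUES (1, 'Tom'), (2, 'Tom'), (3, 'Jerry'), (4, 'Jerry'), (5, 'Jerry');
    INSERT INTO table2 VALUES (1, 100), (2, 200), (3, 300), (4, 400), (5, 500);
    """
)

# GROUP BY: one row per user.
group_by_rows = conn.execute(
    """
    SELECT u.user, MAX(p.price)
    FROM table1 u JOIN table2 p ON u.uniqueid = p.uniqueid
    GROUP BY u.user
    ORDER BY u.user
    """
).fetchall()

# NOT EXISTS template; {corr} is filled with the per-user condition or left empty.
not_exists_sql = """
    SELECT u.user, p.price
    FROM table1 u JOIN table2 p ON u.uniqueid = p.uniqueid
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 u2 JOIN table2 p2 ON u2.uniqueid = p2.uniqueid
        WHERE {corr} p2.price > p.price
    )
    ORDER BY u.user
"""
correlated_rows = conn.execute(
    not_exists_sql.format(corr="u2.user = u.user AND")
).fetchall()
uncorrelated_rows = conn.execute(not_exists_sql.format(corr="")).fetchall()

print(group_by_rows)      # [('Jerry', 500), ('Tom', 200)]
print(correlated_rows)    # [('Jerry', 500), ('Tom', 200)]
print(uncorrelated_rows)  # [('Jerry', 500)]
```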

How to Make a Query to Return Non-Dup values From Two Tables

Suppose the following:
Table Parts
--------------------------------
ID Category Name Price
--------------------------------
1 A Processor 100
2 A MotherBoard 80
3 B Memory Card 40
4 B HD 70
5 C Cooler 10
Table Product_Views
-----------------------------------
Customer Date Part_ID
-----------------------------------
Bill mar-24-15 17:45 1
Wallace mar-25-15 08:17 4
Heather mar-25-15 08:43 1
Chuck mar-25-15 09:01 5
Cindy mar-25-15 11:23 1
How can I build a SQL query in order to retrieve most viewed parts showing: Category, price and number of views, grouped by Category, WITHOUT a sum on Price column? Must I do a subquery or there's a trick to do that in a simple [INNER/LEFT/RIGHT] JOIN?
select p.ID, p.Name, p.price, Count(v.*) "Number of Views"
from parts p
join product_views v
on p.id = v.part_id
group by p.ID, p.Name, p.price
order by Count(v.*) desc
Something along those lines should work.
EDIT for category:
Sorry I've been out for a while. What do you want to do with the category? If you just need it included for analysis you can just add it to the select and group by statements, like the following:
select p.ID, p.Category, p.Name, p.price, Count(v.*) "Number of Views"
from parts p
join product_views v
on p.id = v.part_id
group by p.ID, p.Category, p.Name, p.price
order by Count(v.*) desc
If however you want to see how many views there are per category, you will either need to average your price or leave the price off. If you think about it, in a category you have multiple items with different prices each. So you need some way to unify those prices in order to have a single price point per category. Generally (not always), the average price is the most indicative price of how a category is doing. A query to look at the information at the category level would look something like the following:
select p.Category, AVG(p.price) "Average Price", Count(v.*) "Number of Views"
from parts p
join product_views v
on p.id = v.part_id
group by p.Category
order by Count(v.*) desc
In order to leave off the price, just remove the AVG(p.price) "Average Price" from the query.
You can compare this last query to the previous one and see that the differences are all in the select and group by statements. The select lists the things that you want to see, and the group by statement chooses the level at which you want to see them. So if you want to see how your categories are doing as a whole, the most detail that you want to group by is just the category column. If you want to see how well each item is doing, then you will want to group by the ID or name of each item.
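The two grouping levels can be tried on the sample data with a short sketch in Python's sqlite3. It uses COUNT(*) in place of Count(v.*), which SQLite does not accept, and stores the view timestamps as plain text; both are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parts (id INTEGER PRIMARY KEY, category TEXT, name TEXT, price INTEGER);
    CREATE TABLE product_views (customer TEXT, date TEXT, part_id INTEGER);
    INSERT INTO parts VALUES (1, 'A', 'Processor', 100), (2, 'A', 'MotherBoard', 80),
                             (3, 'B', 'Memory Card', 40), (4, 'B', 'HD', 70),
                             (5, 'C', 'Cooler', 10);
    INSERT INTO product_views VALUES
        ('Bill', 'mar-24-15 17:45', 1), ('Wallace', 'mar-25-15 08:17', 4),
        ('Heather', 'mar-25-15 08:43', 1), ('Chuck', 'mar-25-15 09:01', 5),
        ('Cindy', 'mar-25-15 11:23', 1);
    """
)

# Item level: one row per viewed part.
per_part = conn.execute(
    """
    SELECT p.id, p.name, p.price, COUNT(*) AS views
    FROM parts p JOIN product_views v ON p.id = v.part_id
    GROUP BY p.id, p.name, p.price
    ORDER BY views DESC, p.id
    """
).fetchall()

# Category level: one row per category.
per_category = conn.execute(
    """
    SELECT p.category, AVG(p.price) AS avg_price, COUNT(*) AS views
    FROM parts p JOIN product_views v ON p.id = v.part_id
    GROUP BY p.category
    ORDER BY views DESC, p.category
    """
).fetchall()

print(per_part)      # [(1, 'Processor', 100, 3), (4, 'HD', 70, 1), (5, 'Cooler', 10, 1)]
print(per_category)  # [('A', 100.0, 3), ('B', 70.0, 1), ('C', 10.0, 1)]
```

One thing to keep in mind: because of the inner join, the average is taken over the joined view rows, so parts with no views (MotherBoard and Memory Card here) do not contribute to it.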