Define sort order when updating - sql

I have a script that updates an ID field on one table where that record matches a record in another table based on some criteria.
Below is the general structure of my query:
update p set p.saleId = s.saleId
from products p inner join sales s on s.crit1 = p.crit1
where p.someDate between s.startDate and s.endDate
This works fine. My issue is that in some situations this query finds more than one matching row in the 'sales' table, which is generally OK. However, I'd like to sort these matches by another field to make sure the saleId I get is the one with the highest cost.
Is that possible?

Since it's the saleId you want to set and the sales table you are looking it up in, you can probably just update all products records. Then you can write a simple update statement on the products table that looks the value up in a subquery, without a join. This makes it much easier to write:
update products
set saleId =
(
select top(1) s.saleId
from sales s
where s.crit1 = products.crit1
and products.someDate between s.startDate and s.endDate
order by s.cost desc
);
The main difference from your statement is that mine sets saleId = NULL where there is no match in the sales table, while yours leaves those rows untouched. But I guess that doesn't make a difference here.
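A minimal sketch of the correlated-subquery update, run through Python's built-in sqlite3 module. SQLite has no TOP (1), so LIMIT 1 stands in for it; the sample rows and the productId column are made up for illustration, and dates are assumed to be ISO-8601 strings so BETWEEN compares them correctly. It also shows the NULL behaviour mentioned above for a product with no matching sale.

```python
import sqlite3

# Hypothetical data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (saleId INTEGER PRIMARY KEY, crit1 TEXT,
                    startDate TEXT, endDate TEXT, cost REAL);
CREATE TABLE products (productId INTEGER PRIMARY KEY, crit1 TEXT,
                       someDate TEXT, saleId INTEGER);
INSERT INTO sales VALUES
  (1, 'A', '2024-01-01', '2024-12-31', 10.0),
  (2, 'A', '2024-01-01', '2024-06-30', 25.0),  -- overlaps sale 1, higher cost
  (3, 'B', '2024-01-01', '2024-12-31', 5.0);
INSERT INTO products VALUES
  (100, 'A', '2024-03-15', NULL),  -- matches sales 1 and 2
  (101, 'A', '2024-09-01', NULL),  -- matches only sale 1
  (102, 'C', '2024-03-15', NULL);  -- matches nothing
""")

# For each product, pick the matching sale with the highest cost.
# LIMIT 1 plays the role of SQL Server's TOP (1).
conn.execute("""
UPDATE products
SET saleId = (
    SELECT s.saleId
    FROM sales s
    WHERE s.crit1 = products.crit1
      AND products.someDate BETWEEN s.startDate AND s.endDate
    ORDER BY s.cost DESC
    LIMIT 1
)
""")

result = dict(conn.execute("SELECT productId, saleId FROM products ORDER BY productId"))
print(result)  # {100: 2, 101: 1, 102: None}
```

Product 102 ends up with a NULL saleId because the subquery returns no row for it.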

I hope the query below solves it. This is a very high-level draft based on your question, so take the concept rather than the exact syntax.
with maxSales as (select saleId, crit1, startDate, endDate from sales s1
where cost = (select max(cost) from
sales s2 where s1.crit1 = s2.crit1))
update products set saleId =
(select s.saleId from
maxSales s
where s.crit1 = products.crit1
and products.someDate between s.startDate and s.endDate)

UPDATE p
set p.saleId = e.rowNumber
FROM products p
INNER JOIN
(SELECT saleId, row_number() OVER (ORDER BY saleId DESC) as rowNumber
FROM sales)
e ON e.saleId = p.saleId

Try this:
UPDATE p
SET p.saleid = s.saleid
FROM products p
INNER JOIN
(SELECT s1.crit1,
s1.saleid
FROM sales s1
WHERE s1.cost =
(SELECT max(s2.cost)
FROM sales s2
WHERE s2.crit1 = s1.crit1)) s ON s.crit1 = p.crit1

None of the answers worked, but I managed to do it by using an OUTER APPLY as my join and specifying the sort order inside it.
Cheers everyone for the input.
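The poster's OUTER APPLY query isn't shown, and SQLite has no APPLY operator, so the sketch below swaps in a different technique with the same effect: ROW_NUMBER() numbers each product's matching sales by cost (highest first) inside a derived table, and a LEFT JOIN keeps only row 1. It assumes SQLite 3.25 or later for window functions, and the sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (saleId INTEGER PRIMARY KEY, crit1 TEXT,
                    startDate TEXT, endDate TEXT, cost REAL);
CREATE TABLE products (productId INTEGER PRIMARY KEY, crit1 TEXT, someDate TEXT);
INSERT INTO sales VALUES
  (1, 'A', '2024-01-01', '2024-12-31', 10.0),
  (2, 'A', '2024-01-01', '2024-06-30', 25.0),
  (3, 'B', '2024-01-01', '2024-12-31', 5.0);
INSERT INTO products VALUES
  (100, 'A', '2024-03-15'),
  (101, 'A', '2024-09-01'),
  (102, 'C', '2024-03-15');
""")

rows = conn.execute("""
SELECT p.productId, ranked.saleId
FROM products p
LEFT JOIN (
    -- Number each product's matching sales, highest cost first.
    SELECT p2.productId, s.saleId,
           ROW_NUMBER() OVER (PARTITION BY p2.productId ORDER BY s.cost DESC) AS rn
    FROM products p2
    JOIN sales s
      ON s.crit1 = p2.crit1
     AND p2.someDate BETWEEN s.startDate AND s.endDate
) ranked
  -- rn = 1 sits in the ON clause so unmatched products survive, as with OUTER APPLY.
  ON ranked.productId = p.productId AND ranked.rn = 1
ORDER BY p.productId
""").fetchall()

print(rows)  # [(100, 2), (101, 1), (102, None)]
```

Moving `ranked.rn = 1` into a WHERE clause would drop product 102, which is the difference between OUTER APPLY and CROSS APPLY.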

Related

Correlated subquery structure in MS Access SQL

I'm close, but I cannot seem to figure out this SQL query. I've got the SELECT and related FROM tables right, but I think my subquery structure is messed up.
Question: Compose an SQL statement to generate a list of two least expensive vendors (suppliers) for each raw material. In the result table, show the following columns: material ID, material description, vendor ID, vendor name, and the supplier's unit price. Sort the result table by material ID and supplier’s unit price in ascending order. Note: If a raw material has only one vendor (supplier), that supplier and its unit price for the raw material should also be in the result (output) table.
Here's what I've got:
SELECT Supplies_t.Material_ID, Raw_Materials_t.Material_Description,
Vendor_t.Vendor_ID, Vendor_t.Vendor_name, Supplies_t.Unit_price
FROM Supplies_t S1, Raw_Materials_t, Vendor_t
WHERE Vendor_t.Vendor_ID = Supplies_t.Vendor_ID
AND Supplies_t.Material_ID = Raw_Materials_t.Material_ID
AND Supplies_t.Unit_price IN
(SELECT TOP 2 Unit_price
FROM Supplies_t S2
WHERE S1.Material_ID = S2.Material_ID
ORDER BY S2.Material_ID ASC, S2.Unit_price ASC)
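Using the table aliases consistently should solve your problem: once Supplies_t is aliased as S1 in the FROM clause, the rest of the outer query has to refer to it by that alias, not as Supplies_t. You should also use explicit JOIN syntax: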
SELECT s.Material_ID, rm.Material_Description, v.Vendor_ID, v.Vendor_name, s.Unit_price
FROM (Supplies_t s INNER JOIN
Raw_Materials_t rm
ON s.Material_ID = rm.Material_ID
) INNER JOIN
Vendor_t v
ON v.Vendor_ID = s.Vendor_ID
WHERE s.Unit_price IN (SELECT TOP 2 s2.Unit_price
FROM Supplies_t s2
WHERE s.Material_ID = s2.Material_ID
ORDER BY s2.Material_ID ASC, s2.Unit_price ASC
);
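A minimal sketch of the same "two cheapest per material" filter, run in SQLite via Python's sqlite3 module. SQLite uses LIMIT instead of Access's TOP, and the Raw_Materials_t and Vendor_t joins are left out to keep the example short; the prices are made up. Because the outer query matches on price, every vendor whose price equals one of the two lowest prices is returned, so ties can yield more than two rows per material, and a material with a single vendor still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplies_t (Material_ID INTEGER, Vendor_ID INTEGER, Unit_price REAL);
INSERT INTO Supplies_t VALUES
  (1, 10, 5.0), (1, 11, 3.0), (1, 12, 4.0),  -- three vendors: keep 3.0 and 4.0
  (2, 10, 7.0),                              -- single vendor: kept
  (3, 10, 2.0), (3, 11, 2.0), (3, 12, 2.0);  -- three-way tie at 2.0
""")

# Correlated subquery returns the two lowest prices for the current material.
rows = conn.execute("""
SELECT s.Material_ID, s.Vendor_ID, s.Unit_price
FROM Supplies_t s
WHERE s.Unit_price IN (SELECT s2.Unit_price
                       FROM Supplies_t s2
                       WHERE s2.Material_ID = s.Material_ID
                       ORDER BY s2.Unit_price ASC
                       LIMIT 2)
ORDER BY s.Material_ID, s.Unit_price, s.Vendor_ID
""").fetchall()

for row in rows:
    print(row)
```

Material 3 comes back with all three tied vendors.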

Small Issue with Aggregate Functions

I'm having a small problem with an aggregate function that I can't quite crack.
I have to get a count of customers for each representative in my database, which I can do. The second part of my task is to display only the representative with the highest number of customers.
So far I have:
SELECT Rep.RepNum, Count(Customer.RepNum) AS [CustomerCount]
FROM Rep INNER JOIN Customer ON Rep.RepNum = Customer.Repnum
GROUP BY Rep.RepNum
I know I'm probably going to have to use a nested query to solve this, but I'm not sure how to go about it. It has been fighting me for almost an hour, and ANY help would be greatly appreciated.
Try:
SELECT TOP 1 Rep.RepNum,
Count(Customer.RepNum) AS [CustomerCount]
FROM Rep
INNER JOIN Customer ON Rep.RepNum = Customer.Repnum
GROUP BY Rep.RepNum
ORDER BY COUNT(Customer.RepNum) DESC
Maybe this will do:
SELECT Rep.RepNum, Count(Customer.RepNum) AS [CustomerCount]
FROM Rep INNER JOIN Customer ON Rep.RepNum = Customer.Repnum
GROUP BY Rep.RepNum
HAVING Count(Customer.RepNum) = (
Select max([CustomerCount])
FROM (SELECT Rep.RepNum,
Count(Customer.RepNum) AS [CustomerCount]
FROM Rep INNER JOIN Customer ON Rep.RepNum = Customer.Repnum
GROUP BY Rep.RepNum));
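The HAVING version compares each rep's count to the maximum count, so it returns every rep tied for first place. The sketch below runs it in SQLite through Python's sqlite3 module with made-up data in which two reps tie; the CustomerNum column is a hypothetical key, and the derived table is given the alias counts for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rep (RepNum INTEGER PRIMARY KEY);
CREATE TABLE Customer (CustomerNum INTEGER PRIMARY KEY, RepNum INTEGER);
INSERT INTO Rep VALUES (20), (35), (65);
INSERT INTO Customer VALUES
  (1, 20), (2, 20), (3, 20),
  (4, 35), (5, 35), (6, 35),   -- rep 35 ties rep 20 with three customers
  (7, 65);
""")

# Keep only the reps whose customer count equals the largest count.
top_reps = conn.execute("""
SELECT Rep.RepNum, COUNT(Customer.RepNum) AS CustomerCount
FROM Rep INNER JOIN Customer ON Rep.RepNum = Customer.RepNum
GROUP BY Rep.RepNum
HAVING COUNT(Customer.RepNum) = (
    SELECT MAX(CustomerCount)
    FROM (SELECT COUNT(Customer.RepNum) AS CustomerCount
          FROM Rep INNER JOIN Customer ON Rep.RepNum = Customer.RepNum
          GROUP BY Rep.RepNum) AS counts
)
ORDER BY Rep.RepNum
""").fetchall()

print(top_reps)  # [(20, 3), (35, 3)]
```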

Refactoring a tsql view which uses row_number() to return rows with a unique column value

I have a SQL view that I'm using to retrieve data. Let's say it's a large list of products, which are linked to the customers who have bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified; the general situation is a query that should return only one row for each unique value of some column X. Which row is returned is not important.)
CREATE VIEW productView AS
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE temp.product_numbering = 1
Now let's say that the total number of rows in this view is ~1 million, and running select * from productView takes 10 seconds. A query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query gets evaluated as this:
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
) as temp
WHERE product_numbering = 1 and products.Id = 10
I think this is causing the inner subquery to be evaluated in full each time. Ideally I'd like to use something along the following lines
SELECT
Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering,
customer.id
-- various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = products.Id
-- various other joins
WHERE product_numbering = 1
But this doesn't seem to be allowed. Is there any way to do something similar?
EDIT -
After much experimentation, I believe the actual problem I am having is how to force a join to return exactly one row. I tried to use OUTER APPLY, as suggested below. Here is some sample code:
CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
id int not null PRIMARY KEY,
productId int not null,
value varchar(20) NOT NULL)
declare @count int = 1
while @count <= 150000
begin
insert into Customers (id, productID, value)
values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))
insert into Products (id)
values (@count)
SET @count = @count + 1
end
CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)
With the above sample set, the 'get everything' query below
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
takes ~1000ms to run. Adding an explicit condition:
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'
takes about the same amount of time. 1000 ms for a fairly simple query is already too much, and it scales the wrong way (upwards) as I add more similar joins.
Try the following approach, using a Common Table Expression (CTE). With the test data you provided, it returns specific ProductIds in less than a second.
create view ProductTest as
with cte as (
select
row_number() over (partition by p.id order by p.id) as RN,
c.*
from
Products p
inner join Customers c
on p.id = c.productid
)
select *
from cte
where RN = 1
go
select * from ProductTest where ProductId = 25
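A small-scale sketch of the CTE view, run in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). It reuses the question's sample schema but with only 20 rows, so it checks which rows come back, not the timing; the index name idx_customers_productId is an assumption. Ordering the ROW_NUMBER() by c.id rather than p.id is a swapped-in choice that makes the chosen customer deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, "
    "productId INTEGER NOT NULL, value TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_customers_productId ON Customers (productId)")

# Mirrors the T-SQL loop at a smaller scale: customer n belongs to
# product n / 2 (integer division), so most products get two customers.
for n in range(1, 21):
    conn.execute("INSERT INTO Customers VALUES (?, ?, ?)", (n, n // 2, f"Value {n // 2}"))
    conn.execute("INSERT INTO Products VALUES (?)", (n,))

# One row per product: number each product's customers and keep the first.
conn.execute("""
CREATE VIEW ProductTest AS
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY c.id) AS RN, c.*
    FROM Products p
    INNER JOIN Customers c ON p.id = c.productId
)
SELECT * FROM cte WHERE RN = 1
""")

row = conn.execute(
    "SELECT id, productId, value FROM ProductTest WHERE productId = 5"
).fetchone()
view_rows = conn.execute("SELECT COUNT(*) FROM ProductTest").fetchone()[0]
print(row, view_rows)  # (10, 5, 'Value 5') 10
```

Because the view uses an INNER JOIN, products without customers (11 to 20 here) are absent from it; a LEFT JOIN would keep them with NULL customer columns.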
What if you did something like:
SELECT ...
FROM products
OUTER APPLY (SELECT TOP 1 * from customer where customerid = products.buyerid) as customer
...
Then the filter on productId should help. It might be worse without filtering, though.
The problem is that your data model is flawed. You should have three tables:
Customers (customerId, ...)
Products (productId,...)
ProductSales (customerId, productId)
Furthermore, the sale table should probably be split into a 1-to-many pair (Sales and SalesDetails). Unless you fix your data model, you're just going to run circles around your tail chasing red-herring problems. If the system is not your design, fix it. If the boss doesn't let you fix it, then fix it. If you cannot fix it, then fix it. There isn't an easy way out of the bad data model you're proposing.
This will probably be fast enough if you really don't care which customer you bring back:
select p1.*, c1.*
FROM products p1
Left Join (
select p2.id, max( c2.id) max_customer_id
From products p2
Join customer c2 on
c2.productID = p2.id
group by p2.id
) product_max_customer on
product_max_customer.id = p1.id
Left join customer c1 on
c1.id = product_max_customer.max_customer_id
;
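A quick check of the "highest customer id per product" join, run in SQLite through Python's sqlite3 module. The name columns and the sample rows are hypothetical. Product 3 has no customers, so both LEFT JOINs leave its customer columns NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (id INTEGER PRIMARY KEY, productID INTEGER, name TEXT);
INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
INSERT INTO customer VALUES
  (10, 1, 'Ann'), (11, 1, 'Bob'),  -- product 1 has two customers
  (12, 2, 'Cy');                   -- product 3 has none
""")

# Pick one customer per product: the one with the highest id.
rows = conn.execute("""
SELECT p1.id, p1.name, c1.id, c1.name
FROM products p1
LEFT JOIN (
    SELECT p2.id, MAX(c2.id) AS max_customer_id
    FROM products p2
    JOIN customer c2 ON c2.productID = p2.id
    GROUP BY p2.id
) product_max_customer ON product_max_customer.id = p1.id
LEFT JOIN customer c1 ON c1.id = product_max_customer.max_customer_id
ORDER BY p1.id
""").fetchall()

for row in rows:
    print(row)
```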

Joining two tables with a queried table

Oh great SQL gods I require your assistance.
Here is my Schema:
CAR(Serial_no,Model,Manufacturer,Price)
OPTIONS(Serial_no,Option_name,Price)
SALE(Salesperson_id,Serial_no,Date,Sale_price)
SALESPERSON(Salesperson_id,Name,Phone)
First, I need to join the CAR and SALE table by Serial_no.
Second, I need to take the OPTIONS table and SUM all the option prices for each Serial_no, which the following does:
SELECT O.Serial_no, SUM(O.Price)
FROM OPTIONS O
GROUP BY (O.Serial_no);
Last, I need to merge steps one and two and query the result so that I get the rows where CAR.Price < (SALE.Sale_price + OPTIONS.Price).
Can this be done? Any help would be immensely appreciated!
Thanks,
Mark
SELECT c.Serial_no,
MIN(c.Price) CarPrice,
MIN(s.Sale_price) SalePrice,
SUM(o.Price) OptionsPrice,
MIN(s.Sale_price) + IFNULL(SUM(o.Price),0) TotalPrice
FROM Car c JOIN Sale s ON c.Serial_no = s.Serial_no
LEFT JOIN `Options` o ON c.Serial_no = o.Serial_no
GROUP BY c.Serial_no
HAVING MIN(c.Price) < MIN(s.Sale_price) + IFNULL(SUM(o.Price),0)
Note: the MIN() calls don't change any values; they are only there because the query groups by Serial_no and the options join can produce multiple rows per car.
Another option would be to do the calculations in a subquery, which may lead to better performance:
SELECT c.Serial_no,
c.Price,
s.Sale_price,
og.SumPrice
FROM Car c JOIN Sale s ON c.Serial_no = s.Serial_no
LEFT JOIN (
SELECT Serial_no, SUM(Price) SumPrice
FROM `Options`
GROUP BY Serial_no
) og ON c.Serial_no = og.Serial_no
WHERE c.Price < s.Sale_price + IFNULL(og.SumPrice,0)
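A minimal run of the subquery version in SQLite through Python's sqlite3 module, which accepts IFNULL like MySQL. The schema is trimmed (the SALE Date column and the SALESPERSON table are left out) and the prices are invented so that one car fails the comparison and one car has no options at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Car (Serial_no INTEGER PRIMARY KEY, Model TEXT, Manufacturer TEXT, Price REAL);
CREATE TABLE Options (Serial_no INTEGER, Option_name TEXT, Price REAL);
CREATE TABLE Sale (Salesperson_id INTEGER, Serial_no INTEGER, Sale_price REAL);
INSERT INTO Car VALUES
  (1, 'A', 'X', 20000), (2, 'B', 'Y', 30000), (3, 'C', 'Z', 15000);
INSERT INTO Options VALUES
  (1, 'Sunroof', 1500), (1, 'GPS', 800),  -- car 1: options total 2300
  (2, 'GPS', 500);                        -- car 3 has no options
INSERT INTO Sale VALUES
  (7, 1, 19000),   -- 19000 + 2300 = 21300 > 20000 -> included
  (7, 2, 29000),   -- 29000 + 500  = 29500 < 30000 -> excluded
  (8, 3, 16000);   -- 16000 + 0    = 16000 > 15000 -> included
""")

# Pre-aggregate option prices per car, then compare against sale price + options.
rows = conn.execute("""
SELECT c.Serial_no, c.Price, s.Sale_price, og.SumPrice
FROM Car c
JOIN Sale s ON c.Serial_no = s.Serial_no
LEFT JOIN (
    SELECT Serial_no, SUM(Price) AS SumPrice
    FROM Options
    GROUP BY Serial_no
) og ON c.Serial_no = og.Serial_no
WHERE c.Price < s.Sale_price + IFNULL(og.SumPrice, 0)
ORDER BY c.Serial_no
""").fetchall()

for row in rows:
    print(row)
```

IFNULL turns the missing options total for car 3 into 0; without it the sum would be NULL and the comparison would drop that car.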

Should I use a subquery?

I have two tables, one that stores the current price, and one that stores the historical price of items. I want to create a query that pulls the current price, and the difference between the current price and the most recent historical price.
In the historical table, I have the start and end times of the price, so I can just select the most recent price, but how do I pull it all together in one query? Or do I have to do a subquery?
select p.current_price,
h.historical_price
h.historical_time
from price p
inner join price_history h
on p.id = h.id
where max(h.historical_time)
This obviously doesn't work, but that is what I'm trying to accomplish.
This gives me the current and historical price. But I want to make sure I have the most RECENT price. How would I do this?
I would do it like this. Note that you may get duplicate rows if there are two price entries with the same date for the same id in price_history:
select p.current_price, h.historical_price,
p.current_price - h.historical_price as PriceDiff, h.historical_time
from price p
inner join (
select id, max(historical_time) as MaxHistoricalTime
from price_history
group by id
) hm on p.id = hm.id
inner join price_history h on hm.id = h.id
and hm.MaxHistoricalTime = h.historical_time
I don't believe there's a way of doing this without a subquery that isn't worse. On the other hand, if your table is indexed correctly, subqueries returning results of aggregate functions are generally pretty fast.
select
p.current_price,
h3.historical_price,
h3.historical_time
from
price p,
( select h1.id, max( h1.historical_time ) as MaxHT
from price_history h1
group by 1 ) h2,
price_history h3
where
p.id = h2.id
and p.id = h3.id
and h2.MaxHT = h3.historical_time
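Both answers use the same idea: find each id's latest historical_time in a grouped subquery, then join back to price_history to fetch the price recorded at that time. The sketch below runs the first answer's query in SQLite through Python's sqlite3 module with made-up rows; it assumes times are stored as ISO-8601 text so that MAX() picks the most recent one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price (id INTEGER PRIMARY KEY, current_price REAL);
CREATE TABLE price_history (id INTEGER, historical_price REAL, historical_time TEXT);
INSERT INTO price VALUES (1, 12.0), (2, 8.0);
INSERT INTO price_history VALUES
  (1, 10.0, '2024-01-01'), (1, 11.0, '2024-03-01'),  -- latest for id 1 is 11.0
  (2,  9.0, '2024-02-15'), (2,  7.5, '2023-12-01');  -- latest for id 2 is 9.0
""")

# Latest time per id, joined back to price_history to get the price at that time.
rows = conn.execute("""
SELECT p.id, p.current_price, h.historical_price,
       p.current_price - h.historical_price AS PriceDiff, h.historical_time
FROM price p
JOIN (SELECT id, MAX(historical_time) AS MaxHistoricalTime
      FROM price_history
      GROUP BY id) hm ON p.id = hm.id
JOIN price_history h ON h.id = hm.id AND h.historical_time = hm.MaxHistoricalTime
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```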