SQLServer get top 1 row from subquery

In a huge products query, I'm trying to get the price of the last purchase of every product. In my first approach, I added a subquery on the Buys table, ordered by date descending and returning only the first row, to make sure I got the latest purchase. But it didn't show any values.
Looking at it again, that makes sense: the subquery has no restriction on the product, so it considers all purchases and returns the latest one overall, which doesn't have to belong to the product currently being processed by the main query. So it returns nothing.
Here is the heavily simplified code:
SELECT P.ID, P.Description, P... blah, blah
FROM Products P
LEFT JOIN (
SELECT TOP 1 B.Product,B.Date,B.Price --Can't take out TOP 1 if ORDER BY
FROM Buys B
--WHERE... Can't select by P.Product because doesn't exist in that context
ORDER BY B.Date DESC, B.ID DESC
) BUY ON BUY.Product=P.ID
WHERE (Some product family and kind restrictions, etc, so it processes a big amount of products)
I thought about a subquery embedded in the main SELECT list, but since I need several values, that would mean one subquery per column, which is ugly and inefficient.
Is there a way to do this and avoid the infamous loop? Does anyone know the right way?
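The behaviour described in the question can be reproduced with a minimal sketch in Python's built-in sqlite3 module, used here as a stand-in for SQL Server (SQLite has no TOP, so LIMIT 1 plays its role). The table layouts follow the question; the sample rows are hypothetical. Because the derived table never sees P, it keeps only the single latest buy overall, so at most one product gets a price, and if that product is excluded by the outer WHERE, no prices show at all.

```python
import sqlite3

# In-memory stand-in for the question's tables (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Buys (ID INTEGER PRIMARY KEY, Product INTEGER, Date TEXT, Price REAL);
INSERT INTO Products VALUES (1, 'Bolt'), (2, 'Nut'), (3, 'Washer');
INSERT INTO Buys VALUES
  (1, 1, '2024-01-05', 0.10),
  (2, 1, '2024-03-01', 0.12),
  (3, 2, '2024-02-10', 0.05),
  (4, 2, '2024-04-20', 0.07);
""")

# The derived table is evaluated without any reference to P, so LIMIT 1
# (SQL Server: TOP 1) keeps only the latest buy across ALL products.
rows = conn.execute("""
SELECT P.ID, BUY.Price
FROM Products P
LEFT JOIN (
    SELECT B.Product, B.Date, B.Price
    FROM Buys B
    ORDER BY B.Date DESC, B.ID DESC
    LIMIT 1
) BUY ON BUY.Product = P.ID
ORDER BY P.ID
""").fetchall()

print(rows)  # only product 2 gets a price; the others get None
```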

You are going down the path of using outer apply, so let's continue:
SELECT P.ID, P.Description, P... blah, blah
FROM Products P OUTER APPLY
(SELECT TOP 1 B.Product, B.Date, B.Price
FROM Buys B
--P is in scope here: APPLY logically evaluates this subquery once per row of Products
WHERE B.Product = P.ID
ORDER BY B.Date DESC, B.ID DESC
) buy
WHERE (Some product family and kind restrictions, etc, so it processes a big amount of products)
In this context, you can think of APPLY as a correlated subquery that can return multiple columns. In other databases, this is called a "lateral join".
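A portable way to check these semantics is the same kind of sqlite3 sketch with the correlation restored. SQLite has no OUTER APPLY, so this version swaps in a correlated subquery in the join condition that picks the ID of each product's latest buy; the LEFT JOIN keeps products without buys, as OUTER APPLY does. The sample rows are hypothetical, and the T-SQL query above is what you would actually run on SQL Server.

```python
import sqlite3

# Hypothetical sample data shaped like the question's Products and Buys tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Buys (ID INTEGER PRIMARY KEY, Product INTEGER, Date TEXT, Price REAL);
INSERT INTO Products VALUES (1, 'Bolt'), (2, 'Nut'), (3, 'Washer');
INSERT INTO Buys VALUES
  (1, 1, '2024-01-05', 0.10),
  (2, 1, '2024-03-01', 0.12),
  (3, 2, '2024-02-10', 0.05),
  (4, 2, '2024-04-20', 0.07);
""")

# Stand-in for OUTER APPLY (SELECT TOP 1 ... WHERE B.Product = P.ID ...):
# for each product P the correlated subquery returns the ID of its latest buy
# (newest Date, highest ID as tie-breaker); LEFT JOIN keeps products with no buys.
rows = conn.execute("""
SELECT P.ID, P.Description, B.Date, B.Price
FROM Products P
LEFT JOIN Buys B
  ON B.ID = (
      SELECT B2.ID
      FROM Buys B2
      WHERE B2.Product = P.ID
      ORDER BY B2.Date DESC, B2.ID DESC
      LIMIT 1
  )
ORDER BY P.ID
""").fetchall()

for row in rows:
    print(row)
```

Washer has no buys, so its Date and Price come back as None, which is the OUTER (rather than CROSS) behaviour.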

Seems like a good candidate for OUTER APPLY. You need something along these lines:
SELECT P.ID, P.Description, P... blah, blah
FROM Products P
OUTER APPLY (
SELECT TOP 1 B.Product,B.Date,B.Price
FROM Buys B
WHERE B.Product = P.ID
ORDER BY B.Date DESC, B.ID DESC
) a

Related

Why does the optimizer decide to self-join a table?

I'm analyzing my query that looks like this:
WITH Project_UnfinishedCount AS (
SELECT P.Title AS title, COUNT(T.ID) AS cnt
FROM PROJECT P LEFT JOIN TASK T on P.ID = T.ID_PROJECT AND T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
)
SELECT Title
FROM Project_UnfinishedCount
WHERE cnt = (
SELECT MAX(cnt)
FROM Project_UnfinishedCount
);
It returns the title of the project with the largest number of unfinished tasks.
Here is its execution plan: [image not available]
I wonder why it has steps 6-8 that look like a self-join of the PROJECT table? And then it stores the result of the join as a view, but according to the Rows and Bytes columns, the view is the same as the PROJECT table. Why does it do that?
I'd also like to know what steps 2 and 1 stand for. My guess is that step 2 stores the result of my CTE so it can be used in steps 10-14, and step 1 removes the rows from the view that don't have the cnt value returned by the subquery. Is this guess correct?

In addition to the comments above, when you reference a CTE more than once, there is a heuristic that tells the optimizer to materialize the CTE, which is why you see the temp table transformation.
A few other comments/questions regarding this query. I'm assuming that the relationship is that a PROJECT can have 0 or more tasks, and each TASK belongs to one and only one PROJECT. In that case, I wonder why you have an outer join? Moreover, you put the ACTUAL_FINISH_DATE condition in the join. For a project whose tasks are all complete, the outer join still produces one non-matching row with NULL task columns; COUNT(T.ID) ignores that NULL and reports 0, but a COUNT(*) would report 1 unfinished task. Since you only want the project with the most unfinished tasks, I think your CTE is simpler as an inner join with the filter in the WHERE clause:
SELECT P.Title AS title, COUNT(T.ID) AS cnt
FROM PROJECT P
JOIN TASK T on P.ID = T.ID_PROJECT
WHERE T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
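How the outer join and the COUNT argument interact can be checked with a small sqlite3 sketch (SQLite stands in for the database in the question; the sample rows are hypothetical). Project Beta has only finished tasks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (ID INTEGER PRIMARY KEY, TITLE TEXT);
CREATE TABLE TASK (ID INTEGER PRIMARY KEY, ID_PROJECT INTEGER, ACTUAL_FINISH_DATE TEXT);
INSERT INTO PROJECT VALUES (1, 'Alpha'), (2, 'Beta');
-- Alpha: one unfinished task; Beta: every task is finished
INSERT INTO TASK VALUES (1, 1, NULL), (2, 1, '2024-01-01'), (3, 2, '2024-02-01');
""")

# The original outer-join shape, with the COUNT argument left as a parameter.
outer_join = """
SELECT P.TITLE, COUNT({expr})
FROM PROJECT P
LEFT JOIN TASK T ON P.ID = T.ID_PROJECT AND T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
ORDER BY P.ID
"""
# COUNT(T.ID) skips the NULL-extended row for Beta; COUNT(*) counts it.
count_col = conn.execute(outer_join.format(expr="T.ID")).fetchall()
count_star = conn.execute(outer_join.format(expr="*")).fetchall()

# The inner-join version simply has no row for Beta.
inner_join = conn.execute("""
SELECT P.TITLE, COUNT(T.ID)
FROM PROJECT P
JOIN TASK T ON P.ID = T.ID_PROJECT
WHERE T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
ORDER BY P.ID
""").fetchall()

print(count_col, count_star, inner_join)
```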
With all that being said, queries of the type "find the row that matches a group-level value (count, max, etc.)" are often more efficient when written with a window function, because that lets you eliminate the self-join. This can make a big performance difference when you have millions or billions of rows. For example, your query could be rewritten as:
SELECT TITLE, CNT
FROM (
SELECT P.Title AS title, COUNT(T.ID) AS cnt
, RANK() OVER( ORDER BY COUNT(*) DESC ) RNK
FROM PROJECT P
JOIN TASK T on P.ID = T.ID_PROJECT
WHERE T.ACTUAL_FINISH_DATE IS NULL
GROUP BY P.ID, P.TITLE
)
WHERE RNK=1
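A quick sqlite3 sketch can confirm that the window-function rewrite returns the same titles as the original CTE, including ties. It assumes SQLite 3.25 or later for RANK(), gives the derived table an alias (ranked), which SQL Server would require, and uses hypothetical data in which two projects tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (ID INTEGER PRIMARY KEY, TITLE TEXT);
CREATE TABLE TASK (ID INTEGER PRIMARY KEY, ID_PROJECT INTEGER, ACTUAL_FINISH_DATE TEXT);
INSERT INTO PROJECT VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
-- Alpha and Beta tie with two unfinished tasks each; Gamma has one.
INSERT INTO TASK VALUES
  (1, 1, NULL), (2, 1, NULL),
  (3, 2, NULL), (4, 2, NULL), (5, 2, '2024-01-01'),
  (6, 3, NULL);
""")

# Original shape: the CTE is referenced twice (once more for MAX).
cte_titles = [r[0] for r in conn.execute("""
WITH Project_UnfinishedCount AS (
    SELECT P.TITLE AS title, COUNT(T.ID) AS cnt
    FROM PROJECT P
    LEFT JOIN TASK T ON P.ID = T.ID_PROJECT AND T.ACTUAL_FINISH_DATE IS NULL
    GROUP BY P.ID, P.TITLE
)
SELECT title
FROM Project_UnfinishedCount
WHERE cnt = (SELECT MAX(cnt) FROM Project_UnfinishedCount)
ORDER BY title
""")]

# Window-function shape: the grouped rows are ranked once; RANK() = 1 keeps ties.
window_titles = [r[0] for r in conn.execute("""
SELECT title
FROM (
    SELECT P.TITLE AS title, COUNT(T.ID) AS cnt,
           RANK() OVER (ORDER BY COUNT(T.ID) DESC) AS rnk
    FROM PROJECT P
    JOIN TASK T ON P.ID = T.ID_PROJECT
    WHERE T.ACTUAL_FINISH_DATE IS NULL
    GROUP BY P.ID, P.TITLE
) ranked
WHERE rnk = 1
ORDER BY title
""")]

print(cte_titles, window_titles)
```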

SQL Server query for related products

I am trying to get related products, but the issue I'm facing is that the ProductPhotos table has a one-to-many relationship with the Product table, so when I get products by matching category ID, the query also returns multiple photos for each product, which I do not want. I want only one photo per product from the ProductPhotos table. Is there any way to use DISTINCT in joins, or some other way? Here is what I have done so far:
SELECT [Product].[ID]
,[Thumbnail]
,[ProductName]
,[Model]
,[SKU]
,[Price]
,[IsExclusive]
,[DiscountPercentage]
,[DiscountFixed]
,[NetPrice]
,[Url]
FROM [dbo].[Product]
INNER JOIN [ProductPhotos] ON [ProductPhotos].[ProductID]=[Product].[ID]
INNER JOIN [ProductCategories] ON [ProductCategories].[ProductID]=
[Product].[ID]
WHERE [ProductCategories].[CategoryID]=4
And the result I am getting is: [screenshot not available]
The ProductPhotos table contains: [screenshot not available]
Is there any way to use DISTINCT or GROUP BY on the ProductID column of the ProductPhotos table so that only one photo row per product is returned?

Instead of using inner join, use cross apply:
SELECT . . .
FROM dbo.Product p CROSS APPLY
(SELECT TOP (1) pp.*
FROM ProductPhotos pp
WHERE pp.ProductID = p.id
ORDER BY NEWID()
) pp INNER JOIN
ProductCategories pc
ON pc.ProductID = p.id
WHERE pc.CategoryID = 4;
Notes:
The ORDER BY NEWID() chooses a random photo. You can order by specific columns to get the earliest, latest, biggest, or whatever.
Note that I added table aliases. These make the query easier to write and to read.
You should qualify all column names in your query, so it is clear which tables they come from.
I removed the superfluous square brackets. They just make the query harder to write and to read.
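The CROSS APPLY idea can be sketched in SQLite through Python's sqlite3 module. SQLite has no APPLY, so a correlated subquery in the join condition picks one photo per product, and the inner join drops products without photos, matching CROSS APPLY. To keep the output repeatable, the sketch takes the lowest PhotoID instead of a random photo; the PhotoID and Thumbnail columns and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
CREATE TABLE ProductCategories (ProductID INTEGER, CategoryID INTEGER);
INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Desk'), (3, 'Chair');
INSERT INTO ProductPhotos VALUES
  (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'), (12, 1, 'lamp-c.jpg'),
  (20, 2, 'desk-a.jpg');
INSERT INTO ProductCategories VALUES (1, 4), (2, 4), (3, 4);
""")

# Stand-in for CROSS APPLY (SELECT TOP (1) ... ORDER BY ...): one photo per
# product, chosen deterministically; Chair has no photo and is dropped.
rows = conn.execute("""
SELECT p.ID, p.ProductName, pp.Thumbnail
FROM Product p
JOIN ProductPhotos pp
  ON pp.PhotoID = (
      SELECT pp2.PhotoID
      FROM ProductPhotos pp2
      WHERE pp2.ProductID = p.ID
      ORDER BY pp2.PhotoID
      LIMIT 1
  )
JOIN ProductCategories pc ON pc.ProductID = p.ID
WHERE pc.CategoryID = 4
ORDER BY p.ID
""").fetchall()

print(rows)
```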

You can use ROW_NUMBER() to return one row per ProductID, like this:
JOIN (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY PhotoID) rn
FROM [ProductPhotos]) [ProductPhotos]
ON [ProductPhotos].[ProductID]=[Product].[ID] AND [ProductPhotos].rn = 1
Instead of this:
JOIN [ProductPhotos] ON [ProductPhotos].[ProductID]=[Product].[ID]
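The ROW_NUMBER() variant can be checked the same way. This sqlite3 sketch (SQLite 3.25+ for window functions; PhotoID, Thumbnail and the sample rows are hypothetical) compares the plain join, which repeats a product once per photo, with the rn = 1 filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
CREATE TABLE ProductCategories (ProductID INTEGER, CategoryID INTEGER);
INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Desk'), (3, 'Chair');
INSERT INTO ProductPhotos VALUES
  (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'), (12, 1, 'lamp-c.jpg'),
  (20, 2, 'desk-a.jpg');
INSERT INTO ProductCategories VALUES (1, 4), (2, 4), (3, 4);
""")

# Plain join: Lamp appears once per photo.
plain_rows = conn.execute("""
SELECT p.ID, pp.Thumbnail
FROM Product p
JOIN ProductPhotos pp ON pp.ProductID = p.ID
JOIN ProductCategories pc ON pc.ProductID = p.ID
WHERE pc.CategoryID = 4
""").fetchall()

# Numbering the photos within each product and keeping rn = 1 leaves one per product.
rows = conn.execute("""
SELECT p.ID, p.ProductName, pp.Thumbnail
FROM Product p
JOIN (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY PhotoID) AS rn
    FROM ProductPhotos
) pp ON pp.ProductID = p.ID AND pp.rn = 1
JOIN ProductCategories pc ON pc.ProductID = p.ID
WHERE pc.CategoryID = 4
ORDER BY p.ID
""").fetchall()

print(len(plain_rows), rows)
```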

You can join to a subquery that uses DISTINCT instead of joining the table directly.
You can also give that subquery an alias and select the distinct column from it, but this will cause performance issues when there is a lot of data.
If you have 3 different photos for the same product ID (say, product 2), you can use a subquery with TOP 1 ... ORDER BY PK DESC to get the latest picture.
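One way to get the latest picture through a subquery in the join, sketched here in sqlite3, is to group the photos by product and keep MAX(PhotoID); per product, this picks the same row as TOP 1 ... ORDER BY PhotoID DESC. It assumes a higher primary key means a newer photo; the column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ID INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE ProductPhotos (PhotoID INTEGER PRIMARY KEY, ProductID INTEGER, Thumbnail TEXT);
CREATE TABLE ProductCategories (ProductID INTEGER, CategoryID INTEGER);
INSERT INTO Product VALUES (1, 'Lamp'), (2, 'Desk'), (3, 'Chair');
INSERT INTO ProductPhotos VALUES
  (10, 1, 'lamp-a.jpg'), (11, 1, 'lamp-b.jpg'), (12, 1, 'lamp-c.jpg'),
  (20, 2, 'desk-a.jpg');
INSERT INTO ProductCategories VALUES (1, 4), (2, 4), (3, 4);
""")

# The grouped subquery yields one (ProductID, newest PhotoID) pair per product;
# joining back to ProductPhotos fetches that photo's other columns.
rows = conn.execute("""
SELECT p.ID, p.ProductName, pp.Thumbnail
FROM Product p
JOIN (
    SELECT ProductID, MAX(PhotoID) AS PhotoID
    FROM ProductPhotos
    GROUP BY ProductID
) latest ON latest.ProductID = p.ID
JOIN ProductPhotos pp ON pp.PhotoID = latest.PhotoID
JOIN ProductCategories pc ON pc.ProductID = p.ID
WHERE pc.CategoryID = 4
ORDER BY p.ID
""").fetchall()

print(rows)
```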

Can we use order by in subquery? If not why sometime could use top(n) order by?

I'm at entry level and trying to learn more about SQL.
My question is: can we use ORDER BY in a subquery? I found some articles saying that we cannot.
But on the other hand, I saw examples using top(n) with order by in subquery:
select c.CustomerId,
c.OrderId
from CustomerOrder c
inner join (
select top 2
with TIES CustomerId,
COUNT(distinct OrderId) as Count
from CustomerOrder
group by CustomerId
order by Count desc
) b on c.CustomerId = b.CustomerId
So now I'm bit confused.
Could anyone advise?
Thank you very much.

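You are right that in SQL Server you generally cannot use ORDER BY in an inner query, because the inner query acts as a table, and a table is an unordered set; sorting only becomes meaningful when the final result is returned. The exception is when TOP (or OFFSET, or FOR XML) is also specified.
That is the case in your query: the inner query uses TOP 2, so ORDER BY decides which rows are kept (the two highest counts, plus ties). Even though it returns only a couple of rows, the result is still a table that can be joined with another table.
An alternative that ranks with a window function instead of TOP (note that DENSE_RANK keeps every customer whose count is among the two highest distinct counts, so with ties it can return more customers than TOP 2 WITH TIES):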
SELECT * FROM
(
SELECT c.CustomerId, c.OrderId, DENSE_RANK() OVER(ORDER BY b.count DESC) AS RANK
FROM CustomerOrder c
INNER JOIN
(SELECT CustomerId, COUNT(distinct OrderId) as Count
FROM CustomerOrder GROUP BY CustomerId) b
ON c.CustomerId = b.CustomerId
) a
WHERE RANK IN (1,2);
Hope I have answered your question.
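The difference between DENSE_RANK() IN (1, 2) and TOP 2 WITH TIES shows up when two customers tie for first place. This sqlite3 sketch (SQLite 3.25+; hypothetical data) runs the DENSE_RANK query with its aliases renamed to cnt and rnk, and emulates TOP 2 WITH TIES with RANK() <= 2 over one row per customer, since SQLite has no TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerOrder (CustomerId TEXT, OrderId INTEGER);
-- A and B have three orders each, C has two.
INSERT INTO CustomerOrder VALUES
  ('A', 1), ('A', 2), ('A', 3),
  ('B', 4), ('B', 5), ('B', 6),
  ('C', 7), ('C', 8);
""")

# The DENSE_RANK query: keeps the customers with the two highest distinct counts.
dense = [r[0] for r in conn.execute("""
SELECT DISTINCT CustomerId FROM (
    SELECT c.CustomerId, c.OrderId,
           DENSE_RANK() OVER (ORDER BY b.cnt DESC) AS rnk
    FROM CustomerOrder c
    JOIN (SELECT CustomerId, COUNT(DISTINCT OrderId) AS cnt
          FROM CustomerOrder GROUP BY CustomerId) b
      ON c.CustomerId = b.CustomerId
) a
WHERE rnk IN (1, 2)
ORDER BY CustomerId
""")]

# TOP 2 WITH TIES ... ORDER BY cnt DESC is equivalent to RANK() <= 2
# computed over one row per customer.
with_ties = [r[0] for r in conn.execute("""
SELECT CustomerId FROM (
    SELECT CustomerId, RANK() OVER (ORDER BY cnt DESC) AS rnk
    FROM (SELECT CustomerId, COUNT(DISTINCT OrderId) AS cnt
          FROM CustomerOrder GROUP BY CustomerId) counts
) ranked
WHERE rnk <= 2
ORDER BY CustomerId
""")]

print(dense, with_ties)
```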

Yes, we can use an ORDER BY clause in a subquery. For example, I have a table named product (see the screenshot of the table: http://prntscr.com/f15j3z). Check this query on your side (it uses MySQL's LIMIT syntax) and let me know if you have any doubts.
select p1.* from product as p1 where product_id = (select p2.product_id from product as p2 order by product_id limit 0,1)

Yes, we can use ORDER BY in a subquery, but on its own (without TOP or LIMIT) it is pointless.
It is better to use it in the outer query. Ordering the result of a subquery has no effect, because the result of the inner query becomes the input of the outer query, and the outer query's result does not depend on the order in which the subquery returned its rows.
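A small sqlite3 sketch (hypothetical data) shows the distinction: inside the subquery, ORDER BY combined with LIMIT (SQLite's counterpart of TOP) changes which rows are returned, while ORDER BY on its own leaves the joined result unchanged. SQLite accepts the ORDER BY-only subquery; SQL Server would reject it unless TOP, OFFSET or FOR XML is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerOrder (CustomerId TEXT, OrderId INTEGER);
INSERT INTO CustomerOrder VALUES
  ('A', 1), ('A', 2), ('A', 3),
  ('B', 4), ('B', 5),
  ('C', 6);
""")

def customers(inner_tail):
    """Join CustomerOrder to a grouped subquery ending in inner_tail; return distinct customers."""
    sql = f"""
    SELECT DISTINCT c.CustomerId
    FROM CustomerOrder c
    JOIN (
        SELECT CustomerId, COUNT(DISTINCT OrderId) AS cnt
        FROM CustomerOrder
        GROUP BY CustomerId
        {inner_tail}
    ) b ON c.CustomerId = b.CustomerId
    ORDER BY c.CustomerId
    """
    return [r[0] for r in conn.execute(sql)]

top_two = customers("ORDER BY cnt DESC LIMIT 2")   # ORDER BY picks WHICH rows survive
ordered_only = customers("ORDER BY cnt DESC")      # ORDER BY alone changes nothing
no_order = customers("")

print(top_two, ordered_only, no_order)
```

The final ORDER BY c.CustomerId in the outer query is what actually controls the order of the output.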

SQL Beginner: Getting items from 2 tables (+grouping+ordering)

I have an e-commerce website (using VirtueMart) and I sell products that consist of child products. When a product is a parent, it doesn't have a ParentID, while its children refer to it. I know, not the best logic, but I didn't create it.
My SQL is very basic, and I believe what I'm asking for is quite easy to achieve:
Select products that have children.
Sort results by prices (ASC/DSC).

SELECT * FROM Products INNER JOIN Prices ON Products.ProductID = Prices.ProductID ORDER BY Products.Price [ASC/DESC]
Explanation:
SELECT - Select (Get/Retrieve)
* - ALL
FROM Products - Get them from a DB Table named "Products".
INNER JOIN Prices - Returns rows from both tables only where there is a match between the columns in both tables. In other words, it joins the "Products" table with the "Prices" table.
ON - Like WHERE, this defines the condition a pair of rows must satisfy to be matched.
Products.ProductID = Prices.ProductID - Your match criteria. Get the rows where "ProductID" exists in both DB Tables "Products" and "Prices".
ORDER BY Products.Price [ASC/DESC] - Sorting. Use ASC for ascending, DESC for descending.

This table design is subpar for a number of reasons. First, it appears that the value 0 is being used to indicate lack of a parent (as there's no 0 ID for products). Typically this will be a NULL value instead.
If it were a NULL value, the SQL statement to get everything without a parent would be as simple as this:
SELECT * FROM Products WHERE ParentID IS NULL
However, we can't do that. If we make the assumption that 0 = no parent, we can do this:
SELECT * FROM Products WHERE ParentID = 0
However, that's a dangerous assumption to make. Thus, the correct way to do this (given your schema above), would be to compare the two tables and ensure that the parentID exists as a ProductID:
SELECT a.*
FROM Products AS a
WHERE EXISTS (SELECT * FROM Products AS b WHERE a.ProductID = b.ParentID)
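The risk of the ParentID = 0 assumption is easy to see in a sqlite3 sketch: a top-level product without children also has ParentID = 0, so only the EXISTS version returns just the parents. The rows are hypothetical, and the key column is assumed to be ProductID, as in the pricing query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, ParentID INTEGER);
INSERT INTO Products VALUES
  (1, 'Gift box', 0),   -- parent with children
  (2, 'Mug', 1),
  (3, 'Tea', 1),
  (4, 'Poster', 0);     -- top-level product with no children
""")

# "ParentID = 0" means "has no parent", not "has children".
parent_zero = conn.execute(
    "SELECT ProductID FROM Products WHERE ParentID = 0 ORDER BY ProductID"
).fetchall()

# EXISTS keeps a product only if some other row names it as its parent.
has_children = conn.execute("""
SELECT a.ProductID
FROM Products AS a
WHERE EXISTS (SELECT * FROM Products AS b WHERE a.ProductID = b.ParentID)
ORDER BY a.ProductID
""").fetchall()

print(parent_zero, has_children)
```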
Next, to get the pricing, we have to join those two tables together on a common ID. As the Prices table seems to reference a ProductID, we can use that like so:
SELECT p.ProductID, p.ProductName, pr.Price
FROM Products AS p INNER JOIN Prices AS pr ON p.ProductID = pr.ProductID
WHERE EXISTS (SELECT * FROM Products AS b WHERE p.ProductID = b.ParentID)
ORDER BY pr.Price
That might be sufficient per the data you've shown, but usually that type of table structure indicates that it's possible to have more than one price associated with a product (we're unable to tell whether this is true based on the quick snapshot).
That should get you close... if you need something more, we'll need more detail.

Use the script below if you are using SQL Server (SSMS):
SELECT pd.ProductId,ProductName,Price
FROM product pd
LEFT JOIN price pr ON pd.ProductId=pr.ProductID
WHERE EXISTS (SELECT 1 FROM product pd1 WHERE pd.productID=pd1.ParentID)
ORDER BY pr.Price ASC
Note: neither of your parent products has a price in the price table. If you want the sum of the prices of their child products, use the script below.
SELECT pd.ProductId,pd.ProductName,SUM(ISNULL(pr.Price,0)) SUM_ChildPrice
FROM product pd
LEFT JOIN product pd1 ON pd.productID=pd1.ParentID
LEFT JOIN price pr ON pd1.ProductId=pr.ProductID
GROUP BY pd.ProductId,pd.ProductName
ORDER BY SUM_ChildPrice ASC
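Sketched in sqlite3, the child-price sum looks like this. SQLite spells ISNULL as IFNULL; the query orders by the aggregate alias because pr.Price is not part of the GROUP BY; and it adds an EXISTS filter (not in the script above) so that only parent products are listed. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (ProductId INTEGER PRIMARY KEY, ProductName TEXT, ParentID INTEGER);
CREATE TABLE price (ProductID INTEGER, Price REAL);
INSERT INTO product VALUES
  (1, 'Gift box', 0), (2, 'Mug', 1), (3, 'Tea', 1),
  (4, 'Bundle', 0), (5, 'Pen', 4);
-- Only child products have prices, as noted in the answer.
INSERT INTO price VALUES (2, 8.0), (3, 4.5), (5, 1.5);
""")

rows = conn.execute("""
SELECT pd.ProductId, pd.ProductName, SUM(IFNULL(pr.Price, 0)) AS SUM_ChildPrice
FROM product pd
LEFT JOIN product pd1 ON pd.ProductId = pd1.ParentID   -- each parent's children
LEFT JOIN price pr ON pd1.ProductId = pr.ProductID     -- each child's price
WHERE EXISTS (SELECT 1 FROM product c WHERE c.ParentID = pd.ProductId)  -- parents only
GROUP BY pd.ProductId, pd.ProductName
ORDER BY SUM_ChildPrice ASC
""").fetchall()

print(rows)
```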

You will have to use a self-join:
For example:
SELECT * FROM products parent
JOIN products children ON parent.id = children.parent_id
JOIN prices ON prices.product_id = children.id
ORDER BY prices.price
Because we are using JOIN, it will filter out all entries that don't have any children.
I haven't tested it, but I hope it works.
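A sqlite3 sketch of the self-join (using the snake_case names from the query above and hypothetical rows) shows both effects: the product without children disappears, and the remaining parent is repeated once per priced child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE prices (product_id INTEGER, price REAL);
INSERT INTO products VALUES
  (1, 'Gift box', 0), (2, 'Mug', 1), (3, 'Tea', 1),
  (4, 'Poster', 0);   -- no children, so the inner joins drop it
INSERT INTO prices VALUES (2, 8.0), (3, 4.5);
""")

rows = conn.execute("""
SELECT parent.id, parent.name, children.name, prices.price
FROM products parent
JOIN products children ON parent.id = children.parent_id
JOIN prices ON prices.product_id = children.id
ORDER BY prices.price
""").fetchall()

for row in rows:
    print(row)
```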

Distinct on multi-columns in sql

I have this query in sql
select cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
I want to get rows that are distinct by pageId, so that in the end no pageId appears in more than one row (no duplicates).
Any ideas?
Thanks
Baaroz

Going by what you're expecting in the output and your comment that says "...if there rows in output that contain same pageid only one will be shown...," it sounds like you're trying to get the top record for each page ID. This can be achieved with ROW_NUMBER() and PARTITION BY:
SELECT *
FROM (
SELECT
ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID) rowNumber,
c.id,
c.pageId,
c.quantity,
c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
) a
WHERE a.rowNumber = 1
You can also use ROW_NUMBER() OVER(PARTITION BY ...) together with TOP 1 WITH TIES, but it runs a little slower (despite being WAY cleaner):
SELECT TOP 1 WITH TIES c.id, c.pageId, c.quantity, c.price
FROM orders o
INNER JOIN cartlines c ON c.orderId = o.id
WHERE userId = 5
ORDER BY ROW_NUMBER() OVER(PARTITION BY c.pageId ORDER BY c.pageID)
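The ROW_NUMBER() approach can be checked with a sqlite3 sketch (SQLite 3.25+; the rows are hypothetical). Ordering each partition by the same pageId it is partitioned on makes the kept row arbitrary, so this sketch orders by c.id instead to keep the first cart line per page deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER, pageId INTEGER,
                        quantity INTEGER, price REAL);
INSERT INTO orders VALUES (100, 5), (101, 5), (102, 7);
INSERT INTO cartlines VALUES
  (1, 100, 42, 1, 9.5),
  (2, 101, 42, 3, 9.5),   -- same pageId as line 1, different order
  (3, 101, 43, 2, 4.0),
  (4, 102, 42, 1, 9.5);   -- belongs to another user
""")

# Number the lines within each pageId, then keep only the first one.
rows = conn.execute("""
SELECT id, pageId, quantity, price
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY c.pageId ORDER BY c.id) AS rowNumber,
           c.id, c.pageId, c.quantity, c.price
    FROM orders o
    JOIN cartlines c ON c.orderId = o.id
    WHERE o.userId = 5
) a
WHERE a.rowNumber = 1
ORDER BY pageId
""").fetchall()

print(rows)
```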

If you wish to remove rows with all columns duplicated this is solved by simply adding a distinct in your query.
select distinct cartlines.id,cartlines.pageId,cartlines.quantity,cartlines.price
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
If however, this makes no difference, it means the other columns have different values, so the combinations of column values creates distinct (unique) rows.
As Michael Berkowski stated in comments:
DISTINCT - does operate over all columns in the SELECT list, so we
need to understand your special case better.
If simply adding DISTINCT does not solve it, you also need to remove the columns whose values differ from row to row, or use aggregate functions to get one aggregated value per group.
Example - total quantity per distinct pageId:
select cartlines.pageId, sum(cartlines.quantity) as totalQuantity
from orders
INNER JOIN
cartlines on(cartlines.orderId=orders.id)where userId=5
group by cartlines.pageId
If this is still not what you wish, you need to give us data and specify better what it is you want.
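The difference between DISTINCT and grouping can be seen in a sqlite3 sketch with hypothetical rows: the three cart lines for user 5 differ in id, so DISTINCT keeps all of them, while GROUP BY pageId collapses them to one total per page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, userId INTEGER);
CREATE TABLE cartlines (id INTEGER PRIMARY KEY, orderId INTEGER, pageId INTEGER,
                        quantity INTEGER, price REAL);
INSERT INTO orders VALUES (100, 5), (101, 5), (102, 7);
INSERT INTO cartlines VALUES
  (1, 100, 42, 1, 9.5),
  (2, 101, 42, 3, 9.5),
  (3, 101, 43, 2, 4.0),
  (4, 102, 42, 1, 9.5);
""")

# DISTINCT compares every selected column, so lines with different ids all survive.
distinct_rows = conn.execute("""
SELECT DISTINCT c.id, c.pageId, c.quantity, c.price
FROM orders o JOIN cartlines c ON c.orderId = o.id
WHERE o.userId = 5
""").fetchall()

# Grouping by pageId alone yields one row per page with an aggregated quantity.
totals = conn.execute("""
SELECT c.pageId, SUM(c.quantity) AS totalQuantity
FROM orders o JOIN cartlines c ON c.orderId = o.id
WHERE o.userId = 5
GROUP BY c.pageId
ORDER BY c.pageId
""").fetchall()

print(len(distinct_rows), totals)
```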
