Alternative SQL ANSI for TOP WITH TIES - sql

I have two tables:
product
id_product
description
price
id_category
category
id_category
description
I would like to know the categories that have more products. For example, the category food has 10 products and the eletronics too. They are the same.
Now I'm using SQL Server and I'm using TOP WITH TIES.
SELECT TOP 1 WITH TIES p.id_category, COUNT(*) as amount FROM product p
JOIN category c ON p.id_category = c.id_category
GROUP BY p.id_category
ORDER BY amount
Is there another way to solve this with good performance?
I tried also with DENSE_RANK where the position is = 1.
It also works.
SELECT * FROM (
SELECT p.id_category, COUNT(*) as amount, DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) position FROM product p
JOIN category c ON p.id_category = c.id_category
GROUP BY p.id_category
) rnk
WHERE rnk.position = 1
But I want this solution in SQL ANSI.
I tried using MAX(COUNT(*)) but it doesn't work.
Is there a general solution? Is This solution better than using TOP WITH TIES?

Here is a third option for SQL Server:
WITH cte AS (
SELECT p.id_category, COUNT(*) AS cnt
FROM product p
INNER JOIN category c
ON p.id_category = c.id_category
GROUP BY p.id_category
)
SELECT *
FROM cte
WHERE cnt = (SELECT MAX(cnt) FROM cte);
If you also cannot rely on CTEs being available, you can easily enough just inline the CTE into the query. From a performance point of view, DENSE_RANK would probably outperform my answer.
With the CTE removed this becomes:
SELECT *
FROM
(
SELECT p.id_category, COUNT(*) AS cnt
FROM product p
INNER JOIN category c
ON p.id_category = c.id_category
GROUP BY p.id_category
)
WHERE cnt = (SELECT MAX(cnt) FROM (
SELECT p.id_category, COUNT(*) AS cnt
FROM product p
INNER JOIN category c
ON p.id_category = c.id_category
GROUP BY p.id_category
));
This query would even run on MySQL. As you can see, the query is ugly, which is one reason why things like CTE and analytic functions were introduced into the ANSI standard.

Related

SQL strategy to fetch maximum

Suppose I have these three tables:
I want to get, for all products, it's product_id and the client that bougth it most times (the biggest client of the product).
I solved it like this:
SELECT
product_id AS product,
(SELECT TOP 1 client_id FROM Bill_Item, Bill
WHERE Bill_Item.product_id = p.product_id
and Bill_Item.bill_id = Bill.bill_id
GROUP BY
client_id
ORDER BY
COUNT(*) DESC
) AS client
FROM Product p
Do you know a better way?
the inner query will give you the ranking. The outer query will give you the client that puchase the most for a product
SELECT *
(
SELECT i.product_id, b.client_id,
r = row_number() over (partition by i.product_id
order by count(*) desc)
FROM Bill b
INNER JOIN Bill_Item i ON b.bill_id = i.bill_id
GROUP BY i.product_id, b.client_id
) d
WHERE r = 1
I was going to submit pretty much the same thing as #Squirrell only with a Common Table Expression [CTE] rather than a derived table. So I wont duplicate that but there are some learning points concerning your query. First is IMPLICIT JOINS such as FROM Bill_Item, Bill are really easy to have uintended consequences (one of many questions: Queries that implicit SQL joins can't do?) Next for the Calculated column you can actually do this in a OUTER APPLY or CROSS APPLY which is a very useful technique.
So you could re-write your method as follows:
SELECT *
FROM
Product p
OUTER APPLY (SELECT TOP 1 b.client_id
FROM
Bill_Item bi
INNER JOIN Bill b
ON bi.bill_id = b.bill_id
WHERE
bi.product_id = p.product_id
GROUP BY
b.client_id
ORDER BY
COUNT(*) DESC) c
And to show you how squirell's answer can still include products that have never been sold all you need to do is join Products and LEFT JOIN to other tables:
;WITH cte AS (
SELECT
p.product_id
,b.client_id
,ROW_NUMBER() OVER (PARTITION BY p.product_id ORDER BY COUNT(*) DESC) as RowNumber
FROM
Product p
LEFT JOIN Bill_Item bi
ON p.product_id = bi.product_id
LEFT JOIN Bill b
ON bi.bill_id = b.bill_id
GROUP BY
p.product_id
,b.client_id
)
SELECT *
FROM
cte
WHERE
RowNumber = 1
Techniques used in some of these that are useful.
CTE
APPLY (Outer & Cross)
Window Functions
Squirrel's answer doesn't return products that have never been sold. If you want to include those, then your approach is ok, although I would write the query as:
SELECT product_id as product,
(SELECT TOP 1 b.client_id
FROM Bill_Item bi JOIN
Bill b
ON bi.bill_id = b.bill_id
WHERE Bill_Item.product_id = p.product_id
GROUP BY client_id
ORDER BY COUNT(*) DESC
) as client
FROM Product p;
You can also express this using APPLY, but a correlated subquery is also fine.
Note the correct use of the explicit JOIN syntax.

efficient way to JOIN in SQL

I have to JOIN two tables: articles and sales, but not all the data of articles, just need the last load.
Is there a difference between this two ways? there is a most faster/efficient way? and more important, why?
1)
SELECT *
FROM sales S
INNER JOIN articles A
ON S.article_id = A.article_id AND A.load_date = (SELECT MAX(load_date) FROM articles)
2)
SELECT *
FROM sales s
INNER JOIN (
SELECT *
FROM articles
WHERE load_date = (SELECT MAX(load_date) FROM articles)
) A
ON s.article_id = a.article_id
Most DBMSes also support Windowed Aggregate Functions and a RANK might be more efficient (and easier to write if you're used to it):
SELECT *
FROM
(
SELECT *,
RANK() -- max date gets rank #1
OVER (PARTITION BY a.article_id
ORDER BY a.load_date DESC) rn
FROM sales s
INNER JOIN articles a
ON s.article_id = a.article_id
) dt
WHER rn = 1

SQL Query for top 10 items from two relations

I'm struggling to right a SQL command to get the top 10 names from the following (using standard SQL, cant use TOP) for the following 2 relations:
Orders (customer_email, item_id, date)
Items(id, name, store, price)
Any advice on how to do this? I think I would need to group them, but then what do I do to get the top 10 groupings based on count?
select *
from (select x.*, row_number() over(order by num_orders desc) as rn
from (select i.name, count(*) as num_orders
from orders o
join items i
on o.item_id = i.id
group by i.name) x) x
where rn <= 10
SELECT
COUNT(*) count_per_item
, i.id
, i.name
FROM
Orders o
JOIN
Items i
ON (o.item_id = i.id)
GROUP BY
i.id
, i.name
ORDER BY
count_per_item DESC
LIMIT 10;

Highest Count with a group

I'm having an absolute brain fade
SELECT p.ProductCategory, f.ProductSubCategory, COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory, f.ProductSubCategory
ORDER BY 1,3 DESC
This shows me the count for each ProductSubCategory, I would like to see only the highest ProductSubCategory per ProductCategory.
I wish to see (I don't care about the Count value)
There are a couple of different ways to do this. One involves joining the results back to themselves and using the max aggregate. But since you are using SQL Server, you can use ROW_NUMBER to achieve the same result:
with cte as (
select p.productcategory, p.ProductSubCategory, COUNT(*) cnt,
ROW_NUMBER() over (partition by p.productcategory order by count(*) desc) rn
from products p
join sales s on p.ProductSubCategory = s.ProductSubCategory
group by p.productcategory, p.ProductSubCategory
)
select *
from cte
where rn = 1
You already got the answer, Please see the following code to. It may help you.
SELECT p.ProductCategory,
f.ProductSubCategory,
COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
JOIN (
SELECT p.ProductCategory,
f.ProductSubCategory,
ROW_NUMBER() OVER ( PARTITION BY p.ProductCategory,
f.ProductSubCategory
ORDER BY COUNT(*) DESC) [Row]
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory) Lu
ON P.ProductCategory = Lu.ProductCategory
AND f.ProductSubCategory = Lu.ProductSubCategory
WHERE Lu.Row = 1
GROUP By p.ProductCategory,
f.ProductSubCategory

oracle - maximum per group

University Table - UniversityName, UniversityId
Lease Table - LeaseId, BookId, UniversityId, LeaseDate
Book Table - BookId, UniversityId, Category, PageCount.
For each university, I have to find category that had the most number of books leased.
So, something like
UniversityName Category #OfTimesLeased
I have been playing around with it with some success using Dense_Rank etc - but if there is a tie, only one of them shows up, while I want both of them to show up.
Current Query:
select b.UniversityId, MAX(tempTable.type) KEEP (DENSE_RANK FIRST ORDER BY tempTable.counter DESC)
from book b
join
(select count(l.leaseid) AS counter, b.category, b.universityid
from lease l
join book b
on b.bookid =l.bookid AND b.universityid=r.universityid
group by b.category, b.universityid) tempTable
on counterTable.universityid= b.universityid
group by b.universityid
^Unable to solve the tie issue and get the number of leases for the most leased book type.
Try this
WITH CTE AS
(
SELECT UniversityName, Category, Count(*) NumOfTimesLeased
FROM University u
INNER JOIN Book b on u.UniversityId = b.UniversityId
INNER JOIN Lease l on b.bookid = l.bookid and b.UniversityId = l.UniversityId
GROUP BY UniversityName, Category
),
CTE2 AS (
SELECT UniversityName, Category, NumOfTimesLeased,
RANK() OVER (PARTITION BY UniversityName
ORDER BY NumOfTimesLeased DESC) Rnk
FROM CTE)
SELECT * FROM CTE2 WHERE Rnk = 1
You are on the right track with the analytic functions:
select Univerity, Category, NumLeased
from (select t.*,
row_number() over (partition by university order by Numleased desc) as seqnum
from (select l.university, b.category, count(*) as NumLeased
from lease l join
book b
on l.bookid = b.bookid
group by l.university, b.category
) t
) t
where seqnum = 1
I use the row_number() because you only want the one top value. Rank and dense_rank are more useful when you are looking for values other than "1".
If you want the top values to show up when there is a tie, then use dense_rank instead of row_number. The values will be on different rows.