oracle - maximum per group - sql

University Table - UniversityName, UniversityId
Lease Table - LeaseId, BookId, UniversityId, LeaseDate
Book Table - BookId, UniversityId, Category, PageCount.
For each university, I have to find the category that had the most books leased.
So, something like
UniversityName Category #OfTimesLeased
I have been playing around with it with some success using Dense_Rank etc - but if there is a tie, only one of them shows up, while I want both of them to show up.
Current Query:
select b.UniversityId, MAX(tempTable.category) KEEP (DENSE_RANK FIRST ORDER BY tempTable.counter DESC)
from book b
join
(select count(l.leaseid) AS counter, b.category, b.universityid
from lease l
join book b
on b.bookid =l.bookid AND b.universityid=l.universityid
group by b.category, b.universityid) tempTable
on tempTable.universityid= b.universityid
group by b.universityid
With this query I am unable to solve the tie issue or to get the number of leases for the most leased category.

Try this
WITH CTE AS
(
SELECT UniversityName, Category, Count(*) NumOfTimesLeased
FROM University u
INNER JOIN Book b on u.UniversityId = b.UniversityId
INNER JOIN Lease l on b.bookid = l.bookid and b.UniversityId = l.UniversityId
GROUP BY UniversityName, Category
),
CTE2 AS (
SELECT UniversityName, Category, NumOfTimesLeased,
RANK() OVER (PARTITION BY UniversityName
ORDER BY NumOfTimesLeased DESC) Rnk
FROM CTE)
SELECT * FROM CTE2 WHERE Rnk = 1

You are on the right track with the analytic functions:
select universityid, Category, NumLeased
from (select t.*,
row_number() over (partition by universityid order by Numleased desc) as seqnum
from (select l.universityid, b.category, count(*) as NumLeased
from lease l join
book b
on l.bookid = b.bookid
group by l.universityid, b.category
) t
) t
where seqnum = 1
I use the row_number() because you only want the one top value. Rank and dense_rank are more useful when you are looking for values other than "1".
If you want all the top values to show up when there is a tie, then use rank or dense_rank instead of row_number. The tied values will be on different rows.
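To make the tie behaviour concrete, here is a minimal sketch. It assumes an in-memory SQLite database (3.25 or newer, for window functions) as a stand-in for Oracle, and a hypothetical pre-aggregated table `leased` with made-up rows; the helper name `top_categories` is also just for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here (an assumption for illustration);
# window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leased (university TEXT, category TEXT, num_leased INTEGER);
    INSERT INTO leased VALUES
        ('Alpha', 'Math',    5),
        ('Alpha', 'Physics', 5),  -- ties with Math for first place
        ('Alpha', 'History', 2),
        ('Beta',  'Math',    1),
        ('Beta',  'Art',     4);
""")

def top_categories(ranking_function):
    """Return the rows ranked first per university by the given function."""
    sql = f"""
        SELECT university, category, num_leased
        FROM (SELECT t.*,
                     {ranking_function}() OVER (PARTITION BY university
                                                ORDER BY num_leased DESC) AS seqnum
              FROM leased t)
        WHERE seqnum = 1
        ORDER BY university, category
    """
    return conn.execute(sql).fetchall()

with_row_number = top_categories("ROW_NUMBER")  # one row per university
with_dense_rank = top_categories("DENSE_RANK")  # tied rows are all kept
print(with_row_number)
print(with_dense_rank)
```

With row_number, which of the two tied Alpha categories survives is arbitrary unless you add a tiebreaker to the ORDER BY; with dense_rank (or rank) both survive.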

Related

How to optimize a select with a max subquery on the same table?

We have many old selects like this:
SELECT
tm."ID",tm."R_PERSONES",tm."R_DATASOURCE",tm."MATCHCODE",
d.NAME AS DATASOURCE,
p.PDID
FROM TABLE_MAPPINGS tm,
PERSONES p,
DATASOURCES d,
(select ID
from TABLE_MAPPINGS
where (R_PERSONES, MATCHCODE)
in (select
R_PERSONES, MATCHCODE
from TABLE_MAPPINGS
where
id in (select max(id)
from TABLE_MAPPINGS
group by MATCHCODE)
)
) tm2
WHERE tm.R_PERSONES = p.ID
AND tm.R_DATASOURCE=d.ID
and tm2.id = tm.id;
These are large tables, and queries take a long time.
How to rebuild them?
Thank you
You can query the table only once using something like (untested as you have not provided a minimal example of your create table statements or sample data):
SELECT *
FROM (
SELECT m.*,
COUNT(CASE WHEN rnk = 1 THEN 1 END)
OVER (PARTITION BY r_persones, matchcode) AS has_max_id
FROM (
SELECT tm.ID,
tm.R_PERSONES,
tm.R_DATASOURCE,
tm.MATCHCODE,
d.NAME AS DATASOURCE,
p.PDID,
RANK() OVER (PARTITION BY tm.matchcode ORDER BY tm.id DESC) As rnk
FROM TABLE_MAPPINGS tm
INNER JOIN PERSONES p ON tm.R_PERSONES = p.ID
INNER JOIN DATASOURCES d ON tm.R_DATASOURCE = d.ID
) m
)
WHERE has_max_id > 0;
This first finds the maximum ID per matchcode using the RANK analytic function, and then finds all the rows sharing a relevant r_persones, matchcode pair using conditional aggregation in a COUNT analytic function.
Note: you want to use the RANK or DENSE_RANK analytic functions to match the maximums, as they can rank multiple rows per partition first, whereas ROW_NUMBER will only ever put a single row per partition first.
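As a rough check of the idea above, the sketch below runs the single-scan form next to the original nested IN form and compares the ids they return. It assumes in-memory SQLite (3.25+) instead of Oracle, a cut-down `table_mappings` with only three columns, and invented sample rows, so treat it as an illustration rather than a proof of equivalence on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_mappings (id INTEGER, r_persones INTEGER, matchcode TEXT);
    INSERT INTO table_mappings VALUES
        (1, 10, 'A'),
        (2, 20, 'A'),
        (3, 20, 'A'),   -- highest id for matchcode A
        (4, 10, 'B');   -- highest id for matchcode B
""")

# Single pass: rank ids per matchcode, then flag every (r_persones, matchcode)
# pair that contains at least one top-ranked row.
single_scan_ids = [row[0] for row in conn.execute("""
    SELECT id
    FROM (SELECT m.*,
                 COUNT(CASE WHEN rnk = 1 THEN 1 END)
                     OVER (PARTITION BY r_persones, matchcode) AS has_max_id
          FROM (SELECT tm.*,
                       RANK() OVER (PARTITION BY matchcode ORDER BY id DESC) AS rnk
                FROM table_mappings tm) m)
    WHERE has_max_id > 0
    ORDER BY id
""")]

# Original shape: three references to the same table.
nested_in_ids = [row[0] for row in conn.execute("""
    SELECT id FROM table_mappings
    WHERE (r_persones, matchcode) IN (
        SELECT r_persones, matchcode FROM table_mappings
        WHERE id IN (SELECT MAX(id) FROM table_mappings GROUP BY matchcode))
    ORDER BY id
""")]

print(single_scan_ids, nested_in_ids)
```

Row 2 is returned by both forms even though it is not the newest row for matchcode A, because it shares its (r_persones, matchcode) pair with row 3. A plain `rn = 1` filter would drop it.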
You're querying table_mappings 3 times; how about doing it only once?
WITH
tab_map
AS
(SELECT a.id,
a.r_persones,
a.matchcode,
a.r_datasource,
ROW_NUMBER ()
OVER (PARTITION BY a.matchcode ORDER BY a.id DESC) rn
FROM table_mappings a)
SELECT tm.id,
tm.r_persones,
tm.matchcode,
d.name AS datasource,
p.pdid
FROM tab_map tm
JOIN persones p ON p.id = tm.r_persones
JOIN datasources d ON d.id = tm.r_datasource
WHERE tm.rn = 1

query to find the athlete with the most medals per year in sql

select A."ID","ATHLETE_NAME", "YEAR"
from OLYM."OLYM_ATHLETES" A
JOIN OLYM."OLYM_MEDALS" B ON A."ID" = B."ATHLETE_GAME_ID"
JOIN OLYM."OLYM_GAMES" C ON B."EVENT_ID" = C.ID;
This gave me a table with the athlete id, name, and the year in which they won a medal. Is there any way to extract the most decorated athlete per year from this table, or am I missing something?
If I followed you correctly, you need the athlete with the most medals in each year. If that is what you need, you can use the analytic function row_number as follows:
SELECT ID, ATHLETE_NAME, YEAR, CNT FROM
(select A."ID","ATHLETE_NAME", "YEAR", COUNT(1) AS cnt,
Row_number() over (partition by "YEAR" order by count(1) desc) as rn
from OLYM."OLYM_ATHLETES" A
JOIN OLYM."OLYM_MEDALS" B ON A."ID" = B."ATHLETE_GAME_ID"
JOIN OLYM."OLYM_GAMES" C ON B."EVENT_ID" = C.ID
group by A."ID","ATHLETE_NAME", "YEAR")
WHERE RN = 1
ORDER BY YEAR
The query below returns the medal count per athlete and year, sorted from highest to lowest. You can add a HAVING clause to filter further according to your requirements:
select A."ID","ATHLETE_NAME", "YEAR", count(1)
from OLYM."OLYM_ATHLETES" A
JOIN OLYM."OLYM_MEDALS" B ON A."ID" = B."ATHLETE_GAME_ID"
JOIN OLYM."OLYM_GAMES" C ON B."EVENT_ID" = C.ID
group by A."ID","ATHLETE_NAME", "YEAR" order by count(1) desc;

How to find the three greatest values in each category in PostgreSQL?

I am a SQL beginner. I am having trouble finding the top 3 values in each category. The question was:
"For order_ids in January 2006, what were the top (by revenue) 3 product_ids for each category_id? "
Table A:
(Column name)
customer_id
order_id
order_date
revenue
product_id
Table B:
product_id
category_id
I tried to combine table B and A using an Inner Join and filtered by the order_date. But then I am stuck on how to find the top 3 max values in each category_id.
Thanks.
This is so far what I can think of
SELECT B.product_id, category_id FROM A
JOIN B ON B.product_id = A.product_id
WHERE order_date BETWEEN '2006-01-01' AND '2006-01-31'
ORDER BY revenue DESC
LIMIT 3;
This kind of query is typically solved using window functions
select *
from (
SELECT b.product_id,
b.category_id,
a.revenue,
dense_rank() over (partition by b.category_id order by a.revenue desc) as rnk
from A
join b ON B.product_id = A.product_id
where a.order_date between date '2006-01-01' AND date '2006-01-31'
) as t
where rnk <= 3
order by category_id, revenue desc;
dense_rank() also deals with ties (products with the same revenue in the same category), so you might actually get more than 3 rows per category.
If the same product can show up more than once in table A (several orders in the same month), you need to combine this with a GROUP BY to get the sum of all revenues:
select *
from (
SELECT b.product_id,
b.category_id,
sum(a.revenue) as total_revenue,
dense_rank() over (partition by b.category_id order by sum(a.revenue) desc) as rnk
from a
join b on B.product_id = A.product_id
where a.order_date between date '2006-01-01' AND date '2006-01-31'
group by b.product_id, b.category_id
) as t
where rnk <= 3
order by product_id, category_id, total_revenue desc;
When combining window functions and GROUP BY, the window function will be applied after the GROUP BY.
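A small sketch of that evaluation order, assuming in-memory SQLite (3.25+) in place of PostgreSQL and invented sample orders. Dates are stored as ISO text, so plain string comparison works. The window here partitions by category_id only, so that products compete with each other inside their category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (order_id INTEGER, order_date TEXT, revenue INTEGER, product_id INTEGER);
    CREATE TABLE b (product_id INTEGER, category_id INTEGER);
    INSERT INTO b VALUES (1, 100), (2, 100), (3, 100), (4, 100), (5, 200);
    INSERT INTO a VALUES
        (1, '2006-01-05', 50, 1),
        (2, '2006-01-09', 70, 1),   -- product 1 totals 120
        (3, '2006-01-10', 90, 2),
        (4, '2006-01-11', 30, 3),
        (5, '2006-01-12', 10, 4),   -- fourth in category 100, filtered out
        (6, '2006-01-20', 40, 5),
        (7, '2006-02-01', 999, 4);  -- outside January, ignored
""")

# GROUP BY collapses the orders to one row per product first;
# DENSE_RANK then ranks those summed rows within each category.
top_products = conn.execute("""
    SELECT category_id, product_id, total_revenue, rnk
    FROM (SELECT b.category_id,
                 b.product_id,
                 SUM(a.revenue) AS total_revenue,
                 DENSE_RANK() OVER (PARTITION BY b.category_id
                                    ORDER BY SUM(a.revenue) DESC) AS rnk
          FROM a
          JOIN b ON b.product_id = a.product_id
          WHERE a.order_date BETWEEN '2006-01-01' AND '2006-01-31'
          GROUP BY b.category_id, b.product_id)
    WHERE rnk <= 3
    ORDER BY category_id, rnk
""").fetchall()

for row in top_products:
    print(row)
```

Product 1 ranks first in category 100 only because its two orders are summed before ranking; ranking the raw order rows would have put product 2 on top.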
You can use window functions to rank the grouped revenue and then pull the top X in the outer query. I have not worked in PostgreSQL in a bit, so I may be missing a shortcut function below.
WITH ByRevenue AS
(
--This CTE aggregates the maximum revenue per category and product; the CTEs below can query it like a table
SELECT
category_id,
B.product_id,
MAX(revenue) as max_revenue
FROM
A
JOIN B ON B.product_id = A.product_id
WHERE
order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY
category_id,B.product_id
)
,Normalized AS
(
--Number the products within each category by revenue using ROW_NUMBER so the outer query can apply the limit.
SELECT
category_id,
product_id,
max_revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY max_revenue DESC) as rn
FROM
ByRevenue
)
--Final query: keep the top 3 products per category
SELECT *
FROM Normalized
WHERE RN<=3;
For top-n queries, the first thing to try is usually the lateral join:
WITH categories as (
SELECT DISTINCT category_id
FROM B
)
SELECT categories.category_id, sub.product_id
FROM categories
JOIN LATERAL (
SELECT a.product_id
FROM B
JOIN A ON (a.product_id = b.product_id)
WHERE b.category_id = categories.category_id
AND order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY a.product_id
ORDER BY sum(revenue) desc
LIMIT 3
) sub on true;
You could also try FETCH FIRST n ROWS ONLY. Note that this returns the top 3 rows overall rather than the top 3 per category.
Note: I assume product_id is the key shared by both tables, so I used it to join them.
SELECT B.category_id, A.revenue FROM A
INNER JOIN B ON A.product_id = B.product_id
WHERE A.order_date BETWEEN DATE '2006-01-01' AND DATE '2006-01-31'
ORDER BY A.revenue DESC
FETCH FIRST 3 ROWS ONLY

Alternative SQL ANSI for TOP WITH TIES

I have two tables:
product
id_product
description
price
id_category
category
id_category
description
I would like to know the categories that have the most products. For example, the category food has 10 products and so does electronics, so they are tied.
Now I'm using SQL Server and I'm using TOP WITH TIES.
SELECT TOP 1 WITH TIES p.id_category, COUNT(*) as amount FROM product p
JOIN category c ON p.id_category = c.id_category
GROUP BY p.id_category
ORDER BY amount DESC
Is there another way to solve this with good performance?
I tried also with DENSE_RANK where the position is = 1.
It also works.
SELECT * FROM (
SELECT p.id_category, COUNT(*) as amount, DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) position FROM product p
JOIN category c ON p.id_category = c.id_category
GROUP BY p.id_category
) rnk
WHERE rnk.position = 1
But I want this solution in ANSI SQL.
I tried using MAX(COUNT(*)) but it doesn't work.
Is there a general solution? Is this solution better than using TOP WITH TIES?
Here is a third option for SQL Server:
WITH cte AS (
SELECT p.id_category, COUNT(*) AS cnt
FROM product p
INNER JOIN category c
ON p.id_category = c.id_category
GROUP BY p.id_category
)
SELECT *
FROM cte
WHERE cnt = (SELECT MAX(cnt) FROM cte);
If you also cannot rely on CTEs being available, you can easily enough just inline the CTE into the query. From a performance point of view, DENSE_RANK would probably outperform my answer.
With the CTE removed this becomes:
SELECT *
FROM
(
SELECT p.id_category, COUNT(*) AS cnt
FROM product p
INNER JOIN category c
ON p.id_category = c.id_category
GROUP BY p.id_category
) AS t
WHERE cnt = (SELECT MAX(cnt) FROM (
SELECT p.id_category, COUNT(*) AS cnt
FROM product p
INNER JOIN category c
ON p.id_category = c.id_category
GROUP BY p.id_category
) AS t2);
This query would even run on MySQL. As you can see, the query is ugly, which is one reason why things like CTE and analytic functions were introduced into the ANSI standard.
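For anyone who wants to try the MAX-of-counts form without a SQL Server instance, here is a minimal sketch. It assumes in-memory SQLite as the engine and made-up rows; the Python variable `counts` only avoids typing the inlined subquery twice and is not part of the SQL technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id_category INTEGER, description TEXT);
    CREATE TABLE product (id_product INTEGER, description TEXT,
                          price REAL, id_category INTEGER);
    INSERT INTO category VALUES (1, 'food'), (2, 'electronics'), (3, 'toys');
    INSERT INTO product VALUES
        (1, 'bread', 2.0, 1), (2, 'milk', 1.5, 1),
        (3, 'radio', 30.0, 2), (4, 'phone', 300.0, 2),
        (5, 'kite', 8.0, 3);
""")

# The per-category count, reused as a derived table in two places.
counts = """
    SELECT p.id_category, COUNT(*) AS cnt
    FROM product p
    INNER JOIN category c ON p.id_category = c.id_category
    GROUP BY p.id_category
"""

# Keep every category whose count equals the overall maximum count.
most_populated = conn.execute(f"""
    SELECT t.id_category, t.cnt
    FROM ({counts}) AS t
    WHERE t.cnt = (SELECT MAX(t2.cnt) FROM ({counts}) AS t2)
    ORDER BY t.id_category
""").fetchall()

print(most_populated)
```

Both tied categories come back, which is the same result TOP 1 WITH TIES gives when ordered by the count descending.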

Highest Count with a group

I'm having an absolute brain fade
SELECT p.ProductCategory, f.ProductSubCategory, COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory, f.ProductSubCategory
ORDER BY 1,3 DESC
This shows me the count for each ProductSubCategory, I would like to see only the highest ProductSubCategory per ProductCategory.
I wish to see (I don't care about the Count value)
There are a couple of different ways to do this. One involves joining the results back to themselves and using the max aggregate. But since you are using SQL Server, you can use ROW_NUMBER to achieve the same result:
with cte as (
select p.productcategory, p.ProductSubCategory, COUNT(*) cnt,
ROW_NUMBER() over (partition by p.productcategory order by count(*) desc) rn
from products p
join sales s on p.ProductSubCategory = s.ProductSubCategory
group by p.productcategory, p.ProductSubCategory
)
select *
from cte
where rn = 1
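The join-back alternative mentioned above (max aggregate instead of ROW_NUMBER) could look roughly like this. The sketch assumes in-memory SQLite rather than SQL Server, a hypothetical `SaleId` column, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductCategory TEXT, ProductSubCategory TEXT);
    CREATE TABLE Sales (SaleId INTEGER, ProductSubCategory TEXT);  -- SaleId is hypothetical
    INSERT INTO Products VALUES
        ('Bikes', 'Road'), ('Bikes', 'Mountain'), ('Clothing', 'Gloves');
    INSERT INTO Sales VALUES
        (1, 'Road'), (2, 'Road'), (3, 'Mountain'), (4, 'Gloves');
""")

# Count sales per subcategory, find the highest count per category,
# then join the counts back to those maxima.
top_subcategories = conn.execute("""
    WITH counts AS (
        SELECT p.ProductCategory, p.ProductSubCategory, COUNT(*) AS cnt
        FROM Products p
        JOIN Sales s ON p.ProductSubCategory = s.ProductSubCategory
        GROUP BY p.ProductCategory, p.ProductSubCategory
    )
    SELECT c.ProductCategory, c.ProductSubCategory
    FROM counts c
    JOIN (SELECT ProductCategory, MAX(cnt) AS max_cnt
          FROM counts
          GROUP BY ProductCategory) m
      ON m.ProductCategory = c.ProductCategory
     AND m.max_cnt = c.cnt
    ORDER BY c.ProductCategory
""").fetchall()

print(top_subcategories)
```

Unlike `rn = 1`, this form returns every subcategory that ties for the highest count within a category.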
You already have an answer, but please see the following code too. It may help you.
SELECT p.ProductCategory,
f.ProductSubCategory,
COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
JOIN (
SELECT p.ProductCategory,
f.ProductSubCategory,
ROW_NUMBER() OVER ( PARTITION BY p.ProductCategory
ORDER BY COUNT(*) DESC) [Row]
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory,
f.ProductSubCategory) Lu
ON p.ProductCategory = Lu.ProductCategory
AND f.ProductSubCategory = Lu.ProductSubCategory
WHERE Lu.[Row] = 1
GROUP By p.ProductCategory,
f.ProductSubCategory