Average of top 2 - sql

I would like to get the average of the top 2 limit1 values per policyid. My resulting table also needs to include objectid.
limit1 and objectid come from the table p_coverage.
policyid comes from the table p_risk.
The table p_item is a linking table between p_risk and p_coverage.
My plan was to rank limit1 within each policyid and then take the average of the top 2.
However, the ranking gives wrong results. My query works if I take columns from ONE table, but as soon as I add joins between the tables the ranking is wrong.
SELECT policyid, limit1, /*pcob,*/ RANK() OVER(PARTITION BY policyid ORDER BY limit1 DESC) AS rn
FROM (SELECT policyid, limit1/*, pc.objectid AS pcob*/
FROM p_risk pr
LEFT JOIN p_item
ON pr.objectid=p_item.riskobjectid
LEFT JOIN p_coverage pc
ON p_item.objectid=pc.insuranceitemid) AS s
) AS SubQueryAlias
GROUP BY
policyid, limit1/*, pcob*/, rn
ORDER BY rn,policyid,limit1 DESC
The table at the end of the picture is what I'd like to have. The first table is the result of Gordon Linoff's query.

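If I understand correctly, you want the ranking in the subquery and then to filter and aggregate in the outer query. DENSE_RANK() gives tied values the same rank, so more than two rows per policyid can pass the filter; ROW_NUMBER() keeps exactly two rows: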
SELECT policyid, AVG(limit1) as avg_top2_limit1
FROM (SELECT policyid, limit1,
DENSE_RANK() OVER (PARTITION BY policyid ORDER BY limit1 DESC) as seqnum
FROM p_risk pr LEFT JOIN
p_item i
ON pr.objectid = i.riskobjectid LEFT JOIN
p_coverage pc
ON i.objectid = pc.insuranceitemid
) p
WHERE seqnum <= 2
GROUP BY policyid
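
To see how the choice of ranking function changes the "top 2" average when limit1 values tie, here is a minimal sketch using Python's built-in sqlite3 module (it needs SQLite 3.25 or later for window functions). The table `joined` and its rows are hypothetical; they stand in for the output of the p_risk / p_item / p_coverage joins.

```python
import sqlite3

# Hypothetical, flattened sample data: (policyid, limit1) pairs as they might
# come out of the three-table join. Policy 1 has a tie at the top.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (policyid INTEGER, limit1 REAL)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?)",
    [(1, 100), (1, 100), (1, 50), (2, 40), (2, 30), (2, 10)],
)

def avg_top2(rank_fn):
    # rank_fn is only ever one of the two fixed strings passed below.
    sql = f"""
        SELECT policyid, AVG(limit1)
        FROM (SELECT policyid, limit1,
                     {rank_fn}() OVER (PARTITION BY policyid
                                       ORDER BY limit1 DESC) AS seqnum
              FROM joined) AS s
        WHERE seqnum <= 2
        GROUP BY policyid
        ORDER BY policyid
    """
    return dict(conn.execute(sql).fetchall())

by_row_number = avg_top2("ROW_NUMBER")  # exactly two rows per policy
by_dense_rank = avg_top2("DENSE_RANK")  # two distinct values, ties included
print(by_row_number)
print(by_dense_rank)
```

For policy 1, ROW_NUMBER averages 100 and 100, while DENSE_RANK averages 100, 100 and 50, so the two results differ only where there are ties.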

Thanks to the previous answer, I managed to do what I wanted. Here is the query:
select b.policyid, avg(b.limit1) as avg_top2_limit1 from(
SELECT distinct(policyid) policyid, limit1
FROM (SELECT policyid, limit1,
Dense_rank() OVER (PARTITION BY policyid ORDER BY limit1 DESC) as
seqnum
FROM p_risk pr LEFT JOIN
p_item i
ON pr.objectid = i.riskobjectid LEFT JOIN
p_coverage pc
ON i.objectid = pc.insuranceitemid) AS s
WHERE seqnum <= 2 ) as b
GROUP BY policyid

Related

Select second most recent date from inner join

I have this query :
SELECT
companies.display_name, companies.pay_schedule_id,
pay_schedule_periods.schedule_id,
pay_schedule_periods.created_at
FROM
companies
INNER JOIN
pay_schedule_periods ON pay_schedule_id = pay_schedule_periods.schedule_id
ORDER BY
companies.display_name, pay_schedule_periods.created_at DESC;
I get this result :
How can I select only the second most recent created_at date for each unique display_name?
You could use row_number to assign a sequence to your dates and apply this before joining, then include as part of your join criteria, such as:
select c.display_name, c.pay_schedule_id, psp.schedule_id, psp.created_at
from companies c
join (
select schedule_id, created_at,
Row_Number() over(partition by schedule_id order by created_at desc) rn
from pay_schedule_periods
)psp on psp.schedule_id = c.pay_schedule_id and rn = 2
order by c.display_name, psp.created_at desc;
You could also apply this using a lateral join which would simplify further.
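
As a quick check of the row_number approach, here is a small sketch run through Python's sqlite3 module (SQLite 3.25+ for window functions). The company names and dates are made-up sample data, and it assumes the periods table is keyed by schedule_id, as in the question's join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (display_name TEXT, pay_schedule_id INTEGER);
    CREATE TABLE pay_schedule_periods (schedule_id INTEGER, created_at TEXT);
    -- Hypothetical sample rows
    INSERT INTO companies VALUES ('Acme', 1), ('Globex', 2);
    INSERT INTO pay_schedule_periods VALUES
        (1, '2021-01-01'), (1, '2021-02-01'), (1, '2021-03-01'),
        (2, '2021-05-01'), (2, '2021-06-01');
""")

# Number each schedule's periods from newest to oldest, then keep rn = 2.
rows = conn.execute("""
    SELECT c.display_name, psp.created_at
    FROM companies c
    JOIN (SELECT schedule_id, created_at,
                 ROW_NUMBER() OVER (PARTITION BY schedule_id
                                    ORDER BY created_at DESC) AS rn
          FROM pay_schedule_periods) psp
      ON psp.schedule_id = c.pay_schedule_id AND psp.rn = 2
    ORDER BY c.display_name
""").fetchall()
print(rows)
```

A company whose schedule has only one period drops out of the result, because the inner join finds no row with rn = 2.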

Not getting the result that I need by using ROW_NUMBER()

I'm using AdventureWorks2017 and I'm trying to get the top 2 selling products by year. What I have so far is below, but it returns only the top 2 rows overall, which is not what I need: I need the top 2 products in each year.
SELECT TOP (2) ROW_NUMBER() OVER (ORDER BY sum(linetotal) DESC) ,
ProductID,
year(h.OrderDate) 'OrderYear'
from Sales.SalesOrderDetail so
left outer join Sales.SalesOrderHeader h
on so.SalesOrderID = h.SalesOrderID
group by year(h.OrderDate), ProductID
Add ROW_NUMBER() in a subquery, partitioned by year, and then filter on rnk <= 2 in the outer query to select the top 2 per year:
select
ProductID,
OrderYear
from
(
SELECT
ProductID,
year(h.OrderDate) 'OrderYear',
ROW_NUMBER() OVER (PARTITION BY year(h.OrderDate) ORDER BY sum(linetotal) DESC) as rnk
from Sales.SalesOrderDetail so
left outer join Sales.SalesOrderHeader h
on so.SalesOrderID = h.SalesOrderID
group by year(h.OrderDate), ProductID
) val
where rnk <= 2
When you order ROW_NUMBER() by sum(linetotal), rows with equal sum(linetotal) are numbered in an arbitrary order, so the result is not deterministic when there are ties.
I prefer to do it this way:
Declare a table with one more column than your query returns.
Define the first column as identity(1,1) and insert the query results into the remaining columns.
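
Another way to deal with ties, not taken from the answers above, is to add a tie-breaker column to the window's ORDER BY so the numbering is deterministic. The sketch below uses Python's sqlite3 module (SQLite 3.25+) with a hypothetical, simplified `sales` table instead of the AdventureWorks tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined order header/detail tables
    CREATE TABLE sales (order_year INTEGER, product_id INTEGER, line_total REAL);
    INSERT INTO sales VALUES
        (2011, 1, 50), (2011, 1, 50), (2011, 2, 100), (2011, 3, 100), (2011, 4, 10),
        (2012, 1, 5),  (2012, 2, 70), (2012, 3, 30);
""")

# PARTITION BY restarts the numbering for every year; the extra product_id
# in ORDER BY decides between products whose totals are equal.
rows = conn.execute("""
    SELECT order_year, product_id, total
    FROM (SELECT order_year, product_id, SUM(line_total) AS total,
                 ROW_NUMBER() OVER (PARTITION BY order_year
                                    ORDER BY SUM(line_total) DESC, product_id) AS rnk
          FROM sales
          GROUP BY order_year, product_id) AS ranked
    WHERE rnk <= 2
    ORDER BY order_year, rnk
""").fetchall()
print(rows)
```

In 2011 three products tie at a total of 100; with the tie-breaker the query always keeps products 1 and 2.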

How to find the three greatest values in each category in PostgreSQL?

I am a SQL beginner. I am having trouble finding the top 3 max values in each category. The question was
"For order_ids in January 2006, what were the top (by revenue) 3 product_ids for each category_id? "
Table A:
(Column name)
customer_id
order_id
order_date
revenue
product_id
Table B:
product_id
category_id
I tried to combine table B and A using an Inner Join and filtered by the order_date. But then I am stuck on how to find the top 3 max values in each category_id.
Thanks.
This is so far what I can think of
SELECT B.product_id, category_id FROM A
JOIN B ON B.product_id = A.product_id
WHERE order_date BETWEEN '2006-01-01' AND '2006-01-31'
ORDER BY revenue DESC
LIMIT 3;
This kind of query is typically solved using window functions
select *
from (
SELECT b.product_id,
b.category_id,
a.revenue,
dense_rank() over (partition by b.category_id order by a.revenue desc) as rnk
from A
join b ON B.product_id = A.product_id
where a.order_date between date '2006-01-01' AND date '2006-01-31'
) as t
where rnk <= 3
order by product_id, category_id, revenue desc;
dense_rank() gives tied rows the same rank (rows with the same revenue in the same category), so you might actually get more than 3 rows per category.
If the same product can show up more than once in table a (several orders in the period), you need to combine this with a GROUP BY to get the sum of all revenues:
select *
from (
SELECT b.product_id,
b.category_id,
sum(a.revenue) as total_revenue,
dense_rank() over (partition by b.category_id order by sum(a.revenue) desc) as rnk
from a
join b on B.product_id = A.product_id
where a.order_date between date '2006-01-01' AND date '2006-01-31'
group by b.product_id, b.category_id
) as t
where rnk <= 3
order by product_id, category_id, total_revenue desc;
When combining window functions and GROUP BY, the window function will be applied after the GROUP BY.
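
To illustrate that the window function sees the already grouped rows, here is a small sketch using Python's sqlite3 module (SQLite 3.25+). The rows are invented sample data, and dates are compared as ISO strings because SQLite has no `date '...'` literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (order_id INTEGER, order_date TEXT, revenue REAL, product_id INTEGER);
    CREATE TABLE b (product_id INTEGER, category_id INTEGER);
    -- Hypothetical sample rows
    INSERT INTO b VALUES (1, 10), (2, 10), (3, 10), (4, 10), (5, 20);
    INSERT INTO a VALUES
        (1, '2006-01-05', 40, 1), (2, '2006-01-09', 40, 1),
        (3, '2006-01-10', 70, 2), (4, '2006-01-11', 20, 3),
        (5, '2006-01-12', 10, 4), (6, '2006-01-20', 99, 5),
        (7, '2006-02-01', 500, 4);
""")

# GROUP BY first collapses the orders to one row per category/product;
# DENSE_RANK then ranks those summed rows within each category.
rows = conn.execute("""
    SELECT category_id, product_id, total_revenue, rnk
    FROM (SELECT b.category_id, b.product_id,
                 SUM(a.revenue) AS total_revenue,
                 DENSE_RANK() OVER (PARTITION BY b.category_id
                                    ORDER BY SUM(a.revenue) DESC) AS rnk
          FROM a JOIN b ON b.product_id = a.product_id
          WHERE a.order_date BETWEEN '2006-01-01' AND '2006-01-31'
          GROUP BY b.category_id, b.product_id) AS t
    WHERE rnk <= 3
    ORDER BY category_id, rnk
""").fetchall()
print(rows)
```

Product 1 ranks first in category 10 because its two January orders are summed to 80 before ranking; the February order for product 4 is filtered out before grouping.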
You can use window functions on the grouped revenue and then pull the top N in the outer query. I have not worked in PostgreSQL in a bit, so I may be missing a shortcut function below.
WITH ByRevenue AS
(
--This creates a virtualized table that can be queried similar to a physical table in the conjoined statements below
SELECT
category_id,
product_id,
MAX(revenue) as max_revenue
FROM
A
JOIN B ON B.product_id = A.product_id
WHERE
order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY
category_id,product_id
)
,Normalized AS
(
--Pull data from the CTE above and number the products within each category with ROW_NUMBER so the outer query can apply the limit.
SELECT
category_id,
product_id,
max_revenue,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY max_revenue DESC) as rn
FROM
ByRevenue
)
--Final query: each product ranked by revenue within its category
SELECT *
FROM Normalized
WHERE RN<=3;
For top-n queries, the first thing to try is usually the lateral join:
WITH categories as (
SELECT DISTINCT category_id
FROM B
)
SELECT categories.category_id, sub.product_id
FROM categories
JOIN LATERAL (
SELECT a.product_id
FROM B
JOIN A ON (a.product_id = b.product_id)
WHERE b.category_id = categories.category_id
AND order_date BETWEEN '2006-01-01' AND '2006-01-31'
GROUP BY a.product_id
ORDER BY sum(revenue) desc
LIMIT 3
) sub on true;
Try using FETCH FIRST n ROWS ONLY?
Note: I assume product_id is the key shared by both tables, so I used it to join them. This returns the top 3 rows overall, not the top 3 per category.
SELECT B.category_id, A.revenue FROM A
INNER JOIN B ON A.product_id = B.product_id
WHERE A.order_date BETWEEN date '2006-01-01' AND date '2006-01-31'
ORDER BY A.revenue DESC
FETCH FIRST 3 ROWS ONLY

How can I fix my GROUP BY clause

I have 2 tables as seen below:
Now the question is :
How can I create a view that shows the details of the last owner? In other words, I need the details of the person who has MAX(StartDate) in the tbl_Owners table.
I want to find the latest owner of each apartment.
I tried different approaches but couldn't find a way to do that.
I know I need to get the PersonID in a GROUP BY clause that groups records by AppID, but I can't do that.
Thank you
Try this
select t1.* from tbl_persons as t1 inner join
(
select t1.* from tbl_owners as t1 inner join
(
select appid,max(startdate) as startdate from tbl_owners group by appid
) as t2
on t1.appid=t2.appid and t1.startdate=t2.startdate
) as t2
on t1.personid=t2.personid
Add this to your query:
JOIN (select AppId, MAX(StartDate) as MaxStartDate
from dbo.tbl_Owners
group by AppId) o2
ON dbo.tbl_Owners.AppId = o2.AppId and
dbo.tbl_Owners.StartDate = o2.MaxStartDate
The subquery above returns every AppId together with its latest StartDate. Joining tbl_Owners to that result will give you what you want.
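
Here is a minimal sketch of that join-back pattern, run through Python's sqlite3 module. The table layouts and rows are assumptions pieced together from the column names mentioned in this thread (AppID, PersonID, StartDate, PersonFullname); the real tables may have more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Persons (PersonID INTEGER, PersonFullname TEXT);
    CREATE TABLE tbl_Owners (AppID INTEGER, PersonID INTEGER, StartDate TEXT);
    -- Hypothetical sample rows
    INSERT INTO tbl_Persons VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO tbl_Owners VALUES
        (100, 1, '2010-01-01'), (100, 2, '2015-06-01'),
        (200, 3, '2012-03-01'), (200, 1, '2018-09-01');
""")

# o2 holds one row per apartment with its latest StartDate; joining it back
# to tbl_Owners keeps only the ownership row that has that date.
rows = conn.execute("""
    SELECT o.AppID, p.PersonFullname, o.StartDate
    FROM tbl_Owners o
    JOIN (SELECT AppID, MAX(StartDate) AS MaxStartDate
          FROM tbl_Owners
          GROUP BY AppID) o2
      ON o.AppID = o2.AppID AND o.StartDate = o2.MaxStartDate
    JOIN tbl_Persons p ON p.PersonID = o.PersonID
    ORDER BY o.AppID
""").fetchall()
print(rows)
```

If two owners of the same apartment share the latest StartDate, this pattern returns both of them.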
You can use a CTE for this purpose
;WITH CTE AS
(
SELECT AppID,PersonID,StartDate,
ROW_NUMBER() OVER (PARTITION BY AppID ORDER BY StartDate DESC) RN
FROM dbo.tbl_Owners
GROUP BY AppID,PersonID,StartDate
)
SELECT * FROM CTE
WHERE RN=1
Using row_number
select t.*, p.* -- change as needed
from (select *, rn= row_number() over(partition by AppID order by StartDate desc)
from dbo.tbl_Owners
) t
join dbo.tbl_Persons p on t.rn=1 and t.PersonId = p.PersonId
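Using cross apply
select t.*, p.* -- change as needed
from (select distinct AppID from dbo.tbl_Owners) a
cross apply (
select top(1) *
from dbo.tbl_Owners o
where o.AppID = a.AppID -- latest ownership row for this apartment
order by o.StartDate desc
) t
join dbo.tbl_Persons p on p.PersonId = t.PersonId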
SELECT o.*, p.PersonFullname FROM dbo.tbl_Owners o
INNER JOIN
dbo.tbl_Persons p ON o.PersonID = p.PersonID
WHERE o.StartDate = (SELECT MAX(o2.StartDate) FROM dbo.tbl_Owners o2 WHERE o2.AppID = o.AppID);
Filter on the MAX(StartDate) of each AppID instead of grouping by PersonID.

Highest Count with a group

I'm having an absolute brain fade
SELECT p.ProductCategory, f.ProductSubCategory, COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory, f.ProductSubCategory
ORDER BY 1,3 DESC
This shows me the count for each ProductSubCategory. I would like to see only the ProductSubCategory with the highest count per ProductCategory.
I wish to see (I don't care about the Count value)
There are a couple of different ways to do this. One involves joining the results back to themselves and using the max aggregate. But since you are using SQL Server, you can use ROW_NUMBER to achieve the same result:
with cte as (
select p.productcategory, p.ProductSubCategory, COUNT(*) cnt,
ROW_NUMBER() over (partition by p.productcategory order by count(*) desc) rn
from products p
join sales s on p.ProductSubCategory = s.ProductSubCategory
group by p.productcategory, p.ProductSubCategory
)
select *
from cte
where rn = 1
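
The other approach mentioned above, joining the grouped results back to themselves with a max aggregate, might look roughly like the sketch below. It runs through Python's sqlite3 module, and the category names and sales rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductCategory TEXT, ProductSubCategory TEXT);
    CREATE TABLE Sales (ProductSubCategory TEXT);
    -- Hypothetical sample rows
    INSERT INTO Products VALUES
        ('Bikes', 'Road'), ('Bikes', 'Mountain'), ('Clothing', 'Caps');
    INSERT INTO Sales VALUES
        ('Road'), ('Road'), ('Road'), ('Mountain'), ('Caps'), ('Caps');
""")

# counts: one row per category/subcategory with its sales count.
# m: the highest count per category, joined back to pick the matching row(s).
rows = conn.execute("""
    WITH counts AS (
        SELECT p.ProductCategory, p.ProductSubCategory, COUNT(*) AS cnt
        FROM Products p
        JOIN Sales s ON s.ProductSubCategory = p.ProductSubCategory
        GROUP BY p.ProductCategory, p.ProductSubCategory
    )
    SELECT c.ProductCategory, c.ProductSubCategory
    FROM counts c
    JOIN (SELECT ProductCategory, MAX(cnt) AS max_cnt
          FROM counts
          GROUP BY ProductCategory) m
      ON m.ProductCategory = c.ProductCategory AND m.max_cnt = c.cnt
    ORDER BY c.ProductCategory
""").fetchall()
print(rows)
```

Unlike ROW_NUMBER with rn = 1, this version returns every subcategory that ties for the highest count in a category.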
You already have an answer, but the following code may also help you.
SELECT p.ProductCategory,
f.ProductSubCategory,
COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
JOIN (
SELECT p.ProductCategory,
f.ProductSubCategory,
ROW_NUMBER() OVER ( PARTITION BY p.ProductCategory
ORDER BY COUNT(*) DESC) [Row]
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory,
f.ProductSubCategory) Lu
ON p.ProductCategory = Lu.ProductCategory
AND f.ProductSubCategory = Lu.ProductSubCategory
WHERE Lu.[Row] = 1
GROUP By p.ProductCategory,
f.ProductSubCategory