SQL group by highest occurrence - sql

By executing this query
SELECT year, genre, COUNT(genre)
FROM Oscar
GROUP BY year, genre
I got the following output:
2016 Action 2
2016 Romance 1
2017 Action 1
2017 Romance 2
2018 Fantasy 1
2019 Action 1
2019 Fantasy 2
2020 Action 3
2020 Fantasy 1
2020 Romance 1
Now I want to display only the genre with the highest count per year. What is the best way to do this?
So I want the output to look like this:
2016 Action
2017 Romance
2018 Fantasy
2019 Fantasy
2020 Action

You can use window functions:
SELECT year, genre
FROM (
SELECT year, genre, RANK() OVER(PARTITION BY year ORDER BY COUNT(*) DESC) rn
FROM Oscar
GROUP BY year, genre
) t
WHERE rn = 1
If your database does not support window functions (e.g. MySQL < 8.0), another option is:
SELECT year, genre
FROM Oscar o
GROUP BY year, genre
HAVING COUNT(*) = (
SELECT COUNT(*)
FROM Oscar o1
WHERE o1.year = o.year
GROUP BY o1.genre
ORDER BY COUNT(*) DESC LIMIT 1
)

Use window functions:
SELECT year, genre
FROM (SELECT year, genre, COUNT(*) as cnt,
RANK() OVER (PARTITION BY year ORDER BY COUNT(*) DESC) as seqnum
FROM Oscar
GROUP BY year, genre
) yg
WHERE seqnum = 1;
If there are ties, RANK() returns all highest ranked values. Use ROW_NUMBER() if you specifically want one row, even when there are ties for first.
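To make the difference between RANK() and ROW_NUMBER() concrete, here is a minimal sketch that runs both against SQLite through Python's sqlite3 module (assuming a bundled SQLite of 3.25 or newer, which is when window functions arrived). The rows are hypothetical: 2021 is given a deliberate tie, which the data in the question does not have. The counts are computed in a CTE first, and the genre tiebreaker for ROW_NUMBER() is my own choice to keep the result deterministic.

```python
import sqlite3

# Hypothetical rows, one per film. 2021 has a deliberate tie
# (one Action, one Fantasy) so the two ranking functions differ.
rows = [
    (2020, "Action"), (2020, "Action"), (2020, "Action"),
    (2020, "Fantasy"), (2020, "Romance"),
    (2021, "Action"), (2021, "Fantasy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Oscar (year INTEGER, genre TEXT)")
conn.executemany("INSERT INTO Oscar VALUES (?, ?)", rows)


def top_genres(ranking_function, order_by):
    # Both arguments are spliced into the SQL text, so pass only trusted strings.
    query = f"""
        WITH counts AS (
            SELECT year, genre, COUNT(*) AS cnt
            FROM Oscar
            GROUP BY year, genre
        )
        SELECT year, genre
        FROM (
            SELECT year, genre,
                   {ranking_function}() OVER (
                       PARTITION BY year ORDER BY {order_by}
                   ) AS seqnum
            FROM counts
        ) AS ranked
        WHERE seqnum = 1
        ORDER BY year, genre
    """
    return conn.execute(query).fetchall()


# RANK() keeps every genre tied for first place.
with_ties = top_genres("RANK", "cnt DESC")
# ROW_NUMBER() keeps exactly one row per year; genre breaks the tie.
one_per_year = top_genres("ROW_NUMBER", "cnt DESC, genre")

print(with_ties)
print(one_per_year)
```

Without an explicit tiebreaker in the ORDER BY, which of the tied rows ROW_NUMBER() keeps is up to the database.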

Related

Number of Customer Purchases in Their First Month

I have a list of customer orders. I can easily calculate the month and year of first purchase for each customer (e.g. customer 1 had their first purchase in Sept 2021, customer 2 had their first purchase in Oct 2021, etc.). What I want to add is an additional column that counts the number of purchases a customer made in their first month.
Existing data table (Orders):
OrderId CustomerId OrderDate
1 1 9/15/2021
2 1 10/15/2021
3 1 11/1/2021
4 2 10/1/2021
5 2 10/6/2021
6 2 10/7/2021
7 2 11/9/2021
8 3 11/15/2021
Desired output:
CustomerId FirstOrderMonth FirstOrderYear FirstMonthPurchaseCount
1 9 2021 1
2 10 2021 3
3 11 2021 1
I was thinking something like this for the first three columns:
SELECT o.CustomerId,
MONTH(MIN(o.OrderDate)) as FirstOrderMonth,
YEAR(MIN(o.OrderDate)) as FirstOrderYear
FROM Orders o
GROUP BY o.CustomerId
I am not sure how to approach the final column and was hoping for some help.
Aggregate by the customer's id, the year and the month of the order and use window functions to get the year and month of the 1st order and the count of that 1st month:
SELECT DISTINCT CustomerId,
FIRST_VALUE(MONTH(OrderDate)) OVER (PARTITION BY CustomerId ORDER BY YEAR(OrderDate), MONTH(OrderDate)) FirstOrderMonth,
MIN(YEAR(OrderDate)) OVER (PARTITION BY CustomerId) FirstOrderYear,
FIRST_VALUE(COUNT(*)) OVER (PARTITION BY CustomerId ORDER BY YEAR(OrderDate), MONTH(OrderDate)) FirstMonthPurchaseCount
FROM Orders
GROUP BY CustomerId, YEAR(OrderDate), MONTH(OrderDate);
See the demo.
You may use the RANK() function to identify the first month purchases for each user as the following:
Select D.CustomerId, MONTH(OrderDate) FirstOrderMonth,
YEAR(OrderDate) FirstOrderYear, COUNT(*) FirstMonthPurchaseCount
From
(
Select *, RANK() Over (Partition By CustomerId Order By YEAR(OrderDate), MONTH(OrderDate)) rnk
From Orders
) D
Where D.rnk = 1
Group By D.CustomerId, MONTH(OrderDate), YEAR(OrderDate)
See a demo.
If you want to find second, third ... month purchases, you may use the DENSE_RANK() function instead of RANK() and change the value in the where clause to the required month order.
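As a rough illustration of the DENSE_RANK() variant, the sketch below loads the Orders rows from the question into SQLite via Python's sqlite3 module and counts purchases in each customer's n-th order month. Two assumptions of mine: the dates are stored as ISO strings, and strftime('%Y-%m', ...) stands in for MONTH()/YEAR(), which SQLite does not have. The helper name purchases_in_nth_month is hypothetical.

```python
import sqlite3

# Orders rows from the question, with dates rewritten as ISO strings
# (assumption: needed so SQLite's strftime() can extract year and month).
orders = [
    (1, 1, "2021-09-15"), (2, 1, "2021-10-15"), (3, 1, "2021-11-01"),
    (4, 2, "2021-10-01"), (5, 2, "2021-10-06"), (6, 2, "2021-10-07"),
    (7, 2, "2021-11-09"), (8, 3, "2021-11-15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderId INTEGER, CustomerId INTEGER, OrderDate TEXT)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)


def purchases_in_nth_month(n):
    # DENSE_RANK() gives every order in the same year-month the same rank,
    # and consecutive ranks to consecutive months that contain orders.
    query = """
        SELECT CustomerId, order_month, COUNT(*) AS purchase_count
        FROM (
            SELECT CustomerId,
                   strftime('%Y-%m', OrderDate) AS order_month,
                   DENSE_RANK() OVER (
                       PARTITION BY CustomerId
                       ORDER BY strftime('%Y-%m', OrderDate)
                   ) AS month_rank
            FROM Orders
        ) AS d
        WHERE month_rank = ?
        GROUP BY CustomerId, order_month
        ORDER BY CustomerId
    """
    return conn.execute(query, (n,)).fetchall()


first_month = purchases_in_nth_month(1)
second_month = purchases_in_nth_month(2)
print(first_month)
print(second_month)
```

Note that "n-th month" here means the n-th month in which the customer ordered at all, not the n-th calendar month after the first purchase. Customer 3 has only one such month, so it does not appear in the second-month result.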
select CustomerId
,month(min(OrderDate)) as FirstOrderMonth
,year(min(OrderDate)) as FirstOrderYear
,count(first_month_flag) as FirstMonthPurchaseCount
from (select *
-- flag orders in the same year AND month as the customer's first order
,case when year(OrderDate) = year(min(OrderDate) over(partition by CustomerId))
and month(OrderDate) = month(min(OrderDate) over(partition by CustomerId)) then 1 end as first_month_flag
from Orders) Orders
group by CustomerId
CustomerId FirstOrderMonth FirstOrderYear FirstMonthPurchaseCount
1 9 2021 1
2 10 2021 3
3 11 2021 1
Fiddle

Find out which people stay in a population/cohort from the base year to last year and the years in between

I have a table with columns: c
The table contains information about players/team relationships over multiple years (say, 2010 to 2020)
What I want to know is:
- For starting year, which players belonged to team Blueberry
- For year 2, who of the Blueberry players in the starting year still belong to the Blueberry team
-..and so on until the last year studied
A nagging feeling I have is that this is presentable as a single table using only one query.
Please help.
Year Player_id team_id
2012 kitliu Blueberry
2012 bobross Blueberry
2012 jacksnake Blueberry
2012 kittyjr Blueberry
2013 kitliu Blueberry
2013 bobross Blueberry
2013 narutol yellow
2014 kitliu Blueberry
2014 narutol Red
result:
2012 kitliu Blueberry
2012 bobross Blueberry
2012 jacksnake Blueberry
2012 kittyjr Blueberry
2013 kitliu Blueberry
2013 bobross Blueberry
2014 kitliu Blueberry
result, count retained player/team combos from base year:
Year Count
2012 4
2013 2
2014 1
Enumerate the players from the base year. Then use this to check that there are no gaps:
select team_id, year, count(*)
from (select t.*,
row_number() over (partition by team_id, player_id order by year) as seqnum
from t
where year >= 2012
) t
where year = 2012 + seqnum - 1
group by team_id, year;
Here is a db<>fiddle.
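For anyone who wants to try the gap-check idea without the fiddle, here is a small sketch that runs it on the sample rows from the question, using SQLite through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The table name t and the column names year, player_id and team_id follow the question's sample; BASE_YEAR is a name I introduced.

```python
import sqlite3

# Sample rows from the question: (year, player_id, team_id).
roster = [
    (2012, "kitliu", "Blueberry"), (2012, "bobross", "Blueberry"),
    (2012, "jacksnake", "Blueberry"), (2012, "kittyjr", "Blueberry"),
    (2013, "kitliu", "Blueberry"), (2013, "bobross", "Blueberry"),
    (2013, "narutol", "yellow"),
    (2014, "kitliu", "Blueberry"), (2014, "narutol", "Red"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (year INTEGER, player_id TEXT, team_id TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", roster)

BASE_YEAR = 2012  # hypothetical name for the cohort's starting year

# seqnum counts a player's years on a team in order. A row survives only if
# its year equals base + seqnum - 1, i.e. the player was on that team in the
# base year and in every year since, with no gap.
retained = conn.execute(
    """
    SELECT team_id, year, COUNT(*) AS retained_players
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY team_id, player_id ORDER BY year
               ) AS seqnum
        FROM t
        WHERE year >= :base
    ) AS numbered
    WHERE year = :base + seqnum - 1
    GROUP BY team_id, year
    ORDER BY team_id, year
    """,
    {"base": BASE_YEAR},
).fetchall()

for row in retained:
    print(row)
```

Players who first show up after the base year (narutol here) get seqnum 1 in a later year, so they never satisfy the filter and drop out of the counts.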
The query below might help as an alternative: for each year, it selects the players who have not switched teams.
-- your_table is a placeholder for the actual table name
SELECT year, playerid,
count(distinct teamid)
from your_table t
group by year, playerid
having (playerid, count(distinct teamid)) IN (
select playerid, count(distinct teamid)
from your_table
group by playerid)
;
You can use analytic functions:
SELECT YEAR, PLAYER_ID, TEAM_ID
FROM
(SELECT YEAR, PLAYER_ID, TEAM_ID,
ROW_NUMBER() OVER (PARTITION BY PLAYER_ID, TEAM_ID ORDER BY YEAR) AS RN,
DENSE_RANK() OVER (PARTITION BY TEAM_ID ORDER BY YEAR) AS RNK_YEAR
FROM YOUR_TABLE)
WHERE RN = RNK_YEAR
To get the count per year, wrap this query in an outer query that uses COUNT and GROUP BY.
Cheers!!
I hope this works for you:
with teamyr(year, playerid, teamid) as
(
select min(year), playerid, teamid
from teams
group by playerid, teamid
)
select t1.year, t1.playerid, t1.teamid
from teamyr t1
where t1.year = (select min(year) from teamyr)
union all
select t2.year, t1.playerid, t1.teamid
from teamyr t1
inner join teams t2 on t2.playerid = t1.playerid and t2.teamid = t1.teamid
and t2.year > t1.year

Display max year and its max month with their corresponding value in oracle?

Year Month Value
2015 1 300
2015 2 400
2010 4 100
2016 7 200
2016 8 300
2017 2 100
2017 3 200
2017 6 400
You might try the following:
SELECT MAX(year)
, MAX(month) KEEP ( DENSE_RANK FIRST ORDER BY year DESC )
, MAX(value) KEEP ( DENSE_RANK FIRST ORDER BY year DESC, month DESC )
FROM mytable;
If you want the max month per year along with the corresponding value, then you can do this:
SELECT year, MAX(month)
, MAX(value) KEEP ( DENSE_RANK FIRST ORDER BY month DESC )
FROM mytable
GROUP BY year;
Hope this helps.
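KEEP (DENSE_RANK FIRST ...) is Oracle syntax. As a sketch of how the same two results could be obtained elsewhere, the example below uses ROW_NUMBER() and ORDER BY ... LIMIT 1 in SQLite via Python's sqlite3 module, on the rows from the question. This is a swapped-in technique, not the Oracle query itself, and the table name mytable is taken from the answer above.

```python
import sqlite3

# Rows from the question: (year, month, value).
data = [
    (2015, 1, 300), (2015, 2, 400), (2010, 4, 100),
    (2016, 7, 200), (2016, 8, 300),
    (2017, 2, 100), (2017, 3, 200), (2017, 6, 400),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (year INTEGER, month INTEGER, value INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", data)

# Latest month of every year with its value: number the rows of each year
# from the latest month down and keep the first one.
latest_per_year = conn.execute(
    """
    SELECT year, month, value
    FROM (
        SELECT year, month, value,
               ROW_NUMBER() OVER (
                   PARTITION BY year ORDER BY month DESC
               ) AS rn
        FROM mytable
    ) AS ranked
    WHERE rn = 1
    ORDER BY year
    """
).fetchall()

# Latest month of the latest year: sort and take the first row.
latest_overall = conn.execute(
    """
    SELECT year, month, value
    FROM mytable
    ORDER BY year DESC, month DESC
    LIMIT 1
    """
).fetchone()

print(latest_per_year)
print(latest_overall)
```

One difference to keep in mind: if the same (year, month) pair can occur more than once, MAX(value) KEEP (...) returns the largest value among those rows, while ROW_NUMBER() simply picks one of them.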
You could use:
SELECT *
FROM (SELECT *
FROM tab t
ORDER BY Year DESC, Month DESC) s
WHERE rownum = 1;
Select * from table_name
where year = (select max(year) from table_name)
and month = (select max(month) from table_name
where year = (select max(year) from table_name));
This might be the answer you are looking for. I used nested queries to get the desired result.

Limiting rows in SQL doesn't work properly

I want to get the first 5 rows of every season in my select. I have 4 seasons: SUM, SPR, AUT, WIN.
So there should be 20 rows in total.
My select looks like this:
select *
from (
select year, season, ROUND(avg(temperature),1) as avgTemp
from temperature join month on temperature.MONTH = month.MONTH
group by (season, year)
order by season, avgTemp asc
) where rownum <= 5;
It works for just one season. The output is:
1993 AUT 8,7
2007 AUT 9,9
1996 AUT 10
1998 AUT 10
2008 AUT 10,5
But it should look like that:
1996 SPR 9.6
1991 SPR 10.3
2006 SPR 10.3
2004 SPR 10.6
1995 SPR 10.6
1996 SUM 18.9
1993 SUM 19.1
2007 SUM 19.5
1998 SUM 19.5
2000 SUM 19.6
1993 AUT 8.7
2007 AUT 9.9
1998 AUT 10.0
1996 AUT 10.0
2008 AUT 10.5
1996 WIN .3
1991 WIN 1.2
2003 WIN 1.6
2006 WIN 1.9
2005 WIN 2.0
Do you know how to improve the select or do you have any other suggestions? Thanks in advance!
You need to do it in three steps:
Group by season and year, calculating the average temperature
Assign a row number that restarts with each season and increases in ascending order of the average temperature
Select only the rows with a row number between 1 and 5
The SQL should look like this (untested):
select year, season, avg_temp
from (
select year, season, avg_temp,
row_number() over(partition by season order by avg_temp) rn
from (
select year, season, ROUND(avg(temperature),1) as avg_temp
from temperature
join month on temperature.MONTH = month.MONTH
group by season, year
)
)
where rn <= 5;
Update
For your special ordering by season, add this:
order by case season
when 'SPR' then 1
when 'SUM' then 2
when 'AUT' then 3
when 'WIN' then 4
end, avg_temp;
WITH cteAverageTempByYearBySeason AS (
SELECT
year
,season
,ROUND(AVG(temperature),1) as AvgTemp
FROM
Temperature t
INNER JOIN Month m
On t.MONTH = m.MONTH
GROUP BY
year
,season
)
, cteRowNumber AS (
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY season ORDER BY AvgTemp ASC) as RowNumber
FROM
cteAverageTempByYearBySeason
)
SELECT *
FROM
cteRowNumber
WHERE
RowNumber <= 5
Here is an example. I broke out the derived tables into common table expressions to make the logic easier to follow. You need a partitioned ROW_NUMBER(), not just Oracle's ROWNUM pseudocolumn. ROWNUM on its own behaves like TOP/LIMIT 5 and returns only the first 5 rows overall, whereas a partitioned ROW_NUMBER() lets you pick 5 rows per season.
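To see the difference between a global row limit and a per-season row number side by side, here is a small sketch in SQLite via Python's sqlite3 module (assuming SQLite 3.25+). It uses a handful of rows from the expected output in the question, treated as already-averaged values, and keeps the top 2 per season instead of 5 to stay short. The table name season_avg and the helper top_n_per_season are hypothetical, and LIMIT plays the role that ROWNUM plays in Oracle.

```python
import sqlite3

# A few (year, season, avg_temp) rows taken from the expected output in the
# question, treated here as pre-aggregated averages.
season_avgs = [
    (1991, "SPR", 10.3), (1996, "SPR", 9.6), (2004, "SPR", 10.6),
    (1993, "AUT", 8.7), (2007, "AUT", 9.9), (2008, "AUT", 10.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE season_avg (year INTEGER, season TEXT, avg_temp REAL)")
conn.executemany("INSERT INTO season_avg VALUES (?, ?, ?)", season_avgs)


def top_n_per_season(n):
    # The row number restarts at 1 for every season, so rn <= n keeps
    # the n coldest years of each season.
    query = """
        SELECT year, season, avg_temp
        FROM (
            SELECT year, season, avg_temp,
                   ROW_NUMBER() OVER (
                       PARTITION BY season ORDER BY avg_temp
                   ) AS rn
            FROM season_avg
        ) AS ranked
        WHERE rn <= ?
        ORDER BY CASE season
                     WHEN 'SPR' THEN 1 WHEN 'SUM' THEN 2
                     WHEN 'AUT' THEN 3 WHEN 'WIN' THEN 4
                 END, avg_temp
    """
    return conn.execute(query, (n,)).fetchall()


per_season = top_n_per_season(2)

# A plain global limit stops after the first 2 rows overall, so only the
# season that sorts first ever appears.
global_limit = conn.execute(
    "SELECT year, season, avg_temp FROM season_avg "
    "ORDER BY season, avg_temp LIMIT 2"
).fetchall()

print(per_season)
print(global_limit)
```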
Edit: added a neat trick for your ORDER BY so you don't have to write a case expression. It uses your month number, which I assume is what the MONTH column holds.
WITH cteAverageTempByYearBySeason AS (
SELECT
year
,season
,ROUND(AVG(temperature),1) as AvgTemp
,MAX(m.MONTH) as SeasonOrderBy
FROM
Temperature t
INNER JOIN Month m
On t.MONTH = m.MONTH
GROUP BY
year
,season
)
, cteRowNumber AS (
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY season ORDER BY AvgTemp ASC) as RowNumber
FROM
cteAverageTempByYearBySeason
)
SELECT
year
,season
,AvgTemp
FROM
cteRowNumber
WHERE
RowNumber <= 5
ORDER BY
SeasonOrderBy
,AvgTemp
,Year
You need to use row_number to get 5 rows for each grouping:
select year, season, avgTemp
from (
-- average per season and year, numbered within each season
select year, season,
round(avg(temperature), 1) as avgTemp,
row_number() over(partition by season order by round(avg(temperature), 1)) as rn
from temperature t
join month m
on m.MONTH = t.MONTH
group by year, season
) a
where
a.rn <= 5

Oracle SQL Query:Find out which year total sales amount is maximum

My working table is named sales.
Here is the table structure (sl_no is the primary key):
CREATE TABLE SALES
( SL_NO NUMBER PRIMARY KEY, REGION VARCHAR2(10) NOT NULL,
MONTH VARCHAR2(20) NOT NULL, YEAR NUMBER NOT NULL,
SALES_AMOUNT NUMBER NOT NULL )
and here is table data:
SQL> select * from sales;
SL_NO REGION MONTH YEAR SALES_AMOUNT
---------- ---------- -------------------- ---------- ------------
1 east december 2011 750000
2 east august 2011 800000
3 west january 2012 640000
5 east march 2012 1200000
6 west february 2011 580000
4 west april 2011 555000
6 rows selected.
I tried this query to view the total sales amount for each year (2011 and 2012):
SELECT year, SUM(sales_amount) FROM sales GROUP BY year;
YEAR SUM(SALES_AMOUNT)
---------- -----------------
2011 2685000
2012 1840000
My goal: I want to find the year with the maximum total sales amount.
I tried the following, which works perfectly, but when I also try to display the year, it gives an error.
SQL> select max(sum(sales_amount)) from sales group by year;
MAX(SUM(SALES_AMOUNT))
----------------------
2685000
SQL> select year, max(sum(sales_amount)) from sales group by year;
select year, max(sum(sales_amount)) from sales group by year
*
ERROR at line 1:
ORA-00937: not a single-group group function
Additional question: what should happen when multiple years have the same total, e.g. when the sales amounts for 2011 and 2012 are equal?
Please help me solve this problem.
This should work.
with yr_agg as (
select year, sum(sales_amount) as total
from sales
group by year
)
select year, total as max_total
from yr_agg
where total = (select max(total)
from yr_agg);
I think the simplest way is to order the results and take the first row:
select year, sales_amount
from (SELECT year, SUM(sales_amount) as sales_amount
FROM sales
GROUP BY year
order by sum(sales_amount) desc
) t
where rownum = 1;
EDIT:
If you need to display all the matching rows (which isn't mentioned in the question), I would suggest using the dense_rank() analytic function:
select year, sales_amount
from (SELECT year, SUM(sales_amount) as sales_amount,
dense_rank() over (order by SUM(sales_amount) desc) as seqnum
FROM sales
GROUP BY year
order by sum(sales_amount) desc
) t
where seqnum = 1;
Or, you might like the max() version instead:
select year, sales_amount
from (SELECT year, SUM(sales_amount) as sales_amount,
max(sum(sales_amount)) over () as maxsa
FROM sales
GROUP BY year
order by sum(sales_amount) desc
) t
where sales_amount = maxsa;
Following select should do what you need (untested, do not have Oracle at home):
select year, total
from (
select year, sum(sales_amount) total
from sales
group by year
)
where total = (select max(total_amount)
from (
select year, sum(sales_amount) total_amount
from sales
group by year
))
Take into account, though, that if two years have exactly the same total amount, this query returns both of them. You might want to add more conditions if you need a single row.
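To check how a `total = max(total)` filter behaves when totals tie, here is a minimal sketch that loads the sales rows from the question into SQLite via Python's sqlite3 module and runs the filter twice. The seventh row is hypothetical: I added it only to lift 2012 to the same total as 2011. The column types are SQLite stand-ins for the Oracle ones.

```python
import sqlite3

# Rows from the question: (sl_no, region, month, year, sales_amount).
sales = [
    (1, "east", "december", 2011, 750000),
    (2, "east", "august", 2011, 800000),
    (3, "west", "january", 2012, 640000),
    (5, "east", "march", 2012, 1200000),
    (6, "west", "february", 2011, 580000),
    (4, "west", "april", 2011, 555000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sl_no INTEGER PRIMARY KEY, region TEXT, "
    "month TEXT, year INTEGER, sales_amount INTEGER)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", sales)

# Aggregate per year once, then keep every year whose total equals the maximum.
QUERY = """
    WITH yr_agg AS (
        SELECT year, SUM(sales_amount) AS total
        FROM sales
        GROUP BY year
    )
    SELECT year, total
    FROM yr_agg
    WHERE total = (SELECT MAX(total) FROM yr_agg)
    ORDER BY year
"""

best_years = conn.execute(QUERY).fetchall()

# Hypothetical extra row: 845000 is exactly the gap between the 2011 and
# 2012 totals, so both years now tie.
conn.execute("INSERT INTO sales VALUES (7, 'west', 'may', 2012, 845000)")
tied_years = conn.execute(QUERY).fetchall()

print(best_years)
print(tied_years)
```

With the original data only 2011 comes back. After the extra row, both years are returned, which is the behaviour to expect from an equality filter against the maximum.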
Here is my query, which can return multiple rows:
SELECT year,MAX(total_sale) as max_total
FROM
(SELECT year,SUM(sales_amount) AS total_sale FROM sales GROUP BY year)
GROUP BY
year HAVING MAX(total_sale) =
(SELECT MAX(total_sale) FROM (SELECT SUM(sales_amount) AS total_sale FROM sales GROUP BY year));