Selecting month in each year with maximum number of projects - sql

I have a table PROJ with a project number (projno) and an end date (end_date).
For each year I would like to display the month with the highest number of projects that have ended.
I have tried the following so far:
SELECT COUNT(proj.projno) nr_proj, extract(month from proj.end_date) month
, extract(year from proj.end_date) year
FROM PROJ
GROUP BY extract(month from proj.end_date)
,extract(year from proj.end_date)
This gives me the number of projects per month, per year.
Could anyone give me hints on how to select, for each year, only the rows with the highest project count?

You can use the MAX analytic function to get the maximum nr_proj value per year (the PARTITION BY clause), then keep only the rows where nr_proj = mx. If two months in a year tie for the maximum, both are returned.
select t.nr_proj, t.month, t.year
from (
SELECT COUNT(proj.projno) nr_proj
, extract(month from proj.end_date) month
, extract(year from proj.end_date) year
, max( COUNT(proj.projno) ) over(partition by extract(year from proj.end_date)) mx
FROM PROJ
GROUP BY extract(month from proj.end_date), extract(year from proj.end_date)
) t
where nr_proj = mx
;
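A minimal runnable sketch of the same idea, using Python's built-in sqlite3 module rather than Oracle. It is a translation under assumptions, not the answer's exact query: SQLite has no EXTRACT, so strftime() is used; the monthly counts are computed in an inner subquery before MAX() OVER is applied instead of nesting COUNT inside MAX; window functions need SQLite 3.25 or later; and the PROJ rows are invented sample data.

```python
import sqlite3

# Hypothetical sample data for a PROJ-like table (SQLite stand-in for Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proj (projno INTEGER, end_date TEXT)")
conn.executemany(
    "INSERT INTO proj VALUES (?, ?)",
    [(1, "2020-01-15"), (2, "2020-01-20"), (3, "2020-03-02"),
     (4, "2021-05-01"), (5, "2021-05-09"), (6, "2021-07-30"), (7, "2021-07-31")],
)

query = """
SELECT nr_proj, month, year
FROM (
    -- attach the largest monthly count of each year to every month of that year
    SELECT nr_proj, month, year,
           MAX(nr_proj) OVER (PARTITION BY year) AS mx
    FROM (
        -- projects ending per (year, month); strftime() replaces EXTRACT()
        SELECT COUNT(projno) AS nr_proj,
               CAST(strftime('%m', end_date) AS INTEGER) AS month,
               CAST(strftime('%Y', end_date) AS INTEGER) AS year
        FROM proj
        GROUP BY year, month
    )
)
WHERE nr_proj = mx  -- keeps every month that ties for the yearly maximum
ORDER BY year, month
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this sample data, 2020 returns only January, while 2021 returns both May and July because they tie at 2 projects each.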

I think the following will give you what you are after (if I understood the requirements). It first counts the projects for each month, then ranks the months within each year, and finally selects the first-ranked month(s).
select dt "Most Projects Month", cnt "Monthly Projects"
from ( -- Rank month Count by Year
select to_char( dt, 'yyyy-mm') dt
, cnt
, rank() over (partition by extract(year from dt)
order by cnt desc) rnk
from ( -- count the projects that ended in each month
select trunc(end_date,'mon') dt, count(*) cnt
from proj
group by trunc(end_date,'mon')
)
)
where rnk = 1
order by dt;
NOTE: Not tested, since no data was supplied. In the future, please do not post images; see Why No Images.
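Since the answer above is untested, here is a small sketch that runs the same count, rank, filter shape in Python's sqlite3 (SQLite 3.25+ assumed for window functions). strftime('%Y-%m', ...) is an assumed stand-in for Oracle's TRUNC(end_date, 'mon') followed by TO_CHAR(dt, 'yyyy-mm'), and the data is made up.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proj (projno INTEGER, end_date TEXT)")
conn.executemany(
    "INSERT INTO proj VALUES (?, ?)",
    [(1, "2020-01-15"), (2, "2020-01-20"), (3, "2020-03-02"),
     (4, "2021-05-01"), (5, "2021-05-09"), (6, "2021-07-30"), (7, "2021-07-31")],
)

query = """
SELECT dt AS most_projects_month, cnt AS monthly_projects
FROM (
    -- rank each month's count within its year; tied months share rank 1
    SELECT dt, cnt,
           RANK() OVER (PARTITION BY substr(dt, 1, 4) ORDER BY cnt DESC) AS rnk
    FROM (
        -- count the projects ending in each month, labelled 'YYYY-MM'
        SELECT strftime('%Y-%m', end_date) AS dt, COUNT(*) AS cnt
        FROM proj
        GROUP BY strftime('%Y-%m', end_date)
    )
)
WHERE rnk = 1
ORDER BY dt
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because RANK() gives equal counts the same rank, both tied 2021 months come back, matching the behaviour of the MAX() OVER approach.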

Related

RANK() over (PARTITION BY) To show only TOP 3 rows for each month

I have a question about ranking (I'm using pgAdmin for my SQL code).
I managed to get my sum of sales in descending order, ranked 1 to 3, for the month of April.
But how can I show only ranks 1 to 3 for each of April, May and June?
I need my result to contain only 9 rows.
SELECT restaurant_id,
EXTRACT(year FROM submitted_on) AS year,
EXTRACT(month FROM submitted_on) AS month,
SUM(total_amount),
RANK() OVER (PARTITION BY(extract(month from submitted_on))
ORDER BY SUM(total_amount) DESC) rank
FROM orders
WHERE submitted_on::date BETWEEN '2021-04-01' AND '2021-06-30'
GROUP BY restaurant_id, year, month
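If you want exactly 3 rows per month, use ROW_NUMBER instead of RANK, because RANK gives tied rows the same rank and can return more than 3 rows. A window function cannot be filtered in the WHERE clause of the query that computes it, so wrap the query in a subquery and filter outside: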
select t.* from (
SELECT restaurant_id,
EXTRACT(year FROM submitted_on) AS year,
EXTRACT(month FROM submitted_on) AS month,
SUM(total_amount),
ROW_NUMBER() OVER (PARTITION BY(extract(month from submitted_on))
ORDER BY SUM(total_amount) DESC) rn
FROM orders
WHERE submitted_on::date BETWEEN '2021-04-01' AND '2021-06-30'
GROUP BY restaurant_id, year, month
) t
where rn <= 3;
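To see the RANK versus ROW_NUMBER difference concretely, here is a sketch in Python's sqlite3 (SQLite 3.25+ assumed) instead of PostgreSQL. The orders rows are invented, with a deliberate tie for third place in April; the top3 helper is hypothetical and only exists to run both variants. Aggregation happens in the innermost query so the window function sorts plain columns.

```python
import sqlite3

# Hypothetical orders table; SQLite (3.25+) stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (restaurant_id INTEGER, submitted_on TEXT, total_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2021-04-03", 60), (1, "2021-04-20", 40),  # April: restaurant 1 -> 100
        (2, "2021-04-05", 90),                          # April: restaurant 2 -> 90
        (3, "2021-04-07", 80), (4, "2021-04-09", 80),   # April: 3 and 4 tie at 80
        (1, "2021-05-02", 50), (2, "2021-05-03", 40),
        (3, "2021-05-04", 30), (4, "2021-05-05", 20),
    ],
)


def top3(window_expr):
    """Return (month, restaurant_id, total) rows whose window position is <= 3."""
    # Aggregate innermost, number rows in the middle query, filter outermost:
    # WHERE cannot see a window function's result at the same query level.
    query = f"""
    SELECT month, restaurant_id, total
    FROM (
        SELECT month, restaurant_id, total, {window_expr} AS pos
        FROM (
            SELECT restaurant_id,
                   CAST(strftime('%m', submitted_on) AS INTEGER) AS month,
                   SUM(total_amount) AS total
            FROM orders
            WHERE submitted_on BETWEEN '2021-04-01' AND '2021-06-30'
            GROUP BY restaurant_id, month
        )
    )
    WHERE pos <= 3
    ORDER BY month, pos, restaurant_id
    """
    return conn.execute(query).fetchall()


# restaurant_id breaks ties so that ROW_NUMBER() is deterministic
row_number_rows = top3(
    "ROW_NUMBER() OVER (PARTITION BY month ORDER BY total DESC, restaurant_id)"
)
rank_rows = top3("RANK() OVER (PARTITION BY month ORDER BY total DESC)")
print(len(row_number_rows), len(rank_rows))
```

ROW_NUMBER returns exactly 3 rows per month, while RANK returns 4 for April because restaurants 3 and 4 share rank 3. Without a tie-breaker in ORDER BY, which tied row ROW_NUMBER keeps is not guaranteed.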

In pgAdmin for postgreSQL, I am unable to query for MAXimum rows from another query that SUMs up and sorts rows. Working with 1 table

This is the initial query that groups, sums up, and orders the busiest day of the week per month and year for a small retail store:
SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC
It returns the table in the attached image, and that is what I want to see initially.
Now I want to run a query on this result that shows only the maximum sum for each month, essentially ONLY the rows I circled: the best day (highest SUM) in each of the months January (1), February (2), ...
I tried the following:
SELECT year, month, day_of_week, MAX(SUM(total_revenue))
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month
But I got this error:
ERROR: aggregate function calls cannot be nested
LINE 1: SELECT year, month, day_of_week, MAX(SUM(total_revenue))
^
SQL state: 42803
Character: 38
Then I tried:
SELECT year, month, day_of_week, MAX(SUM)
FROM
(SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC)
ORDER BY year, month
And I got another error with hint:
ERROR: subquery in FROM must have an alias
LINE 3: (SELECT year, month, day_of_week, SUM(total_revenue)
^
HINT: For example, FROM (SELECT ...) [AS] foo.
SQL state: 42601
Character: 51
So then I tried:
SELECT year, month, day_of_week, MAX(SUM)
FROM
(SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC) AS foo
GROUP BY foo.year, foo.month, foo.day_of_week
ORDER BY foo.year, foo.month, MAX DESC
AND
SELECT foo.year, foo.month, foo.day_of_week, MAX(foo.SUM)
FROM
(SELECT year, month, day_of_week, SUM(total_revenue)
FROM vip_sales
GROUP BY year, month, day_of_week
ORDER BY year, month, SUM DESC) AS foo
GROUP BY foo.year, foo.month, foo.day_of_week
ORDER BY foo.year, foo.month, MAX DESC
But both queries are equivalent and return the SAME result as in the image: every day of the week in each month, NOT only the day of the week with the maximum sales in that month and year.
I googled 'nested queries' and 'subqueries' and tried some techniques, but got errors with no hints. I can't find anything that explains how to compute a SUM and then query the maximum of those SUMs.
Any suggestions?
You can use ROW_NUMBER() with PARTITION BY year, month to number the days within each month by descending total, then keep only row 1:
SELECT year, month, day_of_week, thesum
FROM (
SELECT year, month, day_of_week, thesum,
ROW_NUMBER() OVER (PARTITION BY year, month ORDER BY thesum DESC) RN
FROM (
SELECT year, month, day_of_week, SUM(total_revenue) as thesum
FROM vip_sales
GROUP BY year, month, day_of_week
-- no ORDER BY needed here; ROW_NUMBER() orders each partition itself
) x
) y
WHERE RN = 1
ORDER BY year, month;
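A runnable check of this pattern in Python's sqlite3 (SQLite 3.25+ assumed, standing in for PostgreSQL), with invented vip_sales rows. It shows why no nested aggregate is needed: SUM is computed once per day of week in the inner query, and the outer query only ranks those sums.

```python
import sqlite3

# Hypothetical vip_sales rows; SQLite (3.25+) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vip_sales (year INTEGER, month INTEGER, day_of_week TEXT, total_revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO vip_sales VALUES (?, ?, ?, ?)",
    [
        (2021, 1, "Mon", 100), (2021, 1, "Mon", 50), (2021, 1, "Sat", 120),
        (2021, 2, "Fri", 80), (2021, 2, "Fri", 30), (2021, 2, "Sun", 200),
    ],
)

query = """
SELECT year, month, day_of_week, thesum
FROM (
    -- number the days inside each (year, month), biggest total first
    SELECT year, month, day_of_week, thesum,
           ROW_NUMBER() OVER (PARTITION BY year, month ORDER BY thesum DESC) AS rn
    FROM (
        -- one SUM per (year, month, day_of_week)
        SELECT year, month, day_of_week, SUM(total_revenue) AS thesum
        FROM vip_sales
        GROUP BY year, month, day_of_week
    ) AS x
) AS y
WHERE rn = 1
ORDER BY year, month
"""
best_days = conn.execute(query).fetchall()
print(best_days)
```

In January, Monday's two rows sum to 150 and beat Saturday's 120; in February, Sunday's 200 beats Friday's 110.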

SQL - lag variable creation using window function

I have daily city-level data with some counts. I have to aggregate this data at the monthly level (1st day of each month) and then create lag variables based on the last week before the 1st day of the month.
After aggregating the data at the monthly level (keyed by the 1st date of each month), I used the following code to create a lag variable for the previous month:
sum(count) over (partition by City order by month_date rows between 1 preceding and 1 preceding) as last_1_month_count
Is there a way to aggregate data at monthly level and create lag variables based on last 7,14,21,28 days using window function?
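For reference, the frame ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING in the question's code contains only the previous row, so summing it gives the same value as LAG(). A small sketch in Python's sqlite3 (SQLite 3.25+ assumed) with invented data; the column is named cnt here as an assumption, to avoid the awkward name count.

```python
import sqlite3

# Hypothetical daily city data; "cnt" replaces the question's "count" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_level_data (city TEXT, day_dt TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO city_level_data VALUES (?, ?, ?)",
    [
        ("A", "2021-01-05", 10), ("A", "2021-01-20", 5),
        ("A", "2021-02-03", 7), ("A", "2021-03-10", 4),
        ("B", "2021-01-02", 1), ("B", "2021-02-02", 2),
    ],
)

query = """
WITH monthly AS (
    -- aggregate to the 1st day of each month
    SELECT city, date(day_dt, 'start of month') AS month_date, SUM(cnt) AS month_count
    FROM city_level_data
    GROUP BY city, month_date
)
SELECT city, month_date, month_count,
       SUM(month_count) OVER (PARTITION BY city ORDER BY month_date
                              ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS via_frame,
       LAG(month_count) OVER (PARTITION BY city ORDER BY month_date) AS via_lag
FROM monthly
ORDER BY city, month_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both columns are NULL on each city's first month and equal everywhere else. Both look at the previous row, not the previous calendar month, so a city with a missing month would pick up the value from further back.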
You can use this:
select
City
, month(Date)
, year(Date)
, sum([count])
from table1
where Date < DATEADD(day, -7, GETDATE())
group by
City
, month(Date)
, year(Date)
I think you're looking for something like this. The first CTE summarizes city counts to the day, week, month and year. The second summarizes the counts to the week, month and year. To group sales into weeks starting from the 1st day of the month, it uses the DAY function along with YEAR and MONTH. Since DAY returns an integer, distinct weeks can be created by subtracting 1 and dividing by 7 (integer division), i.e. (DAY(day_dt)-1)/7, so that days 1-7 form week 0, days 8-14 week 1, and so on.
One way to get the prior week's sales is to join the weekly summary CTE to itself, offsetting the week by -1. Since the prior week might have no sales at all, a LEFT JOIN seems safer than LAG, in my opinion.
with
day_sales_cte(city, day_dt, yr, mo, wk, sum_count) as (
select city, day_dt, year(day_dt), month(day_dt), (day(day_dt)-1)/7, sum([count])
from city_level_data
group by city, day_dt, year(day_dt), month(day_dt), (day(day_dt)-1)/7),
wk_sales_cte(city, yr, mo, wk, sum_count) as (
select city, yr, mo, wk, sum(sum_count)
from day_sales_cte
group by city, yr, mo, wk)
select ws.*, ws2.sum_count prior_wk_sales
from wk_sales_cte ws
left join wk_sales_cte ws2 on ws.city=ws2.city
and ws.yr=ws2.yr
and ws.mo=ws2.mo
and ws2.wk=ws.wk-1;
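A runnable sketch of the week-bucket CTEs plus LEFT self-join, translated to Python's sqlite3 (SQLite 3.25+ assumed; strftime() replaces YEAR/MONTH/DAY, and the column cnt stands in for count). Weeks are numbered with (day - 1) / 7 so that week 0 is days 1-7; plain DAY/7 would put only days 1-6 into week 0. The data is invented, with week 2 deliberately empty.

```python
import sqlite3

# Hypothetical daily data for one city in March 2021.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_level_data (city TEXT, day_dt TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO city_level_data VALUES (?, ?, ?)",
    [
        ("A", "2021-03-01", 2), ("A", "2021-03-07", 3),  # days 1 and 7 -> week 0
        ("A", "2021-03-08", 4),                          # day 8 -> week 1
        ("A", "2021-03-22", 5),                          # day 22 -> week 3 (week 2 empty)
    ],
)

query = """
WITH day_sales AS (
    SELECT city, day_dt,
           CAST(strftime('%Y', day_dt) AS INTEGER) AS yr,
           CAST(strftime('%m', day_dt) AS INTEGER) AS mo,
           -- integer division: days 1-7 -> 0, 8-14 -> 1, 15-21 -> 2, ...
           (CAST(strftime('%d', day_dt) AS INTEGER) - 1) / 7 AS wk,
           SUM(cnt) AS sum_count
    FROM city_level_data
    GROUP BY city, day_dt
),
wk_sales AS (
    SELECT city, yr, mo, wk, SUM(sum_count) AS sum_count
    FROM day_sales
    GROUP BY city, yr, mo, wk
)
SELECT ws.city, ws.yr, ws.mo, ws.wk, ws.sum_count, ws2.sum_count AS prior_wk_count
FROM wk_sales AS ws
LEFT JOIN wk_sales AS ws2
       ON ws.city = ws2.city AND ws.yr = ws2.yr AND ws.mo = ws2.mo
      AND ws2.wk = ws.wk - 1
ORDER BY ws.city, ws.yr, ws.mo, ws.wk
"""
weeks = conn.execute(query).fetchall()
print(weeks)
```

Week 3 gets a NULL prior count because week 2 has no rows; LAG() over the weekly rows would instead have returned week 1's total.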

Find max value for each year

I have a question that is asking:
-List the max sales for each year?
I think I have the starter query but I can't figure out how to get all the years in my answer:
SELECT TO_CHAR(stockdate,'YYYY') AS year, sales
FROM sample_newbooks
WHERE sales = (SELECT MAX(sales) FROM sample_newbooks);
This query gives me the year with the max sales. I need max sales for EACH year. Thanks for your help!
Use group by and max if all you need is the year and the maximum sales for that year:
select
to_char(stockdate, 'yyyy') year,
max(sales) sales
from sample_newbooks
group by to_char(stockdate, 'yyyy')
If you need rows with all the columns with max sales for the year, you can use window function row_number:
select
*
from (
select
t.*,
row_number() over (partition by to_char(stockdate, 'yyyy') order by sales desc) rn
from sample_newbooks t
) t where rn = 1;
If you also want every row that ties for the maximum sales in a year, use rank:
select
*
from (
select
t.*,
rank() over (partition by to_char(stockdate, 'yyyy') order by sales desc) rn
from sample_newbooks t
) t where rn = 1;
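The three options can be compared side by side with a sketch in Python's sqlite3 (SQLite 3.25+ assumed). strftime('%Y', stockdate) stands in for Oracle's TO_CHAR(stockdate, 'yyyy'); the title column, the best_rows helper and all rows are hypothetical, with a deliberate tie in 2019.

```python
import sqlite3

# Hypothetical sample_newbooks rows; books B and C tie for 2019.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_newbooks (title TEXT, stockdate TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sample_newbooks VALUES (?, ?, ?)",
    [
        ("A", "2019-02-01", 10), ("B", "2019-06-01", 30), ("C", "2019-09-01", 30),
        ("D", "2020-01-15", 5), ("E", "2020-03-01", 12),
    ],
)

# 1) Only the year and its maximum sales.
group_rows = conn.execute("""
    SELECT strftime('%Y', stockdate) AS year, MAX(sales) AS sales
    FROM sample_newbooks
    GROUP BY strftime('%Y', stockdate)
    ORDER BY year
""").fetchall()


def best_rows(window_fn):
    # 2) and 3) Whole rows: ROW_NUMBER keeps one per year, RANK keeps ties.
    return conn.execute(f"""
        SELECT title, year, sales
        FROM (
            SELECT t.*, strftime('%Y', stockdate) AS year,
                   {window_fn}() OVER (PARTITION BY strftime('%Y', stockdate)
                                       ORDER BY sales DESC) AS rn
            FROM sample_newbooks AS t
        )
        WHERE rn = 1
        ORDER BY year, title
    """).fetchall()


rn_rows = best_rows("ROW_NUMBER")
rank_rows = best_rows("RANK")
print(group_rows, rn_rows, rank_rows)
```

ROW_NUMBER returns one 2019 row (either B or C, since the ORDER BY does not break the tie), whereas RANK returns both.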

SQL Query to show all results before current month

I have a table in Oracle with columns: [DATEID date, COUNT_OF_PHOTOS int]
This table basically represents how many photos were uploaded per day.
I have a query that summarizes the number of photos uploaded per month:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
This does what I want, but I would like to run this query at the beginning of each month, let's say 07-02-2012, and get all data EXCLUDING the current month. How would I add a WHERE clause that ignores all entries whose date falls in the current year and month?
Here is one way:
where to_char(dateid, 'YYYY-MM') <> to_char(sysdate, 'YYYY-MM')
To preserve any indexing strategy you may have on dateid:
select extract(year from dateid) as year, extract(month from dateid) as month, count(1) as Photos
from picture_table
WHERE (dateid < TRUNC(SYSDATE,'MM') OR dateid >= ADD_MONTHS(TRUNC(SYSDATE,'MM'),1))
group by extract(year from dateid), extract(month from dateid)
order by 1, 2
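A sketch of the index-friendly version in Python's sqlite3, using the same half-open range idea: compare dateid directly against the start of the current month and the start of the next one, rather than wrapping dateid in a function. SQLite's date(x, 'start of month') and '+1 month' modifiers stand in for TRUNC(SYSDATE,'MM') and ADD_MONTHS, a fixed reference date replaces SYSDATE so the result is repeatable (reading 07-02-2012 as July 2, 2012 is an assumption), and the rows are invented.

```python
import sqlite3

# Hypothetical picture_table rows, ISO dates stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picture_table (dateid TEXT, count_of_photos INTEGER)")
conn.executemany(
    "INSERT INTO picture_table VALUES (?, ?)",
    [("2012-05-30", 3), ("2012-06-01", 4), ("2012-06-15", 1),
     ("2012-07-01", 2), ("2012-07-02", 6)],
)

today = "2012-07-02"  # stand-in for SYSDATE so the output is repeatable

query = """
SELECT CAST(strftime('%Y', dateid) AS INTEGER) AS year,
       CAST(strftime('%m', dateid) AS INTEGER) AS month,
       COUNT(1) AS photos
FROM picture_table
-- bare comparisons on dateid: before this month, or from next month on
WHERE dateid < date(:today, 'start of month')
   OR dateid >= date(:today, 'start of month', '+1 month')
GROUP BY year, month
ORDER BY 1, 2
"""
rows = conn.execute(query, {"today": today}).fetchall()
print(rows)
```

Both July rows are excluded. Like the question's query, COUNT(1) counts rows (days), not the count_of_photos values.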