SQL sum then find max - sql

Here's what I'm trying to achieve. Basically I have a relation in which I have a count for each ID and month. I'd like to sum the counts for each id by month (middle table in the picture) and then from that find the maximum value from all those sums by month, and show the month, id, and the maximum value in that order. Here's what I've got so far:
SELECT month, MAX(summed_counts) AS maximum_result
FROM
(SELECT month, id, SUM(counts) AS summed_counts
FROM info WHERE year=2017 GROUP BY month, id)
AS final_result GROUP BY month ORDER BY month ASC;
However as soon as I add id it no longer works:
SELECT month, id, MAX(summed_counts) AS maximum_result
FROM
(SELECT month, id, SUM(counts) AS summed_counts
FROM info WHERE year=2017 GROUP BY month, id)
AS final_result GROUP BY month, id ORDER BY month ASC;
Any suggestions?

Try this (MS SQL):
select distinct month,
(select top 1 SUM(counts)
FROM info info_detail
WHERE year=2017 and info_detail.month=info.month
GROUP BY id
order by SUM(counts) desc
) as max_value,
(select top 1 id
FROM info info_detail
WHERE year=2017 and info_detail.month=info.month
GROUP BY id
order by SUM(counts) desc
) as max_value_id
from info
where year=2017
ORDER BY month
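If your engine supports window functions, the same result can be sketched without repeating the correlated subquery. This is only an illustration and not the MS SQL above: it assumes SQLite 3.25+ driven through Python's sqlite3 module, and the sample rows are invented. It sums counts per month and id, ranks the sums within each month, and keeps rank 1 (a tie would return more than one id for that month).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (year INTEGER, month INTEGER, id INTEGER, counts INTEGER)")
conn.executemany(
    "INSERT INTO info VALUES (?, ?, ?, ?)",
    [
        # invented sample rows: (year, month, id, counts)
        (2017, 1, 10, 5), (2017, 1, 10, 7), (2017, 1, 20, 9),
        (2017, 2, 10, 1), (2017, 2, 20, 4), (2017, 2, 20, 3),
        (2016, 1, 30, 100),  # excluded by the year filter
    ],
)

query = """
WITH sums AS (
    -- step 1: sum the counts for each month and id
    SELECT month, id, SUM(counts) AS summed_counts
    FROM info
    WHERE year = 2017
    GROUP BY month, id
),
ranked AS (
    -- step 2: rank the sums inside each month, largest first
    SELECT month, id, summed_counts,
           RANK() OVER (PARTITION BY month ORDER BY summed_counts DESC) AS rnk
    FROM sums
)
SELECT month, id, summed_counts AS maximum_result
FROM ranked
WHERE rnk = 1
ORDER BY month
"""

top_per_month = conn.execute(query).fetchall()
print(top_per_month)  # [(1, 10, 12), (2, 20, 7)]
```

The original attempt returned every id because grouping by month and id makes MAX run over a single row per group; ranking within the month avoids that.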

Related

How do I write a query to find highest earning day per quarter?

I need to write a SQL query to pull the single highest-earning day for a certain brand in each quarter of 2018. I have the following, but it does not return a single day per quarter; it returns the highest earnings for each day.
select distinct quarter, order_event_date, max(gc) as highest_day_gc
from (
select sum(commission) as gc, order_event_date,
extract(quarter from order_event_date) as quarter
from order_table
where advertiser_id ='123'
and event_year='2018'
group by 3,2
)
group by 1,2
order by 2 DESC
You can use window functions to find the highest earning day per quarter by using rank().
select rank() over (partition by quarter order by gc desc) as rank, quarter, order_event_date, gc
from (select sum(gross_commission) gc,
order_event_date,
extract(quarter from order_event_date) quarter
from order_aggregation
where advertiser_id = '123'
and event_year = '2018'
group by order_event_date, quarter) a
You could create the query above as a view and filter it with WHERE rank = 1.
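The rank-then-filter idea can also be written as a single statement instead of a view. The sketch below is an assumption-laden illustration: it runs on SQLite 3.25+ through Python's sqlite3 module, so the quarter is derived from strftime rather than EXTRACT(QUARTER ...), the date column is stored as ISO text, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_aggregation ("
    "advertiser_id TEXT, event_year TEXT, order_event_date TEXT, gross_commission INTEGER)"
)
conn.executemany(
    "INSERT INTO order_aggregation VALUES (?, ?, ?, ?)",
    [
        # invented sample rows
        ("123", "2018", "2018-01-15", 10), ("123", "2018", "2018-01-15", 5),
        ("123", "2018", "2018-02-01", 12),
        ("123", "2018", "2018-05-03", 7), ("123", "2018", "2018-06-09", 20),
        ("999", "2018", "2018-06-09", 500),  # other advertiser, filtered out
    ],
)

query = """
WITH daily AS (
    -- total commission per day; SQLite has no EXTRACT, so the quarter
    -- is computed from the month number with integer division
    SELECT order_event_date,
           (CAST(strftime('%m', order_event_date) AS INTEGER) + 2) / 3 AS quarter,
           SUM(gross_commission) AS gc
    FROM order_aggregation
    WHERE advertiser_id = '123' AND event_year = '2018'
    GROUP BY order_event_date
),
ranked AS (
    SELECT quarter, order_event_date, gc,
           RANK() OVER (PARTITION BY quarter ORDER BY gc DESC) AS rnk
    FROM daily
)
SELECT quarter, order_event_date, gc
FROM ranked
WHERE rnk = 1
ORDER BY quarter
"""

best_days = conn.execute(query).fetchall()
print(best_days)  # [(1, '2018-01-15', 15), (2, '2018-06-09', 20)]
```

With RANK, two days tied for the top of a quarter are both returned; ROW_NUMBER would keep exactly one of them.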
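You could add a LIMIT clause at the end of the statement. Also, change the last ORDER BY clause to ORDER BY highest_day_gc (column 3). Note that LIMIT 1 returns only the single highest-earning day overall, not one day per quarter. Something like: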
SELECT DISTINCT quarter
,order_event_date
,max(gc) as highest_day_gc
FROM (SELECT sum(gross_commission) as gc
,order_event_date
,extract(quarter from order_event_date) as quarter
FROM order_aggregation
WHERE advertiser_id ='123'
AND event_year='2018'
GROUP BY 3,2) as subquery
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1

First users by categories in BigQuery

How can I count the new and existing users by categories and years?
For instance, during 2015-2020, if someone first bought a product in category_A in 2016, they are counted as a new user in category_A for 2016, even though this user bought a product in category_B in 2015.
Table_1 (Columns: product_name, date, category, sales, user_id)
I want to get the result as below
One approach uses two levels of aggregation:
select extract(year from mindate) yr, category, count(*) num_new
from (
select user_id, category, min(date) mindate
from table_1
group by user_id, category
) t
group by extract(year from mindate), category
The subquery retrieves the first purchase date of each user by category. Then, the outer query aggregates by the year of that date.
If you want the count of current users as well, then it is a bit different. You can use a window function in the subquery rather than aggregation, then count distinct values in the outer query:
select extract(year from date) yr, category,
count(distinct if(date = mindate, user_id, null)) num_new, -- users whose first purchase in the category falls in this year
count(distinct user_id) num_total -- all users who bought in the category this year
from (
select date, user_id, category, min(date) over(partition by user_id, category) mindate
from table_1
) t
group by extract(year from date), category
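To see the window-function approach on concrete rows, here is a small sketch. It is not BigQuery: it assumes SQLite 3.25+ via Python's sqlite3 module, uses strftime for the year and CASE in place of IF, and the rows are invented. Each purchase is tagged with the user's first purchase date in that category, then distinct users are counted per purchase year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 (product_name TEXT, date TEXT, category TEXT, sales INTEGER, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)",
    [
        # invented sample rows
        ("p1", "2015-01-10", "B", 10, 1),  # user 1 is new in B in 2015
        ("p2", "2016-03-01", "A", 20, 1),  # user 1 is new in A in 2016
        ("p3", "2017-05-01", "A", 15, 1),  # user 1 returns to A in 2017
        ("p4", "2017-02-02", "A", 30, 2),  # user 2 is new in A in 2017
    ],
)

query = """
WITH firsts AS (
    -- attach each user's first purchase date per category to every row
    SELECT date, user_id, category,
           MIN(date) OVER (PARTITION BY user_id, category) AS mindate
    FROM table_1
)
SELECT CAST(strftime('%Y', date) AS INTEGER) AS yr,
       category,
       COUNT(DISTINCT CASE WHEN date = mindate THEN user_id END) AS num_new,
       COUNT(DISTINCT user_id) AS num_total
FROM firsts
GROUP BY yr, category
ORDER BY yr, category
"""

users_by_year = conn.execute(query).fetchall()
print(users_by_year)  # [(2015, 'B', 1, 1), (2016, 'A', 1, 1), (2017, 'A', 1, 2)]
```

Existing users for a year can then be taken as num_total minus num_new.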
Below is for BigQuery Standard SQL
#standardSQL
WITH temp AS (
SELECT *,
0 = COUNT(1) OVER(
PARTITION BY user_id, category
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) new_user
FROM `project.dataset.table_1`
ORDER BY date, user_id
)
SELECT EXTRACT(YEAR FROM date) AS year,
category,
COUNT(DISTINCT IF(new_user, user_id, NULL)) AS num_new,
COUNT(DISTINCT IF(new_user, NULL, user_id)) AS num_existing
FROM temp
GROUP BY year, category

SQL order with equal group size

I have a table with columns month, name and transaction_id. I would like to count the number of transactions per month and name. However, for each month I want to have the top N names with the highest transaction counts.
The following query groups by month and name. However the LIMIT is applied to the complete result and not per month:
SELECT
month,
name,
COUNT(*) AS transaction_count
FROM my_table
GROUP BY month, name
ORDER BY month, transaction_count DESC
LIMIT N
Does anyone have an idea how I can get the top N results per month?
Use row_number():
SELECT month, name, transaction_count
FROM (SELECT month, name, COUNT(*) AS transaction_count,
ROW_NUMBER() OVER (PARTITION BY month ORDER BY COUNT(*) DESC) as seqnum
FROM my_table
GROUP BY month, name
) mn
WHERE seqnum <= N
ORDER BY month, transaction_count DESC
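Here is a runnable sketch of the row_number() approach with N passed as a bound parameter. It assumes SQLite 3.25+ through Python's sqlite3 module; the helper name top_n_per_month and the sample rows are hypothetical, not part of the question.

```python
import sqlite3

def top_n_per_month(conn, n):
    """Return (month, name, transaction_count) for the n busiest names in each month."""
    query = """
    WITH counts AS (
        SELECT month, name, COUNT(*) AS transaction_count
        FROM my_table
        GROUP BY month, name
    ),
    numbered AS (
        -- number the names inside each month, highest count first
        SELECT month, name, transaction_count,
               ROW_NUMBER() OVER (PARTITION BY month
                                  ORDER BY transaction_count DESC) AS seqnum
        FROM counts
    )
    SELECT month, name, transaction_count
    FROM numbered
    WHERE seqnum <= ?
    ORDER BY month, transaction_count DESC
    """
    return conn.execute(query, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (transaction_id INTEGER PRIMARY KEY, month INTEGER, name TEXT)")
# invented sample rows: one (month, name) pair per transaction
rows = (
    [(1, "alice")] * 3 + [(1, "bob")] * 2 + [(1, "carol")]
    + [(2, "bob")] * 3 + [(2, "carol")] * 2 + [(2, "alice")]
)
conn.executemany("INSERT INTO my_table (month, name) VALUES (?, ?)", rows)

print(top_n_per_month(conn, 2))
# [(1, 'alice', 3), (1, 'bob', 2), (2, 'bob', 3), (2, 'carol', 2)]
```

ROW_NUMBER breaks ties arbitrarily, so exactly N rows come back per month; use RANK or add a tie-breaker to the ORDER BY if tied names should be treated deterministically.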

Running Count Distinct using Over Partition By

I have a data set with user ids that have made purchases over time. I would like to show a YTD distinct count of users that have made a purchase, partitioned by State and Country. The output would have 5 columns: Country, State, Year, Month, YTD Count of Distinct Users with purchase activity.
Is there a way to do this? The following code works when I exclude the month from the view and do a distinct count:
Select Year, Country, State,
COUNT(DISTINCT (CASE WHEN ActiveUserFlag > 0 THEN MBR_ID END)) AS YTD_Active_Member_Count
From MemberActivity
Where Month <= 5
Group By 1,2,3;
The issue occurs when a user has purchases across multiple months: I can't aggregate at the monthly level and then sum, because that counts the same user more than once.
I need to see the YTD count for each month of the year, for trending purposes.
Return each member only once for the first month they make a purchase, count by month and then apply a Cumulative Sum:
select Year, Country, State, month,
sum(cnt)
over (partition by Year, Country, State
order by month
rows unbounded preceding) AS YTD_Active_Member_Count
from
(
Select Year, Country, State, month,
COUNT(*) as cnt -- 1st purchases per month
From
( -- this assumes there's at least one new active member per year/month/country
-- otherwise there would be missing rows
Select *
from MemberActivity
where ActiveUserFlag > 0 -- only active members
and Month <= 5
-- and year = 2019 -- seems to be for this year only
qualify row_number() -- only first purchase per member/year
over (partition by MBR_ID, year
order by month --? probably there's a purchase_date
) = 1
) as dt
group by 1,2,3,4
) as dt
;
Count users in the first month they appear:
select Country, State, year, month,
sum(case when ActiveUserFlag > 0 and seqnum = 1 then 1 else 0 end) as YTD_Active_Member_Count
from (select ma.*,
row_number() over (partition by MBR_ID, year order by month) as seqnum
from MemberActivity ma
) ma
where Month <= 5
group by Country, State, year, month;
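Both answers rest on the same idea: count each member once, in the first month they are active, then accumulate those counts. The sketch below walks through it on invented rows. It assumes SQLite 3.25+ via Python's sqlite3 module, so the QUALIFY clause is replaced by a filter on a CTE, and it assumes a member belongs to a single Country/State within a year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberActivity ("
    "Year INTEGER, Country TEXT, State TEXT, Month INTEGER, MBR_ID INTEGER, ActiveUserFlag INTEGER)"
)
conn.executemany(
    "INSERT INTO MemberActivity VALUES (?, ?, ?, ?, ?, ?)",
    [
        # invented sample rows
        (2019, "US", "CA", 1, 1, 1),
        (2019, "US", "CA", 2, 1, 1),  # member 1 again, must not be recounted
        (2019, "US", "CA", 2, 2, 1),
        (2019, "US", "CA", 3, 2, 1),
        (2019, "US", "CA", 3, 3, 1),
        (2019, "US", "CA", 3, 4, 0),  # inactive, ignored
        (2019, "US", "NY", 1, 5, 1),
        (2019, "US", "NY", 3, 5, 1),
        (2019, "US", "NY", 3, 6, 1),
    ],
)

query = """
WITH first_purchase AS (
    -- number each active member's months within the year
    SELECT Year, Country, State, Month, MBR_ID,
           ROW_NUMBER() OVER (PARTITION BY MBR_ID, Year ORDER BY Month) AS seqnum
    FROM MemberActivity
    WHERE ActiveUserFlag > 0
),
monthly AS (
    -- members whose first active month is this month
    SELECT Year, Country, State, Month, COUNT(*) AS cnt
    FROM first_purchase
    WHERE seqnum = 1
    GROUP BY Year, Country, State, Month
)
SELECT Country, State, Year, Month,
       SUM(cnt) OVER (PARTITION BY Year, Country, State
                      ORDER BY Month
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ytd_count
FROM monthly
ORDER BY Country, State, Year, Month
"""

ytd = conn.execute(query).fetchall()
for row in ytd:
    print(row)
```

In this sample NY has no row for month 2 because no member first appears there, which is the gap the first answer's comment warns about; joining against a calendar of months would fill it.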

Find max value for each year

I have a question that is asking:
-List the max sales for each year?
I think I have the starter query but I can't figure out how to get all the years in my answer:
SELECT TO_CHAR(stockdate,'YYYY') AS year, sales
FROM sample_newbooks
WHERE sales = (SELECT MAX(sales) FROM sample_newbooks);
This query gives me the year with the max sales. I need max sales for EACH year. Thanks for your help!
Use group by and max if all you need is year and max sales of the year.
select
to_char(stockdate, 'yyyy') year,
max(sales) sales
from sample_newbooks
group by to_char(stockdate, 'yyyy')
If you need rows with all the columns with max sales for the year, you can use window function row_number:
select
*
from (
select
t.*,
row_number() over (partition by to_char(stockdate, 'yyyy') order by sales desc) rn
from sample_newbooks t
) t where rn = 1;
If you want to get the rows with ties on sales, use rank:
select
*
from (
select
t.*,
rank() over (partition by to_char(stockdate, 'yyyy') order by sales desc) rn
from sample_newbooks t
) t where rn = 1;
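The difference between row_number and rank only shows up when two rows tie for the top spot. A small sketch makes that visible; it assumes SQLite 3.25+ via Python's sqlite3 module (strftime stands in for Oracle's TO_CHAR), and the title column, the helper top_rows, and the sample rows are hypothetical.

```python
import sqlite3

def top_rows(conn, ranking_fn):
    """Return the top-selling rows per year using ROW_NUMBER or RANK."""
    if ranking_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError("ranking_fn must be 'ROW_NUMBER' or 'RANK'")
    # the function name is checked against a fixed whitelist before interpolation
    query = f"""
    SELECT year, title, sales
    FROM (
        SELECT strftime('%Y', stockdate) AS year, title, sales,
               {ranking_fn}() OVER (PARTITION BY strftime('%Y', stockdate)
                                    ORDER BY sales DESC) AS rn
        FROM sample_newbooks
    )
    WHERE rn = 1
    ORDER BY year, title
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_newbooks (title TEXT, stockdate TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sample_newbooks VALUES (?, ?, ?)",
    [
        # invented sample rows; A and B tie for the 2015 maximum
        ("A", "2015-03-01", 100), ("B", "2015-07-01", 100), ("C", "2015-09-01", 40),
        ("D", "2016-01-05", 70), ("E", "2016-02-05", 30),
    ],
)

print(len(top_rows(conn, "ROW_NUMBER")))  # 2: one row per year, tie broken arbitrarily
print(top_rows(conn, "RANK"))
# [('2015', 'A', 100), ('2015', 'B', 100), ('2016', 'D', 70)]
```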