How exactly does HAVING work with GROUP BY? - sql

As with most programmers, I want to be efficient in my code. In this case, I want everything in 1 statement rather than broken up into many parts.
I am writing a query that provides the highest carbon footprint of each industry group from the most recent year. I'm providing how many companies are in each industry group and grouped by the industry group and year, too.
I want to set MAX(year) in HAVING so that I have the most recent year, but is it possible to do so?
This is what I coded:
SELECT industry_group,
COUNT(company) AS count_industry,
ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint
FROM product_emissions
GROUP BY industry_group, year
HAVING MAX(year) = year
ORDER BY total_industry_footprint DESC
LIMIT 10;
This code provides the industry groups, the count of companies in each industry group, and the carbon footprint of each industry group. HAVING MAX(year) = year doesn't do anything; I still get the highest carbon footprint in descending order, but it's not by the most recent year.
The correct code is:
SELECT industry_group,
COUNT(company) AS count_industry,
ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint
FROM product_emissions
GROUP BY industry_group, year
HAVING year = 2017
ORDER BY total_industry_footprint DESC;
Any suggestions?

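This has nothing to do with grouping. Because year is in the GROUP BY, every group contains exactly one year, so HAVING MAX(year) = year is true for every group and filters nothing. You need to filter on the year before grouping: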
SELECT industry_group,
COUNT(company) AS count_industry,
ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint
FROM product_emissions
WHERE YEAR = (SELECT MAX(YEAR) FROM product_emissions)
GROUP BY industry_group
ORDER BY total_industry_footprint DESC
LIMIT 10;
If you need the most recent year within each industry group instead, use a correlated subquery:
SELECT industry_group,
COUNT(company) AS count_industry,
ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint
FROM product_emissions pe
WHERE YEAR = (SELECT MAX(YEAR) FROM product_emissions pei where pe.industry_group = pei.industry_group )
GROUP BY industry_group
ORDER BY total_industry_footprint DESC
LIMIT 10;
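A small runnable sketch of the difference, using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample rows: (industry_group, company, year, carbon_footprint_pcf).
ROWS = [
    ("Energy", "A", 2016, 100.0),
    ("Energy", "B", 2017, 40.0),
    ("Energy", "C", 2017, 10.0),
    ("Media", "D", 2016, 5.0),
    ("Media", "E", 2017, 7.5),
]

# The question's approach: year is a grouping column, so each group holds one
# year and MAX(year) = year can never be false.
HAVING_VERSION = """
SELECT industry_group, year, COUNT(company), ROUND(SUM(carbon_footprint_pcf), 1)
FROM product_emissions
GROUP BY industry_group, year
HAVING MAX(year) = year
ORDER BY industry_group, year
"""

# The answer's approach: filter rows to the latest year, then group.
WHERE_VERSION = """
SELECT industry_group, COUNT(company), ROUND(SUM(carbon_footprint_pcf), 1) AS total
FROM product_emissions
WHERE year = (SELECT MAX(year) FROM product_emissions)
GROUP BY industry_group
ORDER BY total DESC
"""

def run(sql):
    """Load the sample rows into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_emissions ("
        "industry_group TEXT, company TEXT, year INTEGER, carbon_footprint_pcf REAL)"
    )
    conn.executemany("INSERT INTO product_emissions VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

With this data, run(HAVING_VERSION) still returns one row for every (industry_group, year) pair, while run(WHERE_VERSION) returns only the 2017 totals.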

Related

How to conditional SQL select

My table consists of user_id, revenue, publish_month columns.
Right now I use GROUP BY user_id and SUM(revenue) to get the revenue for each individual user.
Is there a single SQL query I can use to query for user revenue across a time period conditionally? If for a specific user, there is a row for this month, I want to query for this month, last month and the month before. If there is not yet a row for this month, I want to query for last month and the two months before.
Any advice with which approach to take would be helpful. If I should be using cases, if-elses with exists or if this is do-able with a single SQL query?
UPDATE: since I did a bad job of describing the question, I've added some example data and expected results.
Where current month is not present for user 33
Where current month is present
Assuming publish_month is a DATE datatype, this should get the most recent three months of data per user...
SELECT
user_id, SUM(revenue) as s_revenue
FROM
(
SELECT
user_id, revenue, publish_month,
MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
FROM
yourtableyoudidnotname
)
summarised
WHERE
publish_month >= DATEADD(month, -2, user_latest_publish_month)
GROUP BY
user_id
If you want to limit that to the most recent 3 months out of the last 4 calendar months, just add AND publish_month >= DATEADD(month, -3, DATE_TRUNC(month, GETDATE()))
The ambiguity here is why it is important to include a Minimal Reproducible Example.
With input data and required results, we could test our code against your requirements.
If you're using strings for the publish_month, you shouldn't be, and should fix that with utmost urgency.
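The windowed MAX approach above can be tried out in SQLite through Python's sqlite3 module. This is only a sketch under assumptions: the table name user_revenue and the rows are hypothetical, publish_month is stored as first-of-month ISO text (SQLite has no DATE type), SQLite's date(x, '-2 months') stands in for DATEADD(month, -2, x), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical rows: (user_id, publish_month as first-of-month ISO text, revenue).
ROWS = [
    (33, "2022-09-01", 5),
    (33, "2022-10-01", 10),
    (33, "2022-12-01", 30),  # user 33 has no row for 2022-11
    (44, "2022-10-01", 8),
    (44, "2022-11-01", 1),
    (44, "2022-12-01", 2),
    (44, "2023-01-01", 4),
]

# Keep rows within the three calendar months ending at each user's latest month.
QUERY = """
SELECT user_id, SUM(revenue) AS s_revenue
FROM (
    SELECT user_id, revenue, publish_month,
           MAX(publish_month) OVER (PARTITION BY user_id) AS user_latest_publish_month
    FROM user_revenue
) summarised
WHERE publish_month >= date(user_latest_publish_month, '-2 months')
GROUP BY user_id
ORDER BY user_id
"""

def latest_three_calendar_months():
    """Return (user_id, summed revenue) for each user's latest 3-month window."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_revenue (user_id INTEGER, publish_month TEXT, revenue INTEGER)"
    )
    conn.executemany("INSERT INTO user_revenue VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Because the cutoff is a calendar window, user 33's missing November is not back-filled with September: only the October and December rows are summed.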
You can use a windowing function to "number" the months. In this way the most recent one will have a value of 1, the prior 2, and the one before 3. Then you can only select the items with a number of 3 or less.
Here is how:
SELECT user_id, revenue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
Now you just select the items with RN of 3 or less and do your sum:
SELECT user_id, SUM(revenue) as s_revenue
FROM (
SELECT user_id, revenue, publish_month,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY publish_month DESC) as RN
FROM yourtableyoudidnotname
) X
WHERE RN <= 3
GROUP BY user_id
You could also do this without a sub query if you use the windowing function for SUM and a range, but I think this is easier to understand.
From the comments: there could be an issue if the month is stored as a plain number and the data spans more than one year. To solve this, make the sort key largest for the most recent month. So instead of
ORDER BY publish_month DESC
you would have
ORDER BY (100*publish_year)+publish_month DESC
This means more recent years will always have a higher number, so January of 2023 will be 202301 while December of 2022 will be 202212. Since January has the bigger number, it will get a row number of 1 and December will get a row number of 2.
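The ROW_NUMBER approach can be sketched end to end with Python's sqlite3 module. The table name user_revenue and the rows are hypothetical. The sketch assumes one row per user per month, publish_month stored as ISO date text (which already sorts correctly across years, so no year*100 key is needed), and SQLite 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical rows: (user_id, publish_month as first-of-month ISO text, revenue).
ROWS = [
    (33, "2022-09-01", 5),
    (33, "2022-10-01", 10),
    (33, "2022-11-01", 20),
    (33, "2022-12-01", 30),
    (44, "2022-10-01", 8),
    (44, "2022-11-01", 1),
    (44, "2022-12-01", 2),
    (44, "2023-01-01", 4),  # crosses a year boundary
]

# Number each user's months from newest (1) to oldest, then keep the first three.
QUERY = """
SELECT user_id, SUM(revenue) AS s_revenue
FROM (
    SELECT user_id, revenue,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY publish_month DESC) AS rn
    FROM user_revenue
) x
WHERE rn <= 3
GROUP BY user_id
ORDER BY user_id
"""

def latest_three_rows_per_user():
    """Return (user_id, summed revenue) over each user's three newest rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_revenue (user_id INTEGER, publish_month TEXT, revenue INTEGER)"
    )
    conn.executemany("INSERT INTO user_revenue VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Note that this counts rows, not calendar months: a user with a gap in their history still gets their three most recent rows, however old the third one is.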

How do I write a query to find highest earning day per quarter?

I need to write SQL query to pull the single, highest-earning day for a certain brand of each quarter of 2018. I have the following but it does not pull a singular day - it pulls the highest earnings for each day.
select distinct quarter, order_event_date, max(gc) as highest_day_gc
from (
select sum(commission) as gc, order_event_date,
extract(quarter from order_event_date) as quarter
from order_table
where advertiser_id ='123'
and event_year='2018'
group by 3,2
)
group by 1,2
order by 2 DESC
You can use window functions to find the highest earning day per quarter by using rank().
select rank() over (partition by quarter order by gc desc) as rank, quarter, order_event_date, gc
from (select sum(gross_commission) gc,
order_event_date,
extract(quarter from order_event_date) quarter
from order_aggregation
where advertiser_id = '123'
and event_year = '2018'
group by order_event_date, quarter) a
You could create the query above as a view, or wrap it in a subquery, and filter it with WHERE rank = 1.
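Here is a runnable sketch of the rank-then-filter idea using Python's sqlite3 module. The rows are invented for illustration. SQLite has no EXTRACT(quarter ...), so the quarter is derived from the month number, and the year filter uses strftime instead of an event_year column; both are assumptions of this sketch, as is the alias rnk. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical rows: (advertiser_id, order_event_date, gross_commission).
ROWS = [
    ("123", "2018-01-05", 10),
    ("123", "2018-01-05", 15),   # same day, so the daily total is 25
    ("123", "2018-02-10", 20),
    ("123", "2018-04-01", 5),
    ("123", "2018-05-20", 8),
    ("999", "2018-05-21", 100),  # different advertiser, must be ignored
]

QUERY = """
SELECT quarter, order_event_date, gc
FROM (
    SELECT quarter, order_event_date, gc,
           RANK() OVER (PARTITION BY quarter ORDER BY gc DESC) AS rnk
    FROM (
        -- daily totals; months 1-3 map to quarter 1, 4-6 to quarter 2, and so on
        SELECT (CAST(strftime('%m', order_event_date) AS INTEGER) + 2) / 3 AS quarter,
               order_event_date,
               SUM(gross_commission) AS gc
        FROM order_aggregation
        WHERE advertiser_id = '123'
          AND strftime('%Y', order_event_date) = '2018'
        GROUP BY quarter, order_event_date
    ) daily
) ranked
WHERE rnk = 1
ORDER BY quarter
"""

def highest_day_per_quarter():
    """Return (quarter, day, total) for the top-earning day of each quarter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_aggregation ("
        "advertiser_id TEXT, order_event_date TEXT, gross_commission INTEGER)"
    )
    conn.executemany("INSERT INTO order_aggregation VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

RANK() keeps ties, so two days with the same top total in a quarter would both be returned; ROW_NUMBER() would pick just one of them.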
You could add a LIMIT clause at the end of the statement. Also, change the last ORDER BY clause to ORDER BY highest_day_gc. Note that this returns only the single highest-earning day of the whole year, not one per quarter. Something like:
SELECT DISTINCT quarter
,order_event_date
,max(gc) as highest_day_gc
FROM (SELECT sum(gross_commission) as gc
,order_event_date
,extract(quarter from order_event_date) as quarter
FROM order_aggregation
WHERE advertiser_id ='123'
AND event_year='2018'
GROUP BY 3,2) as subquery
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 1

SQLite getting min and max and average using nested queries

I have these tables
record: sid (string), cid (string), quarter (string), year (integer), grade(integer)
student: sid (string)
For every student who has taken at least one class (meaning the student appears in the record table at least once), I need to get their GPA in the most recent quarter they were enrolled in. I need to display sid, quarter, year, and grade (GPA).
There are 3 quarters in a given calendar year, and it may be helpful to observe that the quarters occur in reverse alphabetical order ('W' > 'S' > 'F'). These stand for winter, spring, and fall respectively, with fall being the latest quarter of the year.
This is what I came up with:
select sid, quarter, year, avg(grade) as gpa
from (select sid, min(quarter) as quarter, year, avg(grade) as grade
from (select *, max(year) as maxy
from record
group by sid)
group by sid)
group by sid;
This gives me the average grade for all quarters/years enrolled, and doesn't give me the latest quarter either.
I can only use constructs such as NOT EXISTS / EXISTS, NOT IN / IN, GROUP BY, and ORDER BY. I cannot use rank().
I was told that I should use NOT EXISTS to get the latest quarter, since a quarter is the most recent one when no succeeding quarter exists for that student.
Any help would be greatly appreciated. Thank you!
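You want a solution using NOT EXISTS? Here you go. A row is in the latest quarter when the same student has no row with a later year, or with the same year and an alphabetically smaller (that is, later) quarter.
Select t.*
From record t
Where not exists
(Select 1 from record tt
Where tt.sid = t.sid
And (tt.year > t.year
Or (tt.year = t.year And tt.quarter < t.quarter)))
The query above returns all of a student's rows for their latest quarter; you can then apply whatever aggregate function you need.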
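As a sketch of the full NOT EXISTS solution including the GPA, the following uses Python's sqlite3 module with invented rows; the table and column names come from the question. A row counts as "latest" when the same student has no other row with a later year, or with the same year and an alphabetically smaller quarter (since 'F' comes after 'S', which comes after 'W', within a year).

```python
import sqlite3

# Hypothetical rows: (sid, cid, quarter, year, grade).
ROWS = [
    ("s1", "c1", "W", 2019, 4),
    ("s1", "c2", "F", 2019, 2),
    ("s1", "c3", "F", 2019, 3),
    ("s1", "c4", "F", 2018, 1),
    ("s2", "c1", "S", 2020, 4),
    ("s2", "c2", "W", 2020, 2),
    ("s2", "c3", "F", 2019, 1),
]

QUERY = """
SELECT r.sid, r.quarter, r.year, AVG(r.grade) AS gpa
FROM record r
WHERE NOT EXISTS (
    SELECT 1
    FROM record later
    WHERE later.sid = r.sid
      AND (later.year > r.year
           OR (later.year = r.year AND later.quarter < r.quarter))
)
GROUP BY r.sid, r.quarter, r.year
ORDER BY r.sid
"""

def latest_quarter_gpa():
    """Return (sid, quarter, year, gpa) for each student's most recent quarter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record (sid TEXT, cid TEXT, quarter TEXT, year INTEGER, grade INTEGER)"
    )
    conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Student s1's latest quarter is fall 2019 (two classes, average 2.5), and s2's is spring 2020, because spring follows winter within the same year.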
Use row_number():
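select sid, quarter, year, gpa
from (select sid, quarter, year, avg(grade) as gpa,
-- one row per student and term; the latest term gets seqnum 1
row_number() over (partition by sid
order by year desc,
case quarter when 'W' then 1 when 'S' then 2 when 'F' then 3 else 4 end desc
) as seqnum
from record
group by sid, quarter, year
) r
where seqnum = 1;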
Although this can be simplified a little bit by relying on the "alphabetical" ordering of quarter, I don't recommend that approach. Relying on alphabetic ordering for time periods is counterintuitive and will make the code harder to understand.
If the quarter were stored as a number then it would be appropriate for use in order by.

Is there a way to count how many strings in a specific column are seen for the 1st time?

The value in column 2 is sometimes repeated, because some clients make several transactions at different times (a client can make a transaction in the first month and then another one the following year).
Is there a way for me to count how many IDs are completely new per month through a group by (never seen before)?
Please let me know if you need more context.
Thanks!
A simple way is two levels of aggregation. The inner level gets the first date for each customer. The outer summarizes by year and month:
select year(min_date), month(min_date), count(*) as num_firsts
from (select customerid, min(date) as min_date
from t
group by customerid
) c
group by year(min_date), month(min_date)
order by year(min_date), month(min_date);
Note that date/time functions depend on the database you are using, so the syntax for getting the year/month from the date may differ in your database.
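For a concrete, runnable version of the two-level aggregation, here is a sketch using Python's sqlite3 module. The table name transactions, the column name txn_date, and the rows are all hypothetical, and SQLite's strftime('%Y-%m', ...) stands in for the year()/month() functions.

```python
import sqlite3

# Hypothetical rows: (customerid, txn_date as ISO text).
ROWS = [
    ("c1", "2021-01-10"),
    ("c1", "2021-02-03"),
    ("c1", "2022-01-15"),  # returning customer, must not count as new in 2022-01
    ("c2", "2021-02-20"),
    ("c3", "2021-02-25"),
    ("c3", "2021-02-26"),
    ("c4", "2022-01-02"),
]

# Inner query: first date per customer. Outer query: count customers per month.
QUERY = """
SELECT strftime('%Y-%m', min_date) AS year_month, COUNT(*) AS num_firsts
FROM (
    SELECT customerid, MIN(txn_date) AS min_date
    FROM transactions
    GROUP BY customerid
) c
GROUP BY year_month
ORDER BY year_month
"""

def new_customers_per_month():
    """Return (year-month, number of customers first seen in that month)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customerid TEXT, txn_date TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

Months in which no customer appears for the first time produce no row at all, so a calendar table would be needed if zero counts must be shown.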
You can assign a rank to each transaction within its customer_id, so that rank 1 marks the first order for that customer_id.
The ranking is done in an inline view, which is then queried for the month and the count of customer IDs in that month, counting only rows whose rank = 1. Note that this groups by month alone; if the data spans several years, also group by EXTRACT(YEAR FROM date_of_transaction).
I have tested this on Oracle and it works as expected.
SELECT DISTINCT
EXTRACT(MONTH FROM date_of_transaction) AS month,
COUNT(customer_id)
FROM
(
SELECT
date_of_transaction,
customer_id,
RANK() OVER(PARTITION BY customer_id
ORDER BY
date_of_transaction ASC
) AS rank
FROM
table_1
)
WHERE
rank = 1
GROUP BY
EXTRACT(MONTH FROM date_of_transaction)
ORDER BY
EXTRACT(MONTH FROM date_of_transaction) ASC;
First, associate each ID with the year and month in which it first appears, then count while grouping by year and month:
SELECT count(DISTINCT t1.id) as new_customers, extract(year from t1.date) as year,
extract(month from t1.date) as month FROM table t1
WHERE not exists (SELECT 1 FROM table t2 WHERE t1.id = t2.id AND t2.date < t1.date)
GROUP BY year, month;
The results will contain the new customer count, the year, and the month.

SQL order with equal group size

I have a table with columns month, name and transaction_id. I would like to count the number of transactions per month and name. However, for each month I want to have the top N names with the highest transaction counts.
The following query groups by month and name. However the LIMIT is applied to the complete result and not per month:
SELECT
month,
name,
COUNT(*) AS transaction_count
FROM my_table
GROUP BY month, name
ORDER BY month, transaction_count DESC
LIMIT N
Does anyone have an idea how I can get the top N results per month?
Use row_number():
SELECT month, name, transaction_count
FROM (SELECT month, name, COUNT(*) AS transaction_count,
ROW_NUMBER() OVER (PARTITION BY month ORDER BY COUNT(*) DESC) as seqnum
FROM my_table
GROUP BY month, name
) mn
WHERE seqnum <= N
ORDER BY month, transaction_count DESC
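A runnable sketch of top-N-per-group using Python's sqlite3 module follows. The rows are invented, N is passed as a query parameter, and two things differ from the query above as assumptions of this sketch: the counts are computed in an inner subquery before ROW_NUMBER() is applied, and name is added as a tie-breaker so the output is deterministic. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical (month, name) pairs; transaction_id is assigned on insert.
PAIRS = [
    (1, "alice"), (1, "alice"), (1, "alice"),
    (1, "bob"), (1, "bob"),
    (1, "carol"),
    (2, "bob"), (2, "bob"),
    (2, "carol"),
    (2, "dave"),
]

QUERY = """
SELECT month, name, transaction_count
FROM (
    SELECT month, name, transaction_count,
           ROW_NUMBER() OVER (
               PARTITION BY month
               ORDER BY transaction_count DESC, name
           ) AS seqnum
    FROM (
        SELECT month, name, COUNT(*) AS transaction_count
        FROM my_table
        GROUP BY month, name
    ) counts
) mn
WHERE seqnum <= ?
ORDER BY month, transaction_count DESC, name
"""

def top_names_per_month(n):
    """Return the n names with the most transactions in each month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (month INTEGER, name TEXT, transaction_id INTEGER)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [(month, name, i) for i, (month, name) in enumerate(PAIRS, start=1)],
    )
    rows = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return rows
```

In month 2, carol and dave are tied with one transaction each; the name tie-breaker decides which of them takes the second slot. Use RANK() instead if tied names should all be kept.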