SQL - MAX GROUP BY, Include Additional Column [duplicate] - sql

I have a table of purchases in the following format (simplified):
tDate       tAmount  tDescription
2021-01-01  1.50     Ice Cream
2021-01-01  1.60     Pencils
2021-02-03  4.50     Paper
2021-02-04  2.50     Staples
I'm trying to find the MAX() value of a purchase for each month, simple enough, but I can't seem to include additional columns in the result set from the row selected as the max. The output I'm looking for is:
tDate                    tMonth    tYear  tAmount  tDescription
2021-01-01 00:00:00.000  January   2021   1.60     Pencils
2021-02-01 00:00:00.000  February  2021   4.50     Paper
My thought was to add a start-of-month column for each row, group by that, and apply MAX() to the amount, along with a date filter. That works, although I also had to add tMonth and tYear to the GROUP BY.
What I've tried is:
SELECT DATEADD(MONTH, DATEDIFF(MONTH, 0, [tDate]), 0),
       FORMAT([tDate], 'MMMM') AS 'Month',
       FORMAT([tDate], 'yyyy') AS 'Year',
       MAX([tAmount]) AS 'tAmount'
-- Source Table
FROM t
-- Last X months
WHERE [tDate] >= DATEADD(month, -6, getDate())
-- Group by the month
GROUP BY DATEADD(MONTH, DATEDIFF(MONTH, 0, [tDate]), 0),
         FORMAT([tDate], 'MMMM'),
         FORMAT([tDate], 'yyyy')
-- Order
ORDER BY DATEADD(MONTH, DATEDIFF(MONTH, 0, [tDate]), 0) DESC
Which gives me something very close, but as soon as I add the [tDescription] column I'll receive the 'column not included in aggregate or group by' error, and I obviously can't include the column in the group by, otherwise I'll end up with a row for each.
So I'm pretty stuck on the best approach to include the [tDescription] column in the results, and I have a feeling this query is flawed. Does anyone have any ideas?

You can use window functions:
select t.*
from (select t.*,
             row_number() over (partition by year(tDate), month(tDate) order by tAmount desc) as seqnum
      from t
     ) t
where seqnum = 1;
To include the name of the month, you can add datename(month, tDate). However, that seems redundant with the tDate column.
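As a rough, runnable illustration of the row_number() approach, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than SQL Server, so strftime('%Y-%m', tDate) stands in for year(tDate), month(tDate), and the in-memory table is just the sample data from the question.

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample table (SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (tDate TEXT, tAmount REAL, tDescription TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2021-01-01", 1.50, "Ice Cream"),
        ("2021-01-01", 1.60, "Pencils"),
        ("2021-02-03", 4.50, "Paper"),
        ("2021-02-04", 2.50, "Staples"),
    ],
)

# Number the rows inside each month from the largest amount down,
# then keep only row 1 of each month. Every column of that row survives,
# which is what plain MAX() + GROUP BY cannot give you.
query = """
SELECT tDate, tAmount, tDescription
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY strftime('%Y-%m', tDate)  -- assumption: SQLite stand-in for year()/month()
               ORDER BY tAmount DESC
           ) AS seqnum
    FROM t
) ranked
WHERE seqnum = 1
ORDER BY tDate
"""
top_per_month = conn.execute(query).fetchall()
for row in top_per_month:
    print(row)
```

Note that row_number() picks exactly one row per month even when two purchases tie for the maximum; which of the tied rows wins is not defined unless you add a tie-breaker to the ORDER BY.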

Related

SQL - Get historic count of rows collected within a certain period by date

For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this
id   dates
1    2002-01-01
2    2002-01-01
3    2002-01-01
...  ...
3    2023-01-10
If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this returns a single result: today's count, as determined by current_date.
I would like to see a table of such counts, as if I had run this analysis yesterday, the day before, and so on. So the expected result would be something like
counts  date
1235    2023-01-10
1234    2023-01-09
1265    2023-01-08
...     ...
7383    2022-12-11
So, for example, if the current_date were 2023-01-10, my query would have returned 1235.
If you need a distinct count of ids from the 30 days up to and including each date, the below should work:
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
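To see the anchor-date join in action, here is a small sketch run against SQLite through Python's sqlite3 module. The rows are made up for illustration, and date(d.dates, '-29 days') is SQLite's stand-in for DATEADD(DAY, -29, D.dates); the output column is named anchor_date here purely for readability.

```python
import sqlite3

# Hypothetical sample data; the real table would hold years of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, dates TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [
        (1, "2023-01-01"),
        (2, "2023-01-01"),
        (1, "2023-01-10"),
        (3, "2023-01-10"),
        (3, "2023-02-15"),
    ],
)

# Each distinct date acts as an anchor; the join pulls in every row from
# the 30-day window ending on that anchor (inclusive), and the distinct
# ids in that window are counted.
query = """
WITH cte_dates AS (
    SELECT DISTINCT dates FROM source
)
SELECT COUNT(DISTINCT s.id) AS counts,
       d.dates AS anchor_date
FROM cte_dates d
LEFT JOIN source s
       ON s.dates BETWEEN date(d.dates, '-29 days') AND d.dates
GROUP BY d.dates
ORDER BY d.dates DESC
"""
rolling_counts = conn.execute(query).fetchall()
for row in rolling_counts:
    print(row)
```

Only dates that actually occur in source become anchors here. If you need a row for every calendar day, the anchor list would have to come from a calendar table or a generated date series instead.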
If the distinct count didn't matter, you could likely simplify with a rolling sum, only hitting the source table once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
,SUM(COUNT(1)) OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day.
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC;
This won't work, though, if you have gaps in your list of dates, as it will just include the latest 30 dates available. In that case the first example, without DISTINCT in the count, will do the trick.
SELECT count(*) AS Counts,
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

Remove Duplicates and show Total sales by year and month

I am trying to write a query that lists all 11 years, and the 12 months within each year, with the sales data for each month. Any suggestions? This is my query so far:
SELECT
distinct(extract(year from date)) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by date
It just creates a long list of over 2000 results, when I am expecting at most 132: one for each month in each year.
If you get more results than you expected, you should change your GROUP BY clause.
You can try (MySQL syntax):
group by YEAR(date), MONTH(date)
or
group by EXTRACT(YEAR_MONTH FROM date)
Grouping takes a part of the date, in your case the year and month, collects all rows that match, and sums them up.
So GROUP BY date makes no sense whatsoever, as you don't want the sum for every day.
So use this:
SELECT
extract(year from date) as year
,extract(MONTH from date) as month
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1,2
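Or you can combine year and month into a single value (YEAR_MONTH is a MySQL unit; if your database lacks it, truncating the date to the month gives an equivalent grouping key):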
SELECT
extract(YEAR_MONTH from date) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1
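The difference between grouping by the full date and grouping by year and month is easy to check on a toy table. The sketch below uses Python's sqlite3 module with invented rows; strftime is SQLite's stand-in for EXTRACT, and the table name sales and alias month_sales are just assumptions for the example.

```python
import sqlite3

# Hypothetical sales rows: three different days in 2021, one in 2022.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, sale_dollars REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2021-01-05", 10.0),
        ("2021-01-20", 15.0),
        ("2021-02-01", 7.0),
        ("2022-01-03", 4.0),
    ],
)

# Grouping by the full date yields one row per distinct day.
by_day = conn.execute(
    "SELECT date, SUM(sale_dollars) FROM sales GROUP BY date"
).fetchall()

# Grouping by the extracted year and month yields one row per month.
by_month = conn.execute(
    """
    SELECT CAST(strftime('%Y', date) AS INTEGER) AS year,
           CAST(strftime('%m', date) AS INTEGER) AS month,
           SUM(sale_dollars) AS month_sales
    FROM sales
    GROUP BY year, month
    ORDER BY year, month
    """
).fetchall()

print(len(by_day))   # one row per day
print(by_month)      # one row per (year, month)
```

The two January 2021 sales collapse into a single row only in the second query, which is the behaviour the question is after.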

SQL: year to end of the month?

I have a dataset in the format of: Date Amount such as:
Date Amount
2018-01 100
2018-02 200
2018-03 300
I want the sum year to date at the end of each month resulting in:
Date Amount
2018-01 100
2018-02 300 (100+200)
2018-03 600 (100+200+300)
How do I go about referencing the previous year-to-date sum in the previous row?
You can do this with a window function:
select date,
sum(amount) over (order by date) as so_far
from the_table
order by date;
The above is ANSI standard SQL.
You need a running total for this:
SELECT
Date,
SUM (Amount) OVER (ORDER BY Date ASC) AS RunningAmount
FROM Table1
Order by Date asc;
You can use the OVER clause for that. Try the queries below.
Use PARTITION BY DATEPART(yy, Date) if you want a separate cumulative total for each year:
SELECT Date, Amount, SUM(Amount) OVER (PARTITION BY DATEPART(yy, Date) ORDER BY DATEPART(mm, Date)) AS CumulativeTotal
FROM Table
Or, if you want a cumulative total across all years, remove the PARTITION BY DATEPART(yy, Date) part; ORDER BY alone is enough.
SELECT Date, Amount, SUM(Amount) OVER (ORDER BY Date) AS CumulativeTotal
FROM Table
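To make the effect of PARTITION BY concrete, here is a small sketch in Python's sqlite3 module. It assumes SQLite, so substr(date, 1, 4) stands in for DATEPART(yy, Date), and the rows (which deliberately span a year boundary) are invented for the example.

```python
import sqlite3

# Hypothetical monthly amounts spanning two years, stored as 'YYYY-MM' text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [("2018-11", 100), ("2018-12", 200), ("2019-01", 50), ("2019-02", 25)],
)

# Running total that restarts at the beginning of each year.
per_year = conn.execute(
    """
    SELECT date,
           SUM(amount) OVER (
               PARTITION BY substr(date, 1, 4)  -- assumption: stand-in for DATEPART(yy, Date)
               ORDER BY date
           ) AS cumulative
    FROM Table1
    ORDER BY date
    """
).fetchall()

# Running total across all years: same query without PARTITION BY.
all_years = conn.execute(
    """
    SELECT date, SUM(amount) OVER (ORDER BY date) AS cumulative
    FROM Table1
    ORDER BY date
    """
).fetchall()

print(per_year)
print(all_years)
```

With the partition, the total drops back to 50 in 2019-01; without it, the total keeps climbing from 300 to 350.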
Refer to this for more information: Producing a moving average and cumulative total
Another option would be to use a correlated subquery:
select t.*,
(select sum(t1.amount) from table t1 where t1.date <= t.date) as amount
from table t;
This would work in almost all DBMSs.
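For completeness, the correlated-subquery variant can be tried out the same way. This is only a sketch using Python's sqlite3 module and the sample rows from the question; the table is called amounts here (a made-up name) because table itself is a reserved word.

```python
import sqlite3

# Sample rows from the question, in a hypothetical table named "amounts".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amounts (date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO amounts VALUES (?, ?)",
    [("2018-01", 100), ("2018-02", 200), ("2018-03", 300)],
)

# For every outer row, the subquery sums all rows dated on or before it.
correlated = conn.execute(
    """
    SELECT t.date,
           (SELECT SUM(t1.amount)
            FROM amounts t1
            WHERE t1.date <= t.date) AS running_amount
    FROM amounts t
    ORDER BY t.date
    """
).fetchall()
print(correlated)
```

The subquery is evaluated once per outer row, so on large tables it will usually be slower than the window-function version; its main appeal is that it does not need window-function support.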

How to get unique dates based on from_date and to_date in SQL Server

from_date to_date duration
-------------------------------------
2018-10-01 2018-10-10 9
2018-10-05 2018-10-07 3
If I provide input #from_date = 2018-10-01, to_date = 2018-10-11, I want to display count as 9
How about that:
SELECT DATEDIFF(DAY,'20181001','20181011')-1
--To select a single value per row
SELECT
DATEDIFF(DAY,from_date,to_date) as duration
FROM
SomeTable
You could apply a WHERE clause to filter to just the specific row whose duration you want returned, or wrap the DATEDIFF function in AVG() or SUM() to get the average or total of all the durations in the table. You can do all kinds of very complex things with T-SQL. For instance, the query below gets the average duration for each month in which something was started (from_date) during the year 2017.
E.g.:
SELECT
DATEPART(Month, from_date) as Month,
AVG(DATEDIFF(DAY, from_date, to_date)) as AvgDuration
FROM SomeTable
WHERE
DATEPART(Year, from_date) = 2017
GROUP BY
DATEPART(Month, from_date)
Hope this helps. If not, feel free to try again. :)

SQL Server / SSRS: Calculating monthly average based on grouping and historical values

I need to calculate an average based on historical data for a graph in SSRS:
Current Month
Previous Month
2 Months ago
6 Months ago
This query returns the average for each month:
SELECT
    avg_val1, month, year
FROM
    (SELECT
         (sum_val1 / count) as avg_val1, month, year
     FROM
         (SELECT
              SUM(val1) AS sum_val1, SUM(count) AS count, month, year
          FROM
              (SELECT
                   COUNT(val1) AS count, SUM(val1) AS val1,
                   MONTH([SnapshotDate]) AS month,
                   YEAR([SnapshotDate]) AS year
               FROM
                   [DC].[dbo].[KPI_Values]
               WHERE
                   [SnapshotKey] = 'Some text here'
                   AND No = '001'
                   AND Channel = '999'
               GROUP BY
                   [SnapshotDate]) AS sub3
          GROUP BY
              month, year, count) AS sub2
     GROUP BY sum_val1, count, month, year) AS sub1
ORDER BY
    year, month ASC
When I add the following WHERE clause I get the average for March (2 months ago):
WHERE month = MONTH(GETDATE())-2
AND year = YEAR(GETDATE())
Now the problem is when I want to retrieve data from 6 months ago: MONTH(GETDATE()) - 6 will output -1 instead of 11. I also have an issue with the fact that the year changes to 2016, and I am a bit unsure of how to implement that logic in my query.
I think I might be going about this wrong... Any suggestions?
Subtract the months from the date using the DATEADD function before you do your comparison. For example:
WHERE SnapshotDate BETWEEN DATEADD(month, -6, GETDATE()) AND GETDATE()
MONTH(GETDATE()) returns an int, so subtracting from it can give 0 or a negative value. You need a scalar user-defined function to manage this, adding 12 when the result is <= 0 (and moving the year back by one at the same time).
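The wrap-around arithmetic such a function would have to do can be sketched outside SQL. The helper below is a hypothetical illustration in Python (the name months_ago is made up), using May 2017 as the current date, which is what the question's numbers imply.

```python
from datetime import date
from typing import Tuple


def months_ago(today: date, n: int) -> Tuple[int, int]:
    """Return (year, month) of the month that lies n months before `today`."""
    # Flatten year and month into a single month counter, subtract,
    # then split back. divmod handles the year rollover, so no special
    # case is needed for results that would be <= 0.
    total = today.year * 12 + (today.month - 1) - n
    year, month_index = divmod(total, 12)
    return year, month_index + 1


may_2017 = date(2017, 5, 15)
print(months_ago(may_2017, 2))  # stays within the same year
print(months_ago(may_2017, 6))  # crosses into the previous year
```

This is the same bookkeeping that DATEADD(month, -6, GETDATE()) does internally, which is why comparing against a DATEADD result is usually simpler than patching up MONTH() and YEAR() separately.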