Account for missing values in group by month

Account for missing values in group by month - sql

I'm trying to retrieve the average number of records added to the database each month. However for months that no records were added, the row is missing and therefore not being calculated into the average.
Here is the query:
SELECT AVG(a.count) AS AVG
FROM ( SELECT COUNT(*) AS count, MONTH(InsertedTimestamp) AS Month
FROM Certificates
WHERE InsertedTimestamp >= '9/19/2014'
AND InsertedTimestamp <= '7/1/2015'
GROUP BY MONTH(InsertedTimestamp)
) AS a
When I run just the inner query, only results from months 9,10,11 are showing, because there are no records for months 12,1,2,3,4,5,6,7. How can I add these missing rows to the table in order to get the correct monthly average?
Thanks!

This is easy enough to fix, just by using sum / cnt:
SELECT COUNT(*) / (TIMESTAMPDIFF(month, '2014-09-19', '2015-07-01' ) + 1)
FROM Certificates
WHERE InsertedTimestamp >= '2014-09-19' AND
InsertedTimestamp <= '2015-07-01' ;
You don't even need the subquery.

Related

SQL - Get historic count of rows collected within a certain period by date

For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this
id
dates
1
2002-01-01
2
2002-01-01
3
2002-01-01
...
...
3
2023-01-10
If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this results would return a single result for today's count as provided by current_date.
I would like to see a table with such counts as if for example I had ran this analysis yesterday, and the day before and so on. So the expected result would be something like
counts
date
1235
2023-01-10
1234
2023-01-09
1265
2023-01-08
...
...
7383
2022-12-11
so for example, let's say that if the current_date was 2023-01-10, my query would've returned 1235.

If you need a distinct count of Ids from the 30 days up to and including each date the below should work
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
If the distinct count didnt matter you could likely simplify with a rolling sum, only hitting the source table once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
,SUM("count_daily") OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day.
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC;
;
This wont work though if you have gaps in your list of dates as it'll just include the latest 30 days available. In which case the first example without distinct in the count will do the trick.

SELECT count(*) AS Counts
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

Rolling 12 month filter criteria in SQL

Having an issue in SQL script where I’m trying to achieve filter criteria of rolling 12 months in the day column which stored data as a text in server.
Goal is to count sizes for product at retail store location over the last 12 months from the current day. Currently, in my query I'm using the criteria of year 2019 which only counts the sizes for that year but not for rolling 12 months from current date.
CALENDARDAY column is in text field in the data set and data stores in yyyymmdd format.
When trying to run below script in Tableau with GETDATE and DATEADD function it is giving me a functional error. I am trying to access SAP HANA server with below query.
Any help would be appreciated
Select
SKU, STYLE_ID, Base_Style_ID, COLOR, SIZEKEY, STORE, Year,
count(SIZEKEY)over(partition by STYLE_ID,COLOR,STORE,Year) as SZ_CNT
from
(
select
a."RAW" As SKU,
a."STYLENUM" As STYLE_ID,
mat."BASENUM" AS Base_Style_ID,
a."COLORNUM" AS COLOR,
a."SIZE" AS SIZEKEY,
a."STORENUM" AS STORE,
substring(a."CALENDARDAY",1,4) As year
from PRTRPT_XRE as a
JOIN ZAT_SKU As mat On a."RAW" = mat."SKU"
where a."ORGANIZATION" = 'M20'
and a."COLORNUM" is not null
and substring(a."CALENDARDAY",1,4) = '2019'
Group BY
a."RAW",
a."STYLENUM",
mat."BASENUM",
a."ZCOLORCD",
a."SIZE",
a."STORENUM",
substring(a."CALENDARDAY",1,4)
)

I have never worked on that DB / Server, so I don't have a way to test this.
But hopefully this will work (expecting exact 12 months before today's date)
AND ADD_MONTHS (TO_DATE (a."CALENDARDAY", 'YYYY-MM-DD'), 12) > CURRENT_DATE
or
AND ADD_MONTHS (a."CALENDARDAY", 12) > CURRENT_DATE

Below condition from one of our CALENDAR table also worked same way as ADD_MONTHS mentioned in above response
select distinct CALENDARDAY
from
(
select FISCALWEEK, CALENDARDAY, CNST, row_number()over(partition by CNST order by FISCALWEEK desc) as rnum
from
(
select distinct FISCALWEEK, CALENDARDAY, 'A' as CNST
from CALENDARTABLE
where CALENDARDAY < current_date
order by 1,2
)
) where rnum < 366

How to Average Number of Chats per Day on LEFT JOIN table in Snowflake SQL?

In Snowflake SQL dictation, how do I average the number of video chats per day using a field from a table I left joined to the entire query?
I'm thinking I have to do a SUM function to total the number of video chats and then aggregate by # of video chats for each date and then divide by 30 days (the rolling date range I specified throughout my entire query).
Any help would be appreciated as deadlines are approaching. Thank you.
SELECT DISTINCT
t1."pid",
IFNULL(t2."VideoChats",0),
t3."SFUser",
t3."TotalProviders",
t4."dimaccount.practice_specialty",
t5."Account: CMRR",
t6."CreatedDate",
t7."stg_sf_case.Date_Time_Resolved__c",
t8."stg_sf_case.Closed_Date",
t9."pid"
FROM (SELECT "pid"
FROM "EDW_PROD"."PUBLIC"."STG_MYSQL_PROVIDERMODULES" AS a
WHERE a."active"
AND a."status" = 'PURCHASED'
AND a."module_id" = '14'
GROUP BY a."pid"
) t1
LEFT JOIN (SELECT "started_at",
"pid",
COUNT(*) AS "VideoChats"
FROM "EDW_PROD"."PUBLIC"."STG_MYSQL_VIDEOCHATROOM" AS b
LEFT JOIN "EDW_PROD"."PUBLIC"."DIMACCOUNT" AS dimaccount
ON b."pid" = dimaccount."PID"
WHERE b."started_at" >= DATE_TRUNC('month', CURRENT_DATE())
AND b."started_at" < DATEADD('month', 1, DATE_TRUNC('month', CURRENT_DATE()))
AND dimaccount."CurrentRow" = 'Y'
GROUP BY b."pid", b."started_at"
) t2 ON t1."pid" = t2."pid"

For a rolling average you probably want to use a window function. Something along these lines.
SELECT AVG(VideoChats) over (partition by pid order by started_at rows between 30 preceding and current row) as AvgVideoChats
--I saw a post about AVG not allowing a sliding window, so you may have to do this instead
SELECT SUM(VideoChats) over (partition by pid order by started_at rows between 30 preceding and current row) / 30. as AvgVideoChats
You may need to do this in a wrapper around your t2 query and adjust your date filters so that there are values available for averaging, but I'm not quite clear enough on what your query is doing with dates, or what results you are looking for, to be sure.

Average Group size per month Over previous ten years

I need to find the average size (average number of employees) of all the groups (employers) that we do business with per month for the last ten years.
So I have no problem getting the average group size for each month. For the Current month I can use the following:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
group by ER.EmployerName
This will give me a list of how many employees are in each group. I can then copy and paste the column into excel get the average for the current month.
For the previous month, I want exclude any employees that were added after that month. I have a query for this too:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
group by ER.EmployerName
That will exclude all employees that were added this month. I can continue to this all the way back ten years, but I know there is a better way to do this. I have no problem running this query 120 times, copying and pasting the results into excel to compute the average. However, I'd rather learn a more efficient way to do this.
Another Question, I can't do the following, anyone know a way around it:
Select avg(count(*))
Thanks in advance guys!!
Edit: Employees that have been terminated can be found like this. NULL are employees that are currently employed.
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
join Gen_Info gne on gne.id = EE.newuserid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
and (gne.TerminationDate is NULL OR gen.TerminationDate < DATEADD(day, -14,GETDATE())
group by ER.EmployerName

Are you after a query that shows the count by year and month they were added? if so this seems pretty straight forward.
this is using mySQL date functions Year & month.
Select AVG(cnt) FROM (
Select count(*) cnt, Year(dateAdded), Month(dateAdded)
from System_Users su
join system_Employers se on se.employerid = su.employerid
group by Year(dateAdded), Month(dateAdded)) B
The inner query counts and breaks out the counts by year and month We then wrap that in a query to show the avg.
--2nd attempt but I'm Brain FriDay'd out.
This uses a Common table Expression (CTE) to generate a set of data for the count by Year, Month of the employees, and then averages out by month.
if this isn't what your after, sample data w/ expected results would help better frame the question and I can making assumptions about what you need/want.
With CTE AS (
Select Year(dateAdded) YR , Month(DateAdded) MO, count(*) over (partition by Year(dateAdded), Month(dateAdded) order by DateAdded Asc) as RunningTotal
from System_Users su
join system_Employers se on se.employerid = su.employerid
Order by YR ASC, MO ASC)
Select avg(RunningTotal), mo from cte;

SQL: Need to SUM on results that meet a HAVING statement

I have a table where we record per user values like money_spent, money_spent_on_candy and the date.
So the columns in this table (let's call it MoneyTable) would be:
UserId
Money_Spent
Money_Spent_On_Candy
Date
My goal is to SUM the total amount of money_spent -- but only for those users where they have spent more than 10% of their total money spent for the date range on candy.
What would that query be?
I know how to select the Users that have this -- and then I can output the data and sum that by hand but I would like to do this in one single query.
Here would be the query to pull the sum of Spend per user for only the users that have spent > 10% of their money on candy.
SELECT
UserId,
SUM(Money_Spent),
SUM(Money_Spent_On_Candy) / SUM(Money_Spent) AS PercentCandySpend
FROM MoneyTable
WHERE DATE >= '2010-01-01'
HAVING PercentCandySpend > 0.1;

You couldn't do this with a single query. You'd need a query that could reach back in time and retroactively filter the source table to handle only users with 10% candy spending. Luckily, that's kind of what sub-queries do:
SELECT SUM(spent) FROM (
SELECT SUM(Money_Spent) AS spent
FROM MoneyTable
WHERE (DATE >= '2010-01-01')
GROUP BY UserID
HAVING (SUM(Money_Spent_On_Candy)/SUM(Money_Spent)) > 0.1
);
The inner query does the heavy lifting of figuring out what the "10%" users spent, and then the outer query uses the sub-query as a virtual table to sum up the per-user Money_Spent sums.
Of course, this only works if you need ONLY the global total Money_Spent. If you end up needing the per-user sums as well, then you'd be better off just running the inner query and doing the global total in your application.

You can use common table expressions. Like this:
WITH temp AS (SELECT
UserId,
SUM(Money_Spent) AS MoneySpent,
SUM(Money_Spent_On_Candy)/SUM(Money_Spent) AS PercentCandySpend
FROM MoneyTable
WHERE DATE >= '2010-01-01'
HAVING PercentCandySpend > 0.1)
SELECT
UserId
SUM(MoneySpent)
FROM UserId

Or you can use a derived table:
SELECT SUM(Total_Money_Spent)
FROM ( SELECT UserId, Total_Money_Spent = SUM(Money_Spent), SUM(Money_Spent_On_Candy)/SUM(Money_Spent) AS PercentCandySpend
FROM MoneyTable
WHERE DATE >= '2010-01-01'
HAVING PercentCandySpend > 0.1 ) x;

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Account for missing values in group by month - sql

This is easy enough to fix, just by using sum / cnt: SELECT COUNT(*) / (TIMESTAMPDIFF(month, '2014-09-19', '2015-07-01' ) + 1) FROM Certificates WHERE InsertedTimestamp >= '2014-09-19' AND InsertedTimestamp <= '2015-07-01' ; You don't even need the subquery.

Related

SQL - Get historic count of rows collected within a certain period by date

Rolling 12 month filter criteria in SQL

How to Average Number of Chats per Day on LEFT JOIN table in Snowflake SQL?

Average Group size per month Over previous ten years

SQL: Need to SUM on results that meet a HAVING statement

Categories

Resources