Average Group size per month Over previous ten years - sql

I need to find the average size (average number of employees) of all the groups (employers) that we do business with per month for the last ten years.
So I have no problem getting the average group size for each month. For the Current month I can use the following:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
group by ER.EmployerName
This will give me a list of how many employees are in each group. I can then copy and paste the column into Excel to get the average for the current month.
For the previous month, I want to exclude any employees that were added after that month. I have a query for this too:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
group by ER.EmployerName
That will exclude all employees that were added this month. I could keep doing this all the way back ten years, running the query 120 times and pasting each result into Excel to compute the average, but I know there must be a better way and I'd rather learn it.
Another question: I can't do the following. Does anyone know a way around it?
Select avg(count(*))
Thanks in advance guys!!
Edit: Employees that have been terminated can be found like this. NULL are employees that are currently employed.
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
join Gen_Info gne on gne.id = EE.newuserid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
and (gne.TerminationDate is NULL OR gne.TerminationDate < DATEADD(day, -14,GETDATE()))
group by ER.EmployerName

Are you after a query that shows the count by the year and month they were added? If so, this seems pretty straightforward.
This uses the MySQL date functions YEAR() and MONTH().
Select AVG(cnt) FROM (
Select count(*) cnt, Year(dateAdded), Month(dateAdded)
from System_Users su
join system_Employers se on se.employerid = su.employerid
group by Year(dateAdded), Month(dateAdded)) B
The inner query counts and breaks out the counts by year and month. We then wrap that in an outer query to show the average.
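The same wrap-the-aggregate pattern also covers the `avg(count(*))` part of the question: aggregates cannot be nested directly, but the per-group counts can be computed in a derived table and averaged outside it. Below is a minimal sketch, assuming SQLite (driven from Python's `sqlite3` module) rather than the asker's database, with made-up sample rows; the `employeeid` column and the `per_group` alias are hypothetical.

```python
import sqlite3

# Hypothetical miniature of the question's Employees/Employers tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employers (employerid INTEGER PRIMARY KEY, EmployerName TEXT);
    CREATE TABLE Employees (employeeid INTEGER PRIMARY KEY, employerid INTEGER);
    INSERT INTO Employers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Employees (employerid) VALUES (1), (1), (1), (1), (2), (2);
""")

# Inner query: one row per employer with its head count.
# Outer query: average of those per-employer counts.
avg_group_size = conn.execute("""
    SELECT AVG(cnt)
    FROM (
        SELECT COUNT(*) AS cnt
        FROM Employees EE
        JOIN Employers ER ON EE.employerid = ER.employerid
        GROUP BY ER.employerid
    ) AS per_group
""").fetchone()[0]

print(avg_group_size)  # 3.0 for the sample rows (group sizes 4 and 2)
```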
--2nd attempt, but I'm Brain FriDay'd out.
This uses a common table expression (CTE) to generate the running count of employees by year and month, and then averages it by month.
If this isn't what you're after, sample data with expected results would help frame the question better, so I can stop making assumptions about what you need/want.
With CTE AS (
Select Year(dateAdded) YR , Month(DateAdded) MO, count(*) over (partition by Year(dateAdded), Month(dateAdded) order by DateAdded Asc) as RunningTotal
from System_Users su
join system_Employers se on se.employerid = su.employerid)
Select mo, avg(RunningTotal) from cte group by mo order by mo;
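For the "120 queries" part of the question, one possible approach is to generate the list of months once and join every employee to each month in which they already existed. This is only a sketch under stated assumptions: it uses SQLite from Python rather than the asker's database, the sample rows are invented, terminations are ignored, and an employer only counts toward a month's average once it has at least one employee. The `months` and `sizes` CTE names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (employerid INTEGER, dateadded TEXT);
    INSERT INTO Employees VALUES
        (1, '2020-01-15'), (1, '2020-01-20'), (1, '2020-03-05'),
        (2, '2020-02-10');
""")

monthly_avg = conn.execute("""
    WITH RECURSIVE months(month_start) AS (
        -- One row per month in the reporting window.
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(month_start, '+1 month')
        FROM months
        WHERE month_start < '2020-03-01'
    ),
    sizes AS (
        -- Head count per employer as of the end of each month.
        SELECT m.month_start, EE.employerid, COUNT(*) AS group_size
        FROM months m
        JOIN Employees EE ON EE.dateadded < date(m.month_start, '+1 month')
        GROUP BY m.month_start, EE.employerid
    )
    SELECT month_start, AVG(group_size)
    FROM sizes
    GROUP BY month_start
    ORDER BY month_start
""").fetchall()

for month_start, avg_size in monthly_avg:
    print(month_start, avg_size)
```

Widening the window to ten years would only change the two boundary dates in the `months` CTE; the rest of the query stays the same.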

Related

How to capture first row in a grouping and subsequent rows that are each a minimum of 15 days apart?

Assume a given insurance will only pay for the same patient visiting the same doctor once in 15 days. If the patient comes once, twice, or twenty times within those 15 days to the doctor, the doctor will get only one payment. If the patient comes again on Day 16 or Day 18 or Day 29 (or all three!), the doctor will get a second payment. The first visit (or first after the 15 day interval) is always the one that must be billed, along with its complaint.
The SQL for all visits can be loosely expressed as follows:
SELECT VisitID
,PatientID
,VisitDtm
,DoctorID
,ComplaintCode
FROM Visits
The goal is to query the Visits table in a way that would capture only billable incidents.
I have been trying to work through this question, which is in essence quite similar to "Group rows with that are less than 15 days apart and assign min/max date". However, that won't work for me because, as the accepted answerer (Salman A) points out: "Note that this could group much longer date ranges together e.g. 01-01, 01-11, 01-21, 02-01 and 02-11 will be grouped together although the first and last dates are more than 15 days apart." This is a problem for me, as it is a requirement to always capture the next incident after 15 days have passed from the first incident.
I have spent quite a few hours thinking this through and poring over like problems, and am looking for help in understanding the path to a solution, not necessarily an actual code solution. If it's easier to answer in the context of a code solution, that is fine. Any and all guidance is very much appreciated!
This type of task requires an iterative process so you can keep track of the last billable visit. One approach is a recursive CTE.
You would typically enumerate the visits of each patient using row_number(), then traverse the dataset starting from the first visit, while keeping track of the last "billable" visit. Once a visit is met that is 15 days or more later than the last billable visit, the value resets.
with
data as (
select visitid, patientid, visitdtm, doctorid,
row_number() over(partition by patientid order by visitdtm) rn
from visits
),
cte as (
select d.*, visitdtm as billabledtm from data d where rn = 1
union all
select d.*,
case when d.visitdtm >= dateadd(day, 15, c.billabledtm)
then d.visitdtm
else c.billabledtm
end
from cte c
inner join data d
on d.patientid = c.patientid and d.rn = c.rn + 1
)
select * from cte where visitdtm = billabledtm order by patientid, rn
If a patient may have more than 100 visits, then you need to add option (maxrecursion 0) at the very end of the query.
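To see the reset behaviour on the dates quoted in the question (01-01, 01-11, 01-21, 02-01, 02-11), here is a small sketch of the same recursive CTE ported to SQLite and run from Python. The port is an assumption on my part: `julianday()` stands in for `dateadd`, the year 2021 and the single-patient sample rows are made up, and `maxrecursion` has no SQLite counterpart, so it is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visitid INTEGER, patientid INTEGER, visitdtm TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, 1, "2021-01-01"),
        (2, 1, "2021-01-11"),
        (3, 1, "2021-01-21"),
        (4, 1, "2021-02-01"),
        (5, 1, "2021-02-11"),
    ],
)

sql = """
WITH RECURSIVE
data AS (
    -- Number each patient's visits in chronological order.
    SELECT visitid, patientid, visitdtm,
           ROW_NUMBER() OVER (PARTITION BY patientid ORDER BY visitdtm) AS rn
    FROM visits
),
cte AS (
    -- The first visit of every patient is billable by definition.
    SELECT visitid, patientid, visitdtm, rn, visitdtm AS billabledtm
    FROM data
    WHERE rn = 1
    UNION ALL
    -- Walk forward one visit at a time, carrying the last billable date
    -- and resetting it once 15 or more days have passed.
    SELECT d.visitid, d.patientid, d.visitdtm, d.rn,
           CASE WHEN julianday(d.visitdtm) - julianday(c.billabledtm) >= 15
                THEN d.visitdtm
                ELSE c.billabledtm
           END
    FROM cte c
    JOIN data d ON d.patientid = c.patientid AND d.rn = c.rn + 1
)
SELECT visitdtm FROM cte WHERE visitdtm = billabledtm ORDER BY patientid, rn
"""

billable = [row[0] for row in conn.execute(sql)]
print(billable)  # ['2021-01-01', '2021-01-21', '2021-02-11']
```

The visit on 02-11 is billed because it is measured against 01-21, the last billable visit, not against the previous row.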
Here's another approach. Similar to GMB's this adds a row_number to the Visits table in a CTE but it also adds the lead date difference between VisitDtm's. Then it takes cumulative "sum over" of the date difference and divides by 15. When that quotient increases by a full integer, it represents a billable event in the data.
Something like this
;with lead_cte as (
select v.*, row_number() over (partition by PatientId order by VisitDtm) rn,
datediff(d, VisitDtm, lead(VisitDtm) over (partition by PatientId order by VisitDtm)) lead_dt_diff
from Visits v),
cum_sum_cte as (
select lc.*, sum(lead_dt_diff) over (partition by PatientId order by VisitDtm)/15 cum_dt_diff
from lead_cte lc),
min_billable_cte as (
select PatientId, cum_dt_diff, min(rn) min_rn
from cum_sum_cte
group by PatientId, cum_dt_diff)
select lc.*
from lead_cte lc
join min_billable_cte mbc on lc.PatientId=mbc.PatientId
and lc.rn=mbc.min_rn;

Running Total - Create row for months that don't have any sales in the region (1 row for each region in each month)

I am working on the below query that I will use inside Tableau to create a line chart that will be color-coded by year and will use the region as a filter for the user. The query works, but I found there are months in regions that don't have any sales. These sections break up the line chart and I am not able to fill in the missing spaces (I am using a non-date dimension on the X-Axis - Number of months until the end of its fiscal year).
I am looking for some help altering my query to create a row for every month and every region in my dataset, so that my running total always has a value to display in the line chart. If a region has no sales in a month, the monthly value should be 0 and the region's running total should carry forward.
I have a dimDate table and also a Regions table I can use in the query.
My query now (results sorted in Excel for easier viewing): [image: Results Table Now]
What I want to do (new rows highlighted in yellow): [image: What I want to do]
My Code using SQL Server:
SELECT b.gy,
b.sales_month,
b.region,
b.gs_year_total,
b.months_away,
Sum(b.gs_year_total)
OVER (
partition BY b.gy, b.region
ORDER BY b.months_away DESC) RT_by_Region_GY
FROM (SELECT a.gy,
a.region,
a.sales_month,
Sum(a.gy_total) Gs_Year_Total,
a.months_away
FROM (SELECT g.val_id,
g.[gs year] AS GY,
g.sales_month AS Sales_Month,
g.gy_total,
Datediff(month, g.sales_month, dt.lastdayofyear) AS months_away,
g.value_type,
val.region
FROM uv_sales g
JOIN dbo.dimdate AS dt
ON g.[gs year] = dt.gsyear
JOIN dimvalsummary val
ON g.val_id = val.val_id
WHERE g.[gs year] IN ( 2017, 2018, 2019, 2020, 2021 )
GROUP BY g.val_id,
g.[gs year],
val.region,
g.sales_month,
dt.lastdayofyear,
g.gy_total,
g.value_type) a
WHERE a.months_away >= 0
AND sales_month < Dateadd(month, -1, Getdate())
GROUP BY a.gy,
a.region,
a.sales_month,
a.months_away) b
It's tough to pick the best method without sample data or knowing what all those fields mean, so here's a rough sketch of how one might approach it. It is neither complete nor tested.
Create a table called all_months and insert all the months from oldest to whatever date in the future you need.
01/01/2017
02/01/2017
...
12/01/2049
May need one query per region and union them together. Select the year & month from that all_months table, and left join to your other table on month. Coalesce your dollar values.
select 'East' as region,
year(m.month) as gy_year,
m.month as sales_month,
coalesce(g.gy_total, 0) as gy_total,
datediff(month, m.month, dt.lastdayofyear) as months_away
from all_months m
left join uv_sales g on g.sales_month = m.month
--and so on
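As a variation on the one-query-per-region idea, the month list can be cross joined with the Regions table the asker mentions, so every (region, month) pair exists before the sales are attached. The sketch below assumes SQLite run from Python, simplified hypothetical tables (`all_months`, `regions`, `sales` with an `amount` column) and invented figures; the fiscal-year and `months_away` logic from the question is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_months (month TEXT PRIMARY KEY);
    CREATE TABLE regions (region TEXT PRIMARY KEY);
    CREATE TABLE sales (region TEXT, sales_month TEXT, amount INTEGER);
    INSERT INTO all_months VALUES ('2021-01-01'), ('2021-02-01'), ('2021-03-01');
    INSERT INTO regions VALUES ('East'), ('West');
    INSERT INTO sales VALUES
        ('East', '2021-01-01', 100),
        ('East', '2021-03-01', 50),
        ('West', '2021-02-01', 20);
""")

rows = conn.execute("""
    WITH filled AS (
        -- Every region gets a row for every month; missing sales become 0.
        SELECT r.region,
               m.month AS sales_month,
               COALESCE(SUM(s.amount), 0) AS month_total
        FROM all_months m
        CROSS JOIN regions r
        LEFT JOIN sales s
               ON s.sales_month = m.month AND s.region = r.region
        GROUP BY r.region, m.month
    )
    SELECT region, sales_month, month_total,
           SUM(month_total) OVER (PARTITION BY region
                                  ORDER BY sales_month) AS running_total
    FROM filled
    ORDER BY region, sales_month
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, East's February row comes back as a zero month with the running total carried forward at 100, which is the kind of row the line chart was missing.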

SQLite - Use a CTE to divide a query

Quick question for the SQL experts out there. I feel a bit stupid because I think I am close to the solution but have not been able to reach it.
If I have these two tables, how can I use the former one to divide a column of the second one?
WITH month_usage AS
(SELECT strftime('%m', starttime) AS month, SUM(slots) AS total
FROM Bookings
GROUP BY month)
SELECT strftime('%m', b.starttime) AS month, f.name, SUM(slots) AS usage
FROM Bookings as b
LEFT JOIN Facilities as f
ON b.facid = f.facid
GROUP BY name, month
ORDER BY month
The first query computes the total for each month.
The second query is the one whose usage column I want to divide by each month's total to get the percentage.
When I JOIN both tables using month as an id, it messes up the content. Any suggestions?
"I want to divide the usage column by the total of each month to get the percentage"
Just use window functions:
SELECT
strftime('%m', b.starttime) AS month,
f.name,
SUM(slots) AS usage,
1.0 * SUM(slots)
/ SUM(SUM(slots)) OVER(PARTITION BY strftime('%m', b.starttime)) AS ratio
FROM Bookings as b
LEFT JOIN Facilities as f
ON b.facid = f.facid
GROUP BY name, month
ORDER BY month
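A runnable sketch of the window-function idea, driven from Python's `sqlite3` module. The table contents are invented, and the aggregation is split into a CTE (hypothetically named `facility_usage`) so the window function runs over already-grouped rows; this is one way to arrange it, not the only one. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Facilities (facid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Bookings (facid INTEGER, starttime TEXT, slots INTEGER);
    INSERT INTO Facilities VALUES (0, 'Tennis'), (1, 'Pool');
    INSERT INTO Bookings VALUES
        (0, '2012-07-03 11:00:00', 2),
        (0, '2012-07-10 09:00:00', 1),
        (1, '2012-07-05 15:00:00', 1),
        (0, '2012-08-01 10:00:00', 2),
        (1, '2012-08-02 12:00:00', 2);
""")

rows = conn.execute("""
    WITH facility_usage AS (
        -- Slots booked per facility per month.
        SELECT strftime('%m', b.starttime) AS month,
               f.name,
               SUM(b.slots) AS usage
        FROM Bookings AS b
        LEFT JOIN Facilities AS f ON b.facid = f.facid
        GROUP BY strftime('%m', b.starttime), f.name
    )
    -- Divide each facility's usage by the total for its month.
    SELECT month, name, usage,
           1.0 * usage / SUM(usage) OVER (PARTITION BY month) AS ratio
    FROM facility_usage
    ORDER BY month, name
""").fetchall()

for row in rows:
    print(row)
```

Multiplying by `1.0` forces floating-point division; without it SQLite would perform integer division and return 0 for every ratio below 1.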

Account for missing values in group by month

I'm trying to retrieve the average number of records added to the database each month. However, for months in which no records were added, the row is missing and is therefore not factored into the average.
Here is the query:
SELECT AVG(a.count) AS AVG
FROM ( SELECT COUNT(*) AS count, MONTH(InsertedTimestamp) AS Month
FROM Certificates
WHERE InsertedTimestamp >= '9/19/2014'
AND InsertedTimestamp <= '7/1/2015'
GROUP BY MONTH(InsertedTimestamp)
) AS a
When I run just the inner query, only results from months 9,10,11 are showing, because there are no records for months 12,1,2,3,4,5,6,7. How can I add these missing rows to the table in order to get the correct monthly average?
Thanks!
This is easy enough to fix: just divide the total count by the number of months in the range:
SELECT COUNT(*) / (TIMESTAMPDIFF(month, '2014-09-19', '2015-07-01' ) + 1)
FROM Certificates
WHERE InsertedTimestamp >= '2014-09-19' AND
InsertedTimestamp <= '2015-07-01' ;
You don't even need the subquery.
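If the empty months need to appear as explicit rows (the "add these missing rows" part of the question), an alternative is to generate the month list and left join the data onto it. The following is a sketch only, assuming SQLite run from Python, a shortened six-month window and invented timestamps; every calendar month in the window counts as one month, including partial ones. The `months` and `counts` CTE names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Certificates (InsertedTimestamp TEXT);
    INSERT INTO Certificates VALUES
        ('2014-09-19'), ('2014-09-25'), ('2014-09-30'),
        ('2014-10-02'), ('2014-10-15'),
        ('2014-11-11');
""")

# Averaging only the months that have rows ignores the empty months.
naive_avg = conn.execute("""
    SELECT AVG(cnt)
    FROM (
        SELECT COUNT(*) AS cnt
        FROM Certificates
        GROUP BY strftime('%Y-%m', InsertedTimestamp)
    )
""").fetchone()[0]

# Generating every month first makes the empty months count as zero.
gap_aware_avg = conn.execute("""
    WITH RECURSIVE months(month_start) AS (
        SELECT '2014-09-01'
        UNION ALL
        SELECT date(month_start, '+1 month')
        FROM months
        WHERE month_start < '2015-02-01'
    ),
    counts AS (
        -- COUNT(column) skips the NULLs produced by unmatched months.
        SELECT m.month_start, COUNT(c.InsertedTimestamp) AS cnt
        FROM months m
        LEFT JOIN Certificates c
               ON strftime('%Y-%m', c.InsertedTimestamp)
                = strftime('%Y-%m', m.month_start)
        GROUP BY m.month_start
    )
    SELECT AVG(cnt) FROM counts
""").fetchone()[0]

print(naive_avg, gap_aware_avg)  # 2.0 1.0
```

Six records over six calendar months give 1.0, which matches dividing the total count by the number of months.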

MySQL - how to retrieve columns in same row as the values returned by min/max

I couldn't frame the question's title properly. Suppose a table of weekly movie earnings as below:
MovieName <Varchar(450)>
MovieGross <Decimal(18)>
WeekofYear <Integer>
Year <Integer>
So how do I get the names of top grossers for each week of this year, if I do:
select MovieName , Max(MovieGross) , WeekofYear
from earnings where year = 2010 group by WeekofYear;
Then obviously the query won't run, while
select Max(MovieName) , Max(MovieGross) , WeekofYear
from earnings where year = 2010 group by WeekofYear;
would just give the alphabetically last movie name for each week, not the top grosser. Is using group_concat() and then substring_index() the only option here?
select
substring_index(group_concat(MovieName order by MovieGross desc),',',1),
Max(MovieGross) , WeekofYear from earnings where year = 2010
group by WeekofYear ;
Seems clumsy. Is there any better way of achieving this?
It's the ever-recurring max-per-group problem. You solve it by selecting the defining properties of your group and then joining your "real" data against that.
select
e.MovieName,
e.MovieGross,
e.WeekofYear
from
earnings e
inner join (
select Max(MovieGross) MovieGross, Year, WeekofYear
from earnings
group by Year, WeekofYear
) max on max.Year = e.Year
and max.WeekofYear = e.WeekofYear
and max.MovieGross = e.MovieGross
where
e.year = 2010
The defining properties of your group are Year, WeekofYear and MAX(MovieGross). The subquery returns exactly one row of these values per group.
An INNER JOIN against your data table eliminates all rows that do not fulfill the defining properties of your group. This also means that it lets through all rows that do: you could end up with two movies that made the same amount of money in a particular week. Group the "outer" query again to eliminate the duplicate rows in favor of a single movie.
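The tie behaviour described above can be seen in a small sketch. It assumes SQLite run from Python instead of MySQL, invented movie rows with a deliberate tie in week 2, and the derived-table alias `m` in place of `max`; picking `MIN(MovieName)` as the tie-breaker is an arbitrary choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE earnings (MovieName TEXT, MovieGross INTEGER,
                           WeekofYear INTEGER, Year INTEGER);
    INSERT INTO earnings VALUES
        ('A', 100, 1, 2010), ('B', 80, 1, 2010),
        ('C', 50, 2, 2010),  ('D', 50, 2, 2010),
        ('E', 999, 1, 2009);
""")

# Join each row against the maximum gross of its (Year, WeekofYear) group.
top_rows_sql = """
    SELECT e.MovieName AS MovieName,
           e.MovieGross AS MovieGross,
           e.WeekofYear AS WeekofYear
    FROM earnings e
    JOIN (
        SELECT MAX(MovieGross) AS MovieGross, Year, WeekofYear
        FROM earnings
        GROUP BY Year, WeekofYear
    ) m ON m.Year = e.Year
       AND m.WeekofYear = e.WeekofYear
       AND m.MovieGross = e.MovieGross
    WHERE e.Year = 2010
"""

# Ties survive the join: week 2 returns both C and D.
with_ties = conn.execute(
    top_rows_sql + " ORDER BY e.WeekofYear, e.MovieName"
).fetchall()

# Grouping the outer result again keeps a single movie per week.
one_per_week = conn.execute(
    "SELECT MIN(MovieName), MovieGross, WeekofYear FROM ("
    + top_rows_sql
    + ") AS t GROUP BY WeekofYear, MovieGross ORDER BY WeekofYear"
).fetchall()

print(with_ties)
print(one_per_week)
```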
You need to determine the max weekly gross and then select the movie name based on that criterion. Something like this:
SELECT e.MovieName, m.Gross, m.WeekofYear
FROM earnings e JOIN
(SELECT MAX(MovieGross) Gross, WeekofYear
FROM earnings WHERE `year` = 2010 GROUP BY WeekofYear) m
ON e.MovieGross=m.Gross AND e.WeekofYear=m.WeekofYear
This is a pretty fast query that does the job:
SELECT e.WeekofYear as WeekofYear
, max(MovieGross) as MovieGross
, (SELECT MovieName FROM earnings
WHERE year=2010 AND WeekofYear=e.WeekofYear ORDER BY MovieGross DESC LIMIT 1
) as MovieName
FROM earnings AS e
WHERE year='2010'
GROUP BY WeekofYear
ORDER BY WeekofYear;
Happy to help you :)
P.S. and thanks for ratings ;)
Ok, trying again with the having clause, I cannot help myself.
Hopefully this will help you get started. Group the 2010 rows by week and movie, then keep only the rows whose gross equals the maximum for that week.
Select MovieName, MovieGross as max_gross, WeekofYear
from earnings e
where year = 2010
group by WeekofYear, MovieName, MovieGross
having MovieGross = (select max(MovieGross) from earnings
where year = 2010 and WeekofYear = e.WeekofYear)
order by WeekofYear
This should return the top grossing movie for each week. This should also return multiple entries for a week in the event of a tie.