Running Total - Create row for months that don't have any sales in the region (1 row for each region in each month) - sql

I am working on the below query that I will use inside Tableau to create a line chart that will be color-coded by year and will use the region as a filter for the user. The query works, but I found there are months in regions that don't have any sales. These sections break up the line chart and I am not able to fill in the missing spaces (I am using a non-date dimension on the X-Axis - Number of months until the end of its fiscal year).
I am looking for some help to alter my query to create a row for every month and every region in my dataset so that my running total will have a value to display in the line chart. if there are no values in my table, then = 0 and update the running total for the region.
I have a dimDate table and also a Regions table I can use in the query.
My Query now, (Results sorted in Excel to view easier) Results Table Now
What I want to do; New rows highlighted in Yellow What I want to do
My Code using SQL Server:
SELECT b.gy,
b.sales_month,
b.region,
b.gs_year_total,
b.months_away,
Sum(b.gs_year_total)
OVER (
partition BY b.gy, b.region
ORDER BY b.months_away DESC) RT_by_Region_GY
FROM (SELECT a.gy,
a.region,
a.sales_month,
Sum(a.gy_total) Gs_Year_Total,
a.months_away
FROM (SELECT g.val_id,
g.[gs year] AS GY
,
g.sales_month
AS
Sales_Month,
g.gy_total,
Datediff(month, g.sales_month, dt.lastdayofyear) AS
months_away,
g.value_type,
val.region
FROM uv_sales g
JOIN dbo.dimdate AS dt
ON g.[gs year] = dt.gsyear
JOIN dimvalsummary val
ON g.val_id = val.val_id
WHERE g.[gs year] IN ( 2017, 2018, 2019, 2020, 2021 )
GROUP BY g.valuation_id,
g.[gs year],
val.region,
g.sales_month,
dt.lastdayofyear,
g.gy_total,
g.value_type) a
WHERE a.months_away >= 0
AND sales_month < Dateadd(month, -1, Getdate())
GROUP BY a.gy,
a.region,
a.sales_month,
a.months_away) b

It's tough to envision the best method to solve without data and the meaning of all those fields. Here's a rough sketch of how one might attempt to solve it. This is not complete or tested, sorry, but I'm not sure the meaning of all those fields and don't have data to test.
Create a table called all_months and insert all the months from oldest to whatever date in the future you need.
01/01/2017
02/01/2017
...
12/01/2049
May need one query per region and union them together. Select the year & month from that all_months table, and left join to your other table on month. Coalesce your dollar values.
select 'East' as region,
extract(year from m.month) as gy_year,
m.month as sales_month,
coalesce(g.gy_total, 0) as gy_total,
datediff(month, m.month, dt.lastdayofyear) as months_away
from all_months m
left join uv_sales g on g.sales_month = m.month
--and so on

Related

SUM with GROUP BY don't display zero sums

I'm trying to get data from a single table. I grouped by CURR. I have 12 condition listed. But some are zero.
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR
FROM XXX
WHERE DATEPART(year, CONVERT(dateTime, DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, DATE)) = 1
GROUP BY CURR
This is returning 3 value. But I want 12 value including zero sums. I tried this with CASE, but I could not.
Thanks...
What the others have been trying to tell you is that you need a "list" of the CURR values you want to see in your results. Generally, these would come from another table in a properly normalized database. Do you have one? It seems not but it is worth asking. A properly normalized database would generally have one.
So how do you create this list dynamically? Let us assume that your existing table has at least one row for every CURR value you desire in your resultset - even if that row has a date that falls outside of your period of interest. We can use that to form this list and then outer join that list to your existing query that does the summing.
with curr_list as (select distinct CURR from dbo.XXX)
select curr_list.CURR,
sum(isnull(tbl.AMOUNT)) as TOPL
from curr_list left join dbo.XXX as tbl
on curr_list.CURR = tbl.CURR
and tbl.[DATE] >= '20180101'
and tbl.[DATE] < '20180201'
group by curr_list.CURR
order by curr_list.CURR;
That should work assuming I made no typos. The CTE (named curr_list) creates the list of ID values that you want to see in your resultset. When you outer join that to your transaction data you will get at least one row for each CURR value. You then sum the amounts to aggregate those rows into a single row for each CURR value. Notice the change to the date criteria. Your original approach prevents the optimizer from using any useful indexes on that column.
If your existing table does not have a row for every value of CURR you want in your results, then you can simply change the cte and hardcode the values you desire.
I may get what you are trying to do.
If you have 12 CURR (I'm guessing currency?) and want to get the total amount of the transaction in January 2018 grouped by currency.
If this is the case, here is what you should try to do :
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR_TABLE.CURR
FROM CURR_TABLE
LEFT JOIN XXX ON XXX.CURR = CURR_TABLE.CURR
WHERE DATEPART(year, CONVERT(dateTime, XXX.DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, XXX.DATE)) = 1
GROUP BY CURR_TABLE.CURR
That way you'll get all currency listed (even if no reccord of that currency is available for that month.
EDIT:
I don't like that kind of syntax when you can avoid it, but you can :
SELECT ISNULL(SUM(AMOUNT), 0) AS TOPL,
CURR_TABLE.CURR
FROM (SELECT distinct CURR FROM XXX) CURR_TABLE
LEFT JOIN XXX ON XXX.CURR = CURR_TABLE.CURR
WHERE DATEPART(year, CONVERT(dateTime, XXX.DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, XXX.DATE)) = 1
GROUP BY CURR_TABLE.CURR
Please execute below query and compare with your expected output.
SELECT CURR, SUM(ISNULL(AMOUNT,0)) AS TOPL,
CURR
FROM XXX
WHERE DATEPART(year, CONVERT(dateTime, DATE)) = 2018
AND DATEPART(month, CONVERT(dateTime, DATE)) = 1
GROUP BY CURR

Modification todate dimension in SQL Server

I need a suggestion around one of the columns that I'm creating in the Date dimension in SQL Server, basically rolling weeks..
I have a table dimDate in my datawarehouse.
I want to create a column in the dimdate table which will have week number in any year and each week should have 7 days.
For eg: In year 2015 there are 53 weeks but the 53rd week has only 5 days (because the week starts on Sunday in SQL Server I guess).
I want to include 2 more days from 2016 (1st and 2nd Jan in 2016) to complete the 53rd week with 7 days and also the the 1st week in 2016 should start on 3rd of Jan 2016, so on and so forth.
If there are any suggestions that will be great to start with.
Assuming that you already have weeks populated (but not extended into the next year), and making some assumptions about columns names
This query finds the last week in a year (which would almost always always be 53 but don't count on it:) and the date that it ends on
SELECT YearNo, MAX(Week) As Week, MAX(DateKey) As DateKey
FROM dimDate
GROUP BY YearNo
This query finds all weeks that are shorter than 7 days, and how many extra days are required to make them 7 days.
SELECT
YearNo,
Week,
7-COUNT(DISTINCT DateKey) As ExtraDaysRequired
FROM dimDate
GROUP BY YearNo, Week
HAVING COUNT(DISTINCT DateKey) < 7
This might always be the last week of the year but lets not make assumptions.
Lets combine these to find all final weeks that have less than 7 days, as well as add the number of days required:
SELECT
Under7Days.YearNo, Under7Days.Week, Under7Days.ExtraDaysRequired,
FinalWeeks.DateKey StartDate,
DATEADD(d,Under7Days.ExtraDaysRequired,FinalWeeks.DateKey) EndDate
FROM
(
SELECT YearNo, MAX(Week) As Week, MAX(DateKey) As DateKey
FROM dimDate
GROUP BY YearNo
) As FinalWeeks
INNER JOIN
(
SELECT YearNo, Week, 7-COUNT(DISTINCT DateKey) As ExtraDaysRequired
FROM dimDate
GROUP BY YearNo, Week
HAVING COUNT(DISTINCT DateKey) < 7
) As Under7Days
ON FinalWeeks.Week = Under7Days.Week
AND FinalWeeks.YearNo = Under7Days.YearNo
So we have a query that identifies the start date and end date and week number that it needs to be updated to. So now we run an update:
UPDATE TGT
SET Week = SRC.Week
FROM dimDate TGT
INNER JOIN
(
SELECT
Under7Days.YearNo, Under7Days.Week, Under7Days.ExtraDaysRequired,
FinalWeeks.DateKey StartDate,
DATEADD(d,Under7Days.ExtraDaysRequired,FinalWeeks.DateKey) EndDate
FROM
(
SELECT YearNo, MAX(Week) As Week, MAX(DateKey) As DateKey
FROM dimDate
GROUP BY YearNo
) As FinalWeeks
INNER JOIN
(
SELECT YearNo, Week, 7-COUNT(DISTINCT DateKey) As ExtraDaysRequired
FROM dimDate
GROUP BY YearNo, Week
HAVING COUNT(DISTINCT DateKey) < 7
) As Under7Days
ON FinalWeeks.Week = Under7Days.Week
AND FinalWeeks.YearNo = Under7Days.YearNo
) SRC
ON TGT.DateID BETWEEN SRC.StartDate AND SRC.EndDate
Looks complicated? There's half a dozen ways to write the same thing but this approach is step-by-step. You could probably write a windowing function to do the same thing but I leave that as an exercise for someone else.

Average Group size per month Over previous ten years

I need to find the average size (average number of employees) of all the groups (employers) that we do business with per month for the last ten years.
So I have no problem getting the average group size for each month. For the Current month I can use the following:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
group by ER.EmployerName
This will give me a list of how many employees are in each group. I can then copy and paste the column into excel get the average for the current month.
For the previous month, I want exclude any employees that were added after that month. I have a query for this too:
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
group by ER.EmployerName
That will exclude all employees that were added this month. I can continue to this all the way back ten years, but I know there is a better way to do this. I have no problem running this query 120 times, copying and pasting the results into excel to compute the average. However, I'd rather learn a more efficient way to do this.
Another Question, I can't do the following, anyone know a way around it:
Select avg(count(*))
Thanks in advance guys!!
Edit: Employees that have been terminated can be found like this. NULL are employees that are currently employed.
Select count(*)
from Employees EE
join Employers ER on EE.employerid = ER.employerid
join Gen_Info gne on gne.id = EE.newuserid
where EE.dateadded <= DATEADD(month, -1,GETDATE())
and (gne.TerminationDate is NULL OR gen.TerminationDate < DATEADD(day, -14,GETDATE())
group by ER.EmployerName
Are you after a query that shows the count by year and month they were added? if so this seems pretty straight forward.
this is using mySQL date functions Year & month.
Select AVG(cnt) FROM (
Select count(*) cnt, Year(dateAdded), Month(dateAdded)
from System_Users su
join system_Employers se on se.employerid = su.employerid
group by Year(dateAdded), Month(dateAdded)) B
The inner query counts and breaks out the counts by year and month We then wrap that in a query to show the avg.
--2nd attempt but I'm Brain FriDay'd out.
This uses a Common table Expression (CTE) to generate a set of data for the count by Year, Month of the employees, and then averages out by month.
if this isn't what your after, sample data w/ expected results would help better frame the question and I can making assumptions about what you need/want.
With CTE AS (
Select Year(dateAdded) YR , Month(DateAdded) MO, count(*) over (partition by Year(dateAdded), Month(dateAdded) order by DateAdded Asc) as RunningTotal
from System_Users su
join system_Employers se on se.employerid = su.employerid
Order by YR ASC, MO ASC)
Select avg(RunningTotal), mo from cte;

Fiscal Year To-Date in Where Clause (T-SQL)

Company's Fiscal Year: July 1 - June 30
I have a query where I am trying to capture aggregate # of units and $ revenue by product and cost center for the fiscal year-to-date. It will run on the 1st of the month and look through the last day of the previous month. Fiscal year does not appear in the report - it is criteria.
Mix of pseudocode and SQL:
Where
If datepart(mm,getdate()) - 1 < 7
THEN
transaction_post_date BETWEEN 7/1/ previous year AND dateadd(day,-(day(getdate()),getdate())
Else
transaction_post_date BETWEEN 7/1/current year AND dateadd(day,-(day(getdate()),getdate())
Am I on the write track? How do I write the SQL for a specific date on a year that depends on SQL - 7/1/current year?
I am weak using variables and do not even know if I have access to create them on the SQL Server DB, which is rather locked down. Definitely can't create a function. (I'm a business analyst.)
UPDATE, Fiscal year goes forward, so July 1, 2010, is Fiscal Year 2011.
I think this works:
Year(dateadd(month,6,htx.tx_post_date)) = Year(DateAdd(Month, 5, GetDate()))
Feeback?
And now I've been asked to add Fiscal Year-To-Date fields for quantity and revenue to the following query which gave me totals for
Select
inv.ITEM_CODE
, inventory.ITEM_NAME
, cc.COST_CENTER_CODE
, tx.REV_CODE_ID
, tx.PRICE
, tx.ITEM_SALE_ID
, sum(tx.quantity)
, sum(tx.amount)
from
transactions tx
inner join inventory inv on inv.item_id = tx.item_id
left outer join cost_center cc on cc.cost_center_id = tx.cost_center_id
where
DATEPART(mm, tx.tx_date) = DATEPART(mm,dateadd(m,-1,getdate()))
and DATEPART(yyyy, tx.tx_date) = DATEPART(yyyy,dateadd(m,-1,getdate()))
group by
inv.ITEM_CODE
, inventory.ITEM_NAME
, cc.COST_CENTER_CODE
, tx.REV_CODE_ID
, tx.PRICE
, tx.ITEM_SALE_ID
I need to add the fiscal year-to-date quantity and and amount columns to this report. Would a correlated subquery by the way to go? Would the joins be tricky? I've never used a subquery with an aggregation/grouping query.
Thanks for all the previous help.
Here is how I would do it if I needed to group by Fiscal Year:
Group by Year(DateAdd(Month, -6, TransactionDate))
May be not exactly it, but you get the idea.
I would add a calculated column to your table called FiscalYear (with the proper calculation) and select based on that column
I believe the easiest way is to do this in two steps. Use the WHERE Clause to filter your YTD and then a GROUP BY to group by FY. Since your FY begins in July(7) then increment the FY if the month is greater than June(6).
WHERE CLAUSE:
WHERE
DATEDIFF(DAY, transaction_post_date, Cast(Month(GetDate()) as varchar) +
'/' + Cast(Day(GetDate()) as varchar) + '/' + CAST(Case WHEN
MONTH(transaction_post_date) > 6 then YEAR(transaction_post_date) + 1 else
Year(transaction_post_date) end as varchar)) >=0
GROUP BY CLAUSE:
GROUP BY CASE WHEN MONTH(transaction_post_date) > 6 then
Year(transaction_post_date) + 1 else YEAR(transaction_post_date) end

Calculating Open incidents per month

We have Incidents in our system with Start Time and Finish Time and project name (and other info) .
We would like to have report: How many Incidents has 'open' status per month per project.
Open status mean: Not finished.
If incident is created in December 2009 and closed in March 2010, then it should be included in December 2009, January and February of 2010.
Needed structure should be like this:
Project Year Month Count
------- ------ ------- -------
Test 2009 December 2
Test 2010 January 10
Test 2010 February 12
....
In SQL Server:
SELECT
Project,
Year = YEAR(TimeWhenStillOpen),
Month = DATENAME(month, MONTH(TimeWhenStillOpen)),
Count = COUNT(*)
FROM (
SELECT
i.Project,
i.Incident,
TimeWhenStillOpen = DATEADD(month, v.number, i.StartTime)
FROM (
SELECT
Project,
Incident,
StartTime,
FinishTime = ISNULL(FinishTime, GETDATE()),
MonthDiff = DATEDIFF(month, StartTime, ISNULL(FinishTime, GETDATE()))
FROM Incidents
) i
INNER JOIN master..spt_values v ON v.type = 'P'
AND v.number BETWEEN 0 AND MonthDiff - 1
) s
GROUP BY Project, YEAR(TimeWhenStillOpen), MONTH(TimeWhenStillOpen)
ORDER BY Project, YEAR(TimeWhenStillOpen), MONTH(TimeWhenStillOpen)
Briefly, how it works:
The most inner subselect, that works directly on the Incidents table, simply kind of 'normalises' the table (replaces NULL finish times with the current time) and adds a month difference column, MonthDiff. If there can be no NULLs in your case, just remove the ISNULL expression accordingly.
The outer subselect uses MonthDiff to break up the time range into a series of timestamps corresponding to the months where the incident was still open, i.e. the FinishTime month is not included. A system table called master..spt_values is also employed there as a ready-made numbers table.
Lastly, the main select is only left with the task of grouping the data.
A useful technique here is to create either a table of "all" dates (clearly that would be infinite so I mean a sufficiently large range for your purposes) OR create two tables: one of all the months (12 rows) and another of "all" years.
Let's assume you go for the 1st of these:
create table all_dates (d date)
and populate as appropriate. I'm going to define your incident table as follows
create table incident
(
incident_id int not null,
project_id int not null,
start_date date not null,
end_date date null
)
I'm not sure what RDBMS you are using and date functions vary a lot between them so the next bit may need adjusting for your needs.
select
project_id,
datepart(yy, all_dates.d) as "year",
datepart(mm, all_dates.d) as "month",
count(*) as "count"
from
incident,
all_dates
where
incident.start_date <= all_dates.d and
(incident.end_date >= all_dates.d or incident.end_date is null)
group by
project_id,
datepart(yy, all_dates.d) year,
datepart(mm, all_dates.d) month
That is not going to quite work as we want as the counts will be for every day that the incident was open in each month. To fix this we either need to use a subquery or a temporary table and that really depends on the RDBMS...
Another problem with it is that, for open incidents it will show them against all future months in your all_dates table. adding a all_dates.d <= today solves that. Again, different RDBMSs have different methods of giving back now/today/systemtime...
Another approach is to have an all_months rather than all_dates table that just has the date of first of the month in it:
create table all_months (first_of_month date)
select
project_id,
datepart(yy, all_months.first_of_month) as "year",
datepart(mm, all_months.first_of_month) as "month",
count(*) as "count"
from
incident,
all_months
where
incident.start_date <= dateadd(day, -1, dateadd(month, 1, first_of_month)
(incident.end_date >= first_of_month or incident.end_date is null)
group by
project_id,
datepart(yy, all_months.first_of_month),
datepart(mm, all_months.first_of_month)