Modification to date dimension in SQL Server - sql

I need a suggestion about one of the columns that I'm creating in the date dimension in SQL Server: basically, rolling weeks.
I have a table dimDate in my data warehouse.
I want to create a column in the dimDate table that holds the week number within the year, and each week should have 7 days.
For example, in 2015 there are 53 weeks, but the 53rd week has only 5 days (because the week starts on Sunday in SQL Server, I guess).
I want to include 2 more days from 2016 (1st and 2nd Jan 2016) to complete the 53rd week with 7 days, and the 1st week of 2016 should then start on 3rd Jan 2016, and so on.
Any suggestions would be a great starting point.

Assuming that you already have weeks populated (but not extended into the next year), and making some assumptions about column names:
This query finds the last week in each year (which would almost always be 53, but don't count on it) and the date that it ends on.
SELECT YearNo, MAX(Week) As Week, MAX(DateKey) As DateKey
FROM dimDate
GROUP BY YearNo
This query finds all weeks that are shorter than 7 days, and how many extra days are required to make them 7 days.
SELECT
YearNo,
Week,
7-COUNT(DISTINCT DateKey) As ExtraDaysRequired
FROM dimDate
GROUP BY YearNo, Week
HAVING COUNT(DISTINCT DateKey) < 7
This might always be the last week of the year, but let's not make assumptions.
Let's combine these to find all final weeks that have fewer than 7 days, along with the number of days required:
SELECT
Under7Days.YearNo, Under7Days.Week, Under7Days.ExtraDaysRequired,
FinalWeeks.DateKey StartDate,
DATEADD(d,Under7Days.ExtraDaysRequired,FinalWeeks.DateKey) EndDate
FROM
(
SELECT YearNo, MAX(Week) As Week, MAX(DateKey) As DateKey
FROM dimDate
GROUP BY YearNo
) As FinalWeeks
INNER JOIN
(
SELECT YearNo, Week, 7-COUNT(DISTINCT DateKey) As ExtraDaysRequired
FROM dimDate
GROUP BY YearNo, Week
HAVING COUNT(DISTINCT DateKey) < 7
) As Under7Days
ON FinalWeeks.Week = Under7Days.Week
AND FinalWeeks.YearNo = Under7Days.YearNo
So we have a query that identifies the start date and end date and week number that it needs to be updated to. So now we run an update:
UPDATE TGT
SET Week = SRC.Week
FROM dimDate TGT
INNER JOIN
(
SELECT
Under7Days.YearNo, Under7Days.Week, Under7Days.ExtraDaysRequired,
FinalWeeks.DateKey StartDate,
DATEADD(d,Under7Days.ExtraDaysRequired,FinalWeeks.DateKey) EndDate
FROM
(
SELECT YearNo, MAX(Week) As Week, MAX(DateKey) As DateKey
FROM dimDate
GROUP BY YearNo
) As FinalWeeks
INNER JOIN
(
SELECT YearNo, Week, 7-COUNT(DISTINCT DateKey) As ExtraDaysRequired
FROM dimDate
GROUP BY YearNo, Week
HAVING COUNT(DISTINCT DateKey) < 7
) As Under7Days
ON FinalWeeks.Week = Under7Days.Week
AND FinalWeeks.YearNo = Under7Days.YearNo
) SRC
ON TGT.DateKey BETWEEN SRC.StartDate AND SRC.EndDate
Looks complicated? There are half a dozen ways to write the same thing, but this approach is step-by-step. You could probably write a window function to do the same thing, but I leave that as an exercise for someone else.
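To check the logic of the steps above outside the database, here is a minimal Python sketch. It is not the answer's SQL: `week_of_year`, `build_dim` and `extend_final_weeks` are hypothetical helpers, and the week numbering is an assumption meant to mirror Sunday-start weeks where week 1 is the week containing 1 January.

```python
from collections import Counter
from datetime import date, timedelta

def week_of_year(d):
    """Week number with Sunday-start weeks, where week 1 contains 1 January (assumed numbering)."""
    jan1 = date(d.year, 1, 1)
    offset = (jan1.weekday() + 1) % 7  # days between the preceding Sunday and 1 January
    return (d.timetuple().tm_yday - 1 + offset) // 7 + 1

def build_dim(first, last):
    """Map each date to (YearNo, Week), a stand-in for the dimDate table."""
    dim = {}
    d = first
    while d <= last:
        dim[d] = (d.year, week_of_year(d))
        d += timedelta(days=1)
    return dim

def extend_final_weeks(dim):
    """Relabel the days after a short final week so that the week has 7 days."""
    days_per_week = Counter(dim.values())
    final = {}  # YearNo -> (MAX(Week), MAX(DateKey))
    for d, (year, week) in dim.items():
        prev_week, prev_date = final.get(year, (week, d))
        final[year] = (max(prev_week, week), max(prev_date, d))
    result = dict(dim)
    for year, (week, last_date) in final.items():
        extra = 7 - days_per_week[(year, week)]  # ExtraDaysRequired
        for n in range(1, extra + 1):
            target = last_date + timedelta(days=n)
            if target in result:
                # Only Week changes; YearNo stays as it was, as in the UPDATE.
                result[target] = (result[target][0], week)
    return result

dim = build_dim(date(2015, 1, 1), date(2016, 12, 31))
rolled = extend_final_weeks(dim)
print(rolled[date(2016, 1, 1)], rolled[date(2016, 1, 3)])
```

Under these assumptions, 1 and 2 January 2016 are relabelled as week 53 while keeping YearNo 2016, and 3 January 2016 keeps its original week number. Renumbering the weeks that follow, so that 3 January starts week 1, would be a separate step.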

Related

Remove Duplicates and show Total sales by year and month

I am trying to work with this query to produce a list of all 11 years, and the 12 months within each year, with the sales data for each month. Any suggestions? This is my query so far.
SELECT
distinct(extract(year from date)) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by date
It just creates a long list of over 2000 results when I am expecting at most 132, one for each month in the years.
You should change your group by statement if you have more results than you expected.
You can try:
group by YEAR(date), MONTH(date)
or
group by EXTRACT(YEAR_MONTH FROM date)
A GROUP BY takes a part of the date (in your case the year and month), collects all rows that match, and sums them up.
So GROUP BY date makes no sense whatsoever, as you don't want the sum for every day.
So write this:
SELECT
extract(year from date) as year
,extract(MONTH from date) as month
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1,2
Or you can combine year and month into a single value (YEAR_MONTH is a MySQL unit, so this form may not work in other engines):
SELECT
extract(YEAR_MONTH from date) as year
, sum(sale_dollars) as year_sales
from `project-1-349215.Dataset.sales`
group by 1
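The effect of the grouping key can be illustrated with a small Python sketch. The sample rows and the `total_by` helper are hypothetical; they only stand in for the sales table and the GROUP BY.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for the sales table: (date, sale_dollars).
rows = [
    (date(2012, 1, 3), 100.0),
    (date(2012, 1, 17), 50.0),
    (date(2012, 2, 1), 25.0),
    (date(2013, 1, 9), 10.0),
]

def total_by(rows, key):
    """Sum sale_dollars per group, where key(d) picks the grouping value."""
    totals = defaultdict(float)
    for d, dollars in rows:
        totals[key(d)] += dollars
    return dict(totals)

per_day = total_by(rows, lambda d: d)                    # like GROUP BY date
per_month = total_by(rows, lambda d: (d.year, d.month))  # like GROUP BY year, month

print(len(per_day), "groups when grouping by day")
print(len(per_month), "groups when grouping by year and month")
```

Grouping by the full date yields one group per distinct day, which is why the original query returned thousands of rows; grouping by (year, month) yields at most one row per month.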

SQL Server / SSRS: Calculating monthly average based on grouping and historical values

I need to calculate an average based on historical data for a graph in SSRS:
Current Month
Previous Month
2 Months ago
6 Months ago
This query returns the average for each month:
SELECT
avg_val1, month, year
FROM
(SELECT
(sum_val1 / count) as avg_val1, month, year
FROM
(SELECT
SUM(val1) AS sum_val1, SUM(count) AS count, month, year
FROM
(SELECT
COUNT(val1) AS count, SUM(val1) AS val1,
MONTH([SnapshotDate]) AS month,
YEAR([SnapshotDate]) AS year
FROM
[DC].[dbo].[KPI_Values]
WHERE
[SnapshotKey] = 'Some text here'
AND No = '001'
AND Channel = '999'
GROUP BY
[SnapshotDate]) AS sub3
GROUP BY
month, year, count) AS sub2
GROUP BY sum_val1, count, month, year) AS sub1
ORDER BY
year, month ASC
When I add the following WHERE clause I get the average for March (2 months ago):
WHERE month = MONTH(GETDATE())-2
AND year = YEAR(GETDATE())
Now the problem is when I want to retrieve data from 6 months ago: MONTH(GETDATE()) - 6 will output -1 instead of 11. I also have an issue with the fact that the year changes to 2016, and I am unsure how to implement that logic in my query.
I think I might be going about this wrong... Any suggestions?
Subtract the months from the date using the DATEADD function before you do your comparison. For example:
WHERE SnapshotDate BETWEEN DATEADD(month, -6, GETDATE()) AND GETDATE()
MONTH(GETDATE()) returns an int, so the subtraction can give 0 or negative values. You would need a scalar user-defined function to handle this, adding 12 (and going back one year) when the result is <= 0.
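The wraparound arithmetic can be sketched in Python. `months_back` is a hypothetical helper, not part of either answer; it only shows how the month and year roll over together.

```python
def months_back(year, month, n):
    """Return the (year, month) that lies n months before (year, month)."""
    # Use a zero-based month index so floor division handles the year rollover.
    index = year * 12 + (month - 1) - n
    return index // 12, index % 12 + 1

# Six months before May 2017: the month wraps around and the year drops by one.
print(months_back(2017, 5, 6))
```

Doing the subtraction on the whole date, as the DATEADD answer suggests, avoids having to handle the year separately.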

Find previous equivalent dates over the past two calendar years

If today is, say, 15th August 2012, then the query should return the following
15/01/2011,
15/02/2011,
...
...
15/07/2012
15/08/2012
If today is 31st August 2012 then the query would return
31/01/2011,
28/02/2011, <<<<this is the nearest date
...
...
31/07/2012
31/08/2012
We have a vw_DimDate in our Warehouse which should help
edit
It contains the following fields
Currently I'm using the following but it seems rather convoluted! ...
DECLARE @Dt DATETIME = '31 JUL 2012'--GETDATE()
;WITH DateSet_cte(DayMarker)
AS
(
SELECT DayMarker
FROM WHData.dbo.vw_DimDate
WHERE
DayMarker >= CONVERT(DATETIME,CONVERT(CHAR(4),DATEADD(YEAR,-1,@Dt),112) + '0101') AND
DayMarker <= @Dt
)
, MaxDate_cte(MaxDate)
AS
(
SELECT [MaxDate] = MAX(DayMarker)
FROM DateSet_cte
)
SELECT
[Mth] = CONVERT(DATETIME,CONVERT(CHAR(6),a.DayMarker,112) + '01')
, MAX(a.DayMarker) [EquivDate]
FROM DateSet_cte a
WHERE DAY(a.DayMarker) <= (SELECT DAY([MaxDate]) FROM MaxDate_cte)
GROUP BY CONVERT(DATETIME,CONVERT(CHAR(6),a.DayMarker,112) + '01')
;with Numbers as (
select distinct number from master..spt_values where number between 0 and 23
), Today as (
select CONVERT(date,CURRENT_TIMESTAMP) as d
)
select
DATEADD(month,-number,d)
from
Numbers,Today
where DATEPART(year,DATEADD(month,-number,d)) >= DATEPART(year,d) - 1
Seems odd to want a variable number of returned values based on how far through the year we are, but that's what I've implemented.
When you use DATEADD to add months to a value, then it automatically adjusts the day number if it would have produced an out of range date (e.g. 31st February), such that it's the last day of the month. Or, as the documentation puts it:
If datepart is month and the date month has more days than the return month and the date day does not exist in the return month, the last day of the return month is returned.
Of course, if you already have a numbers table in your database, you can eliminate the first CTE. You mentioned that you "have a vw_DimDate in our Warehouse which should help", but since I have no idea on what that (presumably, a) view contains, it wasn't any help.
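To see the day clamping at work, here is a minimal Python sketch of the same idea. `add_months` and `equivalent_dates` are hypothetical helpers that imitate the behaviour described for DATEADD with months; they are not SQL Server code.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift d by a number of months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def equivalent_dates(today):
    """Same day-of-month dates from January of the previous year up to today."""
    candidates = (add_months(today, -n) for n in range(24))
    return sorted(d for d in candidates if d.year >= today.year - 1)

for d in equivalent_dates(date(2012, 8, 31)):
    print(d.strftime("%d/%m/%Y"))
```

For 31st August 2012 the February entries fall back to the last day of the month (the 28th in 2011, the 29th in 2012), which matches the "nearest date" the question asks for.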

Return just the last day of each month with SQL

I have a table that contains multiple records for each day of the month, over a number of years. Can someone help me write a query that will only return the last day of each month?
SQL Server (other DBMS will work the same or very similarly):
SELECT
*
FROM
YourTable
WHERE
DateField IN (
SELECT MAX(DateField)
FROM YourTable
GROUP BY MONTH(DateField), YEAR(DateField)
)
An index on DateField is helpful here.
PS: If your DateField contains time values, the above will give you the very last record of every month, not the last day's worth of records. In this case, reduce the datetime to its date value before doing the comparison.
The easiest way I could find to identify if a date field in the table is the end of the month, is simply adding one day and checking if that day is 1.
where DAY(DATEADD(day, 1, AsOfDate)) = 1
If you use that as your condition (assuming AsOfDate is the date field you are looking for), then it will only return records where AsOfDate is the last day of the month.
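The same "add one day and see whether it is the 1st" check can be sketched in Python. `is_month_end` is a hypothetical helper and the sample dates are made up for illustration.

```python
from datetime import date, timedelta

def is_month_end(d):
    """True when the day after d is the 1st, i.e. d is the last day of its month."""
    return (d + timedelta(days=1)).day == 1

# Hypothetical sample dates; only the month-end ones survive the filter.
sample = [date(2016, 2, 28), date(2016, 2, 29), date(2017, 7, 24), date(2017, 12, 31)]
month_ends = [d for d in sample if is_month_end(d)]
print(month_ends)
```

The check needs no knowledge of month lengths or leap years, because the calendar arithmetic is delegated to the date type.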
Use the EOMONTH() function if it's available to you (e.g. SQL Server 2012 and later). It returns the last day of the month for a given date.
select distinct
Date
from DateTable
Where Date = EOMONTH(Date)
Or, you can use some date math.
select distinct
Date
from DateTable
where Date = DATEADD(MONTH, DATEDIFF(MONTH, -1, Date), -1)
In SQL Server, this is how I usually get to the last day of the month relative to an arbitrary point in time:
select dateadd(day,-day(dateadd(month,1,current_timestamp)) , dateadd(month,1,current_timestamp) )
In a nutshell:
From your reference point-in-time,
Add 1 month,
Then, from the resulting value, subtract its day-of-the-month in days.
Voila! You have the last day of the month containing your reference point in time.
Getting the 1st day of the month is simpler:
select dateadd(day,-(day(current_timestamp)-1),current_timestamp)
From your reference point-in-time,
subtract (in days), 1 less than the current day-of-the-month component.
Stripping off/normalizing the extraneous time component is left as an exercise for the reader.
A simple way to get the last day of month is to get the first day of the next month and subtract 1.
This should work on Oracle DB
select distinct last_day(trunc(sysdate - rownum)) dt
from dual
connect by rownum < 430
order by 1
I did the following and it worked out great. I also wanted the maximum date for the current month. Here is what my output looks like. Notice the last date for July, which is the 24th: I pulled it on 7/24/2017, hence the result.
Year Month KPI_Date
2017 4 2017-04-28
2017 5 2017-05-31
2017 6 2017-06-30
2017 7 2017-07-24
SELECT B.Year ,
B.Month ,
MAX(DateField) KPI_Date
FROM Table A
INNER JOIN ( SELECT DISTINCT
YEAR(EOMONTH(DateField)) year ,
MONTH(EOMONTH(DateField)) month
FROM Table
) B ON YEAR(A.DateField) = B.year
AND MONTH(A.DateField) = B.Month
GROUP BY B.Year ,
B.Month
SELECT * FROM YourTableName WHERE anyfilter
AND "DATE" IN (SELECT MAX(NameofDATE_Column) FROM YourTableName WHERE
anyfilter GROUP BY
TO_CHAR(NameofDATE_Column,'MONTH'),TO_CHAR(NameofDATE_Column,'YYYY'));
Note: this answer applies to Oracle DB.
Here's how I just solved this. day_date is the date field, calendar is the table that holds the dates.
SELECT cast(datepart(year, day_date) AS VARCHAR)
+ '-'
+ cast(datepart(month, day_date) AS VARCHAR)
+ '-'
+ cast(max(DATEPART(day, day_date)) AS VARCHAR) 'DATE'
FROM calendar
GROUP BY datepart(year, day_date)
,datepart(month, day_date)
ORDER BY 1

Calculating Open incidents per month

We have Incidents in our system with a Start Time, a Finish Time and a project name (and other info).
We would like to have a report: how many Incidents have 'open' status per month per project.
Open status means: not finished.
If an incident is created in December 2009 and closed in March 2010, then it should be included in December 2009, January 2010 and February 2010.
Needed structure should be like this:
Project Year Month Count
------- ------ ------- -------
Test 2009 December 2
Test 2010 January 10
Test 2010 February 12
....
In SQL Server:
SELECT
Project,
Year = YEAR(TimeWhenStillOpen),
Month = DATENAME(month, DATEADD(month, MONTH(TimeWhenStillOpen) - 1, 0)),
Count = COUNT(*)
FROM (
SELECT
i.Project,
i.Incident,
TimeWhenStillOpen = DATEADD(month, v.number, i.StartTime)
FROM (
SELECT
Project,
Incident,
StartTime,
FinishTime = ISNULL(FinishTime, GETDATE()),
MonthDiff = DATEDIFF(month, StartTime, ISNULL(FinishTime, GETDATE()))
FROM Incidents
) i
INNER JOIN master..spt_values v ON v.type = 'P'
AND v.number BETWEEN 0 AND MonthDiff - 1
) s
GROUP BY Project, YEAR(TimeWhenStillOpen), MONTH(TimeWhenStillOpen)
ORDER BY Project, YEAR(TimeWhenStillOpen), MONTH(TimeWhenStillOpen)
Briefly, how it works:
The innermost subselect, which works directly on the Incidents table, 'normalises' the table (replaces NULL finish times with the current time) and adds a month difference column, MonthDiff. If there can be no NULLs in your case, just remove the ISNULL expressions.
The outer subselect uses MonthDiff to break up the time range into a series of timestamps corresponding to the months where the incident was still open, i.e. the FinishTime month is not included. A system table called master..spt_values is also employed there as a ready-made numbers table.
Lastly, the main select is only left with the task of grouping the data.
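The month expansion described above can be sketched in Python to check the expected rows. `open_months` and `open_counts` are hypothetical helpers and the incident rows are made-up sample data; a finish time of None stands in for NULL.

```python
from collections import Counter
from datetime import date

def open_months(start, finish):
    """(year, month) pairs from the start month up to, but not including, the finish month."""
    first = start.year * 12 + (start.month - 1)
    last = finish.year * 12 + (finish.month - 1)
    return [(i // 12, i % 12 + 1) for i in range(first, last)]

def open_counts(incidents, today):
    """Count open incidents per (project, year, month); unfinished ones run up to today."""
    counts = Counter()
    for project, start, finish in incidents:
        for year, month in open_months(start, finish or today):
            counts[(project, year, month)] += 1
    return counts

# Hypothetical sample data: (project, start time, finish time or None).
incidents = [
    ("Test", date(2009, 12, 10), date(2010, 3, 5)),
    ("Test", date(2010, 1, 20), None),
]
counts = open_counts(incidents, today=date(2010, 3, 15))
for key in sorted(counts):
    print(key, counts[key])
```

As in the query, an incident opened and closed within the same month produces no rows, because the finish month itself is excluded.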
A useful technique here is to create either a table of "all" dates (clearly that would be infinite so I mean a sufficiently large range for your purposes) OR create two tables: one of all the months (12 rows) and another of "all" years.
Let's assume you go for the 1st of these:
create table all_dates (d date)
and populate as appropriate. I'm going to define your incident table as follows
create table incident
(
incident_id int not null,
project_id int not null,
start_date date not null,
end_date date null
)
I'm not sure what RDBMS you are using and date functions vary a lot between them so the next bit may need adjusting for your needs.
select
project_id,
datepart(yy, all_dates.d) as "year",
datepart(mm, all_dates.d) as "month",
count(*) as "count"
from
incident,
all_dates
where
incident.start_date <= all_dates.d and
(incident.end_date >= all_dates.d or incident.end_date is null)
group by
project_id,
datepart(yy, all_dates.d),
datepart(mm, all_dates.d)
That is not going to quite work as we want as the counts will be for every day that the incident was open in each month. To fix this we either need to use a subquery or a temporary table and that really depends on the RDBMS...
Another problem is that open incidents will be counted against all future months in your all_dates table. Adding a condition all_dates.d <= today solves that. Again, different RDBMSs have different ways of returning now/today/system time...
Another approach is to have an all_months rather than all_dates table that just has the date of first of the month in it:
create table all_months (first_of_month date)
select
project_id,
datepart(yy, all_months.first_of_month) as "year",
datepart(mm, all_months.first_of_month) as "month",
count(*) as "count"
from
incident,
all_months
where
incident.start_date <= dateadd(day, -1, dateadd(month, 1, first_of_month)) and
(incident.end_date >= first_of_month or incident.end_date is null)
group by
project_id,
datepart(yy, all_months.first_of_month),
datepart(mm, all_months.first_of_month)