Best Practice in Scenario - sql

I am currently trying to accomplish the following:
get the Last Weekstamp for the last 6 Months, the following ilustrates how the end result might look like:
Month | Weekstamp |
2013-12| 2013-52 |
2014-01| 2014-05 |
.... and so on
I have a auxiliary Table, which has all Weeks in it and allows me to connect to a Calender Table, which in turn has all months, meaning i am able to get all weekstamps per Month,
but how do i get all of the Last Week Numbers for the Last 6 Months ?
my idea was a Temporary table of some sor (never used one, am a beginner when it Comes to SQL)
which calculates all of the Weekstamps needing to be filtered out per month, and than gives out only values which i could than use to filter a query which contains all the data i Need.
Anybody have a better idea?
As i said I am just a beginner so i can't really say what the best way would be
Thanks a lot in Advance!

My guess is that your challenge is determining what the last six months are. To do this you can use a tally table (spt_values) and DateDiff to determine when the last six months are.
You can also depending on which DB and version easily do this without a calander or weeks table.
This
WITH rnge
AS (SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number > 0
AND number < 7),
dates
AS (SELECT EOMONTH(Dateadd(m, number * -1, Getdate())) dt
FROM rnge)
SELECT Year(dt) year,
Month(dt) month,
Datepart(wk, dt) week
FROM dates
Produces this output
| YEAR | MONTH | WEEK |
|------|-------|------|
| 2014 | 1 | 5 |
| 2013 | 12 | 53 |
| 2013 | 11 | 48 |
| 2013 | 10 | 44 |
| 2013 | 9 | 40 |
| 2013 | 8 | 35 |
Demo
I'll leave it to you to format the values
This assumes SQL Server 2012 since it uses EOMONTH see Get the last day of the month in SQL for previous versions of SQL Server

Related

Rolling sum based on date (when dates are missing)

You may be aware of rolling the results of an aggregate over a specific number of preceding rows. I.e.: how many hot dogs did I eat over the last 7 days
SELECT HotDogCount,
DateKey,
SUM(HotDogCount) OVER (ORDER BY DateKey ROWS 6 PRECEDING) AS HotDogsLast7Days
FROM dbo.HotDogConsumption
Results:
+-------------+------------+------------------+
| HotDogCount | DateKey | HotDogsLast7Days |
+-------------+------------+------------------+
| 3 | 09/21/2020 | 3 |
| 2 | 9/22/2020 | 5 |
| 1 | 09/23/2020 | 6 |
| 1 | 09/24/2020 | 7 |
| 1 | 09/25/2020 | 8 |
| 4 | 09/26/2020 | 12 |
| 1 | 09/27/2020 | 13 |
| 3 | 09/28/2020 | 13 |
| 2 | 09/29/2020 | 13 |
| 1 | 09/30/2020 | 13 |
+-------------+------------+------------------+
Now, the problem I am having is when there are gaps in the dates. So, basically, one day my intestines and circulatory system are screaming at me: "What the heck are you doing, you're going to kill us all!!!" So, I decide to give my body a break for a day and now there is no record for that day. When I use the "ROWS 6 PRECEDING" method, I will now be reaching back 8 days, rather than 7, because one day was missed.
So, the question is, do any of you know how I could use the OVER clause to truly use a date value (something like "DATEADD(day,-7,DateKey)") to determine how many previous rows should be summed up for a true 7 day rolling sum, regardless of whether I only ate hot dogs on one day or on all 7 days?
Side note, to have a record of 0 for the days I didn't eat any hotdogs is not an option. I understand that I could use an array of dates and left join to it and do a
CASE WHEN Datekey IS NULL THEN 0 END
type of deal, but I would like to find out if there is a different way where the rows preceding value can somehow be determined dynamically based on the date.
Window functions are the right approach in theory. But to look back at the 7 preceding days (not rows), we need a range frame specification - which, unfornately, SQL Server does not support.
I am going to recommend a subquery, or a lateral join:
select hdc.*, hdc1.*
from dbo.HotDogConsumption hdc
cross apply (
select coalesce(sum(HotDogCount), 0) HotDogsLast7Days
from dbo.HotDogConsumption hdc1
where hdc1.datekey >= dateadd(day, -7, hdc.datekey)
and hdc1.datekey < hdc.datekey
) hdc1
You might want to adjust the conditions in the where clause of the subquery to the precise frame that you want. The above code computes over the last 7 days, not including today. Something equivalent to your current attempt would be like:
where hdc1.datekey >= dateadd(day, -6, hdc.datekey)
and hdc1.datekey <= hdc.datekey
I'm kind of old school, but this is how I'd go about it:
SELECT
HDC1.HotDogCount
,HDC1.DateKey
,( SELECT SUM( HDC2.HotDogCount )
FROM HotDogConsumption HDC2
WHERE HDC2.DateKey BETWEEN DATEADD( DD, -7, HDC1.DateKey )
AND HDC1.DateKey ) AS 'HotDogsLast7Days'
FROM
HotDogConsumption HDC1
;
Someone younger might use an OUTER APPLY or something.

SQL Query Extract Totals by Month for Multiple Date Fields

I have an Oracle Database I am trying to query multiple date fields by dates and get the totals by month and year as output.
This was my original query. This just gets what I want for the dates I want to input.
SELECT COUNT(*) as Total
FROM Some_Table s
WHERE (s.Start_DATE <= TO_Date ('2019/09/01', 'YYYY/MM/DD'))
AND (s.End_DATE IS NULL OR (s.End_DATE > TO_Date ('2019/08/31', 'YYYY/MM/DD')))
I would like to get an output where it gives me a count by Month and Year. The count would be the number between the Start_DATE (beginning of the month) and the End_DATE (end of the month).
I can't do
Edit: this was an example from another query and has no relation to the query above. I was just trying to provide an example of what I cannot do because I have two separate date fields. The example below was stating my knowledge of extracting month and year from a single date field. Sorry for the confusion.
SELECT extract(year from e.DATE_OCCURRED) as Year
,to_char(e.DATE_OCCURRED, 'MONTH') as Month
,count (*) as totals
because the Start_DATE and End_DATE are two separate fields.
Any help would be appreciated
Edit: Example would be
----------------------------------
| Name | Start_DATE | End_DATE |
----------------------------------
| John | 01/16/2018 | 07/09/2019 |
| Sue | 06/01/2015 | 09/01/2018 |
| Joe | 04/06/2016 | Null |
----------------------------------
I want to know my total number of workers that would have been working by month and year. Would want the output to look like.
------------------------
| Year | Month | Total |
------------------------
| 2016 | Aug | 2 |
| 2018 | May | 3 |
| 2019 | Aug | 2 |
------------------------
So I know I had two workers working in August 2016 and three in May 2018.
Do you want this?
SELECT count(*)
from some_table
where year(e.DATE_OCCURRED) > year(start_date)
and year(e.DATE_OCCURRED) < year(end_date)
and month(e.DATE_OCCURRED) > month(start_date)
and month(e.DATE_OCCURRED) < month(end_date)
note: using month and year functions is generally better when working with dates. If you convert to characters you might find that January comes after February (as an example) since J comes after F in the alphabet.
Are you looking for this?(Hoping that end_date > start_date)
select extract (year from end_dt2)- extract(YEAR from st_dt1) as YearDiff ,
extract (month from end_dt2)- extract (month from st_dt1) as monthDiff from tab;

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
YEAR(datetime) AS year,
CASE WHEN DAY(datetime) > 24
THEN MONTH(datetime) + 1 ELSE MONTH(datetime) END AS month
FROM yourTable
)
SELECT
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
'/25~' +
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), (month + 1)) +
'/24' AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
year, month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period.
Demo
I would suggest dynamically building some date range rows so that you can then join you data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare #start_dt date;
set #start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select #start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and data.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and data.date < r.period_end_dt otherwise you could double count information as between is inclusive of both lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by dateadd(day, -25, datetime);
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

How to dynamically call date instead of hardcoding in WHERE clause?

In my code using SQL Server, I am comparing data between two months where I have the exact dates identified. I am trying to find if the value in a certain column changes in a bunch of different scenarios. That part works, but what I'd like to do is make it so that I don't have to always go back to change the date each time I wanted to get the results I'm looking for. Is this possible?
My thought was that adding a WITH clause, but it is giving me an aggregation error. Is there anyway I can go about making this date problem simpler? Thanks in advance
EDIT
Ok I'd like to clarify. In my WITH statement, I have:
select distinct
d.Date
from Database d
Which returns:
+------+-------------+
| | Date |
+------+-------------|
| 1 | 01-06-2017 |
| 2 | 01-13-2017 |
| 3 | 01-20-2017 |
| 4 | 01-27-2017 |
| 5 | 02-03-2017 |
| 6 | 02-10-2017 |
| 7 | 02-17-2017 |
| 8 | 02-24-2017 |
| 9 | ........ |
+------+-------------+
If I select this statement and execute, it will return just the dates from my table as shown above. What I'd like to do is be able to have sql that will pull from these date values and compare the last date value from one month to the last date value of the next month. In essence, it should compare the values from date 8 to values from date 4, but it should be dynamic enough that it can do the same for any two dates without much tinkering.
If I didn't misunderstand your request, it seems you need a numbers table, also known as a tally table, or in this case a calendar table.
Recommended post: https://dba.stackexchange.com/questions/11506/why-are-numbers-tables-invaluable
Basically, you create a table and populate it with numbers of year's week o start and end dates. Then join your main query to this table.
+------+-----------+----------+
| week | startDate | endDate |
+------+-----------+----------+
| 1 | 20170101 | 20170107 |
| 2 | 20170108 | 20170114 |
+------+-----------+----------+
Select b.week, max(a.data) from yourTable a
inner join calendarTable b
on a.Date between b.startDate and b.endDate
group by b.week
dynamic dates to filter by BETWEEN
select dateadd(m,-1,dateadd(day,-(datepart(day,cast(getdate() as date))-1),cast(getdate() as date))) -- 1st date of last month
select dateadd(day,-datepart(day,cast(getdate() as date)),cast(getdate() as date)) -- last date of last month
select dateadd(day,-(datepart(day,cast(getdate() as date))-1),cast(getdate() as date)) -- 1st date of current month
select dateadd(day,-datepart(day,dateadd(m,1,cast(getdate() as date))),dateadd(m,1,cast(getdate() as date))) -- last date of the month

SQL Query X Days back excluding date ranges (Confusing!)

Ok, I have a tough SQL query, and I'm not sure how to go about writing it.
I am summing the number of "bananas collected" by an employee within the last X days, but what I could really use help on is determining X.
The "last X days" value is defined to be the last 100 days that the employee was NOT out due to Purple Fever, starting from some ChosenDate (we'll say today, 6/24/14). That is to say, if the person was sick with Purple Fever for 3 days, then I want to look back over the last 103 days from ChosenDate rather than the last 100 days. Any other reason the employee may have been out does not affect our calculation.
Table PersonOutIncident
+----------------------+----------+-------------+
| PersonOutIncidentID | PersonID | ReasonOut |
+----------------------+----------+-------------+
| 1 | Sarah | PurpleFever |
| 2 | Sarah | PaperCut |
| 3 | Jon | PurpleFever |
| 4 | Sarah | PurpleFever |
+----------------------+----------+-------------+
Table PersonOutDetail
+-------------------+----------------------+-----------+-----------+
| PersonOutDetailID | PersonOutIncidentID | BeginDate | EndDate |
+-------------------+----------------------+-----------+-----------+
| 1 | 1 | 1/1/2014 | 1/3/2014 |
| 2 | 1 | 1/7/2014 | 1/13/2014 |
| 3 | 2 | 2/1/2014 | 2/3/2014 |
| 4 | 3 | 1/15/2014 | 1/20/2014 |
| 5 | 4 | 5/1/2014 | 5/15/2014 |
+-------------------+----------------------+-----------+-----------+
The tables are established. Many PersonOutDetail records can be associated with one PersonOutIncident record and there may be multiple PersonOutIncident records for a single employee (That is to say, there could be two or three PersonOutIncident records with an identical ReasonOut column, because they represent a particular incident or event and the not-necessarily-continuous days lost due to that particular incident)
The nature of this requirement complicates things, even conceptually to me.
The best I can think of is to check for a BeginDate/EndDate pair within the 100 day base period, then determine the number of days from BeginDate to EndDate and add that to the base 100 days. But then I would have to check again that this new range doesn't overlap or contain additional BeginDate/EndDate pairs and add, if so, add those days as well. I can tell already that this isn't the method I want to use, but I can't wrap my mind quite around how exactly what I need to start/structure this query. Does anyone have an idea that might steer me in the correct direction? I realize this might not be clear and I apologize if I'm just confusing things.
One way to do this is to work with a table or WITH CLAUSE that contains a list of days. Let's say days is a table with one column that contains the last 200 days. (This means the query will break if the employee had more than 100 sick days in the last 200 days).
Now you can get a list of all working days of an employee like this (replace ? with the employee id):
WITH t1 AS
(
SELECT day,
ROW_NUMBER() OVER (ORDER BY day DESC) AS 'RowNumber'
FROM days d
WHERE NOT EXISTS (SELECT * FROM PersonOutDetail pd
INNER JOIN PersonOutIncidentID po ON po.PersonOutIncidentID = pd.PersonOutIncidentID
WHERE d.day BETWEEN pd.BeginDate AND pd.EndDate
AND po.ReasonOut = 'PurpleFever'
AND po.PersonID = ?)
)
SELECT * FROM t1
WHERE RowNumber <= 100;
Alternatively, you can obtain the '100th day' by replacing RowNumber <= 100 with RowNumber = 100.