Select Month w/ Most Days in Multi Month Range in SQL - sql

Sorry if this has been asked, but I didn't find anything when searching: I have a large table of ~100k rows in SQL Server. Within each row is a date range, which in many cases spreads across multiple months (and years to a lesser extent). The ranges are typically about 30-35 days however they usually don't start at the the 1st of the month. An example of a typical date range is 01/10/2017-02/11/2017.
I'm looking for the most efficient way to output the month with the most days within in that range as it's own column. I'm doing the same thing for the year
Right now I have the following in my query:
SELECT DISTINCT
a.START_DATE,
a.END_DATE,
cast(month(dateadd(day, datediff(day, a.Start_Date, a.End_Date)/2, a.Start_Date)) as tinyint) as Main_Month,
cast(year(dateadd(day, datediff(day, a.Start_Date, a.End_Date)/2, a.Start_Date)) as smallint) as Main_Year
FROM TABLE
The output from that query using the above date range example would give me:
Start_Date: 01/10/2017 End_Date: 02/11/2017 Main_Month: 1 Main Year: 2017
That method has worked alright, but it slows when being done for all rows in the table. Are there any more efficient alternatives that I can use for the Main_Month and Main_Year columns?
EDIT IN RESPONSE TO COMMENTS:
By "month w/ the most days within a date range", using my example range of 01/10/2017-02/11/2017, since that range contains 21 days in January and only 11 in February, the output I'd get for Main_Month is 1. Also since the year is 2017 throughout the range the output I'd get for Main_Year is 2017
For ties, I'd go with the 1st month containing the max # of days. In the example of, 6/20/2017 - 9/5/2017 I'd go with 7, since that is the 1st month with 31 days in the range

As discused in the comments above, this solution is not perfectly accurate, but it mirrors the accuracy of the original slower solution:
SELECT b.START_DATE, b.END_DATE, month(b.mid_point) as Main_Month, year(b.midpoint) as Main_Year FROM
(SELECT DISTINCT
a.START_DATE,
a.END_DATE,
dateadd(day, datediff(day, a.Start_Date, a.End_Date)/2, a.Start_Date) as mid_Point
FROM a) as b
You should be able to speed it up by making these two changes. First, only compute the datediff and dateadd once, then take it from the derived table to get the two fields you need. Next, don't bother with casting, since Month() and year() both do that for you. Were you able to see a speed difference with this method?

I think using CASE statements could be more efficient, and if you can avoid casting, like this:
, CASE WHEN datediff(day, a.Start_Date, a.End_Date) + 1 - DAY(a.END_Date) < DAY(a.END_DATE) THEN MONTH(a.End_Date)
ELSE MONTH(a.Start_Date) END AS Main_Month
, CASE WHEN YEAR(a.END_Date) = YEAR(a.Start_Date) THEN YEAR(a.Start_Date)
WHEN datediff(day, a.Start_Date, a.End_Date) + 1 - DAY(a.END_Date) < DAY(a.END_DATE) THEN YEAR(a.End_Date)
ELSE YEAR(a.Start_Date) END AS Main_Year

Related

Counting working days between two dates Oracle

I'm trying to write a query which can count the number of working days between a payment being received and being processed, I started playing around with this for payments received in December 2017;
select unique trunc(date_received),
(case when trunc(date_received) in ('25-DEC-17','26-DEC-17') Then 0 when
to_char(date_received,'D') <6 Then 1 else 0 end) Working_day
from payments
where date_received between '01-DEC-17' and '31-dec-17'
order by trunc(date_received)
But to be honest, I'm at a loss as to how to take it further and add in date_processed and count the number of working days between date_processed and date_received... Any help would be much appreciated...
Maybe not the most optimal, but it works quite nicely, and it's easy to incorporate more complicated checks, like holidays. This query first generates all dates between the two dates, and then lets you filter out all the days that 'don't count'.
In this implementation I filtered out only weekend days, but it's quite easy to add checks for holidays and such.
with
-- YourQuery: I used a stub, but you can use your actual query here, which
-- returns a from date and to date. If you have multiple rows, you can also
-- output some id here, which can be used for grouping in the last step.
YourQuery as
(
select
trunc(sysdate - 7) as FromDate,
trunc(sysdate) as ToDate
from dual),
-- DaysBetween. This returns all the dates from the start date up to and
-- including the end date.
DaysBetween as
(
select
FromDate,
FromDate + level - 1 as DayBetween,
ToDate
from
YourQuery
connect by
FromDate + level - 1 <= ToDate)
-- As a last step, you can filter out all the days you want.
-- This default query only filters out Saturdays and Sundays, but you
-- could add a 'not exists' check that checks against a table with known
-- holidays.
select
count(*)
from
DaysBetween d
where
trim(to_char(DAYINBETWEEN, 'DAY', 'NLS_DATE_LANGUAGE=AMERICAN'))
not in ('SATURDAY', 'SUNDAY');

Get date for nth day of week in nth week of month

I have a column with values like '3rd-Wednesday', '2nd-Tuesday', 'Every-Thursday'.
I'd like to create a column that reads those strings, and determines if that date has already come this month, and if it has, then return that date of next month. If it has not passed yet for this month, then it would return the date for this month.
Expected results (on 4/22/16) from the above would be: '05-18-2016', '05-10-2016', '04-28-2016'.
I'd prefer to do it mathematically and avoid creating a calendar table if possible.
Thanks.
Partial answer, which is by no means bug free.
This doesn't cater for 'Every-' entries, but hopefully will give you some inspiration. I'm sure there are plenty of test cases this will fail on, and you might be better off writing a stored proc.
I did try to do this by calculating the day name and day number of the first day of the month, then calculating the next wanted day and applying an offset, but it got messy. I know you said no date table but the CTE simplifies things.
How it works
A CTE creates a calendar for the current month of date and dayname. Some rather suspect parsing code pulls the day name from the test data and joins to the CTE. The where clause filters to dates greater than the Nth occurrence, and the select adds 4 weeks if the date has passed. Or at least that's the theory :)
I'm using DATEFROMPARTS to simplify the code, which is a SQL 2012 function - there are alternatives on SO for 2008.
SELECT * INTO #TEST FROM (VALUES ('3rd-Wednesday'), ('2nd-Tuesday'), ('4th-Monday')) A(Value)
SET DATEFIRST 1
;WITH DAYS AS (
SELECT
CAST(DATEADD(MONTH,DATEDIFF(MONTH,0,GETDATE()),N.Number) AS DATE) Date,
DATENAME(WEEKDAY, DATEADD(MONTH,DATEDIFF(MONTH,0,GETDATE()),N.Number)) DayName
FROM master..spt_values N WHERE N.type = 'P' AND N.number BETWEEN 0 AND 31
)
SELECT
T.Value,
CASE WHEN MIN(D.Date) < GETDATE() THEN DATEADD(WEEK, 4, MIN(D.DATE)) ELSE MIN(D.DATE) END Date
FROM #TEST T
JOIN DAYS D ON REVERSE(SUBSTRING(REVERSE(T.VALUE), 1, CHARINDEX('-', REVERSE(T.VALUE)) -1)) = D.DayName
WHERE D.Date >=
DATEFROMPARTS(
YEAR(GETDATE()),
MONTH(GETDATE()),
1+ 7*(CAST(SUBSTRING(T.Value, 1,1) AS INT) -1)
)
GROUP BY T.Value
Value Date
------------- ----------
2nd-Tuesday 2016-05-10
3rd-Wednesday 2016-05-18
4th-Monday 2016-04-25

Sum of shifting range in SQL Query

I am trying to write an efficient query to get the sum of the previous 7 days worth of values from a relational DB table, and record each total against the final date in the 7 day period (e.g. the 'WeeklyTotals Table' in the example below). For example, in my WeeklyTotals query, I would like the value for February 15th to be 333, since that is the total sum of users from Feb 9th - Feb 15th, and so on:
I have a base query which gets me my previous weeks users for today's date (simplified for the sake of the example):
SELECT Date, Sum("Total Users")
FROM "UserRecords"
WHERE (dateadd(hour, -8, "UserRecords"."Date") BETWEEN
dateadd(hour, -8, sysdate) - INTERVAL '7 DAY' AND dateadd(hour, -8, sysdate);
The problem is, this only get's me the total for today's date. I need a query which will get me this information for the previous seven days.
I know I can make a view for each date (since I only need the previous seven entries) and join them all together, but that seems really inefficient (I'll have to create/update 7 views, and then do all the inner join operations). I am wondering if there's a more efficient way to achieve this.
Provided there are no gaps, you can use a running total with SUM OVER including the six previous rows. Use ROW_NUMBER to exclude the first six records, as their totals don't represent complete weeks.
select log_date, week_total
from
(
select
log_date,
sum(total_users) over (order by log_date rows 6 preceding) as week_total,
row_number() over (order by log_date) as rn
from mytable
where log_date > 0
)
where rn >= 7
order by log_date;
UPDATE: In case there are gaps, it should be
sum(total_users) over (order by log_date range interval '6' day preceding)
but I don't know whether PostgreSQL supports this already. (Moreover the ROW_NUMBER exclusion wouldn't work then and would have to be replaced by something else.)
Here's a a query that self joins to the previous 6 days and sums the value to get the weekly totals:
select u1.date, sum(u2.total_users) as weekly_users
from UserRecords u1
join UserRecords u2
on u1.date - u2.date < 7
and u1.date >= u2.date
group by u1.date
order by u1.date
You can use the SUM over Window function, with the expression using Date Part, of week.
Self joins are much slower than Window functions.

How does this aggregated query find the correct values

I am having trouble understanding this code. It actually seems to work, but I don't understand how the correct value for activity year and month is "found" get the proper min and max? Or is it running all permutations getting the highest? This is very strange to me.
I do understand how the dateadd works, just not how the query is actually working on the whole. This may be a bad question since I don't actually need help solving a problem, just insight into why this works.
select
EmployeeNumber,
sum(BaseCalculation) as BaseCalculation,
min(dateadd(mm, (ActivityYear - 1900) * 12 + ActivityMonth - 1 , 0)) as StartDate,
max(dateadd(mm, (ActivityYear - 1900) * 12 + ActivityMonth - 1 , 0)) as EndDate
from
Compensation
where
1=1
-- and
group by
EmployeeNumber
for both the min and max function call, the algorithm is
dateadd(mm, (ActivityYear - 1900) * 12 + ActivityMonth - 1 , 0)
Your query compute all possible date from the Compensation table using this algorithm. Then, you select the minimum date as StartDate and the maximum as EndDate.
This is how the proper max and min are returned.
Note that the dateadd signature is DATEADD (datepart , number , date )
Since the last parameter is 0, you are addind to month(mm) the number calculated in the algorithm, and return the corresponding date starting from 0.
Check this out for more information : https://msdn.microsoft.com/en-us/library/ms186819.aspx
It is converting the columns ActivityYear and ActivityMmonth to a date. It is doing so by counting the number of months since 1900 and adding them to time zero. So, Jan 2000 would become something like Jan, 100. This seems like a very arcane calculation, because dates that are about 2,000 years old are not really useful.
Of course, this assumes that ActivityYear is a recognizable recent year.
I would convert the year and month to the first day of the beginning of the month, with something like this:
min(cast(cast(ActivityYear * 10000 + ActivityMonth + 1 as varchar(255)) as date)
Sql Server will calculate every value of that statement, and then only return the min and max.
Although I cannot say for sure that sql server executes this way internally, the way I think about it I imagine that the engine strips off the group by and all the aggregate functions, runs that query. And then just sums/finds the min etc off of that.

Sum of values per month divided by days of month

I have a Sales table:
id_item, quantity, date_time
What I need is to sum the items sold in a month and divide them by the days of the month from a selected period of months.
Example - The user selects the dates of Oct 1 to Dec 31. I need to show the items_sold/days_of_month:
Month Items sold Days of month Items/Day
Sep 25 30 0.83333
Oct 36 31 1.16
Dec 15 31 0.4838
I have to specify by Kind of item. the kind is obtained from another table called Items. I use dateformat dd/mm/yy.
select
month(date_time),
sum(quantity) / (select(datepart(dd,getdate())))
from
sales v
join items a on v.id_item=a.id_item
where
a.kind='Kind of Item'
and cast(Convert(varchar(10), date_time, 112) as datetime)
between '01/10/2012' and '31/12/2012'
group by
month(date_time)
My problem is selecting the days of the months, how can I select x number of months and divide the sum(quantity) of each month by the days of each?
I know this part of the code only selects the days of the current month:
(select(datepart(dd,getdate())))
Try this on for size:
DECLARE
#FromDate datetime,
#ToDate date; -- inclusive
SET #FromDate = DateAdd(month, DateDiff(month, 0, '20121118'), 0);
SET #ToDate = DateAdd(month, DateDiff(month, 0, '20121220') + 1, 0);
SELECT
Year = Year(S.date_time),
Month = Month(S.date_time),
QtyPerDay =
Sum(s.quantity) * 1.0
/ DateDiff(day, M.MonthStart, DateAdd(month, 1, M.MonthStart))
FROM
dbo.Sales S
INNER JOIN dbo.Items I
ON S.id_item = I.id_item
CROSS APPLY (
SELECT MonthStart = DateAdd(month, DateDiff(month, 0, S.date_time), 0)
) M
WHERE
I.kind = 'Kind of Item'
AND S.date_time >= #FromDate
AND S.date_time < #ToDate
GROUP BY
Year(S.date_time),
Month(S.date_time),
M.MonthStart
It will select any full month that is partially enclosed by the FromDate and ToDate. The * 1.0 part is required if the quantity column is an integer, otherwise you will get an integer result instead of a decimal one.
Some stylistic notes:
Do NOT use string date conversion on a column to ensure you get whole days. This will completely prevent any index from being used, require more CPU, and furthermore is unclear (what does style 112 do again!?!?). To enclose full date periods, use what I showed in my query of DateCol >= StartDate and DateCol < OneMoreThanEndDate. Do a search for "sargable" to understand a very key concept here. A very safe and valuable general rule is to never put a column inside an expression if the condition can be rewritten to avoid it.
It is good that you're aliasing your tables, but you should use those aliases throughout the query for each column, as I did in my query. I recognize that the aliases V and A came from another language so they make sense there--just in general try to use aliases that match the table names.
Do include the schema name on your objects. Not doing so is not a huge no-no, but there are definite benefits and it is best practice.
When you ask a question it is helpful to explain all the logic so people don't have to guess or ask you--if you know (for example) that users can input mid-month dates but you need whole months then please indicate that in your question and state what needs to be done.
Giving the version of SQL server helps us zero in on the syntax required, as prior versions are less expressive. By telling us the version we can give you the best query possible.
Note: there is nothing wrong with putting the date calculation math in the query itself (instead of using SET to do it). But I figured you would be encoding this in a stored procedure and if so, using SET is just fine.