I need some ideas for an efficient way to create one row per count in a frequency column in SQL (SQL Server 2016).
The data:
I have a table with the dates people called in sick and how many days they said they were going to be absent:
BEGIN_DATE DAYS_SICK
2011-01-01 00:00:00.000 3
2011-01-01 00:00:00.000 3
2011-01-01 00:00:00.000 1
2011-01-02 00:00:00.000 2
2011-01-02 00:00:00.000 3
2011-01-04 00:00:00.000 4
2011-01-04 00:00:00.000 4
2011-01-04 00:00:00.000 3
I want to turn this into a table where each row represents a day of the year, along with the number of people who are sick on that day.
DATE PEOPLE_SICK
2011-01-01 00:00:00.000 3
2011-01-02 00:00:00.000 4
2011-01-03 00:00:00.000 4
2011-01-04 00:00:00.000 4
2011-01-05 00:00:00.000 3
2011-01-06 00:00:00.000 3
2011-01-07 00:00:00.000 2
So for example:
On 2011-01-01, 3 people called in sick: 2 for 3 days and 1 for that day only. The output is 3.
On 2011-01-02, another 2 (different) people called in sick, and 2 people from the day before had said they would miss that day too, so the output is 4.
Nobody called in sick on 2011-01-03, but 2 people from 2 days earlier had said they would miss that day, plus 2 people from the day before. The output is 4.
Etc.
I am currently doing this by iterating through each row of the input and then looping over the frequencies, adding or updating rows in the new table as necessary, but it takes an obscene amount of time.
Is there any other way of doing this more efficiently?
This doesn't deal with weekends at all, but it can get you started. Also, if this were a query that ran often, I would build a DATE DIM (date dimension) table and use it instead of the Dates CTE. The Dates CTE is adapted from existing DATE DIM code.
CREATE TABLE #test (ID int IDENTITY(1,1), BEGIN_DATE datetime, DAYS_SICK int);
-- Reporting range; @CutoffDate is exclusive
DECLARE @StartDate datetime = '2011-01-01'
      , @CutoffDate datetime = '2011-01-10';
INSERT INTO #test (BEGIN_DATE, DAYS_SICK)
VALUES
('2011-01-01 00:00:00.000', 3),
('2011-01-01 00:00:00.000', 3),
('2011-01-01 00:00:00.000', 1),
('2011-01-02 00:00:00.000', 2),
('2011-01-02 00:00:00.000', 3),
('2011-01-04 00:00:00.000', 4),
('2011-01-04 00:00:00.000', 4),
('2011-01-04 00:00:00.000', 3);
-- Dates: one row per day from @StartDate up to (not including) @CutoffDate;
-- the cross join of sys.all_objects is only used as a source of row numbers
WITH Dates
AS (SELECT d
    FROM (
        SELECT d = DATEADD(DAY, rn - 1, @StartDate)
        FROM (SELECT TOP (DATEDIFF(DAY, @StartDate, @CutoffDate)) rn = ROW_NUMBER() OVER (
                  ORDER BY s1.[object_id])
              FROM sys.all_objects AS s1
              CROSS JOIN sys.all_objects AS s2
              ORDER BY s1.[object_id]
             ) AS x
         ) AS y
   )
-- SickRanges: each absence as an inclusive BEGIN_DATE..END_DATE range
,SickRanges
AS (
    SELECT BEGIN_DATE
          ,DATEADD(DAY, DAYS_SICK - 1, BEGIN_DATE) END_DATE
    FROM #test
)
-- Count the ranges covering each day (days with nobody sick are omitted)
SELECT d.d [DATE]
      ,COUNT(1) PEOPLE_SICK
FROM SickRanges sr
JOIN Dates d ON d.d BETWEEN sr.BEGIN_DATE AND sr.END_DATE
GROUP BY d.d
ORDER BY d.d;
DROP TABLE #test;
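Since the question is about speed, one alternative worth knowing (a swapped-in technique, not what the answer above does) is a difference-array sweep: instead of expanding every absence into one row per day, add +1 on each BEGIN_DATE and -1 on the day after the last sick day, then take a running sum. The same +1/-1 idea can be expressed in SQL Server 2012+ with a running SUM() OVER (ORDER BY ...). Below is a minimal Python sketch of the sweep; the function name `people_sick_per_day` is hypothetical, and the input pairs are assumed to be (BEGIN_DATE, DAYS_SICK) as in the question.

```python
from collections import defaultdict
from datetime import date, timedelta


def people_sick_per_day(absences: list[tuple[date, int]]) -> dict[date, int]:
    """Count how many absences cover each day using a +1/-1 difference map.

    absences: hypothetical (BEGIN_DATE, DAYS_SICK) pairs, as in the question's table.
    Days on which nobody is sick are left out, like the inner join in the answer.
    """
    if not absences:
        return {}
    delta: dict[date, int] = defaultdict(int)
    for begin, days_sick in absences:
        delta[begin] += 1                               # first sick day
        delta[begin + timedelta(days=days_sick)] -= 1   # first day back at work
    counts: dict[date, int] = {}
    running = 0
    day, last = min(delta), max(delta)
    while day < last:  # the latest key is always a "back at work" day
        running += delta.get(day, 0)
        if running > 0:
            counts[day] = running
        day += timedelta(days=1)
    return counts
```

Fed the eight sample rows from the question, this yields 3, 4, 4, 4, 3, 3, 2 for 2011-01-01 through 2011-01-07, matching the expected PEOPLE_SICK column. The work grows with the number of rows plus the number of days covered, not with rows × days.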
I am working on a reporting module where I need to implement logic that produces dates as in the cases below.
My table:
Id  Day
1   8
2   14
3   22
4   29
Now I have to write a query that returns the following results:
Case 1: if the current date (GETDATE()) is 2022-9-5 00:00:00.000
result
2022-9-8 00:00:00.000
2022-9-14 00:00:00.000
2022-9-22 00:00:00.000
2022-9-29 00:00:00.000
Case 2: if the current date (GETDATE()) is 2022-9-16 00:00:00.000
result
2022-10-8 00:00:00.000
2022-10-14 00:00:00.000
2022-9-22 00:00:00.000
2022-9-29 00:00:00.000
Note: the query should work for any month and year.
select dateadd(day, day, eomonth(getdate(), case when day < datepart(day, getdate()) then 0 else -1 end)) as result
from t
result
2022-10-08 00:00:00.000
2022-10-14 00:00:00.000
2022-09-22 00:00:00.000
2022-09-29 00:00:00.000
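The EOMONTH rule above can be checked against both cases with a small Python sketch. `end_of_month` is a hypothetical helper meant to mimic EOMONTH(date, offset), and `dates_eomonth_rule` is likewise a hypothetical name; the CASE logic is copied from the query: a Day smaller than today's day of the month lands in the next month (current month-end + Day), otherwise in the current month (previous month-end + Day).

```python
from datetime import date, timedelta


def end_of_month(d: date, months: int) -> date:
    """Rough analogue of T-SQL EOMONTH(d, months): last day of the month `months` away."""
    y, m = divmod(d.year * 12 + (d.month - 1) + months + 1, 12)
    return date(y, m + 1, 1) - timedelta(days=1)  # first of the following month, minus one day


def dates_eomonth_rule(today: date, days: list[int]) -> list[date]:
    # Same CASE as the query: Day < today's day -> this month-end + Day (next month),
    # otherwise previous month-end + Day (this month)
    return [end_of_month(today, 0 if day < today.day else -1) + timedelta(days=day)
            for day in days]
```

With today = 2022-09-05 this gives 2022-09-08, 09-14, 09-22, 09-29, and with today = 2022-09-16 it gives 2022-10-08, 10-14, 09-22, 09-29, as in the two cases. As in the SQL, a large Day value can spill past a short month: with today = 2023-02-01 and Day = 29, the result is 2023-03-01.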
A few DATEADD calls will help: first you need the first day of the current or the next month, and then you can add the days from your table.
Which month is selected depends on whether the day on which the SELECT runs is smaller than the day in the table.
CREATE TABLE table1
([Id] int, [Day] int)
;
INSERT INTO table1
([Id], [Day])
VALUES
(1, 8),
(2, 14),
(3, 22),
(4, 29)
;
4 rows affected
SELECT getdate()
(No column name)
2022-09-20 18:55:21.917
SELECT
DATEADD(DAY, [Day] -1,DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE())+
(CASE WHEN DAY(GETDATE()) < [Day] THEN 0 ELSE 1 END), 0))
FROM table1
(No column name)
2022-10-08 00:00:00.000
2022-10-14 00:00:00.000
2022-09-22 00:00:00.000
2022-09-29 00:00:00.000
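The first-of-month arithmetic in this answer can be sketched in Python as well. `first_of_month` is a hypothetical helper that imitates DATEADD(MONTH, DATEDIFF(MONTH, 0, d) + months, 0), and `dates_first_of_month_rule` is a hypothetical name; the CASE is taken from the query above.

```python
from datetime import date, timedelta


def first_of_month(d: date, months: int) -> date:
    """Analogue of DATEADD(MONTH, DATEDIFF(MONTH, 0, d) + months, 0): first day `months` away."""
    y, m = divmod(d.year * 12 + (d.month - 1) + months, 12)
    return date(y, m + 1, 1)


def dates_first_of_month_rule(today: date, days: list[int]) -> list[date]:
    # CASE WHEN DAY(GETDATE()) < [Day] THEN 0 ELSE 1 END: only a Day still ahead stays
    # in this month; then add [Day] - 1 days to the first of the chosen month
    return [first_of_month(today, 0 if today.day < day else 1) + timedelta(days=day - 1)
            for day in days]
```

For today = 2022-09-20 it returns 2022-10-08, 10-14, 09-22, 09-29, matching the output above. One difference from the EOMONTH answer shows up when today's day equals a Day value: on 2022-09-08, this version returns 2022-10-08 for Day 8, while `case when day < datepart(day, getdate())` keeps it at 2022-09-08. Pick the tie-break the report actually needs.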
An IF ELSE is probably what you need
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/if-else-transact-sql?view=sql-server-ver16
So if today's day of the month is greater than the day in the other table, add one month to the date.
I want to know how many people were unavailable in each month historically. For that I have a historicTable that contains data from 2012 to 2018, where each row records how long an employee was unavailable (vacation, sickness, etc.). Here is one example:
idUser startDate endDate daysUn reason nameEmp
--------------------------------------------------------
123 25/01/2018 09/02/2018 12 Sickness John Doe
This is what I need for every row:
idUser startDate endDate daysUn reason nameEmp
--------------------------------------------------------
123 25/01/2018 31/01/2018 5 Sickness John Doe
123 01/02/2018 09/02/2018 7 Sickness John Doe
I know this has been asked hundreds of times here, but I'm having trouble doing it for an entire table. The approaches I've tried from other answers all work for a specific, given start date and end date, but what I need is to append ALL the data to this table and save it as-is, so the analyst will be able to study specific cases and specific employees. This is what I get with my current code:
original_INI original_FIN new_INI new_FIN
----------------------- ----------------------- ----------------------- -----------------------
2017-10-15 00:00:00.000 2018-01-06 00:00:00.000 2017-10-15 00:00:00.000 2017-10-31 00:00:00.000
2017-10-15 00:00:00.000 2018-01-06 00:00:00.000 2017-11-01 00:00:00.000 2017-11-30 00:00:00.000
2017-10-15 00:00:00.000 2018-01-06 00:00:00.000 2017-12-01 00:00:00.000 2017-12-31 00:00:00.000
2017-10-15 00:00:00.000 2018-01-06 00:00:00.000 2018-01-01 00:00:00.000 2018-01-06 00:00:00.000
This is the code. The original dates are fine, since they let me sort the data more globally, but I'd like the query to also output and save the rest of the columns so the result is more readable:
;WITH n(n) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [object_id])-1 FROM sys.all_columns
),
d(n,f,t,md,bp,ep) AS
(
SELECT n.n, d.INI, d.FIN,
DATEDIFF(MONTH, d.INI, d.FIN),
DATEADD(MONTH, n.n, DATEADD(DAY, 1-DAY(INI), INI)),
DATEADD(DAY, -1, DATEADD(MONTH, 1, DATEADD(MONTH, n.n,
DATEADD(DAY, 1-DAY(INI), INI))))
FROM n INNER JOIN archivoFuente AS d
ON d.FIN >= DATEADD(MONTH, n.n-1, d.INI)
)
SELECT original_INI = f, original_FIN = t,
new_INI = CASE n WHEN 0 THEN f ELSE bp END,
new_FIN = CASE n WHEN md THEN t ELSE ep END
FROM d WHERE md >= n
ORDER BY original_INI, new_INI;
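A Python sketch of the same month-splitting logic, applied to a whole list of rows so that every other column is carried onto each piece. The INI/FIN keys follow the question's code; `split_by_month`, `split_table` and the WEEKDAYS key are hypothetical. WEEKDAYS is included because the daysUn values in the example (5 and 7) appear to count Monday to Friday days. That is an assumption, not something the question states.

```python
from datetime import date, timedelta


def split_by_month(row: dict) -> list[dict]:
    """Split one row's inclusive INI..FIN range into one row per calendar month.

    All other columns are copied onto every piece. WEEKDAYS (hypothetical) counts
    Mon-Fri days in the piece, which is what daysUn appears to hold (an assumption).
    """
    pieces = []
    start, end = row["INI"], row["FIN"]
    piece_start = start
    while piece_start <= end:
        # first day of the month after piece_start
        next_month = date(piece_start.year + piece_start.month // 12,
                          piece_start.month % 12 + 1, 1)
        piece_end = min(end, next_month - timedelta(days=1))
        weekdays = sum(1 for i in range((piece_end - piece_start).days + 1)
                       if (piece_start + timedelta(days=i)).weekday() < 5)  # Mon=0 .. Fri=4
        pieces.append({**row,
                       "original_INI": start, "original_FIN": end,
                       "new_INI": piece_start, "new_FIN": piece_end,
                       "WEEKDAYS": weekdays})
        piece_start = next_month
    return pieces


def split_table(rows: list[dict]) -> list[dict]:
    # Apply the split to every row, as the CTE does for the whole archivoFuente table
    return [piece for row in rows for piece in split_by_month(row)]
```

For a 2017-10-15 to 2018-01-06 row this yields the same four new_INI/new_FIN pairs as the output above. For the 25/01/2018 to 09/02/2018 example it yields two pieces with 5 and 7 weekdays, and idUser, reason and the other columns repeated on both.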
Any help with the query is appreciated.
It's pretty easy, actually; I used the same code for my own requirements. You need to reference each column in every SELECT of the query so that it still exists after the rows are split. Check this code:
;WITH n(n) AS
(
    -- month offsets 0, 1, 2, ...
    SELECT ROW_NUMBER() OVER (ORDER BY [object_id])-1 FROM sys.all_columns
),
d(n,f,t,md,bp,ep,
  -- LIST YOUR EXTRA COLUMNS HERE, EG: name, id, bla, ble
  id_hr,Tipo,ID_tip,Nom_inc,RUT,Nombre,ID_emp,Nom_pos,Dias_durac,Num_lic,ID_usu_ap,ult_act
) AS
(
    SELECT n.n,d.INI, d.FIN,
           DATEDIFF(MONTH, d.INI, d.FIN),                        -- md: months spanned
           DATEADD(MONTH, n.n, DATEADD(DAY, 1-DAY(INI), INI)),    -- bp: first day of month n
           DATEADD(DAY, -1, DATEADD(MONTH, 1, DATEADD(MONTH, n.n,
               DATEADD(DAY, 1-DAY(INI), INI)))),                  -- ep: last day of month n
           -- SELECT THE SAME COLUMNS AGAIN, IN THE SAME ORDER; PAY ATTENTION TO NAMES AND COMMAS
           d.id_hr,d.Tipo,d.ID_tip,d.Nom_inc,d.RUT,d.Nombre,d.ID_emp,d.Nom_pos,d.Dias_durac,d.Num_lic,d.ID_usu_ap,d.ult_act
    FROM n INNER JOIN archivoFuente AS d
        ON d.FIN >= DATEADD(MONTH, n.n-1, d.INI)
)
SELECT -- PUT YOUR COLUMNS HERE ONCE AGAIN SO THEY APPEAR IN THE RESULT
       id_hr,Tipo,ID_tip,Nom_inc,RUT,Nombre,ID_emp,Nom_pos,Dias_durac,Num_lic,ID_usu_ap,ult_act,
       original_INI = f, original_FIN = t,
       new_INI = CASE n WHEN 0 THEN f ELSE bp END,
       new_FIN = CASE n WHEN md THEN t ELSE ep END
FROM d
WHERE md >= n
ORDER BY original_INI, new_INI;
Now, to save the result, I'd recommend an INSERT statement into a new table. Exactly how to do that, I don't know; I'm in the same spot as you. I hope someone else looks at this question.
I have a complex calculation requirement for a user logging system. I need to find the most frequently active users based on their number of logins within a 180-day window. Once two login dates are 181 days apart, they do not count toward the same total, but each could still count toward a total when grouped with other dates.
For example here is Jim's login history:
Jim 2018-01-01
Jim 2018-04-01
Jim 2018-05-01
Jim 2018-06-01
Jim 2018-07-01
Jim 2018-08-01
Jim 2018-09-01
Jim 2018-12-01
Using 6 months, instead of 180 days, for simplicity, and only looking 6 months in one direction, Jim had the following totals:
Logins: 5 (2018-01-01 + 6 months)
Logins: 6 (2018-04-01 + 6 months)
Logins: 5 (2018-05-01 + 6 months)
Logins: 5 (2018-06-01 + 6 months)
Logins: 4 (2018-07-01 + 6 months)
Logins: 3 (2018-08-01 + 6 months)
Logins: 2 (2018-09-01 + 6 months)
Logins: 1 (2018-12-01 + 6 months)
So my system would report back 6 because it only wants the maximum total.
Other than brute-force calculation, I'm lost on how to construct this system. Yes, I can denormalize the data to any degree; speed is what matters most.
Try this:
declare @tbl table(name char(3), dt date);
insert into @tbl values
('Jim', '2018-01-01'),
('Jim', '2018-04-01'),
('Jim', '2018-05-01'),
('Jim', '2018-06-01'),
('Jim', '2018-07-01'),
('Jim', '2018-08-01'),
('Jim', '2018-09-01'),
('Jim', '2018-12-01');
-- upperDt: last day of each window; BETWEEN is inclusive, so 180 days keeps
-- logins that are 181 days apart out of the same window
;with cte as (
select name, dt, DATEADD(day, 180, dt) upperDt from @tbl
), cte2 as (
-- for each login, count the same user's logins inside its window
select name,
(select COUNT(*) from cte where dt between c.dt and c.upperDt and name = c.name) cnt
from cte c
)
select name, MAX(cnt) [max]
from cte2
group by name
Try this: it uses a common table expression to calculate the end of each login's date window, and CROSS APPLY to count the logins inside that window.
DECLARE @t TABLE (UserName NVARCHAR(10), LoginDate DATETIME)
INSERT INTO @t
(UserName,LoginDate) VALUES
('Jim','2018-01-01'),
('Jim','2018-04-01'),
('Jim','2018-05-01'),
('Jim','2018-06-01'),
('Jim','2018-07-01'),
('Jim','2018-08-01'),
('Jim','2018-09-01'),
('Jim','2018-12-01')
; WITH CteDateRange
AS(
SELECT
T.UserName
,T.LoginDate
-- inclusive end of the window (the <= below keeps it); for the real 180-day rule use:
--,EndDateRange = DATEADD(DAY, 180, LoginDate)
,EndDateRange = DATEADD(MONTH, 6, LoginDate)
FROM @t T
)
)
SELECT
DR.UserName
,DR.LoginDate
,DR.EndDateRange
,T.Total
FROM CteDateRange DR
CROSS APPLY ( SELECT Total = COUNT(D.LoginDate)
FROM CteDateRange D
WHERE D.LoginDate >= DR.LoginDate
AND D.LoginDate <= DR.EndDateRange
AND D.UserName = DR.UserName
) T
Output
UserName LoginDate EndDateRange Total
Jim 2018-01-01 00:00:00.000 2018-07-01 00:00:00.000 5
Jim 2018-04-01 00:00:00.000 2018-10-01 00:00:00.000 6
Jim 2018-05-01 00:00:00.000 2018-11-01 00:00:00.000 5
Jim 2018-06-01 00:00:00.000 2018-12-01 00:00:00.000 5
Jim 2018-07-01 00:00:00.000 2019-01-01 00:00:00.000 4
Jim 2018-08-01 00:00:00.000 2019-02-01 00:00:00.000 3
Jim 2018-09-01 00:00:00.000 2019-03-01 00:00:00.000 2
Jim 2018-12-01 00:00:00.000 2019-06-01 00:00:00.000 1
One basic solution uses a join:
select l.*
from (select l.name, l.date, count(*) as cnt,
             -- rank each login's window by how many logins it contains
             row_number() over (partition by l.name order by count(*) desc) as seqnum
      from logins l join
           logins l2
           on l.name = l2.name and
              l2.date >= l.date and l2.date < dateadd(day, 181, l.date)
      group by l.name, l.date
     ) l
where seqnum = 1;
This might have acceptable performance with an index on logins(name, date).
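Since speed matters most here and the data can be denormalized, note that the answers above compare each login with the user's other logins. If each user's dates are available in sorted order, the maximum can instead be found with a two-pointer sliding window, which visits each login at most twice after sorting. This is a different technique from the three answers, shown as a minimal Python sketch; `max_logins`, `max_logins_per_user` and the `window_days` parameter are hypothetical names. Windows are treated as inclusive [d, d + window_days], so two logins 180 days apart share a window and two logins 181 days apart do not, as the question describes.

```python
from collections import defaultdict
from datetime import date, timedelta


def max_logins(dates: list[date], window_days: int = 180) -> int:
    """Largest number of logins inside any inclusive [d, d + window_days] window starting at a login."""
    ds = sorted(dates)
    best = 0
    right = 0  # index one past the last login inside the current window
    for left, start in enumerate(ds):
        limit = start + timedelta(days=window_days)
        while right < len(ds) and ds[right] <= limit:
            right += 1
        best = max(best, right - left)
    return best


def max_logins_per_user(logins: list[tuple[str, date]], window_days: int = 180) -> dict[str, int]:
    # Group (name, date) pairs by user, then compute each user's best window
    by_user: dict[str, list[date]] = defaultdict(list)
    for name, dt in logins:
        by_user[name].append(dt)
    return {name: max_logins(ds, window_days) for name, ds in by_user.items()}
```

For Jim's eight logins, the 180-day version also reports 6: the window starting 2018-04-01 ends 2018-09-28 and contains the April through September logins.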
I am trying to find the number of rows that a range between 2 dates falls into. Basically, I have an auth dated 1/1/2018 - 4/1/2018, and I need the count of pay periods those dates fall within.
Here is the data I am looking at:
create table #dates
(
pp_start_date date,
pp_end_date date
)
insert into #dates (pp_start_date,pp_end_date)
values ('2017-12-28', '2018-01-10'),
('2018-01-11', '2018-01-24'),
('2018-01-25', '2018-02-07'),
('2018-02-08', '2018-02-21'),
('2018-02-22', '2018-03-07'),
('2018-03-08', '2018-03-21'),
('2018-03-22', '2018-04-04'),
('2018-04-05', '2018-04-18');
When I run this query,
SELECT
ad.pp_start_date, ad.pp_end_date, orderby
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY pp_start_date) AS orderby, *
FROM
#dates) ad
WHERE
'2018-01-01' <= ad.pp_end_date
I somehow want to only get 7 rows. Is this even possible? Thanks in advance for any help!
EDIT: OK, using count(*) worked to get the number of rows, but now I am trying to get the number of rows for 2 dynamic dates from another temp table, and I don't see a way to relate the data.
Using the #dates temp table referenced above gives me the date data. Now using this data:
create table #stuff
([month] date,
[name] varchar(20),
units int,
fips_code int,
auth_datefrom date,
auth_dateto date)
insert into #stuff (month,name,units,fips_code,auth_datefrom,auth_dateto)
values ('2018-01-01','SMITH','50','760', '2018-01-01', '2018-04-01');
insert into #stuff (month,name,units,fips_code,auth_datefrom,auth_dateto)
values ('2018-01-01','JONES','46','193', '2018-01-01', '2018-04-01');
insert into #stuff (month,name,units,fips_code,auth_datefrom,auth_dateto)
values ('2018-01-01','DAVID','84','109', '2018-02-01', '2018-04-01');
I want to create a statement that counts the rows in the #dates table that fall within the auth dates in the #stuff table; I just can't figure out how to relate or join them.
pp_start_date <= auth_dateto and pp_end_date >= auth_datefrom
Here is my output for #dates
pp_start_date pp_end_date
2017-12-28 2018-01-10
2018-01-11 2018-01-24
2018-01-25 2018-02-07
2018-02-08 2018-02-21
2018-02-22 2018-03-07
2018-03-08 2018-03-21
2018-03-22 2018-04-04
2018-04-05 2018-04-18
Here is my output for #stuff
month name units fips_code auth_datefrom auth_dateto
2018-01-01 SMITH 50 760 2018-01-01 2018-04-01
2018-01-01 JONES 46 193 2018-01-01 2018-04-01
2018-01-01 DAVID 84 109 2018-02-01 2018-04-01
I am trying to use the auth_datefrom and auth_dateto from #stuff to find out how many rows that is from #dates.
Try this one.
SELECT ad.pp_start_date, ad.pp_end_date, orderby
from (select
row_number()over ( order by pp_start_date) as orderby, * from
#dates) ad
where ad.pp_end_date <= '2018-01-01'
or ad.pp_start_date >= '2018-01-01'
Are you looking for this?
select d.*
from #dates d
where d.pp_start_date <= '2018-04-01' and
d.pp_end_date >= '2018-01-01';
This returns all rows that have a date within the time period you specify.
I'm not sure what the row_number() does. If you want the count, then:
select count(*)
from #dates d
where d.pp_start_date <= '2018-04-01' and
d.pp_end_date >= '2018-01-01';
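The condition pp_start_date <= auth_dateto and pp_end_date >= auth_datefrom is the usual test for two inclusive date ranges overlapping. The EDIT in the question amounts to applying it once per #stuff row. In SQL that typically means joining #stuff to #dates on that condition and grouping by the #stuff columns, or using a correlated COUNT(*) subquery. The Python sketch below only checks what those counts should be for the sample rows copied from the question; `periods_overlapping`, `PAY_PERIODS` and `STUFF` are hypothetical names.

```python
from datetime import date

# Pay periods from the #dates table in the question: (pp_start_date, pp_end_date)
PAY_PERIODS = [
    (date(2017, 12, 28), date(2018, 1, 10)),
    (date(2018, 1, 11), date(2018, 1, 24)),
    (date(2018, 1, 25), date(2018, 2, 7)),
    (date(2018, 2, 8), date(2018, 2, 21)),
    (date(2018, 2, 22), date(2018, 3, 7)),
    (date(2018, 3, 8), date(2018, 3, 21)),
    (date(2018, 3, 22), date(2018, 4, 4)),
    (date(2018, 4, 5), date(2018, 4, 18)),
]


def periods_overlapping(auth_from: date, auth_to: date,
                        periods: list[tuple[date, date]] = PAY_PERIODS) -> int:
    # Inclusive ranges overlap when each starts on or before the other ends:
    # pp_start_date <= auth_dateto AND pp_end_date >= auth_datefrom
    return sum(1 for start, end in periods if start <= auth_to and end >= auth_from)


# (name, auth_datefrom, auth_dateto) rows from the #stuff table
STUFF = [
    ("SMITH", date(2018, 1, 1), date(2018, 4, 1)),
    ("JONES", date(2018, 1, 1), date(2018, 4, 1)),
    ("DAVID", date(2018, 2, 1), date(2018, 4, 1)),
]
counts = {name: periods_overlapping(f, t) for name, f, t in STUFF}
```

SMITH and JONES each overlap 7 pay periods (the 7 rows the original question expected), and DAVID overlaps 5, starting with the 2018-01-25 to 2018-02-07 period.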
I'm trying to find all instances of a day of the week between two given dates. The day of the week can change. I've seen a similar question posted here, but it doesn't seem to work with variables:
SET DATEFIRST 1;
DECLARE @earliestStartDate DATETIME = '2016-08-01 00:00:00.000';
DECLARE @latestStartDate DATETIME = '2016-09-30 00:00:00.000';
DECLARE @weeklyCoursesStartDay INT = 1;
DECLARE @maxCourses INT = 30;
CREATE TABLE #Dates(CourseDate DATETIME);
WITH CTE(dt)
AS
(SELECT @earliestStartDate
UNION ALL
SELECT DATEADD(day, @weeklyCoursesStartDay, dt) FROM CTE
WHERE dt < @latestStartDate)
INSERT INTO #Dates(CourseDate)
SELECT TOP(@maxCourses) dt
FROM CTE
WHERE DATEPART(DW, dt) = @weeklyCoursesStartDay
OPTION (MAXRECURSION 0);
SELECT * FROM #Dates
DROP TABLE #Dates
This returns the below, as expected.
2016-08-01 00:00:00.000
2016-08-08 00:00:00.000
2016-08-15 00:00:00.000
2016-08-22 00:00:00.000
2016-08-29 00:00:00.000
2016-09-05 00:00:00.000
2016-09-12 00:00:00.000
2016-09-19 00:00:00.000
2016-09-26 00:00:00.000
But when @weeklyCoursesStartDay is changed to, say, 5 (indicating a Friday), it only returns two results:
2016-08-26 00:00:00.000
2016-09-30 00:00:00.000
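The line
SELECT DATEADD(day, @weeklyCoursesStartDay, dt) FROM CTE
should actually be:
SELECT DATEADD(day, 1, dt) FROM CTE
With the original line, the CTE steps forward @weeklyCoursesStartDay days at a time. A step of 1 happens to visit every day, but a step of 5 only returns to the same weekday every 35 days (5 and 7 share no common factor), so only two Fridays are generated between the two dates. With a step of 1, the recursive CTE generates every day between @earliestStartDate and @latestStartDate, and the later WHERE DATEPART(DW, dt) = @weeklyCoursesStartDay clause keeps only the days with the correct weekday.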
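The effect of the step size can be reproduced outside SQL. This minimal Python sketch imitates the recursive CTE: it walks forward from the start date by `step` days while the previous value is before the end date, then keeps the dates on the requested weekday. `matching_days` is a hypothetical name. Python's ISO numbering (Monday = 1 to Sunday = 7) is assumed to line up with DATEPART(DW, ...) under SET DATEFIRST 1.

```python
from datetime import date, timedelta


def matching_days(start: date, end: date, iso_weekday: int, step: int,
                  max_courses: int = 30) -> list[date]:
    """Imitate the recursive CTE, then filter like WHERE DATEPART(DW, dt) = @weeklyCoursesStartDay.

    Assumes ISO numbering (Mon=1 .. Sun=7), which matches DW under SET DATEFIRST 1.
    """
    walked = [start]                # anchor member: @earliestStartDate
    while walked[-1] < end:         # recursive member: WHERE dt < @latestStartDate
        walked.append(walked[-1] + timedelta(days=step))
    return [d for d in walked if d.isoweekday() == iso_weekday][:max_courses]
```

With step = 5 and weekday 5 it returns only 2016-08-26 and 2016-09-30, as in the question. With step = 1 it returns all nine Fridays from 2016-08-05 to 2016-09-30, and for weekday 1 the nine Mondays listed in the question.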