I'm stuck on a SQL query. I'm using SQL Server.
I have a table of jobs, each with a start and end date. A job can span days or months. I need the total combined number of days worked in each month, across all jobs that intersect that month.
Jobs
-----------------------------------
JobId | Start | End | DayRate |
-----------------------------------
1 | 1.1.13 | 2.2.13 | 2500 |
2 | 5.1.13 | 5.2.13 | 2000 |
3 | 3.3.13 | 2.4.13 | 3000 |
The results I need are:
Month | Days
--------------
Jan | 57
Feb | 7
Mar | 28
Apr | 2
Any idea how I would write such a query?
I would also like to work out the SUM for each month by multiplying the day rate by the number of days worked for each job. How would I add this to the results?
Thanks
You can use a recursive CTE to extract all days from start to end for each JobId and then just group by month (and year, I guess).
;WITH CTE_TotalDays AS
(
SELECT [Start] AS DT, JobID FROM dbo.Jobs
UNION ALL
SELECT DATEADD(DD,1,c.DT), c.JobID FROM CTE_TotalDays c
WHERE c.DT < (SELECT [End] FROM Jobs j2 WHERE j2.JobId = c.JobID)
)
SELECT
MONTH(DT) AS [Month]
,YEAR(DT) AS [Year]
,COUNT(*) AS [Days]
FROM CTE_TotalDays
GROUP BY MONTH(DT),YEAR(DT)
OPTION (MAXRECURSION 0)
PS: There are 58 days in Jan in your example and not 57 ;)
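The question also asks for a monthly total of day rate times days worked. One possible way to get it, as a sketch only, is to carry DayRate through the recursion so that every generated day contributes one day's rate to its month. The table variable @Jobs, the EndDT column and the MonthTotal alias are assumptions made for illustration; they are not part of the original answer.

```sql
-- Hypothetical stand-in for the Jobs table, loaded with the question's rows.
DECLARE @Jobs TABLE (JobId int, [Start] date, [End] date, DayRate int);
INSERT INTO @Jobs (JobId, [Start], [End], DayRate) VALUES
    (1, '20130101', '20130202', 2500),
    (2, '20130105', '20130205', 2000),
    (3, '20130303', '20130402', 3000);

WITH CTE_TotalDays AS
(
    -- Anchor: the first day of every job, carrying its end date and rate along.
    SELECT [Start] AS DT, [End] AS EndDT, JobId, DayRate
    FROM @Jobs
    UNION ALL
    -- Recursive step: add one day per job until its end date is reached.
    SELECT DATEADD(DAY, 1, DT), EndDT, JobId, DayRate
    FROM CTE_TotalDays
    WHERE DT < EndDT
)
SELECT
     YEAR(DT)     AS [Year]
    ,MONTH(DT)    AS [Month]
    ,COUNT(*)     AS [Days]
    ,SUM(DayRate) AS MonthTotal   -- each row is one worked day at that job's rate
FROM CTE_TotalDays
GROUP BY YEAR(DT), MONTH(DT)
ORDER BY [Year], [Month]
OPTION (MAXRECURSION 0);
```

Both the start and the end date of a job are counted as worked days here, which is why January comes out at 58 days for the sample rows rather than 57.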
You can do it using the following approach:
/* Your table with periods */
declare @table table(JobId int, Start date, [End] date, DayRate money)
INSERT INTO @table (JobId , Start, [End], DayRate)
VALUES
(1, '20130101','20130202', 2500),
(2,'20130105','20130205', 2000),
(3,'20130303','20130402' , 3000 )
/* create a table that stores all possible dates;
if this code is supposed to be executed often, you can create
the dates table once to avoid the overhead of filling it */
declare @dates table(d date)
declare @d date='20000101'
WHILE @d<'20500101'
BEGIN
INSERT INTO @dates (d) VALUES (@d)
SET @d=DATEADD(DAY,1,@d)
END;
/* and at last get desired output */
SELECT YEAR(d.d) [YEAR], DATENAME(month,d.d) [MONTH], COUNT(*) [Days]
FROM @dates d
CROSS JOIN @table t
WHERE d.d BETWEEN t.Start AND t.[End]
GROUP BY YEAR(d.d), DATENAME(month,d.d)
This uses only 1 recursion instead of 1 for each row. I imagine it will perform better than the chosen answer when you have a large amount of data.
declare @t table(JobId int, Start date, [End] date, DayRate int)
insert @t values
(1,'2013-01-01','2013-02-02', 2500),(2,'2013-01-05','2013-02-05', 2000),(3,'2013-03-03', '2013-04-02',3000)
;WITH a AS
(
SELECT min(Start) s, max([End]) e
FROM @t
), b AS
(
SELECT s, e from a
UNION ALL
SELECT dateadd(day, 1, s), e
FROM b WHERE s <> e
)
SELECT
MONTH(b.s) AS [Month]
,YEAR(b.s) AS [Year]
,COUNT(*) AS [Days]
,SUM(DayRate) MonthDayRate
FROM b
join @t t
on b.s between t.Start and t.[End]
GROUP BY MONTH(b.s),YEAR(b.s)
OPTION (MAXRECURSION 0)
Result:
Month Year Days MonthDayRate
1 2013 58 131500
2 2013 7 15000
3 2013 29 87000
4 2013 2 6000
Related
I have a table
DATE Val
01-01-2020 1
01-02-2020 3
01-05-2020 2
01-07-2020 8
01-13-2020 3
...
I want to summarize these values by the following Sunday. For example, in the above example:
1-05-2020, 1-12-2020, and 1-19-2020 are Sundays, so I want to summarize these by those dates.
The final result should be something like
DATE SUM
1-05-2020 6 //(01-01-2020 + 01-02-2020 + 01-05-2020)
1-12-2020 8
1-19-2020 3
I wasn't certain if the best place to start would be to create a temp calendar table, and then try to join backwards based on that? Or if there was an easier way involving DATEDIFF. Any help would be appreciated! Thanks!
Here's a solution that uses DATEADD and DATEPART to calculate the closest following Sunday,
with a correction for different @@DATEFIRST settings
(the DATEPART weekday values differ depending on the DATEFIRST setting).
Sample data:
create table #TestTable
(
Id int identity(1,1) primary key,
[Date] date,
Val int
);
insert into #TestTable
([Date], Val)
VALUES
('2020-01-01', 1)
, ('2020-01-02', 3)
, ('2020-01-05', 2)
, ('2020-01-07', 8)
, ('2020-01-13', 3)
;
Query:
WITH CTE_DATA AS
(
SELECT [Date], Val
, DATEADD(day,
((7-(@@datefirst+datepart(weekday, [Date])-1)%7)%7),
[Date]) AS Sunday
FROM #TestTable
)
SELECT
Sunday AS [Date],
SUM(Val) AS [Sum]
FROM CTE_DATA
GROUP BY Sunday
ORDER BY Sunday;
Date | Sum
:--------- | --:
2020-01-05 | 6
2020-01-12 | 8
2020-01-19 | 3
Extra:
Apparently the trick of adding the difference of weeks from day 0 to day 6 also works independently from the DATEFIRST setting.
So this query will return the same result for the sample data.
WITH CTE_DATA AS
(
SELECT [Date], Val
, CAST(DATEADD(week, DATEDIFF(week, 0, DATEADD(day, -1, [Date])), 6) AS DATE) AS Sunday
FROM #TestTable
)
SELECT
Sunday AS [Date],
SUM(Val) AS [Sum]
FROM CTE_DATA
GROUP BY Sunday
ORDER BY Sunday;
Subtracting 1 day makes sure that a date that is already a Sunday isn't moved to the next Sunday.
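A quick way to check the DATEFIRST independence is to run the same expression under two settings and compare the output. This is only a sketch: the table variable @d and the extra Sunday 2020-01-12 are assumptions added for illustration. It relies on the documented behaviour that DATEDIFF always treats Sunday as the first day of the week, regardless of SET DATEFIRST.

```sql
-- Hypothetical test dates: three from the question plus two nearby ones.
DECLARE @d TABLE (dt date);
INSERT INTO @d (dt) VALUES
    ('2020-01-01'), ('2020-01-05'), ('2020-01-07'), ('2020-01-12'), ('2020-01-13');

SET DATEFIRST 1;  -- week starts on Monday
SELECT dt,
       CAST(DATEADD(week, DATEDIFF(week, 0, DATEADD(day, -1, dt)), 6) AS date) AS Sunday
FROM @d
ORDER BY dt;

SET DATEFIRST 7;  -- week starts on Sunday
SELECT dt,
       CAST(DATEADD(week, DATEDIFF(week, 0, DATEADD(day, -1, dt)), 6) AS date) AS Sunday
FROM @d
ORDER BY dt;
```

Both result sets should be identical, with the two Sundays (2020-01-05 and 2020-01-12) mapped to themselves and every other date mapped to the Sunday that follows it.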
Here is a way to do it:
Note: 1-13-2020 won't show because it's not a Sunday.
with cte as
(
select cast('01-01-2020'as Date) as Date, 1 as Val
union select '01-02-2020' , 3
union select '01-05-2020' , 2
union select '01-07-2020' , 8
)
select Date, max(dateadd(dd,number,Date)), sum(distinct Val) as SUM
from master..spt_values a inner join cte on Date <= dateadd(dd,number,Date)
where type = 'p'
and year(dateadd(dd,number,Date))=year(Date)
and DATEPART(dw,dateadd(dd,number,Date)) = 7
group by Date
Output:
Date (No column name) SUM
2020-01-01 2020-12-26 1
2020-01-02 2020-12-26 3
2020-01-05 2020-12-26 2
2020-01-07 2020-12-26 8
Here is a simple solution: put your values into a table variable and view the results from that table:
DECLARE @dates TABLE
(
mDATE DATE,
Val INT,
Sunday DATE
)
INSERT INTO @dates (mDATE,Val) VALUES
('01-01-2020',1),('01-02-2020',3),('01-05-2020',2),('01-07-2020',8),('01-13-2020',3)
UPDATE @dates
SET Sunday = dateadd(week, datediff(week, 0, mDATE), 6)
SELECT Sunday,SUM(Val) AS Val FROM @dates
GROUP BY Sunday
OUTPUT:
Sunday Val
2020-01-05 4
2020-01-12 10
2020-01-19 3
I’m using MS-SQL-2008 R2 trying to write a script that calculates the Number of Hospital Beds occupied on any given day, at 2 census points: midnight, and 09:00.
I’m working from a data set of patient Ward Stays. Basically, each row in the table is a record of an individual patient's stay on a single ward, and records the date/time the patient is admitted onto the ward, and the date/time the patient leaves the ward.
A sample of this table is below:
Ward_Stay_Primary_Key | Ward_Start_Date_Time | Ward_End_Date_Time
1 | 2017-09-03 15:04:00.000 | 2017-09-27 16:55:00.000
2 | 2017-09-04 18:08:00.000 | 2017-09-06 18:00:00.000
3 | 2017-09-04 13:00:00.000 | 2017-09-04 22:00:00.000
4 | 2017-09-04 20:54:00.000 | 2017-09-08 14:30:00.000
5 | 2017-09-04 20:52:00.000 | 2017-09-13 11:50:00.000
6 | 2017-09-05 13:32:00.000 | 2017-09-11 14:49:00.000
7 | 2017-09-05 13:17:00.000 | 2017-09-12 21:00:00.000
8 | 2017-09-05 23:11:00.000 | 2017-09-06 17:38:00.000
9 | 2017-09-05 11:35:00.000 | 2017-09-14 16:12:00.000
10 | 2017-09-05 14:05:00.000 | 2017-09-11 16:30:00.000
The key thing to note here is that a patient’s Ward Stay can span any length of time, from a few hours to many days.
The following code enables me to calculate the number of beds at both census points for any given day, by specifying the date in the case statement:
SELECT
'05/09/2017' [Date]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 00:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 00:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 00:00]
,SUM(case when Ward_Start_Date_Time <= '05/09/2017 09:00:00.000' AND (Ward_End_Date_Time >= '05/09/2017 09:00:00.000' OR Ward_End_Date_Time IS NULL)then 1 else 0 end)[No. Beds Occupied at 09:00]
FROM
WardStaysTable
And, based on the sample 10 records above, generates this output:
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
05/09/2017 | 4 | 4
To perform this for any number of days is obviously onerous, so what I’m looking to create is a query where I can specify a start/end date parameter (e.g. 1st-5th Sept), and for the query to then evaluate the Ward_Start_Date_Time and Ward_End_Date_Time variables for each record, and – grouping by the dates defined in the date parameter – count each time the 00:00:00.000 and 09:00:00.000 census points fall between these 2 variables, to give an output something along these lines (based on the above 10 records):
Date | No. Beds Occupied at 00:00 | No. Beds Occupied at 09:00
01/09/2017 | 0 | 0
02/09/2017 | 0 | 0
03/09/2017 | 0 | 0
04/09/2017 | 1 | 1
05/09/2017 | 4 | 4
I’ve approached this (perhaps naively) thinking that if I use a cte to create a table of dates (defined by the input parameters), along with associated midnight and 9am census date/time points, then I could use these variables to group and evaluate the dataset.
So, this code generates the grouping dates and census date/time points:
DECLARE
@StartDate DATE = '01/09/2017'
,@EndDate DATE = '05/09/2017'
,@0900 INT = 540
SELECT
DATEADD(DAY, nbr - 1, @StartDate) [Date]
,CONVERT(DATETIME,(DATEADD(DAY, nbr - 1, @StartDate))) [MidnightDate]
,DATEADD(mi, @0900,(CONVERT(DATETIME,(DATEADD(DAY, nbr - 1, @StartDate))))) [0900Date]
FROM
(
SELECT
ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS nbr
FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(DAY, @StartDate, @EndDate)
The stumbling block I’ve hit is how to join the cte to the WardStays dataset, because there’s no appropriate key… I’ve tried a few iterations of using a subquery to make this work, but either I’m taking the wrong approach or I’m getting my syntax in a mess.
In simple terms, the logic I’m trying to create to get the output is something like:
SELECT
[Date]
,SUM (case when WST.Ward_Start_Date_Time <= [MidnightDate] AND (WST.Ward_End_Date_Time >= [MidnightDate] OR WST.Ward_End_Date_Time IS NULL) then 1 else 0 end) [No. Beds Occupied at 00:00]
,SUM (case when WST.Ward_Start_Date_Time <= [0900Date] AND (WST.Ward_End_Date_Time >= [0900Date] OR WST.Ward_End_Date_Time IS NULL) then 1 else 0 end) [No. Beds Occupied at 09:00]
FROM WardStaysTable WST
GROUP BY [Date]
Is the above somehow possible, or am I barking up the wrong tree and need to take a different approach altogether? Appreciate any advice.
I would expect something like this:
WITH dates as (
SELECT CAST(@StartDate as DATETIME) as dte
UNION ALL
SELECT DATEADD(DAY, 1, dte)
FROM dates
WHERE dte < @EndDate
)
SELECT dates.dte [Date],
SUM(CASE WHEN Ward_Start_Date_Time <= dte AND
Ward_END_Date_Time >= dte
THEN 1 ELSE 0
END) as num_beds_0000,
SUM(CASE WHEN Ward_Start_Date_Time <= dte + CAST('09:00' as DATETIME) AND
Ward_END_Date_Time >= dte + CAST('09:00' as DATETIME)
THEN 1 ELSE 0
END) as num_beds_0900
FROM dates LEFT JOIN
WardStaysTable wt
ON wt.Ward_Start_Date_Time <= DATEADD(day, 1, dates.dte) AND
wt.Ward_END_Date_Time >= dates.dte
GROUP BY dates.dte
ORDER BY dates.dte;
The cte is just creating the list of dates.
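The question's original CASE logic also counts stays that have no end date yet. A variant of the query above that keeps that rule could look like the following sketch. The table variable @WardStays (holding the first five sample rows) is an assumption for illustration, and MAXRECURSION 0 is added so that date ranges longer than 100 days do not hit the default recursion limit.

```sql
-- Hypothetical stand-in for WardStaysTable with the first five sample rows.
DECLARE @WardStays TABLE (
    Ward_Stay_Primary_Key int,
    Ward_Start_Date_Time  datetime,
    Ward_End_Date_Time    datetime NULL
);
INSERT INTO @WardStays VALUES
    (1, '2017-09-03T15:04:00', '2017-09-27T16:55:00'),
    (2, '2017-09-04T18:08:00', '2017-09-06T18:00:00'),
    (3, '2017-09-04T13:00:00', '2017-09-04T22:00:00'),
    (4, '2017-09-04T20:54:00', '2017-09-08T14:30:00'),
    (5, '2017-09-04T20:52:00', '2017-09-13T11:50:00');

DECLARE @StartDate date = '20170901', @EndDate date = '20170905';

WITH dates AS (
    -- One row per day in the requested range, as a datetime at midnight.
    SELECT CAST(@StartDate AS datetime) AS dte
    UNION ALL
    SELECT DATEADD(DAY, 1, dte) FROM dates WHERE dte < @EndDate
)
SELECT
    CAST(d.dte AS date) AS [Date],
    SUM(CASE WHEN w.Ward_Start_Date_Time <= d.dte
              AND (w.Ward_End_Date_Time >= d.dte OR w.Ward_End_Date_Time IS NULL)
             THEN 1 ELSE 0 END) AS [No. Beds Occupied at 00:00],
    SUM(CASE WHEN w.Ward_Start_Date_Time <= DATEADD(HOUR, 9, d.dte)
              AND (w.Ward_End_Date_Time >= DATEADD(HOUR, 9, d.dte) OR w.Ward_End_Date_Time IS NULL)
             THEN 1 ELSE 0 END) AS [No. Beds Occupied at 09:00]
FROM dates d
LEFT JOIN @WardStays w
       -- Only stays that overlap the window from midnight to 09:00 can count.
       ON w.Ward_Start_Date_Time <= DATEADD(HOUR, 9, d.dte)
      AND (w.Ward_End_Date_Time >= d.dte OR w.Ward_End_Date_Time IS NULL)
GROUP BY d.dte
ORDER BY d.dte
OPTION (MAXRECURSION 0);
```

Days with no overlapping stay still appear, because the LEFT JOIN keeps the date row and both CASE expressions fall through to 0.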
What a cool exercise. Here is what I came up with:
CREATE TABLE #tmp (ID int, StartDte datetime, EndDte datetime)
INSERT INTO #tmp values(1,'2017-09-03 15:04:00.000','2017-09-27 06:55:00.000')
INSERT INTO #tmp values(2,'2017-09-04 08:08:00.000','2017-09-06 18:00:00.000')
INSERT INTO #tmp values(3,'2017-09-04 13:00:00.000','2017-09-04 22:00:00.000')
INSERT INTO #tmp values(4,'2017-09-04 20:54:00.000','2017-09-08 14:30:00.000')
INSERT INTO #tmp values(5,'2017-09-04 20:52:00.000','2017-09-13 11:50:00.000')
INSERT INTO #tmp values(6,'2017-09-05 13:32:00.000','2017-09-11 14:49:00.000')
INSERT INTO #tmp values(7,'2017-09-05 13:17:00.000','2017-09-12 21:00:00.000')
INSERT INTO #tmp values(8,'2017-09-05 23:11:00.000','2017-09-06 07:38:00.000')
INSERT INTO #tmp values(9,'2017-09-05 11:35:00.000','2017-09-14 16:12:00.000')
INSERT INTO #tmp values(10,'2017-09-05 14:05:00.000','2017-09-11 16:30:00.000')
DECLARE
@StartDate DATE = '09/01/2017'
,@EndDate DATE = '10/01/2017'
, @nHours INT = 9
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @StartDate)
FROM (SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)
ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects) AS x(n)
)
, CTE AS(
select OrderDate, t2.*
from #tmp t2
cross apply(select orderdate from d ) d
where StartDte >= @StartDate and EndDte <= @EndDate)
select OrderDate,
SUM(CASE WHEN OrderDate >= StartDte and OrderDate <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 00:00],
SUM(CASE WHEN StartDTE <= DateAdd(hour,@nHours,CAST(OrderDate as datetime)) and DateAdd(hour,@nHours,CAST(OrderDate as datetime)) <= EndDte THEN 1 ELSE 0 END) [No. Beds Occupied at 09:00]
from CTE
GROUP BY OrderDate
This should allow you to check for any hour of the day using the @nHours parameter if you so choose. If you only want to see records that actually fall within your date range, you can filter the cross apply on start and end dates.
I have a table named table_A. It includes these columns:
-CountryName
-Min_Date
-Max_Date
-Number
I want to duplicate the data, split by month. For example:
Argentina | 2015-01-04 | 2015-04-07 | 100
England | 2015-02-08 | 2015-03-11 | 90
I want to see a table like this (separated by month):
Argentina | 01-2015 | 27 //(days to end of the min_date's month)
Argentina | 02-2015 | 29 //(days full month)
Argentina | 03-2015 | 31 //(days full month)
Argentina | 04-2015 | 7 //(days from start of the max_date's month)
England | 02-2015 | 21 //(days)
England | 03-2015 | 11 //(days)
I have tried many things to do this for each record, but now I'm confused and my project is being delayed.
Does anybody know how I can solve this? I tried to duplicate each row using a DATEDIFF count, but it is not working:
WITH cte AS (
SELECT CountryName, ISNULL(DATEDIFF(M,Min_Date ,Max_Date )+1,1) as count FROM table_A
UNION ALL
SELECT CountryName, count-1 FROM cte WHERE count>1
)
SELECT CountryName,count FROM cte
-Generate all the dates between min and max dates for each country.
-Then get the month start and month end dates for each country,year,month.
-Finally get the date differences of the month start and month end.
WITH cte AS (
SELECT Country, min_date dt,min_date,max_date FROM t
UNION ALL
SELECT Country, dateadd(dd,1,dt),min_date,max_date FROM cte WHERE dt < max_date
)
,monthends as (
SELECT country,year(dt) yr,month(dt) mth,max(dt) monthend,min(dt) monthstart
FROM cte
GROUP BY country,year(dt),month(dt))
select country
,cast(mth as varchar(2))+'-'+cast(yr as varchar(4)) yr_month
,datediff(dd,monthstart,monthend)+1 days_diff
from monthends
EDIT: Another option would be to generate all the dates once (the example shown here generates 51 years of dates, from 2000 to 2050) and then join them to the table to get the days by month.
WITH cte AS (
SELECT cast('2000-01-01' as date) dt,cast('2050-12-31' as date) maxdt
UNION ALL
SELECT dateadd(dd,1,dt),maxdt FROM cte WHERE dt < maxdt
)
SELECT country,year(dt) yr,month(dt) mth, datediff(dd,min(dt),max(dt))+1 days_diff
FROM cte c
JOIN t on c.dt BETWEEN t.min_date and t.max_date
GROUP BY country,year(dt),month(dt)
OPTION (MAXRECURSION 0)
I think you have the right idea. But you need to construct the months:
WITH cte AS (
SELECT CountryName, Min_Date as dte, Min_Date, Max_Date
FROM table_A
UNION ALL
SELECT CountryName, DATEADD(month, 1, dte), Min_Date, Max_Date
FROM cte
WHERE dte < Max_date
)
SELECT CountryName, dte
FROM cte;
Getting the number of days in the month is a bit more complicated. That requires some thought.
Oh, I forgot about EOMONTH():
select countryName, dte,
(case when dte = min_date
then datediff(day, min_date, eomonth(dte)) + 1
when dte = max_date
then day(dte)
else day(eomonth(dte))
end) as days
from cte;
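A variant of the month-stepping idea that anchors each step on the first day of the month, rather than on Min_Date, can be sketched as follows. Stepping from the first of the month avoids generating a row for a month that starts after Max_Date, and the day count becomes the overlap between the month and the [Min_Date, Max_Date] range. The table variable @table_A and the names months, MonthStart and Days are assumptions for illustration; DATEFROMPARTS and EOMONTH require SQL Server 2012 or later.

```sql
-- Hypothetical stand-in for table_A with the question's two rows.
DECLARE @table_A TABLE (CountryName varchar(16), Min_Date date, Max_Date date, Number int);
INSERT INTO @table_A VALUES
    ('Argentina', '20150104', '20150407', 100),
    ('England',   '20150208', '20150311', 90);

WITH months AS (
    -- Anchor: the first day of the month that contains Min_Date.
    SELECT CountryName,
           DATEFROMPARTS(YEAR(Min_Date), MONTH(Min_Date), 1) AS MonthStart,
           Min_Date, Max_Date
    FROM @table_A
    UNION ALL
    -- Step one month at a time while the next month still starts on or before Max_Date.
    SELECT CountryName, DATEADD(MONTH, 1, MonthStart), Min_Date, Max_Date
    FROM months
    WHERE DATEADD(MONTH, 1, MonthStart) <= Max_Date
)
SELECT CountryName,
       MonthStart,
       -- Overlap of [MonthStart, end of month] with [Min_Date, Max_Date], both ends inclusive.
       DATEDIFF(DAY,
                CASE WHEN Min_Date > MonthStart THEN Min_Date ELSE MonthStart END,
                CASE WHEN Max_Date < EOMONTH(MonthStart) THEN Max_Date ELSE EOMONTH(MonthStart) END
               ) + 1 AS Days
FROM months
ORDER BY CountryName, MonthStart;
```

Because the recursion advances by month, the default limit of 100 recursion levels is only reached for ranges longer than about eight years.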
Using a Calendar Table makes this stuff pretty easy. RexTester: http://rextester.com/EBTIMG23993
begin
create table #enderaric (
CountryName varchar(16)
, Min_Date date
, Max_Date date
, Number int
)
insert into #enderaric values
('Argentina' ,'2015-01-04' ,'2015-04-07' ,'100')
, ('England' ,'2015-02-08' ,'2015-03-11' ,'90')
end;
-- select * from #enderaric
--*/"
declare @FromDate date;
declare @ThruDate date;
set @FromDate = '2015-01-01';
set @ThruDate = '2015-12-31';
with x as (
select top (cast(sqrt(datediff(day, @FromDate, @ThruDate)) as int) + 1)
[number]
from [master]..spt_values v
)
/* Date Range CTE */
,cal as (
select top (1+datediff(day, @FromDate, @ThruDate))
DateValue = convert(date,dateadd(day,
row_number() over (order by x.number)-1,@FromDate)
)
from x cross join x as y
order by DateValue
)
select
e.CountryName
, YearMonth = convert(char(7),left(convert(varchar(10),DateValue),7))
, [Days]=count(c.DateValue)
from #enderaric as e
inner join cal c on c.DateValue >= e.min_date
and c.DateValue <= e.max_date
group by
e.CountryName
, e.Min_Date
, e.Max_Date
, e.Number
, convert(char(7),left(convert(varchar(10),DateValue),7))
results in:
CountryName YearMonth Days
---------------- --------- -----------
Argentina 2015-01 28
Argentina 2015-02 28
Argentina 2015-03 31
Argentina 2015-04 7
England 2015-02 21
England 2015-03 11
More about calendar tables:
Aaron Bertrand - Generate a set or sequence without loops
generate-a-set-1
generate-a-set-2
generate-a-set-3
David Stein - Creating a Date Table/Dimension on SQL 2008
Michael Valentine Jones - F_TABLE_DATE
I have tried browsing the problems and answers in this forum, but none of them fits my case sufficiently.
I have some people reporting in their status for 2 categories, which looks like this:
TimeStamp | PersonID | Category | Value
2015-07-02 01:25:00 | 2303 | CatA | 8.2
2015-07-02 01:25:00 | 2303 | CatB | 10.1
2015-07-02 03:35:00 | 2303 | CatA | 8.0
2015-07-02 03:35:00 | 2303 | CatB | 9.9
2015-07-02 02:30:00 | 4307 | CatA | 8.7
2015-07-02 02:30:00 | 4307 | CatB | 12.7
.
.
.
2015-07-31 22:15:00 | 9011 | CatA | 7.9
2015-07-31 22:15:00 | 9011 | CatB | 8.9
Some people report status several times per hour, but others only a couple of times per day.
I need to produce an output which shows the latest known status for each day, for each hour of the day, for each person and category. It should look like this:
Date |Hour| Person | Category | Value
2015-07-02 | 1 | 2307 | CatA | Null
2015-07-02 | 1 | 2307 | CatB | Null
2015-07-02 | 2 | 2307 | CatA | 8.2
2015-07-02 | 2 | 2307 | CatB | 10.2
2015-07-02 | 3 | 2307 | CatA | 8.2
2015-07-02 | 3 | 2307 | CatB | 10.2
2015-07-02 | 4 | 2307 | CatA | 8.0
2015-07-02 | 4 | 2307 | CatB | 9.9
.
.
.
2015-07-31 | 23 | 9011 | CatA | 7.9
2015-07-31 | 23 | 9011 | CatB | 8.9
The first row(s) for each person and category will probably be null as there will be no known values as this is "beginning of time"
I have tried using a sub query like this:
SELECT Date
,hour
,Person
,Category
,(SELECT TOP 1 status FROM readings WHERE (readings.Date<=structure.Date) AND readings.Hour<=structure.hour)....and so forth.... order by TimeStamp DESC
FROM structure
This works, except in terms of performance: I need to do this for a month, for 2,000 persons and 2 categories, which means the subquery must run 30*24*2000*2 = 2,880,000 times. Given that the table containing the readings also holds hundreds of thousands of readings, this doesn't work.
I have also tried messing around with row_number(), but have not succeeded.
Any suggestions?
Edit (19-10-2015 15:34): In my query example above I am referring to a "structure" table. This is actually just (for the time being) a view, with the following SQL:
SELECT Calendar.CalendarDay, Hours.Hour, Persons.Person, Categories.Category
FROM Calendar CROSS JOIN Hours CROSS JOIN Persons CROSS JOIN Categories
This is in order to produce a table containing a row for each day, for each hour, for each person and each category. This table then contains 30*24*2000*2 = 2,880,000 rows.
For each of these rows, I need to locate the latest status from the readings table. So for each Day, for each hour, for each person and each category I need to read the latest available status from the readings table.
Let me guess.
Based on the task "to produce an output, which shows latest know status for each day, for each hour of the day, for each person and category" you need to take three steps:
(1) Find latest records for every hour;
(2) Get a table of all date and hours to show;
(3) Multiply that date-hours-table by persons and categories and left join the result with latest-records-for-every-hour.
-- Test data
declare @t table ([Timestamp] datetime2(0), PersonId int, Category varchar(4), Value decimal(3,1));
insert into @t values
('2015-07-02 01:25:00', 2303, 'CatA', 8.2 ),
('2015-07-02 01:45:00', 2303, 'CatA', 9.9 ),
('2015-07-02 01:25:00', 2303, 'CatB', 10.1 ),
('2015-07-02 03:35:00', 2303, 'CatA', 8.0 ),
('2015-07-02 03:35:00', 2303, 'CatB', 9.9 ),
('2015-07-02 02:30:00', 4307, 'CatA', 8.7 ),
('2015-07-02 02:30:00', 4307, 'CatB', 12.7 );
-- Latest records for every hour
declare @Latest table (
[Date] date,
[Hour] tinyint,
PersonId int,
Category varchar(4),
Value decimal(3,1),
primary key ([Date], [Hour], PersonId, Category)
);
insert into @Latest
select top 1 with ties
[Date] = cast([Timestamp] as date),
[Hour] = datepart(hour, [Timestamp]),
PersonId ,
Category ,
Value
from
@t
order by
row_number() over(partition by cast([Timestamp] as date), datepart(hour, [Timestamp]), PersonId, Category order by [Timestamp] desc);
-- Date-hours table
declare @FromDateTime datetime2(0);
declare @ToDateTime datetime2(0);
select @FromDateTime = min([Timestamp]), @ToDateTime = max([Timestamp]) from @t;
declare @DateDiff int = datediff(day, @FromDateTime, @ToDateTime);
declare @FromDate date = cast(@FromDateTime as date);
declare @FromHour int = datepart(hour, @FromDateTime);
declare @ToHour int = datepart(hour, @ToDateTime);
declare @DayHours table ([Date] date, [Hour] tinyint, primary key clustered ([Date], [Hour]) );
with N as
(
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)
),
D as (
select
row_number() over(order by (select 1))-1 as d
from
N n1, N n2, N n3
),
H as (
select top 24
row_number() over(order by (select 1)) - 1 as h
from
N n1, N n2
)
insert into @DayHours
select dateadd(day, d, @FromDate), h
from
D, H
where
@FromHour <= (d * 100 + h)
and (d * 100 + h) <= (@DateDiff * 100 + @ToHour);
-- @PersonsIds & @Categories tables (just an imitation of the real tables)
declare @PersonsIds table (Id int primary key);
declare @Categories table (Category varchar(4) primary key);
insert into @PersonsIds select distinct PersonId from @t;
insert into @Categories select distinct Category from @t;
-- The result
select
dh.[Date],
dh.[Hour],
PersonId = p.Id,
c.Category,
l.Value
from
@PersonsIds p cross join @Categories c cross join @DayHours dh
left join @Latest l on l.[Date] = dh.[Date] and l.[Hour] = dh.[Hour] and l.PersonId = p.Id and l.Category = c.Category
order by
[Date], [Hour], PersonId, Category;
Edit (1):
OK.
In order to bring over the previous values to empty spaces,
let's replace the last select statement with this one:
select top 1 with ties
dh.[Date],
dh.[Hour],
PersonId = p.Id,
c.Category,
l.Value
from
@PersonsIds p cross join @Categories c cross join @DayHours dh
left join @Latest l
on (l.[Date] = dh.[Date] and l.[Hour] <= dh.[Hour] or l.[Date] < dh.[Date])
and l.PersonId = p.Id and l.Category = c.Category
order by
row_number()
over (partition by dh.[Date], dh.[Hour], p.Id, c.Category
order by l.[Date] desc, l.[Hour] desc);
Edit (2):
Let's try to collect the Cartesian product in temporary table with clustered index: PersonId, Category, [Date], [Hour].
And then update the table dragging non-changed values:
declare @Result table (
[Date] date,
[Hour] tinyint,
PersonId int,
Category varchar(4),
Value decimal(3,1),
primary key (PersonId, Category, [Date], [Hour]) -- Important !!!
)
insert into @Result
select
dh.[Date],
dh.[Hour],
PersonId = p.Id,
c.Category,
l.Value
from
@PersonsIds p cross join @Categories c cross join @DayHours dh
left join @Latest l on l.[Date] = dh.[Date] and l.[Hour] = dh.[Hour] and l.PersonId = p.Id and l.Category = c.Category
order by
[Date], [Hour], PersonId, Category;
declare @PersonId int;
declare @Category varchar(4);
declare @Value decimal(3,1);
update @Result set
@Value = Value = isnull(Value, case when @PersonId = PersonId and @Category = Category then @Value end),
@PersonId = PersonId,
@Category = Category;
For even better performance, consider replacing the table variables with temporary tables and adding indexes in accordance with the query plan recommendations.
If I understood it correctly, this should give you the desired result.
select st.Date,
case when hour =1 then NULL
else hour
end as hour
,st.Person,st.Category,
(select status from reading qualify row_number() over (partition by personid
order by status desc)=1)
from structure;
You can achieve this in SQL, but it will be quite slow, because for every person, category, day and hour you will have to look for the latest entry for the person and category until then. Just think of the process: Pick a record in your big table, find all statuses until then, order them and find the latest thus and pick its value. And this will be done for every record in your big table.
You might be better off simply retrieving all the data with a program written in a programming language and collecting the data with a control-break algorithm.
However, let's see how it's done in SQL.
One problem is SQL Server's poor date/time functions. We want to compare date plus hour, which would be easiest with strings in 'yyyymmddhh' format, e.g. '2015101923' < '2015102001'. In your big table you have date and hour and in your status table you have datetimes. Let's see how we can get the desired strings:
convert(varchar(8), bigtable.calendarday, 112) +
right('0' + convert(varchar(2), bigtable.hour), 2)
and
convert(varchar(8), status.timestamp, 112) +
right('0' + convert(varchar(2), datepart(hour, status.timestamp)), 2)
As this is - along with person and category - our key criterion to find records, you may want to have it as computed columns and add indexes (person + category + dayhourkey) in both tables.
You'd select from your big table and get the status value in a subquery. In order to get the latest matching record, you'd order by timestamp and limit to 1 record.
select
personid,
calendarday,
hour,
category,
(
select top 1 value
from status s
where s.personid = b.personid
and s.category = b.category
and convert(varchar(8), s.timestamp, 112) + right('0' + convert(varchar(2), datepart(hour, s.timestamp)), 2) <=
convert(varchar(8), b.calendarday, 112) + right('0' + convert(varchar(2), b.hour), 2)
order by s.timestamp desc
) as value
from bigtable b;
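The same lookup can also be written with OUTER APPLY and a plain datetime comparison instead of string keys, which lets an index on (personid, category, timestamp) be used directly. The following is a sketch under stated assumptions: the table variables @status and @bigtable and their rows are hypothetical, and the cutoff "strictly before the start of the hour" is read off the question's sample output (the 01:25 reading first appears at hour 2).

```sql
-- Hypothetical stand-ins for the status and bigtable tables.
DECLARE @status TABLE ([timestamp] datetime2(0), personid int, category varchar(4), value decimal(3,1));
INSERT INTO @status VALUES
    ('2015-07-02T01:25:00', 2303, 'CatA', 8.2),
    ('2015-07-02T03:35:00', 2303, 'CatA', 8.0);

DECLARE @bigtable TABLE (calendarday date, [hour] tinyint, personid int, category varchar(4));
INSERT INTO @bigtable VALUES
    ('20150702', 1, 2303, 'CatA'),
    ('20150702', 2, 2303, 'CatA'),
    ('20150702', 3, 2303, 'CatA'),
    ('20150702', 4, 2303, 'CatA');

SELECT b.personid, b.calendarday, b.[hour], b.category, s.value
FROM @bigtable b
OUTER APPLY (
    -- Latest reading for this person and category before the hour begins.
    SELECT TOP (1) st.value
    FROM @status st
    WHERE st.personid = b.personid
      AND st.category = b.category
      AND st.[timestamp] < DATEADD(HOUR, b.[hour], CAST(b.calendarday AS datetime2(0)))
    ORDER BY st.[timestamp] DESC
) s
ORDER BY b.calendarday, b.[hour];
```

To include readings taken during the hour itself, as the string comparison above does, compare against DATEADD(HOUR, b.[hour] + 1, ...) instead.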
This problem is related to this, which has no solution in sight: here
I have a table that shows me all sessions of an area.
This session has a start date.
I need to get all the days of month of the start date of the session by specific area (in this case)
I have this query:
SELECT idArea, idSession, startDate FROM SessionsPerArea WHERE idArea = 1
idArea | idSession | startDate |
1 | 1 | 01-01-2013 |
1 | 2 | 04-01-2013 |
1 | 3 | 07-02-2013 |
And I want something like this:
date | Session |
01-01-2013 | 1 |
02-01-2013 | NULL |
03-01-2013 | NULL |
04-01-2013 | 1 |
........ | |
29-01-2013 | NULL |
30-01-2013 | NULL |
In this case, the table returns me all the days of January.
The second column is the number of sessions that occur on that day, because there may be several sessions on the same day.
Can anyone help me?
Please try:
DECLARE @SessionsPerArea TABLE (idArea INT, idSession INT, startDate DATEtime)
INSERT @SessionsPerArea VALUES (1,1,'2013-01-01')
INSERT @SessionsPerArea VALUES (1,2,'2013-01-04')
INSERT @SessionsPerArea VALUES (1,3,'2013-07-02')
DECLARE @RepMonth as datetime
SET @RepMonth = '01/01/2013';
WITH DayList (DayDate) AS
(
SELECT @RepMonth
UNION ALL
SELECT DATEADD(d, 1, DayDate)
FROM DayList
WHERE (DayDate < DATEADD(d, -1, DATEADD(m, 1, @RepMonth)))
)
SELECT *
FROM DayList t1 left join @SessionsPerArea t2 on t1.DayDate=startDate and t2.idArea = 1
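The question asks for the number of sessions per day rather than the joined rows themselves. One way to get that from the same day list, as a sketch, is to group by day and count the matching sessions. Using NULLIF to show NULL instead of 0 on days without sessions is an assumption based on the desired output in the question; the column aliases are illustrative.

```sql
DECLARE @SessionsPerArea TABLE (idArea int, idSession int, startDate date);
INSERT INTO @SessionsPerArea VALUES
    (1, 1, '20130101'),
    (1, 2, '20130104'),
    (1, 3, '20130702');

DECLARE @RepMonth date = '20130101';  -- first day of the month to report

WITH DayList (DayDate) AS
(
    SELECT @RepMonth
    UNION ALL
    -- Add one day at a time up to the last day of the month.
    SELECT DATEADD(DAY, 1, DayDate)
    FROM DayList
    WHERE DayDate < DATEADD(DAY, -1, DATEADD(MONTH, 1, @RepMonth))
)
SELECT d.DayDate AS [date],
       NULLIF(COUNT(s.idSession), 0) AS [Session]  -- NULL on days with no session
FROM DayList d
LEFT JOIN @SessionsPerArea s
       ON s.startDate = d.DayDate
      AND s.idArea = 1
GROUP BY d.DayDate
ORDER BY d.DayDate;
```

A month has at most 31 days, so the recursion stays well under the default limit of 100 levels.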
This will work:
DECLARE @SessionsPerArea TABLE (idArea INT, idSession INT, startDate DATE)
INSERT @SessionsPerArea VALUES
(1,1,'2013-01-01'),
(1,2,'2013-01-04'),
(1,3,'2013-07-02')
;WITH t1 AS
(
SELECT startDate
, DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', startDate), '1900-01-01') firstInMonth
, DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, '1900-01-01', startDate) + 1, '1900-01-01')) lastInMonth
, COUNT(*) cnt
FROM @SessionsPerArea
WHERE idArea = 1
GROUP BY
startDate
)
, calendar AS
(
SELECT DISTINCT DATEADD(DAY, c.number, t1.firstInMonth) d
FROM t1
JOIN master..spt_values c ON
type = 'P'
AND DATEADD(DAY, c.number, t1.firstInMonth) BETWEEN t1.firstInMonth AND t1.lastInMonth
)
SELECT d date
, cnt Session
FROM calendar c
LEFT JOIN t1 ON t1.startDate = c.d
It uses simple join on master..spt_values table to generate rows.
Just an example of a calendar table. To return data for a month, adjust the number of days in the recursion condition (< 32); for a year, use 365+1. You can calculate the number of days in a month, or between start and end dates, with a query; I'm not sure how to do this in SQL Server. I'm using hardcoded values to display all dates in Jan-2013. You can adjust the start and end dates for a different month, or get the start and end dates with queries:
WITH data(r, start_date) AS
(
SELECT 1 r, date '2012-12-31' start_date FROM any_table --dual in Oracle
UNION ALL
SELECT r+1, date '2013-01-01'+r-1 FROM data WHERE r < 32 -- number of days between start and end date+1
)
SELECT start_date FROM data WHERE r > 1
/
START_DATE
----------
1/1/2013
1/2/2013
1/3/2013
...
...
1/31/2013