TSQL - Run date comparison for "duplicates"/false positives on initial query? - sql

I'm pretty new to SQL and am working on pulling some data from several very large tables for analysis. The data is basically triggered events for assets on a system. The events all have a created_date (datetime) field that I care about.
I was able to put together the query below to get the data I need (YAY):
SELECT
event.efkey
,event.e_id
,event.e_key
,l.l_name
,event.created_date
,asset.a_id
,asset.asset_name
FROM event
LEFT JOIN asset
ON event.a_key = asset.a_key
LEFT JOIN l
ON event.l_key = l.l_key
WHERE event.e_key IN (350, 352, 378)
ORDER BY asset.a_id, event.created_date
However, while this gives me the data for the specific events I want, I still have another problem. Assets can trigger these events repeatedly, which can result in large numbers of "false positives" for what I'm looking at.
What I need to do is go through the result set of the query above and remove any events for an asset that occur closer than N minutes together (say 30 minutes for this example). So IF the asset_ID is the same AND the event.created_date is within 30 minutes of another event for that asset in the set THEN I want that removed. For example:
For the following records
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-01 12:35:31
a_id 1124 created 2016-02-01 12:40:33
a_id 1124 created 2016-02-01 12:45:42
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I'd want to return only:
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I tried referencing this and this but I can't make the concepts there work for me. I know I probably need to do a SELECT * FROM (my existing query) but I can't seem to do that without ending up with tons of "multi-part identifier can't be bound" errors (and I have no experience creating temp tables, my attempts at that have failed thus far). I also am not exactly sure how to use DATEDIFF as the date filtering function.
Any help would be greatly appreciated! If you could dumb it down for a novice (or link to explanations) that would also be helpful!

This is a trickier problem than it initially appears. The hard part is capturing the previous good row and removing the next bad rows but not allowing those bad rows to influence whether or not the next row is good. Here is what I came up with. I've tried to explain what is going on with comments in the code.
--sample data since I don't have your table structure and your original query won't work for me
declare @events table
(
id int,
timestamp datetime
)
--note that I changed some of your sample data to test some different scenarios
insert into @events values( 1124, '2016-02-01 12:30:30')
insert into @events values( 1124, '2016-02-01 12:35:31')
insert into @events values( 1124, '2016-02-01 12:40:33')
insert into @events values( 1124, '2016-02-01 13:05:42')
insert into @events values( 1124, '2016-02-02 12:30:30')
insert into @events values( 1124, '2016-02-02 13:00:30')
insert into @events values( 1115, '2016-02-01 12:30:30')
--using a cte here to split the result set of your query into groups
--by id (you would want to partition by whatever criteria you use
--to determine that rows are talking about the same event)
--the row_number function gets the row number for each row within that
--id partition
--the over clause specifies how to break up the result set into groups
--(partitions) and what order to put the rows in within that group so
--that the numbering stays consistent
;with orderedEvents as
(
select id, timestamp, row_number() over (partition by id order by timestamp) as rn
from @events
--you would replace @events here with your query
)
--using a second recursive cte here to determine which rows are "good"
--and which ones are not.
, previousGoodTimestamps as
(
--this is the "seeding" part of the recursive cte where I pick the
--first rows of each group as being a desired result. Since they
--are the first in each group, I know they are good. I also assign
--their timestamp as the previous good timestamp since I know that
--this row is good.
select id, timestamp, rn, timestamp as prev_good_timestamp, 1 as is_good
from orderedEvents
where rn = 1
union all
--this is the recursive part of the cte. It takes the rows we have
--already added to this result set and joins those to the "next" rows
--(as defined by our ordering in the first cte). Then we output
--those rows and do some calculations to determine if this row is
--"good" or not. If it is "good" we set its timestamp as the
--previous good row timestamp so that rows that come after this one
--can use it to determine if they are good or not. If a row is "bad"
--we just forward along the last known good timestamp to the next row.
--
--We also determine if a row is good by checking if the last good row
--timestamp plus 30 minutes is less than or equal to the current row's
--timestamp. If it is then the row is good.
select e2.id
, e2.timestamp
, e2.rn
, last_good_timestamp.timestamp
, case
when dateadd(mi, 30, last_good_timestamp.timestamp) <= e2.timestamp then 1
else 0
end
from previousGoodTimestamps e1
inner join orderedEvents e2 on e2.id = e1.id and e2.rn = e1.rn + 1
--I used a cross apply here to calculate the last good row timestamp
--once. I could have used two identical subqueries above in the select
--and case statements, but I would rather not duplicate the code.
cross apply
(
select case
when e1.is_good = 1 then e1.timestamp --if the last row is good, just use its timestamp
else e1.prev_good_timestamp --the last row was bad, forward on what it had for the last good timestamp
end as timestamp
) last_good_timestamp
)
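select *
from previousGoodTimestamps
where is_good = 1 --only take the "good" rows
--note: a recursive cte stops with an error after 100 levels by default, so if
--any id has more than 100 rows, add this hint as the last line: option (maxrecursion 0)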
Links to MSDN for some of the more complicated things here:
CTEs and Recursive CTEs
CROSS APPLY
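For readers who find the recursive CTE hard to follow, the same rule can be written as a plain loop. This is only a sketch of the logic in Python, not a replacement for the query: the function name keep_spaced_events and the helper parse are made up for illustration, and the rows are the modified sample data from the answer above.

```python
from datetime import datetime, timedelta


def keep_spaced_events(events, min_gap=timedelta(minutes=30)):
    """Keep each (asset_id, timestamp) row that falls at least min_gap
    after the last row that was kept for the same asset."""
    kept = []
    last_kept = {}  # asset_id -> timestamp of the last "good" row
    for asset_id, ts in sorted(events):
        previous = last_kept.get(asset_id)
        # The first row of an asset is always good. A later row is good only
        # when it is min_gap or more after the last GOOD row; rejected rows
        # never move the reference point.
        if previous is None or ts >= previous + min_gap:
            kept.append((asset_id, ts))
            last_kept[asset_id] = ts
    return kept


def parse(text):
    """Hypothetical helper that turns the sample strings into datetimes."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


sample = [
    (1124, parse("2016-02-01 12:30:30")),
    (1124, parse("2016-02-01 12:35:31")),
    (1124, parse("2016-02-01 12:40:33")),
    (1124, parse("2016-02-01 13:05:42")),
    (1124, parse("2016-02-02 12:30:30")),
    (1124, parse("2016-02-02 13:00:30")),
    (1115, parse("2016-02-01 12:30:30")),
]

result = keep_spaced_events(sample)
for asset_id, ts in result:
    print(asset_id, ts)
```

The dictionary last_kept plays the role of prev_good_timestamp in the CTE: it only changes when a row is accepted, which is what stops a run of rejected rows from pushing the window forward.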

-- Sample data.
declare @Samples as Table ( Id Int Identity, A_Id Int, CreatedDate DateTime );
insert into @Samples ( A_Id, CreatedDate ) values
( 1124, '2016-02-01 12:30:30' ),
( 1124, '2016-02-01 12:35:31' ),
( 1124, '2016-02-01 12:40:33' ),
( 1124, '2016-02-01 12:45:42' ),
( 1124, '2016-02-02 12:30:30' ),
( 1124, '2016-02-02 13:00:30' ),
( 1125, '2016-02-01 12:30:30' );
select * from @Samples;
-- Calculate the windows of 30 minutes before and after each CreatedDate and check for conflicts with other rows.
with Ranges as (
select Id, A_Id, CreatedDate,
DateAdd( minute, -30, S.CreatedDate ) as RangeStart, DateAdd( minute, 30, S.CreatedDate ) as RangeEnd
from @Samples as S )
select Id, A_Id, CreatedDate, RangeStart, RangeEnd,
-- Check for a conflict with another row with:
-- the same A_Id value and an earlier CreatedDate that falls inside the +/-30 minute range.
case when exists ( select 42 from @Samples where A_Id = R.A_Id and CreatedDate < R.CreatedDate and R.RangeStart < CreatedDate and CreatedDate < R.RangeEnd ) then 1
else 0 end as Conflict
from Ranges as R;

Related

avoiding group by for column used in datediff?

As the database is currently constructed, the only date field I can use in a DATEDIFF function belongs to a table that is also part of a COUNT aggregation (not the date field itself, but the entity where that date field is not null). The GROUP BY at the end messes up the counting, since that one entry is counted on its own, as its own group.
In some detail:
Our lead recruiter wants a report that shows the number of applications and conducted interviews per opening. So far no problem. Additionally, he would like to see the total duration per opening, from making it public to signing a new employee, and of course only if the opening has already been filled.
I have 4 tables to join:
table 1 holds the data of the opening
table 2 has the single applications
table 3 has the interview data of the applications
table 4 has the data regarding the publication of the openings (with the date when a certain opening was made public)
The problem is the duration requirement. table 4 holds the starting point, and in table 2 one (or no) applicant per opening has a date field filled with the time he returned a signed contract, at which point the opening counts as filled. When I use that field in a DATEDIFF, I'm forced to also put that column in the GROUP BY clause, and that results in 2 rows per opening. 1 row has all the numbers as wanted, and the second row always holds that one person who has an entry in that date field...
So far I haven't thought of a way to avoid that problem, other than explaining to the colleague that he gets his time-to-fill number in another report.
SELECT
table1.col1 as NameOfProject,
table1.col2 as Company,
table1.col3 as OpeningType,
table1.col4 as ReasonForOpening,
count (table2.col2) as NumberOfApplications,
sum (case when table2.colSTATUS = 'withdrawn' then 1 else 0 end) as NumberOfApplicantsWhoWithdraw,
sum (case when table3.colTypeInterview = 'PhoneInterview' then 1 else 0 end) as NumberOfPhoneInterview,
...more sum columns...,
table1.finished, -- shows '1' if opening is occupied
DATEDIFF(day, table4.colValidFrom, **table2.colContractReceived**) as DaysToCompletion
FROM
table2 left join table3 on table2.REF_NR = table3.REF_NR
join table1 on table2.PROJEKT = table1.KBEZ
left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
**table2.colContractReceived**
and all other columns except the ones in aggregate (sum and count) functions go in the GROUP BY section
ORDER BY table1.NameOfProject
Here is a short rebuild of what it looks like. First a row where the opening is not filled and all aggregations come out in one row as wanted. The next project/opening shows up double, because the field used in the datediff is grouped independently...
project company; no_of_applications; no_of_phoneinterview; no_of_personalinterview; ... ; time_to_fill_in_days; filled?
2018_312 comp a 27 4 2 null 0
2018_313 comp b 54 7 4 null 0
2018_313 comp b 1 1 1 42 1
I'd be glad to get any idea how to solve this. Thanks for considering my request!
(During the 'translation' of all the specific column and table names I might have built in a syntax error here and there, but the query worked well except for that unwanted extra aggregation per filled opening.)
If I've understood your requirement properly, I believe the issue you are having is that you need to show the date between the starting point and the time at which an applicant responded to an opening, however this must only show a single row based on whether or not the position was filled (if the position was filled, then show that row, if not then show that row).
I've achieved this result by assuming that you count a position as filled using the "ContractsReceived" column. This may be wrong; however, the principle should still provide what you are looking for.
I've essentially wrapped your query in a subquery, ranked the rows by the ContractsReceived column in descending order, and partitioned by the project. Then in the outer query I filter for the first row of this ranking.
Even if my assumption about the column structure and data types is wrong, this should provide you with a model to work with.
The only issue you might have with this ranking solution is if you want to aggregate over both rows within one (so include all of the summed columns for both the position filled and position not filled row per project). If this is the case let me know and we can work around that.
Please let me know if you have any questions.
declare @table1 table (
REFNR int,
NameOfProject nvarchar(20),
Company nvarchar(20),
OpeningType nvarchar(20),
ReasonForOpening nvarchar(20),
KBEZ int
);
declare @table2 table (
NumberOfApplications int,
Status nvarchar(15),
REF_NR int,
ReturnedApplicationDate datetime,
ContractsReceived bit,
PROJEKT int
);
declare @table3 table (
TypeInterview nvarchar(25),
REF_NR int
);
declare @table4 table (
PRJ_REFNR int,
StartingPoint datetime
);
insert into @table1 (REFNR, NameOfProject, Company, OpeningType, ReasonForOpening, KBEZ)
values (1, '2018_312', 'comp a' ,'Permanent', 'Business growth', 1),
(2, '2018_313', 'comp a', 'Permanent', 'Business growth', 2),
(3, '2018_313', 'comp a', 'Permanent', 'Business growth', 3);
insert into @table2 (NumberOfApplications, Status, REF_NR, ReturnedApplicationDate, ContractsReceived, PROJEKT)
values (27, 'Processed', 4, '2018-04-01 08:00', 0, 1),
(54, 'Withdrawn', 5, '2018-04-02 10:12', 0, 2),
(1, 'Processed', 6, '2018-04-15 15:00', 1, 3);
insert into @table3 (TypeInterview, REF_NR)
values ('Phone', 4),
('Phone', 5),
('Personal', 6);
insert into @table4 (PRJ_REFNR, StartingPoint)
values (1, '2018-02-25 08:00'),
(2, '2018-03-04 15:00'),
(3, '2018-03-04 15:00');
select * from
(
SELECT
RANK() OVER (Partition by NameOfProject, Company order by ContractsReceived desc) as rowno,
table1.NameOfProject,
table1.Company,
table1.OpeningType,
table1.ReasonForOpening,
case when ContractsReceived > 0 then datediff(DAY, StartingPoint, ReturnedApplicationDate) else null end as TimeToFillInDays,
ContractsReceived Filled
FROM
@table2 table2 left join @table3 table3 on table2.REF_NR = table3.REF_NR
join @table1 table1 on table2.PROJEKT = table1.KBEZ
left join @table4 table4 on table1.REFNR = table4.PRJ_REFNR
group by NameOfProject, Company, OpeningType, ReasonForOpening, ContractsReceived,
StartingPoint, ReturnedApplicationDate
) x where rowno=1

Drop rows identified within moving time window

I have a dataset of hospitalisations ('spells') - 1 row per spell. I want to drop any spells recorded within a week after another (there could be multiple) - the rationale being is that they're likely symptomatic of the same underlying cause. Here is some play data:
create table hif_user.rzb_recurse_src (
patid integer not null,
eventdate integer not null,
type smallint not null
);
insert into hif_user.rzb_recurse_src values (1,1,1);
insert into hif_user.rzb_recurse_src values (1,3,2);
insert into hif_user.rzb_recurse_src values (1,5,2);
insert into hif_user.rzb_recurse_src values (1,9,2);
insert into hif_user.rzb_recurse_src values (1,14,2);
insert into hif_user.rzb_recurse_src values (2,1,1);
insert into hif_user.rzb_recurse_src values (2,5,1);
insert into hif_user.rzb_recurse_src values (2,19,2);
Only spells of type 2 - within a week after any other - are to be dropped. Type 1 spells are to remain.
For patient 1, dates 1 & 9 should be kept. For patient 2, all rows should remain.
The issue is with patient 1. Spell date 9 is identified for dropping as it is close to spell date 5; however, as spell date 5 is close to spell date 1, it should be dropped, therefore allowing spell date 9 to live...
So, it seems a recursive problem. However, I've not used recursive programming in SQL before and I'm struggling to really picture how to do it. Can anyone help? I should add that I'm using Teradata which has more restrictions than most with recursive SQL (only UNION ALL sets allowed I believe).
It's a cursor logic, check one row after the other if it fits your rules, so recursion is the easiest (maybe the only) way to solve your problem.
To get a decent performance you need a Volatile Table to facilitate this row-by-row processing:
CREATE VOLATILE TABLE vt (patid, eventdate, exac_type, rn) AS -- four names for the four selected columns; startdate is added later by the cte
(
SELECT r.*
,ROW_NUMBER() -- needed to facilitate the join
OVER (PARTITION BY patid ORDER BY eventdate) AS rn
FROM hif_user.rzb_recurse_src AS r
) WITH DATA ON COMMIT PRESERVE ROWS;
WITH RECURSIVE cte (patid, eventdate, exac_type, rn, startdate) AS
(
SELECT vt.*
,eventdate AS startdate
FROM vt
WHERE rn = 1 -- start with the first row
UNION ALL
SELECT vt.*
-- check if type = 1 or more than 7 days from the last eventdate
,CASE WHEN vt.eventdate > cte.startdate + 7
OR vt.exac_type = 1
THEN vt.eventdate -- new start date
ELSE cte.startdate -- keep old date
END
FROM vt JOIN cte
ON vt.patid = cte.patid
AND vt.rn = cte.rn + 1 -- proceed to next row
)
SELECT *
FROM cte
WHERE eventdate - startdate = 0 -- only new start days
order by patid, eventdate
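To make the rule the recursion applies easier to see, here is the same row-by-row walk as a short Python sketch. It is only an illustration of the logic, not Teradata code: the function name keep_spells is invented, and the rows are the play data from the question.

```python
from itertools import groupby


def keep_spells(rows, window=7):
    """rows are (patid, eventdate, spell_type) tuples.
    A spell is kept when it is the first one for the patient, when it is
    of type 1, or when it lies more than `window` days after the last
    kept spell; the kept spell then becomes the new start date."""
    kept = []
    for patid, group in groupby(sorted(rows), key=lambda row: row[0]):
        startdate = None  # date of the last kept spell for this patient
        for _, eventdate, spell_type in group:
            if (startdate is None
                    or spell_type == 1
                    or eventdate > startdate + window):
                startdate = eventdate  # new start date
                kept.append((patid, eventdate, spell_type))
            # otherwise the old start date is carried forward unchanged
    return kept


play_data = [
    (1, 1, 1), (1, 3, 2), (1, 5, 2), (1, 9, 2), (1, 14, 2),
    (2, 1, 1), (2, 5, 1), (2, 19, 2),
]

kept_spells = keep_spells(play_data)
print(kept_spells)
```

For patient 1 only dates 1 and 9 survive, and for patient 2 every row survives, which matches the outcome the question asks for.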
I think the key to solving this is getting the first date more than 7 days from the current date and then doing a recursive subquery:
with rrs as (
select rrs.*,
(select min(rrs2.eventdate)
from hif_user.rzb_recurse_src rrs2
where rrs2.patid = rrs.patid and
rrs2.eventdate > rrs.eventdate + 7
) as eventdate7
from hif_user.rzb_recurse_src rrs
),
recursive cte as (
select patid, min(eventdate) as eventdate, min(eventdate7) as eventdate7
from rrs -- read from the rrs cte, because eventdate7 does not exist in the base table
group by patid
union all
select cte.patid, cte.eventdate7, rrs.eventdate7
from cte join
rrs
on rrs.patid = cte.patid and
rrs.eventdate = cte.eventdate7
)
select cte.patid, cte.eventdate
from cte;
If you want additional columns, then join in the original table at the last step.

How to select values by date field (not as simple as it sounds)

I have a table called tblMK The table contains a date time field.
What I wish to do is create a query which will each time, select the 2 latest entries (by the datetime column) and then get the date difference between them and show only that.
How would I go about creating this expression? This doesn't necessarily need to be a query; it could be a view/function/procedure or whatever works. I have created a function called getdatediff which receives two dates and returns a string that says (x days y hours z minutes); basically that will be the calculated field. So how would I go about doing this?
Edit: I need to each time select 2 and 2 and so on until the oldest one. There will always be an even amount of rows.
Use only sql like this:
create table t1(c1 integer, dt datetime);
insert into t1 values
(1, getdate()),
(2, dateadd(day,1,getdate())),
(3, dateadd(day,2,getdate()));
with temp as (select top 2 dt
from t1
order by dt desc)
select datediff(day,min(dt),max(dt)) as diff_of_dates
from temp;
On MySQL use limit clause
select max(a.updated_at)-min(a.updated_at)
From
( select * from mytable order by updated_at desc limit 2 ) a
Thanks guys, I found the solution. Please ignore the additional columns, they are for my db:
; with numbered as (
Select parit, taarich, hulia, mesirakabala,
rowno = row_number() OVER (Partition by parit order by taarich)
From tblMK)
Select a.rowno-1, a.parit, a.hulia, b.taarich as taarich_kabala, a.taarich, a.mesirakabala,
dbo.getdatediff(b.taarich, a.taarich) as due -- scalar functions need a schema prefix; dbo is assumed here
From numbered a
Left join numbered b ON b.parit = a.parit
And b.rowno = a.rowno - 1
Where b.taarich is not null
Order by a.parit, a.taarich
Sorry about any mistakes I might have made, I'm on my smartphone.
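The "2 and 2" requirement from the edit in the question (newest two rows, then the next two, and so on) can be sketched outside SQL as follows. This is a hedged Python illustration only: pair_differences is an invented name, the sample timestamps are made up, and it assumes an even number of rows as the question states.

```python
from datetime import datetime, timedelta


def pair_differences(timestamps):
    """Sort newest first, then walk the rows two at a time and return
    (older, newer, newer - older) for every pair."""
    ordered = sorted(timestamps, reverse=True)
    if len(ordered) % 2:
        raise ValueError("expected an even number of rows")
    newer_rows = ordered[0::2]  # 1st, 3rd, 5th ... newest
    older_rows = ordered[1::2]  # 2nd, 4th, 6th ... newest
    return [(older, newer, newer - older)
            for newer, older in zip(newer_rows, older_rows)]


# Made-up sample values, not taken from tblMK.
stamps = [
    datetime(2020, 1, 1, 8, 0),
    datetime(2020, 1, 1, 10, 30),
    datetime(2020, 1, 3, 9, 0),
    datetime(2020, 1, 4, 9, 0),
]

pairs = pair_differences(stamps)
for older, newer, diff in pairs:
    print(older, newer, diff)
```

Note that this pairs rows disjointly (1 with 2, 3 with 4), whereas the query above compares every row with the one before it inside each parit group.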

finding a mismatch while iterating rows in sql

I have this issue where date keys were just inserted into a table through SQL Server. They are populated iteratively in the fashion shown below:
20130501
20130502
20130503
...
I am currently trying to find any row where one of the dates was skipped, i.e:
20130504
20130506
20130507
I'm still a rookie in SQL Server and I have looked at CURSOR, but I'm having some trouble understanding how to go about querying this. Any help would be appreciated. Thanks.
Using some tricks from the Itzik Ben-Gan school of thought: the easiest way to find gaps is with a tally table. Here is a way to create a small one in a table variable, but I would recommend creating a permanent Numbers table because they're really handy for this kind of thing. You can find a bunch of examples on how to do that here.
First create a number table
DECLARE @Numbers TABLE ( [Number] INT );
INSERT INTO @Numbers
(
Number
)
SELECT TOP 1000
ROW_NUMBER() OVER (ORDER BY [s1].[object_id]) AS Number
FROM sys.objects s1
CROSS JOIN sys.objects s2
Next I needed to create a table variable to recreate your example
DECLARE @ExampleDates TABLE ( [RecordDateKey] INT );
INSERT INTO @ExampleDates
( [RecordDateKey] )
VALUES ( 20130501 ),
( 20130502 ),
( 20130503 ),
( 20130504 ),
( 20130506 ),
( 20130507 ),
( 20130508 ),
( 20130511 );
This multi-row VALUES syntax only works on SQL Server 2008 and later, but since I'm just staging data it's not really a big deal. Just leaving this note for other people testing this example.
Finally we need to do some conversion work.
For larger sets, it might be beneficial to persist this data, but for this small example, a cte sufficed.
WITH date_convert
AS (
SELECT [RecordDateKey]
, CONVERT(DATETIME, CAST([RecordDateKey] AS VARCHAR(50)), 112) [RecordDate]
FROM @ExampleDates ed
) ,
date_range
AS (
SELECT DATEDIFF(DAY, MIN([RecordDate]), MAX([RecordDate])) [Range]
, MIN([RecordDate]) [StartDate]
FROM [date_convert]
) ,
all_dates
AS (
SELECT CONVERT(INT, CONVERT(VARCHAR(8), DATEADD(DAY, num.[Number], [StartDate]), 112)) AS [RecordDateKey]
, DATEADD(DAY, num.[Number], [StartDate]) [RecordDate]
FROM @Numbers num
CROSS JOIN [date_range] dr
WHERE num.[Number] <= dr.[Range]
)
SELECT [RecordDateKey]
, [RecordDate]
FROM all_dates ad
WHERE NOT EXISTS ( SELECT 1
FROM [date_convert] dc
WHERE ad.[RecordDate] = dc.RecordDate )
date_convert: changes the key you provided to a datetime for easy comparison and for dateadd.
date_range: finds the range of dates, and where the range starts.
all_dates: finds all of the dates that should have existed in your range.
The final select finds the records in the data that aren't in the generated set.
Using this code, this was my output. This should find gaps regardless of gap size. Which appeared to be the issue with the current accepted answer.
RecordDateKey RecordDate
------------- ----------
20130505 2013-05-05 00:00:00.000
20130509 2013-05-09 00:00:00.000
20130510 2013-05-10 00:00:00.000
SELECT *
FROM table
WHERE date - 1 NOT IN (SELECT date FROM table)
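It's probably not super efficient, but it should work when date is a datetime column. With integer keys such as 20130501, date - 1 breaks at month boundaries, and the earliest row is always reported because nothing precedes it.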
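Both answers come down to the same idea: convert the integer keys to real dates, walk every day between the smallest and the largest, and report the days that are absent. A minimal Python sketch of that idea follows; missing_date_keys is an invented name and the keys are the example values used in the first answer.

```python
from datetime import datetime, timedelta


def missing_date_keys(keys):
    """Return the yyyymmdd integer keys that are absent between the
    smallest and the largest key in `keys`."""
    present = {datetime.strptime(str(key), "%Y%m%d").date() for key in keys}
    if not present:
        return []
    day, last = min(present), max(present)
    missing = []
    while day < last:
        day += timedelta(days=1)  # real date arithmetic, so month ends are safe
        if day not in present:
            missing.append(int(day.strftime("%Y%m%d")))
    return missing


example_keys = [20130501, 20130502, 20130503, 20130504,
                20130506, 20130507, 20130508, 20130511]

gaps = missing_date_keys(example_keys)
print(gaps)
```

Because the comparison happens on dates rather than on the raw integers, a gap that spans a month boundary is handled the same way as any other gap.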

SQL Selecting rows at varying intervals

I've got a situation where I have a huge table, containing a huge number of rows, which looks like (for example):
id Timestamp Value
14574499 2011-09-28 08:33:32.020 99713.3000
14574521 2011-09-28 08:33:42.203 99713.3000
14574540 2011-09-28 08:33:47.017 99713.3000
14574559 2011-09-28 08:38:53.177 99720.3100
14574578 2011-09-28 08:38:58.713 99720.3100
14574597 2011-09-28 08:39:03.590 99720.3100
14574616 2011-09-28 08:39:08.950 99720.3100
14574635 2011-09-28 08:39:13.793 99720.3100
14574654 2011-09-28 08:39:19.063 99720.3100
14574673 2011-09-28 08:39:23.780 99720.3100
14574692 2011-09-28 08:39:29.167 99758.6400
14574711 2011-09-28 08:39:33.967 99758.6400
14574730 2011-09-28 08:39:40.803 99758.6400
14574749 2011-09-28 08:39:49.297 99758.6400
Ok, so the rules are:
The timestamps can be any n number of seconds apart, 5s, 30s, 60s etc, it varies depending on how old the record is (archiving takes place).
I want to be able to query this table to select each nth row based on the timestamp.
So for example:
Select * from mytable where intervalBetweenTheRows = 30s
(for the purposes of this question, based on the presumption the interval requested is always to a higher precision than available in the database)
So, every nth row based on the time between each row
Any ideas?!
Karl
For those of you who are interested, recursive CTE was actually quite slow, I thought of a slightly different method:
SELECT TOP 500
MIN(pvh.[TimeStamp]) as [TimeStamp],
AVG(pvh.[Value]) as [Value]
FROM
PortfolioValueHistory pvh
WHERE
pvh.PortfolioID = @PortfolioID
AND pvh.[TimeStamp] >= @StartDate
AND pvh.[TimeStamp] <= @EndDate
GROUP BY
FLOOR(DateDiff(Second, '01/01/2011 00:00:00', pvh.[TimeStamp]) / @ResolutionInSeconds)
ORDER BY
[TimeStamp] ASC
I take the timestamp minus an arbitrary date to give a base int to work with, then floor and divide this by my desired resolution, I then group by this, taking the min timestamp (the first of that 'region' of stamps) and the average value for that 'period'.
This is used to plot a graph of historical data, so the average value does me fine.
This was the fastest execution based on the table size that I could come up with
Thanks for your help all.
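The grouping trick described above (seconds since a fixed base date, integer-divided by the resolution) can be sketched in Python like this. It is an illustration under assumptions rather than the production query: downsample is an invented name, the base date mirrors the one in the query, and the sample rows are made up.

```python
from datetime import datetime


def downsample(rows, resolution_seconds, base=datetime(2011, 1, 1)):
    """rows are (timestamp, value) pairs at or after `base`.
    Rows whose whole seconds since `base`, integer-divided by the
    resolution, are equal share a bucket. Each bucket yields its earliest
    timestamp and the average of its values."""
    buckets = {}
    for ts, value in rows:
        key = int((ts - base).total_seconds()) // resolution_seconds
        buckets.setdefault(key, []).append((ts, value))
    result = []
    for key in sorted(buckets):
        members = buckets[key]
        first_ts = min(ts for ts, _ in members)
        average = sum(value for _, value in members) / len(members)
        result.append((first_ts, average))
    return result


# Made-up rows: the first two fall into the same 30 second bucket.
rows = [
    (datetime(2011, 9, 28, 8, 33, 32), 10.0),
    (datetime(2011, 9, 28, 8, 33, 42), 20.0),
    (datetime(2011, 9, 28, 8, 34, 5), 30.0),
]

sampled = downsample(rows, 30)
print(sampled)
```

The buckets are fixed to the clock (aligned to the base date), so this returns one row per occupied 30 second slot rather than rows that are exactly 30 seconds apart.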
Assuming that the requirement is that the determinant for whether a row is returned or not depends on the time elapsed from the previous returned row this needs a procedural approach. Recursive CTEs might be a bit more efficient than a cursor though.
WITH RecursiveCTE
AS (SELECT TOP 1 *
FROM #T
ORDER BY [Timestamp]
UNION ALL
SELECT id,
[Timestamp],
Value
FROM (
--Can't use TOP directly
SELECT T.*,
rn = ROW_NUMBER() OVER (ORDER BY T.[Timestamp])
FROM #T T
JOIN RecursiveCTE R
ON T.[Timestamp] >=
DATEADD(SECOND, 30, R.[Timestamp])) R
WHERE R.rn = 1)
SELECT *
FROM RecursiveCTE
This isn't as elegant as Martin S's CTE, but instead uses interpolation on predefined sample points to get the first sample in between each pair of sampling times.
If there is no sample in a period then no record is returned.
DECLARE @SampleTime DATETIME
DECLARE @NumberSamples INT
DECLARE @SampleInterval INT
SET @SampleTime = '2011-09-28 08:33:32.020' -- Start time
SET @NumberSamples = 20 -- Or however many sample intervals you need to evaluate
SET @SampleInterval = 30 -- Seconds
CREATE TABLE #tmpTimesToSample
(
SampleID INT,
SampleTime DATETIME NULL
)
-- Works out the time intervals, 0 to 19
INSERT INTO #tmpTimesToSample(SampleID, SampleTime)
SELECT TOP (@NumberSamples)
sv.number,
DATEADD(ss, sv.number * @SampleInterval, @SampleTime)
FROM
master..spt_values sv
WHERE
type = 'p'
ORDER BY
sv.number ASC
-- Now interpolate these sample intervals back into the data table
SELECT ID, [TimeStamp], Value
FROM
(
-- order by the row's own timestamp so that RowNum = 1 really is the first sample in the bin
SELECT mt.Id, mt.[TimeStamp], mt.Value, row_number() over (partition by tmp.SampleID order by mt.[TimeStamp]) as RowNum
FROM #tmpTimesToSample tmp RIGHT OUTER JOIN MyTable mt
on mt.[TimeStamp] BETWEEN tmp.SampleTime and DATEADD(ss, @SampleInterval, tmp.SampleTime)
) x
WHERE x.RowNum = 1 -- Only want the first sample in each bin
DROP TABLE #tmpTimesToSample
Test data:
CREATE TABLE MyTable
(
ID BIGINT,
[TimeStamp] DATETIME,
[Value] DECIMAL(18,4)
)
GO
insert into MyTable values(14574499, '2011-09-28 08:33:32.020', 99713.3000)
insert into MyTable values(14574521 ,'2011-09-28 08:33:42.203', 99713.3000)
insert into MyTable values(14574540 ,'2011-09-28 08:33:47.017', 99713.3000)
insert into MyTable values(14574559 ,'2011-09-28 08:38:53.177', 99720.3100)
insert into MyTable values(14574578 ,'2011-09-28 08:38:58.713', 99720.3100)
insert into MyTable values(14574597 ,'2011-09-28 08:39:03.590', 99720.3100)
insert into MyTable values(14574616 ,'2011-09-28 08:39:08.950', 99720.3100)
insert into MyTable values(14574635 ,'2011-09-28 08:39:13.793', 99720.3100)
insert into MyTable values(14574654 ,'2011-09-28 08:39:19.063', 99720.3100)
insert into MyTable values(14574673 ,'2011-09-28 08:39:23.780', 99720.3100)
insert into MyTable values(14574692 ,'2011-09-28 08:39:29.167', 99758.6400)
insert into MyTable values(14574711 ,'2011-09-28 08:39:33.967', 99758.6400)
insert into MyTable values(14574730 ,'2011-09-28 08:39:40.803', 99758.6400)
insert into MyTable values(14574749 ,'2011-09-28 08:39:49.297', 99758.6400)
go
This will give you all pairs of rows whose timestamps are exactly 30 milliseconds apart. Both rows of a pair are returned side by side.
Select T1.*, T2.*
From MyTable T1
Inner Join MyTable T2
On DateDiff (millisecond, T1.[TimeStamp], T2.[TimeStamp]) = 30