In the same column, compare each value with previous multiple values with condition - sql

I'm working on a table that looks like this. The actual dataset contains thousands of Guest_IDs; I'm only showing a few sample rows here.
Guest_ID  Visit_ID  Collection_time      Value
6         a178      2007-11-09 11:28:00  2.6
6         a188      2007-11-10 20:28:00  6.6
12        a278      2008-11-11 10:28:00  2.7
12        a278      2008-11-11 11:38:00  3.2
12        a278      2008-11-12 11:48:00  6.8
12        c348      2009-10-12 11:38:00  3.8
15        e179      2013-01-15 09:25:00  1.8
15        e179      2013-01-15 10:26:00  1.6
15        e179      2013-01-15 12:15:00  3.8
15        e179      2013-01-17 09:25:00  3.6
What I'm trying to do is find the values that have increased by at least 3 within the past 48 hours, comparing only values under the same Visit_ID. In this case, the result should contain only:
Guest_ID  Visit_ID  Collection_time      Value
12        a278      2008-11-12 11:48:00  6.8
I have some vague ideas about using a gaps-and-islands approach in SQL Server, but I'm not sure how to go about it. Conceptually, for each value X, I need to collect all the previous values that meet the conditions (within the past 48 hours AND under the same Visit_ID), then check whether X - min(previous values) >= 3. If so, keep X (or label it as 1), and repeat the procedure for the next value.
I have read a lot of posts about using lag() or row_number() over (partition by ... order by ...), but I'm still unsure what to do. Any help is appreciated!

This would have been a good spot to use window functions with a date range specification. Alas, SQL Server does not support that (yet?).
The simplest approach might be exists and a correlated subquery:
select t.*
from mytable t
where exists (
    select 1
    from mytable t1
    where
        t1.visit_id = t.visit_id
        and t1.collection_time >= dateadd(day, -2, t.collection_time)
        and t1.collection_time < t.collection_time
        and t1.value <= t.value - 3   -- an increase of at least 3
)
Or you can use cross apply:
select t.*
from mytable t
cross apply (
    select min(t1.value) as min_value
    from mytable t1
    where
        t1.visit_id = t.visit_id
        and t1.collection_time >= dateadd(day, -2, t.collection_time)
        and t1.collection_time < t.collection_time
) t1
where t1.min_value <= t.value - 3   -- an increase of at least 3
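To sanity-check the logic against the sample rows, here is a small sketch using Python's built-in sqlite3 module. It is not SQL Server: SQLite's datetime(x, '-2 days') stands in for dateadd(day, -2, x), timestamps are stored as text, and mytable is the same placeholder name used above.

```python
import sqlite3

# Sample rows from the question.
rows = [
    (6, "a178", "2007-11-09 11:28:00", 2.6),
    (6, "a188", "2007-11-10 20:28:00", 6.6),
    (12, "a278", "2008-11-11 10:28:00", 2.7),
    (12, "a278", "2008-11-11 11:38:00", 3.2),
    (12, "a278", "2008-11-12 11:48:00", 6.8),
    (12, "c348", "2009-10-12 11:38:00", 3.8),
    (15, "e179", "2013-01-15 09:25:00", 1.8),
    (15, "e179", "2013-01-15 10:26:00", 1.6),
    (15, "e179", "2013-01-15 12:15:00", 3.8),
    (15, "e179", "2013-01-17 09:25:00", 3.6),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(guest_id INT, visit_id TEXT, collection_time TEXT, value REAL)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Same shape as the EXISTS query: keep a row if an earlier row of the same
# visit, at most 48 hours older, is lower by at least 3.
query = """
SELECT t.*
FROM mytable t
WHERE EXISTS (
    SELECT 1
    FROM mytable t1
    WHERE t1.visit_id = t.visit_id
      AND t1.collection_time >= datetime(t.collection_time, '-2 days')
      AND t1.collection_time <  t.collection_time
      AND t.value - t1.value >= 3
)
"""
matches = conn.execute(query).fetchall()
print(matches)  # -> [(12, 'a278', '2008-11-12 11:48:00', 6.8)]
```

Only the 6.8 reading of visit a278 qualifies, which is the single row the question expects.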

I used a CTE to filter out the qualifying rows first and then just join it back up to the original table to grab those rows:
CREATE TABLE #tmp(Guest_ID int, Visit_ID varchar(10), Collection_time datetime, Value decimal(10,1))
INSERT INTO #tmp VALUES
(6, 'a178', '2007-11-09 11:28:00', 2.6),
(6, 'a188', '2007-11-10 20:28:00', 6.6),
(12, 'a278', '2008-11-11 10:28:00', 2.7),
(12, 'a278', '2008-11-11 11:38:00', 3.2),
(12, 'a278', '2008-11-12 11:48:00', 6.8),
(12, 'c348', '2009-10-12 11:38:00', 3.8),
(15, 'e179', '2013-01-15 09:25:00', 1.8),
(15, 'e179', '2013-01-15 10:26:00', 1.6),
(15, 'e179', '2013-01-15 12:15:00', 3.8),
(15, 'e179', '2013-01-17 09:25:00', 3.6)
;WITH CTE AS(
SELECT MAX(Collection_time) MaxCollection_Time, Max(Value) - Min(Value) DiffInValue ,Visit_ID
FROM #tmp
GROUP BY Visit_ID
HAVING Max(Value) - Min(Value) >= 3
)
SELECT t1.*
FROM #tmp t1
INNER JOIN CTE t2 on t1.Visit_ID = t2.Visit_ID and T1.Collection_time = t2.MaxCollection_Time

Related

Display All Quarters based on Year In SQL Server

I need to display all the quarters from a date field.
SELECT '2021' AS YOE,DATEPART(Quarter,'2021')AS [Quarter],450 AS Qty
Actual Result:
YOE   Quarter  Qty
2021  1        450
Expected Result:
YOE   Quarter  Qty
2021  1        450
2021  2        0
2021  3        0
2021  4        0
Here is an answer that will get you results for any given year.
What you need to do is start with a calendar table as the driving row, then LEFT JOIN your table to it.
We will use Itzik Ben-Gan's tally table for this purpose. We pass in the starting year as @startingYear (an int such as 2021):
;WITH
L0 AS ( SELECT 1 AS c
        FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
                    (1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B ),
Nums AS ( SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS rownum
          FROM L1 ),
-- one row per year, starting at @startingYear
Years AS ( SELECT @startingYear + rownum - 1 AS Year FROM Nums )
SELECT
    y.Year AS YOE,
    q.Quarter,
    ISNULL(SUM(t.Value), 0) AS Qty   -- quarters without rows show 0
FROM Years AS y
CROSS JOIN (VALUES (1), (2), (3), (4) ) AS q(Quarter)
LEFT JOIN (Table AS t
    -- any other inner or left joins between Table and ON
    )
    ON DATEPART(year, t.Date) = y.Year AND
       DATEPART(quarter, t.Date) = q.Quarter
GROUP BY y.Year, q.Quarter
ORDER BY y.Year, q.Quarter;
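The "driving rows plus LEFT JOIN" idea can be tried out in a few lines with Python's built-in sqlite3 module. This is only a sketch in SQLite's dialect, not T-SQL: the sales table and its columns are hypothetical stand-ins for the real table, and the quarter is derived from the month with strftime instead of DATEPART.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: a single sale in Q1 2021, nothing in other quarters.
conn.execute("CREATE TABLE sales (sale_date TEXT, qty INT)")
conn.execute("INSERT INTO sales VALUES ('2021-02-15', 450)")

# The four quarters drive the result; the LEFT JOIN keeps quarters that have
# no matching sales, and COALESCE turns their NULL sum into 0.
query = """
WITH quarters(quarter) AS (VALUES (1), (2), (3), (4))
SELECT 2021 AS yoe,
       q.quarter,
       COALESCE(SUM(s.qty), 0) AS qty
FROM quarters q
LEFT JOIN sales s
       ON (CAST(strftime('%m', s.sale_date) AS INTEGER) + 2) / 3 = q.quarter
      AND strftime('%Y', s.sale_date) = '2021'
GROUP BY q.quarter
ORDER BY q.quarter
"""
result = conn.execute(query).fetchall()
print(result)  # -> [(2021, 1, 450), (2021, 2, 0), (2021, 3, 0), (2021, 4, 0)]
```

The filter on the year sits in the ON clause rather than in a WHERE clause; moving it to WHERE would discard the empty quarters again.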
I would suggest values():
SELECT '2021' AS YOE, v.*
FROM (VALUES (1, 450), (2, 0), (3, 0), (4, 0)) v([Quarter], Qty)
Considering it seems that this isn't against a table, but you are "creating" the data, use a VALUES table construct:
SELECT YOE,
[Quarter],
Qty
FROM (VALUES(2021,1,450), --Numerical values don't go in quotes.
(2021,2,0), --Strings and Numerical values are very different,
(2021,3,0), --don't confuse the 2.
(2021,4,0))V(YOE,[Quarter],Qty)
ORDER BY YOE,
[Quarter];

SQL Server cumulative SUM of a DATEDIFF into a percentage

I am trying to get a cumulative SUM of a DATEDIFF into a percentage from some basic data I have, here is a small snapshot:
ID IIn IOut
AB123 2015-11-06 15:24:44.057 2015-11-14 01:00:00.000
QA565 2015-10-27 20:12:19.753 2015-11-06 03:00:00.000
UN555 2015-12-29 06:29:23.417 2016-01-03 08:00:00.000
LG602 2015-08-07 16:52:13.573 2015-08-11 03:00:00.000
ETC ETC
I then use DATEDIFF to get a number of days:
SELECT ID, DATEDIFF(hour, IIn, IOut)/24.0 IDays
FROM TimeTable
Which gives me:
ID IDays
AB123 7.416666
QA565 9.291666
UN555 5.083333
LG602 3.458333
What I want is a count of IDs split by their IDays (rounded down), with a cumulative % from the lowest IDays to the highest, like so:
ID IDays IDaysPer
LG602 3 12.5
UN555 5 33.33
AB123 7 62.49
QA565 9 100
You can do this using a couple of windowed aggregates, placing your original query in a CTE for convenience (A subquery would also work):
declare @timeTable table (ID char(5) not null, IIn datetime not null,
IOut datetime not null)
insert into @timeTable(ID,IIn,IOut) values
('AB123','2015-11-06T15:24:44.057','2015-11-14T01:00:00.000'),
('QA565','2015-10-27T20:12:19.753','2015-11-06T03:00:00.000'),
('UN555','2015-12-29T06:29:23.417','2016-01-03T08:00:00.000'),
('LG602','2015-08-07T16:52:13.573','2015-08-11T03:00:00.000')
;With Diffs as (
SELECT ID, DATEDIFF(hour, IIn, IOut)/24.0 IDays
FROM @timeTable
)
select
*,
(
SUM(IDays) OVER (ORDER BY IDays, ID)
/
SUM(IDays) OVER ()
) * 100 as IDaysPer
from
Diffs
order by IDays
Note that I couldn't quite make sense of your "rounded down" requirement but you should be able to use any common rounding technique wrapped around the appropriate calculation. So my outputs don't quite match yours:
ID IDays IDaysPer
----- --------------------------------------- ---------------------------------------
LG602 3.458333 13.696300
UN555 5.083333 33.828300
AB123 7.416666 63.201300
QA565 9.291666 100.000000
Assuming TimeTable already contains the data:
WITH t1 (ID, IDays)
AS (
    SELECT ID, DATEDIFF(hour, IIn, IOut) / 24.0 AS IDays
    FROM TimeTable
)
SELECT
    ID, FLOOR(IDays) AS IDays,
    -- running total up to and including this row, as a share of the grand total
    (SELECT SUM(FLOOR(t2.IDays)) FROM t1 t2 WHERE t2.IDays <= t1.IDays) * 100.0
        / (SELECT SUM(FLOOR(t3.IDays)) FROM t1 t3) AS IDaysPer
FROM t1
ORDER BY 2 ASC
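As a cross-check of the arithmetic behind the requested output, here is a small Python sketch (not part of any answer). The helper hour_boundaries is a hypothetical name; it mimics how DATEDIFF(hour, ...) counts hour boundaries crossed rather than elapsed hours, which is what produces the 7.416666 shown in the question.

```python
from datetime import datetime
from itertools import accumulate

rows = [
    ("AB123", "2015-11-06 15:24:44.057", "2015-11-14 01:00:00.000"),
    ("QA565", "2015-10-27 20:12:19.753", "2015-11-06 03:00:00.000"),
    ("UN555", "2015-12-29 06:29:23.417", "2016-01-03 08:00:00.000"),
    ("LG602", "2015-08-07 16:52:13.573", "2015-08-11 03:00:00.000"),
]

def hour_boundaries(start: str, end: str) -> int:
    """Mimic DATEDIFF(hour, start, end): truncate both to the hour, then diff."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    s = datetime.strptime(start, fmt).replace(minute=0, second=0, microsecond=0)
    e = datetime.strptime(end, fmt).replace(minute=0, second=0, microsecond=0)
    return int((e - s).total_seconds() // 3600)

# Whole days per ID (rounded down), ordered from lowest to highest.
days = sorted((hour_boundaries(i, o) // 24, id_) for id_, i, o in rows)
total = sum(d for d, _ in days)
running = accumulate(d for d, _ in days)

# Cumulative share of the total, as a percentage rounded to 2 decimals.
report = [(id_, d, round(100 * cum / total, 2)) for (d, id_), cum in zip(days, running)]
print(report)
# -> [('LG602', 3, 12.5), ('UN555', 5, 33.33), ('AB123', 7, 62.5), ('QA565', 9, 100.0)]
```

The third figure comes out as 62.5 (15 of 24 days); the 62.49 in the question appears to come from adding up already-truncated percentages.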
Here you go : Output matches with yours...
create table #TEMp
(ID VARCHAR(100)
,IIn datetime
,IOut datetime
)
insert into #temp(ID,IIn,IOut) values
('AB123','2015-11-06T15:24:44.057','2015-11-14T01:00:00.000'),
('QA565','2015-10-27T20:12:19.753','2015-11-06T03:00:00.000'),
('UN555','2015-12-29T06:29:23.417','2016-01-03T08:00:00.000'),
('LG602','2015-08-07T16:52:13.573','2015-08-11T03:00:00.000')
select ID,IDays AS Idays,ROUND(CAST(SUM(IDays) OVER(ORDER BY IDays) AS FLOAT)/CAST(SUM(IDays)OVER() AS FLOAT) * 100,2) AS IdaysPer
from
(
select *,ROUND(DATEDIFF(hour, IIn, IOut)/24,0) IDays
from #TEMP
)T

SQL find average time difference between rows for a given category

I browsed SO but could not quite find the exact answer or maybe it was for a different language.
Let's say I have a table, where each row is a record of a trade:
trade_id customer trade_date
1 A 2013-05-01 00:00:00
2 B 2013-05-01 10:00:00
3 A 2013-05-02 00:00:00
4 A 2013-05-05 00:00:00
5 B 2013-05-06 12:00:00
I would like to have the average time between trades, in days or fraction of days, for each customer, and the number of days since last trade. So for instance for customer A, time between trades 1 and 3 is 1 day and between trades 3 and 4 is 3 days, for an average of 2. So the end table would look like something like this (assuming today it's the 2013-05-10):
customer avg_time_btw_trades time_since_last_trade
A 2.0 5.0
B 5.08 3.5
If a customer has only got 1 trade I guess NULL is fine as output.
Not even sure SQL is the best way to do this (I am working with SQL server), but any help is appreciated!
SELECT
customer,
DATEDIFF(second, MIN(trade_date), MAX(trade_date)) / (NULLIF(COUNT(*), 1) - 1) / 86400.0,
DATEDIFF(second, MAX(trade_date), GETDATE() ) / 86400.0
FROM
yourTable
GROUP BY
customer
http://sqlfiddle.com/#!6/eb46e/7
EDIT: Added final field that I didn't notice, apologies.
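The MIN/MAX shortcut works because consecutive gaps telescope: the sum of all gaps between neighbouring trades equals last trade minus first trade, so the average gap is that span divided by (number of trades - 1). A minimal Python sketch on the question's data, with summarize as a hypothetical helper name and "today" fixed at 2013-05-10 as the question assumes:

```python
from datetime import datetime

trades = {
    "A": ["2013-05-01 00:00:00", "2013-05-02 00:00:00", "2013-05-05 00:00:00"],
    "B": ["2013-05-01 10:00:00", "2013-05-06 12:00:00"],
}
today = datetime(2013, 5, 10)  # "today" as assumed in the question
DAY = 86400.0                  # seconds per day

def summarize(stamps):
    """Return (mean of consecutive gaps, MIN/MAX shortcut, days since last trade)."""
    dates = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps)
    gaps = [(b - a).total_seconds() / DAY for a, b in zip(dates, dates[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    shortcut = (
        (dates[-1] - dates[0]).total_seconds() / DAY / (len(dates) - 1)
        if len(dates) > 1
        else None  # a single trade has no gap, matching the NULLIF in the query
    )
    since_last = (today - dates[-1]).total_seconds() / DAY
    return mean_gap, shortcut, since_last

stats = {customer: summarize(stamps) for customer, stamps in trades.items()}
print(stats["A"])  # -> (2.0, 2.0, 5.0)
```

For customer B both averages come out at 122 hours, roughly 5.08 days, with 3.5 days since the last trade, matching the table in the question.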
The following SQL script uses your data and gives the expected results.
DECLARE @temp TABLE
( trade_id INT,
customer CHAR(1),
trade_date DATETIME );
INSERT INTO @temp VALUES (1, 'A', '20130501');
INSERT INTO @temp VALUES (2, 'B', '20130501 10:00');
INSERT INTO @temp VALUES (3, 'A', '20130502');
INSERT INTO @temp VALUES (4, 'A', '20130505');
INSERT INTO @temp VALUES (5, 'B', '20130506 12:00');
DECLARE @getdate DATETIME
-- SET @getdate = getdate();
SET @getdate = '20130510';
SELECT s.customer
, AVG(s.days_btw_trades) AS avg_time_between_trades
, CAST(DATEDIFF(hour, MAX(s.trade_date), @getdate) AS float)
/ 24.0 AS time_since_last_trade
FROM (
SELECT CAST(DATEDIFF(HOUR, t2.trade_date, t.trade_date) AS float)
/ 24.0 AS days_btw_trades
, t.customer
, t.trade_date
FROM @temp t
LEFT JOIN @temp t2 ON t2.customer = t.customer
AND t2.trade_date = ( SELECT MAX(t3.trade_date)
FROM @temp t3
WHERE t3.customer = t.customer
AND t3.trade_date < t.trade_date)
) s
GROUP BY s.customer
You need the date difference between each trade and the one before it, and then average those differences (MySQL syntax; DATEDIFF here returns whole days):
select
a.customer
,avg(datediff(a.trade_date, b.trade_date))
,datediff(now(),max(a.trade_date))
from yourTable a, yourTable b
where a.customer = b.customer
and b.trade_date = (
select max(trade_date)
from yourTable c
where c.customer = a.customer
and a.trade_date > c.trade_date)
#gets the one earlier date for every trade
group by a.customer
Just for grins I added a solution that uses CTEs. You could probably use a temp table if the first query is too large. I used @MatBailie's creation script for the table:
CREATE TABLE customer_trades (
id INT IDENTITY(1,1),
customer_id INT,
trade_date DATETIME,
PRIMARY KEY (id),
INDEX ix_user_trades (customer_id, trade_date)
)
INSERT INTO
customer_trades (
customer_id,
trade_date
)
VALUES
(1, '2013-05-01 00:00:00'),
(2, '2013-05-01 10:00:00'),
(1, '2013-05-02 00:00:00'),
(1, '2013-05-05 00:00:00'),
(2, '2013-05-06 12:00:00')
;
;WITH CTE as(
    -- hours from each trade to the customer's next trade (NULL for the last one,
    -- so it does not distort the average)
    select customer_id, trade_date,
           datediff(hour, trade_date,
                    LEAD(trade_date,1) over (partition by customer_id order by trade_date)) Trade_diff
    from customer_trades
)
SELECT customer_id,
       AVG(Trade_diff) / 24.0 AS avg_days_btw_trades,
       DATEDIFF(hour, MAX(trade_date), GETDATE()) / 24.0 AS days_since_last_trade
FROM CTE
GROUP BY customer_id

SQL moving average

How do you create a moving average in SQL?
Current table:
Date Clicks
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520
2012-05-04 1,330
2012-05-05 2,260
2012-05-06 3,540
2012-05-07 2,330
Desired table or output:
Date Clicks 3 day Moving Average
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520 4,360
2012-05-04 1,330 3,330
2012-05-05 2,260 3,120
2012-05-06 3,540 3,320
2012-05-07 2,330 3,010
This is an evergreen Joe Celko question.
I don't know which DBMS platform is used, but in any case Joe was able to answer this more than 10 years ago with standard SQL.
Joe Celko SQL Puzzles and Answers citation:
"That last update attempt suggests that we could use the predicate to
construct a query that would give us a moving average:"
SELECT S1.sample_time, AVG(S2.load) AS avg_prev_hour_load
FROM Samples AS S1, Samples AS S2
WHERE S2.sample_time
BETWEEN (S1.sample_time - INTERVAL 1 HOUR)
AND S1.sample_time
GROUP BY S1.sample_time;
Is the extra column or the query approach better? The query is
technically better because the UPDATE approach will denormalize the
database. However, if the historical data being recorded is not going
to change and computing the moving average is expensive, you might
consider using the column approach.
MS SQL Example:
CREATE TABLE #TestDW
( Date1 datetime,
LoadValue Numeric(13,6)
);
INSERT INTO #TestDW VALUES('2012-06-09' , '3.540' );
INSERT INTO #TestDW VALUES('2012-06-08' , '2.260' );
INSERT INTO #TestDW VALUES('2012-06-07' , '1.330' );
INSERT INTO #TestDW VALUES('2012-06-06' , '5.520' );
INSERT INTO #TestDW VALUES('2012-06-05' , '3.150' );
INSERT INTO #TestDW VALUES('2012-06-04' , '2.230' );
SQL Puzzle query:
SELECT S1.date1, AVG(S2.LoadValue) AS avg_prev_3_days
FROM #TestDW AS S1, #TestDW AS S2
WHERE S2.date1
BETWEEN DATEADD(d, -2, S1.date1 )
AND S1.date1
GROUP BY S1.date1
order by 1;
One way to do this is to join on the same table a few times.
select
    (Cur.Clicks
    + isnull(P1.Clicks, 0)
    + isnull(P2.Clicks, 0)
    + isnull(P3.Clicks, 0)) / 4 as MovingAvg3
from
    MyTable as Cur   -- CURRENT is a reserved keyword in T-SQL, so use another alias
    left join MyTable as P1 on P1.Date = DateAdd(day, -1, Cur.Date)
    left join MyTable as P2 on P2.Date = DateAdd(day, -2, Cur.Date)
    left join MyTable as P3 on P3.Date = DateAdd(day, -3, Cur.Date)
Adjust the DateAdd component of the ON-Clauses to match whether you want your moving average to be strictly from the past-through-now or days-ago through days-ahead.
This works nicely for situations where you need a moving average over only a few data points.
This is not an optimal solution for moving averages with more than a few data points.
select t2.date, round(sum(ct.clicks)/3) as avg_clicks
from
(select date from clickstable) as t2,
(select date, clicks from clickstable) as ct
where datediff(t2.date, ct.date) between 0 and 2
group by t2.date
Obviously you can change the interval to whatever you need. You could also use count() instead of a magic number to make it easier to change, but that will also slow it down.
General template for rolling averages that scales well for large data sets
WITH moving_avg AS (
SELECT 0 AS [lag] UNION ALL
SELECT 1 AS [lag] UNION ALL
SELECT 2 AS [lag] UNION ALL
SELECT 3 AS [lag] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1]) AS [avg_value1],
AVG([value2]) AS [avg_value2]
FROM [data_table]
CROSS JOIN moving_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
And for weighted rolling averages:
WITH weighted_avg AS (
SELECT 0 AS [lag], 1.0 AS [weight] UNION ALL
SELECT 1 AS [lag], 0.6 AS [weight] UNION ALL
SELECT 2 AS [lag], 0.3 AS [weight] UNION ALL
SELECT 3 AS [lag], 0.1 AS [weight] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1] * [weight]) / AVG([weight]) AS [wavg_value1],
AVG([value2] * [weight]) / AVG([weight]) AS [wavg_value2]
FROM [data_table]
CROSS JOIN weighted_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
select *
, (select avg(c2.clicks) from #clicks_table c2
where c2.date between dateadd(dd, -2, c1.date) and c1.date) mov_avg
from #clicks_table c1
Use a different join predicate:
SELECT current.date
,avg(periods.clicks)
FROM current left outer join current as periods
ON current.date BETWEEN dateadd(d,-2, periods.date) AND periods.date
GROUP BY current.date HAVING COUNT(*) >= 3
The HAVING clause prevents any dates with fewer than N values from being returned.
assume x is the value to be averaged and xDate is the date value:
SELECT avg(x) from myTable WHERE xDate BETWEEN dateadd(d, -2, xDate) and xDate
In hive, maybe you could try
select date, clicks, avg(clicks) over (order by date rows between 2 preceding and current row) as moving_avg from clicktable;
For this purpose, I'd create an auxiliary (dimension) date table like
create table date_dim(date date, date_1 date, date_2 date, date_3 date ...)
where date is the key, date_1 holds this day, date_2 holds this day and the day before, date_3...
Then you can do an equi-join in Hive.
Using a view like:
select date, date from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim
union all
select date, date_add(date, -3) from date_dim
NOTE: THIS IS NOT AN ANSWER but an enhanced code sample of Diego Scaravaggi's answer. I am posting it as an answer because the comment section is insufficient. Note that I have parameterized the period of the moving average.
declare @p int = 3
declare @t table(d int, bal float)
insert into @t values
(1,94),
(2,99),
(3,76),
(4,74),
(5,48),
(6,55),
(7,90),
(8,77),
(9,16),
(10,19),
(11,66),
(12,47)
select a.d, avg(b.bal)
from
@t a
left join @t b on b.d between a.d-(@p-1) and a.d
group by a.d
-- @p1 is the period of the moving average, @o1 is the offset
declare @p1 as int
declare @o1 as int
set @p1 = 5;
set @o1 = 3;
with np as(
select *, rank() over(partition by cmdty, tenor order by markdt) as r
from p_prices p1
where
1=1
)
, x1 as (
select s1.*, avg(s2.val) as avgval from np s1
inner join np s2
on s1.cmdty = s2.cmdty and s1.tenor = s2.tenor
and s2.r between s1.r - (@p1 - 1) - (@o1) and s1.r - (@o1)
group by s1.cmdty, s1.tenor, s1.markdt, s1.val, s1.r
)
select * from x1
order by cmdty, tenor, markdt;
I'm not sure that your expected result (output) shows classic "simple moving (rolling) average" for 3 days. Because, for example, the first triple of numbers by definition gives:
ThreeDaysMovingAverage = (2.230 + 3.150 + 5.520) / 3 = 3.6333333
but you expect 4.360 and it's confusing.
Nevertheless, I suggest the following solution, which uses window-function AVG. This approach is much more efficient (clear and less resource-intensive) than SELF-JOIN introduced in other answers (and I'm surprised that no one has given a better solution).
-- Oracle-SQL dialect
with
data_table as (
select date '2012-05-01' AS dt, 2.230 AS clicks from dual union all
select date '2012-05-02' AS dt, 3.150 AS clicks from dual union all
select date '2012-05-03' AS dt, 5.520 AS clicks from dual union all
select date '2012-05-04' AS dt, 1.330 AS clicks from dual union all
select date '2012-05-05' AS dt, 2.260 AS clicks from dual union all
select date '2012-05-06' AS dt, 3.540 AS clicks from dual union all
select date '2012-05-07' AS dt, 2.330 AS clicks from dual
),
param as (select 3 days from dual)
select
dt AS "Date",
clicks AS "Clicks",
case when rownum >= p.days then
avg(clicks) over (order by dt
rows between p.days - 1 preceding and current row)
end
AS "3 day Moving Average"
from data_table t, param p;
You see that AVG is wrapped with case when rownum >= p.days then to force NULLs in first rows, where "3 day Moving Average" is meaningless.
We can apply Joe Celko's "dirty" left outer join method (as cited above by Diego Scaravaggi) to answer the question as it was asked.
declare @ClicksTable table ([Date] date, Clicks int)
insert into @ClicksTable
select '2012-05-01', 2230 union all
select '2012-05-02', 3150 union all
select '2012-05-03', 5520 union all
select '2012-05-04', 1330 union all
select '2012-05-05', 2260 union all
select '2012-05-06', 3540 union all
select '2012-05-07', 2330
This query:
SELECT
T1.[Date],
T1.Clicks,
-- AVG ignores NULL values so we have to explicitly NULLify
-- the days when we don't have a full 3-day sample
CASE WHEN count(T2.[Date]) < 3 THEN NULL
ELSE AVG(T2.Clicks)
END AS [3-Day Moving Average]
FROM @ClicksTable T1
LEFT OUTER JOIN @ClicksTable T2
ON T2.[Date] BETWEEN DATEADD(d, -2, T1.[Date]) AND T1.[Date]
-- T1.Clicks is selected, so it must be part of the GROUP BY as well
GROUP BY T1.[Date], T1.Clicks
Generates the requested output:
Date Clicks 3-Day Moving Average
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520 4,360
2012-05-04 1,330 3,330
2012-05-05 2,260 3,120
2012-05-06 3,540 3,320
2012-05-07 2,330 3,010
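For reference, the plain trailing 3-row average that the self-join and window-function answers above describe can be computed by hand with a few lines of Python. This sketch assumes one row per consecutive day (so a 3-row window is also a 3-day window) and reads the comma in the question's figures as a thousands separator.

```python
clicks = [
    ("2012-05-01", 2230),
    ("2012-05-02", 3150),
    ("2012-05-03", 5520),
    ("2012-05-04", 1330),
    ("2012-05-05", 2260),
    ("2012-05-06", 3540),
    ("2012-05-07", 2330),
]
WINDOW = 3  # current row plus the two preceding rows

moving = []
for i, (day, value) in enumerate(clicks):
    if i + 1 < WINDOW:
        # Not enough history yet: leave the average empty, like the CASE/NULL logic.
        moving.append((day, value, None))
    else:
        window = [v for _, v in clicks[i + 1 - WINDOW : i + 1]]
        moving.append((day, value, round(sum(window) / WINDOW, 2)))

for row in moving:
    print(row)
# ('2012-05-01', 2230, None)
# ('2012-05-02', 3150, None)
# ('2012-05-03', 5520, 3633.33)
# ('2012-05-04', 1330, 3333.33)
# ('2012-05-05', 2260, 3036.67)
# ('2012-05-06', 3540, 2376.67)
# ('2012-05-07', 2330, 2710.0)
```

These values differ from the "desired" column in the question, which is consistent with the observation in one of the answers that the expected figures do not look like a classic 3-day simple moving average.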

sql query - select duplicates within a 12 hour period

if i have data as follows
A | 01/01/2008 00:00:00
B | 01/01/2008 01:00:00
A | 01/01/2008 11:59:00
C | 02/01/2008 00:00:00
D | 02/01/2008 01:00:00
D | 02/01/2008 20:00:00
I want to select only the records whose identifiers (A, B, C or D) have occurred twice within a 12-hour period. In the example above this would only be 'A'.
Can anyone help please (this is for an Oracle data base)
Thanks
M
Select Distinct A.Identifer
From Table A
Join Table B -- EDIT to eliminate self Joins (to same row)
On A.PrimKey <> B.PrimKey
And A.Identifer = B.Identifer
-- EDIT to fix case where 2 at same time
And A.OccurTime >= B.OccurTime
And A.OccurTime < B.OccurTime + .5
and to implement question asked in comment,
(Ignoring records which are on different days)
-- for SQL Server,
Select Distinct A.Identifer
From Table A
Join Table B
On A.PrimKey <> B.PrimKey
And A.Identifer = B.Identifer
-- EDIT to fix case where 2 at same time
And A.OccurTime >= B.OccurTime
And A.OccurTime < B.OccurTime + .5
Where DateDiff(day, A.OccurTime, B.OccurTime) = 0
-- or for oracle...
Select Distinct A.Identifer
From Table A
Join Table B
On A.PrimKey <> B.PrimKey
And A.Identifer = B.Identifer
-- EDIT to fix case where 2 at same time
And A.OccurTime >= B.OccurTime
And A.OccurTime < B.OccurTime + .5
Where Trunc(A.OccurTime) = Trunc(B.OccurTime)
Select
A.Id
From
YourTable A
Where
A.YourDateTime Between :StartDateTime and :EndDateTime
Group By
A.Id
Having
COUNT(A.Id) = 2
I haven't checked William's query but I would seriously consider using what he has over every other. Analytics are da bomb. Anytime you find yourself joining a table back to itself is virtually guaranteed to be an opportunity to use analytics and will out perform the query with one table referenced twice every time.
You'll be amazed how much faster the analytic solution will be.
SELECT identifier
FROM table_name outer
WHERE EXISTS( SELECT 1
FROM table_name inner
WHERE inner.identifier = outer.identifier
AND inner.date_column BETWEEN outer.date_column AND outer.date_column + interval '12' hour
AND inner.rowid != outer.rowid )
I'm not 100% sure of your requirements, but this might give you some ideas about how to do what you need. For example, you said exactly 2; what if there are 3 occurrences?
create table t (ident varchar2(16), occurance timestamp);
insert into t (ident, occurance) values ('a', to_date('20080101000000', 'yyyymmddhh24miss'));
insert into t (ident, occurance) values ('b', to_date('20080101010000', 'yyyymmddhh24miss'));
insert into t (ident, occurance) values ('a', to_date('20080101115900', 'yyyymmddhh24miss'));
insert into t (ident, occurance) values ('c', to_date('20080102000000', 'yyyymmddhh24miss'));
insert into t (ident, occurance) values ('d', to_date('20080102010000', 'yyyymmddhh24miss'));
insert into t (ident, occurance) values ('d', to_date('20080102200000', 'yyyymmddhh24miss'));
insert into t (ident, occurance) values ('d', to_date('20080103020000', 'yyyymmddhh24miss'));
select ident, occurance
from
(
select ident, occurance,
lag(occurance) over (partition by ident order by occurance) previous,
lead(occurance) over (partition by ident order by occurance) next
from t
)
where
((occurance-previous<interval'12:00' hour to minute and extract(day from occurance) = extract(day from previous))
or (next-occurance<interval'12:00' hour to minute and extract(day from occurance) = extract(day from next)))
/
SELECT namecol FROM tbl A
WHERE EXISTS (
SELECT 1 from tbl B
WHERE b.namecol = a.namecol
AND b.timestamp > a.timestamp
AND b.timestamp - 0.5 <= a.timestamp )
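The LAG/LEAD idea boils down to sorting each identifier's timestamps and comparing neighbours: if any two occurrences are less than 12 hours apart, some adjacent pair in sorted order is too. A small Python sketch of that check on the question's data, reading the dates as day/month/year (as the Oracle test data above does) and ignoring the optional "same calendar day" restriction:

```python
from collections import defaultdict
from datetime import datetime, timedelta

events = [
    ("A", "2008-01-01 00:00:00"),
    ("B", "2008-01-01 01:00:00"),
    ("A", "2008-01-01 11:59:00"),
    ("C", "2008-01-02 00:00:00"),
    ("D", "2008-01-02 01:00:00"),
    ("D", "2008-01-02 20:00:00"),
]
WINDOW = timedelta(hours=12)

# Group the timestamps per identifier (the PARTITION BY step).
by_id = defaultdict(list)
for ident, stamp in events:
    by_id[ident].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))

# Sort each group (ORDER BY) and compare every timestamp with its predecessor (LAG).
repeated = sorted(
    ident
    for ident, stamps in by_id.items()
    if any(b - a < WINDOW for a, b in zip(sorted(stamps), sorted(stamps)[1:]))
)
print(repeated)  # -> ['A']
```

D is not reported because its two occurrences are 19 hours apart.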