I am trying to write a Microsoft SQL Server query for retrieving the oldest record in which the text fields are the same, but the dates are 30 seconds or less apart. Here is an example:
My table:
RecordID TextField1 TextField2 DateField1
--------------------------------------------------------------------------------
1 SomeData1 SomeData2 9/11/2011 2:33:00pm
2 SomeData3 SomeData4 9/11/2011 2:33:15pm
3 SomeData3 SomeData4 9/11/2011 2:33:18pm
4 SomeData3 SomeData4 9/11/2011 2:42:12pm
5 SomeData1 SomeData2 9/11/2011 2:33:01pm
6 SomeData6 SomeData7 9/11/2011 2:33:01pm
7 SomeData1 SomeData2 9/12/2011 2:33:00pm
8 SomeData6 SomeData8 9/11/2011 2:33:03pm
Okay, so in this example, I want a query that will pull the rows in which TextField1=TextField1 and TextField2=TextField2 and the dates between them are 30 seconds or less (I want the oldest of the two returned). So the query, in this example, should return:
RecordID TextField1 TextField2 DateField1
--------------------------------------------------------------------------------
1 SomeData1 SomeData2 9/11/2011 2:33:00pm
2 SomeData3 SomeData4 9/11/2011 2:33:15pm
RecordID 8 is not returned because TextField2 is different.
Hopefully I explained this clearly enough. Any help would be appreciated!
I couldn't understand everything in your question.
This is a generic SQL query that compares the records of your table against the same table, looking for records with a different RecordID but equal TextField1 and TextField2.
Leave a comment if this looks like what you want and we can improve this query to get exactly what you are looking for.
UPDATED:
SELECT * FROM my_table AS t1
INNER JOIN my_table AS t2
ON (
t1.RecordID < t2.RecordID
AND
ABS(DATEDIFF(second, t1.DateField1, t2.DateField1)) <= 30
AND
t1.TextField1 = t2.TextField1
AND
t2.TextField2 = t1.TextField2
);
This returns the two records you are looking for in your example. The join between t1 and t2 returns the records that meet your criteria, and then joining to t3 returns the oldest of the rows meeting the criteria.
;
with TestCTE(RecordID, TextField1, TextField2, DateField1)
as
(
select 1, 'SomeData1', 'SomeData2', cast('9/11/2011 2:33:00pm' as datetime)
union
select 2, 'SomeData3', 'SomeData4', cast('9/11/2011 2:33:15pm' as datetime)
union
select 3, 'SomeData3', 'SomeData4', cast('9/11/2011 2:33:18pm' as datetime)
union
select 4, 'SomeData3', 'SomeData4', cast('9/11/2011 2:42:12pm' as datetime)
union
select 5, 'SomeData1', 'SomeData2', cast('9/11/2011 2:33:01pm' as datetime)
union
select 6, 'SomeData6', 'SomeData7', cast('9/11/2011 2:33:01pm' as datetime)
union
select 7, 'SomeData1', 'SomeData2', cast('9/12/2011 2:33:00pm' as datetime)
union
select 8, 'SomeData6', 'SomeData8', cast('9/11/2011 2:33:03pm' as datetime)
)
select t1.*
from TestCTE t1
join TestCTE t2 on t1.RecordID <> t2.RecordID
and t1.TextField1 = t2.TextField1
and t1.TextField2 = t2.TextField2
and datediff(second, t1.DateField1, t2.DateField1) <= 30
join
(
select TextField1, TextField2, min(DateField1) as MinDate
from TestCTE
group by TextField1, TextField2
) t3 on t1.TextField1 = t3.TextField1
and t1.TextField2 = t3.TextField2
and t1.DateField1 = t3.MinDate
This sounds like a simple self-join, so there is no need to make it more than it should be. Doing the self-join and applying the GROUP BY should get it done for you.
The self-join is on both text fields, between rows with different RecordIDs, followed by the comparison on the date/time factor of 30 seconds.
Following a comment from ADrift, and after re-looking at the data: what I THOUGHT was a date/time stamp that would always increase with the RecordID does not always do so. Slight change: get the earliest date/time among the matched rows for the given text1 and text2, then re-join back for the rest of the details.
select
YT3.*
from
( select
YT.TextField1,
YT.TextField2,
MIN( YT.DateField1 ) OldestDateTime
from
YourTable YT
Join YourTable YT2
on YT.TextField1 = YT2.TextField1
AND YT.TextField2 = YT2.TextField2
AND YT.RecordID <> YT2.RecordID
AND abs( datediff(second, YT.DateField1, YT2.DateField1) ) <= 30
group by
YT.TextField1,
YT.TextField2 ) PreQuery
JOIN YourTable YT3
on PreQuery.TextField1 = YT3.TextField1
AND PreQuery.TextField2 = YT3.TextField2
AND PreQuery.OldestDateTime = YT3.DateField1
order by
whatever...
Assuming there can be no more than two adjacent rows (rows within 30 seconds of each other) and you are on SQL Server 2005 or a later version:
WITH sampledata (RecordID, TextField1, TextField2, DateField1) AS
(
SELECT 1, 'SomeData1', 'SomeData2', CAST('20110911 14:33:00' AS datetime) UNION ALL
SELECT 2, 'SomeData3', 'SomeData4', CAST('20110911 14:33:15' AS datetime) UNION ALL
SELECT 3, 'SomeData3', 'SomeData4', CAST('20110911 14:33:18' AS datetime) UNION ALL
SELECT 4, 'SomeData3', 'SomeData4', CAST('20110911 14:42:12' AS datetime) UNION ALL
SELECT 5, 'SomeData1', 'SomeData2', CAST('20110911 14:33:01' AS datetime) UNION ALL
SELECT 6, 'SomeData6', 'SomeData7', CAST('20110911 14:33:01' AS datetime) UNION ALL
SELECT 7, 'SomeData1', 'SomeData2', CAST('20110912 14:33:00' AS datetime) UNION ALL
SELECT 8, 'SomeData6', 'SomeData8', CAST('20110911 14:33:03' AS datetime)
),
ranked AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY TextField1, TextField2 ORDER BY DateField1)
FROM sampledata
)
SELECT
r1.RecordID,
r1.TextField1,
r1.TextField2,
r1.DateField1
FROM ranked r1
INNER JOIN ranked r2 ON r1.TextField1 = r2.TextField1
AND r1.TextField2 = r2.TextField2
AND r1.rn = r2.rn - 1
WHERE r2.DateField1 BETWEEN r1.DateField1 AND DATEADD(SECOND, 30, r1.DateField1)
Output:
RecordID TextField1 TextField2 DateField1
----------- ---------- ---------- -----------------------
1 SomeData1 SomeData2 2011-09-11 14:33:00.000
2 SomeData3 SomeData4 2011-09-11 14:33:15.000
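The adjacent-row idea in the query above (order the rows within each TextField1/TextField2 group by date, then compare each row with the next one) can also be traced outside the database. The following Python sketch is only an illustration of that logic; the function name `oldest_of_close_pairs` is made up for this example, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (RecordID, TextField1, TextField2, DateField1)
rows = [
    (1, "SomeData1", "SomeData2", datetime(2011, 9, 11, 14, 33, 0)),
    (2, "SomeData3", "SomeData4", datetime(2011, 9, 11, 14, 33, 15)),
    (3, "SomeData3", "SomeData4", datetime(2011, 9, 11, 14, 33, 18)),
    (4, "SomeData3", "SomeData4", datetime(2011, 9, 11, 14, 42, 12)),
    (5, "SomeData1", "SomeData2", datetime(2011, 9, 11, 14, 33, 1)),
    (6, "SomeData6", "SomeData7", datetime(2011, 9, 11, 14, 33, 1)),
    (7, "SomeData1", "SomeData2", datetime(2011, 9, 12, 14, 33, 0)),
    (8, "SomeData6", "SomeData8", datetime(2011, 9, 11, 14, 33, 3)),
]

def oldest_of_close_pairs(rows, window=timedelta(seconds=30)):
    """Return the RecordIDs of rows that are followed, within `window`,
    by another row with the same two text fields."""
    groups = defaultdict(list)
    for rec in rows:
        groups[(rec[1], rec[2])].append(rec)

    result = []
    for group in groups.values():
        group.sort(key=lambda rec: rec[3])           # like ORDER BY DateField1
        for earlier, later in zip(group, group[1:]):  # like rn = rn - 1
            if later[3] - earlier[3] <= window:
                result.append(earlier[0])
    return sorted(result)

print(oldest_of_close_pairs(rows))  # [1, 2]
```

Only neighbouring rows are compared, so a group with three or more rows inside the same 30-second window would report more than one "oldest" row, which is the same limitation the answer states up front.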
Related
Table 1
Loc_Id  Label_Id  Active_Date  Inactive_Date
1       1001      2022/05/13   9999/12/31
2       1001      2018/05/20   2022/05/12
3       1001      2012/06/14   2018/05/12
Table 2
Label_Id  Tab2_Active_Date  Tab2_Inactive_Date
1001      2022/05/13        9999/12/31
1001      2018/05/22        2022/05/12
1001      2012/06/14        2018/05/12
I want to know which records in Table 2 have Tab2_Active_Date > Active_Date in Table 1 and Tab2_Inactive_Date < Inactive_Date in Table 1.
For example, in this scenario the Tab2_Active_Date 2018/05/22 in Table 2 is greater than the Active_Date 2018/05/20 in Table 1.
So the output will be:
Loc_Id  Tab2_Active_Date  Tab2_Inactive_Date
2       2018/05/22        2022/05/12
Since I only have the Ids as keys for joining the two tables and I need to compare the dates, I cannot use the dates to join the tables, as that results in inaccurate data.
Create table #T1
(
Loc_Id int,
Label_Id int,
Active_Date date,
Inactive_Date date
)
Create table #T2
(
Label_Id int,
Active_Date date,
Inactive_Date date
)
Insert into #T1
Select 1, 1001, '2022-05-13', '9999-12-31'
union
Select 2, 1001, '2022-05-20', '2022-05-12'
union
Select 3, 1001, '2022-06-14', '2018-05-12'
union
Select 4, 1001, '2022-07-14', '2018-08-13'
Insert into #T2
Select 1001, '2022-05-13', '9999-12-31'
union
Select 1001, '2022-05-22', '2022-05-12'
union
Select 1001, '2022-06-14', '2018-05-12'
union
Select 1001, '2022-06-14', '2018-05-12'
union
Select 1001, '2022-07-14', '2018-08-12'
;with Cte as
(
Select Label_Id, Active_Date, Inactive_Date from #T2
EXCEPT
Select Label_Id, Active_Date, Inactive_Date from #T1
)
Select t1.Loc_Id, t2.Active_Date, t2.Inactive_Date
from #T1 t1
inner join Cte t2 on t1.Label_Id = t2.Label_Id and (t2.Active_Date > t1.Active_Date and t2.Inactive_Date = t1.Inactive_Date)
union
Select t1.Loc_Id, t2.Active_Date, t2.Inactive_Date
from #T1 t1
inner join Cte t2 on t1.Label_Id = t2.Label_Id and (t2.Inactive_Date < t1.Inactive_Date and t2.Active_Date = t1.Active_Date)
Drop table #T1
Drop table #T2
Here's what I came up with
with t1 as (
select * from (values
(1, 1001, '2022-05-13', '9999-12-31'),
(2, 1001, '2018-05-20', '2022-05-12'),
(3, 1001, '2012-06-14', '2018-05-12')
) as x(Loc_ID, Label_Id, Active_Date, Inactive_Date)
),
t2 as (
select * from (values
(1001, '2022-05-13', '9999-12-31'),
(1001, '2018-05-22', '2022-05-12'),
(1001, '2012-06-14', '2018-05-12')
) as x(Label_Id, Active_Date, Inactive_Date)
)
select t1.*, '||', t2.*
from t1
join t2
on t1.Label_Id = t2.Label_Id
and t2.Active_Date >= t1.Active_Date
and t2.Inactive_Date <= t1.Inactive_Date
and (
t1.Active_Date <> t2.Active_Date
or t1.Inactive_Date <> t2.Inactive_Date
)
Ignoring the CTEs (that's just a way to get the data into a tabular structure), the join criteria in the SELECT statement say that the t2 interval must fall within the t1 interval (the first two predicates, one on each of active_date and inactive_date) but must not coincide with it exactly (the compound predicate, which says that at least one of active_date or inactive_date must not match).
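To see what those join criteria select, here is a small Python sketch that applies the same two tests (inside the interval, but not the identical interval) to the data from the question. The helper name `contained_but_not_equal` is invented for this illustration, and the sketch also matches on Label_Id, which the question describes as the key between the two tables.

```python
from datetime import date

# Table 1 from the question: (Loc_Id, Label_Id, Active_Date, Inactive_Date)
t1 = [
    (1, 1001, date(2022, 5, 13), date(9999, 12, 31)),
    (2, 1001, date(2018, 5, 20), date(2022, 5, 12)),
    (3, 1001, date(2012, 6, 14), date(2018, 5, 12)),
]
# Table 2 from the question: (Label_Id, Active_Date, Inactive_Date)
t2 = [
    (1001, date(2022, 5, 13), date(9999, 12, 31)),
    (1001, date(2018, 5, 22), date(2022, 5, 12)),
    (1001, date(2012, 6, 14), date(2018, 5, 12)),
]

def contained_but_not_equal(inner, outer):
    """inner and outer are (active, inactive) pairs. True when inner lies
    within outer and differs from it in at least one endpoint."""
    inner_start, inner_end = inner
    outer_start, outer_end = outer
    inside = inner_start >= outer_start and inner_end <= outer_end
    return inside and inner != outer

matches = [
    (loc_id, active2, inactive2)
    for loc_id, label1, active1, inactive1 in t1
    for label2, active2, inactive2 in t2
    if label1 == label2
    and contained_but_not_equal((active2, inactive2), (active1, inactive1))
]
print(matches)  # [(2, datetime.date(2018, 5, 22), datetime.date(2022, 5, 12))]
```

The single match is Loc_Id 2 with the 2018/05/22 to 2022/05/12 interval, which is the output the question asks for.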
Row 3 in the following table is a duplicate. I know this because there is another row (row 5) that was created by the same user less than one second earlier.
row record created_by created_dt
1 5734 '00E759CF' '2020-06-05 19:59:36.610'
2 9856 '1E095CBA' '2020-06-05 19:57:31.207'
3 4592 '1E095CBA' '2020-06-05 19:54:41.930'
4 7454 '00E759CF' '2020-06-05 19:54:41.840'
5 4126 '1E095CBA' '2020-06-05 19:54:41.757'
I want a query that returns all rows created by the same user less than one second apart.
Like so:
row record created_by created_dt
1 4592 '1E095CBA' '2020-06-05 19:54:41.930'
2 4126 '1E095CBA' '2020-06-05 19:54:41.757'
This is what I have so far:
SELECT DISTINCT a1.*
FROM table AS a1
LEFT JOIN table AS a2
ON a1.created_by = a2.created_by
AND a1.created_dt > a2.created_dt
AND a1.created_dt <= DATEADD(second, 1, a2.created_dt)
WHERE a1.created_dt IS NOT NULL
AND a2.created_dt IS NOT NULL
This is what finally did the trick:
SELECT
a.*
FROM table a
WHERE EXISTS (SELECT TOP 1
*
FROM table a1
WHERE a1.created_by = a.created_by
AND ABS(DATEDIFF(SECOND, a.created_dt, a1.created_dt)) < 1
AND a.created_dt <> a1.created_dt)
ORDER BY created_dt DESC
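One caveat about the DATEDIFF(SECOND, ...) test used here: DATEDIFF counts the second boundaries crossed between the two values rather than the elapsed time. Two timestamps a few milliseconds apart that straddle a second boundary therefore give 1, while two timestamps in the same clock second give 0. The Python sketch below imitates that behaviour to make the difference visible; `datediff_second` is a hypothetical helper written for this illustration, not a SQL Server function, and the second pair of timestamps is made up.

```python
from datetime import datetime

def datediff_second(start: datetime, end: datetime) -> int:
    """Imitate DATEDIFF(SECOND, start, end): count second boundaries crossed."""
    # Truncate both values to whole seconds, then subtract.
    truncated_start = start.replace(microsecond=0)
    truncated_end = end.replace(microsecond=0)
    return int((truncated_end - truncated_start).total_seconds())

def within_one_second(a: datetime, b: datetime) -> bool:
    """True when the elapsed time between a and b is under one second."""
    return abs((b - a).total_seconds()) < 1

# Records 4126 and 4592 from the question: same clock second, 173 ms apart.
a = datetime(2020, 6, 5, 19, 54, 41, 757000)
b = datetime(2020, 6, 5, 19, 54, 41, 930000)

# Made-up pair, 6 ms apart, on either side of a second boundary.
c = datetime(2020, 6, 5, 19, 54, 41, 997000)
d = datetime(2020, 6, 5, 19, 54, 42, 3000)

print(datediff_second(a, b), within_one_second(a, b))  # 0 True
print(datediff_second(c, d), within_one_second(c, d))  # 1 True
```

If pairs like the second one matter, comparing with a finer datepart, for example DATEDIFF(MILLISECOND, ...) < 1000, is one way to get closer to "less than one second apart".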
You could use exists:
select t.*
from mytable t
where exists(
select 1
from mytable t1
where
t1.created_by = t.created_by
and t1.record <> t.record
and abs(datediff(second, t.created_dt, t1.created_dt)) < 1
)
How about something like this
SELECT DISTINCT a1.*
FROM #a1 AS a1
LEFT JOIN #a1 AS a2 ON a1.[Created_By] = a2.[Created_By]
AND a1.[Record] <> a2.[Record]
WHERE ABS(DATEDIFF(SECOND, a1.[Created_Dt], a2.[Created_Dt])) < 1
Here is the sample query I used to verify the results.
CREATE TABLE #a1 (
[Record] INT,
[Created_By] NVARCHAR(10),
[Created_Dt] DATETIME
)
INSERT INTO #a1 VALUES
(5734, '00E759CF', '2020-06-05 19:59:36.610'),
(9856, '1E095CBA', '2020-06-05 19:57:31.207'),
(4592, '1E095CBA', '2020-06-05 19:54:41.930'),
(7454, '00E759CF', '2020-06-05 19:54:41.840'),
(4126, '1E095CBA', '2020-06-05 19:54:41.757')
SELECT DISTINCT a1.*
FROM #a1 AS a1
LEFT JOIN #a1 AS a2 ON a1.[Created_By] = a2.[Created_By]
AND a1.[Record] <> a2.[Record]
WHERE ABS(DATEDIFF(SECOND, a1.[Created_Dt], a2.[Created_Dt])) < 1
I would suggest lead() and lag() instead of self-joins:
select t.*
from (select t.*,
lag(created_dt) over (partition by created_by order by created_dt) as prev_created_dt,
lead(created_dt) over (partition by created_by order by created_dt) as next_created_dt
from t
) t
where created_dt < dateadd(second, 1, prev_created_dt) or
created_dt > dateadd(second, -1, next_created_dt)
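The lag/lead idea is to sort each user's rows by time and compare every row with its immediate neighbours. As a rough illustration of that logic outside SQL, here is a Python sketch over the rows from the question; the function name `near_duplicates` is an assumption made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Rows from the question: (record, created_by, created_dt)
rows = [
    (5734, "00E759CF", datetime(2020, 6, 5, 19, 59, 36, 610000)),
    (9856, "1E095CBA", datetime(2020, 6, 5, 19, 57, 31, 207000)),
    (4592, "1E095CBA", datetime(2020, 6, 5, 19, 54, 41, 930000)),
    (7454, "00E759CF", datetime(2020, 6, 5, 19, 54, 41, 840000)),
    (4126, "1E095CBA", datetime(2020, 6, 5, 19, 54, 41, 757000)),
]

def near_duplicates(rows, window=timedelta(seconds=1)):
    """Return rows whose previous or next row by the same user
    is less than `window` away."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[1]].append(row)          # like PARTITION BY created_by

    flagged = []
    for user_rows in by_user.values():
        user_rows.sort(key=lambda r: r[2])   # like ORDER BY created_dt
        for i, row in enumerate(user_rows):
            prev_dt = user_rows[i - 1][2] if i > 0 else None                   # LAG
            next_dt = user_rows[i + 1][2] if i + 1 < len(user_rows) else None  # LEAD
            close_to_prev = prev_dt is not None and row[2] - prev_dt < window
            close_to_next = next_dt is not None and next_dt - row[2] < window
            if close_to_prev or close_to_next:
                flagged.append(row)
    return flagged

print(sorted(r[0] for r in near_duplicates(rows)))  # [4126, 4592]
```

Because only neighbours are compared, each row is looked at a constant number of times after sorting, which is the main attraction of this approach over a self-join.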
I have a table setup as follows:
Key || Code || Date
5 2 2018
5 1 2017
8 1 2018
8 2 2017
I need to retrieve only the key and code where:
Code=2 AND Date > the other record's date
So based on this data above, I need to retrieve:
Key 5 with code=2
Key 8 does not meet the criteria since code 2's date is lower than code 1's date
I tried joining the table on itself but this returned incorrect data
Select key,code
from data d1
Join data d2 on d1.key = d2.key
Where d1.code = 2 and d1.date > d2.date
This method returned incorrect data.
Perhaps you want this:
select d.*
from data d
where d.code = 2 and
d.date > (select d2.date
from data d2
where d2.key = d.key and d2.code = 1
);
If you just want the key, I would go for aggregation:
select d.key
from data d
group by d.key
having max(case when d.code = 2 then date end) > max(case when d.code <> 2 then date end);
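The conditional-aggregation form can be tried quickly against the question's four rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption made for convenience; the columns are renamed to `k` and `yr` (hypothetical names) to stay clear of keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (k INTEGER, code INTEGER, yr INTEGER);
    INSERT INTO data VALUES (5, 2, 2018), (5, 1, 2017), (8, 1, 2018), (8, 2, 2017);
""")

# One row per key: compare the latest code-2 date with the latest other date.
keys = [row[0] for row in conn.execute("""
    SELECT k
    FROM data
    GROUP BY k
    HAVING MAX(CASE WHEN code = 2 THEN yr END)
         > MAX(CASE WHEN code <> 2 THEN yr END)
""")]
print(keys)  # [5]
```

Key 8 drops out because its code 2 row (2017) is older than its code 1 row (2018). A key with no other codes at all also drops out, since the comparison with NULL is not true.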
Use ROW_NUMBER; with it you can select rows with the dates in ascending order. This is based on your sample data, selecting 2 rows.
CREATE TABLE #table ([key] INT, code INT, DATE INT)
INSERT #table
SELECT 5, 2, 2018
UNION ALL
SELECT 5, 2, 2018
UNION ALL
SELECT 8, 1, 2018
UNION ALL
SELECT 8, 2, 2017
SELECT [key], code, DATE
FROM (
SELECT [key], code, DATE, ROW_NUMBER() OVER (
PARTITION BY [key], code ORDER BY DATE
) rn
FROM #table
) x
WHERE rn = 2
I have a SQL script that returns this derived table.
MM/YYYY Cat Score
01/2012 Test1 17
02/2012 Test1 19
04/2012 Test1 15
05/2012 Test1 16
07/2012 Test1 14
08/2012 Test1 15
09/2012 Test1 15
12/2012 Test1 11
01/2013 Test2 10
02/2013 Test2 15
03/2013 Test2 13
05/2013 Test2 18
06/2013 Test2 14
08/2013 Test2 15
09/2013 Test2 14
12/2013 Test2 10
As you can see, I am missing some MM/YYYYs (03/2012, 06/2012, 11/2012, etc).
I would like to fill in the missing MM/YYYYs with the Cat and a 0 (zero) for the score.
I have tried to join a table that contains all the MM/YYYY values for the ranges the query will be run against, but this only returns the missing rows for the first occurrence; it does not repeat for each Cat (I should have known that).
So my question is this: can I do this using a join, or will I have to do this in a temp table and then output the data?
AHIGA,
LarryR…
You need to cross join your categories and a list of all dates in the range. Since you have posted no table structures I'll have to guess at your structure slightly, but assuming you have a calendar table you can use something like this:
SELECT calendar.Date,
Category.Cat,
Score = ISNULL(Scores.Score, 0)
FROM Calendar
CROSS JOIN Category
LEFT JOIN Scores
ON Scores.Cat = Category.Cat
AND Scores.Date = Calendar.Date
WHERE Calendar.DayOfMonth = 1;
If you do not have a calendar table you can generate a list of dates using the system table Master..spt_values:
SELECT Date = DATEADD(MONTH, Number, '20120101')
FROM Master..spt_values
WHERE Type = 'P';
Where the hardcoded date '20120101' is the first date in your range.
ADDENDUM
If you need to actually insert the missing rows, rather than just have a query that fills in the blanks you can use this:
INSERT Scores (Date, Cat, Score)
SELECT calendar.Date,
Category.Cat,
Score = 0
FROM Calendar
CROSS JOIN Category
WHERE Calendar.DayOfMonth = 1
AND NOT EXISTS
( SELECT 1
FROM Scores
WHERE Scores.Cat = Category.Cat
AND Scores.Date = Calendar.Date
)
Although, in my opinion if you have a query that fills in the blanks inserting the data is a bit of a waste of time.
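For a quick check of the cross join plus left join pattern, here is a small runnable sketch using Python's built-in sqlite3 module in place of SQL Server (an assumption made so the example is self-contained). The table names `months` and `scores` are hypothetical, the Test1 rows come from the question, and the single Test2 row is made up so that a second category appears in the same date range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE months (mm_yyyy TEXT);   -- stand-in for the calendar table
    CREATE TABLE scores (mm_yyyy TEXT, cat TEXT, score INTEGER);
    INSERT INTO months VALUES ('01/2012'), ('02/2012'), ('03/2012'), ('04/2012');
    INSERT INTO scores VALUES
        ('01/2012', 'Test1', 17),
        ('02/2012', 'Test1', 19),
        ('04/2012', 'Test1', 15),
        ('03/2012', 'Test2', 13);         -- made-up row for a second category
""")

# Every month is paired with every category first; the left join then
# attaches a score where one exists and COALESCE fills the gaps with 0.
filled = conn.execute("""
    SELECT m.mm_yyyy, c.cat, COALESCE(s.score, 0) AS score
    FROM months AS m
    CROSS JOIN (SELECT DISTINCT cat FROM scores) AS c
    LEFT JOIN scores AS s
           ON s.cat = c.cat
          AND s.mm_yyyy = m.mm_yyyy
    ORDER BY c.cat, m.mm_yyyy
""").fetchall()

for row in filled:
    print(row)
```

Four months times two categories gives eight rows, with a score of 0 wherever the `scores` table has no entry for that month and category.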
To get what you want, start with a driver table and then use left outer join. The result is something like this:
select driver.cat, driver.MMYYYY, coalesce(t.score, 0) as score
from (select cat, MMYYYY
from (select distinct cat from t) c cross join
themonths -- use where to get a date range
) driver left outer join
t
on t.cat = driver.cat and t.MMYYYY = driver.MMYYYY
Try this one -
CREATE TABLE #temp (FDOM DATETIME, Cat NVARCHAR(50), Score INT)
INSERT INTO #temp (FDOM, Cat, Score)
VALUES
('20120101', 'Test1', 17),('20120201', 'Test1', 19),
('20120401', 'Test1', 15),('20120501', 'Test1', 16),
('20120701', 'Test1', 14),('20120801', 'Test1', 15),
('20120901', 'Test1', 15),('20121001', 'Test1', 13),
('20121201', 'Test1', 11),('20130101', 'Test1', 10),
('20130201', 'Test1', 15),('20130301', 'Test1', 13),
('20130501', 'Test1', 18),('20130601', 'Test1', 14),
('20130801', 'Test1', 15),('20130901', 'Test1', 14),
('20131201', 'Test1', 10),('20120601', 'Test2', 10)
;WITH enum AS
(
SELECT Cat, StartDate = MIN(FDOM), EndDate = MAX(FDOM)
FROM #temp
GROUP BY Cat
UNION ALL
SELECT Cat, DATEADD(MONTH, 1, StartDate), EndDate
FROM enum
WHERE StartDate < EndDate
)
SELECT e.StartDate, e.Cat, Score = ISNULL(t.Score, 0)
FROM enum e
LEFT JOIN #temp t ON e.StartDate = t.FDOM AND e.Cat = t.Cat
ORDER BY e.StartDate, e.Cat
Do a left join from the "complete table" to the "incomplete table" and add a WHERE clause that checks the date column of the "incomplete" table for NULL, so that the select query returns only the missing rows. After that, just put an "insert into tablename" in front of it.
On the first run it finds the two rows that aren't already in the incomplete table, and the insert into statement inserts them: two rows affected. On a second run the select statement returns 0 rows, so nothing happens: zero rows affected :-)
Sample: http://sqlfiddle.com/#!2/895fe/6
(Just highlight the select statement; the insert into statement isn't needed just to see how the join works.)
Insert Into supportContacts
Select * FROM
(
Select
'01/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'02/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'03/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'04/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'05/2012' as DDate, 'Test1' as Cat, 17 as Score
) CompleteTable
LEFT JOIN
(
Select
'01/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'02/2012' as DDate, 'Test1' as Cat, 17 as Score
UNION
Select
'03/2012' as DDate, 'Test1' as Cat, 17 as Score
) InCompleteTable
ON CompleteTable.DDate = IncompleteTable.DDate
WHERE IncompleteTable.DDate is null
I have a table::
ItemID VersionNo CreatedDate
-------------------------------
1 3 7/9/2010
1 2 7/3/2010
1 1 5/3/2010
1 0 3/3/2010
2 0 4/4/2010
3 1 4/5/2010
3 0 3/4/2010
...where Version 0 means it's a newly produced item. Here I need to find the time gap between two versions and add it as a column, ProcessTime.
Like:
ItemID VersionNo CreatedDate ProcessTime
-------------------------------------------
1 3 7/9/2010 6Days or 6*24Hrs
1 2 7/3/2010 60Days
1 1 5/3/2010 2Days
1 0 3/3/2010 ''
2 0 4/4/2010 ''
3 1 4/5/2010 31Days
3 0 3/4/2010 ''
The VersionNo values are not fixed, meaning that over time they could increase. How can I achieve the desired result in MS Access or in SQL Server?
Thanks in advance for all your sincere efforts.
Thanks
How about (Access):
SELECT t.ItemID,
t.VersionNo,
t.CreatedDate, (
SELECT Top 1
CreatedDate
FROM Versions v
WHERE v.ItemID=t.ItemID
And v.VersionNo<t.VersionNo
ORDER BY VersionNo DESC) AS LastDate,
DateDiff("h",[LastDate],[CreatedDate]) AS DiffHrs,
DateDiff("d",[LastDate],[CreatedDate]) AS DiffDays
FROM Versions t
Join the table with itself, like this (SQL Server):
-- create the table and your data
create table #x (ItemID int, VersionNo int, CreatedDate datetime)
go
insert into #x
select 1, 3 ,'7/9/2010'
union all select 1 ,2 ,'7/3/2010'
union all select 1 ,1 ,'5/3/2010'
union all select 1 ,0 ,'3/3/2010'
union all select 2 ,0 ,'4/4/2010'
union all select 3 ,1 ,'4/5/2010'
union all select 3 ,0 ,'3/4/2010'
go
-- The query
select v2.ItemID, v2.VersionNo, datediff(dd, v1.CreatedDate, v2.CreatedDate)
from #x v1, #x v2
where v1.ItemID = v2.ItemID and v1.VersionNo + 1 = v2.VersionNo
Here it is in Access SQL, using 3 queries, one for each step.
Query1, self-join on itemID where versionNo is smaller:
SELECT t1.itemID, t1.versionNo, t1.created, t2.versionNo AS t2Version
FROM Table1 AS t1 INNER JOIN Table1 AS t2 ON t1.itemID = t2.itemID
WHERE (t2.versionNo)<[t1].[versionNo];
Query2, limit to max of smaller versionNos:
SELECT q1.itemID, q1.versionNo, q1.created, Max(q1.t2Version) AS MaxOft2Version
FROM Query1 AS q1
GROUP BY q1.itemID, q1.versionNo, q1.created;
Query3, now do datediff:
SELECT q2.itemID, q2.versionNo, q2.created, q2.MaxOft2Version, t1.created,
DateDiff("d",[t1].[created],[Q2].[created]) AS daysdiff
FROM Query2 AS q2 INNER JOIN Table1 AS t1
ON (q2.MaxOft2Version = t1.versionNo)
AND (q2.itemID = t1.itemID);
SQL Server 2005, to handle the case where there are gaps in VersionNo.
-- Declare a query that extends your table with a new column
-- that is the sequentially numbered representation of VersionNo.
-- This could be a view, but I used a CTE. I am going to use this
-- query twice below.
WITH Sequential AS (select *,
RANK() over (partition by ItemId order by VersionNo) as SequentialVersionNo
from #T as x
)
select
v.ItemID, v.VersionNo, v.SequentialVersionNo, v.CreatedDate,
DATEDIFF(day, vPrior.CreatedDate, v.CreatedDate) as ProcessTime
from Sequential as v
left outer join Sequential as vPrior
on v.ItemID=vPrior.ItemID
and v.SequentialVersionNo = vPrior.SequentialVersionNo+1;
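The underlying idea (number the versions of each item in order, then pair each version with the one just before it) can be sketched in a few lines of Python. This is only an illustration: the function name `process_times` is made up, and the dates from the question are assumed to be in month/day/year order.

```python
from collections import defaultdict
from datetime import date

# Rows from the question: (ItemID, VersionNo, CreatedDate), read as m/d/yyyy.
versions = [
    (1, 3, date(2010, 7, 9)),
    (1, 2, date(2010, 7, 3)),
    (1, 1, date(2010, 5, 3)),
    (1, 0, date(2010, 3, 3)),
    (2, 0, date(2010, 4, 4)),
    (3, 1, date(2010, 4, 5)),
    (3, 0, date(2010, 3, 4)),
]

def process_times(versions):
    """Map (ItemID, VersionNo) to the days since the prior version of the
    same item, or None for the first version."""
    by_item = defaultdict(list)
    for item_id, version_no, created in versions:
        by_item[item_id].append((version_no, created))

    result = {}
    for item_id, item_versions in by_item.items():
        item_versions.sort()  # ascending VersionNo, so gaps in numbering are harmless
        prior_created = None
        for version_no, created in item_versions:
            result[(item_id, version_no)] = (
                None if prior_created is None else (created - prior_created).days
            )
            prior_created = created
    return result

times = process_times(versions)
print(times[(1, 3)])  # 6
print(times[(1, 0)])  # None
```

Sorting by VersionNo and walking the list plays the same role as the sequential numbering in the CTE: the prior row is always the nearest lower version, even when version numbers are skipped.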