MSSQL get rows which only differ at 2 columns - sql

I have a task that I have no idea how to approach.
I have to find records that have a time difference of X and where a boolean is ON/OFF. I tried a LEFT OUTER JOIN with the conditions in the ON clause, but it gave me the wrong result.
So my question is: how can I select rows that have the same value in 2 columns but different values in 2 other columns?
Edit:
My problem is that, for some reason, my actual query returns the same entry multiple times. I checked whether the entry exists multiple times in the table, but it doesn't.
Data for reference:
ID1 ID2 Boolean Time
1 1 0 2018-03-06 11:31:39
1 1 1 2018-03-06 11:33:39
2 1 0 2018-03-06 11:31:39
2 2 1 2018-03-06 11:40:39
The desired output from the query would be
ID1 ID2 Boolean Time
1 1 0 2018-03-06 11:31:39
1 1 1 2018-03-06 11:33:39
because ID1 and ID2 are the same, the Boolean is different, and the time difference is within the specified range (let's say 5 minutes). The other 2 entries are not valid because ID2 differs and the time difference is too big.
My current query:
select
t1.id1,
t1.id2,
t1.boolean,
t1.time
from t1 t1
left outer join t1 t2
on t1.boolean != t2.boolean and datediff(minute, t1.time, t2.time)<=5
where t1.id1 = t2.id1
and t1.id2 = t2.id2

Your query looks mostly fine; I found a few small issues:
1- The table alias used is wrong: instead of t it should be t1
2- The order of the data is wrong
3- Changed the left join to an inner join
4- Modified the ON and WHERE conditions for better readability and performance
Check the following corrected query.
WITH t1 AS
(
SELECT * FROM (VALUES
(1 , 1 , 0 , '2018-03-06 11:31:39'),
(1 , 1 , 1 , '2018-03-06 11:33:39'),
(2 , 1 , 0 , '2018-03-06 11:31:39'),
(2 , 2 , 1 , '2018-03-06 11:40:39')
) T( ID1, ID2 , Boolean, Time)
)
select
t1.id1,
t1.id2,
t1.boolean,
t1.time
from t1 t1
inner join t1 t2
on t1.id1 = t2.id1 and t1.id2 = t2.id2
where
t1.boolean != t2.boolean and abs(datediff(minute, t1.time, t2.time)) <= 5
ORDER BY [TIME]
Output
+-----+-----+---------+---------------------+
| id1 | id2 | boolean | time |
+-----+-----+---------+---------------------+
| 1 | 1 | 0 | 2018-03-06 11:31:39 |
+-----+-----+---------+---------------------+
| 1 | 1 | 1 | 2018-03-06 11:33:39 |
+-----+-----+---------+---------------------+

To avoid duplicate rows, use GROUP BY:
SELECT t1.id1
,t1.id2
,t1.boolean
,t1.TIME
FROM t1 t1
INNER JOIN t1 t2 ON t1.boolean != t2.boolean
AND abs(datediff(minute, t1.TIME, t2.TIME)) <= 5
WHERE t1.id1 = t2.id1
AND t1.id2 = t2.id2
GROUP BY t1.id1
,t1.id2
,t1.boolean
,t1.TIME

SELECT
D1.*
FROM
Data AS D1
WHERE
EXISTS (
SELECT
1
FROM
Data AS D2
WHERE
D1.ID1 = D2.ID1 AND
D1.ID2 = D2.ID2 AND
~D1.Boolean = D2.Boolean AND
ABS(DATEDIFF(MINUTE, D1.Time, D2.Time)) <= 5)
ORDER BY
D1.ID1,
D1.Boolean,
D1.Time
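A minimal runnable sketch of the EXISTS approach, run through Python's built-in sqlite3 module rather than SQL Server. The table name `data` and the column names `flag` and `ts` are assumptions chosen for the example. SQLite has no DATEDIFF, so the window is computed here as an absolute difference in seconds; this is close to, but not identical to, DATEDIFF(MINUTE, ...), which counts minute boundaries crossed.

```python
import sqlite3

# Sample rows from the question: (id1, id2, flag, ts)
rows = [
    (1, 1, 0, "2018-03-06 11:31:39"),
    (1, 1, 1, "2018-03-06 11:33:39"),
    (2, 1, 0, "2018-03-06 11:31:39"),
    (2, 2, 1, "2018-03-06 11:40:39"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id1 INTEGER, id2 INTEGER, flag INTEGER, ts TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# Keep a row if at least one partner row has the same id1/id2, the opposite
# flag, and a timestamp within 5 minutes in either direction.
query = """
SELECT d1.id1, d1.id2, d1.flag, d1.ts
FROM data AS d1
WHERE EXISTS (
    SELECT 1
    FROM data AS d2
    WHERE d2.id1 = d1.id1
      AND d2.id2 = d1.id2
      AND d2.flag != d1.flag
      AND ABS(CAST(strftime('%s', d2.ts) AS INTEGER)
            - CAST(strftime('%s', d1.ts) AS INTEGER)) <= 5 * 60
)
ORDER BY d1.id1, d1.flag, d1.ts
"""
matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```

Because EXISTS only tests whether a partner row is present, each row is returned at most once even when it has several partners, which is the duplicate problem a plain self-join runs into.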

Related

Match nearest timestamp in Redshift SQL

I have two tables, t1 and t2. For each id in t1 I have multiple records in t2. I want to match each record of t1 to the closest timestamp in t2. There is a flag in t1: if it's 1, I want to match the closest timestamp in t2 that is smaller; if it's 0, I want to match the closest timestamp that is larger than the one in t1.
So altogether I have the following tables:
T1
id, flag, timestamp
T2
id, timestamp
Is there an efficient way to do that?
Edit, here is some example:
T1
| customer_id | timestamp_t1 | flag |
| 1 | 01.01.21 12:00 | 1 |
| 2 | 01.01.21 13:00 | 0 |
T2
| customer_id | timestamp_t2 | additional attributes |
| 1 | 01.01.21 11:00 | attribute1 |
| 1 | 01.01.21 10:00 | attribute2 |
| 1 | 01.01.21 13:00 | attribute3 |
| 2 | 01.01.21 11:00 | attribute4 |
| 2 | 01.01.21 12:00 | attribute5 |
| 2 | 01.01.21 14:00 | attribute6 |
| 2 | 01.01.21 15:00 | attribute7 |
Result:
| customer_id | timestamp_t1 | timestamp_t2 | flag | additional attributes |
| 1 | 01.01.21 12:00 | 01.01.21 11:00 | 1 | attribute1 |
| 2 | 01.01.21 13:00 | 01.01.21 14:00 | 0 | attribute6 |
I hope this helps. As you can see in the result, we matched 11:00 from T2 with 12:00 from T1: because the flag was 1, we chose the closest timestamp that was smaller than 12:00. We also matched 14:00 with 13:00 because the flag was 0 (so we matched the closest timestamp for customer_id 2 that is larger than 13:00).
You could use correlated sub-queries to find the rows before/after the timestamp, and then use a CASE expression to pick which to join on...
SELECT
*
FROM
t1
INNER JOIN
t2
ON t2.id = CASE WHEN t1.flag = 1 THEN
(
SELECT t2.id
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 <= t1.timestamp_t1
ORDER BY t2.timestamp_t2 DESC
LIMIT 1
)
ELSE
(
SELECT t2.id
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 >= t1.timestamp_t1
ORDER BY t2.timestamp_t2 ASC
LIMIT 1
)
END
Oh, you haven't included an id column in your example, this works similarly...
SELECT
*
FROM
t1
INNER JOIN
t2
ON t2.customer_id = t1.customer_id
AND t2.timestamp_t2
=
CASE WHEN t1.flag = 1 THEN
(
SELECT MAX(t2.timestamp_t2)
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 <= t1.timestamp_t1
)
ELSE
(
SELECT MIN(t2.timestamp_t2)
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 >= t1.timestamp_t1
)
END
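For reference, here is a small runnable sketch of the MAX/MIN variant against the example data, using Python's sqlite3 module instead of Redshift. Several things are assumptions for the sake of the example: the dates are read as 2021-01-01 and stored as ISO text (so string comparison orders them correctly), the attribute column is named `attr`, and the inner alias `x` is introduced to keep the correlated subqueries unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (customer_id INTEGER, timestamp_t1 TEXT, flag INTEGER);
CREATE TABLE t2 (customer_id INTEGER, timestamp_t2 TEXT, attr TEXT);
INSERT INTO t1 VALUES
    (1, '2021-01-01 12:00', 1),
    (2, '2021-01-01 13:00', 0);
INSERT INTO t2 VALUES
    (1, '2021-01-01 11:00', 'attribute1'),
    (1, '2021-01-01 10:00', 'attribute2'),
    (1, '2021-01-01 13:00', 'attribute3'),
    (2, '2021-01-01 11:00', 'attribute4'),
    (2, '2021-01-01 12:00', 'attribute5'),
    (2, '2021-01-01 14:00', 'attribute6'),
    (2, '2021-01-01 15:00', 'attribute7');
""")

# flag = 1: latest t2 timestamp at or before t1's timestamp.
# flag = 0: earliest t2 timestamp at or after t1's timestamp.
query = """
SELECT t1.customer_id, t1.timestamp_t1, t2.timestamp_t2, t1.flag, t2.attr
FROM t1
JOIN t2
  ON t2.customer_id = t1.customer_id
 AND t2.timestamp_t2 = CASE WHEN t1.flag = 1 THEN
        (SELECT MAX(x.timestamp_t2) FROM t2 AS x
         WHERE x.customer_id = t1.customer_id
           AND x.timestamp_t2 <= t1.timestamp_t1)
     ELSE
        (SELECT MIN(x.timestamp_t2) FROM t2 AS x
         WHERE x.customer_id = t1.customer_id
           AND x.timestamp_t2 >= t1.timestamp_t1)
     END
ORDER BY t1.customer_id
"""
matched = conn.execute(query).fetchall()
for row in matched:
    print(row)
```

On this data the query pairs customer 1 with the 11:00 row and customer 2 with the 14:00 row. If t2 can contain several rows with the same customer and timestamp, each of them would match, so a tie-breaker would be needed in that case.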

Bigquery select rows where the logtime is below min(value) of other table logtime

Let's say I have the following two tables:
Table 1:
ID log_time
1 2013-10-12
1 2014-11-15
2 2013-12-21
2 2016-12-21
3 2015-09-21
3 2018-03-21
Table 2:
ID log_time
1 2011-10-12
1 2012-11-15
2 2012-12-21
2 2017-12-21
3 2014-09-21
3 2019-03-21
I want to get rows of Table 2 which are below min(log_time) of Table1 for each ID.
The result should be like this:
ID log_time
1 2011-10-12
1 2012-11-15
2 2012-12-21
3 2014-09-21
This is join and aggregation:
select t2.*
from table2 t2 join
(select t1.id, min(t1.log_time) as min_log_time
from table1 t1
group by t1.id
) t1
on t2.id = t1.id and t2.log_time < t1.min_log_time;
You can also express this as a correlated subquery:
select t2.*
from table2 t2
where t2.log_time < (select min(t1.log_time) from table1 t1 where t1.id = t2.id);
Note that both of these formulations will return no rows for ids missing from table1 (which is quite consistent with your question).
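The join-plus-aggregation form can be tried on the sample data with a short sketch like the one below. It runs through Python's sqlite3 module rather than BigQuery, stores the dates as ISO text (an assumption that makes plain string comparison work), and uses the hypothetical alias `m` for the per-id minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, log_time TEXT);
CREATE TABLE table2 (id INTEGER, log_time TEXT);
INSERT INTO table1 VALUES
    (1, '2013-10-12'), (1, '2014-11-15'),
    (2, '2013-12-21'), (2, '2016-12-21'),
    (3, '2015-09-21'), (3, '2018-03-21');
INSERT INTO table2 VALUES
    (1, '2011-10-12'), (1, '2012-11-15'),
    (2, '2012-12-21'), (2, '2017-12-21'),
    (3, '2014-09-21'), (3, '2019-03-21');
""")

# Aggregate table1 to one minimum per id, then keep the table2 rows
# that fall strictly before that minimum.
query = """
SELECT t2.id, t2.log_time
FROM table2 AS t2
JOIN (
    SELECT id, MIN(log_time) AS min_log_time
    FROM table1
    GROUP BY id
) AS m
  ON m.id = t2.id AND t2.log_time < m.min_log_time
ORDER BY t2.id, t2.log_time
"""
earlier = conn.execute(query).fetchall()
for row in earlier:
    print(row)
```

For ID 3 the only Table 2 row before the Table 1 minimum of 2015-09-21 is 2014-09-21, so that is the row the query returns for it.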

Waterfall join conditions

I have two tables similar to:
Table 1 --unique ID's
ID Date
1 3/8/2017
2 3/8/2017
3 3/8/2017
Table 2
ID Date SourceID
1 3/8/2017 1
1 3/8/2017 2
1 3/8/2017 3
2 3/8/2017 2
3 3/8/2017 1
3 3/8/2017 3
And I want to write a query that has a result like:
Result
ID SourceID
1 2
2 2
3 1
Where the source ID ordering should be 2, 1, 3
I have:
select Table1.ID
, COALESCE(Join1.SourceID, Join2.SourceID, Join3.SourceID) as SourceID
from Table1
left outer join Table2 Join1
on Table1.date = Join1.date
and Table1.ID = Join1.ID
and Join1.SourceID = 2
left outer join Table2 Join2
on Table1.date = Join2.date
and Table1.ID = Join2.ID
and Join2.SourceID = 1
and Join1.SourceID is null
left outer join Table2 Join3
on Table1.date = Join3.date
and Table1.ID = Join3.ID
and Join3.SourceID = 3
and Join1.SourceID is null
and Join2.SourceID is null
But this currently keeps only the records where SourceID = 2 and does not add in the other SourceIDs.
Thanks in advance for any help. Let me know if you need any clarification. I am using SQL Server. I only need a small, fixed number of sources, so I am avoiding a cursor.
This is a prioritization query. I would do it using outer apply:
select t1.*, t2.sourceId
from table1 t1 outer apply
(select top 1 t2.*
from table2 t2
where t2.id = t1.id and t2.date = t1.date
order by (case t2.sourceid when 2 then 1 when 1 then 2 when 3 then 3 end)
) t2;
Note: For readability, you can simplify the order by to:
order by charindex(cast(t2.sourceId as varchar(255)), '2,1,3')
If you are uncomfortable with outer apply, you can do the same thing with a single join:
select t1.*, t2.sourceId
from table1 t1 join
(select t2.*,
row_number() over (partition by id, date
order by (case t2.sourceid when 2 then 1 when 1 then 2 when 3 then 3 end)
) as seqnum
from table2 t2
) t2
on t2.id = t1.id and t2.date = t1.date and t2.seqnum = 1;
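The prioritization idea can be checked on the sample data with a small sketch. SQLite (used here through Python's sqlite3 module) has no OUTER APPLY or TOP, so this version swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which plays the same role for a single column. The column name `source_id` and the ISO date format are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, date TEXT);
CREATE TABLE table2 (id INTEGER, date TEXT, source_id INTEGER);
INSERT INTO table1 VALUES
    (1, '2017-03-08'), (2, '2017-03-08'), (3, '2017-03-08');
INSERT INTO table2 VALUES
    (1, '2017-03-08', 1), (1, '2017-03-08', 2), (1, '2017-03-08', 3),
    (2, '2017-03-08', 2),
    (3, '2017-03-08', 1), (3, '2017-03-08', 3);
""")

# For each table1 row, rank the matching sources in the order 2, 1, 3
# and keep only the best-ranked one.
query = """
SELECT t1.id,
       (SELECT t2.source_id
        FROM table2 AS t2
        WHERE t2.id = t1.id AND t2.date = t1.date
        ORDER BY CASE t2.source_id WHEN 2 THEN 1 WHEN 1 THEN 2 WHEN 3 THEN 3 END
        LIMIT 1) AS source_id
FROM table1 AS t1
ORDER BY t1.id
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

Like OUTER APPLY, this keeps table1 rows that have no match in table2 (their source_id would be NULL), whereas the inner-join variant with row_number() drops them.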

SQL subtract times from tables (decreased additionally for specific time status from second table)

I have table T1 and T2.
T1
ID TIME1 TIME2
1001 1 10
1002 1 20
T2
ID STATUS TIME
1001 NEW 1
1001 CLOSED 10
1002 NEW 1
1002 HOLD 5
1002 CLOSED 13
I want the result TIME2-TIME1 if status HOLD does not exist in table T2 for that record, or TIME2-TIME1-TIME if status HOLD exists for that record:
1001 9 (10-1)
1002 14 (20-1-5)
I initially wrote the following SQL query, but it does not work: it returns NULL for the first record, while the result for the second record is OK.
SELECT T1.ID,T1.TIME2-T1.TIME1-T2.TIME
FROM T1
LEFT OUTER JOIN T2 ON T1.ID=T2.ID AND T2.STATUS='HOLD'
Thanks
or in short
select
t1.id,
t1.time2-case when status='HOLD' then t2.time else 0 end-t1.time1
from
t1 left join t2 on t1.id=t2.id and t2.status='HOLD'
SELECT T1.ID
, Case
When T2.STATUS = 'HOLD' THEN T1.Time2 - T1.Time1 - T2.Time
Else T1.Time2 -T1.Time1
END
FROM T1
LEFT OUTER JOIN T2 ON T1.ID=T2.ID AND T2.STATUS='HOLD'
Evidently table T2 may contain more than one row matching T1.ID. Assuming that T2 always contains at least one such row, then we can do:
select T1.ID, min(case when T2.STATUS<>'HOLD' then T1.TIME2-T1.TIME1
when T2.STATUS='HOLD' then T1.TIME2-T1.TIME1-T2.[TIME] end)
from T1 join T2
on T1.ID=T2.ID
group by T1.ID
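The NULL in the original query comes from the LEFT JOIN: when there is no HOLD row, T2.TIME is NULL, and any arithmetic with NULL yields NULL. A minimal sketch of one way around it, run through Python's sqlite3 module, is shown below. It uses COALESCE instead of the CASE expressions above (a substitution made for this example) and assumes at most one HOLD row per ID; the alias `duration` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER, time1 INTEGER, time2 INTEGER);
CREATE TABLE t2 (id INTEGER, status TEXT, time INTEGER);
INSERT INTO t1 VALUES (1001, 1, 10), (1002, 1, 20);
INSERT INTO t2 VALUES
    (1001, 'NEW', 1), (1001, 'CLOSED', 10),
    (1002, 'NEW', 1), (1002, 'HOLD', 5), (1002, 'CLOSED', 13);
""")

# COALESCE turns the NULL from a missing HOLD row into 0,
# so the subtraction stays defined for every id.
query = """
SELECT t1.id, t1.time2 - t1.time1 - COALESCE(t2.time, 0) AS duration
FROM t1
LEFT JOIN t2 ON t2.id = t1.id AND t2.status = 'HOLD'
ORDER BY t1.id
"""
durations = conn.execute(query).fetchall()
for row in durations:
    print(row)
```

If an ID could have several HOLD rows, the join would produce one output row per HOLD row, and an aggregate such as SUM over the hold times would be needed instead.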

Selecting time intervals of value live - missing first and last intervals

I've got a table with following structure
| ChangedDate | IDParameter | ChangedTo(bit column) |
So I need to get time intervals when my parameter is True or False, like following
| IDParameter | ChangedToDate1 | ChangedToDate2 | ChangedTo(true to false || false to true)
and I do
With RankedDates As
(
Select T1.[ChangedDate], T1.ID, T1.[ChangedToValue]
, Row_Number() Over( Partition By T1.ID Order By T1.[ChangedDate] ) As Num
From [Changes] As T1
)
SELECT T1.[ID]
,T2.[ChangedToValue]
,T1.[ChangedDate] AS startDate
,T2.[ChangedDate] AS endDate
FROM [RankedDates] AS T1
Join RankedDates As T2
On T2.ID = T1.ID
And T2.Num = T1.Num + 1
And T2.[ChangedToValue] <> T1.[ChangedToValue]
Order By T2.[ChangedDate]
The trouble is that I am missing the first and last intervals here. For each parameter ID, the start date must be NULL for the first interval and the end date must be NULL for the last interval. I guess I need to add them with UNION, but I can't figure out how to add them for each IDParameter.
I don't know when the value was changed for the first time, and I don't know whether it will be changed again, so I need NULL or some min date for the first intervals and NULL or some max date for the last intervals.
I'm using MS SQL Server 2008.
Sorry for such a complex question.
Example :
08.03.2011 ID1 0 -> 1
09.03.2011 ID1 1 -> 0
09.03.2011 ID2 0 -> 1
10.03.2011 ID1 0 -> 1
10.03.2011 ID2 1 -> 0
--->
NULL , 08.03.2011 ID1 is 0
NULL , 09.03.2011 ID2 is 0
08.03.2011, 09.03.2011 ID1 is 1
09.03.2011, 10.03.2011 ID2 is 1
09.03.2011, 10.03.2011 ID1 is 0
10.03.2011, NULL ID1 is 1
10.03.2011, NULL ID2 is 0
How about using FULL JOIN instead of JOIN?
Does it solve your problem?
EDIT:
I think this should work as you want.
select isnull(T1.ID, T2.ID) as ID
,isnull(T2.[ChangedToValue], case when T1.[ChangedToValue] = 1 then 0 else 1 end) as [ChangedToValue]
,T1.[ChangedDate] as startdate
,T2.[ChangedDate] as enddate
from [RankedDates] T1
full join [RankedDates] T2
on T2.num = T1.num +1
and T2.ID = T1.ID
and T1.[ChangedToValue] <> T2.[ChangedToValue]
order by
case when T2.[ChangedDate] is null then 1 else 0 end
,T2.[ChangedDate]
You were right about the ChangedToValue; I modified it to show the opposite now if T2 is null.
Assuming that's how your base table looks:
ChangeDate IDParameter ChangedTo
2011-03-08 ID1 True
2011-03-09 ID1 False
2011-03-09 ID2 True
2011-03-10 ID1 True
2011-03-10 ID2 False
SELECT (SELECT TOP 1 t0.[ChangeDate]
        FROM [calendardb].[dbo].[Table_1] t0
        WHERE t0.IDParameter = t1.IDParameter
          AND t0.ChangeDate < t1.ChangeDate
        ORDER BY t0.ChangeDate DESC),
       [ChangeDate],
       [IDParameter],
       [ChangedTo]
FROM [calendardb].[dbo].[Table_1] t1
UNION
SELECT MAX(ChangeDate) AS maxd,
       NULL,
       [IDParameter],
       (SELECT ChangedTo
        FROM [calendardb].[dbo].[Table_1] t0
        WHERE t0.ChangeDate = (SELECT MAX(ChangeDate)
                               FROM [calendardb].[dbo].[Table_1]
                               GROUP BY [IDParameter]
                               HAVING IDParameter = t1.IDParameter)
          AND t1.IDParameter = t0.IDParameter)
FROM [calendardb].[dbo].[Table_1] t1
GROUP BY [IDParameter]
This will give you a result like this:
NULL 2011-03-08 ID1 1
2011-03-08 2011-03-09 ID1 0
NULL 2011-03-09 ID2 1
2011-03-09 2011-03-10 ID1 1
2011-03-09 2011-03-10 ID2 0
2011-03-10 NULL ID1 1
2011-03-10 NULL ID2 0
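As a cross-check of the interval logic, here is a minimal sketch that produces the seven intervals listed in the question, including the open-ended first and last ones. It runs through Python's sqlite3 module rather than SQL Server, and the table and column names (`changes`, `changed_date`, `param`, `changed_to`) are assumptions for the example. It also assumes that the value before the first recorded change is the opposite of the first ChangedTo value, and it avoids window functions so the same shape of query could be adapted to SQL Server 2008.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE changes (changed_date TEXT, param TEXT, changed_to INTEGER);
INSERT INTO changes VALUES
    ('2011-03-08', 'ID1', 1),
    ('2011-03-09', 'ID1', 0),
    ('2011-03-09', 'ID2', 1),
    ('2011-03-10', 'ID1', 1),
    ('2011-03-10', 'ID2', 0);
""")

query = """
-- Opening interval per parameter: no start date, opposite of the first value.
SELECT NULL AS start_date, c.changed_date AS end_date,
       c.param AS param, 1 - c.changed_to AS value
FROM changes AS c
WHERE NOT EXISTS (
    SELECT 1 FROM changes AS p
    WHERE p.param = c.param AND p.changed_date < c.changed_date
)
UNION ALL
-- Every change starts an interval that ends at the next change, if any;
-- the subquery returns NULL for the last change of each parameter.
SELECT c.changed_date,
       (SELECT MIN(n.changed_date) FROM changes AS n
        WHERE n.param = c.param AND n.changed_date > c.changed_date),
       c.param, c.changed_to
FROM changes AS c
ORDER BY 3, 1
"""
intervals = conn.execute(query).fetchall()
for row in intervals:
    print(row)
```

The ORDER BY relies on SQLite sorting NULL before other values, so each parameter's opening interval comes first; SQL Server orders NULLs the same way for ascending sorts.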