MSSQL get rows which only differ at 2 columns

I have a task and no idea how to approach it.
I have to find records that are within a time difference of X of each other and where a boolean is ON in one and OFF in the other. I tried a LEFT OUTER JOIN with the conditions in the ON clause, but it gave me the wrong result.
So my question is: how can I select rows that have the same values in 2 columns but different values in 2 other columns?
Edit:
My problem is that, for some reason, my actual query returns the same entry multiple times. I checked whether the entry exists multiple times in the table, but it doesn't.
Data for reference:
ID1 ID2 Boolean Time
1 1 0 2018-03-06 11:31:39
1 1 1 2018-03-06 11:33:39
2 1 0 2018-03-06 11:31:39
2 2 1 2018-03-06 11:40:39
The desired output from the query would be
ID1 ID2 Boolean Time
1 1 0 2018-03-06 11:31:39
1 1 1 2018-03-06 11:33:39
because ID1 and ID2 are the same, the Boolean is different and the time difference is within the specified range (let's say 5 minutes). The other 2 entries are not valid because ID2 differs and the time difference is too big.
My current query:
select
t1.id1,
t1.id2,
t1.boolean,
t1.time
from t1 t1
left outer join t1 t2
on t1.boolean != t2.boolean and datediff(minute, t1.time, t2.time)<=5
where t1.id1 = t2.id1
and t1.id2 = t2.id2

Your query looks mostly fine; I found a few small issues:
1- The table alias was wrong: it should be t1 instead of t
2- The order of the data was wrong
3- Changed the left join to an inner join
4- Modified the ON and WHERE conditions for better readability and performance
Check the following corrected query.
WITH t1 AS
(
SELECT * FROM (VALUES
(1 , 1 , 0 , '2018-03-06 11:31:39'),
(1 , 1 , 1 , '2018-03-06 11:33:39'),
(2 , 1 , 0 , '2018-03-06 11:31:39'),
(2 , 2 , 1 , '2018-03-06 11:40:39')
) T( ID1, ID2 , Boolean, Time)
)
select
t1.id1,
t1.id2,
t1.boolean,
t1.time
from t1 t1
inner join t1 t2
on t1.id1 = t2.id1 and t1.id2 = t2.id2
where
t1.boolean != t2.boolean and abs(datediff(minute, t1.time, t2.time)) <= 5
ORDER BY [TIME]
Output
+-----+-----+---------+---------------------+
| id1 | id2 | boolean | time |
+-----+-----+---------+---------------------+
| 1 | 1 | 0 | 2018-03-06 11:31:39 |
+-----+-----+---------+---------------------+
| 1 | 1 | 1 | 2018-03-06 11:33:39 |
+-----+-----+---------+---------------------+

To avoid duplicate rows, use GROUP BY:
SELECT t1.id1
,t1.id2
,t1.boolean
,t1.TIME
FROM t1 t1
INNER JOIN t1 t2 ON t1.boolean != t2.boolean
AND abs(datediff(minute, t1.TIME, t2.TIME)) <= 5
WHERE t1.id1 = t2.id1
AND t1.id2 = t2.id2
GROUP BY t1.id1
,t1.id2
,t1.boolean
,t1.TIME

SELECT
D1.*
FROM
Data AS D1
WHERE
EXISTS (
SELECT
1
FROM
Data AS D2
WHERE
D1.ID1 = D2.ID1 AND
D1.ID2 = D2.ID2 AND
~D1.Boolean = D2.Boolean AND
ABS(DATEDIFF(MINUTE, D1.Time, D2.Time)) <= 5)
ORDER BY
D1.ID1,
D1.Boolean,
D1.Time
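The EXISTS approach can be checked against the question's sample rows without a SQL Server instance. The following is only a minimal sketch: it uses Python's built-in sqlite3 module, and the table name data and the column names flag and ts are assumptions made for the illustration. SQLite has no DATEDIFF, so the gap is computed in seconds with strftime('%s', ...) instead, which is not identical to DATEDIFF(MINUTE, ...) at minute boundaries.

```python
import sqlite3

# Sample rows from the question (id1, id2, boolean, time).
rows = [
    (1, 1, 0, "2018-03-06 11:31:39"),
    (1, 1, 1, "2018-03-06 11:33:39"),
    (2, 1, 0, "2018-03-06 11:31:39"),
    (2, 2, 1, "2018-03-06 11:40:39"),
]

conn = sqlite3.connect(":memory:")
# Table and column names are hypothetical, chosen for this sketch.
conn.execute("CREATE TABLE data (id1 INTEGER, id2 INTEGER, flag INTEGER, ts TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)

# Keep a row only if a partner row exists with the same id1 and id2,
# the opposite flag, and a timestamp at most 5 minutes (300 s) away.
query = """
SELECT d1.id1, d1.id2, d1.flag, d1.ts
FROM data AS d1
WHERE EXISTS (
    SELECT 1
    FROM data AS d2
    WHERE d2.id1 = d1.id1
      AND d2.id2 = d1.id2
      AND d2.flag <> d1.flag
      AND ABS(CAST(strftime('%s', d2.ts) AS INTEGER)
            - CAST(strftime('%s', d1.ts) AS INTEGER)) <= 300
)
ORDER BY d1.id1, d1.id2, d1.ts
"""
matches = conn.execute(query).fetchall()
print(matches)
```

Because EXISTS only tests for a partner row, each qualifying row comes back once, so no GROUP BY or DISTINCT is needed to suppress duplicates.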

Related

Match nearest timestamp in Redshift SQL

I have two tables, t1 and t2. For each id in t1 I have multiple records in t2. I want to match the closest timestamp of t2 to each record of t1. In t1 there is a flag: if it's 1, I want to match the closest timestamp of t2 that is smaller than the one in t1, and if it's 0, I want to match the closest timestamp that is larger than the one in t1.
So altogether I have the following tables:
T1
id, flag, timestamp
T2
id, timestamp
Is there an efficient way to do that?
Edit, here is some example:
T1
| customer_id | timestamp_t1 | flag |
| 1 | 01.01.21 12:00 | 1 |
| 2 | 01.01.21 13:00 | 0 |
T2
| customer_id | timestamp_t2 | additional attributes |
| 1 | 01.01.21 11:00 | attribute1 |
| 1 | 01.01.21 10:00 | attribute2 |
| 1 | 01.01.21 13:00 | attribute3 |
| 2 | 01.01.21 11:00 | attribute4 |
| 2 | 01.01.21 12:00 | attribute5 |
| 2 | 01.01.21 14:00 | attribute6 |
| 2 | 01.01.21 15:00 | attribute7 |
Result:
| customer_id | timestamp_t1 | timestamp_t2 | flag | additional attributes |
| 1 | 01.01.21 12:00 | 01.01.21 11:00 | 1 | attribute1 |
| 2 | 01.01.21 13:00 | 01.01.21 14:00 | 0 | attribute6 |
I hope this helps. As you can see, in the result we matched 11:00 of T2 with 12:00 of T1: because the flag was 1, we chose the closest timestamp that was smaller than 12:00. We also matched 14:00 with 13:00 because the flag was 0 (so we matched the closest timestamp for id 2 that is larger than 13:00).

You could use correlated sub-queries to find the rows before/after the timestamp, and then use a CASE expression to pick which to join on...
SELECT
*
FROM
t1
INNER JOIN
t2
ON t2.id = CASE WHEN t1.flag = 1 THEN
(
SELECT t2.id
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 <= t1.timestamp_t1
ORDER BY t2.timestamp_t2 DESC
LIMIT 1
)
ELSE
(
SELECT t2.id
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 >= t1.timestamp_t1
ORDER BY t2.timestamp_t2 ASC
LIMIT 1
)
END
Oh, you haven't included an id column in your example; this works similarly...
SELECT
*
FROM
t1
INNER JOIN
t2
ON t2.customer_id = t1.customer_id
AND t2.timestamp_t2
=
CASE WHEN t1.flag = 1 THEN
(
SELECT MAX(t2.timestamp_t2)
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 <= t1.timestamp_t1
)
ELSE
(
SELECT MIN(t2.timestamp_t2)
FROM t2
WHERE t2.customer_id = t1.customer_id
AND t2.timestamp_t2 >= t1.timestamp_t1
)
END
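The MAX/MIN variant can be tried on the question's sample data. This is a rough sketch only, run on SQLite through Python's sqlite3 module rather than on Redshift. It assumes the timestamps are stored as ISO-formatted text (so that string comparison orders them correctly) and uses a hypothetical column name attr for the additional attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; ISO text timestamps and the "attr"
# column name are assumptions of this sketch.
conn.executescript("""
CREATE TABLE t1 (customer_id INTEGER, timestamp_t1 TEXT, flag INTEGER);
CREATE TABLE t2 (customer_id INTEGER, timestamp_t2 TEXT, attr TEXT);
INSERT INTO t1 VALUES
    (1, '2021-01-01 12:00', 1),
    (2, '2021-01-01 13:00', 0);
INSERT INTO t2 VALUES
    (1, '2021-01-01 11:00', 'attribute1'),
    (1, '2021-01-01 10:00', 'attribute2'),
    (1, '2021-01-01 13:00', 'attribute3'),
    (2, '2021-01-01 11:00', 'attribute4'),
    (2, '2021-01-01 12:00', 'attribute5'),
    (2, '2021-01-01 14:00', 'attribute6'),
    (2, '2021-01-01 15:00', 'attribute7');
""")

# flag = 1: latest t2 timestamp at or before the t1 timestamp.
# flag = 0: earliest t2 timestamp at or after the t1 timestamp.
query = """
SELECT t1.customer_id, t1.timestamp_t1, t2.timestamp_t2, t1.flag, t2.attr
FROM t1
JOIN t2
  ON t2.customer_id = t1.customer_id
 AND t2.timestamp_t2 =
     CASE WHEN t1.flag = 1 THEN
         (SELECT MAX(x.timestamp_t2) FROM t2 AS x
           WHERE x.customer_id = t1.customer_id
             AND x.timestamp_t2 <= t1.timestamp_t1)
     ELSE
         (SELECT MIN(x.timestamp_t2) FROM t2 AS x
           WHERE x.customer_id = t1.customer_id
             AND x.timestamp_t2 >= t1.timestamp_t1)
     END
ORDER BY t1.customer_id
"""
nearest = conn.execute(query).fetchall()
print(nearest)
```

If an equal timestamp should not count as a match, the two comparisons would need to be strict (< and >) instead.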

Bigquery select rows where the logtime is below min(value) of other table logtime

Let's say I have the following two tables:
Table 1:
ID log_time
1 2013-10-12
1 2014-11-15
2 2013-12-21
2 2016-12-21
3 2015-09-21
3 2018-03-21
Table 2:
ID log_time
1 2011-10-12
1 2012-11-15
2 2012-12-21
2 2017-12-21
3 2014-09-21
3 2019-03-21
I want to get rows of Table 2 which are below min(log_time) of Table1 for each ID.
The result should be like this:
ID log_time
1 2011-10-12
1 2012-11-15
2 2012-12-21
3 2014-09-21

This is join and aggregation:
select t2.*
from table2 t2 join
(select t1.id, min(t1.log_time) as min_log_time
from table1 t1
group by t1.id
) t1
on t2.id = t1.id and t2.log_time < t1.min_log_time;
You can also express this as a correlated subquery:
select t2.*
from table2 t2
where t2.log_time < (select min(t1.log_time) from table1 t1 where t1.id = t2.id);
Note that both of these formulations will return no rows for ids missing from table1 (which is quite consistent with your question).
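As a quick sanity check, the correlated-subquery form can be run on the sample tables. This is only a sketch on SQLite via Python's sqlite3 module, not on BigQuery, and it assumes log_time is stored as ISO date text so that plain comparison orders the values correctly. With the sample data, the only Table 2 row for ID 3 that is earlier than 2015-09-21 is 2014-09-21.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question, with log_time stored as ISO text (an assumption).
conn.executescript("""
CREATE TABLE table1 (id INTEGER, log_time TEXT);
CREATE TABLE table2 (id INTEGER, log_time TEXT);
INSERT INTO table1 VALUES
    (1, '2013-10-12'), (1, '2014-11-15'),
    (2, '2013-12-21'), (2, '2016-12-21'),
    (3, '2015-09-21'), (3, '2018-03-21');
INSERT INTO table2 VALUES
    (1, '2011-10-12'), (1, '2012-11-15'),
    (2, '2012-12-21'), (2, '2017-12-21'),
    (3, '2014-09-21'), (3, '2019-03-21');
""")

# Keep the table2 rows that are earlier than the smallest table1.log_time
# recorded for the same id.
query = """
SELECT t2.id, t2.log_time
FROM table2 AS t2
WHERE t2.log_time < (SELECT MIN(t1.log_time)
                     FROM table1 AS t1
                     WHERE t1.id = t2.id)
ORDER BY t2.id, t2.log_time
"""
earlier_rows = conn.execute(query).fetchall()
print(earlier_rows)
```

For an id that is missing from table1 the subquery yields NULL, the comparison is not true, and the row is dropped, which is the behaviour described above.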

Waterfall join conditions

I have two tables similar to:
Table 1 --unique ID's
ID Date
1 3/8/2017
2 3/8/2017
3 3/8/2017
Table 2
ID Date SourceID
1 3/8/2017 1
1 3/8/2017 2
1 3/8/2017 3
2 3/8/2017 2
3 3/8/2017 1
3 3/8/2017 3
And I want to write a query that has a result like:
Result
ID SourceID
1 2
2 2
3 1
Where the source ID ordering should be 2, 1, 3
I have:
select Table1.ID
, COALESCE(Join1.SourceID, Join2.SourceID, Join3.SourceID) as SourceID
from Table1
left outer join Table2 Join1
on Table1.date = Join1.date
and Table1.ID = Join1.ID
and Join1.SourceID = 2
left outer join Table2 Join2
on Table1.date = Join2.date
and Table1.ID = Join2.ID
and Join2.SourceID = 1
and Join1.SourceID is null
left outer join Table2 Join3
on Table1.date = Join3.date
and Table1.ID = Join3.ID
and Join3.SourceID = 3
and Join1.SourceID is null
and Join2.SourceID is null
But this currently just keeps the records where sourceid = 2 and does not add in the other sourceid's.
Thanks in advance for any help. Let me know if you need any clarification. Using SQL-Server. I only need a few and fixed amount of sources so I am avoiding using a cursor.

This is a prioritization query. I would do it using outer apply:
select t1.*, t2.sourceId
from table1 t1 outer apply
(select top 1 t2.*
from table2 t2
where t2.id = t1.id and t2.date = t1.date
order by (case t2.sourceid when 2 then 1 when 1 then 2 when 3 then 3 end)
) t2;
Note: For readability, you can simplify the order by to:
order by charindex(cast(t2.sourceId as varchar(255)), '2,1,3')
If you are uncomfortable with outer apply, you can do the same thing with a single join:
select t1.*, t2.sourceId
from table1 t1 join
(select t2.*,
row_number() over (partition by id, date
order by (case t2.sourceid when 2 then 1 when 1 then 2 when 3 then 3 end)
) as seqnum
from table2 t2
) t2
on t2.id = t1.id and t2.date = t1.date and t2.seqnum = 1;
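The prioritization idea can be tried out on the sample data. The sketch below runs on SQLite through Python's sqlite3 module, which has neither OUTER APPLY nor TOP, so a correlated subquery with ORDER BY ... LIMIT 1 stands in for them. The column names dt and source_id are renamed for the illustration and are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; "dt" and "source_id" are hypothetical names.
conn.executescript("""
CREATE TABLE table1 (id INTEGER, dt TEXT);
CREATE TABLE table2 (id INTEGER, dt TEXT, source_id INTEGER);
INSERT INTO table1 VALUES (1, '2017-03-08'), (2, '2017-03-08'), (3, '2017-03-08');
INSERT INTO table2 VALUES
    (1, '2017-03-08', 1), (1, '2017-03-08', 2), (1, '2017-03-08', 3),
    (2, '2017-03-08', 2),
    (3, '2017-03-08', 1), (3, '2017-03-08', 3);
""")

# For each table1 row, pick the matching table2 source with the best
# priority: 2 first, then 1, then 3.
query = """
SELECT t1.id,
       (SELECT t2.source_id
        FROM table2 AS t2
        WHERE t2.id = t1.id AND t2.dt = t1.dt
        ORDER BY CASE t2.source_id WHEN 2 THEN 1 WHEN 1 THEN 2 WHEN 3 THEN 3 END
        LIMIT 1) AS source_id
FROM table1 AS t1
ORDER BY t1.id
"""
chosen = conn.execute(query).fetchall()
print(chosen)
```

A table1 row with no match in table2 would come back with a NULL source_id, which mirrors what outer apply does.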

SQL subtract times from tables (decreased additionally for specific time status from second table)

I have table T1 and T2.
T1
ID TIME1 TIME2
1001 1 10
1002 1 20
T2
ID STATUS TIME
1001 NEW 1
1001 CLOSED 10
1002 NEW 1
1002 HOLD 5
1002 CLOSED 13
I want the result TIME2-TIME1 if status HOLD does not exist in table T2 for that record, or TIME2-TIME1-TIME if status HOLD exists in the table for that record:
1001 9 (10-1)
1002 14(20-1-5)
I initially wrote this SQL query, but it does not work: it returns NULL for the first record, while the result for the second record is OK.
SELECT T1.ID,T1.TIME2-T1.TIME1-T2.TIME
FROM T1
LEFT OUTER JOIN T2 ON T1.ID=T2.ID AND T2.STATUS='HOLD'
Thanks

or in short
select
t1.id,
t1.time2-case when status='HOLD' then t2.time else 0 end-t1.time1
from
t1 left join t2 on t1.id=t2.id and t2.status='HOLD'

SELECT T1.ID
, Case
When T2.STATUS = 'HOLD' THEN T1.Time2 - T1.Time1 - T2.Time
Else T1.Time2 -T1.Time1
END
FROM T1
LEFT OUTER JOIN T2 ON T1.ID=T2.ID AND T2.STATUS='HOLD'

Evidently table T2 may contain more than one row matching T1.ID. Assuming that T2 always contains at least one such row, then we can do:
select T1.ID, min(case when T2.STATUS<>'HOLD' then T1.TIME2-T1.TIME1
when T2.STATUS='HOLD' then T1.TIME2-T1.TIME1-T2.[TIME] end)
from T1 join T2
on T1.ID=T2.ID
group by T1.ID
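Another way to express the same idea is to keep the LEFT JOIN on the HOLD row and replace the missing value with 0 using COALESCE. The sketch below checks this on the sample data with SQLite via Python's sqlite3 module. The column name status_time is a renamed stand-in for T2.TIME, and the query assumes at most one HOLD row per ID (with several, the T1 row would be repeated).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question; "status_time" is a hypothetical name for T2.TIME.
conn.executescript("""
CREATE TABLE t1 (id INTEGER, time1 INTEGER, time2 INTEGER);
CREATE TABLE t2 (id INTEGER, status TEXT, status_time INTEGER);
INSERT INTO t1 VALUES (1001, 1, 10), (1002, 1, 20);
INSERT INTO t2 VALUES
    (1001, 'NEW', 1), (1001, 'CLOSED', 10),
    (1002, 'NEW', 1), (1002, 'HOLD', 5), (1002, 'CLOSED', 13);
""")

# When no HOLD row exists the join yields NULL for status_time;
# COALESCE turns that NULL into 0 so the subtraction still works.
query = """
SELECT t1.id,
       t1.time2 - t1.time1 - COALESCE(t2.status_time, 0) AS duration
FROM t1
LEFT JOIN t2
  ON t2.id = t1.id AND t2.status = 'HOLD'
ORDER BY t1.id
"""
durations = conn.execute(query).fetchall()
print(durations)
```

The NULL in the original query came from subtracting a NULL T2.TIME for IDs without a HOLD row, which is exactly what COALESCE guards against here.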

Selecting time intervals of value live - missing first and last intervals

I've got a table with following structure
| ChangedDate | IDParameter | ChangedTo(bit column) |
So I need to get the time intervals during which my parameter is True or False, like the following
| IDParameter | ChangedToDate1 | ChangedToDate2 | ChangedTo(true to false || false to true)
and my query is
With RankedDates As
(
Select T1.[ChangedDate], T1.ID, T1.[ChangedToValue]
, Row_Number() Over( Partition By T1.ID Order By T1.[ChangedDate] ) As Num
From [Changes] As T1
)
SELECT T1.[ID]
,T2.[ChangedToValue]
,T1.[ChangedDate] AS startDate
,T2.[ChangedDate] AS endDate
FROM [RankedDates] AS T1
Join RankedDates As T2
On T2.ID = T1.ID
And T2.Num = T1.Num + 1
And T2.[ChangedToValue] <> T1.[ChangedToValue]
Order By T2.[ChangedDate]
The trouble is that I am missing the first and last intervals here. The start date must be NULL for the first interval and the end date must be NULL for the last interval of each parameter ID. I guess I need to add them with UNION, but I can't work out how to add them for each IDParameter.
I don't know when the value was changed for the first time, and I don't know whether the value will be changed again, so I need NULL or some min date for the first intervals and NULL or some max date for the last intervals.
MS SQL Server 2008
Sorry for such a complex question.
Example :
08.03.2011 ID1 0 -> 1
09.03.2011 ID1 1 -> 0
09.03.2011 ID2 0 -> 1
10.03.2011 ID1 0 -> 1
10.03.2011 ID2 1 -> 0
--->
NULL , 08.03.2011 ID1 is 0
NULL , 09.03.2011 ID2 is 0
08.03.2011, 09.03.2011 ID1 is 1
09.03.2011, 10.03.2011 ID2 is 1
09.03.2011, 10.03.2011 ID1 is 0
10.03.2011, NULL ID1 is 1
10.03.2011, NULL ID2 is 0

How about using FULL JOIN instead of JOIN?
Does it solve your problem?
EDIT:
I think this should work as you want.
select isnull(T1.ID, T2.ID) as ID
,isnull(T2.[ChangedToValue], case when T1.[ChangedToValue] = 1 then 0 else 1 end) as [ChangedToValue]
,T1.[ChangedDate] as startdate
,T2.[ChangedDate] as enddate
from [RankedDates] T1
full join [RankedDates] T2
on T2.num = T1.num +1
and T2.ID = T1.ID
and T1.[ChangedToValue] <> T2.[ChangedToValue]
order by
case when T2.[ChangedDate] is null then 1 else 0 end
,T2.[ChangedDate]
You were right about the ChangedToValue; I modified it to show the opposite now if T2 is null.

Assuming that's how your base table looks:
ChangeDate IDParameter ChangedTo
2011-03-08 ID1 True
2011-03-09 ID1 False
2011-03-09 ID2 True
2011-03-10 ID1 True
2011-03-10 ID2 False
SELECT (SELECT TOP 1 t0.[ChangeDate] FROM [calendardb].[dbo].[Table_1] t0
WHERE t0.IDParameter = t1.IDParameter AND t0.ChangeDate < t1.ChangeDate ORDER
BY t0.ChangeDate DESC),
[ChangeDate]
,[IDParameter]
,[ChangedTo]
FROM [calendardb].[dbo].[Table_1] t1
UNION
SELECT MAX(ChangeDate) as maxd ,NULL,[IDParameter],
(SELECT ChangedTo FROM [calendardb].[dbo].[Table_1] t0 WHERE t0.ChangeDate = (SELECT MAX(ChangeDate) FROM [calendardb].[dbo].[Table_1]
GROUP BY [IDParameter] HAVING IDParameter = t1.IDParameter) AND t1.IDParameter = t0.IDParameter)
FROM [calendardb].[dbo].[Table_1] t1
GROUP BY [IDParameter]
will give you result like this:
NULL 2011-03-08 ID1 1
2011-03-08 2011-03-09 ID1 0
NULL 2011-03-09 ID2 1
2011-03-09 2011-03-10 ID1 1
2011-03-09 2011-03-10 ID2 0
2011-03-10 NULL ID1 1
2011-03-10 NULL ID2 0
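For comparison, here is a sketch that builds the intervals listed in the question, open-ended first and last ones included, using only correlated subqueries and UNION ALL. It avoids LAG/LEAD, which SQL Server 2008 does not have. It runs on SQLite through Python's sqlite3 module, the table name changes and its column names are assumptions, and it presumes at most one change per parameter per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample changes from the question; table and column names are hypothetical.
conn.executescript("""
CREATE TABLE changes (changed_date TEXT, id TEXT, changed_to INTEGER);
INSERT INTO changes VALUES
    ('2011-03-08', 'ID1', 1),
    ('2011-03-09', 'ID1', 0),
    ('2011-03-09', 'ID2', 1),
    ('2011-03-10', 'ID1', 1),
    ('2011-03-10', 'ID2', 0);
""")

query = """
-- Leading interval per id: no start date, ends at the first change,
-- and carries the opposite of the value that the first change sets.
SELECT NULL AS start_date, c.changed_date AS end_date,
       c.id AS id, 1 - c.changed_to AS value
FROM changes AS c
WHERE NOT EXISTS (SELECT 1 FROM changes AS p
                  WHERE p.id = c.id AND p.changed_date < c.changed_date)
UNION ALL
-- One interval per change: starts at the change and ends at the next
-- change of the same id (NULL when there is none).
SELECT c.changed_date,
       (SELECT MIN(n.changed_date) FROM changes AS n
        WHERE n.id = c.id AND n.changed_date > c.changed_date),
       c.id, c.changed_to
FROM changes AS c
ORDER BY id, start_date
"""
intervals = conn.execute(query).fetchall()
for row in intervals:
    print(row)
```

SQLite sorts NULL before other values in ascending order, so the open-ended first interval of each id is listed first here; SQL Server orders NULLs the same way for ascending sorts.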
