SQL: select 2 consecutive rows with conditions

I have a table event with 3 columns and would like to select two consecutive rows of the same case id when they match certain criteria (rules), as follows. I have about 5k+ different case ids to select from based on these criteria; below is just an example with 2 case ids. I have part of the code to try, but I got stuck because I don't know how to select both rows when a condition is met.
Rules:
If D1 is followed by D3 THEN select both rows
Else if D1 is followed by D4 THEN select both rows
Else if D2 is followed by D1 THEN select both rows
Else if D2 is followed by D3 THEN select both rows
Else if D3 is followed by D2 THEN select both rows
Else if D3 is followed by D1 THEN select both rows
Else do not select
Table event:
caseID D Timestamp
-----------------------------------
1 D1 T1
1 D2 T2
1 D3 T3
1 D1 T4
1 D3 T5
1 D2 T6
1 D1 T7
1 D2 T8
1 D4 T9
2 D2 T1
2 D1 T2
2 D2 T3
2 D3 T4
2 D1 T5
2 D4 T6
2 D5 T7
Expected output:
caseID D Timestamp
----------------------------------
1 D2 T2
1 D3 T3
1 D1 T4
1 D3 T5
1 D2 T6
1 D1 T7
2 D2 T1
2 D1 T2
2 D2 T3
2 D3 T4
2 D1 T5
2 D4 T6
Code I might try:
SELECT caseID, D, Timestamp
FROM event e1
INNER JOIN event e2 ON e1.caseID = e2.caseID
WHERE
CASE #D
WHEN e1.D = D1 AND e2.D = D3 THEN ?

Here's one option using lead and lag with case:
select caseid, d, timestamp
from (
select *, lead(d) over (partition by caseId order by timestamp) lead,
lag(d) over (partition by caseId order by timestamp) lag
from event
) t
where 1 = case
when d = 'D1' and lead in ('D3','D4') then 1
when d = 'D2' and lead in ('D1','D3') then 1
when d = 'D3' and lead in ('D2','D1') then 1
when d = 'D1' and lag in ('D2', 'D3') then 1
when d = 'D2' and lag in ('D3') then 1
when d = 'D3' and lag in ('D2','D1') then 1
when d = 'D4' and lag in ('D1') then 1
else 0
end
order by caseid, timestamp
It could be consolidated, but I wanted to be as explicit as possible in defining your criteria.
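One way to consolidate the case expression is to keep the allowed (earlier, later) pairs in a small lookup table and test each row's (d, lead) and (lag, d) pairs against it. The sketch below is only an illustration under a few assumptions: it runs on SQLite 3.25+ through Python's sqlite3 module instead of SQL Server, the rules table and the function name are hypothetical, and the timestamps T1..T9 are replaced by the integers 1..9.

```python
import sqlite3

# Allowed (earlier D, later D) pairs, copied from the rules in the question.
RULES = [("D1", "D3"), ("D1", "D4"), ("D2", "D1"),
         ("D2", "D3"), ("D3", "D2"), ("D3", "D1")]

# Sample rows from the question; ts is an integer stand-in for T1..T9.
EVENTS = [
    (1, "D1", 1), (1, "D2", 2), (1, "D3", 3), (1, "D1", 4), (1, "D3", 5),
    (1, "D2", 6), (1, "D1", 7), (1, "D2", 8), (1, "D4", 9),
    (2, "D2", 1), (2, "D1", 2), (2, "D2", 3), (2, "D3", 4),
    (2, "D1", 5), (2, "D4", 6), (2, "D5", 7),
]

QUERY = """
WITH w AS (
    SELECT caseID, D, ts,
           LEAD(D) OVER (PARTITION BY caseID ORDER BY ts) AS next_d,
           LAG(D)  OVER (PARTITION BY caseID ORDER BY ts) AS prev_d
    FROM event
)
SELECT caseID, D, ts
FROM w
WHERE EXISTS (SELECT 1 FROM rules r          -- row is the first half of a pair
              WHERE r.first_d = w.D AND r.second_d = w.next_d)
   OR EXISTS (SELECT 1 FROM rules r          -- row is the second half of a pair
              WHERE r.first_d = w.prev_d AND r.second_d = w.D)
ORDER BY caseID, ts
"""


def select_rule_pairs():
    """Return the rows that take part in at least one allowed consecutive pair."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE event (caseID INTEGER, D TEXT, ts INTEGER)")
    con.execute("CREATE TABLE rules (first_d TEXT, second_d TEXT)")
    con.executemany("INSERT INTO event VALUES (?, ?, ?)", EVENTS)
    con.executemany("INSERT INTO rules VALUES (?, ?)", RULES)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in select_rule_pairs():
    print(row)
```

With the rules in a table, adding or removing a pair becomes a data change and the query itself stays the same.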

Because SQL Server 2008 doesn't support LAG and LEAD, you can use correlated subqueries to get the next and previous values instead.
SELECT caseID,
D,
Timestamp
FROM (
select *,(
-- D of the next row (by Timestamp) in the same case
select TOP 1 D
FROM event tt
WHERE t1.caseID = tt.caseID
and t1.Timestamp < tt.Timestamp
ORDER BY tt.Timestamp
) nextD,(
-- D of the previous row (by Timestamp) in the same case
select TOP 1 D
FROM event tt
WHERE t1.caseID = tt.caseID
and t1.Timestamp > tt.Timestamp
ORDER BY tt.Timestamp desc
) prevD
from event t1
) t1
WHERE (CASE WHEN d = 'D1' and nextD in ('D3','D4') OR
d = 'D2' and nextD in ('D1','D3') OR
d = 'D3' and nextD in ('D2','D1') OR
d = 'D1' and prevD in ('D2', 'D3') OR
d = 'D2' and prevD in ('D3') OR
d = 'D3' and prevD in ('D2','D1') OR
d = 'D4' and prevD in ('D1')
THEN D END) IS NOT NULL

Related

How to join on a default value from the 2nd table if no match is found?

I'm looking for a way to join 2 tables as follows:
T1:          T2:
a   b        c    d    e
-------      -------------
1   b1       1    d1   e1
2   b2       2    d2   e2
3   b3       ST   d0   e0
--> join on T1.a = T2.c (if no match is found, join on T2.c = 'ST')
a   b    c    d    e
----------------------
1   b1   1    d1   e1
2   b2   2    d2   e2
3   b3   ST   d0   e0 <- No match found, so the ST values are used.
Right now I have only found a way that works when the T2.c values are integers: I do a conditional join and afterwards take the max value of c, grouping by every other column.
Is there any way to do this with string values in the match column, like in the example?
Thanks
You want a default. You can use left join:
select t1.*, coalesce(t2.c, t2def.c) as c,
coalesce(t2.d, t2def.d) as d, coalesce(t2.e, t2def.e) as e
from t1 left join
t2
on t1.a = t2.c left join
t2 t2def
on t2def.c = 'ST';
Or, you can use apply:
select t1.*, t2.*
from t1 outer apply
(select top (1) t2.*
from t2
where t2.c in ('ST', t1.a)
order by (case when t2.c = 'ST' then 2 else 1 end)
) t2;
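The left join plus coalesce version can be tried against the sample data from the question. This is a minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in engine and assuming both join columns are stored as text so that '1' and 'ST' compare cleanly; the function name is hypothetical.

```python
import sqlite3


def join_with_default():
    """Join t1 to t2 on a = c, falling back to the 'ST' row when nothing matches."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE t1 (a TEXT, b TEXT);
        CREATE TABLE t2 (c TEXT, d TEXT, e TEXT);
        INSERT INTO t1 VALUES ('1', 'b1'), ('2', 'b2'), ('3', 'b3');
        INSERT INTO t2 VALUES ('1', 'd1', 'e1'), ('2', 'd2', 'e2'), ('ST', 'd0', 'e0');
    """)
    rows = con.execute("""
        SELECT t1.a, t1.b,
               COALESCE(t2.c, t2def.c) AS c,   -- matched row wins, default otherwise
               COALESCE(t2.d, t2def.d) AS d,
               COALESCE(t2.e, t2def.e) AS e
        FROM t1
        LEFT JOIN t2          ON t1.a = t2.c
        LEFT JOIN t2 AS t2def ON t2def.c = 'ST'
        ORDER BY t1.a
    """).fetchall()
    con.close()
    return rows


for row in join_with_default():
    print(row)
```

Note that coalesce works column by column, so a matched row whose d or e is NULL would pick up the default's value for that column only.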

SQL add values of rows, if columns are switched

After a join of the same table, I have a result like this:
c1 c2 count
A B 5
A C 4
B A 2
B C 2
C A 1
Now the counts should be added together when c1 and c2 are switched, like this:
c1 c2 count
A B 7
A C 5
B C 2
How can this be done with a query?
Use a left join to self-join the table on the inverse positions, and return the rows where c1 is less than c2 or where there is no matching row. coalesce adds 0 when the left-joined count is null.
select
t.c1
, t.c2
, t.count + coalesce(s.count,0) as count
from t
left join t as s
on t.c1 = s.c2
and t.c2 = s.c1
where t.c1 < t.c2 or s.c1 is null
rextester demo in sql server: http://rextester.com/VBQI62112
returns:
+----+----+-------+
| c1 | c2 | count |
+----+----+-------+
| A | B | 7 |
| A | C | 5 |
| B | C | 2 |
+----+----+-------+
Many databases support least() and greatest(). If they are available, you can do:
select least(c1, c2) as c1, greatest(c1, c2) as c2, sum(count) as cnt
from (<your query here>) t
group by least(c1, c2), greatest(c1, c2);
In databases that don't support these functions, you can use case.
Note: In some databases (for example MySQL and Oracle), least() and greatest() return NULL if any argument is NULL, while PostgreSQL ignores NULL arguments, so you need to be careful if either column could be NULL.
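For databases without least() and greatest(), the case variant mentioned above might look like the following. It is a sketch only, run on SQLite through Python's sqlite3; the table name pair_counts, the aliases lo and hi, and the function name are made up for the example, and the data is the joined result from the question.

```python
import sqlite3


def merge_switched_pairs():
    """Sum counts of (c1, c2) and (c2, c1) by normalising each pair to (lo, hi)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE pair_counts (c1 TEXT, c2 TEXT, cnt INTEGER)")
    con.executemany(
        "INSERT INTO pair_counts VALUES (?, ?, ?)",
        [("A", "B", 5), ("A", "C", 4), ("B", "A", 2), ("B", "C", 2), ("C", "A", 1)],
    )
    rows = con.execute("""
        SELECT CASE WHEN c1 < c2 THEN c1 ELSE c2 END AS lo,   -- plays the role of least()
               CASE WHEN c1 < c2 THEN c2 ELSE c1 END AS hi,   -- plays the role of greatest()
               SUM(cnt) AS cnt
        FROM pair_counts
        GROUP BY CASE WHEN c1 < c2 THEN c1 ELSE c2 END,
                 CASE WHEN c1 < c2 THEN c2 ELSE c1 END
        ORDER BY lo, hi
    """).fetchall()
    con.close()
    return rows


for row in merge_switched_pairs():
    print(row)
```

Unlike the self-join, this form also handles a pair that appears only in its switched order, such as a lone (C, B) row, which would come out as (B, C).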
Perhaps join the output c1, c2 with the same c2, c1?
select t1.c1
,t1.c2
,sum(coalesce(t1.count,0) + coalesce(t2.count,0)) as count
from t t1
left join t t2
on t1.c1 = t2.c2
and t1.c2 = t2.c1
group by t1.c1, t1.c2
having t1.c1 < t1.c2
SELECT t.c1
, t.c2
, t.cnt + CASE WHEN s.cnt IS NULL THEN 0 ELSE s.cnt END as cnt
FROM t
LEFT JOIN
t as s
ON t.c1 = s.c2
AND t.c2 = s.c1
WHERE t.c1 < t.c2;

Selecting the last row in a partition in HIVE

I have a table t1:
c1 | c2 | c3| c4
1 1 1 A
1 1 2 B
1 1 3 C
1 1 4 D
1 1 4 E
1 1 4 F
2 2 1 A
2 2 2 A
2 2 3 A
I want to select the last row of each c1, c2 pair. So (1,1,4,F) and (2,2,3,A) in this case. My idea is to do something like this:
create table t2 as
select *, row_number() over (partition by c1, c2 order by c3) as rank
from t1
create table t3 as
select a.c1, a.c2, a.c3, a.c4
from t2 a
inner join
(select c1, c2, max(rank) as maxrank
from t2
group by c1, c2
)
on a.c1=b.c1 and a.c2=b.c1
where a.rank=b.maxrank
Would this work? (Having environment issues so can't test myself)
Just use a subquery:
select t1.*
from (select t1.*, row_number() over (partition by c1, c2 order by c3 desc) as rank
from t1
) t1
where rank = 1;
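Note the use of desc in the order by. Because c3 has ties (three rows with c3 = 4 for the pair (1, 1)), which of those rows gets rank 1 is not deterministic unless you add a tie-breaker to the order by, for example c4 desc.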
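The descending row_number() approach can be checked on the sample table. The sketch below assumes SQLite 3.25+ (through Python's sqlite3) as a stand-in for Hive, and it assumes that "last" among rows with equal c3 means the highest c4, which is why c4 desc appears in the order by; the function name is hypothetical.

```python
import sqlite3


def last_row_per_pair():
    """Return the last row of each (c1, c2) pair, ordered by c3 and then c4."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 TEXT)")
    con.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "A"), (1, 1, 2, "B"), (1, 1, 3, "C"),
         (1, 1, 4, "D"), (1, 1, 4, "E"), (1, 1, 4, "F"),
         (2, 2, 1, "A"), (2, 2, 2, "A"), (2, 2, 3, "A")],
    )
    rows = con.execute("""
        SELECT c1, c2, c3, c4
        FROM (
            SELECT t1.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY c1, c2
                       ORDER BY c3 DESC, c4 DESC   -- c4 is an assumed tie-breaker
                   ) AS rn
            FROM t1
        ) AS ranked
        WHERE rn = 1
        ORDER BY c1, c2
    """).fetchall()
    con.close()
    return rows


for row in last_row_per_pair():
    print(row)
```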

Query to update row which contains maximum column value in MS SQL 2008

I have a table similar to below:
C1 C2 C3
A 5 0
A 15 0
A 2 0
B 5 0
B 8 0
The result table should have C3 set to 1 on the row with the max value of C2 within each C1 group:
C1 C2 C3
A 5 0
A 15 1
A 2 0
B 5 0
B 8 1
For SQL Server 2005+:
;WITH CTE AS
(
SELECT *,
MAX(C2) OVER(PARTITION BY C1) MaxC2
FROM YourTable
)
UPDATE CTE
SET C3 = CASE WHEN MaxC2 = C2 THEN 1 ELSE 0 END
Supposing the table is named table_name:
UPDATE t1 SET t1.C3 = 1
FROM table_name as t1 INNER JOIN (
SELECT C1, MAX(C2) AS C2 FROM table_name GROUP BY C1
) AS t2 ON t1.C1 = t2.C1 AND t1.C2 = t2.C2
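For engines that support neither an updatable CTE nor UPDATE ... FROM, the same result can usually be reached with a correlated subquery. This is a sketch under assumptions, not the answers' exact method: it runs on SQLite through Python's sqlite3, and the table name your_table and the function name are placeholders.

```python
import sqlite3


def flag_group_max():
    """Set C3 = 1 on rows holding the maximum C2 of their C1 group, else 0."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (C1 TEXT, C2 INTEGER, C3 INTEGER)")
    con.executemany(
        "INSERT INTO your_table VALUES (?, ?, ?)",
        [("A", 5, 0), ("A", 15, 0), ("A", 2, 0), ("B", 5, 0), ("B", 8, 0)],
    )
    con.execute("""
        UPDATE your_table
        SET C3 = CASE
                     WHEN C2 = (SELECT MAX(m.C2)          -- max of the row's own group
                                FROM your_table AS m
                                WHERE m.C1 = your_table.C1)
                     THEN 1 ELSE 0
                 END
    """)
    rows = con.execute(
        "SELECT C1, C2, C3 FROM your_table ORDER BY rowid"
    ).fetchall()
    con.close()
    return rows


for row in flag_group_max():
    print(row)
```

As with the versions above, every row that ties for the maximum within its group gets flagged.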

Returning Unique rows from Left Outer Join

I am trying to build a query which will give me unique rows. Details:-
Table 1 (F1, F2 are the columns)
F1 F2
1 A1
2 A2
3 A3
4 A4
Table 2 (F3,F4 are the columns)
F3 F4
1 B1
1 B11
2 B2
2 B22
My Query (Incorrect)
select rrn(A), F1,F2,F3,F4 from rush2hem1.T1 A left outer join rush2hem1.T2 B on A.F1=B.F3
This gives me below output which is not what I am looking for:-
RRN F1 F2 F3 F4
1 1 A1 1 B1
1 1 A1 1 B11
2 2 A2 2 B2
2 2 A2 2 B22
3 3 A3 (null) (null)
4 4 A4 (null) (null)
Expected output that I am building query for is:-
RRN F1 F2 F3 F4
1 1 A1 1 B1
2 2 A2 2 B2
3 3 A3 (null) (null)
4 4 A4 (null) (null)
Please let me know if you have any suggestions.
This problem is solved differently in different RDBMSs. In any case, you have to specify which record from Table2 you want to get (with an order by clause).
If your database has the window function row_number(), you can use it like this:
select
F1, F2, F3, F4
from (
select
T1.F1, T1.F2, T2.F3, T2.F4,
-- order by specifying which row you would get from Table2
row_number() over(partition by T1.F1 order by T2.F4) as rn
from Table1 as T1
left outer join Table2 as T2 on T2.F3 = T1.F1
) as cte
where rn = 1;
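The row_number() query above can be run against the sample data to confirm that it keeps one row per F1, including the rows with no match. This is a minimal sketch assuming SQLite 3.25+ through Python's sqlite3 as the engine; the function name is hypothetical and the table names follow the query above.

```python
import sqlite3


def first_match_per_row():
    """Left join Table1 to Table2 and keep only the lowest-F4 match per F1."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (F1 INTEGER, F2 TEXT);
        CREATE TABLE Table2 (F3 INTEGER, F4 TEXT);
        INSERT INTO Table1 VALUES (1, 'A1'), (2, 'A2'), (3, 'A3'), (4, 'A4');
        INSERT INTO Table2 VALUES (1, 'B1'), (1, 'B11'), (2, 'B2'), (2, 'B22');
    """)
    rows = con.execute("""
        SELECT F1, F2, F3, F4
        FROM (
            SELECT T1.F1, T1.F2, T2.F3, T2.F4,
                   -- the ORDER BY decides which Table2 row survives
                   ROW_NUMBER() OVER (PARTITION BY T1.F1 ORDER BY T2.F4) AS rn
            FROM Table1 AS T1
            LEFT OUTER JOIN Table2 AS T2 ON T2.F3 = T1.F1
        ) AS picked
        WHERE rn = 1
        ORDER BY F1
    """).fetchall()
    con.close()
    return rows


for row in first_match_per_row():
    print(row)
```

Unmatched Table1 rows form a partition of one row with NULLs from Table2, so they get rn = 1 and are kept.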
In SQL Server, you can use outer apply:
select
T1.F1, T1.F2, T2.F3, T2.F4
from Table1 as T1
outer apply (
select top 1 T2.F3, T2.F4
from Table2 as T2
where T2.F3 = T1.F1
order by T2.F4 asc -- this line decides which Table2 row is returned
) as T2;
In PostgreSQL, you can use distinct on syntax:
select distinct on (T1.F1)
T1.F1, T1.F2, T2.F3, T2.F4
from Table1 as T1
left outer join Table2 as T2 on T2.F3 = T1.F1
order by T1.F1, T2.F4; -- sort by F4 is important
Not tested; it's written in the SQL Server style.
select RRN, F1, F2, F3, F4
from
(
-- number the joined rows within each T1 row, then keep the first one
select rrn(A) as RRN, F1, F2, F3, F4, row_number() over(partition by rrn(A) order by F4) as rn
from rush2hem1.T1 A left outer join rush2hem1.T2 B
on A.F1=B.F3
) as dt
where dt.rn = 1
Please check the result with OUTER APPLY
SELECT
*
FROM
Table1 a
OUTER APPLY
(SELECT
TOP 1 *
FROM
Table2 b
WHERE a.F1=b.F3)c