Find next row with specific value in a given row - sql

The table I have now looks something like this. Each row has a time value (on which the table is sorted in ascending order), and two values which can be replicated across rows:
Key TimeCall R_ID S_ID
-------------------------------------------
1 100 40 A
2 101 50 B
3 102 40 C
4 103 50 D
5 104 60 A
6 105 40 B
I would like to return something like this, wherein for each row, a JOIN is applied such that the S_ID and Time_Call of the next row that shares that row's R_ID is displayed (or is NULL if that row is the last instance of a given R_ID). Example:
Key TimeCall R_ID S_ID NextTimeCall NextS_ID
----------------------------------------------------------------------
1 100 40 A 102 C
2 101 50 B 103 D
3 102 40 C 105 B
4 103 50 D NULL NULL
5 104 60 A NULL NULL
6 105 40 B NULL NULL
Any advice on how to do this would be much appreciated. Right now I'm joining the table on itself and staggering the key on which I'm joining, but I know this won't work for the instance that I've outlined above:
SELECT TOP 10 Table.*, Table2.TimeCall AS NextTimeCall, Table2.S_ID AS NextS_ID
FROM tempdb..#Table AS Table
INNER JOIN tempdb..#Table AS Table2
ON Table.TimeCall + 1 = Table2.TimeCall
So if anyone could show me how to do this such that it can call rows that aren't just consecutive, much obliged!

Use the LEAD() function:
SELECT *
, LEAD(TimeCall) OVER (PARTITION BY R_ID ORDER BY [Key]) AS NextTimeCall
, LEAD(S_ID) OVER (PARTITION BY R_ID ORDER BY [Key]) AS NextS_ID
FROM Table2
ORDER BY [Key]
SQLFiddle DEMO
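To check the LEAD() approach against the sample data from the question, here is a minimal sketch that runs the same idea in an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, the table name calls is an assumption, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# In-memory SQLite stands in for the asker's SQL Server temp table.
# The table name "calls" is hypothetical; "Key" is quoted because it is a keyword.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE calls ("Key" INTEGER, TimeCall INTEGER, R_ID INTEGER, S_ID TEXT)'
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [(1, 100, 40, "A"), (2, 101, 50, "B"), (3, 102, 40, "C"),
     (4, 103, 50, "D"), (5, 104, 60, "A"), (6, 105, 40, "B")],
)

# LEAD() looks one row ahead inside each R_ID partition and yields NULL
# (None in Python) when the partition has no later row.
lead_rows = conn.execute(
    """
    SELECT "Key", TimeCall, R_ID, S_ID,
           LEAD(TimeCall) OVER (PARTITION BY R_ID ORDER BY "Key") AS NextTimeCall,
           LEAD(S_ID)     OVER (PARTITION BY R_ID ORDER BY "Key") AS NextS_ID
    FROM calls
    ORDER BY "Key"
    """
).fetchall()

for row in lead_rows:
    print(row)
```

On the six sample rows this reproduces the table the question asks for, with None standing in for NULL.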

This is just a test example I had on hand, but I think it could help you out; just adapt it to your case. It uses LAG and LEAD and is written for SQL Server (2012 or later).
if object_id('tempdb..#Test') IS NOT NULL drop table #Test
create table #Test (id int, value int)
insert into #Test (id, value)
values
(1, 1),
(1, 2),
(1, 3)
select id,
value,
-- order by value: every id is 1, so ordering by id would leave the row order undefined
lag(value, 1, 0) over (order by value) as [PreviousValue],
lead(value, 1, 0) over (order by value) as [NextValue]
from #Test
Results are
id value PreviousValue NextValue
1 1 0 2
1 2 1 3
1 3 2 0

Use an OUTER APPLY to select the top 1 row that has the same R_ID as the outer row and a higher Key value.
Just change TableName to the actual name of your table in both parts of the query:
SELECT a.*, b.TimeCall AS NextTimeCall, b.S_ID AS NextS_ID
FROM TableName AS a
OUTER APPLY
(
SELECT TOP 1 b.TimeCall, b.S_ID
FROM TableName AS b
WHERE b.R_ID = a.R_ID
AND b.[Key] > a.[Key] -- only later rows; Key is a reserved word, so bracket it
ORDER BY b.[Key] ASC
) AS b
Hope this helps! :)

For older versions, here is one trick using Outer Apply
SELECT a.*,
nexttimecall,
nexts_id
FROM table1 a
OUTER apply (SELECT TOP 1 timecall,s_id
FROM table1 b
WHERE a.r_id = b.r_id
AND a.[key] < b.[key]
ORDER BY [key] ASC) oa (nexttimecall, nexts_id)
LIVE DEMO
Note : It is better to avoid reserved keywords(Key) as column/table names.
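For engines without window functions or APPLY, the same "first later row with the same R_ID" lookup can be written as a LEFT JOIN on a correlated MIN() subquery. The sketch below is a stand-in for the OUTER APPLY idea rather than the query above: it uses in-memory SQLite from Python (SQLite has no OUTER APPLY), and the table name calls is an assumption.

```python
import sqlite3

# Hypothetical table "calls" holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE calls ("Key" INTEGER, TimeCall INTEGER, R_ID INTEGER, S_ID TEXT)'
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [(1, 100, 40, "A"), (2, 101, 50, "B"), (3, 102, 40, "C"),
     (4, 103, 50, "D"), (5, 104, 60, "A"), (6, 105, 40, "B")],
)

# For each row a, join the row b whose Key is the smallest Key greater than
# a's Key within the same R_ID. When no such row exists the subquery returns
# NULL, the join condition fails, and the LEFT JOIN supplies NULLs.
apply_rows = conn.execute(
    """
    SELECT a."Key", a.TimeCall, a.R_ID, a.S_ID,
           b.TimeCall AS NextTimeCall, b.S_ID AS NextS_ID
    FROM calls AS a
    LEFT JOIN calls AS b
      ON b."Key" = (SELECT MIN(c."Key")
                    FROM calls AS c
                    WHERE c.R_ID = a.R_ID AND c."Key" > a."Key")
    ORDER BY a."Key"
    """
).fetchall()

for row in apply_rows:
    print(row)
```

This relies on Key being unique; with duplicate keys the join could return more than one "next" row.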

Related

Using Group By as part of a where clause

I'm trying to eliminate certain records from a dataset using SQL Server. The title of my post may be inaccurate, as a better solution may exist than what I have in mind.
In my query, I am selecting from Table A, and the rows that I want to end up with should meet the following criteria:
1. All rows where A.ItemNumber = B.ItemNumber
2. All rows where A.ItemNumber <> B.ItemNumber AND that row's Task value does not have another row that meets criteria #1.
So for the below example:
1. Gives us ItemNumber 102, 104, 106 rows.
2. Gives us ItemNumber 105 row.
100, 101 are removed from dataset because their Task (1) is associated with Table B at ItemNumber 102. Same for 103 with Task (2) being associated at ItemNumber 104.
Table A
Task ItemNumber
1 100
1 101
1 102
2 103
2 104
3 105
4 106
Table B
ItemNumber Data
102 aaa
104 bbb
106 ccc
My initial thought was to load Table A into a temp table, LEFT JOIN with Table B, and DELETE FROM {temp table} WHERE (data IS NULL AND {insert some kind of grouping logic here}). But I have been completely unable to figure out a grouping logic that will work for the problem. I spent the weekend hoping a solution would come to me, but am now giving in and seeking advice.
With a CTE that meets the 1st condition and UNION ALL to return the rest of the rows:
with cte as (
select a.*
from TableA a
where exists (select 1 from TableB where ItemNumber = a.ItemNumber)
)
select * from cte
union all
select a.* from TableA a
where not exists (select 1 from cte where Task = a.Task)
order by Task
See the demo.
Results:
Task ItemNumber
1 102
2 104
3 105
4 106
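As a quick sanity check, the sketch below loads the sample tables into in-memory SQLite from Python and runs two formulations of the rule: the CTE plus UNION ALL form, and a single WHERE with EXISTS / NOT EXISTS. SQLite is only a stand-in for SQL Server here; both queries use plain standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (Task INTEGER, ItemNumber INTEGER);
    CREATE TABLE TableB (ItemNumber INTEGER, Data TEXT);
    INSERT INTO TableA VALUES (1,100),(1,101),(1,102),(2,103),(2,104),(3,105),(4,106);
    INSERT INTO TableB VALUES (102,'aaa'),(104,'bbb'),(106,'ccc');
    """
)

# Form 1: rows matching TableB, plus rows whose Task has no matching row at all.
cte_rows = conn.execute(
    """
    WITH cte AS (
        SELECT a.* FROM TableA a
        WHERE EXISTS (SELECT 1 FROM TableB b WHERE b.ItemNumber = a.ItemNumber)
    )
    SELECT * FROM cte
    UNION ALL
    SELECT a.* FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM cte WHERE cte.Task = a.Task)
    ORDER BY Task
    """
).fetchall()

# Form 2: the same rule expressed entirely in the WHERE clause.
where_rows = conn.execute(
    """
    SELECT a.Task, a.ItemNumber FROM TableA a
    WHERE EXISTS (SELECT 1 FROM TableB b WHERE b.ItemNumber = a.ItemNumber)
       OR NOT EXISTS (SELECT 1
                      FROM TableB b2
                      JOIN TableA a2 ON b2.ItemNumber = a2.ItemNumber
                      WHERE a2.Task = a.Task)
    ORDER BY a.Task
    """
).fetchall()

print(cte_rows)
print(where_rows)
```

Both forms return the same four rows on the sample data, so the choice between them is mostly a matter of readability and of how the optimizer handles each.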
One way to phrase this puts all the filtering logic in the where clause:
select a.*
from tablea a
where exists (select 1
from tableb b
where b.itemnumber = a.itemnumber
) or
not exists (select 1
from tableb b2 join
tablea a2
on b2.itemnumber = a2.itemnumber
where a2.task = a.task
);
A LEFT JOIN version of the same logic:
SELECT *
FROM TABLEA AS A
LEFT JOIN TABLEB AS B ON A.ItemNumber = B.ItemNumber
WHERE B.ItemNumber IS NOT NULL -- criterion 1
OR (B.ItemNumber IS NULL AND A.Task NOT IN
(SELECT A2.Task
FROM TABLEA AS A2
JOIN TABLEB AS B2 ON A2.ItemNumber = B2.ItemNumber)) -- criterion 2

How can I select unique values from several columns in Oracle SQL?

Basically, I've got the following table:
ID | Amount
AA | 10
AA | 20
BB | 30
BB | 40
CC | 10
CC | 50
DD | 20
DD | 60
EE | 30
EE | 70
I need to get unique entries in each column as in following example:
ID | Amount
AA | 10
BB | 30
CC | 50
DD | 60
EE | 70
So far following snippet gives almost what I wanted, but first_value() may return some value, which isn't unique in current column:
first_value(Amount) over (partition by ID)
Distinct also isn't helpful, as it returns unique rows, not its values
EDIT:
Selection order doesn't matter
This works for me, even with the problematic combinations mentioned by Dimitri. I don't know how fast that is for larger volumes though
with ids as (
select id, row_number() over (order by id) as rn
from data
group by id
), amounts as (
select amount, row_number() over (order by amount) as rn
from data
group by amount
)
select i.id, a.amount
from ids i
join amounts a on i.rn = a.rn;
SQLFiddle currently doesn't work for me, here is my test script:
create table data (id varchar(10), amount integer);
insert into data values ('AA',10);
insert into data values ('AA',20);
insert into data values ('BB',30);
insert into data values ('BB',40);
insert into data values ('CC',10);
insert into data values ('CC',50);
insert into data values ('DD',20);
insert into data values ('DD',60);
insert into data values ('EE',30);
insert into data values ('EE',70);
Output:
id | amount
---+-------
AA | 10
BB | 20
CC | 30
DD | 40
EE | 50
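To see what this pairing does, here is a small sketch that runs the same query in in-memory SQLite from Python (a stand-in for Oracle; window functions need SQLite 3.25 or later) and then checks which of the returned pairs actually occur as rows in the source table. The helper names source_rows, paired and in_source are only for illustration.

```python
import sqlite3

source_rows = [("AA", 10), ("AA", 20), ("BB", 30), ("BB", 40), ("CC", 10),
               ("CC", 50), ("DD", 20), ("DD", 60), ("EE", 30), ("EE", 70)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT, amount INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", source_rows)

# Number the distinct ids and the distinct amounts independently,
# then pair them up by position.
paired = conn.execute(
    """
    WITH ids AS (
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM data GROUP BY id
    ), amounts AS (
        SELECT amount, ROW_NUMBER() OVER (ORDER BY amount) AS rn
        FROM data GROUP BY amount
    )
    SELECT i.id, a.amount
    FROM ids i JOIN amounts a ON i.rn = a.rn
    ORDER BY i.id
    """
).fetchall()

# Pairs that also exist as real rows in the source table.
in_source = [pair for pair in paired if pair in source_rows]

print(paired)
print(in_source)
```

Both columns come out unique, but the ids and amounts are matched purely by rank, so most of the pairs (for example BB with 20) are not rows of the original table. Whether that is acceptable depends on what "unique entries in each column" is meant to guarantee.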
I suggest using row_number() like this:
select ID ,Amount
from (
select ID ,Amount, row_number() over(partition by id order by 1) as rn
from yourtable
)
where rn = 1
However, your expected results don't follow a discernible order: some are the first/lowest amount for an ID while others are the last/highest, so I wasn't sure what to use for the ordering.
My solution uses a recursive WITH and works as follows: first, select the minimal values of ID and amount; then, at each subsequent level, search for values of ID and amount that are greater than those already chosen (this provides uniqueness); finally, the query selects one row for every recursion level. This is not a complete solution, though, because it is possible to find a combination of source data for which the query will not work (I suspect that a fully general solution is impossible, at least in SQL).
with r (id, amount, lvl) as (select min(id), min(amount), 1
from t
union all
select t.id, t.amount, r.lvl + 1
from t, r
where t.id > r.id and t.amount > r.amount)
select lvl, min(id), min(amount)
from r
group by lvl
order by lvl
SQL Fiddle
I knew there was an elegant solution! Thanks to a friend of mine for the tip:
select max(ID), mAmount from (
select ID, max(Amount) mAmount from table group by ID
)
group by mAmount;
Maybe something like this can solve:
WITH tx AS
( SELECT ROWNUM ROW_NUMBER,
t.id,
t.amount
FROM test t
INNER JOIN test t2
ON t.id = t2.id
AND t.amount != t2.amount
ORDER BY t.id)
SELECT tx1.id, tx1.amount
FROM tx tx1
LEFT JOIN tx tx2
ON tx1.id = tx2.id
AND tx1.ROW_NUMBER > tx2.ROW_NUMBER
WHERE tx2.ROW_NUMBER IS NULL

Select from table based on the latest entry in another table in Oracle

I have a WHERE clause in a query that needs to see whether the latest entry in a related table meets certain criteria. However, I'm not able to inject the PK of the top query directly into the clause for a number of different reasons.
Is there any way to rewrite the following query to depend on the outer alias (ie. make ALIAS.pk work)? foo has a composite primary key.
(SELECT CASE WHEN EXISTS (
SELECT * FROM (
SELECT n.val1, n.val2 FROM (
SELECT * FROM foo f
WHERE f.val0 = 100 AND f.outerid = ALIAS.pk
ORDER BY f.date DESC
) n
WHERE n.rownum = 1
) t
WHERE t.val1 = 1 AND t.val2 = 2
) THEN 1 ELSE 0 END FROM dual) = 1
Edit: Outer table (bar):
id name city
1 Bob London
2 Mike Atlanta
3 Susan Toronto
Inner table (foo):
outerid date val1 val2 val100 fk1 fk2 fk3
1 2014-11-11 1 2 100 11 523 15
1 2014-11-11 1 2 101 14 12 87
1 2014-11-10 1 2 100 17 1667 12
2 2014-11-11 1 1 100 91 12 188
The primary key for foo is a composite key over fk1..3.
So what I need is to select the latest entry from foo that corresponds to a certain user and check that it has certain characteristics.
Edit 2:
SELECT CASE WHEN ({inner query})=1 THEN 1 ELSE 0 END WHERE id = 1 should return "1".
SELECT CASE WHEN ({inner query})=1 THEN 1 ELSE 0 END WHERE id = 2 should return "0".
This may give you the output you require:
SELECT b.name
FROM bar b
INNER JOIN
(SELECT DISTINCT
f.outerid
FROM
(SELECT f.outerid
, f.val1
, f.val2
, f.date
, max(f.date) OVER
(PARTITION BY f.outerid) max_date -- no ORDER BY here: it would make this a running max
FROM foo f
WHERE f.val0 = 100) f
WHERE f.date = f.max_date
AND f.val1 = 1
AND f.val2 = 2) f
ON (f.outerid = b.id)
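One detail worth checking when using MAX() OVER for "latest row" logic: an ORDER BY inside the window turns the aggregate into a running maximum, so every row can look like the latest one. The sketch below illustrates this in in-memory SQLite from Python (a stand-in for Oracle, needing SQLite 3.25 or later). The column name entry_date and the rows for outerid 3 are assumptions added for the demonstration; they are not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (outerid INTEGER, entry_date TEXT, "
    "val1 INTEGER, val2 INTEGER, val0 INTEGER)"
)
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?)",
    [(1, "2014-11-11", 1, 2, 100),
     (1, "2014-11-11", 1, 2, 101),
     (1, "2014-11-10", 1, 2, 100),
     (2, "2014-11-11", 1, 1, 100),
     # Hypothetical rows: an older entry matches (1, 2), the latest does not.
     (3, "2014-11-10", 1, 2, 100),
     (3, "2014-11-11", 9, 9, 100)],
)

def latest_matching(window_spec):
    """Return outerids whose 'latest' val0 = 100 row has val1 = 1 and val2 = 2."""
    sql = f"""
        SELECT DISTINCT outerid FROM (
            SELECT outerid, entry_date, val1, val2,
                   MAX(entry_date) OVER ({window_spec}) AS max_date
            FROM foo
            WHERE val0 = 100
        )
        WHERE entry_date = max_date AND val1 = 1 AND val2 = 2
        ORDER BY outerid
    """
    return [row[0] for row in conn.execute(sql)]

# Whole-partition maximum: only the truly latest row per outerid qualifies.
whole_partition = latest_matching("PARTITION BY outerid")
# Running maximum: each row equals the max seen so far, so older rows qualify too.
running = latest_matching("PARTITION BY outerid ORDER BY entry_date")

print(whole_partition)
print(running)
```

With the whole-partition window only outerid 1 is returned, which matches the expectation stated in the question (1 for id 1, 0 for id 2). With the running maximum, the hypothetical outerid 3 is returned as well, even though its latest entry does not meet the criteria.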

Fastest way to find distinct matching records

I have two tables A and B. Both have same structure. We find matching records between these two. Here are the scripts
CREATE TABLE HRS.A
(
F_1 NUMBER(5,0),
F_2 NUMBER(5,0),
F_3 NUMBER(5,0)
);
CREATE TABLE HRS.B
(
F_1 NUMBER(5,0),
F_2 NUMBER(5,0),
F_3 NUMBER(5,0)
);
INSERT INTO hrs.a VALUES (1,1000,2000);
INSERT INTO hrs.a VALUES (2,1100,8000);
INSERT INTO hrs.a VALUES (3,4000,3000);
INSERT INTO hrs.a VALUES (4,2000,5000);
INSERT INTO hrs.a VALUES (5,5000,3000);
INSERT INTO hrs.a VALUES (6,6000,6000);
INSERT INTO hrs.a VALUES (7,3000,7000);
INSERT INTO hrs.a VALUES (8,1100,9000);
INSERT INTO hrs.b VALUES (1,4000,2000);
INSERT INTO hrs.b VALUES (2,6000,8000);
INSERT INTO hrs.b VALUES (3,1000,3000);
INSERT INTO hrs.b VALUES (4,2000,5000);
INSERT INTO hrs.b VALUES (5,8000,3000);
INSERT INTO hrs.b VALUES (6,1100,6000);
INSERT INTO hrs.b VALUES (7,5000,7000);
INSERT INTO hrs.b VALUES (8,1000,9000);
To find matching records
SELECT a.F_1 A_F1, b.F_1 B_F1 FROM HRS.A, HRS.B WHERE A.F_2 = B.F_2
results
A_F1 B_F1
3 1
6 2
1 3
4 4
8 6
2 6
5 7
1 8
Now i want to remove duplicate entries in both columns separately e.g. 1 is repeating in A_F1 (regardless of B_F1) so row # 3(1-3) and 8(1-8) will be removed. Now 6 is repeating in B_F1 (regardless of A_F1) so row # 5(8-6) and 6(2-6) will be removed. Final result should be
A_F1 B_F1
3 1
6 2
4 4
5 7
Now the most important part: these two tables contain 500,000 records each. I was first finding the matching records and inserting them into a temp table, then removing duplicates from the first column, then from the second column, and then selecting everything from the temp table. This is far too slow. How can I achieve this as fast as possible?
Edit # 1
I executed following statements multiple times to generate 4096 records in each table
INSERT INTO hrs.a SELECT F_1 + 1, F_2 + 1, 0 FROM hrs.a;
INSERT INTO hrs.b SELECT F_1 + 1, F_2 + 1, 0 FROM hrs.b;
Now i executed all answers and found these
Rachcha 9.11 secs OK
techdo 1.14 secs OK
Gentlezerg 577 msecs WRONG RESULTS
Justin 218 msecs OK
Even @Justin's query took 37.69 secs for 65,536 records in each table (total = 131,072).
Waiting for more optimized answers, as the actual number of records is 1,000,000 :)
Here is the execution plan of the query based on Justin's answer
Please try:
select A_F1, B_F1 From(
SELECT a.F_1 A_F1, b.F_1 B_F1,
count(*) over (partition by a.F_1 order by a.F_1) C1,
count(*) over (partition by b.F_1 order by b.F_1) C2
FROM HRS.A A, HRS.B B WHERE A.F_2 = B.F_2
)x
where C1=1 and C2=1;
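To confirm what the two COUNT(*) OVER columns do, here is a minimal sketch that loads the question's 16 sample rows into in-memory SQLite from Python and runs the same pattern. SQLite (3.25 or later) is only a stand-in for Oracle, the table names tab_a and tab_b are assumptions, and F_3 is left out because the query never uses it. This checks correctness on the sample only and says nothing about speed at 500,000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_a (F_1 INTEGER, F_2 INTEGER)")
conn.execute("CREATE TABLE tab_b (F_1 INTEGER, F_2 INTEGER)")
conn.executemany(
    "INSERT INTO tab_a VALUES (?, ?)",
    [(1, 1000), (2, 1100), (3, 4000), (4, 2000),
     (5, 5000), (6, 6000), (7, 3000), (8, 1100)],
)
conn.executemany(
    "INSERT INTO tab_b VALUES (?, ?)",
    [(1, 4000), (2, 6000), (3, 1000), (4, 2000),
     (5, 8000), (6, 1100), (7, 5000), (8, 1000)],
)

# c1 counts how often each A-side F_1 appears among the matches,
# c2 does the same for the B side; a pair survives only if both are 1.
unique_pairs = conn.execute(
    """
    SELECT a_f1, b_f1 FROM (
        SELECT a.F_1 AS a_f1, b.F_1 AS b_f1,
               COUNT(*) OVER (PARTITION BY a.F_1) AS c1,
               COUNT(*) OVER (PARTITION BY b.F_1) AS c2
        FROM tab_a a JOIN tab_b b ON a.F_2 = b.F_2
    )
    WHERE c1 = 1 AND c2 = 1
    ORDER BY a_f1
    """
).fetchall()

print(unique_pairs)
```

The surviving pairs are the four from the question's expected result, here sorted by the A-side value.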
How about an INNER JOIN instead? Please check with this query.
select A_F1, B_F1 From(
SELECT a.F_1 A_F1, b.F_1 B_F1,
count(*) over (partition by a.F_1 order by a.F_1) C1,
count(*) over (partition by b.F_1 order by b.F_1) C2
FROM HRS.A A INNER JOIN HRS.B B ON A.F_2 = B.F_2
)x
where C1=1 and C2=1;
Query:
SQLFIDDLEExample
SELECT a.f_1 AS a_f_1,
b.f_1 AS b_f_1
FROM a JOIN b ON a.f_2 = b.f_2
WHERE 1 = (SELECT COUNT(*)
FROM a aa JOIN b bb ON aa.f_2 = bb.f_2
WHERE aa.f_1 = a.f_1 )
AND 1 = (SELECT COUNT(*)
FROM a aa JOIN b bb ON aa.f_2 = bb.f_2
WHERE bb.f_1 = b.f_1 )
Result:
| A_F_1 | B_F_1 |
-----------------
| 3 | 1 |
| 6 | 2 |
| 4 | 4 |
| 5 | 7 |
Building on @techdo's answer, I think this can be better:
select A_F1, B_F1 From(
SELECT a.F_1 A_F1, b.F_1 B_F1,a.F_2,
count(*) OVER(PARTITION BY A.F_2) C
FROM HRS.A A, HRS.B B WHERE A.F_2 = B.F_2
)x
where C=1 ;
The duplicate rows come from repeated f_2 values. This query has only one COUNT(*) OVER, so given that you have a large amount of data, I think it would be a little faster.
I have the answer.
See this fiddle here.
I used the following code:
WITH x AS (SELECT a.f_1 AS a_f_1, b.f_1 AS b_f_1
FROM a JOIN b ON a.f_2 = b.f_2)
SELECT *
FROM x x1
WHERE NOT EXISTS (SELECT 1
FROM x x2
WHERE (x2.a_f_1 = x1.a_f_1
AND x2.b_f_1 != x1.b_f_1)
OR (x2.a_f_1 != x1.a_f_1
AND x2.b_f_1 = x1.b_f_1)
)
;
EDIT
I used the following code, which runs within 14 ms on SQL Fiddle. I removed the common table expression and observed that the query performance improved.
SELECT a1.f_1 AS a_f1, b1.f_1 AS b_f1
FROM a a1 JOIN b b1 ON a1.f_2 = b1.f_2
WHERE NOT EXISTS (SELECT 1
FROM a a2 JOIN b b2 ON a2.f_2 = b2.f_2
WHERE (a2.f_1 = a1.f_1
AND b2.f_1 != b1.f_1)
OR (a2.f_1 != a1.f_1
AND b2.f_1 = b1.f_1))
;
Output:
A_F_1 B_F_1
3 1
6 2
4 4
5 7
Each of these solutions takes a long time; the best one (Justin's) ran for almost 45 minutes without returning for 2 million records. I ended up inserting the matching records into a temp table and then removing duplicates, and I found that much faster than these solutions with this data set.

SELECT only records which must fill two conditions

I have this table:
id type otherid
1 4 1234
2 5 1234
3 4 4321
As you can see there are 3 records; 2 of them belong to otherid "1234" and have types 4 and 5.
The last record belongs to otherid "4321" and has only type 4.
I need to select all otherid values that have type 4 and not type 5.
Example: run against that table, the query should return only record 3.
Thanks
add1:
Please consider the TYPE can be any number from 1 up to 20.
I only need otherid values that have type 4 but not type 5 (apart from that, they can have any other type).
add2:
using mysql 5.1
This is kind of a workaround
SELECT * FROM (
SELECT GROUP_CONCAT('|',type,'|') type,otherid FROM table GROUP BY otherid
) t WHERE type LIKE '%|4|%' AND type NOT LIKE '%|5|%'
You could use a not exists subquery:
select distinct otherid
from YourTable as yt1
where yt1.type = 4
and not exists
(
select *
from YourTable as yt2
where yt1.otherid = yt2.otherid
and yt1.type <> yt2.type -- use this line for any difference
and yt2.type = 5 -- or this line to just exclude 5
)
Another way is by using a left join from where you exclude rows that have both type 4 and 5:
select a.*
from table1 a
left join table1 b on b.otherid = a.otherid and b.type = 5
where a.type = 4 and b.id is null
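Here is a small sketch that runs the LEFT JOIN anti-join on the three sample rows, using in-memory SQLite from Python as a stand-in for MySQL 5.1. It also shows a GROUP BY / HAVING variant, which is not from the answers above and is offered only as an alternative. The table name t is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, type INTEGER, otherid INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 4, 1234), (2, 5, 1234), (3, 4, 4321)],
)

# Anti-join: keep type-4 rows for which no type-5 row shares the otherid.
anti_join_rows = conn.execute(
    """
    SELECT a.id, a.type, a.otherid
    FROM t a
    LEFT JOIN t b ON b.otherid = a.otherid AND b.type = 5
    WHERE a.type = 4 AND b.id IS NULL
    """
).fetchall()

# Alternative (not from the thread): aggregate per otherid and require
# at least one type-4 row and zero type-5 rows. The comparisons evaluate
# to 0 or 1, so SUM() counts the matching rows.
having_rows = conn.execute(
    """
    SELECT otherid
    FROM t
    GROUP BY otherid
    HAVING SUM(type = 4) > 0 AND SUM(type = 5) = 0
    """
).fetchall()

print(anti_join_rows)
print(having_rows)
```

The anti-join returns whole rows (record 3 here), while the HAVING form returns one line per otherid, which may be closer to "select all otherid" when an otherid has several type-4 rows.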