How to subtract the rows of one table from another only once - sql

I'm working on a university project, and I have the following question:
I have 2 tables in an Oracle DB. I need to select the rows from table1 that are not included in table2. The main problem is that each row in table2 may exclude only one row from table1: once it has been used for a match, it can't exclude another. For example:
Table1                  Table2                  ResultTable
id | Number | Letter    id | Number | Letter    id | Number | Letter
____________________    ____________________    ____________________
1    4        S         1    6        G         2    2        P
2    2        P         2    8        B         3    5        B
3    5        B         3    4        S         4    4        S
4    4        S         4    1        A         6    2        P
5    1        A         5    1        H
6    2        P         6    2        X
So, as you can see, if a row from Table1 has a "twin" in Table2, both are excluded.

Probably the most thorough query is this:
SELECT table1.id,
       table1.digit,
       table1.letter
  FROM ( SELECT id,
                digit,
                letter,
                ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) rn
           FROM table1
       ) table1
  LEFT
  JOIN ( SELECT id,
                digit,
                letter,
                ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) rn
           FROM table2
       ) table2
    ON table2.digit = table1.digit
   AND table2.letter = table1.letter
   AND table2.rn = table1.rn
 WHERE table2.id IS NULL
 ORDER
    BY table1.id
;
which gives each record in table1 and table2 a "row number" within its group of "twins". For example, this:
SELECT id,
       digit,
       letter,
       ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) rn
  FROM table1
 ORDER
    BY table1.id
;
returns this:
        ID      DIGIT LETT         RN
---------- ---------- ---- ----------
         1          4 S             1
         2          2 P             1
         3          5 B             1
         4          4 S             2   -- second row with 4 S
         5          1 A             1
         6          2 P             2   -- second row with 2 P
That said, if you know that no (digit, letter) can ever appear more than once in table2, you can simplify this considerably by using EXISTS instead of ROW_NUMBER():
SELECT id,
       digit,
       letter
  FROM table1 table1a
 WHERE EXISTS
        ( SELECT 1
            FROM table1
           WHERE digit = table1a.digit
             AND letter = table1a.letter
             AND id < table1a.id
        )
    OR NOT EXISTS
        ( SELECT 1
            FROM table2
           WHERE digit = table1a.digit
             AND letter = table1a.letter
        )
;
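
The ROW_NUMBER() query above can be checked against the sample data from the question. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is self-contained; window functions need SQLite 3.25 or newer). The column names digit and letter follow the answer rather than the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, digit INTEGER, letter TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, digit INTEGER, letter TEXT);
    INSERT INTO table1 VALUES
        (1, 4, 'S'), (2, 2, 'P'), (3, 5, 'B'),
        (4, 4, 'S'), (5, 1, 'A'), (6, 2, 'P');
    INSERT INTO table2 VALUES
        (1, 6, 'G'), (2, 8, 'B'), (3, 4, 'S'),
        (4, 1, 'A'), (5, 1, 'H'), (6, 2, 'X');
""")

# Number each row within its group of "twins", then anti-join on
# (digit, letter, rn) so one table2 row cancels exactly one table1 row.
query = """
    SELECT t1.id, t1.digit, t1.letter
      FROM (SELECT id, digit, letter,
                   ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) AS rn
              FROM table1) AS t1
      LEFT JOIN (SELECT id, digit, letter,
                        ROW_NUMBER() OVER (PARTITION BY digit, letter ORDER BY id) AS rn
                   FROM table2) AS t2
        ON t2.digit = t1.digit
       AND t2.letter = t1.letter
       AND t2.rn = t1.rn
     WHERE t2.id IS NULL
     ORDER BY t1.id
"""
remaining = conn.execute(query).fetchall()
print(remaining)  # [(2, 2, 'P'), (3, 5, 'B'), (4, 4, 'S'), (6, 2, 'P')]
```

The output matches the ResultTable in the question: only one of the two 4 S rows is cancelled, and both 2 P rows survive because table2 has no 2 P.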

Break it into parts.
Perhaps you have an EOR - Exclusive OR.
So you might have
(condition1
OR
condition2)
AND NOT
(condition1 AND condition2).

Use the Oracle MINUS keyword, which does exactly what you're asking for. See http://oreilly.com/catalog/mastorasql/chapter/ch07.html for more detail.
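
One caveat is worth checking before relying on this: a set-difference operator compares distinct rows, so it does not subtract "only once". The sketch below uses SQLite's EXCEPT through Python's sqlite3 module as a stand-in for Oracle's MINUS (an assumption; the two operators have the same distinct-row semantics, but this is not run against Oracle). The column names digit and letter are borrowed from the first answer.

```python
import sqlite3

# In-memory SQLite database; EXCEPT stands in for Oracle's MINUS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, digit INTEGER, letter TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, digit INTEGER, letter TEXT);
    INSERT INTO table1 VALUES
        (1, 4, 'S'), (2, 2, 'P'), (3, 5, 'B'),
        (4, 4, 'S'), (5, 1, 'A'), (6, 2, 'P');
    INSERT INTO table2 VALUES
        (1, 6, 'G'), (2, 8, 'B'), (3, 4, 'S'),
        (4, 1, 'A'), (5, 1, 'H'), (6, 2, 'X');
""")

# Set difference on (digit, letter): duplicates collapse, and a single
# match in table2 removes every twin from table1.
except_rows = conn.execute("""
    SELECT digit, letter FROM table1
    EXCEPT
    SELECT digit, letter FROM table2
    ORDER BY digit, letter
""").fetchall()
print(except_rows)  # [(2, 'P'), (5, 'B')]
```

On the sample data this yields two rows instead of the four in ResultTable: both 4 S rows disappear, and the two 2 P rows collapse into one. So plain set difference only fits if twins never occur within table1.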

I can't see how to do what you want with one SQL SELECT.
I think you'll need a temporary table, and several statements.
Call it tmpResults, with id1 and id2, which match the id from Table1 and the id from Table2 respectively.
-- get matched rows - this is too many, we'll delete some later.
INSERT INTO tmpResults (id1, id2)
SELECT Table1.id id1, Table2.id id2
  FROM Table1
 INNER JOIN Table2
    ON Table1.Number = Table2.Number AND Table1.Letter = Table2.Letter;

-- Delete where Table1 has matched more than 1 row
DELETE tmpResults
 WHERE rowid IN
       (SELECT tmpResults.RowId
          FROM tmpResults
         INNER JOIN
               (SELECT id1, MAX(id2) id2m FROM tmpResults GROUP BY id1 HAVING count(*) > 1) m1
            ON tmpResults.id1 = m1.id1 AND tmpResults.id2 = m1.id2m );

-- Delete where Table2 has matched more than 1 row
DELETE tmpResults
 WHERE rowid IN
       (SELECT tmpResults.RowId
          FROM tmpResults
         INNER JOIN
               (SELECT MAX(id1) id1m, id2 FROM tmpResults GROUP BY id2 HAVING count(*) > 1) m2
            ON tmpResults.id1 = m2.id1m AND tmpResults.id2 = m2.id2 );

-- now tmpResults should have unique matches only, so we want Table1 where there is no match
SELECT Table1.*
  FROM Table1
  LEFT JOIN tmpResults
    ON Table1.id = tmpResults.id1
 WHERE tmpResults.id2 IS NULL;

Related

Group by and filter on 2 Distinct values in SQL

I have table T1
ID Size
A 1
A 2
A 3
B 3
B 4
C 2
C 4
I want to group by ID and filter the smallest size for each ID
Desired outcome:
A 1
B 3
C 2
I tried doing something like this:
SELECT ID, Size
FROM T1
WHERE ID IN (SELECT DISTINCT ID FROM T1)
You want a basic GROUP BY query:
SELECT ID, MIN(Size) AS Size
FROM T1
GROUP BY ID;
SELECT T1.id, MIN(T1.SIZE) AS MinimumSize FROM T1 GROUP BY T1.ID
Maybe this gives you the solution you're looking for.
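
The GROUP BY answers can be tried on the question's data. This is a small sketch, assuming Python's sqlite3 module as the database (any engine should behave the same for this query); the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database holding the T1 rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID TEXT, Size INTEGER);
    INSERT INTO T1 VALUES
        ('A', 1), ('A', 2), ('A', 3),
        ('B', 3), ('B', 4),
        ('C', 2), ('C', 4);
""")

# One output row per ID, carrying the smallest Size in that group.
smallest = conn.execute(
    "SELECT ID, MIN(Size) AS Size FROM T1 GROUP BY ID ORDER BY ID"
).fetchall()
print(smallest)  # [('A', 1), ('B', 3), ('C', 2)]
```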

SQL: How do I combine tables on a single but non-unique identifier?

I have two tables:
TABLE 1
ID Value ValueFromTable2
1 A NULL
1 B NULL
1 C NULL
1 D NULL
2 E NULL
2 F NULL
TABLE 2
ID Value
1 A1
1 A2
1 A3
2 BOB
2 JIM
I would like to update TABLE 1 with the values of TABLE 2 such that the following rows would result:
TABLE 1
ID Value ValueFromTable2
1 A A1
1 B A2
1 C A3
1 D NULL
2 E BOB
2 F JIM
Order is not terribly important. That is, I'm not concerned that A be paired with A1 or that B be paired with A2. I just need a full set of data from the Value column in Table 2 to be available from Table 1.
Please advise!
You need a key for joining them. The implicit key is the ordering. You can add that in explicitly, using row_number():
select coalesce(t1.id, t2.id) as id,
       t1.value, t2.value
from (select t1.*, row_number() over (partition by id order by (select NULL)) as seqnum
      from table1 t1
     ) t1 full outer join
     (select t2.*, row_number() over (partition by id order by (select NULL)) as seqnum
      from table2 t2
     ) t2
     on t1.id = t2.id and t1.seqnum = t2.seqnum;
By using full outer join, all values will appear, regardless of which is the longer list.
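
The pairing idea can be sketched on the question's data. This variation makes two swaps, both assumptions for the sake of a portable, deterministic example: it uses a LEFT JOIN from table1 instead of a FULL OUTER JOIN (enough here because every table1 row must appear and table2 never has the longer list), and it orders by value instead of (select NULL). It runs on Python's sqlite3 module and needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

# In-memory SQLite database holding the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, value TEXT);
    CREATE TABLE table2 (id INTEGER, value TEXT);
    INSERT INTO table1 VALUES
        (1, 'A'), (1, 'B'), (1, 'C'), (1, 'D'), (2, 'E'), (2, 'F');
    INSERT INTO table2 VALUES
        (1, 'A1'), (1, 'A2'), (1, 'A3'), (2, 'BOB'), (2, 'JIM');
""")

# Give each row a position within its id, then join on (id, position).
# LEFT JOIN keeps table1 rows that have no partner (value D below).
paired = conn.execute("""
    SELECT t1.id, t1.value, t2.value
      FROM (SELECT id, value,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS seqnum
              FROM table1) AS t1
      LEFT JOIN (SELECT id, value,
                        ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS seqnum
                   FROM table2) AS t2
        ON t1.id = t2.id AND t1.seqnum = t2.seqnum
     ORDER BY t1.id, t1.seqnum
""").fetchall()
for row in paired:
    print(row)
```

The unmatched row for D comes back with None in the third column, which corresponds to the NULL in the desired result.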

left join without duplicate values using MIN()

I have a table_1:
id custno
1 1
2 2
3 3
and a table_2:
id custno qty descr
1 1 10 a
2 1 7 b
3 2 4 c
4 3 7 d
5 1 5 e
6 1 5 f
When I run this query to show the minimum order quantities from every customer:
SELECT DISTINCT table_1.custno,table_2.qty,table_2.descr
FROM table_1
LEFT OUTER JOIN table_2
ON table_1.custno = table_2.custno AND qty = (SELECT MIN(qty) FROM table_2
WHERE table_2.custno = table_1.custno )
Then I get this result:
custno qty descr
1 5 e
1 5 f
2 4 c
3 7 d
Customer 1 appears twice each time with the same minimum qty (& a different description) but I only want to see customer 1 appear once. I don't care if that is the record with 'e' as a description or 'f' as a description.
First of all... I'm not sure why you need to include table_1 in the queries to begin with:
select custno, min(qty) as min_qty
from table_2
group by custno;
But just in case there is other information that you need that wasn't included in the question:
select table_1.custno, ifnull(min(qty),0) as min_qty
from table_1
left outer join table_2
on table_1.custno = table_2.custno
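group by table_1.custno;
"Generic" SQL way (note that TOP 1 is SQL Server syntax; other databases use LIMIT 1 or FETCH FIRST 1 ROW ONLY instead):
SELECT table_1.custno, table_2.qty, table_2.descr
FROM table_1, table_2
WHERE table_2.id = (SELECT TOP 1 id
                    FROM table_2
                    WHERE custno = table_1.custno
                    ORDER BY qty )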
SQL 2008 way (probably faster):
SELECT custno, qty, descr
FROM
(SELECT
custno,
qty,
descr,
ROW_NUMBER() OVER (PARTITION BY custno ORDER BY qty) RowNum
FROM table_2
) A
WHERE RowNum = 1
If you use SQL-Server you could use ROW_NUMBER and a CTE:
WITH CTE AS
(
SELECT table_1.custno,table_2.qty,table_2.descr,
RN = ROW_NUMBER() OVER ( PARTITION BY table_1.custno
Order By table_2.qty ASC)
FROM table_1
LEFT OUTER JOIN table_2
ON table_1.custno = table_2.custno
)
SELECT custno, qty,descr
FROM CTE
WHERE RN = 1
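
The ROW_NUMBER() approach can be checked against the sample rows. This is a minimal sketch, assuming Python's sqlite3 module in place of SQL Server (window functions need SQLite 3.25 or newer). The extra id in the ORDER BY is an addition, not part of the answers above: it makes the choice between the tied rows e and f deterministic.

```python
import sqlite3

# In-memory SQLite database holding table_2 from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_2 (id INTEGER PRIMARY KEY, custno INTEGER,
                          qty INTEGER, descr TEXT);
    INSERT INTO table_2 VALUES
        (1, 1, 10, 'a'), (2, 1, 7, 'b'), (3, 2, 4, 'c'),
        (4, 3, 7, 'd'), (5, 1, 5, 'e'), (6, 1, 5, 'f');
""")

# Rank each customer's rows by qty (ties broken by id) and keep rank 1,
# so every customer appears exactly once even when the minimum is tied.
min_orders = conn.execute("""
    SELECT custno, qty, descr
      FROM (SELECT custno, qty, descr,
                   ROW_NUMBER() OVER (PARTITION BY custno ORDER BY qty, id) AS row_num
              FROM table_2) AS ranked
     WHERE row_num = 1
     ORDER BY custno
""").fetchall()
print(min_orders)  # [(1, 5, 'e'), (2, 4, 'c'), (3, 7, 'd')]
```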

how to detect a faulty sequence column with sql?

I have this table
ID | Seq
------------
A 1
A 2
A 3
B 1
B 2
B 3
B 3 <--duplicate seq where ID=B
C 1
C 2
C 4 <--missing seq id number 3
D 1
D 2
. .
. .
Is there a way to detect if/when there is an error in the logic of the Seq column, specifically if there are jumps and/or duplicates.
Try this; it should work in both SQL Server and Oracle:
select ID, seq
from (
    select ID, seq,
           row_number() over (partition by id order by seq) rn
    from t_seq
) a
where a.seq <> a.rn
Both of these queries are vendor-agnostic SQL, so they should work in just about any RDBMS.
This will check for a break in the sequence:
select t1.id, t1.seq
from t_seq t1
where
    t1.seq <> 1
    and not exists (
        select *
        from t_seq t2
        where t2.id = t1.id
          and t2.seq = t1.seq - 1
    )
This will check for duplicates:
select t1.id, t1.seq
from t_seq t1
group by t1.id, t1.seq
having count(*) > 1
To get the duplicates you can use the following T-SQL.
SELECT ID, Seq FROM MyTable GROUP BY ID, Seq HAVING COUNT(Seq) > 1
Edit
To find the missing sequence numbers, I have updated the code provided by njr101 as follows:
SELECT ID, Seq FROM MyTable t1 WHERE ID IN (
SELECT ID FROM MyTable
GROUP BY ID
HAVING COUNT(DISTINCT Seq) <> MAX(Seq)
) AND t1.seq <> 1 AND NOT EXISTS (
SELECT * FROM MyTable t2 WHERE t2.id=t1.id AND t2.seq = t1.seq - 1
)
ORDER BY ID
The first subquery counts the number of distinct Seq values for each ID (ignoring duplicates). If that number is the same as the maximum Seq for that ID, the values should be fine for that ID. If it is not equal, the ID is returned by the subquery.
The second part (with the help of njr101's query), filters the result set to only contain the last ID and seq where missing values should be inserted. Results below:
My Data
=========
A 1
A 2
A 3
A 20 <--- Missing (displayed in results)
B 1
B 2
B 3
B 3
B 4
C 1
C 2
C 4 <--- Missing (displayed in results)
C 5
C 15 <--- Missing (displayed in results)
C 16
Results
=======
A 20
C 4
C 15
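
The three checks discussed in these answers can be run side by side on the rows from the question. This is a minimal sketch, assuming Python's sqlite3 module as the database (SQLite 3.25 or newer for ROW_NUMBER()) and the table name t_seq from the first answer.

```python
import sqlite3

# In-memory SQLite database with the question's rows: B has a duplicate 3,
# C is missing 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_seq (id TEXT, seq INTEGER);
    INSERT INTO t_seq VALUES
        ('A', 1), ('A', 2), ('A', 3),
        ('B', 1), ('B', 2), ('B', 3), ('B', 3),
        ('C', 1), ('C', 2), ('C', 4),
        ('D', 1), ('D', 2);
""")

# Duplicates: any (id, seq) pair that occurs more than once.
duplicates = conn.execute("""
    SELECT id, seq FROM t_seq
     GROUP BY id, seq HAVING COUNT(*) > 1
     ORDER BY id, seq
""").fetchall()

# Gaps: rows other than seq = 1 whose predecessor seq - 1 is absent.
gaps = conn.execute("""
    SELECT t1.id, t1.seq FROM t_seq t1
     WHERE t1.seq <> 1
       AND NOT EXISTS (SELECT 1 FROM t_seq t2
                        WHERE t2.id = t1.id AND t2.seq = t1.seq - 1)
     ORDER BY t1.id, t1.seq
""").fetchall()

# Combined check: seq should equal its position within the id.
mismatches = conn.execute("""
    SELECT id, seq
      FROM (SELECT id, seq,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) AS rn
              FROM t_seq) AS a
     WHERE a.seq <> a.rn
     ORDER BY id, seq
""").fetchall()

print(duplicates)  # [('B', 3)]
print(gaps)        # [('C', 4)]
print(mismatches)  # [('B', 3), ('C', 4)]
```

Note that the ROW_NUMBER() check flags every row from the first anomaly onward within an id, so on longer sequences it reports more rows than the two targeted queries do.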

SQL: Outputting Multiple Rows When Joining From Same Table

My question is this: Is it possible to output multiple rows when joining from the same table?
With this code, for example, I would like to output 2 rows, one for each table alias. Instead, it gives me 1 row with all of the data.
SELECT t1.*, t2.*
FROM table t1
JOIN table t2
ON t2.id = t1.oldId
WHERE t1.id = '1'
UPDATE
Well the problem that I have with the UNION/UNION ALL is this: I don't know what the t1.oldId value is equal to. All I know is the id for t1. I am trying to avoid using 2 queries so is there a way I could do something like this:
SELECT t1.*
FROM table t1
WHERE t1.id = '1'
UNION
SELECT t2.*
FROM table t2
WHERE t2.id = t1.oldId
SAMPLE DATA
messages_users
id message_id user_id box thread_id latest_id
--------------------------------------------------------
8 1 1 1 NULL NULL
9 2 1 2 NULL 16
10 2 65 1 NULL 15
11 3 65 2 2 NULL
12 3 1 1 2 NULL
13 4 1 2 2 NULL
14 4 65 1 2 NULL
15 5 65 2 2 NULL
16 6 1 1 2 NULL
Query:
SELECT mu.id FROM messages_users mu
JOIN messages_users mu2 ON mu2.latest_id IS NOT NULL
WHERE mu.user_id = '1' AND mu2.user_id = '1' AND ((mu.box = '1'
AND mu.thread_id IS NULL AND mu.latest_id IS NULL) OR mu.id = mu2.latest_id)
This query fixes my problem. But it seems the answer to my question is to not use a JOIN but a UNION.
You mean one row for t1 and one row from t2?
You're looking for UNION, not JOIN.
select * from table where id = 1
union
select * from table where oldid = 1
If you are trying to multiply rows in a table, you need UNION ALL (not UNION):
select *
from ((select * from t) union all
(select * from t)
) t
I also sometimes use a cross join to do this:
select *
from t cross join
(select 1 as seqnum union all select 2) vals
The cross join explicitly multiplies the number of rows, in this case with a sequence number attached.
Well, since it's the same table, you could do:
SELECT t2.*
FROM table t1
JOIN table t2
ON t2.id = t1.oldId
OR t2.id = t1.id
WHERE t1.id = '1'
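
For the UNION variant in the question's update, the second SELECT cannot see t1, but a scalar subquery can look up oldId when only the id is known. The following is a sketch under assumptions: the table name items, its label column, and its three rows are hypothetical (the question's table is literally called table, which is a reserved word), and it runs on Python's sqlite3 module.

```python
import sqlite3

# In-memory SQLite database; "items" and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, oldId INTEGER, label TEXT);
    INSERT INTO items VALUES
        (1, 7, 'current'),
        (7, NULL, 'previous'),
        (9, NULL, 'unrelated');
""")

# First branch: the row we know by id. Second branch: the row its oldId
# points to, found with a scalar subquery instead of a reference to t1.
both_rows = conn.execute("""
    SELECT id, oldId, label FROM items WHERE id = 1
    UNION ALL
    SELECT id, oldId, label FROM items
     WHERE id = (SELECT oldId FROM items WHERE id = 1)
    ORDER BY id
""").fetchall()
print(both_rows)  # [(1, 7, 'current'), (7, None, 'previous')]
```

This returns the two rows as separate result rows in a single statement, which is what the join with the OR condition above also achieves.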