Fetching fields with specific criteria - Oracle - sql

I am trying to extract particular data from 2 tables based on specific criteria. But the result is not as expected. Can someone please help?
Criteria:
Need to fetch id pairs whose type is A alone.
Tables:
Table A
ID1 ID2
579643307310619501 644543316683180704
296151129721950503 328945291791563504
Table B
ID TYPE
579643307310619501 A
579643307310619501 B
579643307310619501 C
644543316683180704 A
296151129721950503 A
328945291791563504 A
Expected Result:
ID1 ID2
296151129721950503 328945291791563504
(Since only this pair is of type A alone, individually)
Note: The IDs, ID1 and ID2 both must be present in ID field of Table B.
What I've tried:
SELECT id1, id2
FROM A
JOIN B ON A.id1 = B.id
WHERE A.id1 IN (SELECT id FROM B)
AND A.id2 IN (SELECT id FROM B)
AND B.type='A'
GROUP BY id1, id2
HAVING count(*)=1;

In the approach below, I use a CTE to first identify all ID values having exclusively the 'A' type. Then I join TableA to this CTE, twice, to filter off any records either of whose ID1 or ID2 values are not in the exclusively 'A' type list.
WITH cte (ID) AS (
SELECT ID
FROM TableB
GROUP BY ID
HAVING SUM(CASE WHEN TYPE <> 'A' THEN 1 ELSE 0 END) = 0
)
SELECT a.ID1, a.ID2
FROM TableA a
INNER JOIN cte t1
ON a.ID1 = t1.ID
INNER JOIN cte t2
ON a.ID2 = t2.ID;
Find below a working demo (for SQL Server - I can't get Oracle to work anywhere).
Demo

Here is an Oracle solution using the MINUS operator.
The top sub-query gets the set of records where both ID1 and ID2 are of type 'A'. The bottom sub-query gets the set of records where either ID1 or ID2 is not of type 'A'. The result is the set of records in the top set which are not also in the bottom set.
select a.id1, a.id2
from a
join b b1 on b1.id = a.id1
join b b2 on b2.id = a.id2
where b1.type = 'A'
and b2.type = 'A'
minus
select a.id1, a.id2
from a
join b b1 on b1.id = a.id1
join b b2 on b2.id = a.id2
where b1.type != 'A'
or b2.type != 'A'
/
This SQL Fiddle demo returns the right row but there's a bit of a problem with its display: for some reason the numbers are rounded down.
Note on performance
This hits table A twice and table B four times. With small tables and a well-sized buffer cache this is not so important.
#TimBiegeleisen uses the WITH clause and that approach only hits each table once. However, Oracle will materialize the CTE as a temporary table. The overhead of doing this for such small amounts of data makes his solution consistently slower than mine. Including an /*+ inline */ hint in the CTE projection prevents Oracle from materializing the temporary table and the performance of the two queries becomes comparable.
However, if the tables become large enough there will be a point at which the WITH clause approach with a materialized temporary table is the more performative approach. As always with query tuning, the specifics matter greatly and benchmarking is the key to success.

Here is a sample for Oracle and my solution.
This is valid for any letter, A, B, C... If you want only for A, add an additional filter in the where of the main query.
create table a (id1 number,id2 number, constraint pk_a primary key(id1,id2));
create table b (id number, type char(1), constraint pk_b primary key(id,type));
insert into a values(57,64);
insert into a values(29,32);
insert into b values(57,'A');
insert into b values(57,'B');
insert into b values(57,'C');
insert into b values(64,'A');
insert into b values(29,'A');
insert into b values(32,'A');
commit;
select a.*
from a, b b1, b b2
where a.id1 = b1.id
and a.id2 = b2.id
and b1.type = b2.type
and not exists (select null
from b b1bis
where b1bis.id = b1.id
and b1.type <> b1bis.type)
and not exists (select null
from b b2bis
where b2bis.id = b2.id
and b2.type <> b2bis.type);

Related

SQL Get rows that doesn't appear in another table

I have this SQL problem: I have tables A and B. Table A has columns id and name, Table B amount and id which is a foreign key to table A.id.
I need to return all table A rows that don't have their id stored in table B. Any ideas?
So the complete opposite is:
SELECT *
FROM a
LEFT OUTER JOIN b ON a.id = b.id;
Here row what I need is left out of result
Just add a where clause:
SELECT a.*
FROM a LEFT OUTER JOIN
b
ON a.id = b.id
WHERE b.id IS NULL;
You can also use NOT EXISTS:
select a.*
from a
where not exists (select 1 from b where b.id = a.id);
In most databases, the two methods typically have similar performance.

Combining four tables in SQL Server

I have four tables Table A, Table B, Table C and Table D. The schema of all four tables are identical. I need to union these four tables in the following way:
If a record is present in Table A then that is considered in the output table.
If a record is present in Table B then it is considered in the output table ONLY if it is not present in Table A.
If a record is present in Table C then it is considered ONLY if it is not present in Table A and Table B.
If a record is present in Table D then it is considered ONLY if it is not present in Table A, Table B, and Table C.
Note -
Every table has a column which identifies the table itself for every record (I don't know if this is of any importance)
Records are identified based on a particular column - Column X which is not unique even within each table
You could do something like (only two cases shown but you should see how to extend this)
WITH CTE1 AS
(
SELECT 't1' as Source, X, Y
FROM t1
UNION ALL
SELECT 't2' as Source, X, Y
FROM t2
), CTE2 AS
(
SELECT *,
RANK() OVER (PARTITION BY X
ORDER BY CASE Source
WHEN 't1' THEN 1
WHEN 't2' THEN 2
END) As RN
FROM CTE1
)
SELECT X,Y
FROM CTE2
WHERE RN=1
I would be inclined to do this using not exists:
select a.*
from a
union all
select b.*
from b
where not exists (select 1 from a where a.x = b.x)
union all
select c.*
from c
where not exists (select 1 from a where a.x = c.x) and
not exists (select 1 from b where b.x = c.x)
union all
select d.*
from d
where not exists (select 1 from a where a.x = d.x) and
not exists (select 1 from b where b.x = d.x) and
not exists (select 1 from c where c.x = d.x);
If you have an index on the x column in each table, then this should be the fastest method.
This will work as long as there are no NULL columns, or if columns for a record that exists in table with higher precedence are NULL you can assume the same column will NULL in tables with lower precedence.
SELECT coalesce(a.column1, b.column1, c.column1, d.column1) column1
,coalesce(a.column2, b.column2, c.column2, d.column2) column2
,coalesce(a.column3, b.column3, c.column3, d.column3) column3
--...
,coalesce(a.columnN, b.columnN, c.columnN, d.columnN) columnN
FROM TableA a
FULL JOIN TableB b on b.ColumnX = a.ColumnX
FULL JOIN TableC c on c.ColumnX = a.ColumnX or c.ColumnX = b.ColumnX
FULL JOIN TableD d on d.ColumnX = a.ColumnX or d.ColumnX = b.ColumnX or d.ColumnX = c.ColumnX
If the NULL values matter, you can switch to a more-complicated (and likely slower) CASE version:
CASE WHEN a.columnX IS NOT NULL THEN a.column1
WHEN b.columnX IS NOT NULL THEN b.column1
WHEN c.columnX IS NOT NULL THEN c.column1
WHEN d.columnX IS NOT NULL THEN d.column1 END column1
Of course, you can mix and match, so columns that are not nullable can use the former syntax, and columns where NULL values matter use the latter.
Hopefully the purpose of this is to fix the broken schema and put this data all in the same table, where it belongs.
This might seem stupid, but if, by any chance, you can leave out the table-identifying column and you also want to eliminate duplicate records (from within one table) too then the most straightforward answer would be
select <all columns without table identifier> from tableA
union
select <all columns without table identifier> from tableB
union
select <all columns without table identifier> from tableC
...
This is exactly, what union was designed to do: add rows only if they do not already exist before.

Select full outer join from many-to-many relationships

I am trying to do something in MSSQL which I suppose is a fairly simple and common thing in any database with many-to-many relationships. However I seem to always end up with a quite complicated select query, I seem to be repeating the same conditions several times to get the desired output.
The scenario is like this. I have 2 tables (table A and B) and a cross table with foreign keys to the ID columns of A and B. There can only be one unique pair of As and Bs in the crosstable (I guess the 2 foreign keys make up a primary key in the cross table ?!?). Data in the three tables could look like this:
TABLE A TABLE B TABLE AB
ID Type ID Type AID BID
--------------------------------------------------
R Up 1 IN R 3
S DOWN 2 IN T 3
T UP 3 OUT T 5
X UP 4 OUT Z 6
Y DOWN 5 IN
Z UP 6 OUT
Now let's say I select all rows in A of type UP and all rows in B of type OUT:
SELECT ID FROM A AS A1
WHERE Type = 'UP'
(Result: R, T, X, Z)
SELECT ID FROM B AS B1
WHERE Type = 'OUT'
(Result: 3, 4, 6)
What I want now is to fully outer join these 2 sub queries based on the relations listed in AB. Hence I want all IDs in A1 and B1 to be listed at least once:
A.ID B.ID
R 3
T 3
null 4
X null
Z 6
From this results set I want to be able to see:
- Which rows in A1 does not relate to any rows in B1
- Which rows in B1 does not relate to any rows in A1
- Relations between rows in A1 and B1
I have tried a couple of things such as:
SELECT A1.ID, B1.ID
FROM (
SELECT * FROM A
WHERE Type = 'UP') AS A1
FULL OUTER JOIN AB ON
A1.ID = AB.AID
FULL OUTER JOIN (
SELECT * FROM B
WHERE Type = 'OUT') AS B1
ON AB.BID = B1.ID
This doesn't work, since some of the relations listed in AB are between rows in A1 and rows NOT IN B1 OR between rows in B1 but NOT IN A1.
In other words - I seem to be forced to create a subquery for the AB table also:
SELECT A1.ID, B1.ID
FROM (
SELECT * FROM A
WHERE Type = 'UP') AS A1
FULL OUTER JOIN (
SELECT * FROM AB AS AB1
WHERE
AID IN (SELECT ID FROM A WHERE type = 'UP') AND
BID IN (SELECT ID FROM B WHERE type = 'OUT')
) AS AB1 ON
A1.ID = AB1.AID
FULL OUTER JOIN (
SELECT * FROM B
WHERE Type = 'OUT') AS B1
ON AB1.BID = B1.ID
That just seems like a rather complicated solution for a seemingly simply problem. Especially when you consider that for A1 and B1 subqueries with more (complex) conditions - possible involving joins to other tables (one-to-many) would require the same temporary joins and conditions to be repeated in the AB1 subquery.
I am thinking that there must be an obvious way to rewrite the above select statements in order to avoid having to repeat the same conditions several times. The solution is probably right there in front me, but I just can't see it.
Any help would be appreciated.
I think you could employ a CTE in this case, like this:
;WITH cte AS (
SELECT A.ID AS AID, A.Type AS AType, B.ID AS BID, B.Type AS BType
FROM A FULL OUTER JOIN AB ON A.ID = AB.AID
FULL OUTER JOIN B ON B.ID = AB.BID)
SELECT AID, BID FROM CTE WHERE AType = 'UP' OR BType = 'OUT'
The advantage of using a CTE is that it will be compiled once. Then you can add additional criteria to the WHERE clause outside the CTE
Check this SQL Fiddle

Query to check the consistency of records

I have four tables
TableA:
id1
id2
id3
value
TableB:
id1
desc
TableC:
id2
desc
TableD:
id3
desc
What I need to do is to check if all combinations of id1 id2 id3 from table B C and D exist in the TableA. In other words, table A should contain all possible combinations of id1 id2 and id3 which are stored in the other three tables.
This query evaluates all combinations of id1, id2 and id3 (the cross join) and finds which combinations are not present in table a.
select b.id1, c.id2, d.id3 from
TableB b cross join TableC c cross join TableD d WHERE NOT EXIST
(select 1 FROM TableA a WHERE a.id1=b.id1 AND a.id2=c.id2 AND a.id3=d.id3)
EDIT: With an RIGHT JOIN
SELECT allPerms.id1, allPerms.id2, allPerms.id3 FROM a RIGHT JOIN (select b.id1, c.id2, d.id3 from
TableB b cross join TableC c cross join TableD) allPerms
ON a.id1=allPerms.id1 AND a.id2=allPerms.id2 AND a.id3=allPerms.id3
WHERE a.id1 IS NULL
The two are pretty much the same. Since we are not actually fetching values from the joined table, some people prefer the first approach, since it captures the intent and spirit of the query. The second version is more "implementation oriented". A good optimizer will produce an efficient plan for both, but on some lesser RDBMSs, the second version will run faster.
With a predefined set for table D - id3 has values (2,5,6)
select b.id1, c.id2 from
TableB b cross join TableC c WHERE NOT EXIST
(select 1 FROM TableA a WHERE a.id1=b.id1 AND a.id2=c.id2 AND a.id3 IN (2,5,6))
But, this doesn't give you the id3 that is missing in the table A row. For that, I think the simplest is to emulate the table via a union, e.g.
select b.id1, c.id2, d.id3 from
TableB b, TableC c, (select 2 id3 union select 5 union select 6) d
WHERE NOT EXIST
(select 1 FROM TableA a WHERE a.id1=b.id1 AND a.id2=c.id2 AND a.id3=d.id3)
(This is still using cross join - it's implied if tables are separated by commas.)

filter duplicates in SQL join

When using a SQL join, is it possible to keep only rows that have a single row for the left table?
For example:
select * from A, B where A.id = B.a_id;
a1 b1
a2 b1
a2 b2
In this case, I want to remove all except the first row, where a single row from A matched exactly 1 row from B.
I'm using MySQL.
This should work in MySQL:
select * from A, B where A.id = B.a_id GROUP BY A.id HAVING COUNT(*) = 1;
For those of you not using MySQL, you will need to use aggregate functions (like min() or max()) on all the columns (except A.id) so your database engine doesn't complain.
It helps if you specify the keys of your tables when asking a question such as this. It isn't obvious from your example what the key of B might be (assuming it has one).
Here's a possible solution assuming that ID is a candidate key of table B.
SELECT *
FROM A, B
WHERE B.id =
(SELECT MIN(B.id)
FROM B
WHERE A.id = B.a_id);
First, I would recommend using the JOIN syntax instead of the outdated syntax of separating tables by commas. Second, if A.id is the primary key of the table A, then you need only inspect table B for duplicates:
Select ...
From A
Join B
On B.a_id = A.id
Where Exists (
Select 1
From B B2
Where B2.a_id = A.id
Having Count(*) = 1
)
This avoids the cost of counting matching rows, which can be expensive for large tables.
As usual, when comparing various possible solutions, benchmarking / comparing the execution plans is suggested.
select
*
from
A
join B on A.id = B.a_id
where
not exists (
select
1
from
B B2
where
A.id = b2.a_id
and b2.id != b.id
)