Adding a flag for common rows between two tables - SQL

I have two tables, say A and B. B is a subset of A. What I want to do is this: add a flag column to table A (only for viewing, not permanently in the table) whose value is YES for rows common to A and B and NO for all other rows. For example:
A table
Column1 Column2 Column3
X1 X2 X3
Y1 Y2 Y3
Z1 Z2 Z3
B is obtained with: select * from A where column1 = 'Y1';
Now my final output should be:
Column1 Column2 Column3 FLAG
X1 X2 X3 NO
Y1 Y2 Y3 YES
Z1 Z2 Z3 NO
I have to do everything (extracting B and adding the flag) in one SQL statement.
I am able to extract B but unable to add the flag.
Using Oracle 11.2.0.2.0, SQL*Plus.

Use an outer join to conditionally link tables A and B, then use a CASE expression to test whether a given row in A matches a row in B.
select a.*
, case when b.column1 is not null then 'YES' else 'NO' end as flag
from a left outer join b
on a.column1 = b.column1
Note that this only works properly when each value of B.COLUMN1 occurs at most once. If B contains multiple rows with the same value of COLUMN1, the join returns the matching row of A once per match; in that case you can use this variant:
select a.*
, case when b.column1 is not null then 'YES' else 'NO' end as flag
from a left outer join ( select distinct column1 from b ) b
on a.column1 = b.column1
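The difference between the two variants above can be checked with a small throwaway script. This is only a sketch: it runs against an in-memory SQLite database through Python's sqlite3 module rather than Oracle, and the duplicated Y1 row in b is made up for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (column1 TEXT, column2 TEXT, column3 TEXT);
    CREATE TABLE b (column1 TEXT, column2 TEXT, column3 TEXT);
    INSERT INTO a VALUES ('X1','X2','X3'), ('Y1','Y2','Y3'), ('Z1','Z2','Z3');
    -- Y1 is inserted twice on purpose (assumption) to show the duplication.
    INSERT INTO b VALUES ('Y1','Y2','Y3'), ('Y1','Y2','Y3');
""")

FLAG = "CASE WHEN b.column1 IS NOT NULL THEN 'YES' ELSE 'NO' END AS flag"

# Plain outer join: a row of a is repeated once per matching row of b.
plain = conn.execute(f"""
    SELECT a.column1, {FLAG}
    FROM a LEFT OUTER JOIN b ON a.column1 = b.column1
    ORDER BY a.column1
""").fetchall()

# Outer join against the distinct keys of b: one output row per row of a.
deduped = conn.execute(f"""
    SELECT a.column1, {FLAG}
    FROM a LEFT OUTER JOIN (SELECT DISTINCT column1 FROM b) b
      ON a.column1 = b.column1
    ORDER BY a.column1
""").fetchall()

print(plain)
print(deduped)
```

With the duplicate in b, the plain join lists Y1 twice, while the DISTINCT variant returns exactly one flagged row per row of a.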

You could try something like this:
SELECT A.*,
CASE WHEN EXISTS
(SELECT B.Column1 FROM B WHERE B.Column1 = A.Column1)
THEN 'YES'
ELSE 'NO'
END AS FLAG
FROM A
My PL-SQL is a bit rusty, example taken from here
You can also do a LEFT JOIN on B, and see if B.Column1 is NULL or not.

SELECT A.*, 'NO' AS FLAG
FROM A
WHERE NOT EXISTS
(SELECT 1 FROM B
WHERE B.COL1 = A.COL1
AND B.COL2 = A.COL2
AND B.COL3 = A.COL3) -- gets records only in A
UNION ALL
SELECT B.*, 'YES' AS FLAG
FROM B -- gets B records, which are a subset of A
Since B is a subset of A - you already know these records should be tagged with a YES for your aliased column.
The classical way of removing records from one recordset where they exist or don't exist in another recordset is of course using the EXISTS clause.
The advantage of EXISTS is that it only tests whether the subquery returns at least one row, so the database can stop at the first match instead of reading every matching row. It is therefore generally faster.
You could also use the MINUS operator, which might be more efficient. Compare the alternatives with EXPLAIN PLAN.

Related

Why 'where' statement seems to filter expected rows in SAS proc SQL?

I full joined 2 tables first and then full joined a 3rd table, now I got 1000 more rows in the result. But when I added a where statement following the join process I can only get 200 more rows and it seems that some expected rows were filtered. I don't know what I've done wrong.
proc sql;
create table ECG.RECON as
select a.SUBJID as SUBJID_004 ,
a.VISIT as VISIT_004,
input(a.EGDAT, yymmdd10.) as EGDAT_004 ,
...
b.SUBJID as SUBJID_001 ,
...
c.DSDECOD
from
SOURCE.A a full join SOURCE.B b on
(a.SUBJID = b.SUBJID and a.VISIT = b.VISIT )
full join SOURCE.C as c on b.SUBJID = c.SUBJID
where c.EPOCH = "SCR" and c.DSDECOD ne "FAILURE" and a.TEST = "Inter";
quit;
Your where clause is causing empty rows to be filtered. Consider a simplified schema:
TableA
Col1 Col2
----------------
1 A
2 B
TableB
Col1 Col2
----------------
1 X
3 Y
And a simple full join with no filter:
SELECT *
FROM TableA AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1
Which will return
A.Col1 A.Col2 B.Col1 B.Col2
---------------------------------------
1 A 1 X
2 B NULL NULL
NULL NULL 3 Y
Now, if you apply a filter to anything from A, e.g. WHERE A.Col1 = 1, you'll get rid of the 2nd row (probably as intended) since 2 <> 1, but you'll also remove the 3rd row: there A.Col1 is NULL, and comparing NULL with 1 evaluates to unknown rather than true, so the row fails the filter. As you have removed all rows with no matching record in TableA, you have effectively turned your full join into a left join. If you then apply a further predicate on TableB, your left join becomes an inner join.
With Full joins, I find the easiest solution is to apply your filters before the join by using subqueries, e.g.:
SELECT *
FROM (SELECT * FROM TableA WHERE Col1 = 1) AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1;
Which removes the 2nd row, but still retains the 3rd row from the previous results:
A.Col1 A.Col2 B.Col1 B.Col2
---------------------------------------
1 A 1 X
NULL NULL 3 Y
You can also use OR, but the more predicates you have the more convoluted this can get, e.g.
SELECT *
FROM TableA AS A
FULL JOIN TableB AS B
ON A.Col1 = B.Col1
WHERE (A.Col1 = 1 OR A.Col1 IS NULL);
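To see the effect of filter placement for yourself, here is a rough sketch using Python's sqlite3 module with the simplified TableA and TableB from this answer. One swap to be aware of: it uses a LEFT JOIN instead of a FULL JOIN, because older SQLite builds do not support FULL JOIN. The NULL behaviour it demonstrates is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Col1 INTEGER, Col2 TEXT);
    CREATE TABLE TableB (Col1 INTEGER, Col2 TEXT);
    INSERT INTO TableA VALUES (1, 'A'), (2, 'B');
    INSERT INTO TableB VALUES (1, 'X'), (3, 'Y');
""")

# A comparison with NULL is neither true nor false; sqlite3 reports it as None.
null_cmp = conn.execute("SELECT NULL = 1, NULL <> 1").fetchone()

# Filter in WHERE: the unmatched row (2, 'B', NULL, NULL) is discarded because
# B.Col2 = 'X' is unknown for it, so the outer join behaves like an inner join.
where_rows = conn.execute("""
    SELECT A.Col1, A.Col2, B.Col1, B.Col2
    FROM TableA AS A
    LEFT JOIN TableB AS B ON A.Col1 = B.Col1
    WHERE B.Col2 = 'X'
    ORDER BY A.Col1
""").fetchall()

# Filter applied before the join, in a subquery: unmatched rows of A survive.
subquery_rows = conn.execute("""
    SELECT A.Col1, A.Col2, B.Col1, B.Col2
    FROM TableA AS A
    LEFT JOIN (SELECT * FROM TableB WHERE Col2 = 'X') AS B
      ON A.Col1 = B.Col1
    ORDER BY A.Col1
""").fetchall()

print(null_cmp, where_rows, subquery_rows)
```

The same reasoning applies to the SAS query in the question: each predicate in its where clause refers to one side of a full join, so rows that are NULL on that side are dropped.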

Fetching fields with specific criteria - Oracle

I am trying to extract particular data from 2 tables based on specific criteria. But the result is not as expected. Can someone please help?
Criteria:
Need to fetch id pairs whose type is A alone.
Tables:
Table A
ID1 ID2
579643307310619501 644543316683180704
296151129721950503 328945291791563504
Table B
ID TYPE
579643307310619501 A
579643307310619501 B
579643307310619501 C
644543316683180704 A
296151129721950503 A
328945291791563504 A
Expected Result:
ID1 ID2
296151129721950503 328945291791563504
(Since only this pair is of type A alone, individually)
Note: The IDs, ID1 and ID2 both must be present in ID field of Table B.
What I've tried:
SELECT id1, id2
FROM A
JOIN B ON A.id1 = B.id
WHERE A.id1 IN (SELECT id FROM B)
AND A.id2 IN (SELECT id FROM B)
AND B.type='A'
GROUP BY id1, id2
HAVING count(*)=1;
In the approach below, I use a CTE to first identify all ID values having exclusively the 'A' type. Then I join TableA to this CTE, twice, to filter off any records either of whose ID1 or ID2 values are not in the exclusively 'A' type list.
WITH cte (ID) AS (
SELECT ID
FROM TableB
GROUP BY ID
HAVING SUM(CASE WHEN TYPE <> 'A' THEN 1 ELSE 0 END) = 0
)
SELECT a.ID1, a.ID2
FROM TableA a
INNER JOIN cte t1
ON a.ID1 = t1.ID
INNER JOIN cte t2
ON a.ID2 = t2.ID;
Find below a working demo (for SQL Server - I can't get Oracle to work anywhere).
Demo
Here is an Oracle solution using the MINUS operator.
The top sub-query gets the set of records where both ID1 and ID2 are of type 'A'. The bottom sub-query gets the set of records where either ID1 or ID2 is not of type 'A'. The result is the set of records in the top set which are not also in the bottom set.
select a.id1, a.id2
from a
join b b1 on b1.id = a.id1
join b b2 on b2.id = a.id2
where b1.type = 'A'
and b2.type = 'A'
minus
select a.id1, a.id2
from a
join b b1 on b1.id = a.id1
join b b2 on b2.id = a.id2
where b1.type != 'A'
or b2.type != 'A'
/
This SQL Fiddle demo returns the right row but there's a bit of a problem with its display: for some reason the numbers are rounded down.
Note on performance
This hits table A twice and table B four times. With small tables and a well-sized buffer cache this is not so important.
@TimBiegeleisen uses the WITH clause, and that approach only hits each table once. However, Oracle will materialize the CTE as a temporary table. The overhead of doing this for such small amounts of data makes his solution consistently slower than mine. Including an /*+ inline */ hint in the CTE projection prevents Oracle from materializing the temporary table, and the performance of the two queries becomes comparable.
However, if the tables become large enough there will be a point at which the WITH clause approach with a materialized temporary table is the more performant approach. As always with query tuning, the specifics matter greatly and benchmarking is the key to success.
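Both approaches so far (the CTE and the MINUS query) can be checked against the sample data from the question. The following is a quick sketch using Python's sqlite3 module, not Oracle. Two assumptions: the tables are named TableA and TableB as in the CTE answer, and MINUS is written as EXCEPT, which is SQLite's spelling of the same set operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID1 INTEGER, ID2 INTEGER);
    CREATE TABLE TableB (ID INTEGER, TYPE TEXT);
    INSERT INTO TableA VALUES
        (579643307310619501, 644543316683180704),
        (296151129721950503, 328945291791563504);
    INSERT INTO TableB VALUES
        (579643307310619501, 'A'),
        (579643307310619501, 'B'),
        (579643307310619501, 'C'),
        (644543316683180704, 'A'),
        (296151129721950503, 'A'),
        (328945291791563504, 'A');
""")

# CTE approach: collect the IDs whose only type is 'A', then join twice.
cte_rows = conn.execute("""
    WITH cte (ID) AS (
        SELECT ID
        FROM TableB
        GROUP BY ID
        HAVING SUM(CASE WHEN TYPE <> 'A' THEN 1 ELSE 0 END) = 0
    )
    SELECT a.ID1, a.ID2
    FROM TableA a
    INNER JOIN cte t1 ON a.ID1 = t1.ID
    INNER JOIN cte t2 ON a.ID2 = t2.ID
""").fetchall()

# Set-difference approach: pairs with an 'A' on both sides, minus pairs
# where either side also has a non-'A' type.
except_rows = conn.execute("""
    SELECT a.ID1, a.ID2
    FROM TableA a
    JOIN TableB b1 ON b1.ID = a.ID1
    JOIN TableB b2 ON b2.ID = a.ID2
    WHERE b1.TYPE = 'A' AND b2.TYPE = 'A'
    EXCEPT
    SELECT a.ID1, a.ID2
    FROM TableA a
    JOIN TableB b1 ON b1.ID = a.ID1
    JOIN TableB b2 ON b2.ID = a.ID2
    WHERE b1.TYPE != 'A' OR b2.TYPE != 'A'
""").fetchall()

print(cte_rows)
print(except_rows)
```

Both queries should return only the second pair. This says nothing about relative performance on Oracle, which still needs to be benchmarked there.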
Here is a sample for Oracle and my solution.
This is valid for any letter, A, B, C... If you want only for A, add an additional filter in the where of the main query.
create table a (id1 number,id2 number, constraint pk_a primary key(id1,id2));
create table b (id number, type char(1), constraint pk_b primary key(id,type));
insert into a values(57,64);
insert into a values(29,32);
insert into b values(57,'A');
insert into b values(57,'B');
insert into b values(57,'C');
insert into b values(64,'A');
insert into b values(29,'A');
insert into b values(32,'A');
commit;
select a.*
from a, b b1, b b2
where a.id1 = b1.id
and a.id2 = b2.id
and b1.type = b2.type
and not exists (select null
from b b1bis
where b1bis.id = b1.id
and b1.type <> b1bis.type)
and not exists (select null
from b b2bis
where b2bis.id = b2.id
and b2.type <> b2bis.type);

Combining four tables in SQL Server

I have four tables: Table A, Table B, Table C and Table D. The schema of all four tables is identical. I need to union these four tables in the following way:
If a record is present in Table A then that is considered in the output table.
If a record is present in Table B then it is considered in the output table ONLY if it is not present in Table A.
If a record is present in Table C then it is considered ONLY if it is not present in Table A and Table B.
If a record is present in Table D then it is considered ONLY if it is not present in Table A, Table B, and Table C.
Note -
Every table has a column which identifies the table itself for every record (I don't know if this is of any importance)
Records are identified based on a particular column - Column X which is not unique even within each table
You could do something like (only two cases shown but you should see how to extend this)
WITH CTE1 AS
(
SELECT 't1' as Source, X, Y
FROM t1
UNION ALL
SELECT 't2' as Source, X, Y
FROM t2
), CTE2 AS
(
SELECT *,
RANK() OVER (PARTITION BY X
ORDER BY CASE Source
WHEN 't1' THEN 1
WHEN 't2' THEN 2
END) As RN
FROM CTE1
)
SELECT X,Y
FROM CTE2
WHERE RN=1
I would be inclined to do this using not exists:
select a.*
from a
union all
select b.*
from b
where not exists (select 1 from a where a.x = b.x)
union all
select c.*
from c
where not exists (select 1 from a where a.x = c.x) and
not exists (select 1 from b where b.x = c.x)
union all
select d.*
from d
where not exists (select 1 from a where a.x = d.x) and
not exists (select 1 from b where b.x = d.x) and
not exists (select 1 from c where c.x = d.x);
If you have an index on the x column in each table, then this should be the fastest method.
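As a sanity check of the precedence logic, here is a small sketch run through Python's sqlite3 module instead of SQL Server. The column names src, x and y and all the sample rows are invented for the test. Note that x is deliberately repeated inside table a, since the question says Column X is not unique within a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (src TEXT, x INTEGER, y TEXT);
    CREATE TABLE b (src TEXT, x INTEGER, y TEXT);
    CREATE TABLE c (src TEXT, x INTEGER, y TEXT);
    CREATE TABLE d (src TEXT, x INTEGER, y TEXT);
    -- Hypothetical data: each x also appears in a lower-precedence table.
    INSERT INTO a VALUES ('a', 1, 'a-1'), ('a', 1, 'a-1bis');
    INSERT INTO b VALUES ('b', 1, 'b-1'), ('b', 2, 'b-2');
    INSERT INTO c VALUES ('c', 2, 'c-2'), ('c', 3, 'c-3');
    INSERT INTO d VALUES ('d', 3, 'd-3'), ('d', 4, 'd-4');
""")

# Each branch keeps a row only if no higher-precedence table has the same x.
rows = conn.execute("""
    SELECT * FROM (
        SELECT a.* FROM a
        UNION ALL
        SELECT b.* FROM b
        WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x = b.x)
        UNION ALL
        SELECT c.* FROM c
        WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x = c.x)
          AND NOT EXISTS (SELECT 1 FROM b WHERE b.x = c.x)
        UNION ALL
        SELECT d.* FROM d
        WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x = d.x)
          AND NOT EXISTS (SELECT 1 FROM b WHERE b.x = d.x)
          AND NOT EXISTS (SELECT 1 FROM c WHERE c.x = d.x)
    )
    ORDER BY x, y
""").fetchall()

for row in rows:
    print(row)
```

Both rows of a with x = 1 are kept, and every other x comes only from the highest-precedence table that contains it.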
This will work as long as there are no NULL columns, or if, whenever a column is NULL for a record in a table with higher precedence, you can assume the same column will be NULL in the tables with lower precedence.
SELECT coalesce(a.column1, b.column1, c.column1, d.column1) column1
,coalesce(a.column2, b.column2, c.column2, d.column2) column2
,coalesce(a.column3, b.column3, c.column3, d.column3) column3
--...
,coalesce(a.columnN, b.columnN, c.columnN, d.columnN) columnN
FROM TableA a
FULL JOIN TableB b on b.ColumnX = a.ColumnX
FULL JOIN TableC c on c.ColumnX = a.ColumnX or c.ColumnX = b.ColumnX
FULL JOIN TableD d on d.ColumnX = a.ColumnX or d.ColumnX = b.ColumnX or d.ColumnX = c.ColumnX
If the NULL values matter, you can switch to a more-complicated (and likely slower) CASE version:
CASE WHEN a.columnX IS NOT NULL THEN a.column1
WHEN b.columnX IS NOT NULL THEN b.column1
WHEN c.columnX IS NOT NULL THEN c.column1
WHEN d.columnX IS NOT NULL THEN d.column1 END column1
Of course, you can mix and match, so columns that are not nullable can use the former syntax, and columns where NULL values matter use the latter.
Hopefully the purpose of this is to fix the broken schema and put this data all in the same table, where it belongs.
This might seem stupid, but if, by any chance, you can leave out the table-identifying column and you also want to eliminate duplicate records (from within one table) too then the most straightforward answer would be
select <all columns without table identifier> from tableA
union
select <all columns without table identifier> from tableB
union
select <all columns without table identifier> from tableC
...
This is exactly what UNION was designed to do: add rows only if they do not already exist in the result.

SQL: How to know if a LEFT JOIN returned a row?

Simple problem. I have a simple SQL as thus...
SELECT a.Col1, a.Col2, XXX
FROM table1 AS a
LEFT JOIN table2 as b
ON b.Key1 = a.Key1
What can I put in place of 'XXX' above to say something like "does a matching row in table2 exist?",
i.e. EXISTS(b) AS YesTable2
I am hoping there is a simpler solution than just using CASE...END statements.
Thanks
You could use ISNULL(b.Key1, 'XXX') Or COALESCE for checking against multiple values for the first non null value.
Pick any column from b that is not allowed to be NULL. If there is a NULL there, the record does not exist. If there is a value there, the record does exist. If every column in b is allowed to be NULL (rare: you should always have something that's not nullable in the primary key) you'll have to build an expression that mimics the JOIN conditions.
I am hoping there is a simpler solution than just using CASE...END statements.
Your guess is spot-on: you can use CASE...END to compare b.Key1 to NULL, like this:
SELECT
a.Col1
, a.Col2
, CASE WHEN b.Key1 IS NOT NULL THEN 1 ELSE 0 END as YesTable2
FROM table1 AS a
LEFT JOIN table2 as b
ON b.Key1 = a.Key1
If you just want to know if a record exists, I would suggest using exists in the select clause:
SELECT a.Col1, a.Col2,
(CASE WHEN EXISTS (SELECT 1 FROM table2 b WHERE b.Key1 = a.Key1)
THEN 1 ELSE 0
END) as ExistsInTable2
FROM table1 a;
This version will guarantee that you do not get duplicated rows if there are multiple matches in the two tables.

filter duplicates in SQL join

When using a SQL join, is it possible to keep only rows that have a single row for the left table?
For example:
select * from A, B where A.id = B.a_id;
a1 b1
a2 b1
a2 b2
In this case, I want to remove all except the first row, where a single row from A matched exactly 1 row from B.
I'm using MySQL.
This should work in MySQL:
select * from A, B where A.id = B.a_id GROUP BY A.id HAVING COUNT(*) = 1;
For those of you not using MySQL (or using MySQL with ONLY_FULL_GROUP_BY enabled, which is the default since 5.7.5), you will need to use aggregate functions (like min() or max()) on all the columns (except A.id) so your database engine doesn't complain.
It helps if you specify the keys of your tables when asking a question such as this. It isn't obvious from your example what the key of B might be (assuming it has one).
Here's a possible solution assuming that ID is a candidate key of table B.
SELECT *
FROM A, B
WHERE B.id =
(SELECT MIN(B.id)
FROM B
WHERE A.id = B.a_id);
First, I would recommend using the JOIN syntax instead of the outdated syntax of separating tables by commas. Second, if A.id is the primary key of the table A, then you need only inspect table B for duplicates:
Select ...
From A
Join B
On B.a_id = A.id
Where Exists (
Select 1
From B B2
Where B2.a_id = A.id
Having Count(*) = 1
)
The query below avoids the cost of counting matching rows, which can be expensive for large tables.
As usual, when comparing various possible solutions, benchmarking / comparing the execution plans is suggested.
select
*
from
A
join B on A.id = B.a_id
where
not exists (
select
1
from
B B2
where
A.id = B2.a_id
and B2.id != B.id
)
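The counting and NOT EXISTS approaches can be compared on a tiny data set. This is only a sketch, using Python's sqlite3 module rather than MySQL, and it assumes that B.id is a non-NULL key of B (the question does not say what the key is). The rows b1, b2 and b3 are made up so that a1 has one match and a2 has two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id TEXT PRIMARY KEY);
    -- Assumption: B.id uniquely identifies a row of B.
    CREATE TABLE B (id TEXT PRIMARY KEY, a_id TEXT);
    INSERT INTO A VALUES ('a1'), ('a2');
    INSERT INTO B VALUES ('b1', 'a1'), ('b2', 'a2'), ('b3', 'a2');
""")

# Counting variant: keep a joined row only if its A row has exactly one match.
counted = conn.execute("""
    SELECT A.id, B.id
    FROM A
    JOIN B ON A.id = B.a_id
    WHERE (SELECT COUNT(*) FROM B AS B2 WHERE B2.a_id = A.id) = 1
""").fetchall()

# NOT EXISTS variant: keep a joined row only if no other B row shares its A.
not_exists = conn.execute("""
    SELECT A.id, B.id
    FROM A
    JOIN B ON A.id = B.a_id
    WHERE NOT EXISTS (
        SELECT 1
        FROM B AS B2
        WHERE B2.a_id = A.id
          AND B2.id != B.id
    )
""").fetchall()

print(counted)
print(not_exists)
```

Both variants keep only the a1 row here. If B.id were nullable or not unique, the NOT EXISTS form could give a different answer from the counting form, so it is worth checking the key first.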