Issues with SQL Select utilizing Except and UNION All - sql

Select *
From (
    Select a
    Except
    Select b
) x
UNION ALL
Select *
From (
    Select b
    Except
    Select a
) y
This SQL statement returns far fewer rows than expected. If Select a alone returns a million rows, how can the entire statement return only 100,000? In this instance, Select a and Select b return mutually exclusive data, so the Except should not eliminate any rows.

As already stated in the comments, EXCEPT does an implicit DISTINCT, and the ALL in your UNION ALL cannot re-create the duplicates it removed. Hence you cannot use your approach if you want to keep duplicates.
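The effect of the implicit DISTINCT can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module; the single-column tables a and b and their values are made up for illustration and stand in for the question's Select a and Select b.

```python
import sqlite3

# Hypothetical single-column tables; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a VALUES (1), (1), (1), (2);  -- the value 1 appears three times
    INSERT INTO b VALUES (3), (3);            -- disjoint from a
""")

# Same shape as the query in the question: two EXCEPTs glued with UNION ALL.
symmetric_difference = conn.execute("""
    SELECT v FROM (SELECT v FROM a EXCEPT SELECT v FROM b)
    UNION ALL
    SELECT v FROM (SELECT v FROM b EXCEPT SELECT v FROM a)
""").fetchall()

# Number of rows actually stored in both tables together.
total_rows = conn.execute(
    "SELECT (SELECT COUNT(*) FROM a) + (SELECT COUNT(*) FROM b)"
).fetchone()[0]

print(len(symmetric_difference), "rows returned out of", total_rows)
```

Although the two tables share no values, the result has one row per distinct value rather than one row per stored row, which matches the shrinkage described in the question.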
As you want to get the data that is contained in exactly one of the tables a and b, but not in both, a more efficient way to achieve that would be the following (I am just assuming the tables have columns id and c where id is the primary key, as you did not state any column names):
SELECT CASE WHEN a.id IS NULL THEN 'from b' ELSE 'from a' END as source_table
,coalesce(a.id, b.id) as id
,coalesce(a.c, b.c) as c
FROM a
FULL OUTER JOIN b ON a.id = b.id AND a.c = b.c -- use all columns of both tables here!
WHERE a.id IS NULL OR b.id IS NULL
This makes use of a FULL OUTER JOIN, excluding the matching records via the WHERE conditions, as the primary key cannot be null except if it comes from the OUTER side.
If your tables do not have primary keys - which is bad practice anyway - you would have to check across all columns for NULL, not just the one primary key column.
And if you have records consisting entirely of NULLs, this method will not work.
In that case you could use an approach similar to your original one, but with NOT EXISTS instead of EXCEPT:
SELECT ...
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE <join by all columns>)
UNION ALL
SELECT ...
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE <join by all columns>)
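As a rough check that the NOT EXISTS variant keeps duplicates, here is a small sketch using Python's sqlite3 module. The columns id and c follow the assumption made in this answer; the sample rows are invented. Note that a plain = comparison never matches NULL to NULL, so nullable columns would need a null-safe comparison in the join condition.

```python
import sqlite3

# Hypothetical tables with the assumed columns id and c; no primary key,
# so that duplicate rows are possible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, c TEXT);
    CREATE TABLE b (id INTEGER, c TEXT);
    INSERT INTO a VALUES (1, 'x'), (1, 'x'), (2, 'y');
    INSERT INTO b VALUES (2, 'y'), (3, 'z'), (3, 'z');
""")

# Rows of a without a match in b, plus rows of b without a match in a.
# NOT EXISTS only filters rows; it never collapses duplicates.
rows = conn.execute("""
    SELECT 'a' AS src, id, c FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id AND b.c = a.c)
    UNION ALL
    SELECT 'b' AS src, id, c FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.id = b.id AND a.c = b.c)
""").fetchall()

for row in sorted(rows):
    print(row)
```

Both copies of (1, 'x') and both copies of (3, 'z') survive, while the shared row (2, 'y') is dropped from both sides.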

If you're trying to get any data that is in one table and not in the other regardless of which table, I would try something like the following:
select id, 'table a data not in b' from a where id not in (select id from b)
union
select id, 'table b data not in a' from b where id not in (select id from a)
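One caveat with NOT IN is worth checking before relying on it: if the subquery returns even a single NULL, the NOT IN predicate is never true and no rows come back. The sketch below, using Python's sqlite3 module with invented data, contrasts it with NOT EXISTS; whether id can be NULL in the real tables is an assumption the question does not settle.

```python
import sqlite3

# Hypothetical data: table b contains a NULL id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (NULL);
""")

# "1 NOT IN (2, NULL)" evaluates to NULL rather than true, so id 1 is lost.
not_in_rows = conn.execute(
    "SELECT id FROM a WHERE id NOT IN (SELECT id FROM b)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists_rows = conn.execute(
    "SELECT id FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)"
).fetchall()

print("NOT IN:", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

If id is declared NOT NULL in both tables, the two forms return the same rows.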

Related

SQL query to append values not contained in second table

I have table A and table B with different numbers of columns, but both contain a column with IDs. Table A contains a more complete list of IDs, and table B contains some of the IDs from table A.
I would like to return table B with its original information plus appended rows for the IDs that are missing in B but contained in A. For these appended rows, the ID column should contain the missing ID value and the other columns should be blank.
A simple solution is UNION ALL with NOT EXISTS:
select b.id, b.c1, ..., b.cn
from b
UNION ALL
select distinct a.id, null, ..., null -- should be same number of columns as in the above select
from a
where not exists (select 1 from b where b.id = a.id)
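To see the shape of the result, here is a minimal sketch with Python's sqlite3 module. The column names c1, c2 and extra and all the sample rows are hypothetical; only the id column comes from the question.

```python
import sqlite3

# Hypothetical schemas: a has one extra column, b has two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, extra TEXT);
    CREATE TABLE b (id INTEGER, c1 TEXT, c2 TEXT);
    INSERT INTO a VALUES (1, 'p'), (2, 'q'), (3, 'r'), (3, 'r2');
    INSERT INTO b VALUES (1, 'one', 'uno'), (2, 'two', 'dos');
""")

# All rows of b, followed by one NULL-padded row per id that only a has.
rows = conn.execute("""
    SELECT b.id, b.c1, b.c2 FROM b
    UNION ALL
    SELECT DISTINCT a.id, NULL, NULL FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
    ORDER BY 1
""").fetchall()

for row in rows:
    print(row)
```

The id 3 occurs twice in a but is appended only once, because of the DISTINCT in the second branch.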
I think you described left join:
select *
from b left join
a
using (id)

Value present in more than one table

I have 3 tables. All of them have a column named id. I want to find out whether any value is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in both a and b, there is a problem. The query can/should exit at the first such occurrence; there is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged, and there is no need to check for existence in the third table or to check whether there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
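The role of the DISTINCT in each branch is easier to see with data. The sketch below uses Python's sqlite3 module and invented rows: id 3 is shared between a and b, while 1 and 4 are merely repeated within a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1), (1), (3);  -- 1 is duplicated inside a only
    INSERT INTO b VALUES (2), (3);       -- 3 is shared with a
    INSERT INTO c VALUES (4), (4);       -- 4 is duplicated inside c only
""")

# De-duplicate per table first, then count in how many tables each id occurs.
shared_ids = conn.execute("""
    SELECT id FROM (
        SELECT DISTINCT id FROM a
        UNION ALL
        SELECT DISTINCT id FROM b
        UNION ALL
        SELECT DISTINCT id FROM c
    ) x
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

# For contrast: without DISTINCT, in-table duplicates are flagged as well.
without_distinct = conn.execute("""
    SELECT id FROM (
        SELECT id FROM a UNION ALL SELECT id FROM b UNION ALL SELECT id FROM c
    ) x
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

print(sorted(shared_ids), sorted(without_distinct))
```

Only id 3 is reported by the first query; the second query also reports 1 and 4, which the question explicitly allows.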
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
Quoting the question: "It is OK to have duplicates in the same table." and "The query can/should exit at the first such occurrence."
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
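The pairwise EXISTS idea can be tried out as follows. This is a sketch using Python's sqlite3 module rather than PostgreSQL; the helper names make_db and has_shared_id are hypothetical, and the query returns the boolean directly as a scalar instead of selecting a marker string.

```python
import sqlite3

def make_db(a_ids, b_ids, c_ids):
    """Build an in-memory database with tables a, b, c holding the given ids."""
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", a_ids), ("b", b_ids), ("c", c_ids)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    return conn

def has_shared_id(conn):
    """Return True if any id occurs in at least two of the three tables."""
    row = conn.execute("""
        SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
            OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
            OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
    """).fetchone()
    return bool(row[0])

# Duplicates inside one table are fine; only cross-table matches count.
clean = make_db([1, 1], [2], [3, 3])
dirty = make_db([1, 1], [2], [2, 3])  # id 2 is in both b and c

print(has_shared_id(clean), has_shared_id(dirty))
```

Each EXISTS can stop at its first matching pair, which fits the requirement that a single offending value is enough.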

Combining four tables in SQL Server

I have four tables: Table A, Table B, Table C and Table D. The schemas of all four tables are identical. I need to union these four tables in the following way:
If a record is present in Table A then that is considered in the output table.
If a record is present in Table B then it is considered in the output table ONLY if it is not present in Table A.
If a record is present in Table C then it is considered ONLY if it is not present in Table A and Table B.
If a record is present in Table D then it is considered ONLY if it is not present in Table A, Table B, and Table C.
Note -
Every table has a column which identifies the table itself for every record (I don't know if this is of any importance)
Records are identified based on a particular column - Column X which is not unique even within each table
You could do something like (only two cases shown but you should see how to extend this)
WITH CTE1 AS
(
    SELECT 't1' AS Source, X, Y
    FROM t1
    UNION ALL
    SELECT 't2' AS Source, X, Y
    FROM t2
), CTE2 AS
(
    SELECT *,
           RANK() OVER (PARTITION BY X
                        ORDER BY CASE Source
                                     WHEN 't1' THEN 1
                                     WHEN 't2' THEN 2
                                 END) AS RN
    FROM CTE1
)
SELECT X, Y
FROM CTE2
WHERE RN = 1
I would be inclined to do this using not exists:
select a.*
from a
union all
select b.*
from b
where not exists (select 1 from a where a.x = b.x)
union all
select c.*
from c
where not exists (select 1 from a where a.x = c.x) and
not exists (select 1 from b where b.x = c.x)
union all
select d.*
from d
where not exists (select 1 from a where a.x = d.x) and
not exists (select 1 from b where b.x = d.x) and
not exists (select 1 from c where c.x = d.x);
If you have an index on the x column in each table, then this should be the fastest method.
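A small run of the NOT EXISTS cascade shows the precedence rule in action. This sketch uses Python's sqlite3 module instead of SQL Server; the second column y and all sample rows are invented, and x plays the role of Column X from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y TEXT);
    CREATE TABLE b (x INTEGER, y TEXT);
    CREATE TABLE c (x INTEGER, y TEXT);
    CREATE TABLE d (x INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'a1'), (1, 'a1-dup');  -- x is not unique within a
    INSERT INTO b VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO c VALUES (2, 'c2'), (3, 'c3');
    INSERT INTO d VALUES (3, 'd3'), (4, 'd4');
""")

# Each table contributes only the x values that no earlier table has.
rows = conn.execute("""
    SELECT x, y FROM a
    UNION ALL
    SELECT x, y FROM b
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x = b.x)
    UNION ALL
    SELECT x, y FROM c
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x = c.x)
      AND NOT EXISTS (SELECT 1 FROM b WHERE b.x = c.x)
    UNION ALL
    SELECT x, y FROM d
    WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.x = d.x)
      AND NOT EXISTS (SELECT 1 FROM b WHERE b.x = d.x)
      AND NOT EXISTS (SELECT 1 FROM c WHERE c.x = d.x)
""").fetchall()

for row in sorted(rows):
    print(row)
```

Both rows with x = 1 come from a, and b, c and d each contribute only the x value that no higher-precedence table contains.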
This will work as long as there are no NULL columns, or as long as you can assume that whenever a column is NULL for a record in a higher-precedence table, the same column is also NULL in the lower-precedence tables.
SELECT coalesce(a.column1, b.column1, c.column1, d.column1) column1
,coalesce(a.column2, b.column2, c.column2, d.column2) column2
,coalesce(a.column3, b.column3, c.column3, d.column3) column3
--...
,coalesce(a.columnN, b.columnN, c.columnN, d.columnN) columnN
FROM TableA a
FULL JOIN TableB b on b.ColumnX = a.ColumnX
FULL JOIN TableC c on c.ColumnX = a.ColumnX or c.ColumnX = b.ColumnX
FULL JOIN TableD d on d.ColumnX = a.ColumnX or d.ColumnX = b.ColumnX or d.ColumnX = c.ColumnX
If the NULL values matter, you can switch to a more-complicated (and likely slower) CASE version:
CASE WHEN a.columnX IS NOT NULL THEN a.column1
WHEN b.columnX IS NOT NULL THEN b.column1
WHEN c.columnX IS NOT NULL THEN c.column1
WHEN d.columnX IS NOT NULL THEN d.column1 END column1
Of course, you can mix and match, so columns that are not nullable can use the former syntax, and columns where NULL values matter use the latter.
Hopefully the purpose of this is to fix the broken schema and put this data all in the same table, where it belongs.
This might seem stupid, but if, by any chance, you can leave out the table-identifying column and you also want to eliminate duplicate records (from within one table) too then the most straightforward answer would be
select <all columns without table identifier> from tableA
union
select <all columns without table identifier> from tableB
union
select <all columns without table identifier> from tableC
...
This is exactly what union was designed to do: add rows only if they do not already exist in the result.

Include table name in column from select wildcard sql

Is it possible to include table name in the returned column if I use wildcard to select all columns from tables?
To explain further: suppose I want to join two tables, and both tables have a column called "name" as well as many other columns. I want to use a wildcard to select all columns rather than explicitly specifying each column name in the select.
Select *
From
TableA a,
TableB b
Where
a.id = b.id
Instead of seeing two columns with the same name "name", could I write SQL that returns one column named "a.name" (or TableA.name) and one named "b.name" (or TableB.name) without explicitly listing the column names in the select?
I would prefer a solution for mssql but other database could be a reference too.
Thanks!
You can use select a.*, ' ', b.* from T1 a, T2 b to make it more visible where columns from T1 end and columns from T2 begin.
You are basically joining two tables on the ID field, so you will only see one column labeled "ID", not two, because you are asking to see only those records where the ID is the same in table a and table b: they share the same id.
Try ...
SELECT 'TableA' AS 'Table', A.* FROM TableA A
WHERE A.id IN (SELECT id FROM TableB)
UNION
SELECT 'TableB' AS 'Table', B.* FROM TableB B
WHERE B.id IN (SELECT id FROM TableA)
ORDER BY id, [Table]

Querying a table finding if child table's matching records exist in ANSI SQL

I have two tables A and B where there is one-to-many relationship.
Now I want some records from A together with an existence field that shows whether B has any matching records. I don't want to use the count function, because B has so many records that it slows down execution. Nor do I want to use proprietary keywords such as Oracle's rownum, as in the query below, because I need as much compatibility as possible.
select A.*, (
select 1 from B where ref_column = A.ref_column and rownum = 1
) existence
...
You would use left join + count anyway: a select statement in the select list can be executed multiple times, while a join will be done only once.
Also you can consider EXISTS:
select A.*, case when exists (
select 1 from B where B.ref_column = A.ref_column -- no rownum needed: EXISTS stops at the first match
) then 1 else 0 end as existence
from A
Use an EXISTS clause. If the foreign key in B is indexed, performance should not be an issue.
SELECT *
FROM a
WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
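The query above filters A down to rows that have children. If the existence field from the question is wanted for every row of A, the same EXISTS test can be wrapped in a CASE expression. Here is a minimal sketch with Python's sqlite3 module; the column names name and a_id, the index name and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
    CREATE INDEX b_a_id ON b (a_id);  -- index on the foreign key column
    INSERT INTO a VALUES (1, 'has children'), (2, 'no children');
    INSERT INTO b (a_id) VALUES (1), (1), (1);
""")

# One row per parent, with a 1/0 flag; no COUNT and no vendor-specific keywords.
rows = conn.execute("""
    SELECT a.id, a.name,
           CASE WHEN EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id)
                THEN 1 ELSE 0 END AS existence
    FROM a
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

The parent with three children still yields a single row with existence = 1, and the childless parent is kept with existence = 0.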