SQL statement to return non-intersection records - sql

I was recently asked this question and was a little stumped so I want to ask the experts...
Given two tables A & B, I want to return all the values from A and B that do not overlap. Think of two overlapping circles; how do we return all the data that is NOT in the overlapping center section? And, I had to use ANSI Standard SQL rather than Oracle syntax.
Assuming we want everything exclusive to both A & B, my answer was
select *
from A
cross join B
minus
(select a.common_column from a
intersect
select b.common_column)
Does this look correct, or even close? If it is correct, is there a more efficient way to do this?
BTW - my solution was soundly rejected....
Thank you!

Given the tables A and B, you are looking for (A U B) - (A & B). In other words, you need A union B minus their intersection. Remember A and B must be union-compatible for this query to work. I would do:
(select * from A
union
select * from B
)
minus
(select * from A
intersect
select * from B
)

May be full outer join?
select coalesce(A.col, B.col)
from A full outer join B on A.col = B.col
where A.col is null or B.col is null;

For computing a set symmetric difference, you can use a combination of MINUS and UNION ALL:
select * from (
(select * from A
minus
select * from B)
union all
(select * from B
minus
select * from A)
)

Your query was rejected because it is syntactically incorrect: the number of columns differ and it confuses cross join and union all. However, I think you have the right idea for solving this.
You can easily fix this:
(select *
from A
union all
select *
from B
) minus
(select *
from A
intersect
select *
from B
);
That is, combine everything using union all and then subtract the rows that occur in both tables.
Of course, if there is a single id, then you can use the id with join and other operations.

Just like Frank Schmitt answered in the meantime:
Here it is including a data example:
WITH
table_a(name) AS (
SELECT 'From_A_1'
UNION ALL SELECT 'From_A_2'
UNION ALL SELECT 'From_A_3'
UNION ALL SELECT 'From_A_4'
UNION ALL SELECT 'From_A_5'
UNION ALL SELECT 'From_BOTH_6'
UNION ALL SELECT 'From_BOTH_7'
UNION ALL SELECT 'From_BOTH_8'
)
,
table_b(name) AS (
SELECT 'From_B_1'
UNION ALL SELECT 'From_B_2'
UNION ALL SELECT 'From_B_3'
UNION ALL SELECT 'From_B_4'
UNION ALL SELECT 'From_B_5'
UNION ALL SELECT 'From_BOTH_6'
UNION ALL SELECT 'From_BOTH_7'
UNION ALL SELECT 'From_BOTH_8'
)
(SELECT * FROM table_a EXCEPT SELECT * FROM table_b)
UNION ALL
(SELECT * FROM table_b EXCEPT SELECT * FROM table_a)
ORDER BY name
;
name
From_A_1
From_A_2
From_A_3
From_A_4
From_A_5
From_B_1
From_B_2
From_B_3
From_B_4
From_B_5

You will need to select all the data from both tables, except where they overlap, and then combine the data with a union. The code provided should work for your example.
SELECT *
FROM
(
SELECT * FROM Table1
EXCEPT SELECT * FROM Table2
)
UNION
SELECT *
FROM
(
SELECT * FROM Table2
EXCEPT SELECT * FROM Table1
)
Hope this helps.

Related

How to Union Multiple Tables with same column names on Google Big Query?

I am trying to create a union with 3 tables that have the same column names. However, the queries tested seems not to be working.
The first query that I have used is the following:
Select * FROM table1
UNION ALL
SELECT * FROM table2
SELECT * FROM table3
UNION ALL
The second query used is the following:
SELECT *
FROM
(select * from table_1),
(select * from table_2),
(select * from table_3)
Both of them are not working for me. Please someone can help me with this?
This should work if columns are identical:
SELECT * FROM table1 UNION ALL
SELECT * FROM table2 UNION ALL
SELECT * FROM table3

Is there a SQL function to expand table?

I vaguely remember there being a function that does this, but I think I may be going crazy.
Say I have a datatable, call it table1. It has three columns: column1, column2, column3. The query
SELECT * FROM table1
returns all rows/columns from table1. Isn't there some type of EXPAND function that allows me to duplicate that result? For example, if I want to duplicate everything from the SELECT * FROM table1 query three times, I can do something like EXPAND(3) ?
In BigQuery, I would recommend a CROSS JOIN:
SELECT t1.*
FROM table1 CROSS JOIN
(SELECT 1 as n UNION ALL SELECT 2 UNION ALL SELECT 3) n;
This can get cumbersome for lots of copies, but you can simplify this by generating the numbers:
SELECT t1.*
FROM table1 CROSS JOIN
UNNEST(GENERATE_ARRAY(1, 3)) n
This creates an array with three elements and unnests it into rows.
In both these cases, you can include n in the SELECT to distinguish the copies.
Below is for BigQuery Standard SQL
I think below is close enough to what "got you crazy" o)
#standardSQL
SELECT copy.*
FROM `project.dataset.tabel1` t, UNNEST(FN.EXPAND(t, 3)) copy
To be able to do so, you can leverage recently announced support for persistent standard SQL UDFs, namely - you need to create FN.EXPAND() function as in below example (note: you need to have FN dataset in your project - or use existing dataset in which case you should use YOUR_DATASET.EXPAND() reference
#standardSQL
CREATE FUNCTION FN.EXPAND(s ANY TYPE, dups INT64) AS (
ARRAY (
SELECT s FROM UNNEST(GENERATE_ARRAY(1, dups))
)
);
Finally, if you don't want to create persistent UDF - you can use temp UDF as in below example
#standardSQL
CREATE TEMP FUNCTION EXPAND(s ANY TYPE, dups INT64) AS ( ARRAY(
SELECT s FROM UNNEST(GENERATE_ARRAY(1, dups))
));
SELECT copy.*
FROM `project.dataset.tabel1` t, UNNEST(EXPAND(t, 3)) copy
if you want a cartesian product (all the combination on a row ) you could use
SELECT a.*, b.*, c.*
FROM table1 a
CROSS JOIN table1 b
CROSS JOIN table1 c
if you want the same rows repeated you can use UNION ALL
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table1
Use union all
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
For reuse purposes can embed this code in a procedure like
Create Procedure
expandTable(tablename
varchar2(50))
As
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
Union all
Select * from table1
End
/

find difference using UNION and minus

I am bit out of my wits why the sql below would not produce any row. Clearly there is an id 1 which is not in b and I expected that to be the output. I know I am missing some fundamentals on how union works - may be due to the fact that there is not output in the second minus?
Redshift:
WITh a as
(select 1 id union all select 2
)
,b as (select 2 id)
select * from a
minus
select * from b
union all
select * from b
minus
select * from a
Oracle-
WITh a as
(select 1 id from dual union all select 2 from dual
)
,b as (select 2 id from dual)
select * from a
minus
select * from b
union all
select * from b
minus
select * from a
There is an order of operations issue with the way you wrote your query. If you wrap the two sides of the union as subqueries, and select from them, then you get the result you expect:
select * from
(select * from a
minus
select * from b ) t1
union all
select * from
(select * from b
minus
select * from a ) t2
What appears to be happening is that first the following is run, leaving us with id=1:
select * from a
minus
select * from b
Then, this result is being unioned with a query on b:
(select * from a
minus
select * from b)
union all
select * from b
At this point, the result set again has both 1 and 2 in it. But now, we take a minus operation against table a:
(select * from a
minus
select * from b
union all
select * from b)
minus
select * from a
This results in an empty set, since (1,2) minus (1,2) leaves us with nothing.

SQL Order of processing

Consider I have a query like
select * from A
Except
select * from B
union all
select * from B
except
select * from A
Query is processed like
select *
from
(
select * from A
Except
select * from B
) a
union all
(
select * from B
Except
select * from A
) b
How is the order of processing defined in sql. Will it process like this at any case
select * from A
Except
select * from
(
select * from B
union all
select * from B
) a
except
select * from A
EXCEPT and UNION are processed "left to right". Meaning that without any parenthesis to make the determination they will process in the order they appear in the sql.
https://msdn.microsoft.com/en-us/library/ms188055.aspx
UNION and EXCEPT have equal precedence but will bind from left to right, meaning they are evaluated in "left to right" order, as they are processed.
From #SeanLange's URL (TL;DR), worth taking note of:
If EXCEPT or INTERSECT is used together with other operators in an
expression, it is evaluated in the context of the following
precedence:
Expressions in parentheses
The INTERSECT operator
EXCEPT and UNION evaluated from left to right based on their position in the
expression

Count rows in more than one table with tSQL

I need to count rows in more than one table in SQL Server 2008. I do this:
select count(*) from (select * from tbl1 union all select * from tbl2)
But it gives me an error of incorrect syntax near ). Why?
PS. The actual number of tables can be more than 2.
In case you have different number of columns in your tables try this way
SELECT count(*)
FROM (
SELECT NULL as columnName
FROM tbl1
UNION ALL
SELECT NULL
FROM tbl2
) T
try this:
You have to give a name to your derived table
select count(*) from
(select * from tbl1 union all select * from tbl2)a
I think you have to alias the SELECT in the FROM clause:
select count(*)
from
(
select * from tbl1
union all
select * from tbl2
) AS SUB
You also need to ensure that the * in both tables tbl1 and tbl2 return exactly the same number of columns and they have to be matched in their type.
I don't like doing the union before doing the count. It gives the SQL optimizer an opportunithy to choose to do more work.
AlexK's (deleted) solution is fine. You could also do:
select (select count(*) from tbl1) + (select count(*) from tbl2) as cnt