find difference using UNION and minus - sql

I am at my wits' end as to why the SQL below does not produce any rows. Clearly there is an id 1 which is not in b, and I expected that to be the output. I know I am missing some fundamentals of how UNION works; maybe it is because the second MINUS produces no output?
Redshift:
WITH a as
(select 1 id union all select 2
)
,b as (select 2 id)
select * from a
minus
select * from b
union all
select * from b
minus
select * from a
Oracle:
WITH a as
(select 1 id from dual union all select 2 from dual
)
,b as (select 2 id from dual)
select * from a
minus
select * from b
union all
select * from b
minus
select * from a

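There is an order-of-operations issue with the way you wrote your query. In both Oracle and Redshift, MINUS and UNION ALL have the same precedence and are evaluated left to right, so the query is not grouped the way you intended. If you wrap the two sides of the union as subqueries, and select from them, then you get the result you expect: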
select * from
(select * from a
minus
select * from b ) t1
union all
select * from
(select * from b
minus
select * from a ) t2
What appears to be happening is that first the following is run, leaving us with id=1:
select * from a
minus
select * from b
Then, this result is being unioned with a query on b:
(select * from a
minus
select * from b)
union all
select * from b
At this point, the result set again has both 1 and 2 in it. But now, we take a minus operation against table a:
(select * from a
minus
select * from b
union all
select * from b)
minus
select * from a
This results in an empty set, since (1,2) minus (1,2) leaves us with nothing.
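The left-to-right grouping described above can be reproduced with a small sketch. It is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than Redshift or Oracle, the operator is spelled EXCEPT instead of MINUS there, and the helper name run is hypothetical. SQLite also groups compound selects from left to right, so the ungrouped chain behaves the same way.

```python
import sqlite3

# CTEs mirroring the question: a = {1, 2}, b = {2}.
CTES = """
WITH a(id) AS (SELECT 1 UNION ALL SELECT 2),
     b(id) AS (SELECT 2)
"""

# Ungrouped chain: evaluated as ((a EXCEPT b) UNION ALL b) EXCEPT a.
FLAT = CTES + """
SELECT * FROM a
EXCEPT
SELECT * FROM b
UNION ALL
SELECT * FROM b
EXCEPT
SELECT * FROM a
"""

# Each difference is wrapped in a derived table, so it is computed
# before the UNION ALL combines the two results.
GROUPED = CTES + """
SELECT * FROM (SELECT * FROM a EXCEPT SELECT * FROM b)
UNION ALL
SELECT * FROM (SELECT * FROM b EXCEPT SELECT * FROM a)
"""


def run(sql):
    """Run a query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


flat_rows = run(FLAT)        # no rows: (1, 2) minus (1, 2)
grouped_rows = run(GROUPED)  # the single row with id 1
print(flat_rows, grouped_rows)
```

The only difference between the two queries is the grouping; the data and the operators are identical.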

Related

BigQuery: Symmetric difference (xor) between two sets

BigQuery has UNION, INTERSECT, and EXCEPT [1], but not XOR.
SELECT * FROM [0, 1,2,3] XOR SELECT * FROM [2,3,4]
would return
0
1
4
As 0 and 1 are present in the first select but not second, and 4 is present in the second select, but not first.
I'd like to use it to find discrepancies between two tables, eg find customers that are present in one table, but not other and vice versa.
Any hints how to best do it?
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#set_operators
BigQuery does not need an XOR operator, as it can be obtained from existing operators:
A first way, as @Genato points out, is to use a JOIN, as in this issue.
Another way is to use set operators: A XOR B can be translated as (A AND NOT B) OR (B AND NOT A), so with your example you could write
(
SELECT * FROM UNNEST(ARRAY<int64>[0, 1, 2, 3]) AS number
EXCEPT DISTINCT SELECT * FROM UNNEST(ARRAY<int64>[2, 3, 4]) AS number)
UNION ALL
(
SELECT * FROM UNNEST(ARRAY<int64>[2,3,4]) AS number
EXCEPT DISTINCT SELECT * FROM UNNEST(ARRAY<int64>[0, 1, 2, 3]) AS number);
which results in 0, 1 and 4.
A few 'workarounds':
Option 1
with table1 as (
select * from unnest([0, 1,2,3]) num
), table2 as (
select * from unnest([2,3,4]) num
)
select * from table1 where not num in (select num from table2)
union all
select * from table2 where not num in (select num from table1)
Option 2
with table1 as (
select * from unnest([0,1,2,3]) num
), table2 as (
select * from unnest([2,3,4]) num
)
select num from (
select distinct num from table1 union all
select distinct num from table2
)
group by num
having count(*) = 1
In both cases, the output is 0, 1 and 4.
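As a rough check of the counting idea in Option 2, here is a sketch that runs the same query shape in SQLite via Python's sqlite3 module. SQLite is an assumption made only so the example is runnable locally; it uses VALUES lists in place of BigQuery's unnest, and the function name symmetric_difference is hypothetical.

```python
import sqlite3

# A value that appears in exactly one of the two de-duplicated inputs
# ends up with COUNT(*) = 1 after the UNION ALL.
SQL = """
WITH table1(num) AS (VALUES (0), (1), (2), (3)),
     table2(num) AS (VALUES (2), (3), (4))
SELECT num
FROM (
    SELECT DISTINCT num FROM table1
    UNION ALL
    SELECT DISTINCT num FROM table2
)
GROUP BY num
HAVING COUNT(*) = 1
ORDER BY num
"""


def symmetric_difference():
    """Return the values present in exactly one of the two inputs."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(SQL)]
    finally:
        conn.close()


print(symmetric_difference())
```

The DISTINCT inside each branch matters: without it, a value repeated within one table would be counted more than once and dropped by the HAVING clause.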

Merge three tables in Select query by rule 3, 2, 1 records from each table

Merge three tables in a Select query by rule 3, 2, 1 records from each table as follows:
TableA: ID, FieldA, FieldB, FieldC,....
TableB: ID, FieldA, FieldB, FieldC,....
TableC: ID, FieldA, FieldB, FieldC,....
ID : auto number in each table
FieldA will be unique in all three tables.
I am looking for a Select query to merge three tables as follows:
TOP three records from TableA sorted by ID
TOP two records from TableB sorted by ID
TOP 1 record from TableC sorted by ID
Repeat this until select all records from all three tables.
If some table has fewer records or does not meet the criteria, ignore that and continue with others.
My attempt:
I did it entirely procedurally, with cursors and IF conditions inside a SQL Server stored procedure.
It is slow.
This requires a formula that takes the row numbers from each table and transforms them into a series of integers that skips the desired values.
In the query below, I am adding some CTEs for the sake of shortening the formula. The real magic is in the UNION. Also, I am adding an additional field for your control. Feel free to get rid of it.
WITH A_Aux as (
SELECT 'A' As FromTable, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, TableA.*
FROM TableA
), B_Aux AS (
SELECT 'B' As FromTable, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum, TableB.*
FROM TableB
), C_Aux AS (
SELECT 'C' As FromTable, ROW_NUMBER() OVER (Order BY ID) AS RowNum, TableC.*
FROM TableC
)
SELECT *
FROM (
SELECT RowNum+3*FLOOR((RowNum-1)/3) As ColumnForOrder, A_Aux.* FROM A_Aux
UNION ALL
SELECT 3+RowNum+4*FLOOR((RowNum-1)/2), B_Aux.* FROM B_Aux
UNION ALL
SELECT 6*RowNum, C_Aux.* FROM C_Aux
) T
ORDER BY ColumnForOrder
PS: note the pattern Offset + RowNum + (6-N) * Floor((RowNum-1)/N) to group N records together (it of course simplifies a lot for TableC).
PPS: I don't have a SQL server at hand to test it. Let me know if there is a syntax error.
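The ordering formula from the PS can be checked outside the database. The sketch below is plain Python rather than T-SQL, and the names order_key and interleave are hypothetical; it only reproduces the arithmetic Offset + RowNum + (6-N) * Floor((RowNum-1)/N) and sorts table labels by the resulting key.

```python
def order_key(row_num, group_size, offset, cycle=6):
    """Sort key for the row_num-th row (1-based) of a table that
    contributes group_size rows to every cycle of 6 output slots."""
    return offset + row_num + (cycle - group_size) * ((row_num - 1) // group_size)


def interleave(count_a, count_b, count_c):
    """Return table labels in the order the UNION ALL query would emit them."""
    keyed = (
        [(order_key(n, 3, 0), "A") for n in range(1, count_a + 1)]
        + [(order_key(n, 2, 3), "B") for n in range(1, count_b + 1)]
        + [(order_key(n, 1, 5), "C") for n in range(1, count_c + 1)]
    )
    return [label for _, label in sorted(keyed)]


# Table A rows 1..6 map to keys 1, 2, 3, 7, 8, 9; table B rows 1..4 map to
# 4, 5, 10, 11; table C rows 1..2 map to 6, 12.
print("".join(interleave(7, 5, 5)))
```

When a table runs out of rows, its slots are simply missing from the key sequence, so the remaining tables continue in order without any special handling.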
You may try this..
GO
select * into #temp1 from (select * from table1) as t1
select * into #temp2 from (select * from table2) as t2
select * into #temp3 from (select * from table3) as t3
select * into #final from (select col1, col2, col3 from #temp1 where 1=0) as tb
-- Loop until every temp table has been emptied.
while exists (select 1 from #temp1)
or exists (select 1 from #temp2)
or exists (select 1 from #temp3)
Begin
-- Take the next 3, 2 and 1 rows (by id) from the three temp tables.
;with cfinal as (
select * from (select top 3 * from #temp1 order by id) a
union all
select * from (select top 2 * from #temp2 order by id) b
union all
select * from (select top 1 * from #temp3 order by id) c
)
insert into #final ( col1 , col2, col3 )
select col1, col2, col3 from cfinal
-- Remove the rows that were just copied.
delete from #temp1 where id in (select top 3 id from #temp1 order by id)
delete from #temp2 where id in (select top 2 id from #temp2 order by id)
delete from #temp3 where id in (select top 1 id from #temp3 order by id)
End
Select * from #final
Drop table #temp1
Drop table #temp2
Drop table #temp3
GO
First, copy all three tables into temp tables; then, after each insert, delete the inserted records from the temp tables. This should give the desired result, unless I have missed something.
Please check whether this works.
There is not a lot of information to go with here, but I assume you can use UNION to combine multiple statements.
SELECT * FROM (SELECT TOP 3 * FROM TableA ORDER BY ID) a
UNION ALL
SELECT * FROM (SELECT TOP 2 * FROM TableB ORDER BY ID) b
UNION ALL
SELECT * FROM (SELECT TOP 1 * FROM TableC ORDER BY ID) c
Execute and see if this works.
/AF
From my understanding, I create three temp tables as ta, tb, tc.
select * into #ta from (
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
union all
select 'A' a
) a
select * into #tb from (
select 'B' b
union all
select 'B'
union all
select 'B'
union all
select 'B'
union all
select 'B'
) b
select * into #tc from (
select 'C' c
union all
select 'C'
union all
select 'C'
union all
select 'C'
union all
select 'C'
) c
If these tables match your tables, then the output looks like A,A,A,B,B,C,A,A,A,B,B,C,A,B,C,C,C
T-SQL
declare @TAC int = (select count (*) from #ta) -- Table A count = 7
declare @TBC int = (select count (*) from #tb) -- Table B count = 5
declare @TAR int = @TAC % 3 -- Table A remainder = 1
declare @TBR int = @TBC % 2 -- Table B remainder = 1
declare @TAQ int = (@TAC - @TAR) / 3 -- Table A quotient = (7 - 1) / 3 = 2; this is passed to NTILE,
-- so the rows are split into two groups: (111), (222)
declare @TBQ int = (@TBC - @TBR) / 2 -- Table B quotient = (5 - 1) / 2 = 2; this is passed to NTILE,
-- so the rows are split into two groups: (11), (22)
select * from (
select *, NTILE (@TAQ) over ( order by a) FirstOrder, 1 SecondOrder from (
select top (@TAC - @TAR) * from #ta order by a
) ta -- 6 of the 7 rows are selected.
union all
select *, @TAQ + 1, 1 from (
select top (@TAR) * from #ta order by a desc
) ta -- The remaining row is selected. ORDER BY ... DESC is required.
-- Here FirstOrder is one more than the last NTILE group number.
union all
select *, NTILE (@TBQ) over ( order by b), 2 from (
select top (@TBC - @TBR) * from #tb order by b
) tb
union all
select *, @TBQ + 1, 2 from (
select top (@TBR) * from #tb order by b desc
) tb
union all
select *, ROW_NUMBER () over (order by c), 3 from #tc
) abc order by FirstOrder, SecondOrder
Let me explain the T-SQL.
For reference, see NTILE and ROW_NUMBER.
Get the row counts.
Find the quotient, which is passed to the NTILE function.
Order by the NTILE value and then by the static per-table value.
Note:
I am using SQL Server 2017.
If the T-SQL works, you then need to change the column in ORDER BY to <yourcolumn>.

SQL statement to return non-intersection records

I was recently asked this question and was a little stumped so I want to ask the experts...
Given two tables A & B, I want to return all the values from A and B that do not overlap. Think of two overlapping circles; how do we return all the data that is NOT in the overlapping center section? And, I had to use ANSI Standard SQL rather than Oracle syntax.
Assuming we want everything exclusive to both A & B, my answer was
select *
from A
cross join B
minus
(select a.common_column from a
intersect
select b.common_column)
Does this look correct, or even close? If it is correct, is there a more efficient way to do this?
BTW - my solution was soundly rejected....
Thank you!
Given the tables A and B, you are looking for (A U B) - (A & B). In other words, you need A union B minus their intersection. Remember A and B must be union-compatible for this query to work. I would do:
(select * from A
union
select * from B
)
minus
(select * from A
intersect
select * from B
)
Maybe a full outer join?
select coalesce(A.col, B.col)
from A full outer join B on A.col = B.col
where A.col is null or B.col is null;
For computing a set symmetric difference, you can use a combination of MINUS and UNION ALL:
select * from (
(select * from A
minus
select * from B)
union all
(select * from B
minus
select * from A)
)
Your query was rejected because it is syntactically incorrect: the number of columns differ and it confuses cross join and union all. However, I think you have the right idea for solving this.
You can easily fix this:
(select *
from A
union all
select *
from B
) minus
(select *
from A
intersect
select *
from B
);
That is, combine everything using union all and then subtract the rows that occur in both tables.
Of course, if there is a single id, then you can use the id with join and other operations.
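To see that "union minus intersection" and "difference in both directions" describe the same rows, here is a sketch under stated assumptions: it runs in SQLite through Python's sqlite3 module, uses EXCEPT in place of MINUS, and invents a one-column table layout (col) and sample values purely for illustration. SQLite does not accept parenthesised compound selects, so each group is written as a derived table.

```python
import sqlite3

# Hypothetical sample data: one row unique to each table, one shared row.
SETUP = """
CREATE TABLE A (col TEXT);
CREATE TABLE B (col TEXT);
INSERT INTO A VALUES ('only_a'), ('both');
INSERT INTO B VALUES ('only_b'), ('both');
"""

# (A union B) minus (A intersect B)
UNION_MINUS_INTERSECTION = """
SELECT * FROM (SELECT col FROM A UNION SELECT col FROM B)
EXCEPT
SELECT * FROM (SELECT col FROM A INTERSECT SELECT col FROM B)
ORDER BY 1
"""

# (A minus B) union all (B minus A)
TWO_WAY_DIFFERENCE = """
SELECT * FROM (SELECT col FROM A EXCEPT SELECT col FROM B)
UNION ALL
SELECT * FROM (SELECT col FROM B EXCEPT SELECT col FROM A)
ORDER BY 1
"""


def compare():
    """Return the rows produced by both formulations."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        first = conn.execute(UNION_MINUS_INTERSECTION).fetchall()
        second = conn.execute(TWO_WAY_DIFFERENCE).fetchall()
        return first, second
    finally:
        conn.close()


first, second = compare()
print(first, second)
```

Both queries return only the rows that occur in exactly one of the two tables; the shared row is dropped in each case.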
Just like Frank Schmitt answered in the meantime:
Here it is including a data example:
WITH
table_a(name) AS (
SELECT 'From_A_1'
UNION ALL SELECT 'From_A_2'
UNION ALL SELECT 'From_A_3'
UNION ALL SELECT 'From_A_4'
UNION ALL SELECT 'From_A_5'
UNION ALL SELECT 'From_BOTH_6'
UNION ALL SELECT 'From_BOTH_7'
UNION ALL SELECT 'From_BOTH_8'
)
,
table_b(name) AS (
SELECT 'From_B_1'
UNION ALL SELECT 'From_B_2'
UNION ALL SELECT 'From_B_3'
UNION ALL SELECT 'From_B_4'
UNION ALL SELECT 'From_B_5'
UNION ALL SELECT 'From_BOTH_6'
UNION ALL SELECT 'From_BOTH_7'
UNION ALL SELECT 'From_BOTH_8'
)
(SELECT * FROM table_a EXCEPT SELECT * FROM table_b)
UNION ALL
(SELECT * FROM table_b EXCEPT SELECT * FROM table_a)
ORDER BY name
;
name
From_A_1
From_A_2
From_A_3
From_A_4
From_A_5
From_B_1
From_B_2
From_B_3
From_B_4
From_B_5
You will need to select all the data from both tables, except where they overlap, and then combine the data with a union. The code provided should work for your example.
SELECT *
FROM
(
SELECT * FROM Table1
EXCEPT SELECT * FROM Table2
)
UNION
SELECT *
FROM
(
SELECT * FROM Table2
EXCEPT SELECT * FROM Table1
)
Hope this helps.

Select limited rows from multiple tables

It is fairly common knowledge how to select rows from multiple tables and stack the results on top of each other:
SELECT * FROM table1
UNION
SELECT * FROM table2
UNION
...
However, if I want only a limited number of rows from each table, then how should I write it?
SELECT * FROM table1 LIMIT 2
UNION
SELECT * FROM table2 LIMIT 2
UNION
...
This clearly doesn't work.
Note that in my case, I have 51 tables, all with the same exact columns.
It could work this way:
( SELECT * FROM table1 LIMIT 2 )
UNION
( SELECT * FROM table2 LIMIT 2 )
UNION
...
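With 51 tables, writing the wrapped statement by hand is tedious, so one option is to generate it. The sketch below is an illustration only: the helper names limited_union_sql and demo, the table names table1..table3 and the column val are hypothetical, and it targets SQLite through Python's sqlite3 module. SQLite does not accept parentheses around the members of a UNION, so each limited SELECT is wrapped in a derived table instead, which serves the same purpose of binding LIMIT to a single table.

```python
import sqlite3


def limited_union_sql(table_names, per_table_limit):
    """Build a UNION query that takes at most per_table_limit rows per table.

    Table names are interpolated into the SQL text, so they must come
    from a trusted source, never from user input.
    """
    limit = int(per_table_limit)
    parts = [
        f'SELECT * FROM (SELECT * FROM "{name}" LIMIT {limit})'
        for name in table_names
    ]
    return "\nUNION\n".join(parts)


def demo():
    """Create three small tables and fetch two rows from each."""
    conn = sqlite3.connect(":memory:")
    try:
        names = [f"table{i}" for i in range(1, 4)]
        for i, name in enumerate(names, start=1):
            conn.execute(f'CREATE TABLE "{name}" (val INTEGER)')
            conn.executemany(
                f'INSERT INTO "{name}" VALUES (?)',
                [(i * 100 + k,) for k in range(5)],
            )
        return conn.execute(limited_union_sql(names, 2)).fetchall()
    finally:
        conn.close()


print(limited_union_sql(["table1", "table2"], 2))
```

Because there is no ORDER BY inside each branch, which two rows are returned per table is not defined. UNION also removes duplicate rows across tables, so UNION ALL may be preferable if every selected row should be kept.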

Joining All Rows of Two Tables in SQL Server

My goal is combining all rows in 2 tables. The simplest example I can think of is:
Table 1
Letter
A
B
Table 2
Number
0
1
Combined Table
Letter Number
A 0
B 0
A 1
B 1
I have come up with this SQL statement:
select * from
(
select * From (
select 'A' as 'Letter'
UNION
select 'B' as 'Letter'
) as Letter
) as Letter,
(
select * from (
select 0 as 'Number'
UNION
select 1 as 'Number'
) as Number
) as Number
This works but I don't like it.
defining the same alias multiple times
7 select statements? really....
Does anyone know a cleaner way of doing this? I am sure the answer is out there already but I had no idea how to search for it. Thanks all
Try this
select * from table1 join table2 on 1=1
This is the Cartesian product and if that's what you want to get,
you just have to specify some join condition which is always true.
And try this too.
SELECT * FROM
(
SELECT 'A' AS ch
UNION ALL
SELECT 'B'
)
T1
JOIN
(
SELECT 0 AS dg
UNION ALL
SELECT 1
) T2
ON 1 = 1
In SQL Server you can also do this (if you find it more concise/clear).
SELECT *
FROM
(
VALUES
('A'),
('B')
)
AS ch1(ch)
JOIN
(
SELECT *
FROM
(
VALUES
(0),
(1)
)
AS dg1(dg)
) TBL
ON 1 = 1
Easy enough with CROSS JOIN...
SELECT *
FROM Table1
CROSS JOIN Table2
Result:
Letter Number
------------------------- -----------
A 0
B 0
A 1
B 1
(4 row(s) affected)
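For anyone without a SQL Server instance at hand, the CROSS JOIN result can be reproduced with a small sketch. It assumes SQLite through Python's sqlite3 module, builds the two tables inline with VALUES, and adds an ORDER BY so the row order matches the table in the question; the function name combined_rows is hypothetical.

```python
import sqlite3

# Every row of Table1 is paired with every row of Table2.
SQL = """
WITH Table1(Letter) AS (VALUES ('A'), ('B')),
     Table2(Number) AS (VALUES (0), (1))
SELECT Letter, Number
FROM Table1
CROSS JOIN Table2
ORDER BY Number, Letter
"""


def combined_rows():
    """Return the Cartesian product of the two inline tables."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(SQL).fetchall()
    finally:
        conn.close()


for letter, number in combined_rows():
    print(letter, number)
```

The row count is always the product of the two input sizes, here 2 x 2 = 4.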