MINUS functionality in BigQuery database - google-bigquery

I am new to BigQuery.
What is the BigQuery equivalent of Oracle's MINUS operator? I could not find a MINUS operator in BigQuery.
Oracle --> MINUS
BigQuery --> ??

Although BigQuery has no MINUS operator, you can use a LEFT OUTER JOIN as an alternative.
SELECT name, uid FROM a
MINUS
SELECT name, uid FROM b
Can be written as:
SELECT a.name, a.uid
FROM a LEFT OUTER JOIN b ON a.name= b.name AND a.uid= b.uid
WHERE b.name IS NULL
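One way to sanity-check the LEFT OUTER JOIN rewrite is to run it next to a native set difference. The sketch below uses Python's built-in sqlite3 module purely as a stand-in engine (SQLite spells the operator EXCEPT), and the rows in a and b are made up for illustration. It also exposes one difference worth knowing: the join form keeps duplicate rows from a, while a set difference returns distinct rows.

```python
import sqlite3

# In-memory SQLite database used only as a convenient test engine (assumption:
# the tables a and b and their rows are invented for this illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (name TEXT, uid INTEGER);
    CREATE TABLE b (name TEXT, uid INTEGER);
    INSERT INTO a VALUES ('ann', 1), ('bob', 2), ('cid', 3), ('cid', 3);
    INSERT INTO b VALUES ('ann', 1), ('bob', 2);
""")

# Native set difference: duplicates in a collapse to one row.
except_rows = conn.execute("""
    SELECT name, uid FROM a
    EXCEPT
    SELECT name, uid FROM b
""").fetchall()

# Anti-join rewrite from the answer: every unmatched row of a is kept.
anti_join_rows = conn.execute("""
    SELECT a.name, a.uid
    FROM a LEFT OUTER JOIN b ON a.name = b.name AND a.uid = b.uid
    WHERE b.name IS NULL
""").fetchall()

print(except_rows)
print(anti_join_rows)
```

If the duplicates matter, adding DISTINCT to the join query should bring the two results in line.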

BigQuery doesn't have "MINUS", but it does have the functionally identical "EXCEPT DISTINCT".

with whole as
( select 1 as id, 'One' as value
union all
select 2 as id, 'Two' as value
union all
select 3 as id, 'Three' as value
),
sub_set as
(
select 1 as id, 'One' as value
union all
select 2 as id, 'Two' as value
)
select * from whole
except distinct
select * from sub_set
Result was
3 Three
Refer: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#except
I got an error saying EXCEPT ALL is not supported; EXCEPT DISTINCT worked. Hope this helps.

Standard SQL equivalent of MINUS, where ID is the primary (or composite) key in Table1 and Table2.
Same concept as Vamsi Mohan's:
Select ID, Name from Table1
where ID not in (Select distinct ID from Table2)
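A caveat with NOT IN that may be worth checking: if the subquery can return a NULL, the predicate never evaluates to true and the query returns no rows. With a real primary key this cannot happen, but for nullable columns NOT EXISTS is the safer form. The sketch below illustrates this with Python's sqlite3 module as a stand-in engine; the table contents, including the NULL row, are assumptions made up for the demonstration.

```python
import sqlite3

# Assumption: ID is nullable here, unlike the primary-key case in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Name TEXT);
    CREATE TABLE Table2 (ID INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES (1, 'One'), (2, 'Two'), (3, 'Three');
    INSERT INTO Table2 VALUES (1, 'One'), (2, 'Two'), (NULL, 'Unknown');
""")

# NOT IN against a list containing NULL yields NULL (not true) for ID 3.
not_in_rows = conn.execute("""
    SELECT ID, Name FROM Table1
    WHERE ID NOT IN (SELECT ID FROM Table2)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists_rows = conn.execute("""
    SELECT t1.ID, t1.Name FROM Table1 AS t1
    WHERE NOT EXISTS (SELECT 1 FROM Table2 AS t2 WHERE t2.ID = t1.ID)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```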

Related

How to return two values from PostgreSQL subquery?

I have a problem where I need to get the last item across various tables in PostgreSQL.
The following code works and returns me the type of the latest update and when it was last updated.
The problem is, this query needs to be used as a subquery, so I want to select both the type and the last updated value from this query and PostgreSQL does not seem to like this... (Subquery must return only one column)
Any suggestions?
SELECT last.type, last.max FROM (
SELECT MAX(a.updated_at), 'a' AS type FROM table_a a WHERE a.ref = 5 UNION
SELECT MAX(b.updated_at), 'b' AS type FROM table_b b WHERE b.ref = 5
) AS last ORDER BY max LIMIT 1
The query is used like this inside a CTE:
WITH sql_query as (
SELECT id, name, address, (...other columns),
last.type, last.max FROM (
SELECT MAX(a.updated_at), 'a' AS type FROM table_a a WHERE a.ref = 5 UNION
SELECT MAX(b.updated_at), 'b' AS type FROM table_b b WHERE b.ref = 5
) AS last ORDER BY max LIMIT 1
FROM table_c
WHERE table_c.fk_id = 1
)
The inherent problem is that SQL (all SQL, not just Postgres) requires that a subquery used within a select clause return only a single value. If you think about that restriction for a while, it does make sense. The select clause returns rows with a fixed number of columns, and each row/column location is a single position within a grid. You can bend that rule a bit by putting concatenations into a single position (or a single "complex type" like a JSON value), but it remains a single position in that grid regardless.
Here, however, you want 2 separate columns AND you need both columns to come from the same row, so instead of LIMIT 1 I suggest using ROW_NUMBER() to facilitate this:
WITH LastVals as (
SELECT type
, max_date
, row_number() over(order by max_date DESC) as rn
FROM (
SELECT MAX(a.updated_at) AS max_date, 'a' AS type FROM table_a a WHERE a.ref = 5
UNION ALL
SELECT MAX(b.updated_at) AS max_date, 'b' AS type FROM table_b b WHERE b.ref = 5
) AS u -- derived tables need an alias in PostgreSQL before version 16
)
, sql_query as (
SELECT id
, name, address, (...other columns)
, (select type from lastVals where rn = 1) as last_type
, (select max_date from lastVals where rn = 1) as last_date
FROM table_c
WHERE table_c.fk_id = 1
)
----
By the way, you should use UNION ALL in your subquery. Because type is a constant like 'a' or 'b', the rows stay unique even if MAX(updated_at) is identical for 2 or more tables. UNION will attempt to remove duplicate rows, but here there are none to remove, so avoid that wasted effort by using UNION ALL.
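The point about the constant type column can be checked with a small experiment. This is only a sketch: it uses Python's sqlite3 module rather than PostgreSQL, and the rows (both tables sharing the same latest timestamp) are invented so that the tie case is exercised.

```python
import sqlite3

# Assumption: illustrative data where both tables have the same MAX(updated_at).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ref INTEGER, updated_at TEXT);
    CREATE TABLE table_b (ref INTEGER, updated_at TEXT);
    INSERT INTO table_a VALUES (5, '2024-01-01'), (5, '2024-03-01');
    INSERT INTO table_b VALUES (5, '2024-03-01'), (5, '2024-02-01');
""")

# The same two branches, combined once with UNION and once with UNION ALL.
branches = """
    SELECT MAX(a.updated_at) AS max_date, 'a' AS type FROM table_a a WHERE a.ref = 5
    {op}
    SELECT MAX(b.updated_at) AS max_date, 'b' AS type FROM table_b b WHERE b.ref = 5
"""
union_rows = conn.execute(branches.format(op="UNION")).fetchall()
union_all_rows = conn.execute(branches.format(op="UNION ALL")).fetchall()

# Both contain the same two rows: the type column already makes them distinct,
# so the duplicate elimination done by UNION has nothing to remove.
print(sorted(union_rows))
print(sorted(union_all_rows))
```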
----
For another way to skin this cat, consider using a LEFT JOIN instead:
SELECT id
, name, address, (...other columns)
, LastVals.type
, LastVals.last_date
FROM table_c
LEFT JOIN (
SELECT type
, last_date
, row_number() over(order by last_date DESC) as rn
FROM (
SELECT MAX(a.updated_at) AS last_date, 'a' AS type FROM table_a a WHERE a.ref = 5
UNION ALL
SELECT MAX(b.updated_at) AS last_date, 'b' AS type FROM table_b b WHERE b.ref = 5
) AS u
) LastVals ON LastVals.rn = 1
WHERE table_c.fk_id = 1

SQL statement to return non-intersection records

I was recently asked this question and was a little stumped so I want to ask the experts...
Given two tables A & B, I want to return all the values from A and B that do not overlap. Think of two overlapping circles; how do we return all the data that is NOT in the overlapping center section? And, I had to use ANSI Standard SQL rather than Oracle syntax.
Assuming we want everything exclusive to both A & B, my answer was
select *
from A
cross join B
minus
(select a.common_column from a
intersect
select b.common_column)
Does this look correct, or even close? If it is correct, is there a more efficient way to do this?
BTW - my solution was soundly rejected....
Thank you!
Given the tables A and B, you are looking for (A U B) - (A & B). In other words, you need A union B minus their intersection. Remember A and B must be union-compatible for this query to work. I would do:
(select * from A
union
select * from B
)
minus
(select * from A
intersect
select * from B
)
Maybe a full outer join?
select coalesce(A.col, B.col)
from A full outer join B on A.col = B.col
where A.col is null or B.col is null;
For computing a set symmetric difference, you can use a combination of MINUS and UNION ALL:
select * from (
(select * from A
minus
select * from B)
union all
(select * from B
minus
select * from A)
)
Your query was rejected because it is syntactically incorrect: the number of columns differs, and it confuses cross join with union all. However, I think you have the right idea for solving this.
You can easily fix this:
(select *
from A
union all
select *
from B
) minus
(select *
from A
intersect
select *
from B
);
That is, combine everything using union all and then subtract the rows that occur in both tables.
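As a rough check that "union minus intersection" and "two one-way differences" describe the same set, both can be run side by side. The sketch below uses Python's sqlite3 module as a stand-in engine, so two things differ from the Oracle syntax above: SQLite spells MINUS as EXCEPT, and it does not accept parenthesized members in a compound query, so each side is wrapped in a subquery. The single column col and the sample rows are assumptions for illustration.

```python
import sqlite3

# Assumption: one-column tables with a single shared value, 'both'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col TEXT);
    CREATE TABLE B (col TEXT);
    INSERT INTO A VALUES ('a1'), ('a2'), ('both');
    INSERT INTO B VALUES ('b1'), ('both');
""")

# (A union all B) minus (A intersect B)
union_minus_intersection = conn.execute("""
    SELECT col FROM (SELECT col FROM A UNION ALL SELECT col FROM B)
    EXCEPT
    SELECT col FROM (SELECT col FROM A INTERSECT SELECT col FROM B)
    ORDER BY col
""").fetchall()

# (A minus B) union all (B minus A)
two_way_difference = conn.execute("""
    SELECT col FROM (SELECT col FROM A EXCEPT SELECT col FROM B)
    UNION ALL
    SELECT col FROM (SELECT col FROM B EXCEPT SELECT col FROM A)
    ORDER BY col
""").fetchall()

print(union_minus_intersection)
print(two_way_difference)
```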
Of course, if there is a single id, then you can use the id with join and other operations.
Just as Frank Schmitt answered in the meantime; here it is with a data example:
WITH
table_a(name) AS (
SELECT 'From_A_1'
UNION ALL SELECT 'From_A_2'
UNION ALL SELECT 'From_A_3'
UNION ALL SELECT 'From_A_4'
UNION ALL SELECT 'From_A_5'
UNION ALL SELECT 'From_BOTH_6'
UNION ALL SELECT 'From_BOTH_7'
UNION ALL SELECT 'From_BOTH_8'
)
,
table_b(name) AS (
SELECT 'From_B_1'
UNION ALL SELECT 'From_B_2'
UNION ALL SELECT 'From_B_3'
UNION ALL SELECT 'From_B_4'
UNION ALL SELECT 'From_B_5'
UNION ALL SELECT 'From_BOTH_6'
UNION ALL SELECT 'From_BOTH_7'
UNION ALL SELECT 'From_BOTH_8'
)
(SELECT * FROM table_a EXCEPT SELECT * FROM table_b)
UNION ALL
(SELECT * FROM table_b EXCEPT SELECT * FROM table_a)
ORDER BY name
;
name
From_A_1
From_A_2
From_A_3
From_A_4
From_A_5
From_B_1
From_B_2
From_B_3
From_B_4
From_B_5
You will need to select the rows from each table that are not in the other, and then combine the two results with a union. The code below should work for your example (the derived-table aliases are there because many databases require them).
SELECT *
FROM
(
SELECT * FROM Table1
EXCEPT SELECT * FROM Table2
) only_in_1
UNION
SELECT *
FROM
(
SELECT * FROM Table2
EXCEPT SELECT * FROM Table1
) only_in_2
Hope this helps.

how to repeat each row twice

I have a requirement for a report and I would like my sql query to repeat each row twice.
Example :
**Table 1**
Id Name
1 Ab
2 Cd
3 Ef
I want to write a query which outputs the following :
1 Ab
1 Ab
2 Cd
2 Cd
3 Ef
3 Ef
Is there a way I can do it ?
I cannot think of anything except using union
Select Id, name from Table1 union select Id, name from Table1
You can use a union all. A union will not work, because it will eliminate duplicates. Another way is a cross join:
select id, name
from table1 t1 cross join
(select 1 as n union all select 2) n;
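To see the cross join approach produce the requested output, here is a minimal sketch run through Python's sqlite3 module (a stand-in engine, not necessarily the asker's database). The table contents come from the question; the alias reps and the ORDER BY are additions for readability and deterministic output.

```python
import sqlite3

# Table1 with the three rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT);
    INSERT INTO Table1 VALUES (1, 'Ab'), (2, 'Cd'), (3, 'Ef');
""")

# Each row of Table1 pairs with both rows of the two-row derived table,
# so every row appears twice. ORDER BY keeps the copies adjacent.
doubled = conn.execute("""
    SELECT t1.Id, t1.Name
    FROM Table1 AS t1
    CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS reps
    ORDER BY t1.Id, reps.n
""").fetchall()

for row in doubled:
    print(row)
```

Adding more rows to the derived table would repeat each row that many times without touching the rest of the query.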
You can also use UNION ALL inside a CTE (Common Table Expression) and ORDER BY Id:
WITH CTE AS
(
SELECT Id, Name FROM Table_1
UNION ALL
SELECT Id, Name FROM Table_1
)
SELECT Id, Name
FROM CTE
ORDER BY Id;
The ORDER BY sorts the combined rows so that each row sits next to its duplicate.
Solution will be like this:
select Id, name from Table1
union all
select Id, name from Table1

Case on union of multiple unions and issue with alias

I have 2 series of unions which I wish to join by another union. In the first one, I have 3 Selects and in the second one I have 2 different Selects.
Select id, min(value)
from table1 t1
join (Select id, value
Union
Select id, value
Union
Select id, value) as foo
on foo.id=t1.id
Group by id
Select id, max(value)
from table1 t1
join (Select id, value
Union
Select id, value) as bar
on bar.id=t1.id
Group by id
I tried to do a union between these two, but it made things pretty complicated. My biggest issue is with my alias. My second is with the case linked to my value columns, which I wish to name value.
Select (alias).id,
Case
When foo.value= 0 or bar.value=1 THEN 1
Else 0
End as value
from table1 t1
Join (Select id, min(value)
from table1 t1
join (Select id, value
Union
Select id, value
Union
Select id, value) as foo
on foo.id=t1.id
Group by id
UNION
Select id, max(value)
from table1 t1
join (Select id, value
Union
Select id, value) as bar
on bar.id=t1.id
Group by id) as (alias)
on ??.id=??.id
I wrote my case the way I think it should be written, but normally, when there is more than one column with the same name, SQL reports it as ambiguous. I am still unsure if I should use UNION or INTERSECT, but I assume either of them would be done the same way. How should I deal with this?
If I'm reading this right, you probably want something like this:
SELECT ...
FROM ( ... union #1 ) AS u1
JOIN (... union #2 ) AS u2 ON u1.id = u2.id
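A filled-in version of that template might look like the sketch below. Everything concrete in it is hypothetical: the question does not show its source tables, so source_a, source_b and source_c, their rows, and the column aliases min_value and max_value are invented for illustration, and Python's sqlite3 module stands in for the asker's database. Giving the two aggregates different aliases is what avoids the ambiguous-column problem.

```python
import sqlite3

# Hypothetical source tables; the original question elides them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_a (id INTEGER, value INTEGER);
    CREATE TABLE source_b (id INTEGER, value INTEGER);
    CREATE TABLE source_c (id INTEGER, value INTEGER);
    INSERT INTO source_a VALUES (1, 0), (2, 3);
    INSERT INTO source_b VALUES (1, 4), (2, 1);
    INSERT INTO source_c VALUES (2, 5), (3, 9);
""")

rows = conn.execute("""
    SELECT u1.id,
           CASE WHEN u1.min_value = 0 OR u2.max_value = 1 THEN 1 ELSE 0 END AS value
    FROM (
        -- union #1: three sources, minimum per id
        SELECT id, MIN(value) AS min_value
        FROM (SELECT id, value FROM source_a
              UNION SELECT id, value FROM source_b
              UNION SELECT id, value FROM source_c) AS foo
        GROUP BY id
    ) AS u1
    JOIN (
        -- union #2: two sources, maximum per id
        SELECT id, MAX(value) AS max_value
        FROM (SELECT id, value FROM source_a
              UNION SELECT id, value FROM source_b) AS bar
        GROUP BY id
    ) AS u2 ON u1.id = u2.id
    ORDER BY u1.id
""").fetchall()

print(rows)
```

Note that the inner join drops id 3, which appears only in the first union; a LEFT JOIN would keep it if that is what is wanted.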

How to use order by with union all in sql?

I tried the sql query given below:
SELECT * FROM (SELECT *
FROM TABLE_A ORDER BY COLUMN_1)DUMMY_TABLE
UNION ALL
SELECT * FROM TABLE_B
It results in the following error:
The ORDER BY clause is invalid in views, inline functions, derived
tables, subqueries, and common table expressions, unless TOP or FOR
XML is also specified.
I need to use order by in union all. How do I accomplish this?
SELECT *
FROM
(
SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
) dum
-- ORDER BY .....
But if you want all records from TABLE_A at the top of the result list, you can add a user-defined value to use for ordering:
SELECT *
FROM
(
SELECT *, 1 sortby FROM TABLE_A
UNION ALL
SELECT *, 2 sortby FROM TABLE_B
) dum
ORDER BY sortby
You don't really need the parentheses. You can sort directly:
SELECT *, 1 AS RN FROM TABLE_A
UNION ALL
SELECT *, 2 AS RN FROM TABLE_B
ORDER BY RN, COLUMN_1
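The effect of the extra sort key can be seen in a small run. This is a sketch using Python's sqlite3 module as a stand-in for SQL Server, with made-up single-column tables; the trailing ORDER BY applies to the whole UNION ALL result, sorting first by the source marker RN and then by COLUMN_1 within each source.

```python
import sqlite3

# Assumption: invented one-column tables, deliberately inserted out of order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (COLUMN_1 TEXT);
    CREATE TABLE TABLE_B (COLUMN_1 TEXT);
    INSERT INTO TABLE_A VALUES ('pear'), ('apple');
    INSERT INTO TABLE_B VALUES ('banana'), ('avocado');
""")

# RN marks which table a row came from; ORDER BY sorts the combined result.
rows = conn.execute("""
    SELECT COLUMN_1, 1 AS RN FROM TABLE_A
    UNION ALL
    SELECT COLUMN_1, 2 AS RN FROM TABLE_B
    ORDER BY RN, COLUMN_1
""").fetchall()

for row in rows:
    print(row)
```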
Not a direct response to the OP, but I thought I would jump in here in response to the OP's error message, which may point you in another direction entirely!
All these answers refer to an overall ORDER BY that sorts the whole record set once it has been retrieved.
What if you want to ORDER BY each portion of the UNION independently, and still have them "joined" in the same SELECT?
SELECT pass1.* FROM
(SELECT TOP 1000 tblA.ID, tblA.CustomerName
FROM TABLE_A AS tblA ORDER BY 2) AS pass1
UNION ALL
SELECT pass2.* FROM
(SELECT TOP 1000 tblB.ID, tblB.CustomerName
FROM TABLE_B AS tblB ORDER BY 2) AS pass2
Note that TOP 1000 is an arbitrary number. Use a number big enough to capture all of the data you require.
There will be times when you need to do something like this: pull the top 5 from table 1 based on one sort and the bottom 5 from table 2 based on another sort, and union these together.
Solution:
select * from (
-- top 5 records
select top 5 col1, col2, col3
from table1
group by col1, col2, col3
order by col3 desc ) z
union all
select * from (
-- bottom 5 records
select top 5 col1, col2, col3
from table2
group by col1, col2, col3
order by col3 ) z
This was the only way I was able to get around the error, and it worked fine for me.
SELECT * FROM (SELECT *
FROM TABLE_A ORDER BY COLUMN_1)DUMMY_TABLE
UNION ALL
SELECT * FROM TABLE_B
ORDER BY 2;
Here 2 is the column number. In Oracle SQL you can sort by the position of a column in the select list.
This solved my SELECT statement:
SELECT * FROM
(SELECT id,name FROM TABLE_A
UNION ALL
SELECT id,name FROM TABLE_B ) dum
order by dum.id , dum.name
Here id and name are columns available in both tables; substitute your own columns.
Simply use this; there is no need for parentheses or anything else:
SELECT *, id as TABLE_A_ID FROM TABLE_A
UNION ALL
SELECT *, id as TABLE_B_ID FROM TABLE_B
-- result columns take their names from the first SELECT, so sort by TABLE_A_ID
ORDER BY TABLE_A_ID
ORDER BY after the last UNION should apply to both datasets joined by union.
The solution is shown below:
SELECT *,id AS sameColumn1 FROM Locations
UNION ALL
SELECT *,id AS sameColumn2 FROM Cities
-- only the first SELECT's column names exist in the combined result
ORDER BY sameColumn1
select CONCAT(Name, '(',substr(occupation, 1, 1), ')') AS f1
from OCCUPATIONS
union
select temp.str AS f1 from
(select count(occupation) AS counts, occupation, concat('There are a total of ' ,count(occupation) ,' ', lower(occupation),'s.') As str from OCCUPATIONS group by occupation order by counts ASC, occupation ASC
) As temp
order by f1