How to use MINUS in google bigquery? - google-bigquery

I am trying to do MINUS on 2 tables which have same schema in big-query.As I understand MINUS is not working in biquery

You can do something like:
SELECT
field
FROM `project_id.dataset.tableA` A
WHERE NOT EXISTS(SELECT 1 FROM `project_id.dataset.tableB` b WHERE a.field = b.field)

I see that there is EXCEPT set operator in Big Query for Standard SQL.
The EXCEPT operator returns rows from the left input query that are not present in the right input query. This is similar to what the MINUS does in ORACLE/MySQL
SELECT fieldId from dataset.table1 except DISTINCT SELECT fieldId from dataset.table2
Note: the datatype of both the columns should be same in both the tables

Related

Postgresql - "is not distinct from" is taking long time to complete the query

I am using Postgresql db. Previously I was using the = operator to check if a record already exists or not in the table.
Example: I am using only 3 columns in the following query but I have around 20 columns in my original query.
INSERT INTO book_details (user_name,day_of_use,attempts)
select card_number,day_of_use,attempts from temp_book_details temp1
WHERE NOT EXISTS (SELECT 1 FROM book_details raw1 WHERE
raw1.user_name = temp1.card_number and
raw1.day_of_use = temp1.day_of_use and
raw1.attempts = temp1.attempts );
But I was getting weird results while checking against null values in the column so I found another solution using "is not distinct from"
INSERT INTO book_details (user_name,day_of_use,attempts)
select card_number,day_of_use,attempts from temp_book_details temp1
WHERE NOT EXISTS (SELECT 1 FROM book_details raw1 WHERE
raw1.user_name is not distinct from temp1.card_number and
raw1.day_of_use is not distinct from temp1.day_of_use and
raw1.attempts is not distinct from temp1.attempts );
Now I have too much data in the table and I noticed insert/update query using "is not distinct from" is taking too long to complete the query if I compare the query execution time with a query using = operator.
How I can handle or change the query that will result in a valid comparison with a null column value comparison and it will not take more time.
Any join that does not have a join condition with = can only be processes as a nested loop join, and that is obviously too slow in your case.
Try converting it to an = with constructs like this:
WHERE coalesce(aw1.user_name, '') = coalesce(temp1.card_number, '')
This is of course only equivalent if no empty strings occur in the column, else try with some other string.
One quick thing to try would be converting the not exists to a left join:
INSERT INTO book_details (user_name,day_of_use,attempts)
select card_number,day_of_use,attempts
from temp_book_details temp1
left join book_details raw1 ON
raw1.user_name is not distinct from temp1.card_number and
raw1.day_of_use is not distinct from temp1.day_of_use and
raw1.attempts is not distinct from temp1.attempts
where raw1.id IS NULL; --use whatever field would have to be non-null in the book_details page

I want to join two tables with a common column in Big query?

To join the tables, I am using the following query.
SELECT *
FROM(select user as uservalue1 FROM [projectname.FullData_Edited]) as FullData_Edited
JOIN (select user as uservalue2 FROM [projectname.InstallDate]) as InstallDate
ON FullData_Edited.uservalue1=InstallDate.uservalue2;
The query works but the joined table only has two columns uservalue1 and uservalue2.
I want to keep all the columns present in both the table. Any idea how to achieve that?
#legacySQL
SELECT <list of fields to output>
FROM [projectname:datasetname.FullData_Edited] AS FullData_Edited
JOIN [projectname:datasetname.InstallDate] AS InstallDate
ON FullData_Edited.user = InstallDate.user
or (and preferable)
#standardSQL
SELECT <list of fields to output>
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
ON FullData_Edited.user = InstallDate.user
Note, using SELECT * in such cases lead to Ambiguous column name error, so it is better to put explicit list of columns/fields you need to have in your output
The way around it is in using USING() syntax as in example below.
Assuming that user is the ONLY ambiguous field - it does the trick
#standardSQL
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)
For example:
#standardSQL
WITH `projectname.datasetname.FullData_Edited` AS (
SELECT 1 user, 'a' field1
),
`projectname.datasetname.InstallDate` AS (
SELECT 1 user, 'b' field2
)
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)
returns
user field1 field2
1 a b
whereas using ON FullData_Edited.user = InstallDate.user gives below error
Error: Duplicate column names in the result are not supported. Found duplicate(s): user
Don't use subqueries if you want all columns:
SELECT *
FROM [projectname.FullData_Edited] as FullData_Edited JOIN
[projectname.InstallDate] as InstallDate
ON FullData_Edited.uservalue1 = InstallDate.uservalue2;
You may have to list out the particular columns you want to avoid duplicate column names.
While you are at it, you should also switch to standard SQL.

SQL Server minus query gives issue?

I'm trying to compare two table's values for difference (I suspect for two TankSystemIds containing same data)
My query is
SELECT *
FROM [dbo].[vwRawSaleTransaction]
WHERE hdTankSystemId = 2782
MINUS
SELECT *
FROM [dbo].[vwRawSaleTransaction]
WHERE hdTankSystemId = 2380
But I get an error about syntax issues:
Incorrect syntax near 'minus'
But this is right[1]?
[1] https://www.techonthenet.com/sql/minus.php
Quoted in your link.
For databases such as SQL Server, PostgreSQL, and SQLite, use the EXCEPT operator to perform this type of query.
For your case, it seems like you are looking for duplicated data, intersect should be used instead.
Also, INTERSECT statement like
SELECT
EXPRESSION_1, EXPRESSION_2, ..., EXPRESSION_N
FROM
TABLE_A
INTERSECT
SELECT
EXPRESSION_1, EXPRESSION_2, ..., EXPRESSION_N
FROM
TABLE_B
can be written as
SELECT
TABLE_A.EXPRESSION_1, TABLE_A.EXPRESSION_2, ..., TABLE_A.EXPRESSION_N
FROM
TABLE_A
INNER JOIN
TABLE_B
ON
TABLE_A.EXPRESSION_1 = TABLE_B.EXPRESSION_1
AND TABLE_A.EXPRESSION_2 = TABLE_B.EXPRESSION_2
.
.
.
AMD TABLE_A.EXPRESSION_N = TABLE_B.EXPRESSION_N
If you use select * from the same table with a different where condition then intersect them, you are not going to get any rows as they have different value on the specific column used in where condition.

SQL Sybase Query Strange Behaviour

I've got 2 tables with exactly the same structure in the same Sybase database but they're separate tables.
This query works on one of the 2:
select * from table1 where
QUOTA_FIELD >
(SELECT
count(ACCOUNT) FROM
table1 As t1
where SECTOR = t1.SECTOR
AND
STATUS = 'QUOTA'
)
But for the second table I have to change it to this:
select * from table2 as tref where
QUOTA_FIELD >
(SELECT
count(ACCOUNT) FROM
table2 As t2
where tref.SECTOR = t2.SECTOR
AND
STATUS = 'QUOTA'
)
There's a restriction on where this will execute which means it needs to work like in the first query.
Does anyone have any ideas as to why the first might work as expected and the second wouldn't?
Since I am not yet allowed to comment, here as an answer to the question "does anyone...?":
No. I couldn't find anyone :)
This first query cannot work correctly, since it compares a column with itself (as long as the column names are all normal ASCII characters and not some similar looking UNICODE ones). Please give a proof that the result of this query is in every case the same as of query 2.
Also, the second query would normally be done like that: where SECTOR = tref.SECTOR...
You might be looking for something like this in query #1 :
select * from table1 t2 where
QUOTA_FIELD >
(SELECT
count(ACCOUNT) FROM
table1 As t1
where t2.SECTOR = t1.SECTOR
AND
t1.STATUS = 'QUOTA'
)
This explicitly specifies that the table in subquery is joining with the table in outer query ( co-related subquery ).
If this works, use the same idea in query #2

Minus operator in sql

I am trying to create a sql query with minus.
I have query1 which returns 28 rows with 2 columns
I have query2 which returns 22 row2 with same 2 columns in query 2.
when I create a query query1 minus query 2 it should have only show the 28-22=6 rows.
But it showing up all the 28 rows returned by query1.
Please advise.
Try using EXCEPT instead of MINUS. For Example:
Lets consider a case where you want to find out what tasks are in a table that haven't been assigned to you(So basically you are trying to find what tasks could be available to do).
SELECT TaskID, TaskType
FROM Tasks
EXCEPT
SELECT TaskID, TaskType
FROM Tasks
WHERE Username = 'Vidya'
That would return all the tasks that haven't been assigned to you. Hope that helps.
If MINUS won't work for you, the general form you want is the main query in the outer select and a variation of the other query in a not exists clause.
select <insert list of fields here>
from mytable a
join myothertable b
on b.aId = a.aid
where not exists (select * from tablec c where a.aid = c.aid)
The fields might not be exactly alike. may be one of the fields is char(10) and the other is char(20) and they both have the string "TEST" in them. They might "look" the same.
If the database you are working on supports "INTERSECT", try this query and see how many are perfectly matching results.
select field1, field2 from table1
intersect
select field1, field2 from table2
To get the results you are expecting, this query should give you 22 rows.
something like this:
select field1, field2, . field_n
from tables
MINUS
select field1, field2, . field_n
from tables;
MINUS works on the same principle as it does in the set operations. Suppose if you have set A and B,
A = {1,2,3,4}; B = {3,5,6}
then, A-B = {1,2,4}
If A = {1,3,5} and B = {2,4,6}
then, A-B = {1,3,5}. Here the count(A) before and after the MINUS operation will be the same, as it does not contain any overlapping terms with set B.
On similar lines, may be the result set obtained in query 2 may not have matching terms with the result of query1. Hence you are still getting 28 instead of 6 rows.
Hope this helps.
It returns the difference records in the upper query which are not contained by the second query.
In your case for example
A={1,2,3,4,5...28} AND B={29,30} then A-B={1,2,3....28}