BigQuery Full outer join producing "left join" results - google-bigquery

I have 2 tables, both of which contain distinct id values. Some of the id values might occur in both tables and some are unique to each table. Table1 has 10,910 rows and Table2 has 11,304 rows
When running a left join query:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
I get a total of 10,896 rows or 10,896 ids shared across both tables.
However, when I run a FULL OUTER JOIN on the 2 tables like this:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
I get total of 10,896 rows, but I was expecting all 10,910 rows from table1.
I am wondering if there is an issue with my query syntax.

As you are using EACH - it looks like you are running your queries in Legacy SQL mode.
In BigQuery Legacy SQL - COUNT(DISTINCT) function is probabilistic - gives statistical approximation and is not guaranteed to be exact.
You can use EXACT_COUNT_DISTINCT() function instead - this one gives you exact number but a little more expensive on back-end
Even better option - just use Standard SQL
For your specific query you will only need to remove EACH keyword and it should work as a charm
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
and
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN table2 b on a.id = b.id

I added the original query as a subquery and counted ids and produced the expected results. Still a little strange, but it works.
SELECT EXACT_COUNT_DISTINCT(a.id)
FROM
(SELECT a.id AS a.id,
b.id AS b.id
FROM table1 a FULL OUTER JOIN EACH table2 b on a.id = b.id))

It is because you count in both case the number of non-null lines for table a by using a count(distinct a.id).
Use a count(*) and it should works.

You will have to add coalesce... BigQuery, unlike traditional SQL does not recognize fields unless used explicitly
SELECT COUNT(DISTINCT coalesce(a.id,b.id))
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
This query will now take full effect of full outer join :)

Related

join or merge two table based on id merge

I have two tables:
I am looking for the results like mentioned in the last.
I tried union (only similar col can be merged), left join, right join i am getting repeated fields in Null areas what can be other options where i can get null without column repeating
A full join would get all results from both tables.
select
A.ID,
A.ColA,
A.ColB,
B.ColC,
B.ColD
from TableA A
full join Table B on A.ID = B.ID
Here is a good post to understand joins
You can try distinct:
select distinct * from
tableA a,
tableB b
where a.id = b.id;
It will not give any duplicate tuples.

Can the order of Inner Joins Change the results o a query

I have the following scenario on a SQL Server 2008 R2:
The following queries returns :
select * from TableA where ID = '123'; -- 1 rows
select * from TableB where ID = '123'; -- 5 rows
select * from TableC where ID = '123'; -- 0 rows
When joining these tables the following way, it returns 1 row
SELECT A.ID
FROM TableA A
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
INNER JOIN TableC C
ON A.ID = C.ID
ORDER BY A.ID
But, when switching the inner joins order it does not returns any row
SELECT A.ID
FROM TableA A
INNER JOIN TableC C
ON A.ID = C.ID
INNER JOIN ( SELECT DISTINCT ID
FROM TableB ) AS D
ON D.ID = A.ID
ORDER BY A.ID
Can this be possible?
Print Screen:
For inner joins, the order of the join operations does not affect the query (it can affect the ordering of the rows and columns, but the same data is returned).
In this case, the result set is a subset of the Cartesian product of all the tables. The ordering doesn't matter.
The order can and does matter for outer joins.
In your case, one of the tables is empty. So, the Cartesian product is empty and the result set is empty. It is that simple.
As Gordon mentioned, for inner joins the order of joins doesn't matter, whereas it does matter when there's at least one outer join involved; however, in your case, none of this is pertinent as you are inner joining 3 tables, one of which will return zero rows - hence all combinations will result in zero rows.
You cannot reproduce the erratic behavior with the queries as they are shown in this question since they will always return zero records. You can try it again on your end to see what you come up with, and if you do find a difference, please share it with us then.
For the future, whenever you have something like this, creating some dummy data either in the form of insert statements or in rextester or the like, you make it that much easier for someone to help you.
Best of luck.

Slow performance query

I'm having a slow query performance and sometimes it came to an error of "Can't allocate space for object 'temp work table'"
I have 2 tables and 1 view. The first two tables have an left join and the last view will do a sub query. Below is the sample query.
SELECT a.*
FROM Table1 a LEFT JOIN Table2 b ON a.ID = b.ID
WHERE a.ID (SELECT ID
FROM View1).
The above query is very slow. BUT when I used a #temp table it becomes faster.
SELECT ID
INTO #Temp
FROM View1
SELECT a.*
FROM Table1 a LEFT JOIN Table2 b ON a.ID = b.ID
WHERE a.ID IN (SELECT ID
FROM #Temp)
Could someone explain why the first sql statement is very slow? and kindly give me an advise like adding new index?
Note: The first query statement cannot be altered or modified. I used only the second query statement to show to my team that if we put the 3rd table into temporary table and used it, makes faster.
Basically in the first query you are accessing the view for each and every row, and in turn the view is executing it's query.
In the second one you are executing the view's query just once and using the returned results through the temp table.
Try:
SELECT a.*
FROM Table1 a LEFT JOIN Table2 b ON a.ID = b.ID,
(SELECT ID
FROM View1) c
WHERE a.ID = c.ID;

Changing the ON condition order results in different query results?

Will there be any difference if I change the order from this to the next one in the last line ESPECIALLY when I use left join or left outer join? SOme people confuse me that it might have differnet value when we change order, I reckon they themselves aren't sure about this.
Or, if we change the order, under what situations such as right outer, right, left, left outer joins the query result differs?
It makes no difference which side you put criteria on when an = is being used.
Table order matters in the case of LEFT JOIN and RIGHT JOIN, but criteria order does not.
For example:
SELECT *
FROM Table1 a
LEFT JOIN Table2 b
ON a.ID = b.ID
Is equivalent to:
SELECT *
FROM Table2 a
RIGHT JOIN Table1 b
ON a.ID = b.ID
But not equivalent to:
SELECT *
FROM Table2 a
LEFT JOIN Table1 b
ON a.ID = b.ID
Demo: SQL Fiddle

How can I implement SQL INTERSECT and MINUS operations in MS Access

I have researched and haven't found a way to run INTERSECT and MINUS operations in MS Access. Does any way exist
INTERSECT is an inner join. MINUS is an outer join, where you choose only the records that don't exist in the other table.
INTERSECT
select distinct
a.*
from
a
inner join b on a.id = b.id
MINUS
select distinct
a.*
from
a
left outer join b on a.id = b.id
where
b.id is null
If you edit your original question and post some sample data then an example can be given.
EDIT: Forgot to add in the distinct to the queries.
INTERSECT is NOT an INNER JOIN. They're different. An INNER JOIN will give you duplicate rows in cases where INTERSECT WILL not. You can get equivalent results by:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
on a.PK = b.PK
Note that PK must be the primary key column or columns. If there is no PK on the table (BAD!), you must write it like so:
SELECT DISTINCT a.*
FROM a
INNER JOIN b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND a.Col3 = b.Col3 ...
With MINUS, you can do the same thing, but with a LEFT JOIN, and a WHERE condition checking for null on one of table b's non-nullable columns (preferably the primary key).
SELECT DISTINCT a.*
FROM a
LEFT JOIN b
on a.PK = b.PK
WHERE b.PK IS NULL
That should do it.
They're done through JOINs. The old fashioned way :)
For INTERSECT, you can use an INNER JOIN. Pretty straightforward. Just need to use a GROUP BY or DISTINCT if you have don't have a pure one-to-one relationship going on. Otherwise, as others had mentioned, you can get more results than you'd expect.
For MINUS, you can use a LEFT JOIN and use the WHERE to limit it so you're only getting back rows from your main table that don't have a match with the LEFT JOINed table.
Easy peasy.
Unfortunately MINUS is not supported in MS Access - one workaround would be to create three queries, one with the full dataset, one that pulls the rows you want to filter out, and a third that left joins the two tables and only pulls records that only exist in your full dataset.
Same thing goes for INTERSECT, except you would be doing it via an inner join and only returning records that exist in both.
No MINUS in Access, but you can use a subquery.
SELECT DISTINCT a.*
FROM a
WHERE a.PK NOT IN (SELECT DISTINCT b.pk FROM b)
I believe this one does the MINUS
SELECT DISTINCT
a.CustomerID,
b.CustomerID
FROM
tblCustomers a
LEFT JOIN
[Copy Of tblCustomers] b
ON
a.CustomerID = b.CustomerID
WHERE
b.CustomerID IS NULL