LEFT JOIN IN BIG QUERY WITH DUPLICATES - google-bigquery

My tables have duplicate IDs, but I want to know which IDs match between the two tables and which IDs from table1 are not in table2.
The ID column is INTEGER in table1 and STRING in table2, which is why I'm using CAST:
SELECT cast(T1.ID as STRING) as ID
FROM `project.dataset.table1` as T1 WHERE ID is not null
LEFT JOIN
SELECT DISTINCT(T2.ID) as ID
FROM `project.dataset.table2` as T2 WHERE ID is not null
ON T1.ID = T2.ID
The two queries run fine separately, but I get this error when I try to combine them with the LEFT JOIN:
BigQuery error:
Syntax error: Expected end of input but got keyword LEFT at [3:1]
I've also tried the approach from the question "BigQuery Full outer join producing "left join" results":
#standardSQL
SELECT COUNT(DISTINCT T1.NPI)
FROM `project.dataset.table1` as T1 WHERE NPI is not null
JOIN `project.dataset.table2` as T2 WHERE NPI is not null
ON T1.NPI= T2.NPI
but that just produces more errors.
Could you guide me, please?

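The syntax error occurs because WHERE has to come after all the joins, and a subquery used as a join operand has to be wrapped in parentheses. To list the IDs from table1 that are not in table2, try EXCEPT DISTINCT instead (BigQuery requires DISTINCT after EXCEPT, and it already removes duplicates):
SELECT CAST(T1.ID AS STRING) AS ID
FROM `project.dataset.table1` AS T1 WHERE T1.ID IS NOT NULL
EXCEPT DISTINCT
SELECT T2.ID AS ID
FROM `project.dataset.table2` AS T2 WHERE T2.ID IS NOT NULL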
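If you also want the matching IDs, one option is to deduplicate and cast each side in its own parenthesized subquery and then LEFT JOIN the two. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for BigQuery (so CAST ... AS TEXT replaces CAST ... AS STRING), and the sample rows and the in_table2 column name are made up.

```python
import sqlite3

# SQLite stands in for BigQuery here; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER);
    CREATE TABLE table2 (ID TEXT);
    INSERT INTO table1 VALUES (1), (1), (2), (3), (NULL);
    INSERT INTO table2 VALUES ('1'), ('1'), ('3'), ('4'), (NULL);
""")

# Each side is filtered, cast and deduplicated inside its own subquery,
# so the WHERE clauses never sit between FROM and LEFT JOIN.
rows = conn.execute("""
    SELECT T1.ID, T2.ID IS NOT NULL AS in_table2
    FROM (SELECT DISTINCT CAST(ID AS TEXT) AS ID
          FROM table1 WHERE ID IS NOT NULL) AS T1
    LEFT JOIN (SELECT DISTINCT ID
               FROM table2 WHERE ID IS NOT NULL) AS T2
      ON T1.ID = T2.ID
    ORDER BY T1.ID
""").fetchall()

matched = [r[0] for r in rows if r[1]]             # IDs present in both tables
only_in_table1 = [r[0] for r in rows if not r[1]]  # IDs missing from table2
print(matched, only_in_table1)
```

Because both sides are deduplicated before the join, each table1 ID appears exactly once in the result, flagged as matched or not.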

Related

How to join one table with two different tables with similar fields in BigQuery?

I have 3 tables in BigQuery.
I need to join the first one (which contains ids) to the other two (which contain lists of values for those ids). I want the sum of values per id across both tables:
SELECT t0.id, sum(values) FROM t0
LEFT JOIN t1 ON t0.id = t1.id
LEFT JOIN t2 ON t0.id = t2.id
GROUP BY id
It fails with the error "Column name values is ambiguous".
What is the best way to do this?
SELECT t0.id, sum(t1.values) + sum(t2.values) as sumOfValues FROM t0
LEFT JOIN t1 ON t0.id = t1.id
LEFT JOIN t2 ON t0.id = t2.id
GROUP BY id
I think the query below better reflects your original idea:
#standardSQL
SELECT id, sum(u.values)
FROM t0
LEFT JOIN (
SELECT id, values FROM t1 UNION ALL
SELECT id, values FROM t2
) u
USING (id)
GROUP BY id
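The two answers can give different totals. With two separate LEFT JOINs, rows from t1 and t2 multiply each other for the same id, and a missing side turns the sum into NULL. The sketch below illustrates this with Python's sqlite3 module as a stand-in for BigQuery; the sample rows are invented, and the column is quoted as "values" only because VALUES is a keyword in SQLite.

```python
import sqlite3

# SQLite stands in for BigQuery; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t0 (id INTEGER);
    CREATE TABLE t1 (id INTEGER, "values" INTEGER);
    CREATE TABLE t2 (id INTEGER, "values" INTEGER);
    INSERT INTO t0 VALUES (1), (2);
    INSERT INTO t1 VALUES (1, 10), (1, 20), (2, 5);
    INSERT INTO t2 VALUES (1, 100);
""")

# Two LEFT JOINs: for id 1 the single t2 row is repeated once per t1 row,
# and for id 2 the missing t2 side makes the whole sum NULL.
two_joins = conn.execute("""
    SELECT t0.id, SUM(t1."values") + SUM(t2."values")
    FROM t0
    LEFT JOIN t1 ON t0.id = t1.id
    LEFT JOIN t2 ON t0.id = t2.id
    GROUP BY t0.id
    ORDER BY t0.id
""").fetchall()

# UNION ALL first, then a single LEFT JOIN: every value is counted once.
union_first = conn.execute("""
    SELECT id, SUM(u."values")
    FROM t0
    LEFT JOIN (
        SELECT id, "values" FROM t1 UNION ALL
        SELECT id, "values" FROM t2
    ) u
    USING (id)
    GROUP BY id
    ORDER BY id
""").fetchall()

print(two_joins)
print(union_first)
```

If every id has at most one row in each of t1 and t2, the first form can still be made to work by wrapping each SUM in a NULL-replacing function, but the UNION ALL form does not depend on that assumption.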

sql join with like operator

Hi, I have two tables. The first table has ids like 111, 112, 113, etc. The second table also has an id column, which contains values like 111|aa, 112|ab, 114|ad. I need to select all the ids from the second table whose leading part matches an id in the first table. For example, the second table contains 111 and 112, which are also in table1 as 111 and 112. How can I select those records? I tried the query below but didn't get a result.
select t1.id,t2.id from table1 t1 left outer join table2 t2 where t2.id like t2.id +'%'
I also tried:
select t1.id,t2.id from table1 t1 left outer join table2 t2 on t2.id like t2.id +'%'
Can someone please give me a hint on how I should do this?
You have to use CONCAT to add the % character:
SELECT t1.id,t2.id
FROM table1 t1 LEFT JOIN table2 t2 ON t2.id LIKE CONCAT(t1.id, '|%')
demo on dbfiddle.uk
Note: Also make sure you are using the correct columns on the ON clause. In the above demo and query t1.id is the numeric column (contains 111, 112 or 113) and t2.id is the string column (contains 111|aa, 112|ab or 114|ad).
To avoid unexpected behaviour on numbers greater than 999 you can add the pipe character | to the condition too.
You could try using a proper ON clause:
select t1.id,t2.id
from table1 t1
left outer join table2 t2 ON t1.id like concat(t2.id, '%')
or
select t1.id,t2.id
from table1 t1
left outer join table2 t2 ON t2.id like concat(t1.id, '%')
For the join condition you should compare columns from the two different tables, not a column with itself.
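To see why the pipe character in the pattern matters, here is a small sketch using Python's sqlite3 module as a stand-in. It is an assumption-laden illustration: SQLite's || operator replaces CONCAT, and the extra row '1111|zz' is invented to show the false match.

```python
import sqlite3

# SQLite stands in for the real database; '1111|zz' is a hypothetical row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id TEXT);
    INSERT INTO table1 VALUES (111), (112), (113);
    INSERT INTO table2 VALUES ('111|aa'), ('112|ab'), ('114|ad'), ('1111|zz');
""")

# Pattern '<id>|%': the pipe anchors the end of the numeric part.
with_pipe = conn.execute("""
    SELECT t1.id, t2.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id LIKE t1.id || '|%'
    ORDER BY t1.id, t2.id
""").fetchall()

# Pattern '<id>%': 111 also matches '1111|zz'.
without_pipe = conn.execute("""
    SELECT t1.id, t2.id
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.id LIKE t1.id || '%'
    ORDER BY t1.id, t2.id
""").fetchall()

print(with_pipe)
print(without_pipe)
```

The LEFT JOIN keeps 113 in the output with a NULL on the table2 side, since no table2 id starts with 113|.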

create a sql join with more columns

I want to create a SQL statement (in PL/SQL Developer) with a join on comma-separated columns:
SELECT * FROM TABLE1 t1 JOIN TABLE2 t2 ON t1.tab_id, second_id = t2.tab_id, second_id;
I always get an ORA-00920 exception. If I change it to two conditions:
t1.tab_id = t2.tab_id AND t1.second_id = t2.second_id;
Then I get rows.
Can someone tell me whether I can use the first form with comma-separated columns?
Greetz
You need a valid condition:
SELECT *
FROM TABLE1 t1 JOIN
TABLE2 t2
ON t1.tab_id = t2.tab_id AND t1.second_id = t2.second_id;
I think Oracle will also let you do:
SELECT *
FROM TABLE1 t1 JOIN
TABLE2 t2
ON (t1.tab_id, t1.second_id) in ( (t2.tab_id, t2.second_id) );
Or even:
SELECT *
FROM TABLE1 t1 JOIN
TABLE2 t2
USING (tab_id, second_id);
This works because the JOIN keys have the same names in the two tables.
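As a quick check that the explicit AND form and the USING form select the same rows, here is a sketch with Python's sqlite3 module standing in for Oracle. The extra columns a and b and all sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle; columns a/b and the data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (tab_id INTEGER, second_id INTEGER, a TEXT);
    CREATE TABLE TABLE2 (tab_id INTEGER, second_id INTEGER, b TEXT);
    INSERT INTO TABLE1 VALUES (1, 1, 'x'), (1, 2, 'y'), (2, 1, 'z');
    INSERT INTO TABLE2 VALUES (1, 1, 'p'), (1, 3, 'q'), (2, 1, 'r');
""")

# One equality per key column, combined with AND.
and_rows = conn.execute("""
    SELECT t1.tab_id, t1.second_id, t1.a, t2.b
    FROM TABLE1 t1
    JOIN TABLE2 t2
      ON t1.tab_id = t2.tab_id AND t1.second_id = t2.second_id
    ORDER BY t1.tab_id, t1.second_id
""").fetchall()

# USING lists the shared key column names once.
using_rows = conn.execute("""
    SELECT tab_id, second_id, a, b
    FROM TABLE1
    JOIN TABLE2 USING (tab_id, second_id)
    ORDER BY tab_id, second_id
""").fetchall()

print(and_rows)
print(using_rows)
```

Only the key pairs present in both tables, (1, 1) and (2, 1), survive the join in either form.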

Hive QL Except clause

How do I do an EXCEPT clause (like in SQL) in Hive QL?
I have 2 tables, and each table is a column of unique ids.
I want to find the list of ids that are only in table 1 but not in table 2
Table 1
apple
orange
pear
Table 2
apple
orange
In SQL you can do an EXCEPT clause (http://en.wikipedia.org/wiki/Set_operations_%28SQL%29), but you can't do that in Hive QL.
I don't think there's any built-in way to do this but a LEFT OUTER JOIN should do the trick.
This selects all Ids from table1 that do not exist in table2:
SELECT t1.id FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.id=t2.id) WHERE t2.id IS NULL;
We can use a NOT EXISTS clause in Hive as the MINUS equivalent:
SELECT t1.id FROM t1 WHERE NOT EXISTS (SELECT 1 from t2 WHERE t2.id = t1.id);
1:
select distinct id from table1 where id not in (select distinct id from table2)
2:
select t1.id
from table1 as t1
left join table2 as t2
on t1.id = t2.id
where t2.id is null
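The three approaches agree on the sample data from the question, but under standard SQL NULL semantics they can diverge once table2 contains a NULL id: NOT IN then returns no rows at all. The sketch below shows this with Python's sqlite3 module as a stand-in for Hive, so treat it as an illustration of the general SQL behaviour rather than a Hive-specific test.

```python
import sqlite3

# SQLite stands in for Hive; data mirrors the question's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id TEXT);
    CREATE TABLE table2 (id TEXT);
    INSERT INTO table1 VALUES ('apple'), ('orange'), ('pear');
    INSERT INTO table2 VALUES ('apple'), ('orange');
""")

queries = {
    "left_join": """SELECT t1.id FROM table1 t1
                    LEFT OUTER JOIN table2 t2 ON t1.id = t2.id
                    WHERE t2.id IS NULL""",
    "not_exists": """SELECT t1.id FROM table1 t1
                     WHERE NOT EXISTS
                       (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)""",
    "not_in": """SELECT DISTINCT id FROM table1
                 WHERE id NOT IN (SELECT DISTINCT id FROM table2)""",
}

def run_all():
    """Run every query and return {name: list of ids}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in queries.items()}

before = run_all()  # all three approaches return the same ids

# A single NULL in table2 makes 'id NOT IN (...)' evaluate to NULL
# for every row, so the NOT IN query returns nothing.
conn.execute("INSERT INTO table2 VALUES (NULL)")
after = run_all()

print(before)
print(after)
```

If table2.id can be NULL, filtering it out inside the NOT IN subquery (WHERE id IS NOT NULL) avoids the empty result.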

SQL Joining three tables and using LEFT OUTER JOIN

I have three tables and two separate SQL queries that work correctly and return the right results.
When I try to join all three tables, I get null as the result.
First query:
select T1.ID,T3.COMPANY
from T1,T3
where (T1.status!='CLOSED') and (T1.PRIORITY)>5 and T1.CLASSID=T3.CLASSID
Second query:
SELECT T1.ID, T2.DESCRIPTION
FROM T1
LEFT OUTER JOIN T2
ON T1.ID=T2.KEY
WHERE T1.status!='CLOSED'
AND (T2.CREATEDATE= (SELECT MAX(CREATEDATE)
FROM T2
WHERE T2.KEY=T1.ID))
I tried to combine them, but I get null as the result:
select T1.ID,T3.COMPANY,T2.DESCRIPTION
from T1
INNER JOIN T3 ON T1.CLASSID=T3.CLASSID
LEFT OUTER JOIN T2
ON T1.ID=T2.KEY
where (T1.status!='CLOSED') AND (T1.PRIORITY)>5
AND (T2.CREATEDATE= (SELECT MAX(CREATEDATE)
FROM T2
WHERE T2.KEY=T1.ID))
It is as if it does not recognize the last part, which takes the MAX value from the T2 table.
What am I doing wrong? Thanks for the help.
Firstly, use an alias for the subquery on table T2.
T2.CREATEDATE =
(SELECT MAX(T2Alias.CREATEDATE)
FROM T2 AS T2Alias
WHERE T2Alias.KEY = T1.ID)
Secondly, consider moving this condition into the ON clause of the LEFT JOIN to table T2.
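To illustrate what moving the condition changes: a filter on T2 in the WHERE clause discards T1 rows that have no T2 match (the LEFT JOIN effectively becomes an inner join), while the same filter in the ON clause keeps them with a NULL description. The sketch below uses Python's sqlite3 module as a stand-in; the sample rows are invented, L is an arbitrary alias, and T2.KEY is renamed to the hypothetical TICKET_ID to sidestep keyword handling.

```python
import sqlite3

# SQLite stands in for the real database; TICKET_ID replaces T2.KEY
# and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER, STATUS TEXT, PRIORITY INTEGER, CLASSID INTEGER);
    CREATE TABLE T2 (TICKET_ID INTEGER, DESCRIPTION TEXT, CREATEDATE TEXT);
    CREATE TABLE T3 (CLASSID INTEGER, COMPANY TEXT);
    INSERT INTO T1 VALUES (1, 'OPEN', 7, 10), (2, 'OPEN', 9, 10), (3, 'CLOSED', 8, 10);
    INSERT INTO T2 VALUES (1, 'old note', '2020-01-01'), (1, 'new note', '2020-02-01');
    INSERT INTO T3 VALUES (10, 'Acme');
""")

# Latest-date filter in WHERE: ticket 2 has no T2 rows, so the
# comparison is NULL and the row is dropped.
in_where = conn.execute("""
    SELECT T1.ID, T3.COMPANY, T2.DESCRIPTION
    FROM T1
    INNER JOIN T3 ON T1.CLASSID = T3.CLASSID
    LEFT OUTER JOIN T2 ON T1.ID = T2.TICKET_ID
    WHERE T1.STATUS != 'CLOSED' AND T1.PRIORITY > 5
      AND T2.CREATEDATE = (SELECT MAX(L.CREATEDATE)
                           FROM T2 AS L
                           WHERE L.TICKET_ID = T1.ID)
    ORDER BY T1.ID
""").fetchall()

# Same filter in the ON clause: ticket 2 is kept with a NULL description.
in_on = conn.execute("""
    SELECT T1.ID, T3.COMPANY, T2.DESCRIPTION
    FROM T1
    INNER JOIN T3 ON T1.CLASSID = T3.CLASSID
    LEFT OUTER JOIN T2
      ON T1.ID = T2.TICKET_ID
     AND T2.CREATEDATE = (SELECT MAX(L.CREATEDATE)
                          FROM T2 AS L
                          WHERE L.TICKET_ID = T1.ID)
    WHERE T1.STATUS != 'CLOSED' AND T1.PRIORITY > 5
    ORDER BY T1.ID
""").fetchall()

print(in_where)
print(in_on)
```

Whether this explains the empty result in the question depends on the actual data, which is not shown.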
The first thing that jumps out at me is the new dependency on both T1.PRIORITY > 5 and the T2.CREATEDATE value being equal to the result of the inline query:
AND (T1.PRIORITY) > 5
AND (T2.CREATEDATE =
(SELECT MAX(CREATEDATE) FROM T2 WHERE T2.KEY = T1.ID))
Without the data it's difficult to check; however, this may be the issue.