BigQuery, left join - google-bigquery

I am stuck on a very simple SQL script. I want to do a LEFT JOIN on Google BigQuery.
SELECT a.name, b.name FROM [bigquery-00000:table1] a LEFT JOIN
[bigquery-00000:table2] b ON b.name = a.name LIMIT 10
And I keep getting an error message:
ON clause must be AND of = comparisons of one field name from each
table, with all field names prefixed with table name. Consider using
Standard SQL .google.com/bigquery/docs/reference/standard-sql/), which
allows non-equality JOINs and comparisons involving expressions and
residual predicates
I don't understand what is wrong with my script. Please help.

I found what was wrong with my script. Wrong table name... sorry.

Related

How to use Except clause in Bigquery?

I am trying to use the existing Except clause in Bigquery. Please find my query below
select * EXCEPT (b.hosp_id, b.person_id,c.hosp_id) from
person a
inner join hospital b
on a.hosp_id= b.hosp_id
inner join reading c
on a.hosp_id= c.hosp_id
As you can see, I am using 3 tables. All 3 tables have the hosp_id column, so I would like to remove the duplicate columns b.hosp_id and c.hosp_id. Similarly, I would like to remove the b.person_id column as well.
When I execute the above query, I get the syntax error as shown below
Syntax error: Expected ")" or "," but got "." at [9:19]
Please note that all the columns I am using in the EXCEPT clause are present in the tables used. Additionally, all the tables used are temp tables created with a WITH clause. When I do the same manually by selecting the columns of interest, it works fine. But I have many columns and can't do this manually.
Can you help? I am trying to learn Bigquery. Your inputs would help
I use the EXCEPT on a per-table basis:
select p.* EXCEPT (hosp_id, person_id),
h.*,
r.* EXCEPT (hosp_id)
from person p inner join
hospital h
on p.hosp_id = h.hosp_id inner join
reading r
on p.hosp_id = r.hosp_id;
Note that this also uses meaningful abbreviations for table aliases, which makes the query much simpler to understand.
In your case, I don't think you need EXCEPT at all if you use the USING clause.
Try this instead:
select * EXCEPT (person_id) from
person a
inner join hospital b
using (hosp_id)
inner join reading c
using (hosp_id)
You can only put column names (not paths) in the EXCEPT list, and you can simply avoid projecting the duplicate columns with USING instead of ON.
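One way to see the difference between ON and USING is to look at the column list that SELECT * produces. The sketch below is only an illustration under stated assumptions: it runs on Python's built-in sqlite3 module as a stand-in (not on BigQuery), and the hosp_name column and the sample rows are made up. SQLite projects a USING column once, which matches the behavior described above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; this is NOT BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person   (hosp_id INTEGER, person_id INTEGER);
    CREATE TABLE hospital (hosp_id INTEGER, hosp_name TEXT);  -- hosp_name is hypothetical
    INSERT INTO person   VALUES (1, 10);
    INSERT INTO hospital VALUES (1, 'General');
""")

def column_names(sql):
    """Return the output column names of a query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# Joining with ON keeps the join column from both tables.
on_cols = column_names(
    "SELECT * FROM person a JOIN hospital b ON a.hosp_id = b.hosp_id"
)

# Joining with USING projects the join column only once.
using_cols = column_names(
    "SELECT * FROM person a JOIN hospital b USING (hosp_id)"
)

print(on_cols)
print(using_cols)
```

With ON, hosp_id shows up twice in the output; with USING it shows up once, so there is nothing left to exclude.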

Inner Join Ambiguous Syntax

I'm not super familiar with SQL but I know the basics. I was recently trying to replicate some logic from reports in SQL Server 2012. I started with the custom query from Webi (a reporting tool) and was trying to make a view from it in SQL.
Here is what the query looks like:
SELECT
dimGlobalSalesAnalysisTbl.globalSalesAnalysisDesc,
dimGlobalShipDestinationCountryTbl.area,
dimGlobalShipDestinationCountryTbl.subarea,
dimGlobalCurrentProductTbl.sbuCodeDesc,
dimGlobalShipDateVw.shipDayOfWeekDesc,
sum(factSalesTblVw.globalSalesValue) AS 'Global Sales Value',
SUM(factSalesTblVw.salesUnitQuantity*GlobalFiles.dimCurrentGTINTbl.unitQty) AS 'Sales Unit Quantity'
FROM
dimGlobalCookCompaniesTbl INNER JOIN factSalesTblVw ON
(dimGlobalCookCompaniesTbl.globalCookCompanyID=factSalesTblVw.globalCookCompanyID)
INNER JOIN dimGlobalHistProductTbl ON (dimGlobalHistProductTbl.globalHistProductID=factSalesTblVw.globalHistProductID)
INNER JOIN dimGlobalCurrentProductTbl ON (dimGlobalHistProductTbl.globalCurrentProductID=dimGlobalCurrentProductTbl.globalCurrentProductID)
INNER JOIN dimGlobalHistShipCustomerTbl ON (factSalesTblVw.globalHistShipCustomerID=dimGlobalHistShipCustomerTbl.globalHistShipCustomerID)
INNER JOIN dimGlobalCurrentShipCustomerTbl ON (dimGlobalHistShipCustomerTbl.shipCustomerID=dimGlobalCurrentShipCustomerTbl.globalCurrentShipCustomerID)
***INNER JOIN dimGlobalCountryTbl dimGlobalShipDestinationCountryTbl ON (dimGlobalCurrentShipCustomerTbl.shipDestCountryDesc=dimGlobalShipDestinationCountryTbl.countryCode)***
INNER JOIN dimGlobalSalesAnalysisTbl ON (factSalesTblVw.globalSalesAnalysisID=dimGlobalSalesAnalysisTbl.globalSalesAnalysisID)
INNER JOIN dimGlobalShipDateVw ON (dimGlobalShipDateVw.shipJulianDate=factSalesTblVw.shipDateID)
INNER JOIN GlobalFiles.dimCurrentGTINTbl ON (GlobalFiles.dimCurrentGTINTbl.curGtinId=factSalesTblVw.GtinID)
WHERE
(
dimGlobalShipDateVw.shipYearNumber IN (DATEPART(yy,GETDATE())-1)
AND
dimGlobalCurrentShipCustomerTbl.shipCustomerNumberDesc
IN ( 'JPC000222-3','CNC000012-1' )
AND
dimGlobalSalesAnalysisTbl.globalSalesAnalysisDesc = 'Return Credits'
)
GROUP BY
dimGlobalShipDateVw.shipDate,
dimGlobalSalesAnalysisTbl.globalSalesAnalysisDesc,
dimGlobalShipDestinationCountryTbl.area,
dimGlobalShipDestinationCountryTbl.subarea,
dimGlobalCurrentProductTbl.sbuCodeDesc,
Upper(dimGlobalCurrentProductTbl.familyCodeDesc),
dimGlobalShipDateVw.shipYearNumber,
dimGlobalShipDateVw.shipDayOfWeekDesc,
dimGlobalCurrentProductTbl.madeByAbbr,
dimGlobalCookCompaniesTbl.companyDesc
This particular query runs on the production system when run in the relevant database. When trying to make a view of this query in a different database, I prefix the objects with the [database_name].[schema/dbo] name.
On running the query, I get the error:
Invalid object name 'WWS.dbo.dimGlobalShipDestinationCountryTbl'
I tried to find this particular table in the database, but it isn't there, though hovering over the table name in the query gives a table definition but no script.
This table appears in a weird-looking inner join (the 6th inner join) with syntax like this:
INNER JOIN dimGlobalCountryTbl dimGlobalShipDestinationCountryTbl ON
(dimGlobalCurrentShipCustomerTbl.shipDestCountryDesc=dimGlobalShipDestinationCountryTbl.countryCode)
Two questions:
1. Can someone please explain this query syntax for inner join ?
2. This is pretty stupid but any ideas on how to look into possibly hidden table definitions ?
1. Can someone please explain this query syntax for inner join?
The odd-looking part of that inner join is nothing more than a table alias. Either the author of the query thought the alias would be easier to understand than the actual table name, or the same table is referenced twice, in which case at least one of the references needs an alias.
2. This is pretty stupid but any ideas on how to look into possibly hidden table definitions?
Why? There is no hidden table. I think you just introduced an error when you added the database_name.schema prefix: most likely you prefixed the alias, which is not a database object. Only the real table name (dimGlobalCountryTbl) should get the prefix.
Think of the table alias like a column alias: just as with columns, you can omit the 'AS' keyword.
dimGlobalCountryTbl dimGlobalShipDestinationCountryTbl is the same as
dimGlobalCountryTbl AS dimGlobalShipDestinationCountryTbl
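To check the alias explanation above, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the single sample row is made up; only the table, alias, and column names come from the query in the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimGlobalCountryTbl (countryCode TEXT, area TEXT);
    INSERT INTO dimGlobalCountryTbl VALUES ('JP', 'Asia');  -- made-up row
""")

# Alias written without the AS keyword, as in the original query.
without_as = conn.execute(
    "SELECT dimGlobalShipDestinationCountryTbl.area "
    "FROM dimGlobalCountryTbl dimGlobalShipDestinationCountryTbl"
).fetchall()

# The same alias written with the AS keyword.
with_as = conn.execute(
    "SELECT dimGlobalShipDestinationCountryTbl.area "
    "FROM dimGlobalCountryTbl AS dimGlobalShipDestinationCountryTbl"
).fetchall()

# The alias only exists inside the query that defines it,
# so using it as if it were a real table fails.
try:
    conn.execute("SELECT area FROM dimGlobalShipDestinationCountryTbl")
    alias_exists_as_table = True
except sqlite3.OperationalError:
    alias_exists_as_table = False

print(without_as, with_as, alias_exists_as_table)
```

Both forms return the same rows, and the alias cannot be looked up as an object, which is consistent with the "Invalid object name" error in the question.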

Converting SQL query to Hive query

I am having some trouble converting my SQL queries into Hive queries.
Relational schema:
Suppliers(sid, sname, address)
Parts(pid, pname, color)
Catalog(sid, pid, cost)
Query 1: Find the pnames of parts for which there is some supplier.
I have attempted the conversion for query 1 and I think it is correct. If someone can let me know whether it is correct or incorrect, I would really appreciate it. The two versions look the same to me, based on the info I have looked up for Hive.
Query 1: SQL
SELECT pname
FROM Parts, Catalog
WHERE Parts.pid = Catalog.pid
Query 1: Converted to Hive
SELECT pname
FROM Parts, Catalog
WHERE Parts.pid = Catalog.pid;
Query 2: Find the sids of suppliers who supply only red parts.
I am having trouble with the second query, mainly with the "not exists" part and with specifying which color we want. Can someone help me figure this out? I need to turn the SQL into a Hive query.
Query 2: SQL
SELECT DISTINCT C.sid
FROM Catalog C
WHERE NOT EXISTS ( SELECT *
FROM Parts P
WHERE P.pid = C.pid AND P.color <> 'Red')
If someone can help me get these into the correct Hive format I would really appreciate it.
Thank you.
Although I have never used HiveQL, its documentation suggests that it supports outer joins written in plain SQL. In that case this should work (an outer join that keeps only the rows with no match):
select distinct c.sid
from catalog c
left outer join parts p
on (c.pid = p.pid
and p.color <> 'Red')
where p.pid is null
Edit: I enclosed the ON clause in parentheses. This is not normally required by any major database, but it seems to be needed in HiveQL (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins).
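To check that the outer join with "where p.pid is null" returns the same rows as the NOT EXISTS query, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in (not Hive), and the sample rows are made up; the table and column names follow the schema in the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; this is NOT Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parts   (pid INTEGER, pname TEXT, color TEXT);
    CREATE TABLE Catalog (sid INTEGER, pid INTEGER, cost REAL);
    -- Made-up sample data.
    INSERT INTO Parts VALUES (1, 'bolt', 'Red'), (2, 'nut', 'Green'), (3, 'screw', 'Red');
    INSERT INTO Catalog VALUES (100, 1, 1.0), (100, 3, 2.0),
                               (200, 1, 1.5), (200, 2, 0.5),
                               (300, 2, 0.7);
""")

# The NOT EXISTS form from the question.
not_exists_sids = [row[0] for row in conn.execute("""
    SELECT DISTINCT C.sid
    FROM Catalog C
    WHERE NOT EXISTS (SELECT *
                      FROM Parts P
                      WHERE P.pid = C.pid AND P.color <> 'Red')
    ORDER BY C.sid
""")]

# The outer-join form: keep catalog rows that found no non-red part.
left_join_sids = [row[0] for row in conn.execute("""
    SELECT DISTINCT c.sid
    FROM Catalog c
    LEFT OUTER JOIN Parts p
      ON (c.pid = p.pid AND p.color <> 'Red')
    WHERE p.pid IS NULL
    ORDER BY c.sid
""")]

print(not_exists_sids, left_join_sids)
```

Both forms return the same suppliers on this data. Note that with these sample rows supplier 200 is returned even though it also supplies a green part, because both queries test each catalog row on its own; that is a property of the original SQL, not of the conversion.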
Regarding your first query, I don't see why it wouldn't work in Hive, based on the example queries in those docs. However, as another commenter mentioned, it is best practice to use explicit joins via the join clause rather than implicit joins in the where clause. Think of the where clause as the place where you filter on various conditions (other than join conditions), and the join clause as the place where you put all join conditions. It helps organize your query's logic. In addition, the where clause can only express inner joins. The join clause is needed any time you want to work with outer joins (as in your second query, above).
This is the same as your first query but with explicit join syntax:
select pname
from parts p
join catalog c
on (p.pid = c.pid)

Oracle left join sql query issue

I have an SQL query that uses LEFT JOIN, and there is now a requirement to convert it to the (+) operator syntax. Please guide me on how to do it. The query is written below:
select (some field names)
from
ldcs2.component comp
LEFT JOIN ldcs2.component_detail cd
ON cd.component_id = comp.component_id
LEFT JOIN ldcs2.component_item_breakdown cib
ON cib.component_item_id = cd.component_item_id
So please explain what LEFT JOIN specifies here and how it can be written with the (+) operator.
Also, the ON condition mentions the second table (ldcs2.component_detail) first. Would it work differently if the condition were written the other way around?
This is what you could do, but I have to note that personally I prefer the ANSI way.
There are two sides to a join condition. When you use ANSI syntax and write A left join B, you state that a record in A does not need to have a match in B.
When you put (+) on a specific side of the join condition, you state something like "the field on this side of the condition does not need to be matched."
select (some field names)
from
ldcs2.component comp,
ldcs2.component_detail cd,
ldcs2.component_item_breakdown cib
where
cd.component_id (+) = comp.component_id
and cib.component_item_id (+) = cd.component_item_id
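To illustrate what the LEFT JOIN in the original query preserves, and that the order of the two operands in the ON condition makes no difference, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in (SQLite has no (+) operator, so only the ANSI form is shown), the ldcs2 schema prefix is dropped, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE component (component_id INTEGER);
    CREATE TABLE component_detail (component_id INTEGER, component_item_id INTEGER);
    CREATE TABLE component_item_breakdown (component_item_id INTEGER);
    -- Made-up data: component 2 has no detail rows.
    INSERT INTO component VALUES (1), (2);
    INSERT INTO component_detail VALUES (1, 10);
    INSERT INTO component_item_breakdown VALUES (10);
""")

# ON conditions written as in the question (joined table mentioned first).
rows = conn.execute("""
    SELECT comp.component_id, cd.component_item_id, cib.component_item_id
    FROM component comp
    LEFT JOIN component_detail cd
      ON cd.component_id = comp.component_id
    LEFT JOIN component_item_breakdown cib
      ON cib.component_item_id = cd.component_item_id
    ORDER BY comp.component_id
""").fetchall()

# Same query with the operands of each ON condition swapped.
rows_swapped = conn.execute("""
    SELECT comp.component_id, cd.component_item_id, cib.component_item_id
    FROM component comp
    LEFT JOIN component_detail cd
      ON comp.component_id = cd.component_id
    LEFT JOIN component_item_breakdown cib
      ON cd.component_item_id = cib.component_item_id
    ORDER BY comp.component_id
""").fetchall()

print(rows)
print(rows_swapped)
```

Component 2 is kept with NULLs (None in Python) for the missing detail and breakdown columns, and both spellings of the ON conditions give the same result. What matters for the (+) form is which table's column carries the (+), not which side of the equals sign it is written on.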
You could convert the ANSI/ISO syntax to Oracle outer join syntax as follows:
SELECT (some field names)
FROM ldcs2.component comp,
ldcs2.component_detail cd,
ldcs2.component_item_breakdown cib
WHERE comp.component_id = cd.component_id(+)
AND cd.component_item_id = cib.component_item_id(+)
/
So please guide me what does Left-Join specify here and how can we write it into (+) expression.
For a detailed understanding, see my previous answer on a similar question here https://stackoverflow.com/a/28499208/3989608

Mixing "USING" and "ON" in Oracle ANSI join

I wrote an Oracle SQL expression like this:
SELECT
...
FROM mc_current_view a
JOIN account_master am USING (account_no)
JOIN account_master am_loan ON (am.account_no = am_loan.parent_account_no)
JOIN ml_client_account mca USING (account_no)
When I try to run it, Oracle throws an error in the line with "ON" self-join saying: "ORA-25154: column part of USING clause cannot have qualifier".
If I omit the "am" qualifier, it says: "ORA-00918: column ambiguously defined".
What's the best way to resolve this?
The error message is actually (surprise!) telling you exactly what the problem is. Once you use the USING clause for a particular column, you cannot use a column qualifier/table alias for that column name in any other part of your query. The only way to resolve this is to not use the USING clause anywhere in your query, since you have to have the qualifier on the second join condition:
SELECT
...
FROM mc_current_view a
JOIN account_master am ON (a.account_no = am.account_no)
JOIN account_master am_loan ON (am.account_no = am_loan.parent_account_no)
JOIN ml_client_account mca ON (a.account_no = mca.account_no);
My preference is never to use USING; always use ON. I like my SQL to be very explicit, and the USING clause feels one step removed in my opinion.
In this case, the error is coming about because you have account_no in mc_current_view, account_master, and ml_client_account so the actual join can't be resolved. Hope this helps.
USING is cleaner (imo), but it is still desirable to be able to reference the join fields from outside the join, as in the original example or in an example like this:
select A.field,
B.field,
(select count(C.number)
from tableC C
where C.join_id = join_id -- wrong answer without a prefix on join_id, exception with one
) avg_number
from tableA A
join tableB B using (join_id);
It gives the wrong answer because the join_id within the subquery resolves to C.join_id (matching all records) rather than to A or B. Perhaps the best resolution would be for the database simply to allow explicit references together with USING, giving the best of both worlds. Cases like these suggest there is a need for it.
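The scoping problem described above can be reproduced with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Oracle, a hypothetical val column in place of C.number, and made-up rows. Unlike Oracle (which, per the question, raises ORA-25154), SQLite happens to accept a qualifier on a USING column, which makes it possible to show the intended result next to the wrong one.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (join_id INTEGER, field TEXT);
    CREATE TABLE tableB (join_id INTEGER, field TEXT);
    CREATE TABLE tableC (join_id INTEGER, val INTEGER);  -- val is a hypothetical column
    -- Made-up sample data.
    INSERT INTO tableA VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO tableB VALUES (1, 'b1'), (2, 'b2');
    INSERT INTO tableC VALUES (1, 5), (1, 6), (2, 7);
""")

# Unqualified join_id in the subquery binds to the innermost table (C),
# so the condition compares C.join_id with itself and counts every row of C.
unqualified = conn.execute("""
    SELECT A.field,
           (SELECT count(C.val) FROM tableC C WHERE C.join_id = join_id)
    FROM tableA A
    JOIN tableB B USING (join_id)
    ORDER BY A.field
""").fetchall()

# Qualifying the outer reference gives the intended per-row count.
# SQLite allows A.join_id here; Oracle rejects the qualifier on a USING column.
qualified = conn.execute("""
    SELECT A.field,
           (SELECT count(C.val) FROM tableC C WHERE C.join_id = A.join_id)
    FROM tableA A
    JOIN tableB B USING (join_id)
    ORDER BY A.field
""").fetchall()

print(unqualified)
print(qualified)
```

The unqualified version reports the total row count of tableC for every outer row, while the qualified version reports the count per join_id.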