BigQuery: WITH clause behavior in multiple JOIN conditions

For readability, I have defined an "org_location_ext" WITH clause in the query as follows.
This "org_location_ext" is first used to join with the main fact-table "LOCATION_SALES".
It is used in other JOIN conditions as well.
According to the BigQuery documentation : https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#with_clause
The WITH clause contains one or more named subqueries which execute
every time a subsequent SELECT statement references them
I want to know the behavior in this case.
Does this query execute the "org_location_ext" WITH clause multiple times?
Or, when the SELECT query is executed, is a temporary table created for "org_location_ext" and reused for all the JOINs?
In other words, after the first JOIN with the fact table, do the later joins use that "filtered" result, or do they rerun the WITH clause?
WITH org_location_ext AS (
SELECT *
FROM ORG_LOC_MASTER AS loc_master
JOIN LOC_REGN1 as regn1 ON loc_master.id = regn1.id
JOIN ...
JOIN ...
)
SELECT
..
org_location_ext.store_class,
org_location_ext.country,
org_location_ext.
..
..
FROM LOCATION_SALES AS sales
JOIN org_location_ext ON org_location_ext.area_id = sales.area_id AND org_location_ext.date = sales.date
JOIN ....
JOIN ....
JOIN COUNTRY_VAT AS vat ON vat.key1 =TBL_Y.key1 AND vat.country_code = org_location_ext.country_code

It depends on the query plan. Check the query plan for this query: it shows how many times each table is actually accessed.

Related

where clause conditions in SQL

Below is a join based on where clause:
SELECT a.* FROM TEST_TABLE1 a,TEST_TABLE2 b,TEST_TABLE3 c
WHERE a.COL11 = b.COL11
AND b.COL12 = c.COL12
AND a.COL3 = c.COL13;
I have been learning SQL from online resources and am trying to convert it to use joins.
Two issues:
The original query is confusing. The outer joins (with the (+) suffix) are made irrelevant by the last where condition. Because of that condition, the query should only return records where there is an actual matching c record. So the original query is the same as if there were no such (+) suffixes.
Your query joins TEST_TABLE3 twice, while the first query only joins it once, and there are two conditions that determine how it is joined there. You should not split those conditions over two separate joins.
BTW, it is surprising that the SQL Fiddle site does not show an error, as it makes no sense to use the same alias twice. See for example how MySQL returns the error with the same query on dbfiddle (on all available versions of MySQL):
Not unique table/alias: 'C'
So to get the same result using the standard join notation, all joins should be inner joins:
SELECT *
FROM TEST_TABLE1 A
INNER JOIN TEST_TABLE2 B
ON A.COL11 = B.COL11
INNER JOIN TEST_TABLE3 C
ON B.COL12 = C.COL12
AND A.COL3 = C.COL13;
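To check the equivalence between the comma-style query and the INNER JOIN form, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the sample rows and the choice of SQLite are assumptions made only for illustration, so other engines may differ in details.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEST_TABLE1 (COL11 INTEGER, COL3 INTEGER);
    CREATE TABLE TEST_TABLE2 (COL11 INTEGER, COL12 INTEGER);
    CREATE TABLE TEST_TABLE3 (COL12 INTEGER, COL13 INTEGER);
    INSERT INTO TEST_TABLE1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO TEST_TABLE2 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO TEST_TABLE3 VALUES (100, 10), (200, 99), (400, 30);
""")

# Comma-separated tables, all join conditions in WHERE.
comma_rows = conn.execute("""
    SELECT a.* FROM TEST_TABLE1 a, TEST_TABLE2 b, TEST_TABLE3 c
    WHERE a.COL11 = b.COL11
      AND b.COL12 = c.COL12
      AND a.COL3 = c.COL13
    ORDER BY a.COL11
""").fetchall()

# Same conditions with explicit INNER JOINs; TEST_TABLE3 is joined once
# and carries both of its conditions.
join_rows = conn.execute("""
    SELECT a.* FROM TEST_TABLE1 a
    INNER JOIN TEST_TABLE2 b ON a.COL11 = b.COL11
    INNER JOIN TEST_TABLE3 c ON b.COL12 = c.COL12 AND a.COL3 = c.COL13
    ORDER BY a.COL11
""").fetchall()

# Dropping the second condition on TEST_TABLE3 lets an extra row through.
loose_rows = conn.execute("""
    SELECT a.* FROM TEST_TABLE1 a
    INNER JOIN TEST_TABLE2 b ON a.COL11 = b.COL11
    INNER JOIN TEST_TABLE3 c ON b.COL12 = c.COL12
    ORDER BY a.COL11
""").fetchall()

print(comma_rows, join_rows, loose_rows)
```

With this sample data, the first two queries return the same single row, while the version that omits one of the conditions on TEST_TABLE3 returns an additional row.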
@tricot correctly pointed out that it is strange to have two aliases with the same name without getting an error. To answer your question:
In the first query, we first perform a cross join between all three tables by listing the table names. We then filter the rows of that cross join using the conditions in the WHERE clause.
In the second query, you need to join TEST_TABLE3 only once. Since you now have all the required aliases A, B, and C, as in the first query, you can specify both conditions on the last join as follows:
SELECT A.* FROM TEST_TABLE1 A
LEFT JOIN TEST_TABLE2 B
ON A.COL11 = B.COL11
LEFT JOIN TEST_TABLE3 C
ON B.COL12 = C.COL12 AND A.COL3 = C.COL13;

Hive - Multiple sub-queries in where clause is failing

I am trying to create a table by checking two sub-query expressions within the where clause but my query fails with the below error :
Unsupported sub query expression. Only 1 sub query expression is
supported
Code snippet is as follows (Not the exact code. Just for better understanding) :
Create table winners row format delimited fields terminated by '|' as
select
games,
players
from olympics
where
exists (select 1 from dom_sports where dom_sports.players = olympics.players)
and not exists (select 1 from dom_sports where dom_sports.games = olympics.games)
If I execute the same command with only one sub-query in the where clause, it runs successfully. Is there an alternative way to achieve the same result?
Of course. You can use joins instead.
An inner join acts as the exists, and a left join plus a where clause (IS NULL) mimics the not exists.
There can be an issue with granularity (duplicate rows), but that depends on your data.
select distinct
olympics.games,
olympics.players
from olympics
inner join dom_sports dom_sports on dom_sports.players = olympics.players
left join dom_sports dom_sports2 on dom_sports2.games = olympics.games
where dom_sports2.games is null
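The join rewrite can be sanity-checked against the original EXISTS / NOT EXISTS form on an engine that accepts both. The sketch below uses Python's built-in sqlite3 module rather than Hive, and the sample rows are invented for illustration; it only shows that the two formulations agree on this data and why DISTINCT is needed.

```python
import sqlite3

# In-memory database with made-up rows (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE olympics (games TEXT, players TEXT);
    CREATE TABLE dom_sports (games TEXT, players TEXT);
    INSERT INTO olympics VALUES
        ('archery', 'ann'), ('rowing', 'bob'), ('judo', 'cat');
    INSERT INTO dom_sports VALUES
        ('rowing', 'ann'), ('chess', 'ann'), ('golf', 'bob');
""")

# Original formulation with two sub-queries (accepted by SQLite).
exists_rows = conn.execute("""
    SELECT games, players FROM olympics
    WHERE EXISTS (SELECT 1 FROM dom_sports
                  WHERE dom_sports.players = olympics.players)
      AND NOT EXISTS (SELECT 1 FROM dom_sports
                      WHERE dom_sports.games = olympics.games)
""").fetchall()

# Join rewrite: INNER JOIN stands in for EXISTS,
# LEFT JOIN + IS NULL stands in for NOT EXISTS.
join_sql = """
    SELECT {distinct} olympics.games, olympics.players
    FROM olympics
    INNER JOIN dom_sports ON dom_sports.players = olympics.players
    LEFT JOIN dom_sports dom_sports2 ON dom_sports2.games = olympics.games
    WHERE dom_sports2.games IS NULL
"""
join_rows = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

# Without DISTINCT, 'ann' matches two dom_sports rows and is duplicated.
raw_rows = conn.execute(join_sql.format(distinct="")).fetchall()

print(exists_rows, join_rows, len(raw_rows))
```

Note that DISTINCT would also collapse rows that are genuinely duplicated in olympics, which is the granularity caveat mentioned above.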

Why should we specify the table name in the where clause of an inner join when the tables have the same column name?

I have the query
select gltree.*,tsacc.confirm,tsacc.acc_no,commacc.* from tsacc
inner join commacc on tsacc.acc_no = commacc.acc_no and tsacc.glcode = commacc.glcode
inner join gltree on tsacc.glcode = gltree.glcode
where gltree.glcode = 12738
In this query two of the specified tables have a 'glcode' column, so
why should I have to specify the table name in the where clause, e.g. gltree.glcode? Why can't I use just glcode without the table name, since there is only one glcode in the executed query?
You actually have three columns with that name (tsacc.glcode, commacc.glcode and gltree.glcode), so you need to tell the database which one you mean.
The list of columns in the select list is evaluated as the last step when processing the statement. So when the DB processes the where clause it does not "know" which of them you are actually using (you could use all of them).
Plus: with an inner join it does indeed not matter, but if you were using an outer join it would make a big difference which of those three columns is used in the where clause.
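A small sketch of the ambiguity, using Python's built-in sqlite3 module. The table names and the glcode, acc_no and confirm columns come from the question; the `title` column, the sample rows, and the use of SQLite are assumptions for illustration. Other databases report the same problem with their own error text.

```python
import sqlite3

# In-memory database with made-up rows; 'title' is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tsacc (acc_no INTEGER, glcode INTEGER, confirm INTEGER);
    CREATE TABLE commacc (acc_no INTEGER, glcode INTEGER);
    CREATE TABLE gltree (glcode INTEGER, title TEXT);
    INSERT INTO tsacc VALUES (7, 12738, 1);
    INSERT INTO commacc VALUES (7, 12738);
    INSERT INTO gltree VALUES (12738, 'cash');
""")

query = """
    SELECT gltree.*, tsacc.confirm, tsacc.acc_no
    FROM tsacc
    INNER JOIN commacc ON tsacc.acc_no = commacc.acc_no
                      AND tsacc.glcode = commacc.glcode
    INNER JOIN gltree ON tsacc.glcode = gltree.glcode
    WHERE {column} = 12738
"""

# Unqualified column: three joined tables expose 'glcode', so it is rejected.
try:
    conn.execute(query.format(column="glcode")).fetchall()
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualified column: the database knows which table is meant.
qualified_rows = conn.execute(query.format(column="gltree.glcode")).fetchall()

print(error_message)
print(qualified_rows)
```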

Difference in visibility between join and where in a subquery

I had problems with a simple join:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place
AND wpw.id_worker = wo.id_worker)
)
The error I had was ORA-00904: "WO"."ID_WORKER": invalid identifier.
Then I decided to move the condition from the join clause to the where clause:
SELECT *
FROM worker wo
WHERE EXISTS (
SELECT wp.id_working_place
FROM working_place wp
JOIN working_place_worker wpw ON ( wp.id_working_place = wpw.id_working_place)
WHERE wpw.id_worker = wo.id_worker
)
And this last query works perfectly.
Why is it not possible to do this in the join? The table should be visible there, just as it is in the where clause. Am I missing something?
In
FROM working_place wp
JOIN working_place_worker wpw ON ...
WHERE ...
the ON clause refers only to the two tables participating in the join, namely wp and wpw. Names from the outer query are not visible to it.
The WHERE clause (and its cousin HAVING) is the means by which the outer query is correlated to the subquery. Names from the outer query are visible to it.
To make it easy to remember,
ON is about the JOIN, how two tables relate to form a row (or rows)
WHERE is about the selection criteria, the test the rows must pass
While the SQL parser will admit literals (which aren't column names) in the ON clause, it draws the line at references to columns outside the join. You could regard this as a favor that guards against errors.
In your case, the wo table is not part of the JOIN, and is rejected. It is part of the whole query, and is recognized by WHERE.

Group by in SQL Server giving wrong count

I have a query which works, goes like this:
Select
count(InsuranceOrderLine.AntallPotensiale) as potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name
This query returns what I need, but when I try to add one more table (InsuranceOrder) to get the RegardingUser column, all the count values are far too high.
Select
count(InsuranceOrderLine.AntallPotensiale) as Potensiale,
COUNT(InsuranceOrderLine.AntallSolgt) as Solgt,
InsuranceProduct.Name,
InsuranceProductCategory.Name as Kategori,
RegardingUser
From
InsuranceOrderLine, InsuranceProduct, InsuranceProductCategory, InsuranceSalesLead
where
InsuranceOrderLine.FKInsuranceProductId = InsuranceProduct.InsuranceProductID
and InsuranceProduct.FKInsuranceProductCategory = InsuranceProductCategory.InsuranceProductCategoryID
Group by
InsuranceProduct.name, InsuranceProductCategory.Name,RegardingUser
Thanks in advance
You're adding one more table to your FROM clause, but you don't specify any JOIN condition for that table, so your previous result set is combined with the new table as a CROSS JOIN (Cartesian product)! Of course you'll get duplicated data.
That's one of the reasons I recommend never using the old, legacy-style JOIN: do not simply list a comma-separated bunch of tables in your FROM clause.
Always use the ANSI standard JOIN syntax with INNER JOIN, LEFT OUTER JOIN and so on:
SELECT
count(iol.AntallPotensiale) as Potensiale,
COUNT(iol.AntallSolgt) as Solgt,
ip.Name,
ipc.Name as Kategori,
isl.RegardingUser
FROM
dbo.InsuranceOrderLine iol
INNER JOIN
dbo.InsuranceProduct ip ON iol.FKInsuranceProductId = ip.InsuranceProductID
INNER JOIN
dbo.InsuranceProductCategory ipc ON ip.FKInsuranceProductCategory = ipc.InsuranceProductCategoryID
INNER JOIN
dbo.InsuranceSalesLead isl ON ???????? -- JOIN condition missing here !!
GROUP BY
ip.Name, ipc.Name, isl.RegardingUser
When you do this, you first of all see right away that you're missing a JOIN condition here - how is this new table InsuranceSalesLead linked to any of the other tables already used in this SQL statement??
And secondly, your intent is much clearer, since the JOIN conditions linking the tables are where they belong - right with the JOIN - and don't clutter up your WHERE clauses ...
It looks like the table you added multiplies the row count, so make sure you are joining it properly. Be careful with aggregate functions over several joined tables: joins very often produce duplicate rows.
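The inflation of the counts can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module with simplified, hypothetical tables (`order_line`, `product`, `sales_lead`) and made-up rows standing in for the tables in the question; it is not the poster's schema.

```python
import sqlite3

# In-memory database; table names, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line (product_id INTEGER, sold INTEGER);
    CREATE TABLE product (product_id INTEGER, name TEXT);
    CREATE TABLE sales_lead (lead_id INTEGER, regarding_user TEXT);
    INSERT INTO order_line VALUES (1, 1), (1, 1), (2, 1);
    INSERT INTO product VALUES (1, 'car'), (2, 'home');
    INSERT INTO sales_lead VALUES (10, 'u1'), (11, 'u2'), (12, 'u3');
""")

# Properly joined tables: counts reflect the real number of order lines.
correct_counts = conn.execute("""
    SELECT p.name, COUNT(ol.sold)
    FROM order_line ol
    INNER JOIN product p ON ol.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

# Extra table listed without any join condition: every matched row is
# repeated once per sales_lead row, so each count is multiplied by 3.
inflated_counts = conn.execute("""
    SELECT p.name, COUNT(ol.sold)
    FROM order_line ol, product p, sales_lead sl
    WHERE ol.product_id = p.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(correct_counts)
print(inflated_counts)
```

Here the three rows in the unjoined table triple every count, which is the same effect described in the answers above.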