Does BigQuery support SubQueries? - sql

In my SQL FROM clause, I want to use a dynamically created table via a subquery:
Select A.Field1,B.Field2
FROM TableA as A, (select Field1,Field2 from TableB) B
Where A.Field1 = B.Field1
Does BigQuery support this?

You don't need a subquery for this:
Select A.Field1,B.Field2
FROM TableA as A join
TableB as B
on A.Field1 = B.Field1;
But yes, according to the reference manual, BigQuery does support subqueries.
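The sketch below checks that the derived-table form and the plain join return the same rows. It runs both queries against an in-memory SQLite database from Python. SQLite is only a stand-in for BigQuery here, an assumption made for illustration, and the table contents are made up.

```python
import sqlite3

# Hypothetical TableA/TableB contents; SQLite stands in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Field1 INTEGER, Other TEXT);
    CREATE TABLE TableB (Field1 INTEGER, Field2 TEXT);
    INSERT INTO TableA VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO TableB VALUES (1, 'one'), (3, 'three'), (4, 'four');
""")

# Join against a derived table built by a subquery in FROM.
SUBQUERY_VERSION = """
    SELECT A.Field1, B.Field2
    FROM TableA AS A
    JOIN (SELECT Field1, Field2 FROM TableB) AS B
      ON A.Field1 = B.Field1
    ORDER BY A.Field1
"""

# The same join written directly against TableB.
JOIN_VERSION = """
    SELECT A.Field1, B.Field2
    FROM TableA AS A
    JOIN TableB AS B
      ON A.Field1 = B.Field1
    ORDER BY A.Field1
"""

via_subquery = conn.execute(SUBQUERY_VERSION).fetchall()
via_join = conn.execute(JOIN_VERSION).fetchall()
print(via_subquery)  # [(1, 'one'), (3, 'three')]
```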

Yes, it does. I remember doing something like:
SELECT a ,b
FROM Tablea
WHERE a NOT IN (SELECT a FROM Tableb)
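Here is a minimal runnable sketch of that NOT IN filter. It again uses Python's sqlite3 module as a stand-in engine, with hypothetical data. The second call shows a standard SQL caveat worth knowing: if the subquery returns a NULL, NOT IN never evaluates to true, so no rows come back.

```python
import sqlite3


def rows_not_in(b_values):
    """Return Tablea rows whose `a` does not appear in Tableb (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tablea (a INTEGER, b TEXT)")
    conn.execute("CREATE TABLE Tableb (a INTEGER)")
    conn.executemany("INSERT INTO Tablea VALUES (?, ?)",
                     [(1, "kept"), (2, "dropped"), (3, "kept too")])
    conn.executemany("INSERT INTO Tableb VALUES (?)", [(v,) for v in b_values])
    return conn.execute("""
        SELECT a, b
        FROM Tablea
        WHERE a NOT IN (SELECT a FROM Tableb)
        ORDER BY a
    """).fetchall()


print(rows_not_in([2, 5]))     # [(1, 'kept'), (3, 'kept too')]
print(rows_not_in([2, None]))  # [] because NOT IN against a NULL is never true
```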

First, let's formally fix your query (assuming you are using BigQuery Legacy SQL).
Please note that in Legacy SQL a comma in the FROM clause is not a JOIN but rather a UNION ALL.
So, for your query to work, it should look like this:
SELECT A.Field1, B.Field2
FROM TableA AS A
JOIN (SELECT Field1, Field2 FROM TableB) AS B
ON A.Field1 = B.Field1
Of course, in your particular example you don't need a subselect. I assume it is just a simplified example, so I won't go in that direction, and other answers have already pointed this out anyway.
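SQLite cannot run Legacy SQL, but it can illustrate the difference between the two readings of a comma. Under standard semantics (SQLite, and BigQuery Standard SQL), `FROM t1, t2` is a cross join. The Legacy SQL reading described in this answer corresponds to stacking the rows with UNION ALL. The tables t1 and t2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (x INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (10), (20), (30);
""")

# Standard SQL (and SQLite): a comma in FROM is a cross join, so 2 * 3 rows.
cross = conn.execute("SELECT t1.x, t2.x FROM t1, t2").fetchall()

# What Legacy SQL's "FROM t1, t2" means instead: stack the rows, so 2 + 3 rows.
stacked = conn.execute("SELECT x FROM t1 UNION ALL SELECT x FROM t2").fetchall()

print(len(cross), len(stacked))  # 6 5
```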
Finally, about subqueries in BigQuery:
BigQuery Legacy SQL supports only limited use of subqueries, known as table subqueries: in FROM, in FLATTEN, and in semi-JOINs or anti-semi-JOINs (with only one field).
You can find details here: https://cloud.google.com/bigquery/query-reference
On the other hand, BigQuery Standard SQL provides rich support for subqueries, both table subqueries and expression subqueries.
You can read more here: https://cloud.google.com/bigquery/sql-reference/query-syntax#subqueries
Note: at the time of writing, BigQuery Standard SQL is still in alpha.
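To show what expression subqueries look like, here is a sketch with a scalar subquery in the SELECT list and an EXISTS subquery in WHERE. It runs on SQLite through Python's sqlite3 module, not on BigQuery, and the customers/orders tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 3, 7.5);
""")

# Scalar (expression) subquery in the SELECT list: one value per outer row.
order_counts = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id) AS n_orders
    FROM customers c
    ORDER BY c.id
""").fetchall()

# EXISTS subquery in WHERE: keep customers with at least one order.
buyers = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(order_counts)  # [('ann', 2), ('bob', 0), ('cid', 1)]
print(buyers)        # [('ann',), ('cid',)]
```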

Related

Does there exist a SQL query that contains a subquery that can't be rewritten without a subquery?

I am looking for an example of a SQL query that contains a subquery which can't be rewritten without a subquery and still do the same thing, if such a thing exists.
Any example will work, or an explanation of why there is no such thing will work too. MySQL, Oracle SQL, it doesn't matter.
I don't think the following MySQL query can be written as a single SELECT statement without a subquery:
select a.*,
(select b.bid
from b
where b.aid = a.aid
order by rand()
limit 1
) b
from a;
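The same query shape can be tried on SQLite, which spells MySQL's RAND() as RANDOM(). This sketch uses hypothetical tables and data. It only shows that the correlated scalar subquery picks one random child per row; it says nothing about whether a subquery-free rewrite exists. The picked bid varies between runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (aid INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b (bid INTEGER PRIMARY KEY, aid INTEGER);
    INSERT INTO a VALUES (1, 'first'), (2, 'second'), (3, 'no children');
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 1), (20, 2), (21, 2);
""")

# Correlated scalar subquery: for each row of a, pick one random matching bid.
rows = conn.execute("""
    SELECT a.*,
           (SELECT b.bid
            FROM b
            WHERE b.aid = a.aid
            ORDER BY RANDOM()   -- SQLite spelling of MySQL's RAND()
            LIMIT 1) AS b
    FROM a
    ORDER BY a.aid
""").fetchall()

# Rows without children get NULL (None) from the scalar subquery.
print(rows)
```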

SQL join conditions order performance

Does the order of JOIN conditions affect performance? I have two tables, A and B. I'm trying to join them like this:
SELECT * FROM A
INNER JOIN B on B.ID_A = A.ID
In this case Firebird uses a NATURAL plan instead of using the foreign key.
SELECT * FROM A
INNER JOIN B on A.ID = B.ID_A
works well.
Is that normal?
I guess you are using a Firebird version older than 2.5.4 (probably 2.5.3), which had bug http://tracker.firebirdsql.org/browse/CORE-4530, fixed in 2.5.4. Please upgrade to Firebird 2.5.5 and check whether your problem disappears.
I'm not familiar with Firebird, but usually the order doesn't matter for an inner join.
In recent versions of Oracle, SQL Server, MySQL, or PostgreSQL it won't affect performance. You can use EXPLAIN (the execution plan) to check that the database chooses the right way to join.
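Firebird's planner cannot be reproduced here, but the EXPLAIN check suggested above can be sketched with SQLite's EXPLAIN QUERY PLAN from Python. The schema mirrors the question (A.ID, B.ID_A) with made-up data and an assumed index on B.ID_A. On an optimizer without such a bug, both spellings of the join condition should produce the same plan and the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, ID_A INTEGER REFERENCES A(ID), val TEXT);
    CREATE INDEX idx_b_id_a ON B(ID_A);  -- assumed index on the foreign key column
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (1, 1, 'x'), (2, 1, 'y'), (3, 2, 'z');
""")

# The two spellings from the question; only the operand order differs.
Q1 = "SELECT * FROM A INNER JOIN B ON B.ID_A = A.ID"
Q2 = "SELECT * FROM A INNER JOIN B ON A.ID = B.ID_A"


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


same_rows = sorted(conn.execute(Q1).fetchall()) == sorted(conn.execute(Q2).fetchall())
same_plan = plan(Q1) == plan(Q2)
print(same_rows, same_plan)
for line in plan(Q1):
    print(line)
```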
This query is OK, but make sure B.ID_A is indexed (for fast performance).
Don't use * in an SQL query; select only the columns you need.
Your question is about performance, so I assume you have a very large amount of data.
In that case, you should use LIMIT with your query.
Example (MySQL syntax; Firebird itself uses FIRST/SKIP or ROWS instead of LIMIT):
SELECT A.ID, A.column1, B.column2 FROM A
INNER JOIN B on B.ID_A = A.ID LIMIT 0,100

Hive star schema query with 'SELECT *' in sub-selects

I came across a previously implemented query in Hive that I am trying to grok, and was wondering if someone could explain the advantages (or lack thereof) of the query pattern used. The query structure is a star schema that sub-selects the join tables in this manner:
SELECT
a.key,
a.field1,
b.field2,
c.field3,
d.field4
FROM first a
JOIN ( SELECT * FROM second ) b ON a.key = b.key
JOIN ( SELECT * FROM third ) c ON a.key = c.key
JOIN ( SELECT * FROM fourth ) d ON a.key = d.key
SORT BY a.key DESC;
What perplexes me is why you would sub-select the join tables (note the SELECT * with no WHERE) rather than join to them directly. Before I go changing legacy code queries (for other reasons), I wanted to understand what the goals of this approach might be. The query was written in the time of Hive 0.10, but we are up to Hive 0.13 now. Could this be a legacy workaround for something?
There is no advantage in subqueries that select all columns and have no WHERE clause. Where applicable, though, it may be useful to limit the columns and rows in the subqueries so that their datasets fit in memory and a map-only join can be used. The three subqueries in your example can be pre-calculated in parallel if hive.exec.parallel=true.
Also, SORT BY a.key DESC seems useless in this query.
My best guess is that it remained from the testing phase, to keep the WHERE clauses in their own spaces. Performance-wise, there is no improvement.
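The sketch below runs the direct join, the legacy SELECT * pattern, and a narrowed variant that projects only the needed columns, all on SQLite. It shows that the SELECT * sub-selects do not change the result. The table names fact, dim2 and dim3 and the key column k are hypothetical stand-ins. ORDER BY replaces Hive's SORT BY, which only sorts within each reducer. SQLite cannot show Hive-specific effects such as map-only joins or hive.exec.parallel.

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables keyed on k.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact (k INTEGER, field1 TEXT);
    CREATE TABLE dim2 (k INTEGER, field2 TEXT, unused2 TEXT);
    CREATE TABLE dim3 (k INTEGER, field3 TEXT, unused3 TEXT);
    INSERT INTO fact VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO dim2 VALUES (1, 's1', 'x'), (2, 's2', 'x');
    INSERT INTO dim3 VALUES (1, 't1', 'x'), (2, 't2', 'x');
""")

# Direct joins against the tables.
DIRECT = """
    SELECT a.k, a.field1, b.field2, c.field3
    FROM fact a
    JOIN dim2 b ON a.k = b.k
    JOIN dim3 c ON a.k = c.k
    ORDER BY a.k DESC
"""

# The legacy pattern: SELECT * sub-selects with no WHERE.
SUBSELECT_ALL = """
    SELECT a.k, a.field1, b.field2, c.field3
    FROM fact a
    JOIN (SELECT * FROM dim2) b ON a.k = b.k
    JOIN (SELECT * FROM dim3) c ON a.k = c.k
    ORDER BY a.k DESC
"""

# Narrowed sub-selects: only the columns the outer query needs.
NARROWED = """
    SELECT a.k, a.field1, b.field2, c.field3
    FROM fact a
    JOIN (SELECT k, field2 FROM dim2) b ON a.k = b.k
    JOIN (SELECT k, field3 FROM dim3) c ON a.k = c.k
    ORDER BY a.k DESC
"""

results = [conn.execute(q).fetchall() for q in (DIRECT, SUBSELECT_ALL, NARROWED)]
print(results[0])  # [(2, 'f2', 's2', 't2'), (1, 'f1', 's1', 't1')]
```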

SQL JOIN that uses OR in the ON statement

I’m running a SQL query on Google BigQuery and want to do this kind of SQL command:
SELECT ... FROM A JOIN B
ON A.col1=B.col1 AND (A.col2=B.col2 OR A.col3=B.col3)
This fails though with the error:
Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name.
Is there a way to rewrite the SQL to get this kind of functionality?
It turns out this works; the comma here is equivalent to a UNION ALL statement in Google BigQuery. I'm not sure how to do it if you want just a UNION, since DISTINCT is not supported in BigQuery. Luckily, it's enough for me as is.
SELECT ... FROM
(SELECT ... FROM A JOIN B ON A.col1=B.col1 AND A.col2=B.col2),
(SELECT ... FROM A JOIN B ON A.col1=B.col1 AND A.col3=B.col3)
This should work:
SELECT ... FROM A CROSS JOIN B
WHERE A.col1=B.col1 AND (A.col2=B.col2 OR A.col3=B.col3)
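One way to compare the two suggestions is to run them on a tiny data set. This sketch uses SQLite from Python as a stand-in for BigQuery. The hypothetical tables A and B contain rows that match on col2 only, on col3 only, on both, or not at all. The CROSS JOIN ... WHERE form matches the OR semantics. The UNION ALL form returns a row twice when both conditions hold, and UNION (distinct) removes that duplicate for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER);
    CREATE TABLE B (id INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO A VALUES (1, 1, 5, 9);
    INSERT INTO B VALUES
        (100, 1, 5, 0),   -- matches on col2 only
        (200, 1, 0, 9),   -- matches on col3 only
        (300, 1, 5, 9),   -- matches on both col2 and col3
        (400, 2, 5, 9);   -- col1 differs, never matches
""")

# Reference semantics: the OR condition from the question.
or_join = conn.execute("""
    SELECT A.id, B.id FROM A CROSS JOIN B
    WHERE A.col1 = B.col1 AND (A.col2 = B.col2 OR A.col3 = B.col3)
    ORDER BY B.id
""").fetchall()

# Two equi-joins stacked with UNION ALL (what the Legacy SQL comma does).
union_all = conn.execute("""
    SELECT A.id, B.id FROM A JOIN B ON A.col1 = B.col1 AND A.col2 = B.col2
    UNION ALL
    SELECT A.id, B.id FROM A JOIN B ON A.col1 = B.col1 AND A.col3 = B.col3
    ORDER BY 2
""").fetchall()

# Same two joins with UNION, which drops duplicate result rows.
union = conn.execute("""
    SELECT A.id, B.id FROM A JOIN B ON A.col1 = B.col1 AND A.col2 = B.col2
    UNION
    SELECT A.id, B.id FROM A JOIN B ON A.col1 = B.col1 AND A.col3 = B.col3
    ORDER BY 2
""").fetchall()

print(or_join)    # [(1, 100), (1, 200), (1, 300)]
print(union_all)  # [(1, 100), (1, 200), (1, 300), (1, 300)]
```

Note that UNION would also collapse rows that are genuinely duplicated in the data, so it matches the OR semantics only when the selected columns identify each row.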

SQL (any) Request for insight on a query optimization

I have a particularly slow query due to the vast amount of information being joined together. However, I needed to add a WHERE clause of the form id IN (SELECT id FROM table).
I want to know whether there is any gain from the following and, more pressingly, whether it will even give the desired results.
select a.* from a where a.id in (select id from b where b.id = a.id)
as an alternative to:
select a.* from a where a.id in (select id from b)
Update:
MySQL
Can't be more specific, sorry.
Table a is effectively a join between 7 different tables.
The use of * is just for the examples.
Edit: b doesn't get selected.
Your question was about the difference between these two:
select a.* from a where a.id in (select id from b where b.id = a.id)
select a.* from a where a.id in (select id from b)
The former is a correlated subquery. It may cause MySQL to execute the subquery for each row of a.
The latter is a non-correlated subquery. MySQL should be able to execute it once and cache the results for comparison against each row of a.
I would use the latter.
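This small sketch (SQLite via Python, hypothetical data) confirms that the correlated and non-correlated forms return the same rows. It does not show the performance difference, which depends on how the engine (MySQL here) executes each form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (3), (3), (4);
""")

# Correlated: the subquery refers to the outer row a.id.
correlated = conn.execute(
    "SELECT a.* FROM a WHERE a.id IN (SELECT id FROM b WHERE b.id = a.id) ORDER BY a.id"
).fetchall()

# Non-correlated: the subquery is independent of the outer query.
non_correlated = conn.execute(
    "SELECT a.* FROM a WHERE a.id IN (SELECT id FROM b) ORDER BY a.id"
).fetchall()

print(correlated == non_correlated, correlated)  # True [(1, 'one'), (3, 'three')]
```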
Both queries you list are equivalent to the following (as long as id is unique in b):
select a.*
from a
inner join b on b.id = a.id
Almost all optimizers will execute them in the same way.
You could post a real execution plan, and someone here might give you a way to speed it up. It helps if you specify what database server you are using.
YMMV, but I've often found using EXISTS instead of IN makes queries run faster.
SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
Of course, without seeing the rest of the query and the context, this may not make the query any faster.
JOINing may be preferable, but if a.id appears more than once in the id column of b, you would have to throw a DISTINCT in there, and you would more than likely go backwards in terms of optimization.
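The duplicate-row point can be checked with a sketch on SQLite, using hypothetical data where id 3 appears twice in b. A plain JOIN repeats the matching row of a. IN, EXISTS, SELECT DISTINCT over the join, and a join to a derived table of distinct ids all return each row of a once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1), (3), (3);  -- id 3 appears twice in b
""")


def run(sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()


plain_join = run("SELECT a.* FROM a JOIN b ON b.id = a.id ORDER BY a.id")
with_in = run("SELECT a.* FROM a WHERE a.id IN (SELECT id FROM b) ORDER BY a.id")
with_exists = run(
    "SELECT a.* FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY a.id")
distinct_join = run("SELECT DISTINCT a.* FROM a JOIN b ON b.id = a.id ORDER BY a.id")
derived_join = run(
    "SELECT a.* FROM a JOIN (SELECT DISTINCT id FROM b) c ON a.id = c.id ORDER BY a.id")

print(plain_join)   # [(1, 'one'), (3, 'three'), (3, 'three')]
print(with_exists)  # [(1, 'one'), (3, 'three')]
```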
I would never use a subquery like this. A join would be much faster.
select a.*
from a
join b on a.id = b.id
Of course, don't use SELECT * either (and especially never use it when doing a join, as at least one field is repeated); it wastes network resources to send unneeded data.
Have you looked at the execution plan?
How about
select a.*
from a
inner join b
on a.id = b.id
Presumably the id fields are primary keys?
Select a.* from a
inner join (Select distinct id from b) c
on a.id = c.id
I tried all three versions and they ran about the same. The execution plan was the same for each (inner join, IN with and without the WHERE clause in the subquery, and EXISTS).
Since you are not selecting any other fields from b, I prefer to use WHERE ... IN (SELECT ...). Anyone looking at the query would know what you are trying to do (only show rows in a that are also in b).
Your problem is most likely in the seven tables within "a".
Make the FROM table the one that contains a.id.
Make the next join: inner join b on a.id = b.id.
Then join in the other six tables.
You really need to show the entire query, list all indexes, and give approximate row counts for each table if you want real help.