Support for UNION in BigQuery SQL

BigQuery does not seem to have support for UNION yet:
https://developers.google.com/bigquery/docs/query-reference
(I don't mean unioning tables together for the source. It has that.)
Is it coming soon?

If you want UNION so that you can combine query results, you can use subselects
in BigQuery legacy SQL:
SELECT foo, bar
FROM
(SELECT integer(id) AS foo, string(title) AS bar
FROM publicdata:samples.wikipedia limit 10),
(SELECT integer(year) AS foo, string(state) AS bar
FROM publicdata:samples.natality limit 10);
This is almost exactly equivalent to the SQL
SELECT id AS foo, title AS bar
FROM publicdata:samples.wikipedia limit 10
UNION ALL
SELECT year AS foo, state AS bar
FROM publicdata:samples.natality limit 10;
(Note that if you want SQL UNION rather than UNION ALL, this won't work.)
Alternatively, you could run two queries and append the results.
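If you do need UNION (DISTINCT) semantics in legacy SQL, one common workaround is to group the comma-unioned subselects on every output column, which collapses duplicate rows. Below is a minimal sketch that reuses the sample tables from the query above. The `#legacySQL` prefix and the bracketed table names are shown for clarity, and the query has not been checked against the current sample data.

```sql
#legacySQL
-- The comma between the subselects means UNION ALL in legacy SQL.
-- Grouping on every selected column then removes duplicate rows,
-- which emulates UNION (DISTINCT).
SELECT foo, bar
FROM
  (SELECT INTEGER(id) AS foo, STRING(title) AS bar
   FROM [publicdata:samples.wikipedia] LIMIT 10),
  (SELECT INTEGER(year) AS foo, STRING(state) AS bar
   FROM [publicdata:samples.natality] LIMIT 10)
GROUP BY foo, bar;
```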

BigQuery recently added support for standard SQL, including the UNION set operations (UNION ALL and UNION DISTINCT).
When submitting a query through the web UI, just make sure to uncheck "Use Legacy SQL" under the SQL Version options, or start the query with #standardSQL.
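In standard SQL, BigQuery requires you to write ALL or DISTINCT after UNION, and a bare UNION is rejected. The minimal sketch below uses inline arrays instead of tables, so it runs without any data of your own, and shows the difference between the two forms:

```sql
#standardSQL
-- UNION ALL keeps duplicates: five rows (1, 2, 2, 2, 3), in no guaranteed order.
SELECT x FROM UNNEST([1, 2, 2]) AS x
UNION ALL
SELECT x FROM UNNEST([2, 3]) AS x;

-- UNION DISTINCT removes duplicates: three rows (1, 2, 3).
SELECT x FROM UNNEST([1, 2, 2]) AS x
UNION DISTINCT
SELECT x FROM UNNEST([2, 3]) AS x;
```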

In legacy SQL, you can always do:
SELECT * FROM (query 1), (query 2);
It does the same thing as:
SELECT * FROM (query 1) UNION ALL SELECT * FROM (query 2);
(Note that this is UNION ALL, not UNION: duplicate rows are kept.)

Note that in standard SQL the comma operator means JOIN (a CROSS JOIN), so you have to use the UNION ALL or UNION DISTINCT syntax if you want a union:
In legacy SQL, the comma operator , has the non-standard meaning of UNION ALL when applied to tables. In standard SQL, the comma operator has the standard meaning of JOIN.
For example:
#standardSQL
SELECT
column_name,
count(*)
from
(SELECT * FROM me.table1 UNION ALL SELECT * FROM me.table2)
group by 1
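The sketch below shows why the comma no longer unions in standard SQL. It builds two tiny inline tables (`a` and `b` are made-up names for this illustration), and the comma between them returns every combination of rows, exactly like CROSS JOIN:

```sql
#standardSQL
WITH a AS (SELECT 1 AS x UNION ALL SELECT 2 AS x),
     b AS (SELECT 10 AS y UNION ALL SELECT 20 AS y)
-- The comma is a CROSS JOIN here: 2 x 2 = 4 rows,
-- (1, 10), (1, 20), (2, 10), (2, 20), in no guaranteed order.
SELECT x, y
FROM a, b;
```

To stack the rows of two tables instead, combine SELECTs with matching columns using UNION ALL, as in the query above.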

This helped me a lot when doing an INTERSECT with BigQuery's standard SQL.
#standardSQL
WITH
a AS (
SELECT
*
FROM
table_a),
b AS (
SELECT
*
FROM
table_b)
SELECT
*
FROM
a INTERSECT DISTINCT
SELECT
*
FROM
b
Adapted from: https://gist.github.com/yancya/bf38d1b60edf972140492e3efd0955d0
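You can try the same set operators on inline arrays without creating table_a or table_b. The minimal sketch below also includes EXCEPT DISTINCT, only to show the sibling operator, which follows the same DISTINCT rule:

```sql
#standardSQL
-- Rows present in both inputs: 2 and 3.
SELECT x FROM UNNEST([1, 2, 3]) AS x
INTERSECT DISTINCT
SELECT x FROM UNNEST([2, 3, 4]) AS x;

-- Rows present in the first input only: 1.
SELECT x FROM UNNEST([1, 2, 3]) AS x
EXCEPT DISTINCT
SELECT x FROM UNNEST([2, 3, 4]) AS x;
```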

Unions are indeed supported. An excerpt from the link that you posted:
Note: Unlike many other SQL-based systems, BigQuery uses the comma syntax to indicate table unions, not joins. This means you can run a query over several tables with compatible schemas as follows:
// Find suspicious activity over several days
SELECT FORMAT_UTC_USEC(event.timestamp_in_usec) AS time, request_url
FROM [applogs.events_20120501], [applogs.events_20120502], [applogs.events_20120503]
WHERE event.username = 'root' AND NOT event.source_ip.is_internal;
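In standard SQL the comma no longer unions tables, so the closest equivalent of this example is a wildcard table filtered on `_TABLE_SUFFIX`. The sketch below assumes the same `applogs.events_YYYYMMDD` tables and assumes that `event` is a non-repeated RECORD column. Both assumptions come from the excerpt and have not been verified.

```sql
#standardSQL
-- Find suspicious activity over several days.
-- _TABLE_SUFFIX is the STRING matched by *, e.g. '20120501'.
SELECT
  TIMESTAMP_MICROS(event.timestamp_in_usec) AS event_time,  -- a TIMESTAMP, not a formatted string
  request_url
FROM `applogs.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20120501' AND '20120503'
  AND event.username = 'root'
  AND NOT event.source_ip.is_internal;
```

The suffixes are fixed-width date strings, so the string comparison in BETWEEN selects exactly the three days listed in the legacy query.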

Related

How to run two ORACLE SQL Queries together in TOAD?

I want to run two Oracle SQL queries together inside TOAD.
first query:
SELECT * FROM OT_SO_HEAD WHERE SOH_ANNOTATION = 'ECSO10012791'
and second:
SELECT * FROM OT_SO_ITEM WHERE SOI_SOH_SYS_ID = '30977853'
Please guide
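You can use UNION or UNION ALL, provided both queries return the same number of columns with compatible types. With SELECT * from two different tables that is unlikely, so list the columns explicitly.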
The UNION operator is used to combine the result-set of two or more SELECT statements.
Every SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in every SELECT statement must also be in the same order
Syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.
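The minimal sketch below shows the difference. It uses Oracle's one-row DUAL table, so it needs no tables of your own:

```sql
-- UNION removes the duplicate: one row, N = 1.
SELECT 1 AS n FROM dual
UNION
SELECT 1 AS n FROM dual;

-- UNION ALL keeps both rows: two rows, each with N = 1.
SELECT 1 AS n FROM dual
UNION ALL
SELECT 1 AS n FROM dual;
```

In the question above, you would first have to reduce both SELECT lists to the same number of compatible columns before UNION could combine the results of OT_SO_HEAD and OT_SO_ITEM.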
Alternatively, right-click in the script window and select EXECUTE -> Execute via Toad Script Runner and reuse Toad.

How to query multiple tables using wildcard for a particular partition in standard SQL of Big Query

I am trying to query multiple tables in BigQuery using a wildcard (I have tables with the suffixes _0 to _9).
This query for a specific table works:
SELECT
count(*)
FROM `maw_qa.rt_content_secondly_0`
where _PARTITIONTIME = timestamp('2017-01-24');
But this doesn't:
SELECT
count(*)
FROM `maw_qa.rt_content_secondly_*`
where _PARTITIONTIME = timestamp('2017-01-24');
Error:
Query Failed
Error: Unrecognized name: _PARTITIONTIME at [5:7]
I am using standard SQL. Legacy SQL does not even take wildcard * in the query.
What is the way to do this correctly?
It looks like wildcard tables and _PARTITIONTIME do not work together in a query.
Try the query below. It is written in BigQuery legacy SQL, because in that version it is less verbose.
It assumes you have 4 tables. If you have more, you need to list all of them here.
SELECT COUNT(*)
FROM
[maw_qa.rt_content_secondly_0],
[maw_qa.rt_content_secondly_1],
[maw_qa.rt_content_secondly_2],
[maw_qa.rt_content_secondly_3]
WHERE _PARTITIONTIME = TIMESTAMP('2017-01-24')
Of course, something similar can be written in BigQuery standard SQL, but it requires more typing with UNION ALL, etc.
In standard SQL it can look like this:
SELECT COUNT(*) FROM (
SELECT * FROM `maw_qa.rt_content_secondly_0` WHERE _PARTITIONTIME = TIMESTAMP('2017-01-24') UNION ALL
SELECT * FROM `maw_qa.rt_content_secondly_1` WHERE _PARTITIONTIME = TIMESTAMP('2017-01-24') UNION ALL
SELECT * FROM `maw_qa.rt_content_secondly_2` WHERE _PARTITIONTIME = TIMESTAMP('2017-01-24') UNION ALL
SELECT * FROM `maw_qa.rt_content_secondly_3` WHERE _PARTITIONTIME = TIMESTAMP('2017-01-24')
)
When you query a partitioned table, you don't need to use the _* syntax, which is reserved for table wildcards (where you filter on _TABLE_SUFFIX). In your case, you should just do:
SELECT
count(*)
FROM `maw_qa.rt_content_secondly`
where _PARTITIONTIME = '2017-01-24';
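For comparison, the sketch below shows the `_TABLE_SUFFIX` filter mentioned above, applied to sharded tables such as `rt_content_secondly_0` to `rt_content_secondly_3`. It counts rows per shard and applies no partition filter. It uses IN rather than BETWEEN because `_TABLE_SUFFIX` is a STRING, and `'10' BETWEEN '0' AND '3'` is true under string ordering.

```sql
#standardSQL
-- Count rows in each of the shards _0 .. _3.
SELECT _TABLE_SUFFIX AS shard, COUNT(*) AS row_count
FROM `maw_qa.rt_content_secondly_*`
WHERE _TABLE_SUFFIX IN ('0', '1', '2', '3')
GROUP BY shard
ORDER BY shard;
```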

TABLE/CAST/MULTISET vs subquery in FROM clause

The following query doesn't work. It is expected to fail since temp.col references something that is unavailable in that context.
with temp as (
select 'A' col from dual
union all
select 'B' col from dual
)
select *
from temp,
(select level || temp.col from dual connect by level < 3);
The error message from Oracle is : ORA-00904: "TEMP"."COL": invalid identifier
But why does the next query work? I see CAST/MULTISET as a way to go from a SQL table to a collection type, and TABLE as a way to go back to a SQL table. Why do we use such a round trip? I guess it is there to make the query work, but how?
with temp as (
select 'A' col from dual
union all
select 'B' col from dual
)
select *
from temp,
table(
cast(
multiset(
select level || temp.col from dual connect by level < 3
) as sys.odcivarchar2list
)
) t;
The result is:
COL COLUMN_VALUE
--- ------------
A 1A
A 2A
B 1B
B 2B
Note that the second column is named COLUMN_VALUE. It looks like a name generated by one of the constructs, CAST/MULTISET or TABLE.
EDIT
With the accepted answer below, I checked the documentation and found that the TABLE mechanism is a table collection expression. The expression between the parentheses is the collection expression. The documentation defines a mechanism called left correlation:
The collection_expression can reference columns of tables defined to
its left in the FROM clause. This is called left correlation. Left
correlation can occur only in table_collection_expression. Other
subqueries cannot contain references to columns defined outside the
subquery.
So this is like LATERAL in 12c.
Oracle allows lateral inline views to reference other tables inside the inline view.
In old versions this feature was mostly used for optimizations, as discussed on the Oracle optimizer blog. Explicit lateral joins were added in 12c. Your first query only needs a small change to work in 12c:
with temp as (
select 'A' col from dual
union all
select 'B' col from dual
)
select *
from temp,
lateral(select level || temp.col from dual connect by level < 3);
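Oracle 12c also added CROSS APPLY, which allows the same left correlation. The sketch below rewrites the first query with it. The aliases `val` and `t` are added here only to give the generated column and the inline view readable names.

```sql
with temp as (
  select 'A' col from dual
  union all
  select 'B' col from dual
)
select temp.col, t.val
from temp
cross apply (
  -- temp.col is visible here because CROSS APPLY is a lateral join
  select level || temp.col as val from dual connect by level < 3
) t;
```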
Apparently Oracle also silently uses lateral joins for collection unnesting. There are a few cases where SQL uses a logical cross join but the tables are obviously closely related, such as XMLTable, JSON_table, and queries like your second example. In those cases it makes sense to execute the two tables together. I assume the lateral mechanism is used there, although neither the execution plan nor the 10053 optimizer trace uses the word "lateral". The documentation even has an example very similar to yours in the "Collection Unnesting: Examples" section. However, this "feature" is still not well documented.
On a side note, in general you should avoid SQL features that increase the context a reader has to keep in mind. Features like lateral joins, common table expressions, and correlated subqueries can be useful, but they can also make SQL statements more difficult to understand. A regular inline view can be run and understood all by itself and has a very simple interface: its projected columns. That simplicity makes it easier to assemble small components into a large statement.
I suggest you re-write your query like below. Treat each inline view like you would a function or procedure - give them good names and comments. It will help you later when you assemble them into large, realistic statements.
select col, the_level||col
from
(
--Good comment 1.
select 'A' col from dual union all
select 'B' col from dual
) good_name_1
cross join
(
--Good comment 2.
select level the_level
from dual
connect by level < 3
) good_name_2

How is WITH used in Oracle SQL (example code shown )?

I found this example code online when I searched for "how to do an Exclusive Between oracle sql"
Someone was proving that, in Oracle, BETWEEN is by default inclusive.
So they used this code:
with x as (
select 1 col1 from dual
union
select 2 col1 from dual
union
select 3 col1 from dual
UNION
select 4 col1 from dual
)
select *
from x
where col1 between 2 and 3
I've never seen anything like this. What is going on with the WITH?
In short, the WITH clause (which Oracle calls subquery factoring) defines a named inline view, or subquery. It is useful when you will refer to something multiple times, or when you want to abstract parts of a complex query to make it easier to read.
If you come from the SQL Server world, you can also think of it as a temporary table.
So:
WITH foo AS (select * from tab)
select * from foo;
is like
select * from (select * from tab);
It may be more efficient, though, since foo can be resolved to a single dataset, even if it is queried multiple times.
It also reduces repetition. If you use a subquery more than once in a statement, you can consider factoring it out using WITH.
It has nothing to do with the BETWEEN example, it is just the author's choice of approach for demonstrating a concept.
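Because a WITH subquery can be referenced more than once, the same dataset can serve both sides of the inclusive/exclusive comparison from the question. The sketch below uses UNION ALL, since the four literal rows contain no duplicates anyway:

```sql
with x as (
  select 1 col1 from dual union all
  select 2 col1 from dual union all
  select 3 col1 from dual union all
  select 4 col1 from dual
)
select
  -- BETWEEN is inclusive: matches 2 and 3, so inclusive_cnt = 2
  (select count(*) from x where col1 between 2 and 3) as inclusive_cnt,
  -- strict comparisons exclude both ends: no value lies strictly between 2 and 3, so exclusive_cnt = 0
  (select count(*) from x where col1 > 2 and col1 < 3) as exclusive_cnt
from dual;
```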

Sybase - Subquery in FROM clause

I'm using Sybase ASE 12.5.0.3 and I'm unable to do subqueries like:
select * from (select '1' union select '2' ) X
I've been looking around, and as far as I know it should be possible from Sybase ASE 12 onwards. Am I doing something wrong, or is it not possible with this version?
Edit: even after changing the query to
select * from (select '1' as col1 union select '2' as col1 ) X
it still fails, so giving aliases to the columns doesn't help.
Without seeing an error message, it appears that you need to give column aliases in your sub-query:
select *
from
(
select '1' as yournewCol
union
select '2' as yournewCol
) X
You need to give your columns names.
Sybase ASE does not support subqueries in the FROM clause:
Subqueries can be nested inside the where or having clause of an outer select, insert, update, or delete statement, inside another subquery, or in a select list. Alternatively, you can write many statements that contain subqueries as joins; Adaptive Server processes such statements as joins.
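Where derived tables are not available, one workaround is to materialize the rows into a temporary table first and then select from it. The sketch below targets ASE. The table name `#X` and the column `col1` are made up for the example, and the `go` lines are the isql batch separators.

```sql
-- Hypothetical temp table standing in for the derived table X.
create table #X (col1 varchar(10))
go

insert into #X values ('1')
insert into #X values ('2')
go

-- Query the materialized rows instead of a subquery in FROM.
select * from #X
go

drop table #X
go
```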