Trying to create multiple temporary tables in a single query - sql

I'd like to create 3-4 separate temporary tables within a single BigQuery query (all of the tables are based on different data sources) and then join them in various ways later on in the query.
I'm trying to do this by using multiple WITH statements, but it appears that you're only able to use one WITH statement in a query if you're not nesting them. Each time I've tried, I get an error saying that a 'SELECT' statement is expected.
Am I missing something? I'd prefer to do this all in one query if at all possible.

I don't know what you mean by "temporary tables", but I suspect you mean common table expressions (CTEs).
Of course you can have a query with multiple CTEs. You just need the right syntax:
with t1 as (
select . . .
),
t2 as (
select . . .
),
t3 as (
select . . .
)
select *
from t1 cross join t2 cross join t3;

bq mk --table --expiration [INTEGER] --description "[DESCRIPTION]"
--label [KEY:VALUE, KEY:VALUE] [PROJECT_ID]:[DATASET].[TABLE]
Expiration date will make your table temporary. You can create one table at the time with "bq mk" but you can use this in a script.
You can use the DDL, but here also you can only create one table at the time.
{CREATE TABLE | CREATE TABLE IF NOT EXISTS | CREATE OR REPLACE TABLE}
table_name [( column_name column_schema[, ...] )]
[PARTITION BY partition_expression] [OPTIONS(options_clause[, ...])]
[AS query_statement]
If by "temporary table" you meant "subqueries" this is the syntax you have to use:
WITH
subQ1 AS (
SELECT * FROM Roster
WHERE SchoolID = 52),
subQ2 AS (
SELECT SchoolID FROM subQ1)
SELECT DISTINCT * FROM subQ2;

You should be able to use common table expressions without issue. However, if you have large queries with a significant amount of common table expressions/subqueries you may run into resource issues in BQ specifically regarding the ability for BQ to create an execution plan. Temporary tables have helped me in these scenarios, but there are likely better practices.
CREATE TEMP TABLE T1 AS (
WITH
X as (query),
Y as (query),
Z as (query)
SELECT * FROM
(combination_of_above_queries)
);
CREATE TEMP TABLE T2 AS (
WITH
A as (query),
B as (query),
C as (query)
SELECT * FROM
(combination_of_above_queries)
);
CREATE TABLE new_table AS (
SELECT *
FROM (combination of T1 and T2)
)
Apologies, I just saw happened to come across this question and wanted to share what helped me... hope it is helpful for you.
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#temporary_tables

Related

Hive: read table partitions defined in subselect

I have a Hive table which is partitioned by partitionDate field.
I can read partition of my choice via simple
select * from myTable where partitionDate = '2000-01-01'
My task is to specify the partition of my choise dynamically. I.e. first I want to read it from some table, and only then run select to myTable. And of course, I want the power of partitions to be used.
I have written a query which looks like
select * from myTable mt join thatTable tt on tt.reportDate = mt.partitionDate
The query works but looks like partitions are not used. The query works too long.
I tried another approach:
select * from myTable where partitionDate in (select reportDate from thatTable)
.. and again I see that the query works too slowly.
Is there a way to implement this in Hive?
update: create table for myTable
CREATE TABLE `myTable`(
`theDate` string,
')
PARTITIONED BY (
`partitionDate` string)
TBLPROPERTIES (
'DO_NOT_UPDATE_STATS'='true',
'STATS_GENERATED_VIA_STATS_TASK'='true',
'spark.sql.create.version'='2.2 or prior',
'spark.sql.sources.schema.numPartCols'='1',
'spark.sql.sources.schema.numParts'='2',
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"theDate","type":"string","nullable":true}...
'spark.sql.sources.schema.part.1'='{"name":"partitionDate","type":"string","nullable":true}...',
'spark.sql.sources.schema.partCol.0'='partitionDate')
If you are running Hive on Tez execution engine, try
set hive.tez.dynamic.partition.pruning=true;
Read more details and related configuration in the Jira HIVE-7826
and at the same time try to rewrite as a LEFT SEMI JOIN:
select *
from myTable t
left semi join (select distinct reportDate from thatTable) s on t.partitionDate = s.reportDate
If nothing helps, see this workaround: https://stackoverflow.com/a/56963448/2700344
Or this one: https://stackoverflow.com/a/53279839/2700344
Similar question: Hive Query is going for full table scan when filtering on the partitions from the results of subquery/joins

BigQuery - Create a table from results of a query that uses complex CTEs?

I have a multi CTE query with large underlying datasets that is run too frequently. I could just create a table of the results of that query for people to use instead, and refresh that daily. But I'm lost on the syntax to create such a table.
CREATE OR REPLACE TABLE dataset.target_table
AS
with cte_one as (
select
stuff
from big.table
),
...
cte_five as (
select
stuff
from other_big.table
),
final as (
select *
from cte_five left join cte_x on cte_five.id = cte_x.id
)
SELECT
*
FROM final
Is basically what I have. This actually creates the target table with the right schema even, but doesn't insert any rows...Any hints? Thanks
If you really want to do this in one step, you can just do SELECT INTO...
with cte_one as (
select
stuff
from big.table
),
...
cte_five as (
select
stuff
from other_big.table
),
final as (
select *
from cte_five left join cte_x on cte_five.id = cte_x.id
)
SELECT
*
INTO dataset.target_table
FROM final
That said, since this isn't just a once-off need I recommend creating the landing table once initially and then scheduling a daily flush and fill (TRUNCATE + INSERT) to update the data. It will give you more explicit control over the data types and also lets you work with a persistent object rather than something built from scratch daily.

Insert Statement - From Temp Table

I am currently trying to insert data into a Redshift table from multiple temporary tables that I have created.
Insert into schema.table1 (
with a as (
select t1.country_code, t1.country_name
from t1
)
select * from a
);
The error that I get on this statement says the following: Amazon Invalid operation: syntax error at or near "as";. What do I need to change in order to be able to insert data from a temp table?
Is it not possible to run the command like this if you have same table structures in both schema.table1 and t1
insert into schema.table1
select t1.country_code, t1.country_name
from t1;
One other thing you might want to check is, in your SQL table1 is in 'schema' but t1 is referred without a schema so it's in public, make sure you don't have two t1s with different structures.
I just tried this and it worked for me.
insert into tempt1 ( with a as (select a from tempt2) select * from a);

Copy results of joined query into identical table in other database

I am trying to take the results of a select query in SQL and place them in another table in a different database. The table structure is identical. The select query is as follows;
USE Warwick
Go
Select tblOperations.Link, Project.*
From tblOperations
Inner Join Warwick.dbo.Project
On tblOperations.Link= Warwick.dbo.Project.[Project ID]
Where tblOperations.Job# = Warwick.dbo.Project.[Job Number] and
tblOperations.[Status] = 'Active' or tblOperations.[Status] = 'Pending'
The join lets me select just the jobs that are considered active. I need to copy the results into the table WCI_DB.dbo.Project, which already exists. I would lke to append and not overwrite if the record exists.
Any help would be appreciated.
Thanks.
You should tag your question with the database, which seems to be SQL Server. The SQL syntax is insert:
insert into WCI_DB.dbo.Project
<your select here>;
Normally, you want to list columns after the table name:
insert into WCI_DB.dbo.Project(list of columns>
<your select here>;
However, if this is a one-time exercise and you know the columns are the same, then it is small sin to omit them once.
To create a new table, using select into, which is documented here.
select . . .
into WCI_DB.dbo.Project
. . .

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundreds of known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the id's into a TEMPORARY TABLE to perform massive SELECT's and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell #IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(#IDs) AS [IDs]
ON [IDs].id = myTable.id
END
The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique