How can I create clones of a table in Google BigQuery?

I am currently doing some performance testing, so I need to create 100k tables in a dataset, each with x rows. I have the sample table, but how can I write a script that runs a SELECT statement in a loop, building each table name by concatenation?
Here is a sample that should create 10 copies:
DECLARE i INT64 DEFAULT 1;
DECLARE n int64;
SET n = 10;
-- we will do this until we execute below query n times
WHILE i < n DO
CREATE TABLE `myproject.target_dataset.table_` + STRING(i)
AS SELECT * FROM `myproject.source_databaset.sample_table`
SET i = i + 1;
END WHILE;
End result: table_1, table_2 ... table_10 would exist in the dataset.
How can I achieve "CREATE TABLE myproject.target_dataset.table_ + STRING(i)" in BigQuery scripting?
I tried "bq cp myproject.source_dataset.sample_table myproject.target_dataset.table_1...n", but it is very slow.

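Try EXECUTE IMMEDIATE, which runs a SQL string built at run time:
DECLARE i INT64 DEFAULT 1;
DECLARE n INT64 DEFAULT 10;
-- Run the statement below n times, creating table_1 through table_n.
WHILE i <= n DO
-- The || operator only concatenates strings, so cast the counter first.
EXECUTE IMMEDIATE "CREATE TABLE `myproject.target_dataset.table_" || CAST(i AS STRING) || "` AS SELECT * FROM `myproject.source_dataset.sample_table`";
SET i = i + 1;
END WHILE;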
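The string assembly done inside the loop can be checked offline before anything is sent to BigQuery. Below is a minimal sketch in Python; build_clone_statements is a hypothetical helper (it is not part of any BigQuery client library) and it only builds the statement text, it does not execute anything. The source table name follows the spelling used in the bq cp command from the question.

```python
def build_clone_statements(source_table, target_prefix, n):
    """Return one CREATE TABLE ... AS SELECT statement per clone, numbered 1..n."""
    statements = []
    for i in range(1, n + 1):  # inclusive upper bound, so table_n is included
        statements.append(
            f"CREATE TABLE `{target_prefix}{i}` "
            f"AS SELECT * FROM `{source_table}`"
        )
    return statements


# Table names below are the placeholders from the question, not real resources.
statements = build_clone_statements(
    "myproject.source_dataset.sample_table",
    "myproject.target_dataset.table_",
    10,
)
print(statements[0])
print(len(statements))
```

Printing the first statement and the count makes off-by-one mistakes in the loop bound easy to spot: a loop written as i < n starting from 1 would produce only n - 1 tables.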

Related

How can I pass a list, array or string to be separated as a parameter to redshift

I'm trying to write a simple query with an in clause like so:
SELECT *
FROM storeupcsalesbyday
WHERE date >= '9/1/2020' AND date <= '9/10/2020' AND upc in ('0000000004011', '0000000094011')
I need to be able to pass the values in the IN clause as a parameter. The number of values is variable and could be anywhere from one to thousands, depending on user input. In other SQL databases I have solved this by creating a user-defined function that takes a string, splits it on a delimiter, and inserts the values into a temp table; I then select from the temp table in my IN clause. However, user-defined functions in Redshift do not allow tables as a return type. How are others solving this problem in Redshift?
Thanks

I was able to create a stored procedure that takes a varchar and creates a temp table of all "slices" of the varchar broken up by a delimiter (in this case a ','). I just wanted to share it here in case someone else has this issue.
Here is the procedure:
CREATE OR REPLACE Procedure sp_UPCStringToTempTable(upcList IN varchar(max))
AS 'DECLARE
idx int;
slice varchar(8000);
upcListVar varchar(max);
BEGIN
idx = 1;
upcListVar = upcList;
DROP TABLE if exists tmp_upc;
CREATE TEMP TABLE tmp_upc(upc varchar(14));
WHILE idx != 0 LOOP
idx = charindex('','', upcListVar);
IF idx != 0 THEN
slice = left(upcListVar, idx - 1);
END IF;
IF idx = 0 THEN
slice = upcListVar;
END IF;
IF len(slice) > 0 THEN
INSERT INTO tmp_upc values (slice);
END IF;
upcListVar = right(upcListVar, len(upcListVar) - idx);
END LOOP;
END;
' LANGUAGE plpgsql;

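Another option is a numbers table combined with split_part, which returns the n-th piece of a delimited string:
-- Helper table of positions; it needs at least as many rows as the list has items.
create table num(id int);
insert into num values (1), (2), (3);
with t as
(
-- trim removes the space that follows each comma in the list
select trim(split_part('0000000004011, 0000000094011', ',', id)) as col1 from num
)
-- a stands for the table being filtered
select * from a join t on a.col1 = t.col1;
This should solve your problem.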
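Both answers come down to the same idea: turn the delimited string into rows, then filter against those rows. Here is a minimal sketch of that idea in Python, with the standard-library sqlite3 module standing in for Redshift. The sales table, its columns, its sample rows, and the filter_by_upc_list helper are all assumptions made for illustration; only the tmp_upc name and the UPC values come from the thread.

```python
import sqlite3


def filter_by_upc_list(conn, upc_list):
    """Load a comma-separated UPC string into a temp table, then filter with IN."""
    # Split on the delimiter and drop surrounding spaces and empty slices.
    slices = [s.strip() for s in upc_list.split(",") if s.strip()]
    conn.execute("DROP TABLE IF EXISTS tmp_upc")
    conn.execute("CREATE TEMP TABLE tmp_upc (upc TEXT)")
    conn.executemany("INSERT INTO tmp_upc VALUES (?)", [(s,) for s in slices])
    return conn.execute(
        "SELECT upc, qty FROM sales "
        "WHERE upc IN (SELECT upc FROM tmp_upc) ORDER BY upc"
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for storeupcsalesbyday, with invented rows.
conn.execute("CREATE TABLE sales (upc TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("0000000004011", 3), ("0000000094011", 5), ("0000000001234", 7)],
)
rows = filter_by_upc_list(conn, "0000000004011, 0000000094011")
print(rows)
```

Because the values travel as one string parameter and end up as rows, the query text stays the same whether the user supplies one UPC or thousands.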

Create Temp Table in Each Loop and Union After Loop Completion

Using BigQuery's standard SQL scripting functionality, I want to 1) create a temp table for each iteration of a loop, and 2) union those temp tables after the loop is complete. I've tried something like the following:
DECLARE i INT64 DEFAULT 1;
DECLARE ttable_name STRING;
WHILE i < 10 DO
SET ttable_name = CONCAT('temp_table_', CAST(i AS STRING));
CREATE OR REPLACE TEMP TABLE ttable_name AS
SELECT * FROM my_table AS mt WHERE mt.my_col = 1;
SET i = i + 1;
END WHILE;
SELECT * FROM temp_table_*; -- wildcard table to union all results
But I get the following error:
Exceeded rate limits: too many table update operations for this table.
How can I accomplish this task?

Your script does not work the way you think it does!
Instead of writing to a separate table named like temp_table_N in each iteration, you are actually writing to the very same temp table, named ttable_name, hence the "Exceeded rate limits" error.
BigQuery does not allow variables to be used as object names.

Don't create new tables. Add to an existing one with an INSERT INTO, or hold data in a variable (if it's not too much data), as in:
DECLARE steps INT64 DEFAULT 1;
DECLARE table_holder ARRAY<STRUCT<steps INT64, x INT64, y ARRAY<INT64>>>;
LOOP
SET table_holder = (
SELECT ARRAY_AGG(
STRUCT(steps, 1 AS x, [1,2,3] AS y))
FROM (SELECT '')
);
SET steps = steps+1;
IF steps=30 THEN LEAVE; END IF;
END LOOP;
CREATE TABLE temp.results
AS
SELECT *
FROM UNNEST(table_holder)
Related: https://stackoverflow.com/a/59314390/132438
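The INSERT INTO route mentioned at the top of this answer can be sketched as follows: create the results table once, then append to it on every iteration instead of replacing it. This is only an illustration in Python with the standard-library sqlite3 module, not BigQuery scripting; the results table, the val column, and the sample rows are assumptions, while my_table and my_col follow the names in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data for the source table.
conn.execute("CREATE TABLE my_table (my_col INTEGER, val TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (11, "z")],
)

# Create the destination once, outside the loop.
conn.execute("CREATE TEMP TABLE results (my_col INTEGER, val TEXT)")

# Each iteration appends its slice; no table is created or replaced here.
for i in range(1, 10):
    conn.execute(
        "INSERT INTO results SELECT my_col, val FROM my_table WHERE my_col = ?",
        (i,),
    )

collected = conn.execute(
    "SELECT my_col, val FROM results ORDER BY my_col, val"
).fetchall()
print(collected)
```

The final SELECT plays the role of the union: every iteration's rows are already in one table, so no wildcard over many temp tables is needed.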

Question asker/OP here. While I have accepted @felipe-hoffa's answer, as I believe it will be the most useful for future readers of this question, I actually took a different route to solve my problem:
BEGIN
DECLARE i INT64 DEFAULT 1;
CREATE OR REPLACE TEMP TABLE ttable AS
SELECT
CAST(NULL AS INT64) AS col1 -- cast NULL as the type of target col
,CAST(NULL AS FLOAT64) AS col2
,CAST(NULL AS DATE) AS col3;
WHILE i < 10 DO
-- overwrite `ttable` with its previous contents union'ed
-- with new data results from current loop iteration
CREATE OR REPLACE TEMP TABLE ttable AS
SELECT mt.col1, mt.col2, mt.col3 FROM my_table AS mt WHERE mt.other_col = i
UNION ALL
SELECT * FROM ttable;
SET i = i + 1;
END WHILE;
SELECT * FROM ttable; -- UNION'ed results
DROP TABLE IF EXISTS ttable;
END;
Why? I find it easier to stay in "table land" than to venture into "STRUCT/ARRAY land".

CREATE OR REPLACE TEMP TABLE in a script error: "Exceeded rate limits: too many table update operations for this table."

This script gives me an error after ~11 steps:
DECLARE steps INT64 DEFAULT 1;
LOOP
CREATE OR REPLACE TEMP TABLE countme AS (SELECT steps, 1 x, [1,2,3] y);
SET steps = steps+1;
IF steps=30 THEN LEAVE; END IF;
END LOOP;
Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
Even if this is a temp table - what can I do instead?

Instead of using a TEMP TABLE, hold the results in an array variable. You can even materialize it as the last step:
DECLARE steps INT64 DEFAULT 1;
DECLARE table_holder ARRAY<STRUCT<steps INT64, x INT64, y ARRAY<INT64>>>;
LOOP
SET table_holder = (
SELECT ARRAY_AGG(
STRUCT(steps, 1 AS x, [1,2,3] AS y))
FROM (SELECT '')
);
SET steps = steps+1;
IF steps=30 THEN LEAVE; END IF;
END LOOP;
CREATE TABLE temp.results
AS
SELECT *
FROM UNNEST(table_holder)

How to use variables in "EXECUTE format()" in plpgsql

I want to update a column in table stats, with the specific column passed as a parameter, then return the updated value of that column (the table has only 1 row):
CREATE FUNCTION grow(col varchar) RETURNS integer AS $$
DECLARE
tmp int;
BEGIN
tmp := (EXECUTE format(
'UPDATE stats SET %I = %I + 1
RETURNING %I',
col, col, col
)
);
RETURN tmp;
END;
Overall, I'm not even sure if this is the best way to do what I want; any suggestions would be appreciated!

You can do that. Use the INTO keyword of the EXECUTE statement.
CREATE OR REPLACE FUNCTION grow(_col text, OUT tmp integer)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format(
'UPDATE stats
SET %1$I = %1$I + 1
RETURNING %1$I'
, _col)
INTO tmp;
END
$func$;
Call:
SELECT grow('counter');
An OUT parameter simplifies the function overall.
format() syntax explained in the manual.
You could just run the UPDATE instead of a function call:
UPDATE stats SET counter = counter + 1 RETURNING counter;
There are not many scenarios where the function with dynamic SQL isn't just needless complication.
Alternative design
If at all possible, consider a different table layout: rows instead of columns (as suggested by @Ruslan). This allows any number of counters:
CREATE TABLE stats (
tag text PRIMARY KEY
, counter int NOT NULL DEFAULT 0
);
Call:
UPDATE stats
SET counter = counter + 1
WHERE tag = 'counter1'
RETURNING counter;
Or maybe consider a dedicated SEQUENCE for counting ...

Oracle SQL: Compare all values from 2 columns and exchange them

I have an Oracle DB with a table called myC. In this table I have a few columns, two of them called myCheight and myCwidth.
I need to read these values and compare them, as in: IF myCheight > myCwidth, switch the values.
I tried to read the values from one row but didn't get it to work. I use Oracle SQL Developer.
This is what I came up with so far:
set serveroutput on;
DECLARE
cursor h is select * from MyC;
type htype is table of h%rowtype index by number;
stage_tab htype;
master_tab htype;
BEGIN
open h;
loop
fetch h bulk collect into stage_tab limit 500;
for i in 1 .. stage_tab.count loop
master_tab(stage_tab(i).id) := stage_tabe(i);
end loop;
exit when h%notfound;
end loop;
close h;
end;

Can't you just do this?
UPDATE myC
SET myCheight = myCwidth,
myCwidth = myCheight
WHERE myCheight > myCwidth
