Create Temp Table in Each Loop and Union After Loop Completion - google-bigquery

Using BigQuery's standard SQL scripting functionality, I want to 1) create a temp table for each iteration of a loop, and 2) union those temp tables after the loop is complete. I've tried something like the following:
DECLARE i INT64 DEFAULT 1;
DECLARE ttable_name STRING;
WHILE i < 10 DO
  SET ttable_name = CONCAT('temp_table_', CAST(i AS STRING));
  CREATE OR REPLACE TEMP TABLE ttable_name AS
  SELECT * FROM my_table AS mt WHERE mt.my_col = 1;
  SET i = i + 1;
END WHILE;
SELECT * FROM temp_table_*; -- wildcard table to union all results
But I get the following error:
Exceeded rate limits: too many table update operations for this table.
How can I accomplish this task?

Your script does not work the way you think it does!
Instead of writing each iteration into a separate table named temp_table_N, you are actually writing to the very same temp table, literally named ttable_name. That is what triggers the Exceeded rate limits error: every iteration is an update to one and the same table.
BigQuery does not allow using variables for object names in static SQL.
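Note that BigQuery scripting does offer dynamic SQL via EXECUTE IMMEDIATE, where the statement is built as a string, so a computed table name is possible there. A hedged sketch of the per-iteration-table idea under that workaround, assuming a writable dataset (hypothetically my_dataset) and using ordinary rather than TEMP tables, since temporary tables may not be creatable from dynamic SQL; because every iteration now touches a different table, the per-table rate limit is also sidestepped:
DECLARE i INT64 DEFAULT 1;
WHILE i < 10 DO
  -- build the DDL as a string so the table name can vary per iteration
  EXECUTE IMMEDIATE FORMAT(
    "CREATE OR REPLACE TABLE my_dataset.temp_table_%d AS SELECT * FROM my_table AS mt WHERE mt.my_col = 1",
    i);
  SET i = i + 1;
END WHILE;
-- the wildcard union works because all tables share one dataset and prefix
SELECT * FROM `my_dataset.temp_table_*`;
These are real tables rather than temp tables, so drop them (or give them an expiration) when done.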

Don't create new tables. Add to an existing one with an INSERT INTO, or hold data in a variable (if it's not too much data), as in:
DECLARE steps INT64 DEFAULT 1;
DECLARE table_holder ARRAY<STRUCT<steps INT64, x INT64, y ARRAY<INT64>>>;
LOOP
  SET table_holder = (
    SELECT ARRAY_AGG(
      STRUCT(steps, 1 AS x, [1,2,3] AS y))
    FROM (SELECT '')
  );
  SET steps = steps + 1;
  IF steps = 30 THEN LEAVE; END IF;
END LOOP;
CREATE TABLE temp.results AS
SELECT *
FROM UNNEST(table_holder);
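One caveat: as written, table_holder is overwritten on every pass, so only the last iteration survives. To accumulate rows across iterations, which is closer to the union the question asks for, the variable can be concatenated instead; a small variation on the same script:
DECLARE steps INT64 DEFAULT 1;
-- start from an empty array so ARRAY_CONCAT has something to append to
DECLARE table_holder ARRAY<STRUCT<steps INT64, x INT64, y ARRAY<INT64>>> DEFAULT [];
LOOP
  SET table_holder = ARRAY_CONCAT(table_holder, (
    SELECT ARRAY_AGG(STRUCT(steps, 1 AS x, [1,2,3] AS y))
    FROM (SELECT '')
  ));
  SET steps = steps + 1;
  IF steps = 30 THEN LEAVE; END IF;
END LOOP;
SELECT * FROM UNNEST(table_holder); -- rows from all 29 iterations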
Related: https://stackoverflow.com/a/59314390/132438

Question asker/OP here. While I have accepted @felipe-hoffa's answer, as I believe it will be best for future readers of this question, I actually went a different route to solve my problem:
BEGIN
DECLARE i INT64 DEFAULT 1;
-- create an empty table with the target schema: cast NULL to each
-- target column's type, and LIMIT 0 so the seed row itself never
-- shows up in the final results
CREATE OR REPLACE TEMP TABLE ttable AS
SELECT
  CAST(NULL AS INT64) AS col1
  ,CAST(NULL AS FLOAT64) AS col2
  ,CAST(NULL AS DATE) AS col3
LIMIT 0;
WHILE i < 10 DO
  -- overwrite `ttable` with its previous contents union'ed
  -- with the new rows from the current loop iteration
  CREATE OR REPLACE TEMP TABLE ttable AS
  SELECT mt.col1, mt.col2, mt.col3 FROM my_table AS mt WHERE mt.other_col = i
  UNION ALL
  SELECT * FROM ttable;
  SET i = i + 1;
END WHILE;
SELECT * FROM ttable; -- UNION'ed results
DROP TABLE IF EXISTS ttable;
END;
Why? I find it easier to stay in "table land" than to venture into "STRUCT/ARRAY land". (Note that this still replaces ttable once per iteration, so a much longer loop could run into the same rate limit that prompted the question.)
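For completeness, Mikhail's INSERT INTO suggestion composes with this table-first approach as well: create ttable once, empty, and append inside the loop instead of replacing the whole table on every pass. A sketch of the reworked loop:
WHILE i < 10 DO
  -- append this iteration's rows; no table replacement involved
  INSERT INTO ttable (col1, col2, col3)
  SELECT mt.col1, mt.col2, mt.col3
  FROM my_table AS mt
  WHERE mt.other_col = i;
  SET i = i + 1;
END WHILE;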

Related

How can I pass a list, array, or string to be separated as a parameter to Redshift

I'm trying to write a simple query with an in clause like so:
SELECT *
FROM storeupcsalesbyday
WHERE date >= '9/1/2020' AND date <= '9/10/2020' AND upc in ('0000000004011', '0000000094011')
I need to be able to pass the values in the in clause as a parameter; the number of values is variable and could be one or thousands, depending on user input. In other SQL databases I have solved this problem by creating a user-defined function that takes a string, splits it on a delimiter, and inserts the values into a temp table; then I would select all from the temp table to use in my in clause. However, user-defined functions in Redshift do not allow tables as a return type. How are others solving this problem in Redshift?
Thanks
I was able to create a stored procedure that takes a varchar and creates a temp table of all "slices" of the varchar broken up by a delimiter (in this case a ','). I just wanted to share it here in case someone else has this issue.
Here is the procedure:
CREATE OR REPLACE PROCEDURE sp_UPCStringToTempTable(upcList IN varchar(max))
AS $$
DECLARE
  idx int;
  slice varchar(8000);
  upcListVar varchar(max);
BEGIN
  idx := 1;
  upcListVar := upcList;
  DROP TABLE IF EXISTS tmp_upc;
  CREATE TEMP TABLE tmp_upc(upc varchar(14));
  WHILE idx != 0 LOOP
    idx := charindex(',', upcListVar);
    IF idx != 0 THEN
      slice := left(upcListVar, idx - 1);
    END IF;
    IF idx = 0 THEN
      slice := upcListVar;
    END IF;
    IF len(slice) > 0 THEN
      INSERT INTO tmp_upc VALUES (slice);
    END IF;
    upcListVar := right(upcListVar, len(upcListVar) - idx);
  END LOOP;
END;
$$ LANGUAGE plpgsql;
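A quick usage sketch against the question's original query (the table and column names are the question's own):
-- split the list into tmp_upc, then reference it in the IN clause
CALL sp_UPCStringToTempTable('0000000004011,0000000094011');
SELECT *
FROM storeupcsalesbyday
WHERE date >= '9/1/2020' AND date <= '9/10/2020'
  AND upc IN (SELECT upc FROM tmp_upc);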
An alternative without a procedure: a numbers table plus split_part can unpack the list inline. (The original snippet joined to an undefined alias a; here it joins back to the question's table. num needs at least as many rows as the list has items; surplus rows just yield empty strings that match nothing.)
create table num(id int);
insert into num values (1), (2), (3);
with t as
(
  select split_part('0000000004011,0000000094011', ',', id) as col1
  from num
)
select s.*
from storeupcsalesbyday s
join t on s.upc = t.col1;
This should solve your problem.

CREATE OR REPLACE TEMP TABLE in a script error: "Exceeded rate limits: too many table update operations for this table."

This script gives me an error after ~11 steps:
DECLARE steps INT64 DEFAULT 1;
LOOP
  CREATE OR REPLACE TEMP TABLE countme AS (SELECT steps, 1 x, [1,2,3] y);
  SET steps = steps + 1;
  IF steps = 30 THEN LEAVE; END IF;
END LOOP;
Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
Even if this is a temp table - what can I do instead?
Instead of using a TEMP TABLE, hold the results in an array variable. You can even materialize it as the last step:
DECLARE steps INT64 DEFAULT 1;
DECLARE table_holder ARRAY<STRUCT<steps INT64, x INT64, y ARRAY<INT64>>>;
LOOP
  SET table_holder = (
    SELECT ARRAY_AGG(
      STRUCT(steps, 1 AS x, [1,2,3] AS y))
    FROM (SELECT '')
  );
  SET steps = steps + 1;
  IF steps = 30 THEN LEAVE; END IF;
END LOOP;
CREATE TABLE temp.results AS
SELECT *
FROM UNNEST(table_holder);
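A note on the last step: temp.results assumes a dataset named temp already exists. If you only need the rows back from the script, the final statement can read the variable directly instead of materializing it:
-- return the accumulated rows as the script's result set
SELECT * FROM UNNEST(table_holder);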

How to store multiple rows in a variable in a PL/SQL function?

I'm writing a PL/SQL function. I need to select multiple rows with a select statement:
SELECT pel.ceid
FROM pa_exception_list pel
WHERE trunc(pel.creation_date) >= trunc(SYSDATE-7)
If I use:
SELECT pel.ceid
INTO v_ceid
it only stores one value, but I need to store all the values this select returns. Given that this is a function, I can't just use a plain select, because I get the error "INTO - is expected."
You can use a collection type to do that. The example below should work for you:
DECLARE
  TYPE v_array_type IS VARRAY (10) OF NUMBER;
  var v_array_type;
BEGIN
  SELECT x
  BULK COLLECT INTO var
  FROM (
    SELECT 1 x FROM dual
    UNION
    SELECT 2 x FROM dual
    UNION
    SELECT 3 x FROM dual
  );
  FOR i IN 1..3 LOOP
    dbms_output.put_line(var(i));
  END LOOP;
END;
So in your case, it would be something like:
select pel.ceid
BULK COLLECT INTO <variable which you create>
from pa_exception_list pel
where trunc(pel.creation_date) >= trunc(sysdate-7);
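Spelled out end to end, with a nested table type anchored to the column's own type (the type and variable names here are illustrative):
DECLARE
  TYPE t_ceid_tab IS TABLE OF pa_exception_list.ceid%TYPE;
  v_ceids t_ceid_tab;
BEGIN
  SELECT pel.ceid
  BULK COLLECT INTO v_ceids
  FROM pa_exception_list pel
  WHERE trunc(pel.creation_date) >= trunc(SYSDATE - 7);
  -- v_ceids now holds every matching row
  FOR i IN 1 .. v_ceids.COUNT LOOP
    dbms_output.put_line(v_ceids(i));
  END LOOP;
END;
/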
If you really need to store multiple rows, check the BULK COLLECT INTO statement and its examples. But maybe a FOR cursor LOOP with row-by-row processing would be a better choice.
You can store a whole row in a %ROWTYPE variable and show whichever columns you want (assuming ceid is your primary key column, and col1 and col2 are some other columns of your table):
SQL> set serveroutput on;
SQL> declare
  l_exp pa_exception_list%rowtype;
begin
  for c in ( select *
             from pa_exception_list pel
             where trunc(pel.creation_date) >= trunc(SYSDATE-7)
           ) -- to select multiple rows
  loop
    select *
    into l_exp
    from pa_exception_list
    where ceid = c.ceid; -- to fetch exactly one row (ceid is the primary key)
    dbms_output.put_line(l_exp.ceid||' - '||l_exp.col1||' - '||l_exp.col2); -- to show the results
  end loop;
end;
/
SET SERVEROUTPUT ON
BEGIN
  FOR rec IN (
    -- an implicit cursor is created here
    SELECT pel.ceid AS ceid
    FROM pa_exception_list pel
    WHERE trunc(pel.creation_date) >= trunc(SYSDATE-7)
  )
  LOOP
    dbms_output.put_line(rec.ceid);
  END LOOP;
END;
/
Notes from here:
In this case, the cursor FOR LOOP declares, opens, fetches from, and closes an implicit cursor. However, the implicit cursor is internal; therefore, you cannot reference it.
Note that Oracle Database automatically optimizes a cursor FOR LOOP to work similarly to a BULK COLLECT query. Although your code looks as if it fetched one row at a time, Oracle Database fetches multiple rows at a time and allows you to process each row individually.

Store result of a query inside a function

I have the following function:
DO
$do$
DECLARE
  maxgid integer;
  tableloop integer;
  obstacle geometry;
  simplifyedobstacle geometry;
BEGIN
  select max(gid) into maxgid from public.terrain_obstacle_temp;
  FOR tableloop IN 1 .. maxgid
  LOOP
    insert into public.terrain_obstacle (tse_coll, tse_height, geom)
    select tse_coll, tse_height, geom
    from public.terrain_obstacle_temp
    where gid = tableloop;
  END LOOP;
END
$do$;
I need to modify this function to execute different queries according to the type of a column of public.terrain_obstacle_temp. This is a temporary table created by reading a shapefile, and I need to know the type of that table's geom column. I have a query that gives me that information:
SELECT type
FROM geometry_columns
WHERE f_table_schema = 'public'
AND f_table_name = 'terrain_obstacle'
and f_geometry_column = 'geom';
It returns a character varying value (in this case MULTIPOLYGON).
How can I modify the function so that it gets the result of this query and runs an if statement that executes different code according to that result?
Is the intention to copy all the records from the temp table to the actual table? If so, you may be able to skip the loop:
insert into public.terrain_obstacle (tse_coll, tse_height, geom)
select tse_coll, tse_height, geom
from public.terrain_obstacle_temp
;
Do terrain_obstacle and terrain_obstacle_temp have the same structure? If so, then the "insert into ... select ..." should work fine provided the column types are the same.
If conditional typing is required, use the CASE WHEN syntax:
v_type geometry_columns.type%TYPE;
...
SELECT type
INTO v_type
FROM geometry_columns
WHERE f_table_schema = 'public'
AND f_table_name = 'terrain_obstacle'
AND f_geometry_column = 'geom'
;
insert into public.terrain_obstacle (tse_coll, tse_height, geom)
select tse_coll
,tse_height
,CASE WHEN v_type = 'MULTIPOLYGON' THEN my_func1(geom)
WHEN v_type = 'POINT' THEN my_func2(geom)
ELSE my_default(geom)
END
from public.terrain_obstacle_temp
;
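Since the question asked for an if statement specifically, the same branching can live in plpgsql control flow rather than a CASE expression; a sketch reusing the hypothetical my_func1/my_func2 helpers from above:
IF v_type = 'MULTIPOLYGON' THEN
  INSERT INTO public.terrain_obstacle (tse_coll, tse_height, geom)
  SELECT tse_coll, tse_height, my_func1(geom)
  FROM public.terrain_obstacle_temp;
ELSIF v_type = 'POINT' THEN
  INSERT INTO public.terrain_obstacle (tse_coll, tse_height, geom)
  SELECT tse_coll, tse_height, my_func2(geom)
  FROM public.terrain_obstacle_temp;
ELSE
  RAISE NOTICE 'unhandled geometry type: %', v_type;
END IF;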

Value limitation in an IN clause in Oracle

I work for a company that has a DW/ETL setup. I need to write a query that checks for over 2,500 values in a CASE WHEN ... IN clause and also over 1,000 values in a WHERE ... IN clause. Basically it would look like the following:
SELECT
user_id
,CASE WHEN user_id IN ('user_n', +2500 user_[n+1] ) THEN 1
ELSE 0 END AS user_group
,item_id
FROM user_table
WHERE item_id IN ('item_n', +1000 item_[n+1] );
As you probably already know, Oracle allows a maximum of 1,000 values in an IN list (ORA-01795), so I tried splitting them across OR'ed IN clauses (as suggested in other Stack Overflow threads):
SELECT
user_id
,CASE WHEN user_id IN ('user_n', +999 user_[n+1] )
OR user_id IN ('user_n', +999 user_[n+1] )
OR user_id IN ('user_n', +999 user_[n+1] ) THEN 1
ELSE 0 END AS user_group
,item_id
FROM user_table
WHERE item_id IN ('item_n', +999 item_[n+1] )
OR item_id IN ('item_n', +999 item_[n+1] );
NOTE: I know the math is erroneous in the examples above, but you get the point.
The problem is that queries have a maximum execution time of 120 minutes, after which the job is automatically killed. So I googled for solutions, and it seems temporary tables could be what I'm looking for, but in all honesty none of the examples I found is clear enough about how to get the values I want into the table, or how to use that table in my original query. Not even the Oracle documentation was much help.
Another potential problem is that I have limited rights, and I've seen other people mention that in their companies they don't have the rights to create temporary tables.
Some of the info I found in my research:
ORACLE documentation
StackOverflow thread
[StackOverflow thread 2]
Another solution I found was using tuples instead, as mentioned in THIS thread. I haven't tried it, because, as another user mentions, performance seems to be greatly affected.
Any guidance on how to use a Temporary Table or if anyone has another way of dealing with this limitation would be greatly appreciated.
Create a global temporary table; its contents are private to your session and cleaned up automatically, with reduced logging overhead compared to a permanent table:
CREATE GLOBAL TEMPORARY TABLE <table_name> (
  <column_name> <column_data_type>,
  <column_name> <column_data_type>,
  <column_name> <column_data_type>)
ON COMMIT DELETE ROWS;
then, depending on how the user list arrives, import the data into a holding table and run:
select 'INSERT INTO <global_temporary_table> (<column>) VALUES (''' || <column> || ''');'
from <holding_table>;
This gives you INSERT statements as output, which you then run to insert the data.
then
SELECT <some_column>
FROM <some_table>
WHERE <some_value> IN
(SELECT <some_column> FROM <global_temporary_table>);
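That said, if the values are already sitting in a holding table, you can skip generating statements and load the temporary table directly:
INSERT INTO <global_temporary_table> (<column_name>)
SELECT <column_name> FROM <holding_table>;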
Use a collection:
CREATE TYPE Ints_Table AS TABLE OF INT;
CREATE TYPE IDs_Table AS TABLE OF CHAR(5);
Something like this:
SELECT user_id,
       CASE WHEN user_id MEMBER OF Ints_Table( 1, 2, 3, /* ... */ 2500 )
            THEN 1
            ELSE 0
       END
      ,item_id
FROM user_table
WHERE item_id MEMBER OF IDs_Table( 'ABSC2', 'DITO9', 'KMKM9', /* ... */ 'QD3R5' );
Or you can use PL/SQL to populate a collection:
VARIABLE cur REFCURSOR;
DECLARE
  t_users Ints_Table := Ints_Table(); -- initialize, or EXTEND raises COLLECTION_IS_NULL
  t_items IDs_Table := IDs_Table();
  f UTL_FILE.FILE_TYPE;
  line VARCHAR2(4000);
BEGIN
  t_users.EXTEND( 2500 );
  FOR i IN 1 .. 2500 LOOP
    t_users( i ) := i;
  END LOOP;
  -- load data from a file
  f := UTL_FILE.FOPEN('DIRECTORY_HANDLE','datafile.txt','R');
  IF UTL_FILE.IS_OPEN(f) THEN
    BEGIN
      LOOP
        UTL_FILE.GET_LINE(f, line); -- raises NO_DATA_FOUND at end of file
        t_items.EXTEND;
        t_items( t_items.COUNT ) := line;
      END LOOP;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN NULL;
    END;
    UTL_FILE.FCLOSE(f);
  END IF;
  OPEN :cur FOR
    SELECT user_id,
           CASE WHEN user_id MEMBER OF t_users
                THEN 1
                ELSE 0
           END
          ,item_id
    FROM user_table
    WHERE item_id MEMBER OF t_items;
END;
/
PRINT cur;
Or if you are using another language to call the query then you could pass the collections as a bind value (as shown here).
In PL/SQL you could use a collection type. You could create your own like this:
create type string_table is table of varchar2(100);
Or use an existing type such as SYS.DBMS_DEBUG_VC2COLL, which is a table of VARCHAR2(1000).
Now you can declare a collection of this type for each of your lists, populate it, and use it in the query - something like this:
declare
  strings1 SYS.DBMS_DEBUG_VC2COLL := SYS.DBMS_DEBUG_VC2COLL();
  strings2 SYS.DBMS_DEBUG_VC2COLL := SYS.DBMS_DEBUG_VC2COLL();
  procedure add_string1 (p_string varchar2) is
  begin
    strings1.extend();
    strings1(strings1.count) := p_string;
  end;
  procedure add_string2 (p_string varchar2) is
  begin
    strings2.extend();
    strings2(strings2.count) := p_string;
  end;
begin
  add_string1('1');
  add_string1('2');
  add_string1('3');
  -- and so on...
  add_string1('2500');
  add_string2('1');
  add_string2('2');
  add_string2('3');
  -- and so on...
  add_string2('1400');
  for r in (
    select user_id
         , case when user_id in (select column_value from table(strings1)) then 1 else 0 end as indicator
         , item_id
    from user_table
    where item_id in (select column_value from table(strings2))
  )
  loop
    dbms_output.put_line(r.user_id||' '||r.indicator);
  end loop;
end;
/
You can use the example below to understand global temporary tables and the two types of GTT:
CREATE GLOBAL TEMPORARY TABLE GTT_PRESERVE_ROWS (ID NUMBER) ON COMMIT PRESERVE ROWS;
INSERT INTO GTT_PRESERVE_ROWS VALUES (1);
COMMIT;
SELECT * FROM GTT_PRESERVE_ROWS;
DELETE FROM GTT_PRESERVE_ROWS;
COMMIT;
TRUNCATE TABLE GTT_PRESERVE_ROWS;
DROP TABLE GTT_PRESERVE_ROWS; -- won't work if you didn't truncate the table first, or if the table is in use by another session
CREATE GLOBAL TEMPORARY TABLE GTT_DELETE_ROWS (ID NUMBER) ON COMMIT DELETE ROWS;
INSERT INTO GTT_DELETE_ROWS VALUES (1);
SELECT * FROM GTT_DELETE_ROWS;
COMMIT;
SELECT * FROM GTT_DELETE_ROWS;
DROP TABLE GTT_DELETE_ROWS;
However, as you mentioned, you receive the input in an Excel file, so you can simply create a table and load the data into it. Once the data is loaded, you can use it in the IN clause of your query:
select * from employee where empid in (select empid from temptable);
create temporary table userids (userid int);
insert into userids(...)
then use a join or an IN subquery:
select ...
where user_id in (select userid from userids);
drop temporary table userids;