How can I pass a list, array or string to be separated as a parameter to redshift - sql

I'm trying to write a simple query with an IN clause, like so:
SELECT *
FROM storeupcsalesbyday
WHERE date >= '9/1/2020' AND date <= '9/10/2020' AND upc IN ('0000000004011', '0000000094011')
I need to be able to pass the values in the IN clause as a parameter; the number of values is variable and could be one or thousands, depending on the user input. In other SQL databases I have solved this by creating a user-defined function that takes a string, splits it on a delimiter, and inserts the values into a temp table; then I would select all from the temp table to use in my IN clause. However, user-defined functions in Redshift do not allow tables as a return type. How are others solving this problem in Redshift?
Thanks

I was able to create a stored procedure that takes a varchar and creates a temp table of all "slices" of the varchar broken up by a delimiter (in this case a ','). I just wanted to share it here in case someone else has this issue.
Here is the procedure:
CREATE OR REPLACE PROCEDURE sp_UPCStringToTempTable(upcList IN varchar(max))
AS $$
DECLARE
    idx int;
    slice varchar(8000);
    upcListVar varchar(max);
BEGIN
    idx := 1;
    upcListVar := upcList;
    DROP TABLE IF EXISTS tmp_upc;
    CREATE TEMP TABLE tmp_upc (upc varchar(14));
    WHILE idx != 0 LOOP
        idx := charindex(',', upcListVar);
        IF idx != 0 THEN
            slice := left(upcListVar, idx - 1);
        END IF;
        IF idx = 0 THEN
            slice := upcListVar;
        END IF;
        IF len(slice) > 0 THEN
            INSERT INTO tmp_upc VALUES (slice);
        END IF;
        upcListVar := right(upcListVar, len(upcListVar) - idx);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
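Once the procedure exists, a minimal usage sketch looks like this (table and column names taken from the question; tmp_upc is only visible to the session that ran the CALL):
CALL sp_UPCStringToTempTable('0000000004011,0000000094011');
SELECT *
FROM storeupcsalesbyday
WHERE date >= '9/1/2020' AND date <= '9/10/2020'
  AND upc IN (SELECT upc FROM tmp_upc);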

Another approach: build a small numbers table and slice the string with split_part, one element per number. Note that num needs at least as many rows as there are items in the list, and the string must not contain spaces around the commas:
create table num(id int);
insert into num values (1), (2), (3);
with t as
(
    select split_part('0000000004011,0000000094011', ',', id) as col1 from num
)
select a.*
from storeupcsalesbyday a
join t on a.upc = t.col1;
This should solve your problem.
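To avoid maintaining the numbers table by hand, the same idea works with row_number() over any sufficiently large table. A hedged sketch, assuming storeupcsalesbyday has at least as many rows as the longest expected list:
WITH num AS (
    SELECT row_number() OVER (ORDER BY upc)::int AS id
    FROM storeupcsalesbyday
    LIMIT 1000  -- upper bound on the number of list items
),
t AS (
    SELECT split_part('0000000004011,0000000094011', ',', id) AS upc
    FROM num
)
SELECT a.*
FROM storeupcsalesbyday a
JOIN t ON a.upc = t.upc;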


Create Temp Table in Each Loop and Union After Loop Completion

Using BigQuery's standard SQL scripting functionality, I want to 1) create a temp table for each iteration of a loop, and 2) union those temp tables after the loop is complete. I've tried something like the following:
DECLARE i INT64 DEFAULT 1;
DECLARE ttable_name STRING;
WHILE i < 10 DO
SET ttable_name = CONCAT('temp_table_', CAST(i AS STRING));
CREATE OR REPLACE TEMP TABLE ttable_name AS
SELECT * FROM my_table AS mt WHERE mt.my_col = 1;
SET i = i + 1;
END WHILE;
SELECT * FROM temp_table_*; -- wildcard table to union all results
But I get the following error:
Exceeded rate limits: too many table update operations for this table.
How can I accomplish this task?
Your script does not work the way you think it does!
Instead of writing to a separate table named temp_table_N on each iteration, you are actually writing to the very same temp table, literally named ttable_name - hence the Exceeded rate limits error. BigQuery does not allow using variables for object names.
Don't create new tables. Add to an existing one with an INSERT INTO, or hold the data in a variable (if it's not too much data), as in:
DECLARE steps INT64 DEFAULT 1;
DECLARE table_holder ARRAY<STRUCT<steps INT64, x INT64, y ARRAY<INT64>>> DEFAULT [];
LOOP
  -- append this iteration's rows to the array variable
  SET table_holder = ARRAY_CONCAT(table_holder, (
    SELECT ARRAY_AGG(STRUCT(steps, 1 AS x, [1,2,3] AS y))
    FROM (SELECT '')
  ));
  SET steps = steps + 1;
  IF steps = 30 THEN LEAVE; END IF;
END LOOP;
CREATE TABLE temp.results AS
SELECT *
FROM UNNEST(table_holder)
Related: https://stackoverflow.com/a/59314390/132438
Question asker/OP here. While I have selected #felipe-hoffa's answer as I believe it will be best for future readers of this question, I have actually gone a different route in solving my problem:
BEGIN
  DECLARE i INT64 DEFAULT 1;
  CREATE OR REPLACE TEMP TABLE ttable AS
  SELECT
    CAST(NULL AS INT64) AS col1  -- cast NULL as the type of the target col
   ,CAST(NULL AS FLOAT64) AS col2
   ,CAST(NULL AS DATE) AS col3;
  WHILE i < 10 DO
    -- overwrite `ttable` with its previous contents union'ed
    -- with the new rows from the current loop iteration
    CREATE OR REPLACE TEMP TABLE ttable AS
    SELECT mt.col1, mt.col2, mt.col3 FROM my_table AS mt WHERE mt.other_col = i
    UNION ALL
    SELECT * FROM ttable;
    SET i = i + 1;
  END WHILE;
  SELECT * FROM ttable;  -- UNION'ed results
  DROP TABLE IF EXISTS ttable;
END;
Why? I find it easier to stay in "table land" than to venture into "STRUCT/ARRAY land".

Store result of a query inside a function

I have the following function:
DO
$do$
DECLARE
    maxgid integer;
    tableloop integer;
    obstacle geometry;
    simplifyedobstacle geometry;
BEGIN
    select max(gid) from public.terrain_obstacle_temp into maxgid;
    FOR tableloop IN 1 .. maxgid
    LOOP
        insert into public.terrain_obstacle (tse_coll, tse_height, geom)
        select tse_coll, tse_height, geom
        from public.terrain_obstacle_temp
        where gid = tableloop;
    END LOOP;
END
$do$;
I need to modify this function to execute different queries according to the type of a column of public.terrain_obstacle_temp.
This is a temporary table created by reading a shapefile, and I need to know the kind of the geom column of that table. I have a query that gives me that information:
SELECT type
FROM geometry_columns
WHERE f_table_schema = 'public'
AND f_table_name = 'terrain_obstacle'
AND f_geometry_column = 'geom';
It returns a character varying value (in this case MULTIPOLYGON).
How can I modify the function to get the result of the query and write an if statement that executes some code according to that result?
Is the intention to copy all the records from the temp table to the actual table? If so, you may be able to skip the loop:
insert into public.terrain_obstacle (tse_coll, tse_height, geom)
select tse_coll, tse_height, geom
from public.terrain_obstacle_temp
;
Do terrain_obstacle and terrain_obstacle_temp have the same structure? If so, then the "insert into ... select ..." should work fine provided the column types are the same.
If conditional typing is required, use the CASE WHEN syntax:
v_type geometry_columns.type%TYPE;
...
SELECT type
INTO v_type
FROM geometry_columns
WHERE f_table_schema = 'public'
AND f_table_name = 'terrain_obstacle'
AND f_geometry_column = 'geom'
;
insert into public.terrain_obstacle (tse_coll, tse_height, geom)
select tse_coll
,tse_height
,CASE WHEN v_type = 'MULTIPOLYGON' THEN my_func1(geom)
WHEN v_type = 'POINT' THEN my_func2(geom)
ELSE my_default(geom)
END
from public.terrain_obstacle_temp
;
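Putting the pieces together, a minimal sketch of the complete block might look like this (my_func1, my_func2, and my_default stand in for whatever per-type processing is needed, as in the fragment above):
DO
$do$
DECLARE
    v_type geometry_columns.type%TYPE;
BEGIN
    SELECT type
    INTO v_type
    FROM geometry_columns
    WHERE f_table_schema = 'public'
      AND f_table_name = 'terrain_obstacle'
      AND f_geometry_column = 'geom';

    INSERT INTO public.terrain_obstacle (tse_coll, tse_height, geom)
    SELECT tse_coll
          ,tse_height
          ,CASE WHEN v_type = 'MULTIPOLYGON' THEN my_func1(geom)
                WHEN v_type = 'POINT' THEN my_func2(geom)
                ELSE my_default(geom)
           END
    FROM public.terrain_obstacle_temp;
END
$do$;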

How to transpose a table from a wide format to narrow, using the values as a filter?

I get a table X (with 1 row):
COL_XA | COL_VG | COL_LF | COL_EQ | COL_PP | COL_QM | ...
-------+--------+--------+--------+--------+--------+----
1      |      0 |      0 |      0 |      1 |      1 |
Each column COL_x can have only values 0 or 1.
I want to transform this table into this form Y:
NAME
"COL_XA"
"COL_PP"
"COL_QM"
...
Table Y should contain only those columns of table X whose single row holds the value 1.
This is related to other questions about transposition, with the difference that I don't want the actual values but the column names, which are not known in advance.
I could use Excel or PL/SQL to create a list of strings of the form
MIN(CASE WHEN t.COL_XA = 1 THEN 'COL_XA' ELSE null END) as NAME, but this solution is inefficient (EXECUTE IMMEDIATE) and difficult to maintain. Moreover, the string passed to EXECUTE IMMEDIATE is limited to 32,700 characters, which can easily be exceeded in production, where table X can have well over 500 fields.
To completely automate the query you must be able to read the column names of the actual cursor. In PL/SQL this is possible using DBMS_SQL (another way would be JDBC). Based on this OTN thread, here is a basic table function.
The important parts are:
1) dbms_sql.parse the query given as a text string and dbms_sql.execute it
2) dbms_sql.describe_columns to get the list of column names returned by the query on table X
3) dbms_sql.fetch_rows to fetch the first row
4) loop over the columns and, wherever dbms_sql.column_value equals 1, output the column name (with PIPE ROW)
create or replace type str_tblType as table of varchar2(30);
/
create or replace function get_col_name_on_one return str_tblType
PIPELINED
as
    l_theCursor   integer default dbms_sql.open_cursor;
    l_columnValue varchar2(2000);
    l_status      integer;
    l_colCnt      number default 0;
    l_colDesc     dbms_sql.DESC_TAB;
begin
    dbms_sql.parse( l_theCursor, 'SELECT * FROM X', dbms_sql.native );
    -- define up to 1000 columns as varchar2; ORA-01007 ("variable not in
    -- select list") signals that we have run past the last column
    for i in 1 .. 1000 loop
        begin
            dbms_sql.define_column( l_theCursor, i, l_columnValue, 2000 );
            l_colCnt := i;
        exception
            when others then
                if ( sqlcode = -1007 ) then
                    exit;
                else
                    raise;
                end if;
        end;
    end loop;
    l_status := dbms_sql.execute(l_theCursor);
    dbms_sql.describe_columns(l_theCursor, l_colCnt, l_colDesc);
    if dbms_sql.fetch_rows(l_theCursor) > 0 then
        for lColCnt in 1 .. l_colCnt
        loop
            dbms_sql.column_value( l_theCursor, lColCnt, l_columnValue );
            IF (l_columnValue = '1') THEN
                pipe row(Upper(l_colDesc(lColCnt).col_name));
            END IF;
        end loop;
    end if;
    dbms_sql.close_cursor(l_theCursor);
    return;
end;
/
select * from table(get_col_name_on_one);
COLUMN_LOOOOOOOOOOOOOONG_100
COLUMN_LOOOOOOOOOOOOOONG_200
COLUMN_LOOOOOOOOOOOOOONG_300
COLUMN_LOOOOOOOOOOOOOONG_400
COLUMN_LOOOOOOOOOOOOOONG_500
COLUMN_LOOOOOOOOOOOOOONG_600
COLUMN_LOOOOOOOOOOOOOONG_700
COLUMN_LOOOOOOOOOOOOOONG_800
COLUMN_LOOOOOOOOOOOOOONG_900
COLUMN_LOOOOOOOOOOOOOONG_1000
You should not run into trouble with wide tables using this solution; I tested it with a 1000-column table with long column names.
Here is a solution, but I have to break it into two parts.
First, extract all the column names of the table. I have used LISTAGG to collect the column names separated by commas; the output of this first query is then used in the second query.
select listagg(column_name,',') WITHIN GROUP (ORDER BY column_name)
from user_tab_cols where upper(table_name) = 'X'
The output of the above query will be like COL_XA,COL_VG,COL_LF,COL_EQ,COL_PP,COL_QM ... and so on.
Copy that output into the query below, replacing <outputvaluesfromfirstquery>:
select NAME from X
unpivot ( bit for NAME in (<outputvaluesfromfirstquery>) )
where bit = 1
I tried to merge the two into a single query, but Oracle offers PIVOT XML without a corresponding UNPIVOT XML.
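For the sample table X from the question, the filled-in second query would read:
select NAME from X
unpivot ( bit for NAME in (COL_XA, COL_VG, COL_LF, COL_EQ, COL_PP, COL_QM) )
where bit = 1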
You can do this with a bunch of union alls:
select 'COL_XA' as name from X t where col_xa = 1 union all
select 'COL_VG' as name from X t where col_vg = 1 union all
. . .
EDIT:
If you have only one row, then you do not need:
MIN(CASE WHEN t.COL_XA = 1 THEN 'COL_XA' ELSE null END) as NAME
You can simply use:
(CASE WHEN t.COL_XA = 1 THEN 'COL_XA' END)
The MIN() isn't needed for one row and the ELSE null is redundant.

How to insert rows with max(order_field) + 1 transactionally in PostgreSQL

I need to insert into a PostgreSQL table a row whose ordering column holds the max value + 1 of that same column over a subset of the table's rows. The column is used to order the rows within that subset.
I'm trying to update the column value in an after-insert trigger, but I'm getting duplicate values for this column in different rows.
What's the best way to do this while avoiding duplicate values for the ordering column within the subset, in a concurrent environment with many inserts in a short time?
Thanks in advance
EDIT:
The subset is defined by another column of the same table: that column has the same value for all related rows.
If that column is used only for ordering then use a sequence:
create table t (
column1 integer,
ordering_column serial
);
http://www.postgresql.org/docs/current/static/datatype-numeric.html#DATATYPE-NUMERIC-TABLE
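Usage sketch for the table above - the ordering value assigns itself:
insert into t (column1) values (10), (20), (30);
select * from t order by ordering_column;
-- ordering_column comes out 1, 2, 3 with no application code involved
Note that a single serial column numbers all rows globally; per-subset numbering (see the EDIT above) needs a per-set approach like the trigger below.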
New, transaction-safe answer:
To do this in a transaction-safe way you could use this trigger, which creates a sequence for each distinct set_id value:
create or replace function calculate_index() returns trigger
as $$
declare
    my_indexer_name text;
begin
    my_indexer_name := 'my_indexer_name_' || NEW.my_set_id;
    -- create the per-set sequence on first use
    if not exists (select * from pg_class where relname = my_indexer_name) then
        execute 'create sequence ' || my_indexer_name;
    end if;
    select nextval(my_indexer_name) into NEW.my_index;
    return new;
end
$$
language plpgsql;
CREATE TRIGGER my_indexer_trigger
BEFORE INSERT ON my_table FOR EACH ROW
EXECUTE PROCEDURE calculate_index();
Also, you could manually create sequences named 'my_indexer_name_1', 'my_indexer_name_2', etc. if the possible set_id values are known beforehand; then you could drop the if-then check from the trigger function above.
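A quick usage sketch, assuming the table and column names the trigger expects:
CREATE TABLE my_table (
    my_set_id integer,
    my_index integer,
    payload text
);
-- after creating calculate_index() and my_indexer_trigger as above:
INSERT INTO my_table (my_set_id, payload)
VALUES (1, 'a'), (1, 'b'), (2, 'c');
-- the trigger fills my_index: 1 and 2 within set 1, and 1 within set 2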
This was my initial, non-transaction-safe answer:
I would create a new helper table; let's call it set_indexes:
create table set_indexes( set_id integer, max_index integer );
each record has the set_id and the max index value of that set. e.g.:
set_id | max_index
-------+----------
1      |        53
2      |        12
3      |        43
in the trigger code you would:
select max_index + 1
into NEW.my_index
from set_indexes
where set_indexes.set_id = NEW.my_set_id;
-- check whether the set_id is new:
if NEW.my_index is null then
    NEW.my_index := 1;
    insert into set_indexes (set_id, max_index) values (NEW.my_set_id, 1);
else
    update set_indexes set max_index = NEW.my_index
    where set_indexes.set_id = NEW.my_set_id;
end if;
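For completeness - the usual way to make this helper-table pattern concurrency-safe (not part of the original answer) is to lock the set's row while reading it, so concurrent inserts for the same set serialize:
select max_index + 1
into NEW.my_index
from set_indexes
where set_indexes.set_id = NEW.my_set_id
for update;  -- a second transaction blocks here until the first commits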

Use query for SQL alias OR join column names to row values?

I'm working with data in PostgreSQL that uses a data dictionary table to provide descriptions for the column (variable) names of other tables in the dataset. For example:
Table 1:
a00600 | a00900
-------+-------
row 1 | row 1
row 2 | row 2
Data Dictionary (Key) columns:
Variable | Description
---------+------------
a00600 | Total population
a00900 | Zipcode
For reporting purposes, how do I write SQL to perform the following dynamically (without specifying each column name)?
SELECT 'a00600' AS (SELECT Key.Description
WHERE Key.Variable = 'a00600')
FROM Table 1;
I realize there's likely a better way to parse this question/problem and am open to any ideas for what I need to accomplish.
You need to use dynamic SQL with a procedural-language function, usually plpgsql, and use EXECUTE with it.
The tricky part is that the return type must be defined at creation time.
I have compiled a number of solutions in this related answer.
There are lots of related answers on SO already. Search for combinations of terms like [plpgsql] EXECUTE RETURN QUERY [dynamic-sql] quote_ident.
Your approach is commonly frowned upon among database designers.
My personal opinion: I wouldn't go that route. I always use basic, descriptive names. You can always add more décor in your application if needed.
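A minimal sketch of the EXECUTE pattern, with the return type fixed at creation time (the table name table1 and the varchar column type are assumptions based on the question; format() with %I plays the same role as quote_ident()):
CREATE OR REPLACE FUNCTION select_with_description(col text)
RETURNS SETOF varchar AS
$$
BEGIN
    -- %I safely quotes the identifier chosen at run time
    RETURN QUERY EXECUTE format('SELECT %I FROM table1', col);
END
$$ LANGUAGE plpgsql;
SELECT * FROM select_with_description('a00600') AS "Total population";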
Another way to get the descriptions instead of the actual column names would be to create views (one for every table). Generating the views can be automated. This looks rather clumsy, but it has a huge advantage: for "complex" queries, the resulting query plans will be exactly the same as for the original column names. (Functions joined into complex queries perform badly: the optimiser cannot take them apart, so the resulting behaviour is equivalent to "row at a time".)
Example:
-- tmp schema is only for testing
DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE thedata
( a00600 varchar
, a00900 varchar
);
INSERT INTO thedata(a00600 , a00900) VALUES
('key1', 'data1')
,('key2', 'data2');
CREATE TABLE thedict
( variable varchar
, description varchar
);
INSERT INTO thedict(variable , description) VALUES
('a00600' , 'Total population')
,('a00900' , 'Zipcode' );
CREATE OR REPLACE FUNCTION create_view_definition(zname varchar)
RETURNS varchar AS
$BODY$
DECLARE
    thestring varchar;
    therecord RECORD;
    iter INTEGER;
    thecurs cursor for
        SELECT co.attname AS zname, d.description AS zdesc
        FROM pg_class ct
        JOIN pg_namespace cs ON cs.oid = ct.relnamespace
        JOIN pg_attribute co ON co.attrelid = ct.oid AND co.attnum > 0
        LEFT JOIN thedict d ON d.variable = co.attname
        WHERE ct.relname = 'thedata'
          AND cs.nspname = 'tmp';
BEGIN
    thestring := '';
    iter := 0;
    FOR therecord IN thecurs LOOP
        IF (iter = 0) THEN
            thestring := 'CREATE VIEW ' || quote_ident('v' || zname) || ' AS ( SELECT ';
        ELSE
            thestring := thestring || ', ';
        END IF;
        iter := iter + 1;
        thestring := thestring || quote_ident(therecord.zname);
        IF (therecord.zdesc IS NOT NULL) THEN
            thestring := thestring || ' AS ' || quote_ident(therecord.zdesc);
        END IF;
    END LOOP;
    IF (iter > 0) THEN
        thestring := thestring || ' FROM ' || quote_ident(zname) || ' )';
    END IF;
    RETURN thestring;
END;
$BODY$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION execute_view_definition(zname varchar)
RETURNS INTEGER AS
$BODY$
DECLARE
    meat varchar;
BEGIN
    meat := create_view_definition(zname);
    EXECUTE meat;
    RETURN 0;
END;
$BODY$ LANGUAGE plpgsql;
SELECT create_view_definition('thedata');
SELECT execute_view_definition('thedata');
SELECT * FROM vthedata;
RESULT:
CREATE FUNCTION
CREATE FUNCTION
create_view_definition
---------------------------------------------------------------------------------------------------
CREATE VIEW vthedata AS ( SELECT a00600 AS "Total population", a00900 AS "Zipcode" FROM thedata )
(1 row)
execute_view_definition
-------------------------
0
(1 row)
Total population | Zipcode
------------------+---------
key1 | data1
key2 | data2
(2 rows)
Please note this is only an example. If it were for real, I would at least put the generated views into a separate schema, to avoid name collisions and pollution of the original schema.