Redshift: Executing a dynamic query from a string - sql

I would like to execute a dynamic SQL query stored in a string field on Amazon Redshift.
My background is mostly in T-SQL relational databases. I used to build SQL statements dynamically, store them in variables, and then execute them. I know Redshift can prepare and execute statements, but I wonder if it is possible to execute a query stored in a string field.
I have a piece of code that dynamically builds the code below with stats on several tables using pg_* system tables. Every column/table name is dynamically calculated. Here's an example of the query output:
SELECT h_article_id AS key, 'transport_parameters_weight_in_grams' AS col_name, COUNT(DISTINCT transport_parameters_weight_in_grams) AS count_value FROM dv.s_products GROUP BY h_article_id UNION ALL
SELECT h_article_id AS key, 'transport_parameters_width_in_mm' AS col_name, COUNT(DISTINCT transport_parameters_width_in_mm) AS count_value FROM dv.s_products GROUP BY h_article_id UNION ALL
SELECT h_article_id AS key, 'label_owner_info_communication_address' AS col_name, COUNT(DISTINCT label_owner_info_communication_address) AS count_value FROM dv.s_products GROUP BY h_article_id
I would like to input this dynamic piece of code within another query, so I can make some statistics, like so:
SELECT col_name, AVG(count_value*1.00) AS avg_count
FROM (
'QUERY ABOVE'
) A
GROUP BY col_name;
This would output something like:
col_name                                  avg_count
transport_parameters_weight_in_grams      1.00
transport_parameters_width_in_mm          1.00
label_owner_info_communication_address    0.60
The natural way for me to do this would be to store everything as a string in a variable and execute it. But I'm afraid Redshift does not support this.
Is there an alternative way to really build dynamic SQL code?

This is possible now that we have added support for Stored Procedures; see "Overview of Stored Procedures in Amazon Redshift".
For example, this stored procedure counts the rows in a table and inserts the table name and row count into another table. Both table names are provided as input.
CREATE PROCEDURE get_tbl_count(IN source_tbl VARCHAR, IN count_tbl VARCHAR) AS $$
BEGIN
  -- Build the statement as a string, then run it with EXECUTE.
  EXECUTE 'INSERT INTO ' || quote_ident(count_tbl)
       || ' SELECT ''' || source_tbl || ''', COUNT(*) FROM '
       || quote_ident(source_tbl) || ';';
  RETURN;
END;
$$ LANGUAGE plpgsql;
In your example, the query to be executed could be passed in as a string.
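For example, a call would look like this (table names here are hypothetical, and count_tbl must already exist with a name column and a count column):
CALL get_tbl_count('s_products', 'tbl_counts');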

No. There is no straightforward way to run dynamically built SQL code in Redshift.
You can't define SQL variables, or create stored procedures, as you would have in MS SQL Server.
You can create Python Functions in Redshift, but you would be coding in Python vs. SQL.
You can use the "PREPARE" and "EXECUTE" statements to run "pre-defined" SQL queries, but you would have to create the statements outside of the database, before passing them to the execute command. By creating the statement outside of the database, in a way defeats the purpose.... You can create any statement in your "favorite" programming language.
As I said, purely SQL-based, in-database dynamic SQL does not exist.
Basically, you need to run this logic in your application or using something such as AWS Data Pipeline.

I am using Postgres on Redshift (via Python), and I ran into this issue and found a solution.
I was trying to create a dynamic query, putting in my own date:
import datetime as dt
my_date = dt.date(2018, 10, 30)
query = ''' select * from table where date >= ''' + str(my_date) + ''' order by date '''
But the query entirely ignores the condition when it is written this way, most likely because str(my_date) is spliced in without quotes, so the SQL side sees the arithmetic expression 2018-10-30 rather than a date literal.
However, if you use the percent sign (%), you can insert the date correctly.
The correct way to write the above statement is:
query = ''' select * from table where date >= ''' + ''' '%s' ''' % my_date + ''' order by date '''
The %s substitution wraps the date in single quotes, so the database receives a proper date literal.
So, maybe this is helpful, or maybe it is not. I hope it helps at least one person in my situation!
Best wishes.

Related

How to use a table's content for querying other tables in BigQuery

My team and I use a query on a daily basis to receive specific results from a large dataset. This query is constantly updated with different terms that I would like to match against the dataset.
To make this job more scalable, I built a table of arrays, each containing the terms and conditions for the query. That way the query can lean on the table, and changes that I make in the table will affect the query without any need to change the query itself.
The thing is, I can't seem to find a way to reference the table in the actual query without selecting it. I want to use the content of the table as a WHERE condition. For example:
table1:
terms
[term1, term2, term3]
query:
select * from dataset
where dataset.column like '%term1'
or dataset.column like '%term2'
or dataset.column like '%term3'
etc.
If you have any ideas please let me know (if the solution involves Python or JS this is also great)
Thanks!
You can "build" the syntax you want using Procedural Language in BigQuery and then execute it. Here is a way of doing it without "leaving" BQ (meaning, without using external code):
BEGIN
  DECLARE statement STRING DEFAULT 'SELECT col FROM dataset.table WHERE';
  FOR record IN (SELECT * FROM UNNEST(['term1','term2','term3']) AS term) DO
    SET statement = CONCAT(statement, ' col LIKE "', '%', record.term, '" OR');
  END FOR;
  -- Close the trailing OR with a predicate that is always false.
  SET statement = CONCAT(statement, ' 1=2');
  EXECUTE IMMEDIATE statement;
END;
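Alternatively, you may be able to skip dynamic SQL entirely and push the terms table into the WHERE clause with EXISTS. A sketch, assuming table1 has an ARRAY<STRING> column named terms and reusing the col/dataset.table names from above:
SELECT t.*
FROM dataset.table AS t
WHERE EXISTS (
  SELECT 1
  FROM dataset.table1 AS a, UNNEST(a.terms) AS term
  WHERE t.col LIKE CONCAT('%', term)
);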

Dynamic Oracle Table Function for use in Tableau

We have a large amount of data in an Oracle 11g server. Most of the engineers use Tableau for visualizing data, but there is currently not a great solution for visualizing straight from the Oracle server because of the structure of the database. Unfortunately, this cannot be changed, as it's very deeply integrated with the rest of our systems. There is a "dictionary" table, let's call it tab_keys:
name | key
---------------
AB-7 | 19756
BG-0 | 76519
FY-10 | 79513
JB-2 | 18765
...
...
And there are also the tables actually containing the data. Each entry in tab_keys has a corresponding data table named by prefixing the key with an identifier, in this case, we'll use "dat_". So AB-7 will store all its data in a table called dat_19756. These keys are not known to the user, and are only used for tracking "behind the scenes". The user only knows the AB-7 moniker.
Tableau allows communication with Oracle servers using standard SQL select statements, but because the user doesn't know the key value, they cannot write a SQL statement to query the data.
Tableau recently added the ability for users to query Oracle table functions, so I started going down the road of writing a table function to look up the key and return a table of results for Tableau to use. The problem is that each dat_ table is basically unique, with different numbers of columns, labels, record counts, and datatypes from the next dat_ table.
What is the right way to handle this problem? Can I:
1) Write a function (which Tableau can call inline in regular SQL) to return a bona fide table name which is dynamically generated? I tried this:
create or replace FUNCTION TEST_FUNC
(
  V_NAME IN VARCHAR2
) RETURN user_tables.table_name%type AS
  V_KEY   VARCHAR2(100);
  V_TABLE user_tables.table_name%type;
BEGIN
  select KEY into V_KEY from my_schema.tab_keys where NAME = V_NAME;
  V_TABLE := dbms_assert.sql_object_name('my_schema.dat_' || V_KEY);
  RETURN V_TABLE;
END TEST_FUNC;
and then SELECT * from TABLE(TEST_FUNC('AB-7')); but I get:
ORA-22905: cannot access rows from a non-nested table item
22905. 00000 - "cannot access rows from a non-nested table item"
*Cause: attempt to access rows of an item whose type is not known at
parse time or that is not of a nested table type
*Action: use CAST to cast the item to a nested table type
I couldn't figure out a good way to CAST the table as the table type I needed. Could this be done in the function before returning?
2) Write a table function? Tableau can supposedly query these like tables, but then I run into the problem of dynamically generating types (which I understand isn't easy), with the added complication that it needs to be usable by multiple users simultaneously, so each user would need a data type generated for them each time they connect to a table (if I'm understanding this correctly).
I have to think I'm missing something simple. How do I cast the return of this query as this other table's datatype?
There is no simple way to have a single generic function return a dynamically configurable nested table. With other products you could use a ref cursor (which maps to an ODBC or JDBC ResultSet object), but my understanding is that Tableau does not support that option.
One thing you can do is generate views from your data dictionary. You can use this query to produce a one-off script.
select 'create or replace view "' || name || '" as select * from dat_' || key || ';'
from tab_keys;
The double-quotes are necessary because AB-7 is not a valid object name in Oracle, due to the dash.
This would allow your users to query their data like this:
select * from "AB-7";
Note they would have to use the double-quotes too.
Obviously, any time you insert a row into tab_keys you'd need to create the corresponding view. That could be done through a trigger, as sketched below.
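A rough sketch of such a trigger (an assumption on my part, not tested against your schema; it needs an autonomous transaction because DDL commits, and the dat_ table must already exist when the key row is inserted):
create or replace trigger trg_tab_keys_view
after insert on tab_keys
for each row
declare
  pragma autonomous_transaction;
begin
  -- Build and run the CREATE VIEW statement for the new key.
  execute immediate 'create or replace view "' || :new.name
                 || '" as select * from dat_' || :new.key;
end;
/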
You can build dynamic SQL in SQL using the open source program Method4:
select * from table(method4.dynamic_query(
q'[
select 'select * from dat_'||key
from tab_keys
where name = 'AB-7'
]'
));
A
-
1
The program combines Oracle Data Cartridge Interface with ANYDATASET to create a function that can return dynamic types.
There might be a way to further simplify the interface but I haven't figured it out yet. These Oracle Data Cartridge Interface functions are very picky and are not easy to repackage.
Here's the sample schema I used:
create table tab_keys(name varchar2(100), key varchar2(100));
insert into tab_keys
select 'AB-7' , '19756' from dual union all
select 'BG-0' , '76519' from dual union all
select 'FY-10', '79513' from dual union all
select 'JB-2' , '18765' from dual;
create table dat_19756 as select 1 a from dual;
create table dat_76519 as select 2 b from dual;
create table dat_79513 as select 3 c from dual;
create table dat_18765 as select 4 d from dual;

Can I use a query parameter in a table name?

I want to do something along the lines of:
SELECT some_things
FROM `myproject.mydataset.mytable_@suffix`
But this doesn't work because the parameter isn't expanded inside the table name.
This does work, using wildcard tables:
SELECT some_things
FROM `myproject.mydataset.mytable_*`
WHERE _TABLE_SUFFIX = @suffix
However, it has some problems:
If I mistype the parameter, this query silently returns zero rows, rather than yelling at me loudly.
Query caching stops working when querying with a wildcard.
If other tables exist with the mytable_ prefix, they must have the same schema, even if they don't match the suffix. Otherwise, weird stuff happens. It seems like BigQuery either computes the union of all columns, or takes the schema of an arbitrary table; it's not documented and I didn't look at it in detail.
Is there a better way to query a single table whose name depends on a query parameter?
Yes, you can, here's a working example:
DECLARE tablename STRING;
DECLARE tableQuery STRING;
-- get the list of tables
CREATE TEMP TABLE tableNames AS
  SELECT table_name FROM nomo_nausea.INFORMATION_SCHEMA.TABLES
  WHERE table_name NOT IN ('_sdc_primary_keys', '_sdc_rejected', 'fba_all_order_report_data');
WHILE (SELECT COUNT(*) FROM tableNames) >= 1 DO
  SET tablename = (SELECT table_name FROM tableNames LIMIT 1);
  -- build dataset + table name
  SET tableQuery = CONCAT('nomo_nausea.', tablename);
  -- use CONCAT to build the statement string and execute it
  EXECUTE IMMEDIATE CONCAT('SELECT * FROM `', tableQuery, '` WHERE _sdc_deleted_at IS NOT NULL');
  DELETE FROM tableNames WHERE table_name = tablename;
END WHILE;
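For the single-table case in the question, the same idea collapses to one dynamic statement; a minimal sketch (the suffix value is illustrative):
DECLARE suffix STRING DEFAULT '20240101';
EXECUTE IMMEDIATE FORMAT(
  'SELECT some_things FROM `myproject.mydataset.mytable_%s`', suffix);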
In order to answer your stated problems:
Table scanning happens in the FROM clause and filtering happens in the WHERE clause [1], so if the WHERE condition matches nothing, an empty result is returned.
"Currently, Cached results are not supported when querying with wildcard" [2].
"BigQuery uses the schema for the most recently created table that matches the wildcard as the schema" [3]. What kind of weird stuff you have faced in your use case? "A wildcard table represents a union of all the tables that match the wildcard expression" [4].
In BigQuery parameterized queries can be run, But table names can not be parameterized [5]. Your wildcard solution seems to be the only way.
You can effectively use table names as parameters if you use the Python API, though this isn't documented. If you format the table reference into the query string before submitting it (rather than passing it as a query parameter), your query should work.
SQL example:
sql = "SELECT max(_last_updt) FROM `{0}.{1}.{2}` WHERE _last_updt >= TIMESTAMP(" +
"CURRENT_DATE('-06:00'))".format(project_id, dataset_name, table_name)
SQL in context of Python API:
bigquery_client = bigquery.Client()     # set up the client
query_job = bigquery_client.query(sql)  # run the query
results = query_job.result()            # waits for the job to complete
for row in results:
    print(row)

Getting creation-SQL of a table in HP Vertica using a query

I am using the HP Vertica DB engine. There are some tables created in the database. I have a requirement wherein I need to get the CREATE TABLE script for a table, given its name, by querying a system table, a stored procedure, or some other means. Any help with this is highly appreciated. Thanks.
The easiest way to get the table definition for a table is by using EXPORT_TABLES(). This function accepts multiple objects in its scope.
You can script the export statement and execute it inside a script, such as:
SELECT 'SELECT EXPORT_TABLES('''', ''' || table_schema || '.' || table_name || ''');' FROM v_catalog.tables;
Alternatively, you can roll up to the schema level using:
SELECT EXPORT_TABLES('', 'schema');
The difference is that EXPORT_TABLES will not produce the definition of any projections associated with the table. If you need the projections along with the table definition, use EXPORT_OBJECTS.
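For a single table, the two calls look like this (schema and table names here are hypothetical):
SELECT EXPORT_TABLES('', 'myschema.mytable');
SELECT EXPORT_OBJECTS('', 'myschema.mytable');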

Copy many tables in MySQL

I want to copy many tables with similar names but different prefixes. I want the tables with the wp_ prefix to go into their corresponding tables with the shop_ prefix.
In other words, I want to do something like this:
insert into shop_wpsc_*
select * from wp_wpsc_*
How would you do this?
SQL doesn't allow wildcarding table names - the only way to do this is to loop through a list of tables (via the ANSI INFORMATION_SCHEMA views) while using dynamic SQL.
Dynamic SQL is different for every database vendor...
Update
MySQL? Why didn't you say so in the first place...
MySQL's dynamic SQL feature is called "prepared statements". Besides the documentation, there are numerous questions on SO about operations on all the tables in a MySQL database - you just need to tweak the WHERE clause to get the table names you want.
You'll want to do this from within a MySQL stored procedure...
You can generate all of the needed statements from INFORMATION_SCHEMA and run them via a prepared statement -- try doing this:
SELECT @sql_text := GROUP_CONCAT(
         CONCAT('insert into shop_wpsc_',
                SUBSTRING(table_name, 9),
                ' select * from ', table_name, ';'), ' ')
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'example'
  AND table_name LIKE 'wp_wpsc_%';
PREPARE stmt FROM @sql_text;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Note that PREPARE only accepts a single statement, so if more than one table matches you will need to prepare and execute each generated statement separately in a loop.
Expanding on OMG Ponies' answer a bit, you can use the data dictionary and write SQL to generate the SQL. For example, in Oracle, you could do something like this:
SELECT 'insert into shop_wpsc_' || SUBSTR(table_name, 9) || ' select * from ' || table_name || ';'
FROM all_tables
WHERE table_name LIKE 'WP_WPSC%'
This will generate a series of SQL statements you can run as a single script. As OMG Ponies pointed out, though, the syntax will vary depending on which DB vendor you are using (e.g. all_tables is Oracle-specific).
First I would select all tables starting with wp_wpsc_ from the catalog views (the names of those views depend on your DBMS, though ANSI-compatible databases should support INFORMATION_SCHEMA).
(For instance, for DB2:
SELECT NAME FROM SYSIBM.SYSTABLES WHERE NAME LIKE 'wp_wpsc_%'
)
Then iterate through that result set, and create a dynamic statement in the form you have given to read from the current table and insert into the corresponding new one, along the lines of the sketch below.
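Since the question is about MySQL, here is a minimal sketch of that loop as a stored procedure (the schema name 'example' and the procedure name are assumptions for illustration):
DELIMITER //
CREATE PROCEDURE copy_wpsc_tables()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE tbl VARCHAR(64);
  -- All source tables with the wp_wpsc_ prefix (underscores escaped for LIKE).
  DECLARE cur CURSOR FOR
    SELECT table_name FROM INFORMATION_SCHEMA.TABLES
    WHERE table_schema = 'example' AND table_name LIKE 'wp\_wpsc\_%';
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO tbl;
    IF done THEN LEAVE read_loop; END IF;
    -- Build one INSERT ... SELECT per table and run it as a prepared statement.
    SET @sql = CONCAT('INSERT INTO shop_wpsc_', SUBSTRING(tbl, 9),
                      ' SELECT * FROM ', tbl);
    PREPARE stmt FROM @sql;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;
CALL copy_wpsc_tables();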