Loop in BigQuery (SQL - GOOGLE CLOUD) - sql

I would like to know how to perform a loop in bigquery to create a table changing only its name and the where clause.
Basically as an example:
I would like, for example, to create the table three times according to vector_a, that is, we would have a table with the name 01,02,03 and filtering from vector_b that would also change to create the table with std1 at the beginning and then std2 and std3. Being these variables inside the array in string format.

See https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#for-in
You can use something like
DECLARE vector_a ARRAY<STRING>;
SET vector_a = ['_01', '_02', '_03'];
FOR loop_variable_name IN (SELECT * FROM UNNEST(vector_a))
DO
-- use loop_variable_name here;
END FOR;

Related

BigQuery CREATE TABLE with dynamic table name [duplicate]

I am trying to write a BigQuery script that I can store as a procedure, I would like one of the arguments I pass to be used in the table name that is written out by the script, for example:
DECLARE id STRING;
SET id = '123';
CREATE OR REPLACE TABLE test.id AS(
SELECT * FROM dataset.table
)
However, in this example the table is created with the name id rather than the value of the "id" variable, 123. Is there any way I can dynamically create a table using the value of a declared variable in the BigQuery UI?
Why not just use Execute Immediate with concat if you know the table schema?
EXECUTE IMMEDIATE CONCAT('CREATE TABLE `', id, '` (column_name STRING)');
So far we have officially announced BigQuery scripting document, which is still in Beta phase, leveraging usage of dynamic parameters (variables) as a placeholders for values in SQL queries . However, according to Parameterized queries in BigQuery documentation, query parameters can't be used for SQL object identifiers:
Parameters cannot be used as substitutes for identifiers, column
names, table names, or other parts of the query.
Maybe you can use a wildcard table. You would create a wildcard table with all subtables you want to query and use the WHERE clause to select any subtable you want. Just be careful, the tables must have the same schema.

Create table using a variable name in the BigQuery UI

I am trying to write a BigQuery script that I can store as a procedure, I would like one of the arguments I pass to be used in the table name that is written out by the script, for example:
DECLARE id STRING;
SET id = '123';
CREATE OR REPLACE TABLE test.id AS(
SELECT * FROM dataset.table
)
However, in this example the table is created with the name id rather than the value of the "id" variable, 123. Is there any way I can dynamically create a table using the value of a declared variable in the BigQuery UI?
Why not just use Execute Immediate with concat if you know the table schema?
EXECUTE IMMEDIATE CONCAT('CREATE TABLE `', id, '` (column_name STRING)');
So far we have officially announced BigQuery scripting document, which is still in Beta phase, leveraging usage of dynamic parameters (variables) as a placeholders for values in SQL queries . However, according to Parameterized queries in BigQuery documentation, query parameters can't be used for SQL object identifiers:
Parameters cannot be used as substitutes for identifiers, column
names, table names, or other parts of the query.
Maybe you can use a wildcard table. You would create a wildcard table with all subtables you want to query and use the WHERE clause to select any subtable you want. Just be careful, the tables must have the same schema.

Can you concatenate two strings to create dynamic column names in PostgreSQL?

I have a form where people can type in a start and end date, as well as a column name prefix.
In the backend, I want to do something along the lines of
SELECT *, CAST('{{startDate}}' AS TIMESTAMP) AS ({{prefix}} + '_startDate')
Is this possible? Basically, I want to dynamically create the name of the new column. The table is immediately returned to the user, so I don't want to mutate the underlying table itself. Thanks!
You can execute dynamic query that you have prepared by using EXECUTE keyword, otherwise it is not possible to have dynamic structure of SQL.
Since you are preparing your SQL outside database, you can use something like:
SELECT *, CAST('{{startDate}}' AS TIMESTAMP) AS {{prefix}}_startDate
Assuming that {{prefix}} is replaced with some string by your template before it is sent to database.

Checking for a string within a string, and not in another string using SQL

I am trying to build a short SQL script that will check if #NewProductAdded is somewhere in #NewTotalProducts. And also if #NewProductAdded is NOT in #OldTotalProducts. Please have a look at the setup below (The real data is in tables, not variables, but a simple example is all I need):
declare #NewProductAdded as varchar(max)
declare #NewTotalProducts as varchar(max)
declare #OldTotalProducts as varchar(max)
set #NewProductAdded ='ProductB'
set #NewTotalProducts = 'ProductAProductBProductC'
set #OldTotalProducts = 'ProductAProductC'
SELECT CustomerID FROM Products WHERE NewProductAdded ...
I want to make sure that 'ProductB' is contained somewhere within #NewTotalProducts, and is NOT contained anywhere within #OldTotalProducts. Product names vary vastly with thousands of combinations, and there is no way to really separate them from each other in a string. I am sure there is a simple solution or function for this, I just don't know it yet.
The specific answer to your question is like (or charindex() if you are using SQL Server or Sybase):
where #NewTotalProducts like '%'+#NewProductAdded+'%' and
#OldTotalProducts not like '%'+#NewProductAdded+'%'
First comment. If you have to use lists stored in strings, at least use delimiters:
where ','+#NewTotalProducts+',' like '%,'+#NewProductAdded+',%' and
','+#OldTotalProducts+',' not like '%,'+#NewProductAdded+',%'
Second comment. Don't store lists in strings. Instead, use a temporary tables or table variable:
declare #NewTotalProducts table (name varchar(255));
insert into #NewTotalProducts(name)
select 'ProductA' union all
select 'ProductB' . . .
Note: throughout this answer I have used SQL Server syntax. The code appears to be SQL Server.

How to reuse a large query without repeating it?

If I have two queries, which I will call horrible_query_1 and ugly_query_2, and I want to perform the following two minus operations on them:
(horrible_query_1) minus (ugly_query_2)
(ugly_query_2) minus (horrible_query_1)
Or maybe I have a terribly_large_and_useful_query, and the result set it produces I want to use as part of several future queries.
How can I avoid copying and pasting the same queries in multiple places? How can I "not repeat myself," and follow DRY principles. Is this possible in SQL?
I'm using Oracle SQL. Portable SQL solutions are preferable, but if I have to use an Oracle specific feature (including PL/SQL) that's OK.
create view horrible_query_1_VIEW as
select .. ...
from .. .. ..
create view ugly_query_2_VIEW as
select .. ...
from .. .. ..
Then
(horrible_query_1_VIEW) minus (ugly_query_2_VIEW)
(ugly_query_2_VIEW) minus (horrible_query_1_VIEW)
Or, maybe, with a with clause:
with horrible_query_1 as (
select .. .. ..
from .. .. ..
) ,
ugly_query_2 as (
select .. .. ..
.. .. ..
)
(select * from horrible_query_1 minus select * from ugly_query_2 ) union all
(select * from ugly_query_2 minus select * from horrible_query_1)
If you want to reuse the SQL text of the queries, then defining views is the best way, as described earlier.
If you want to reuse the result of the queries, then you should consider global temporary tables. These temporary tables store data for the duration of session or transaction (whichever you choose). These are really useful in case you need to reuse calculated data many times over, especially if your queries are indeed "ugly" and "horrible" (meaning long running). See Temporary tables for more information.
If you need to keep the data longer than a session, you can consider materialized views.
Since you're using Oracle, I'd create Pipelined TABLE functions.
The function takes parameters and returns an object (which you have to create)
and then you SELECT * or even specific columns from it using the TABLE() function and can use it with a WHERE clause or with JOINs. If you want a unit of reuse (a function) you're not restricted to just returning values (i.e a scalar function) you can write a function that returns rows or recordsets.
something like this:
FUNCTION RETURN_MY_ROWS(Param1 IN type...ParamX IN Type)
RETURN PARENT_OBJECT PIPELINED
IS
local_curs cursor_alias; --you need a cursor alias if this function is in a Package
out_rec ROW_RECORD_OF_CUSTOM_OBJECT:=ROW_RECORD_OF_CUSTOM_OBJECT(NULL, NULL,NULL) --one NULL for each field in the record sub-object
BEGIN
OPEN local_curs FOR
--the SELECT query that you're trying to encapsulate goes here
-- and it can be very detailed/complex and even have WITH () etc..
SELECT * FROM baseTable WHERE col1 = x;
-- now that you have captured the SELECT into a Cursor
-- here you put a LOOP to take what's in the cursor and put it in the
-- child object (that holds the individual records)
LOOP
FETCH local_curs --opening the ref-cursor
INTO out_rec.COL1,
out_rec.COL2,
out_rec.COL3;
EXIT WHEN local_curs%NOTFOUND;
PIPE ROW(out_rec); --piping out the Object
END LOOP;
CLOSE local_curs; -- always do this
RETURN; -- we're now done
END RETURN_MY_ROWS;
after you've done that, you can use it like so
SELECT * FROM TABLE(RETURN_MY_ROWS(val1, val2));
you can INSERT SELECT or even CREATE TABLE out of it , you can have it in joins.
two more things to mention:
--ROW_RECORD_OF_CUSTOM_OBJECT is something along these lines
CREATE or REPLACE TYPE ROW_RECORD_OF_CUSTOM_OBJECT AS OBJECT
(
col1 type;
col2 type;
...
colx type;
);
and PARENT_OBJECT is a table of the other object (with the field definitions) we just made
create or replace TYPE PARENT_OBJECT IS TABLE OF ROW_RECORD_OF_CUSTOM_OBJECT;
so this function needs two OBJECTs to support it, but one is a record, the other is a table of that record (you have to create the record first).
In a nutshell, the function is easy to write, you need a child object (with fields), and a parent object that will house that child object that is of type TABLE of the child object, and you open the original base-table fetching SQL into a SYS_REFCURSOR (which you may need to alias) if you're in a package and you read from that cursor from a loop into the individual records.
The function returns a type of PARENT_OBJECT but inside it packs the records sub-object with values from the cursor.
I hope this works for you (there may be permissioning issues with your DBA if you want to create OBJECTs and Table functions)*/
If you operate with values, you could write functions.
Here you find infos on how to do it. It basically works like writing a function in any language. You can define parameters and return values.
Which gives you the cool possibility to write code just once. Here is how you do it:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_5009.htm
Have you tried using RESULT_CACHE hint in your queries? Also, you could
ALTER SESSION SET RESULT_CACHE_MODE=FORCE
and see if it helps.