Encapsulating complex code in BigQuery - google-bigquery

I recently had to generate a BQ table out of other BQ tables. The logic was rather involved and I ended up writing a complex SQL statement.
In Oracle SQL I would have written a PL/SQL procedure with the logic broken down into separate pieces (most often merge statements). In some cases I would encapsulate some code into functions. The resulting procedure would be a sequence of DML statements, easy to read and maintain.
However nothing similar exists for BQ. The UDF's are only temporary and cannot be stored within -say- a view.
Question: I am looking for ways to make my complex BQ SQL code more modular and readable. Is there any way I could accomplish this?

currently available option is to use WITH Clause
The WITH clause contains one or more named subqueries whose output acts as a temporary table which subsequent SELECT statements can reference in any clause or subquer
I would still consider User-Defined Functions as a really good option.
JS and SQL UDF are available in BigQuery and from what is known BigQuery team is working on introducing permanent UDF to be available soon
Meantime you can just store body of JS UDF as a js library and reference it in your UDF using OPTIONS section. see Including external libraries in above reference
October 2019 Update
The ability to use scripting and stored procedures is now in Beta.
So, you can send multiple statements to BigQuery in one request, to use variables, and to use control flow statements such as IF and WHILE, etc.
And, you can use procedure, which is a block of statements that can be called from other queries.
Note: it is Beta yet

BigQuery supports persistent user-defined functions. To get started, see the documentation.
For example, here's a CREATE FUNCTION statement that creates a function to compute the median of an array:
CREATE FUNCTION dataset.median(arr ANY TYPE) AS (
(
SELECT
IF(
MOD(ARRAY_LENGTH(arr), 2) = 0,
(arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
)
FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
)
);
After executing this statement, you can reference it in a follow-up query:
SELECT dataset.median([7, 1, 2, 10]) AS median;
You can also reference the function inside logical views. Note that you currently need to qualify the reference to the function inside the view using a project, however:
CREATE VIEW dataset.sampleview AS
SELECT x, `project-name`.dataset.median(array_column) AS median
FROM `project-name`.dataset.table

Related

Adding today date in Table name when using Create Table function in standard sql GBQ

I am quite new to GBQ and any help is appreciated it.
I have a query below:
#Standard SQL
create or replace table `xxx.xxx.applications`
as select * from `yyy.yyy.applications`
What I need to do is to add today's date at the end of the table name so it is something like xxx.xxx.applications_<todays date>
basically create a filename with Application but add date at the end of the name applications.
I am writing a procedure to create a table every time it runs but need to add the date for audit purposes every time I create the table (as a backup).
I searched everywhere and can't get the exact answer, is this possible in Query Editor as I need to store this as a Proc.
Thanks in advance
BigQuery doesn't support dynamic SQL at the moment which means that this kind of construction is not possible.
Currently BigQuery supports Parameterized Queries but its not possible to use parameters to dynamically change the source table's name as you can see in the provided link.
BigQuery supports query parameters to help prevent SQL injection when
queries are constructed using user input. This feature is only
available with standard SQL syntax. Query parameters can be used as
substitutes for arbitrary expressions. Parameters cannot be used as
substitutes for identifiers, column names, table names, or other parts
of the query.
If you need to build a query based on some variable's value, I suggest that you use some script in SHELL, Python or any other programming language to create the SQL statement and then execute it using the bq command.
Another approach could be using the BigQuery client library in some of the supported languages instead of the bq command.

How do I generate a table name that contains today's date?

It may seem a little strange, but there are already tables with names for each date.
In my project, I have tables for each date to make statistics easier to handle.
Of course, I don't think this is always the best way, but this is the table structure for my project.
(It's a common technique in Google BigQuery and Amazon Athena. This question is about Google BigQuery)
So to get the data, I want to generate today's date. If I use TODAY, I can get the data of the latest day without rewriting the code even if it is the next day.
I tried, but the code didn't work.
Not work 1:
CONCAT in FROM
SELECT
*
FROM
CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
Error:
Table-valued function not found: CONCAT at [4:3]
Not work 2:
create temporary function:
create temporary function getTableName() as (CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo')));
Error:
CREATE TEMPORARY FUNCTION statements must be followed by an actual query.
Question
How do I generate a table name that contains TODAY's date?
In this case, I would recommend you to use Wild tables in BigQuery, which allows you to use some features in Standard SQL.
With Wild Tables you can use _TABLE_SUFFIX, it grants you the ability to filter/scan tables containing this parameter. The syntax would be as follows:
SELECT *
FROM `test-proj-261014.sample.test_*`
where _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', CURRENT_DATE)
I hope it helps.
Your first query should go like this:
select CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
For creating temporary function, use the below code:
create temp function getTableName() as
((select CONCAT('foo_', FORMAT_TIMESTAMP('%Y%m%d', CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
));
select getTableName()
The error "CREATE TEMPORARY FUNCTION statements must be followed by an actual query." is because once the temporary functions are defined then you have to use the actual query to use that function and then the validity of function dies out. To define persistent UDFs and use them in multiple queries please go through the link to define permanent functions.You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query.

Using a Teradata UDF in SAS Implicit Sql Pass Thru

I am trying to use a Teradata UDF (User Defined Function) in a SAS Implicit SQL which establishes the connection to Teradata using LIBNAME Statement.Assume that the function is called PTY_DECRYPT and is defined in a Database called TEST in Teradata. The Purpose of this function is to decrypt values in a Column of a View in Teradata.
What works is using the UDF in an Explicit Sql .Below I am using the function on a column called SSN_NBR in a view called V_TEST_PERS present in the Database called SAMPLE.
Explcit Sql:
Options debug=DBMS_TIMERS sastrace=',,,d'
sastraceloc=saslog no$stsuffix fullstimer;
Proc Sql;
Connect to TERADATA(User=XXXXX pwd=XXXXX server=XXXXX);
Create Table Final as
select * from connection to teradata
(
Select
sub_id,
SSN_NBR,
TEST.PTY_DECRYPT(SSN_NBR,'T_ssn_test',400,0,0 ) as SSN_NBR_Decrypt
from SAMPLE.V_TEST_PERS
);
disconnect from teradata;
Quit;
But I would like to use the same function in an Implicit SQL but it does not work. Any ideas as to how to make it work in an Implicit Sql with minimum changes to the Implicit SQL?
Implicit Sql
Options debug=DBMS_TIMERS sastrace=',,,d'
sastraceloc=saslog no$stsuffix fullstimer;
Libname Td Teradata User=XXXXX pwd=XXXXX server=XXXXX database=SAMPLE ;
Proc sql;
Create table Final as
select
sub_id,
SSN_NBR,
TEST.PTY_DECRYPT(SSN_NBR,'T_ssn_test',400,0,0 ) as SSN_NBR_Decrypt
from Td.V_TEST_PERS;
Quit;
In your implicit SQL you reference the view with the LIBNAME alias TD, however when you reference the UDF you are not aliasing the TEST database containing the UDF with the LIBNAME alias. Syntactically, you may not be able to do that in SAS. (e.g. TD.TEST.PTY_DECRYPT() - in fact I wouldn't expect this to work)
The UDF may need to be placed in SYSLIB or TD_SYSFNLIB so that it is in a default search path for the database optimizer to find the UDF without being fully qualified. (e.g. TD_WEEK_BEGIN()) Alternatively, the UDF could be placed in database SAMPLE but that likely violates how UDFs are maintained in your environment, as it would in my environment.
Otherwise, the UDF call could be embedded in a view on the database, but then you have other issues to consider with the security of that column if your environment is not granting security on a column level basis to views containing encrypted data elements. (e.g. PHI, PII, etc.) Without a row-column level security mechanism in place to dynamically filter a users ability to see the column you are decrypting in the view putting the UDF into the view isn't going to work.
I asked the same question the SAS Communities Forum and I am glad to say that i did find a Solution to this Problem.
Please see the link below :
https://communities.sas.com/t5/Base-SAS-Programming/Using-a-Teradata-UDF-in-SAS-Implicit-Sql-Pass-Thru/m-p/266850/highlight/false#M52685

Calling a stored function (that returns an array of a user-defined type) in oracle across a database link

Normally, I call my function like so:
SELECT *
FROM TABLE(
package_name.function(parameters)
)
I'm trying to call this function across a database link. My intuition is that the following is the correct syntax, but I haven't gotten it to work:
SELECT *
FROM TABLE(
package_name.function#DBLINK(parameters)
)
> ORA-00904: "PACKAGE_NAME"."FUNCTION": invalid identifier
I've tried moving around the database link to no effect. I've tried putting it after the parameter list, after the last parenthesis, after the package name...I've also tried all of the above permutations including the schema name before the package name. I'm running out of ideas.
This is oracle 10g. I'm suspicious that the issue may be that the return type of the function is not defined in the schema in which I'm calling it, but I feel like I should be getting a different error if that were the case.
Thanks for your help!
What you're trying is the correct syntax as far as I know, but in any case it would not work due to the return type being user-defined, as you suspect.
Here's an example with a built-in pipelined function. Calling it locally works, of course:
SELECT * FROM TABLE(dbms_xplan.display_cursor('a',1,'ALL'));
Returns:
SQL_ID: a, child number: 1 cannot be found
Calling it over a database link:
SELECT * FROM TABLE(dbms_xplan.display_cursor#core('a',1,'ALL'));
fails with this error:
ORA-30626: function/procedure parameters of remote object types are not supported
Possibly you are getting the ORA-904 because the link goes to a specific schema that does not have access to the package. But in any case, this won't work, even if you define an identical type with the same name in your local schema, because they're still not the same type from Oracle's point of view.
You can of course query a view remotely, so if there is a well-defined set of possible parameters, you could create one view for each parameter combination and then query that, e.g.:
CREATE VIEW display_cursor_a_1_all AS
SELECT * FROM TABLE(dbms_xplan.display_cursor('a',1,'ALL'))
;
If the range of possible parameter values is too large, you could create a procedure that creates the needed view dynamically given any set of parameters. Then you have a two-step process every time you want to execute the query:
EXECUTE package.create_view#remote(parameters)
SELECT * FROM created_view#remote;
You have to then think about whether multiple sessions might call this in parallel and if so how to prevent them from stepping on each other.

How do you port a SqlServer database to MySQL?

I have a SqlServer db that I would like to port to MySQL. What's the best way to to this. Things that need to be ported are:
Tables (and data)
FileStream → MySQL equivalent?
Stored Procedures
Functions
Data types are relatively similar.
There is no equivalent to FileStream in MySQL - the files must either be stored as BLOBs, or on the file system while the path is stored in the database.
Migrating away from TSQL means:
There's no WITH clause in MySQL - it will have to converted into a derived table/inline view
There's no TOP syntax - these have to be converted to use LIMIT
There's no ranking/analytic functionality in MySQL - can't use ROW_NUMBER, RANK, DENSE_RANK or NTILE. See this article for alternatives.
MySQL views have notoriously limited functionality:
The SELECT statement cannot contain a subquery in the FROM clause.
The SELECT statement cannot refer to system or user variables.
Within a stored program, the definition cannot refer to program parameters or local variables.
The SELECT statement cannot refer to prepared statement parameters.
Any table or view referred to in the definition must exist. However, after a view has been created, it is possible to drop a table or view that the definition refers to. In this case, use of the view results in an error. To check a view definition for problems of this kind, use the CHECK TABLE statement.
The definition cannot refer to a TEMPORARY table, and you cannot create a TEMPORARY view.
Any tables named in the view definition must exist at definition time.
You cannot associate a trigger with a view.
As of MySQL 5.0.52, aliases for column names in the SELECT statement are checked against the maximum column length of 64 characters (not the maximum alias length of 256 characters).
Dynamic SQL will have to be converted to use MySQL's Prepared Statement syntax
A guide/article with some useful tips is available on the official MySQL dev site.
This is not for the faint of heart. Here is an article that explains what you are in for:
http://searchenterpriselinux.techtarget.com/news/column/0,294698,sid39_gci1187176,00.html