Is it possible to convert a Table-Valued function in a stored procedure to Spark SQL to run in Azure Databricks?

I am not able to convert the stored procedure code below, which calls a table-valued function in an OUTER APPLY, into Spark SQL code.
SELECT T1.col1, T2.col2, fn.col1
FROM table1 T1
JOIN table2 T2 ON T1.col1 = T2.col1
OUTER APPLY FN_Formula(T1.col, T2.col1) fn

Spark SQL does not have table-valued user-defined functions in the same way SQL Server does, but it does support built-in table-valued functions (such as range, explode, and stack), user-defined functions whose results can be exploded, or plain rewrites of the same set logic, e.g. with a Common Table Expression (CTE) or subquery. Higher-order functions are also a powerful way to deal with arrays and complex data types. The best approach depends on what the function does; could you share more about what FN_Formula computes?
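For example, if FN_Formula returns a single computed value per row, one possible rewrite (a sketch only, assuming an equivalent scalar UDF named fn_formula has been registered in Spark) is to call it inline and keep the join as it is:

SELECT T1.col1, T2.col2, fn_formula(T1.col, T2.col1) AS fn_col1
FROM table1 T1
JOIN table2 T2 ON T1.col1 = T2.col1;

If the function instead returns a set of rows per input, the same effect can often be achieved by expressing the function's logic as a CTE and left-joining to it (the join key and placeholder column below are assumptions):

WITH fn AS (
  SELECT T1.col1 AS join_col,
         T1.col  AS col1  -- placeholder: put the set-based logic of FN_Formula here
  FROM table1 T1
)
SELECT T1.col1, T2.col2, fn.col1
FROM table1 T1
JOIN table2 T2 ON T1.col1 = T2.col1
LEFT JOIN fn ON fn.join_col = T1.col1;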

Related

starts_with in presto?

I am new to writing sql queries in presto and was looking for a function similar to 'starts_with'.
If a string starts with a given substring then the query needs to return that record.
In PostgreSQL, I am currently doing select * from tableA where name ~ '^Joh'. What's the equivalent of this in Presto?
PostgreSQL and Presto are RDBMSs based on SQL. It is odd that you've learned a PostgreSQL-proprietary add-on to the language (regular expression matching) before the standard SQL functions. In SQL you use LIKE for pattern matches:
select * from tableA where name like 'Joh%';
You can use LIKE in SQL to search for a specified pattern; see https://www.w3schools.com/sql/sql_like.asp.
In Presto you can use regexp_like(), which can run a little faster than LIKE in some cases. For example:
select regexp_like('John', '^John')
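Applied to the original question (the table and column names are taken from the question; adjust the pattern as needed), a prefix match would look something like:

select * from tableA where regexp_like(name, '^Joh');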

Encapsulating complex code in BigQuery

I recently had to generate a BQ table out of other BQ tables. The logic was rather involved and I ended up writing a complex SQL statement.
In Oracle SQL I would have written a PL/SQL procedure with the logic broken down into separate pieces (most often merge statements). In some cases I would encapsulate some code into functions. The resulting procedure would be a sequence of DML statements, easy to read and maintain.
However, nothing similar exists for BQ. UDFs are only temporary and cannot be stored within, say, a view.
Question: I am looking for ways to make my complex BQ SQL code more modular and readable. Is there any way I could accomplish this?
A currently available option is to use the WITH clause.
The WITH clause contains one or more named subqueries whose output acts as a temporary table that subsequent SELECT statements can reference in any clause or subquery.
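For example, a multi-step transformation can be broken into named steps (the table and column names here are invented for illustration):

WITH filtered AS (
  SELECT id, amount
  FROM `project.dataset.orders`
  WHERE amount > 0
),
aggregated AS (
  SELECT id, SUM(amount) AS total
  FROM filtered
  GROUP BY id
)
SELECT *
FROM aggregated;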
I would still consider User-Defined Functions as a really good option.
JS and SQL UDFs are available in BigQuery, and as far as is known the BigQuery team is working on introducing permanent UDFs soon.
In the meantime you can store the body of a JS UDF as a JS library and reference it in your UDF via the OPTIONS clause; see "Including external libraries" in the reference above.
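A rough sketch of that pattern, assuming the library has been uploaded to a Cloud Storage bucket and exposes a function named myLibFunction (the bucket path and names are placeholders):

CREATE TEMP FUNCTION myFunc(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
OPTIONS (library = ['gs://my-bucket/js/my_lib.js'])
AS """
  return myLibFunction(x);
""";

SELECT myFunc(3.14) AS result;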
October 2019 Update
The ability to use scripting and stored procedures is now in beta.
This means you can send multiple statements to BigQuery in one request, use variables, and use control-flow statements such as IF and WHILE.
You can also create a procedure, which is a block of statements that can be called from other queries.
Note: this is still in beta.
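As a minimal illustration of the scripting features (the project, dataset, table, and procedure names below are placeholders):

-- Variables and control flow in a single request
DECLARE row_total INT64 DEFAULT 0;
SET row_total = (SELECT COUNT(*) FROM `project.dataset.some_table`);
IF row_total = 0 THEN
  SELECT 'table is empty' AS message;
ELSE
  SELECT CONCAT('table has ', CAST(row_total AS STRING), ' rows') AS message;
END IF;

-- A stored procedure that can be called from other scripts
CREATE PROCEDURE dataset.log_row_count()
BEGIN
  SELECT COUNT(*) AS row_count FROM `project.dataset.some_table`;
END;

CALL dataset.log_row_count();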
BigQuery supports persistent user-defined functions. To get started, see the documentation.
For example, here's a CREATE FUNCTION statement that creates a function to compute the median of an array:
CREATE FUNCTION dataset.median(arr ANY TYPE) AS (
  (
    SELECT
      IF(
        MOD(ARRAY_LENGTH(arr), 2) = 0,
        (arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] + arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]) / 2,
        arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
      )
    FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
  )
);
After executing this statement, you can reference the function in a follow-up query:
SELECT dataset.median([7, 1, 2, 10]) AS median;
You can also reference the function inside logical views. Note, however, that you currently need to qualify the function reference inside the view with a project name:
CREATE VIEW dataset.sampleview AS
SELECT x, `project-name`.dataset.median(array_column) AS median
FROM `project-name`.dataset.table

Do SQL User-Defined Functions support SELECT clauses?

The documentation for SQL-based UDFs is sparse. I'm wondering if it's possible to write a full-fledged SELECT clause, using the UDF parameters in the query. So in effect, each invocation of the UDF would result in a subquery.
Contrived example:
CREATE TEMP FUNCTION foo(bar STRING) AS (
SELECT * FROM `example.latest` WHERE thing = bar
);
SELECT foo('abc')
BigQuery gives the error "Syntax error: Unexpected keyword SELECT; failed to parse CREATE [TEMP] FUNCTION statement" so I assume it's not possible, but would love to get confirmation.
SELECT is supported in general, but unfortunately you cannot reference tables in a SQL UDF.
See UDF Limitations for more details.
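For reference, a SQL UDF whose body is a plain expression (no table references) works fine; the contrived function below is a hypothetical example:

CREATE TEMP FUNCTION foo(bar STRING) AS (
  CONCAT('prefix-', bar)
);
SELECT foo('abc') AS result;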

Using a Teradata UDF in SAS Implicit Sql Pass Thru

I am trying to use a Teradata UDF (user-defined function) in SAS implicit SQL, which establishes the connection to Teradata using a LIBNAME statement. Assume that the function is called PTY_DECRYPT and is defined in a database called TEST in Teradata. The purpose of this function is to decrypt values in a column of a view in Teradata.
What works is using the UDF in explicit SQL. Below I am using the function on a column called SSN_NBR in a view called V_TEST_PERS in the database SAMPLE.
Explicit SQL:
Options debug=DBMS_TIMERS sastrace=',,,d'
  sastraceloc=saslog no$stsuffix fullstimer;

Proc Sql;
  Connect to TERADATA(User=XXXXX pwd=XXXXX server=XXXXX);
  Create Table Final as
  select * from connection to teradata
  (
    Select
      sub_id,
      SSN_NBR,
      TEST.PTY_DECRYPT(SSN_NBR, 'T_ssn_test', 400, 0, 0) as SSN_NBR_Decrypt
    from SAMPLE.V_TEST_PERS
  );
  disconnect from teradata;
Quit;
However, I would like to use the same function in implicit SQL, and there it does not work. Any ideas how to make it work with minimal changes to the implicit SQL?
Implicit SQL:
Options debug=DBMS_TIMERS sastrace=',,,d'
  sastraceloc=saslog no$stsuffix fullstimer;

Libname Td Teradata User=XXXXX pwd=XXXXX server=XXXXX database=SAMPLE;

Proc sql;
  Create table Final as
  select
    sub_id,
    SSN_NBR,
    TEST.PTY_DECRYPT(SSN_NBR, 'T_ssn_test', 400, 0, 0) as SSN_NBR_Decrypt
  from Td.V_TEST_PERS;
Quit;
In your implicit SQL you reference the view with the LIBNAME alias TD; however, when you reference the UDF you are not aliasing the TEST database containing the UDF with a LIBNAME alias. Syntactically, you may not be able to do that in SAS (e.g. TD.TEST.PTY_DECRYPT() - in fact I wouldn't expect that to work).
The UDF may need to be placed in SYSLIB or TD_SYSFNLIB so that it is in a default search path, allowing the database optimizer to find the UDF without it being fully qualified (e.g. TD_WEEK_BEGIN()). Alternatively, the UDF could be placed in the database SAMPLE, but that likely violates how UDFs are maintained in your environment, as it would in mine.
Otherwise, the UDF call could be embedded in a view on the database, but then you have other issues to consider around the security of that column if your environment does not grant column-level security on views containing encrypted data elements (e.g. PHI, PII). Without a row/column-level security mechanism in place to dynamically filter a user's ability to see the column you are decrypting, putting the UDF into a view isn't going to work.
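A rough sketch of that view approach, assuming you have rights to create views in SAMPLE and the security concerns above have been addressed (the new view name is made up for illustration):

/* On Teradata: wrap the UDF call in a view */
REPLACE VIEW SAMPLE.V_TEST_PERS_DECRYPT AS
SELECT
  sub_id,
  SSN_NBR,
  TEST.PTY_DECRYPT(SSN_NBR, 'T_ssn_test', 400, 0, 0) AS SSN_NBR_Decrypt
FROM SAMPLE.V_TEST_PERS;

/* In SAS: the implicit pass-through then only sees an ordinary view */
Proc sql;
  Create table Final as
  select * from Td.V_TEST_PERS_DECRYPT;
Quit;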
I asked the same question on the SAS Communities Forum, and I am glad to say that I did find a solution to this problem.
Please see the link below:
https://communities.sas.com/t5/Base-SAS-Programming/Using-a-Teradata-UDF-in-SAS-Implicit-Sql-Pass-Thru/m-p/266850/highlight/false#M52685

SQL join using USING: <column name> is not a recognized table hints option

I have the following JOIN:
SELECT * FROM tableA INNER JOIN tableB USING (commonColumn)
I get an error:
"commonColumn" is not a recognized table hints option. If it is
intended as a parameter to a table-valued function or to the
CHANGETABLE function, ensure that your database compatibility mode is
set to 90.
The following instead works:
SELECT * FROM tableA INNER JOIN tableB ON tableA.commonColumn = tableB.commonColumn
The compatibility level in my case is set to 100 (SQL Server 2008), while, by the way, I am working with SQL Server 2012.
What am I doing wrong? I find it very difficult to find examples of the use of the USING keyword, as it is almost impossible to do a relevant web search. Yet, it seems the right thing to use when the "joining columns" have the same name...
USING is not supported SQL Server syntax. It's not a reserved keyword, either, so the query engine is using that as a table alias.
It is an ODBC keyword, but those are handled somewhat differently. The engine won't always complain if you use them, but you're not supposed to use them anyway.
It is also listed as a possible future reserved keyword. It's common for new editions of SQL Server to add words to the core reserved list.
Personally, I don't see them adding NATURAL JOIN syntax support, even with USING. A lot of DBAs consider NATURAL JOINs problematic.
In SQL Server, the USING keyword is used to specify the source data for MERGE statements (called <table source> in the documentation).
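For completeness, this is the context in which USING does appear in T-SQL; the table and column names below are invented for illustration:

MERGE INTO tableA AS target
USING tableB AS source
  ON target.commonColumn = source.commonColumn
WHEN MATCHED THEN
  UPDATE SET target.someValue = source.someValue
WHEN NOT MATCHED THEN
  INSERT (commonColumn, someValue)
  VALUES (source.commonColumn, source.someValue);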