How to use a SQL user-defined function in Snowflake?

I am just learning how to use SQL in Snowflake. This is the code I am running:
use schema SNOWFLAKE_SAMPLE_DATA.TPCH_SF1;
--use schema SNOWFLAKE_SAMPLE_DATA.TPCH_SF10;
select *
from LINEITEM
limit 200;
The table includes two fields, L_LINENUMBER and L_QUANTITY. Now I want to try a user-defined function that does the following:
1. Take L_LINENUMBER and L_QUANTITY as the two parameters passed into the function.
2. Calculate L_LINENUMBER1 = L_LINENUMBER + 1 and L_QUANTITY1 = mean(L_QUANTITY).
3. Join the two new fields (L_LINENUMBER1, L_QUANTITY1) to the original table (LINEITEM).
How do I use CREATE FUNCTION to do this? I have read a lot of examples of CREATE FUNCTION, but I just cannot get the point, maybe because I am not good at SQL. Could anyone give me a comprehensive example with all the details?

I understand that your question is about UDFs, but using a UDF for this purpose is overkill.
You can increment an attribute in a table using the following statement.
SELECT
L_LINENUMBER+1 as L_LINENUMBER1
FROM LINEITEM;
To calculate the mean of an attribute, you should understand that AVG is an aggregate function: it collapses many rows into a single value. Without a GROUP BY clause it returns one value for the whole table; with GROUP BY it returns one value per group. An example with your data, grouped by order, is shown below.
SELECT
AVG(L_QUANTITY) AS L_QUANTITY1
FROM LINEITEM
GROUP BY L_ORDERKEY;
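If the goal is to keep every row of LINEITEM and attach both new columns to it, one option is a window function instead of GROUP BY followed by a join. This is only a sketch: partitioning by L_ORDERKEY is my assumption about which mean you want, and the alias l is arbitrary. Use OVER () instead for the mean over the whole table.

```sql
-- Sketch: every LINEITEM row plus the two derived columns.
-- PARTITION BY L_ORDERKEY is an assumption; replace it with OVER ()
-- to get the table-wide mean on every row.
SELECT
    l.*,
    L_LINENUMBER + 1 AS L_LINENUMBER1,
    AVG(L_QUANTITY) OVER (PARTITION BY L_ORDERKEY) AS L_QUANTITY1
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM AS l
LIMIT 200;
```

Because the window function does not collapse rows, no separate join back to LINEITEM is needed.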
Since your question was originally about UDFs and you seem to be following along with Snowflake's sample data, here is the example UDF that Snowflake provides. It accepts a temperature in Kelvin and converts it to Fahrenheit (from the definition you can see that it can be applied to any attribute of the NUMBER type).
CREATE OR REPLACE FUNCTION
UTIL_DB.PUBLIC.convert_fahrenheit( t NUMBER)
RETURNS NUMBER
COMMENT='Convert from Kelvin to Fahrenheit'
AS '(t - 273.15) * 1.8000 + 32.00';
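Applied to the question, a scalar SQL UDF for the increment could look like the sketch below. The function name add_one is made up, and placing it in UTIL_DB.PUBLIC is an assumption: the function has to live in a database you can write to, because the shared sample database is read-only. A scalar UDF returns one value per input row, so the mean across rows is left to AVG in the query rather than computed from the function's arguments.

```sql
-- Hypothetical UDF: returns its argument plus one.
CREATE OR REPLACE FUNCTION UTIL_DB.PUBLIC.add_one(n NUMBER)
RETURNS NUMBER
AS 'n + 1';

-- Call it like a built-in function; it is evaluated once per row.
SELECT
    L_LINENUMBER,
    UTIL_DB.PUBLIC.add_one(L_LINENUMBER) AS L_LINENUMBER1
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM
LIMIT 200;
```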

Related

Creating a UDF in BigQuery

I would like to create a UDF named maxDate in BigQuery that does the following:
maxDate('table_name') returns the result from running the query below:
select max(table_id) from fact.___TABLES____ where table_id < 'table_name';
I'm quite new to JS and not too sure how to start. This looks like a simple thing to write. Could anyone point me in the right direction? I've read the documentation, but I am still unsure how to write this.
Scalar UDFs do not exist yet in BigQuery.
See the BigQuery User-Defined Functions documentation to understand what they are today.
To simplify: think of today's UDF as a virtual table that you can query. It is backed by a real table whose rows are processed one by one; the JavaScript code is applied to each input row and emits zero, one, or many output rows in its place, depending on the logic you implement.

How to define functions for any column (scalar UDF) on Google BigQuery

Let's say I need to define a function with a behavior like UPPER(string), we can call it FIRSTCHAR(string) that gets the first character of a string.
So I would like to make SQL like:
SELECT FIRSTCHAR(middle_name) AS middle_name_first_char,
FIRSTCHAR(last_name) AS last_name_first_char FROM clients
From the BigQuery UDF documentation it is not clear how to write such a function that works on a string from any table or column. It looks like defining a function with bigquery.defineFunction() requires an input column names argument.
As far as I know, scalar UDFs are not available yet in BigQuery. Current UDFs are table-wise only: you supply a table to the UDF, and it processes the table row by row, outputting zero, one, or many rows for each input row, depending on your implementation.
I remember a Google team member mentioning that they are working on making scalar UDFs available at some point.
I assume the simplified example in your question only serves to illustrate the point, so I am not providing a solution for that specific example (which is a very simple use of string functions).
2016-08-11 UPDATE
Scalar UDFs are now supported in BigQuery Standard SQL.
See examples below
JS UDF
CREATE TEMPORARY FUNCTION FIRSTCHAR(word STRING)
RETURNS STRING
LANGUAGE js
AS "return word.substring(0, 1);";
SELECT
FIRSTCHAR(middle_name) AS middle_name_first_char,
FIRSTCHAR(last_name) AS last_name_first_char
FROM clients
SQL UDF
CREATE TEMPORARY FUNCTION FIRSTCHAR(word STRING)
RETURNS STRING
AS (SUBSTR(word, 1, 1));
SELECT
FIRSTCHAR(middle_name) AS middle_name_first_char,
FIRSTCHAR(last_name) AS last_name_first_char
FROM clients
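A temporary function has to be declared again in every query or script that uses it. BigQuery Standard SQL also lets you store a UDF in a dataset so that it can be called by name from any query, which is closer to the "works like UPPER(string)" behaviour asked for. A sketch, where mydataset is a placeholder for a dataset you can write to and the inline clients row is made up so the query runs without a real table:

```sql
-- Persistent SQL UDF; mydataset is a hypothetical dataset name.
CREATE OR REPLACE FUNCTION mydataset.FIRSTCHAR(word STRING)
RETURNS STRING
AS (SUBSTR(word, 1, 1));

-- Made-up sample row standing in for the real clients table.
WITH clients AS (
  SELECT 'Marie' AS middle_name, 'Curie' AS last_name
)
SELECT
  mydataset.FIRSTCHAR(middle_name) AS middle_name_first_char,  -- 'M'
  mydataset.FIRSTCHAR(last_name) AS last_name_first_char       -- 'C'
FROM clients;
```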

How can I use the new UDF functionality to create "Dynamic SQL statement"?

Is there a way to use a UDF to construct a SQL statement from a template and input variables, and then run that query?
The documentation https://cloud.google.com/bigquery/user-defined-functions?hl=en says:
A UDF is similar to the "Map" function in a MapReduce: it takes a
single row as input and produces zero or more rows as output. The
output can potentially have a different schema than the input.
So your UDF receives just a single row at a time.
Therefore, no: a UDF is not meant for the purpose you describe in your question.
You might take a look at views instead; maybe they will suit you better:
https://cloud.google.com/bigquery/querying-data#views

How do I write an SQL function that takes an unknown amount of numbers as parameters?

I am trying to write an Oracle SQL function that takes a list of numbers as arguments and returns a pipelined list of table rows. My main problem is that the number of values passed is never known in advance and has no real upper limit. I'll try to demonstrate what I mean:
Say I have a table defined as so:
create table items (
id number primary key,
class number,
data varchar2(100)
);
I want to return all rows that match one of a list of class numbers that I submit. The function I'm shooting at looks a little like this:
function get_list_items_from_class([unknown number of parameters]
in items.class%type)
return tbl_list_item pipelined; -- I have types defined to handle the return values
I've been looking at ways to define a function that can take an undefined number of integers, and so far the most promising search result is a page that explains how to use collections and records. I don't think a VARRAY is what I'm looking for, as its size has to be predefined. An associative array may be what I'm looking for, but before I spend a lot of time trying things out, I want to make sure the tool is fit for the job. I'm pretty inexperienced with Oracle SQL right now and I'm working on a time-sensitive project.
Any help that you could offer would be appreciated. I realise that there are simpler ways to achieve what I'm trying to do in this example (simply multiple calls to a function that takes one parameter is one) but this example is simplified. Other parts of the project I'm working on require me to seek a solution using this multiple parameter method.
EDIT: That being said, I would welcome other design suggestions if I'm way off base with what I'm trying to attempt. It would be a learning experience if nothing else.
Many thanks in advance for your time.
EDIT: I will be accessing the database from proprietary client software written in Java.
You could use a table parameter, as I linked in the comments, or you could pass in a comma-separated list of values, parse it into a table, and join to that.
Something like this (with input_lst as a string):
select *
from tbl_list_item
where tbl_list_item.class in
(
select regexp_substr(input_lst,'[^,]+', 1, level) from dual
connect by regexp_substr(input_lst, '[^,]+', 1, level) is not null
);
adapted from https://blogs.oracle.com/aramamoo/entry/how_to_split_comma_separated_string_and_pass_to_in_clause_of_select_statement
Which choice is better depends on your expected number of entries and on what is easier for your client side. I think with a small number of values (2-20) the comma-separated list is a fine choice. With a large number you probably want to pass a table.
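To see what the subquery produces on its own, you can run it against a literal. In this sketch the string '10,20,30' stands in for input_lst, and the TO_NUMBER call is my addition so that the comparison with a numeric column does not rely on implicit conversion.

```sql
-- Splits the list into one row per value.
-- LEVEL counts 1, 2, 3, ... and the CONNECT BY condition stops
-- as soon as there is no further comma-separated token.
SELECT TO_NUMBER(REGEXP_SUBSTR('10,20,30', '[^,]+', 1, LEVEL)) AS class_no
FROM dual
CONNECT BY REGEXP_SUBSTR('10,20,30', '[^,]+', 1, LEVEL) IS NOT NULL;
-- yields three rows: 10, 20, 30
```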
A colleague actually suggested another way to achieve this and I think it is worth sharing. Basically, define a table type that can contain the arguments, then pass an array from the Java program that can be read from this table.
In my example, firstly define a table type of number:
create or replace type tbl_number as table of number;
Then, in the SQL package, define the function as:
function get_list_items_from_class(i_numbers in tbl_number)
return tbl_list_item pipelined;
The function in the package body has one major change (apart from the definition obviously). Use the following select statement:
select *
from tbl_list_item
where tbl_list_item.class in
(
select * from table(i_numbers)
);
This will select all the relevant items that match one of the integers passed in the "i_numbers" table. I like this way as it means less string parsing, both in the Java application and in the SQL package.
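The same query can be tried from plain SQL, without the Java client, by building the collection with the type's constructor. A sketch, assuming the tbl_number type above has been created; the class values 1, 2, 3 are made up, and COLUMN_VALUE is the name Oracle gives to the single column of a collection of scalars.

```sql
-- tbl_number(1, 2, 3) builds the collection inline;
-- TABLE(...) turns it into rows that the IN clause can consume.
SELECT *
FROM tbl_list_item
WHERE tbl_list_item.class IN (
    SELECT column_value
    FROM TABLE(tbl_number(1, 2, 3))
);
```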
Here's how I passed the number arguments from the Java application using an ARRAY object.
import oracle.sql.ARRAY;
import oracle.sql.ArrayDescriptor;
ArrayDescriptor arrayDesc = ArrayDescriptor.createDescriptor("TBL_NUMBER", con); // con is the database connection; the name must match the SQL table type
ARRAY array = new ARRAY(arrayDesc, con, numberList.toArray()); // numberList is an ArrayList of integers which holds the arguments
The array is then passed to the SQL function.

How to pass an entire row (in SQL, not PL/SQL) to a stored function?

I am having the following (pretty simple) problem. I would like to write an (Oracle) SQL query, roughly like the following:
SELECT count(*), MyFunc(MyTable.*)
FROM MyTable
GROUP BY MyFunc(MyTable.*)
Within PL/SQL, one can use a RECORD type (and/or %ROWTYPE), but to my knowledge, these tools are not available within SQL. The function expects the complete row, however. What can I do to pass the entire row to the stored function?
Thanks!
I don't think you can.
Either create the function with all the arguments you need, or pass the id of the row and do a SELECT within the function.
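The second option could look roughly like the sketch below. The column names are assumptions, since the question does not show the table: I assume MyTable has a primary key column id and a VARCHAR2 column category that the function should return. Inside PL/SQL the %ROWTYPE record is available, so the function can load the whole row itself.

```sql
-- Assumes MyTable has a primary key column id and a VARCHAR2 column
-- category (both hypothetical).
CREATE OR REPLACE FUNCTION MyFunc(p_id IN MyTable.id%TYPE)
RETURN VARCHAR2
IS
    v_row MyTable%ROWTYPE;
BEGIN
    -- Fetch the complete row inside the function.
    SELECT * INTO v_row FROM MyTable WHERE id = p_id;
    -- Any logic over the full row goes here.
    RETURN v_row.category;
END;
/

-- The query then passes only the key.
SELECT COUNT(*), MyFunc(id)
FROM MyTable
GROUP BY MyFunc(id);
```

Keep in mind that this runs one extra lookup per row, so on a large table passing the needed columns as separate arguments is likely to be cheaper.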