How do I write an SQL function that takes an unknown number of values as parameters? - sql

I am trying to write an Oracle SQL function that takes a list of numbers as arguments and returns a pipelined list of table rows. My main problem is that the quantity of numbers that can be passed is never certain and has no real upper limit. I'll try to demonstrate what I mean:
Say I have a table defined as so:
create table items (
  id    number primary key,
  class number,
  data  varchar2(4000)
);
I want to return all rows that match one of a list of class numbers that I submit. The function I'm shooting at looks a little like this:
function get_list_items_from_class([unknown number of parameters] in items.class%type)
  return tbl_list_item pipelined; -- I have types defined to handle the return values
I've been looking at ways to define a function that can take an unknown number of integers, and so far the most promising search has taken me to this page, which explains using collections and records. I don't think a VARRAY is what I'm looking for, as the size has to be predefined. An associative array may be what I'm looking for, but before I spend a lot of time trying things out, I want to make sure the tool is fit for the job. I'm pretty inexperienced with Oracle SQL right now and I'm working on a time-sensitive project.
Any help that you could offer would be appreciated. I realise that there are simpler ways to achieve what I'm trying to do in this example (making multiple calls to a function that takes one parameter is one), but this example is simplified. Other parts of the project I'm working on require me to seek a solution using this multiple-parameter method.
EDIT: That being said, I would welcome other design suggestions if I'm way off base with what I'm trying to attempt. It would be a learning experience if nothing else.
Many thanks in advance for your time.
EDIT: I will be accessing the database from proprietary client software written in Java.

You could use a table parameter as I linked in the comments, or you could pass in a comma-separated list of values, parse it into a table, and join to that.
Something like this (with input_lst as a string):
select *
from items
where items.class in
(
  select regexp_substr(input_lst, '[^,]+', 1, level) from dual
  connect by regexp_substr(input_lst, '[^,]+', 1, level) is not null
);
adapted from https://blogs.oracle.com/aramamoo/entry/how_to_split_comma_separated_string_and_pass_to_in_clause_of_select_statement
Which choice is better depends on your expected number of entries and what is easier for your client side. I think with a small number (2-20) the comma-separated list is a fine choice. With a large number you probably want to pass a table.
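To make the splitting concrete, here is what the inner query alone produces when the bound string is, say, '101,102,103' (literal values substituted just for illustration):
select regexp_substr('101,102,103', '[^,]+', 1, level) as class_id
from dual
connect by regexp_substr('101,102,103', '[^,]+', 1, level) is not null;
-- returns three rows: 101, 102, 103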

A colleague actually suggested another way to achieve this and I think it is worth sharing. Basically, define a table type that can contain the arguments, then pass an array from the Java program that can be read from this table.
In my example, firstly define a table type of number:
create or replace type tbl_number as table of number;
Then, in the SQL package, define the function as:
function get_list_items_from_class(i_numbers in tbl_number)
return tbl_list_item pipelined;
The function in the package body has one major change (apart from the definition obviously). Use the following select statement:
select *
from items
where items.class in
(
  select * from table(i_numbers)
);
This will select all the relevant items that match one of the integers passed in the "i_numbers" table. I like this way as it means less string parsing, both in the Java application and the SQL package.
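For completeness, here is a minimal sketch of what the pipelined function body might look like. Note that list_item is an assumed element type for the tbl_list_item collection; the original post never shows these type definitions:
create or replace function get_list_items_from_class(i_numbers in tbl_number)
return tbl_list_item pipelined
is
begin
  for rec in (select id, class, data from items
              where class in (select column_value from table(i_numbers)))
  loop
    pipe row(list_item(rec.id, rec.class, rec.data)); -- list_item is assumed
  end loop;
  return;
end;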
Here's how I passed the number arguments from the Java application using an ARRAY object.
// con is the database connection; the descriptor name must match the
// SQL type created above, hence "TBL_NUMBER"
ArrayDescriptor arrayDesc = ArrayDescriptor.createDescriptor("TBL_NUMBER", con);
// numberList is an ArrayList of integers which holds the arguments
ARRAY array = new ARRAY(arrayDesc, con, numberList.toArray());
The array is then passed to the SQL function.
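On the SQL side, the same function can also be exercised directly by constructing the collection inline, which is handy for testing (a sketch; the values are illustrative):
select *
from table(get_list_items_from_class(tbl_number(101, 102, 103)));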

Related

How to use a SQL user-defined function in Snowflake?

I am just studying how to use SQL in Snowflake. Here is a snapshot, and the code used in it:
use schema SNOWFLAKE_SAMPLE_DATA.TPCH_SF1;
--use schema SNOWFLAKE_SAMPLE_DATA.TPCH_SF10;
select *
from LINEITEM
limit 200
You can see the table includes two fields: L_LINENUMBER and L_QUANTITY. Now I want to try a user-defined function, which can:
1. take L_LINENUMBER and L_QUANTITY as two parameters passed into the function,
2. calculate L_LINENUMBER1 = L_LINENUMBER + 1 and L_QUANTITY1 = mean(L_QUANTITY),
3. join the two new fields (L_LINENUMBER1, L_QUANTITY1) to the original table (LINEITEM).
How do I use CREATE FUNCTION to do this? I have read a lot of examples regarding CREATE FUNCTION, but I just cannot get the point, maybe because I am not good at SQL. So, could anyone give me a comprehensive example with all the details?
I understand that your question is about UDFs, but using UDFs for your purpose here is overkill.
You can increment an attribute in a table using the following statement.
SELECT
L_LINENUMBER+1 as L_LINENUMBER1
FROM LINEITEM;
To calculate the mean of an attribute in a table, you should understand that this is an aggregate function, which is typically used in conjunction with a GROUP BY clause. An example with your data is shown below.
SELECT
AVG(L_QUANTITY) AS L_QUANTITY1
FROM LINEITEM
GROUP BY L_ORDERKEY;
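To attach both derived fields to the original rows (step 3 in the question) without a separate join, a window function can carry the group mean alongside each row; a sketch, assuming the mean should be taken per order:
SELECT
  L_LINENUMBER,
  L_QUANTITY,
  L_LINENUMBER + 1 AS L_LINENUMBER1,
  AVG(L_QUANTITY) OVER (PARTITION BY L_ORDERKEY) AS L_QUANTITY1
FROM LINEITEM;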
Since your question was originally about UDFs, and you seem to be following along with Snowflake's sample data, the example that they provide is the following UDF, which accepts a temperature in Kelvin and converts it to Fahrenheit (from the definition you can see that it can be applied to any attribute of NUMBER type).
CREATE OR REPLACE FUNCTION
  UTIL_DB.PUBLIC.convert_fahrenheit(t NUMBER)
RETURNS NUMBER
COMMENT = 'Convert from Kelvin to Fahrenheit'
AS '(t - 273.15) * 1.8000 + 32.00';
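Calling it is then a one-liner; the literal 300 here is just an illustration:
SELECT UTIL_DB.PUBLIC.convert_fahrenheit(300) AS fahrenheit;
-- (300 - 273.15) * 1.8000 + 32.00 = 80.33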

How can I pass a collection/array of data from Java to a procedure call? HANA

Let's say I have a list of ids: 111, 112, 113 that I fetched by executing the following query from Java:
SELECT "id" FROM User WHERE (email, name) IN (("", ""), ("", ""));
The list length will vary.
From Java I need to pass this list/collection/array of IDs to a stored procedure. How can I do that?
CREATE PROCEDURE "PROCEDUREEXAMPLE" (IN userIds ??COLLECTION??) LANGUAGE SQLSCRIPT SQL SECURITY DEFINER
AS
BEGIN
//Do the rest
END
I wanted to use another procedure to pass the result of the first query to the other procedure, but as you can see, the first SQL statement is dynamic and the values will vary.
One way to do this is to store those ids in a temporary table that the procedure call then accesses, but I wanted to know if it is possible to pass a collection of data to the procedure call.
Feel free to suggest other ways of doing this.
Thanks
This has been asked & answered here several times before.
No, it's not possible to pass a Java collection or Java array of values into a SAP HANA SQL statement and get the corresponding IN LIST.
There is also no mapping of Java arrays to SAP HANA SQL arrays.
To deal with that, two main approaches are available:
Create the IN-LIST based on the collection elements yourself. This can of course lead to issues with prepared-statement reuse, due to the changing number of elements. One way to handle this is to prepare a statement with a larger number of placeholders and bind only those for which you have elements in the collection/array.
Create a temporary table, fill it with the elements of the collection (one element = one row), and use an INNER JOIN to filter based on this set of elements (sketched below).
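A minimal sketch of that second approach, with illustrative names (the temporary table and its column are not from the original post):
-- #-prefixed tables are session-local in HANA
CREATE LOCAL TEMPORARY TABLE "#USER_IDS" ("ID" INTEGER);
-- fill it from Java with a batched prepared statement:
--   INSERT INTO "#USER_IDS" VALUES (?)
SELECT u."id"
FROM User u
INNER JOIN "#USER_IDS" t ON t."ID" = u."id";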

Finding strings that differ by at most one letter from a given string in SAS with PROC SQL

First some context. I am using proc sql in SAS, and need to fetch all the entries in a data set (with a couple of million entries) that have variable "Name" equal to (let's say) "Massachusetts". Of course, since the data was once manually entered by humans, close to all conceivable spelling errors occur ("Amssachusetts", "Kassachusetts" etc.).
I have found that few entries get more than two characters wrong, so the code
Name like "__ssachusetts" OR Name like "_a_sachusetts" OR ... OR Name like "Massachuset__"
would select the entries I am looking for. However, I am hoping that there must be a more convenient way to write
Name that differs by at most 2 characters from "Massachusetts";
Is there? Or is there some other strategy for fetching these entries? I tried searching both Stack Overflow and the web but was unsuccessful. I am also a relative beginner with both SQL and SAS.
Some additional information: The database is not in English (and the actual string is not "Massachusetts") so using SOUNDEX is not really feasible (if it ever were).
Thanks in advance.
(Edit: Improved the title)
SAS has built-in functions COMPGED and COMPLEV to compute distances between strings. Here is an example that shows how to select just those with a Levenshtein edit distance of less than or equal to 2.
data typo;
  input name $20.;
  datalines;
massachusetts
masachusets
mssachusetts
nassachusets
nassachussets
massachusett
;
proc sql;
  select name from typo
  where complev(name, "massachusetts") <= 2;
quit;
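COMPGED works the same way but assigns different costs to different edit operations (by default, a simple insert, delete, or replace each costs 100), so an equivalent filter with it might look like this:
proc sql;
  select name from typo
  where compged(name, "massachusetts") <= 200;
quit;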
There are other string-distance measures, like Hamming distance, that may work better for your data.
You can search Google for an implementation of such an algorithm for your specific DB engine.
What you are looking for is "approximate string matching". For that, one can use a Levenshtein-distance algorithm. I am not sure, but I hope this answer will help.
You could implement a stored function of this type (Oracle syntax; adapt to your RDBMS):
CREATE FUNCTION distance(one VARCHAR2, two VARCHAR2)
RETURN NUMBER DETERMINISTIC
IS
BEGIN
  -- do some comparison here, then return the computed distance
  RETURN 0; -- placeholder
END distance;
And then use it in SQL:
SELECT * FROM table WHERE distance(name, 'Massachusetts') <= 2
Of course, these things tend to be quite slow...
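As an aside, some engines already ship an edit-distance function, which avoids hand-rolling one. In Oracle, for instance, something like this should work (table name illustrative):
SELECT *
FROM some_table
WHERE UTL_MATCH.EDIT_DISTANCE(name, 'Massachusetts') <= 2;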
I know this is four years too late, but since it might give ideas to others searching this thread: what you're considering is a semantic layered design. You would need to implement some conditional logic for these different text comparisons, using string-distance measures such as Jaro-Winkler for comparing text of differing lengths, and Hamming distance for strings of the same length where you suspect simple transposition. This is nothing new these days, with all of the various text-mining programs out there.
Here is a post which is very good in my view:
Jaro-Winkler string comparison function in SAS

SQL to filter by multiple criteria including containment in string list

So I have a table, let's call it "tbl.items", and there is a column "title" in "tbl.items". I want to loop through each row, and for each "title" in "tbl.items" I want to do the following:
The column has the datatype nvarchar(max) and contains a string...
filter the string to remove stopwords like in, out, where, etc.
compare the rest of the string to a predefined list and, if there is a match, perform some action which involves inserting data in other tables as well.
The problem is I'm ignorant when it comes to writing T-SQL scripts. Please help and guide me: how can I achieve this?
Can it be achieved by writing a SQL script?
Or do I have to develop a console application in C# or another language?
I'm using MS SQL Server 2008.
Thanks in advance
You want a few things. First, look up SQL Server's syntax for functions, and write something like this:
-- Warning! Code written off the top of my head,
-- don't expect this to work w/copy-n-paste
create function dbo.removeStrings(@input nvarchar(4000))
returns nvarchar(4000)
as begin
    -- We're being kind of simple-minded and using strings
    -- instead of regular expressions, so we are assuming a
    -- space before and after each word. This makes this work better:
    set @input = ' ' + @input + ' '
    -- Big list of replaces; replace with a single space so that
    -- neighbouring words stay separated
    set @input = replace(@input, ' in ', ' ')
    set @input = replace(@input, ' out ', ' ')
    -- more replaces...
    return @input
end
Then you need your list of matches in a table, call this "predefined" with a column "matchString".
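For example, a minimal shape for that lookup table might be the following (a sketch; nvarchar(450) rather than nvarchar(max) simply keeps the column small enough to index on SQL Server 2008):
create table predefined (
    matchString nvarchar(450) primary key
);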
Then you can retrieve the matching rows with:
select p.matchString
from items i
join predefined p
  on dbo.removeStrings(i.title) = p.matchString
Once you have those individual pieces working, I suggest a new question on what particular process you may be doing with them.
Warning: Not knowing how many rows you have or how often you have to do this (every time a user saves something? Once/day?), this will not exactly be zippy, if you know what I mean. So once you have these building blocks in hand, there may also be a follow-up question for how and when to do it.

Sql Optimization: Xml or Delimited String

This is hopefully just a simple question involving performance optimizations for queries in SQL Server 2008.
I've worked for companies that use stored procs a lot for their ETL processes as well as some of their websites. I've seen the scenario where they need to retrieve specific records based on a finite set of key values. I've seen it handled in 3 different ways, illustrated via pseudo-code below.
Dynamic SQL that concatenates a string and executes it:
EXEC('SELECT * FROM TableX WHERE xId IN (' + @Parameter + ')')
Using a user-defined function to split a delimited string into a table:
SELECT * FROM TableY INNER JOIN SPLIT(@Parameter) ON yID = splitId
Using XML as the parameter instead of a delimited varchar value:
SELECT * FROM TableZ JOIN @Parameter.nodes(xpath) AS x (y) ON ...
While I know creating the dynamic SQL in the first snippet is a bad idea for a large number of reasons, my curiosity comes from the last two examples. Is it more efficient to do the due diligence in my code to pass such lists via XML, as in snippet 3, or is it better to just delimit the values and use a UDF to take care of it?
There is now a 4th option - table-valued parameters (TVPs), whereby you can actually pass a table of values in to a sproc as a parameter and then use it as you would normally use a table variable. I'd prefer this approach over the XML (or CSV parsing) approach.
I can't quote performance figures between all the different approaches, but that's one I'd be trying - I'd recommend doing some real performance tests on them.
Edit:
A little more on TVPs. In order to pass the values in to your sproc, you just define a SqlParameter (SqlDbType.Structured) - the value of this can be set to any IEnumerable, DataTable or DbDataReader source. So presumably, you already have the list of values in a list/array of some sort - you don't need to do anything to transform it into XML or CSV.
I think this also makes the sproc clearer, simpler and more maintainable, providing a more natural way to achieve the end result. One of the main points is that SQL performs best at set-based work, not looping or string-manipulation activities.
That's not to say it will perform great with a large set of values passed in. But with smaller sets (up to ~1000) it should be fine.
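For reference, a minimal sketch of the TVP plumbing on the SQL Server 2008 side; the type, procedure, and table names are illustrative, not from the original post:
-- the table type is created once
CREATE TYPE IdList AS TABLE (id INT PRIMARY KEY);
GO
-- the sproc takes it as a READONLY parameter and simply joins to it
CREATE PROCEDURE GetRecords @ids IdList READONLY
AS
BEGIN
    SELECT t.*
    FROM TableX t
    INNER JOIN @ids i ON i.id = t.xId;
END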
UDF invocation is a little bit more costly than splitting the XML using the built-in function.
However, this only needs to be done once per query, so the performance difference will be negligible.