Azure Synapse Analytics SQL Database function to get match between two delimited lists - azure-synapse

I'm using Azure Synapse Analytics SQL Database. I'm aware I can't use selects in a scalar function (hence the error The SELECT statement is not allowed in user-defined functions). I'm looking for a work-around since this function does not rely on any tables. The goal is a scalar function that takes two delimited lists parameters, a delimiter parameter and returns 1 if the lists have one or more matching items, and returns 0 if no matches are found.
--The SELECT statement is not allowed in user-defined functions
CREATE FUNCTION util.get_lsts_have_mtch
(
#p_lst_1 VARCHAR(8000),
#p_lst_2 VARCHAR(8000),
#p_dlmtr CHAR(1)
)
RETURNS BIT
/***********************************************************************************************************
Description: This function returns 1 if two delimited lists have an item that exists in both lists.
--Example run:
SELECT util.get_lsts_have_mtch('AB|CD|EF|GH|IJ','UV|WX|CD|IJ|YZ','|') -- returns 1, there's a match
SELECT util.get_lsts_have_mtch('AB|CD|EF|GH|IJ','ST|UV|WX|YZ','|') -- returns 0, there's no match
**********************************************************************************************************/
AS
BEGIN
DECLARE #v_result BIT;
-- *** CAN THIS BE ACCOMPLISHED EFFICIENTLY WITHOUT ANY SELECTS? ***
SET #v_result = (SELECT CAST(CASE WHEN EXISTS (SELECT 1
FROM STRING_SPLIT(#p_lst_1, #p_dlmtr) AS tokens_1
INNER JOIN STRING_SPLIT(#p_lst_2, #p_dlmtr) AS tokens_2
ON tokens_1.value = tokens_2.value)
THEN 1
ELSE 0
END) AS BIT);
RETURN #v_result;
END;

I ditched the function and used this CASE statement. I wanted a function to join on that would be reusable. If anyone can find a function to do this, I will make that the accepted answer.
SELECT ...
FROM tbl_1
JOIN tbl_2
ON
-- wanted: util.get_lsts_have_mtch(tbl_1.my_lst, tbl_2.my_lst, '|') = 1
-- but settled for:
CASE WHEN EXISTS
(SELECT [value]
FROM STRING_SPLIT(tbl_1.my_lst, '|')
INTERSECT
SELECT [value]
FROM STRING_SPLIT(tbl_2.my_lst, '|'))
THEN 1
ELSE 0
END = 1

Related

Checking if field contains multiple string in sql server

I am working on a sql database which will provide with data some grid. The grid will enable filtering, sorting and paging but also there is a strict requirement that users can enter free text to a text input above the grid for example
'Engine 1001 Requi' and that the result will contain only rows which in some columns contain all the pieces of the text. So one column may contain Engine, other column may contain 1001 and some other will contain Requi.
I created a technical column (let's call it myTechnicalColumn) in the table (let's call it myTable) which will be updated each time someone inserts or updates a row and it will contain all the values of all the columns combined and separated with space.
Now to use it with entity framework I decided to use a table valued function which accepts one parameter #searchQuery and it will handle it like this:
CREATE FUNCTION myFunctionName(#searchText NVARCHAR(MAX))
RETURNS #Result TABLE
( ... here come columns )
AS
BEGIN
DECLARE #searchToken TokenType
INSERT INTO #searchToken(token) SELECT value FROM STRING_SPLIT(#searchText,' ')
DECLARE #searchTextLength INT
SET #searchTextLength = (SELECT COUNT(*) FROM #searchToken)
INSERT INTO #Result
SELECT
... here come columns
FROM myTable
WHERE (SELECT COUNT(*) FROM #searchToken WHERE CHARINDEX(token, myTechnicalColumn) > 0) = #searchTextLength
RETURN;
END
Of course the solution works fine but it's kinda slow. Any hints how to improve its efficiency?
You can use an inline Table Valued Function, which should be quite a lot faster.
This would be a direct translation of your current code
CREATE FUNCTION myFunctionName(#searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
WITH searchText AS (
SELECT value token
FROM STRING_SPLIT(#searchText,' ') s(token)
)
SELECT
... here come columns
FROM myTable t
WHERE (
SELECT COUNT(*)
FROM searchText
WHERE CHARINDEX(s.token, t.myTechnicalColumn) > 0
) = (SELECT COUNT(*) FROM searchText)
);
GO
You are using a form of query called Relational Division Without Remainder and there are other ways to cut this cake:
CREATE FUNCTION myFunctionName(#searchText NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
(
WITH searchText AS (
SELECT value token
FROM STRING_SPLIT(#searchText,' ') s(token)
)
SELECT
... here come columns
FROM myTable t
WHERE NOT EXISTS (
SELECT 1
FROM searchText
WHERE CHARINDEX(s.token, t.myTechnicalColumn) = 0
)
);
GO
This may be faster or slower depending on a number of factors, you need to test.
Since there is no data to test, i am not sure if the following will solve your issue:
-- Replace the last INSERT portion
INSERT INTO #Result
SELECT
... here come columns
FROM myTable T
JOIN #searchToken S ON CHARINDEX(S.token, T.myTechnicalColumn) > 0

How does one automatically insert the results of several function calls into a table?

Wasn't sure how to title the question but hopefully this makes sense :)
I have a table (OldTable) with an index and a column of comma separated lists. I'm trying to split the strings in the list column and create a new table with the indexes coupled with each of the sub strings of the string it was connected to in the old table.
Example:
OldTable
index | list
1 | 'a,b,c'
2 | 'd,e,f'
NewTable
index | letter
1 | 'a'
1 | 'b'
1 | 'c'
2 | 'd'
2 | 'e'
2 | 'f'
I have created a function that will split the string and return each sub string as a record in a 1 column table as so:
SELECT * FROM Split('a,b,c', ',', 1)
Which will result in:
Result
index | string
1 | 'a'
1 | 'b'
1 | 'c'
I was hoping that I could use this function as so:
SELECT * FROM Split((SELECT * FROM OldTable), ',')
And then use the id and string columns from OldTable in my function (by re-writing it slightly) to create NewTable. But I as far as I understand sending tables into the function doesn't work as I get: "Subquery returned more than 1 value. ... not premitted ... when the subquery is used as an expression."
One solution I was thinking of would be to run the function, as is, on all the rows of OldTable and insert the result of each call into NewTable. But I'm not sure how to iterate each row without a function. And I can't send tables into the a function to iterate so I'm back at square one.
I could do it manually but OldTable contains a few records (1000 or so) so it seems like automation would be preferable.
Is there a way to either:
Iterate over OldTable row by row, run the row through Split(), add the result to NewTable for all rows in OldTable. Either by a function or through regular sql-transactions
Re-write Split() to take a table variable after all
Get rid of the function altogether and just do it in sql transactions?
I'd prefer to not use procedures (don't know if there is a solutions with them either) mostly because I don't want the functionality inside of the DB to be exposed to the outside. If, however that is the "best"/only way to go I'll have to consider it. I'm quite (read very) new to SQL so it might be a needless worry.
Here is my Split() function if it is needed:
CREATE FUNCTION Split (
#string nvarchar(4000),
#delimitor nvarchar(10),
#indexint = 0
)
RETURNS #splitTable TABLE (id int, string nvarchar(4000) NOT NULL) AS
BEGIN
DECLARE #startOfSubString smallint;
DECLARE #endOfSubString smallint;
SET #startOfSubString = 1;
SET #endOfSubString = CHARINDEX(#delimitor, #string, #startOfSubString);
IF (#endOfSubString <> 0)
WHILE #endOfSubString > 0
BEGIN
INSERT INTO #splitTable
SELECT #index, SUBSTRING(#string, #startOfSubString, #endOfSubString - #startOfSubString);
SET #startOfSubString = #endOfSubString+1;
SET #endOfSubString = CHARINDEX(#delimitor, #string, #startOfSubString);
END;
INSERT INTO #splitTable
SELECT #index, SUBSTRING(#string, #startOfSubString, LEN(#string)-#startOfSubString+1);
RETURN;
END
Hope my problem and attempt was explained and possible to understand.
You are looking for cross apply:
SELECT t.index, s.item
FROM OldTable t CROSS APPLY
(dbo.split(t.list, ',')) s(item);
Inserting in the new table just requires an insert or select into clause.

Using local function parameter inside a while loop - SQL Server

My inline table function basically prints out numbers, from 1 to the number that is input in the function. Since I'm new to SQL Server I am not sure how to implement a while loop inside a function. My implementation is below. As expected i get a lot compiler errors at WHILE, BEGIN, THEN. Can anyone suggest how can i improve the code. Thanks!
CREATE FUNCTION num_F(#nnn INT)
RETURNS TABLE
AS
RETURN (SELECT CASE
WHEN #nnn = 0
THEN NULL
ELSE WHILE dbo.num_F(#nnn) > 0
BEGIN
THEN dbo.num_F(#nnn)
SET dbo.num_F(#nnn) = dbo.num_F(#nnn)-1
END
GO
END as First_Col
ORDER BY First_Col ASC)
simplest ways is by using a common table expression (CTE)
CREATE FUNCTION Numfunctn1 (#num INT)
RETURNS TABLE
AS
RETURN(
WITH cte
AS (SELECT 1 id
UNION ALL
SELECT id + 1
FROM cte
WHERE id < #num)
SELECT *
FROM cte)
SELECT *
FROM Numfunctn1(10)

SQL Server 2008 inconsistent results

I just released some code into production that is randomly causing errors. I already fixed the problem by totally changing the way I was doing the query. However, it still bothers me that I don't know what was causing the problem in the first place so was wondering if someone might know the answer. I have the following query inside of a stored procedure. I'm not looking for comments about that's not a good practice to make queries with nested function calls and things like that :-). Just really want to find out why it doesn't work consistently. Randomly the function in the query will return a non-numeric value and cause an error on the join. However, if I immediately rerun the query it works fine.
SELECT cscsf.cloud_server_current_software_firewall_id,
dbo.fn_GetCustomerFriendlyFromRuleName(cscsf.rule_name, np.policy_name) as rule_name,
cscsf.rule_action,
cscsf.rule_direction,
cscsf.source_address,
cscsf.source_mask,
cscsf.destination_address,
cscsf.destination_mask,
cscsf.protocol,
cscsf.port_or_port_range,
cscsf.created_date_utc,
cscsf.created_by
FROM CLOUD_SERVER_CURRENT_SOFTWARE_FIREWALL cscsf
LEFT JOIN CLOUD_SERVER cs
ON cscsf.cloud_server_id = cs.cloud_server_id
LEFT JOIN CLOUD_ACCOUNT cla
ON cs.cloud_account_id = cla.cloud_account_id
LEFT JOIN CONFIGURATION co
ON cla.configuration_id = co.configuration_id
LEFT JOIN DEDICATED_ACCOUNT da
ON co.dedicated_account_id = da.dedicated_account_id
LEFT JOIN CORE_ACCOUNT ca
ON da.core_account_number = ca.core_account_id
LEFT JOIN NETWORK_POLICY np
ON np.network_policy_id = (select dbo.fn_GetIDFromRuleName(cscsf.rule_name))
WHERE cs.cloud_server_id = #cloud_server_id
AND cs.current_software_firewall_confg_guid = cscsf.config_guid
AND ca.core_account_id IS NOT NULL
ORDER BY cscsf.rule_direction, cscsf.cloud_server_current_software_firewall_id
if you notice the join
ON np.network_policy_id = (select dbo.fn_GetIDFromRuleName(cscsf.rule_name))
calls a function.
Here is that function:
ALTER FUNCTION [dbo].[fn_GetIDFromRuleName]
(
#rule_name varchar(100)
)
RETURNS varchar(12)
AS
BEGIN
DECLARE #value varchar(12)
SET #value = dbo.fn_SplitGetNthRow(#rule_name, '-', 2)
SET #value = dbo.fn_SplitGetNthRow(#value, '_', 2)
SET #value = dbo.fn_SplitGetNthRow(#value, '-', 1)
RETURN #value
END
Which then calls this function:
ALTER FUNCTION [dbo].[fn_SplitGetNthRow]
(
#sInputList varchar(MAX),
#sDelimiter varchar(10) = ',',
#sRowNumber int = 1
)
RETURNS varchar(MAX)
AS
BEGIN
DECLARE #value varchar(MAX)
SELECT #value = data_split.item
FROM
(
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as row_num FROM dbo.fn_Split(#sInputList, #sDelimiter)
) AS data_split
WHERE
data_split.row_num = #sRowNumber
IF #value IS NULL
SET #value = ''
RETURN #value
END
which finally calls this function:
ALTER FUNCTION [dbo].[fn_Split] (
#sInputList VARCHAR(MAX),
#sDelimiter VARCHAR(10) = ','
) RETURNS #List TABLE (item VARCHAR(MAX))
BEGIN
DECLARE #sItem VARCHAR(MAX)
WHILE CHARINDEX(#sDelimiter,#sInputList,0) <> 0
BEGIN
SELECT #sItem=RTRIM(LTRIM(SUBSTRING(#sInputList,1,CHARINDEX(#sDelimiter,#sInputList,0)-1))), #sInputList=RTRIM(LTRIM(SUBSTRING(#sInputList,CHARINDEX(#sDelimiter,#sInputList,0)+LEN(#sDelimiter),LEN(#sInputList))))
IF LEN(#sItem) > 0
INSERT INTO #List SELECT #sItem
END
IF LEN(#sInputList) > 0
INSERT INTO #List SELECT #sInputList -- Put the last item in
RETURN
END
The reason it is "randomly" returning different things has to do with how SQL Server optimizes queries, and where they get short-circuited.
One way to fix the problem is the change the return value of fn_GetIDFromRuleName:
return (case when isnumeric(#value) then #value end)
Or, change the join condition:
on np.network_policy_id = (select case when isnumeric(dbo.fn_GetIDFromRuleName(cscsf.rule_name)) = 1)
then dbo.fn_GetIDFromRuleName(cscsf.rule_name) end)
The underlying problem is order of evaluation. The reason the "case" statement fixes the problem is because it checks for a numeric value before it converts and SQL Server guarantees the order of evaluation in a case statement. As a note, you could still have problems with converting numbers like "6e07" or "1.23" which are numeric, but not integers.
Why does it work sometimes? Well, clearly the query execution plan is changing, either statically or dynamically. The failing case is probably on a row that is excluded by the WHERE condition. Why does it try to do the conversion? The question is where the conversion happens.
WHere the conversion happens depends on the query plan. This may, in turn, depend on when the table cscf in question is read. If it is already in member, then it might be read and attempted to be converted as a first step in the query. Then you would get the error. In another scenario, the another table might be filtererd, and the rows removed before they are converted.
In any case, my advice is:
NEVER have implicit conversion in queries.
Use the case statement for explicit conversions.
Do not rely on WHERE clauses to filter data to make conversions work. Use the case statement.

UDF Table UDV or Scalar UDF?

I will do my best to make this question better than my last fiasco. I am getting the dreaded >"cannot find either column "dbo" or the user-defined function or aggregate "dbo.PriMonthAvgPrice", or the name is ambiguous.<
I am attempting to find the avg sales price from the previous month. Here is my UDF:
USE [WoodProduction]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[PriMonthAvgPrice]
(
-- Add the parameters for the function here
#endofmonth datetime,
#begofmonth datetime,
#PlantCode varchar
)
RETURNS decimal (10,2)
AS
BEGIN
-- Declare the return variable here
DECLARE #MonthEndAvgPrice decimal (10,2)
-- Add the T-SQL statements to compute the return value here
SELECT #MonthEndAvgPrice =
(
select
sum(Actual_Sales_Dollars/Actual_Volume)
FROM
woodproduction.dbo.plywood_layup_sales pls
WHERE
Production_Date between #begofmonth and #endofmonth
and actual_volume <> 0
and #PlantCode = pls.Plant_Code
)
-- Return the result of the function
RETURN #MonthEndAvgPrice
END
This is my SELECT statement from my query:
SELECT
DISTINCT
P.[Plant_Number]
,p.plant_name
,pls.plant_code
,(pls.[Budget_Realization]) AS 'BR'
,(pls.[Actual_Volume] ) AS 'AV'
,(pls.[Budget_Volume]) AS 'BV'
--,sum (dpb.[Gross_Production_Per_Hr]) AS 'GPB'
,(p.Production_Volume) AS 'PV'
,CASE
WHEN coalesce (pls.[Actual_Volume],0) = 0 and
coalesce (pls.[Actual_Sales_Dollars],0) = 0
THEN 0
ELSE (pls.[Actual_Sales_Dollars]/pls.[Actual_Volume])
END
AS 'AP'
,pls.production_date
,[dbo].[PriMonthAvgPrice](#endofmonth,#begofmonth, pls.plant_code) AS 'PriMoAvgPrice'
My BASIC understanding is that I HAVE created a Scalar Function. From what I've been reading about my error however, This error returns on TVF's. Is this true? I created a SVF prior to this dealing with just determining a prior month end date so it wasn't as involved as this one where I create the query in the UDF.
Do I need to change this to a TVF? And if so, how do I incorporate SELECT * when I have to join multiple tables along with this?
Thanks in advance.
Aaron
You don't show the from clause, but is the database you created the function in part of it?
Does it work if you fully qualify the name (include the database)?
Have you independently tested the function with:
select [dbo].[PriMonthAvgPrice] ('01/01/2011', '02/01/2011', 'test')
Note: of course you would use some actual values that should return a result.
Please run this and tell us the values returned:
SELECT Actual_Sales_Dollars,Actual_Volume, pls.PLant_code
FROM woodproduction.dbo.plywood_layup_sales pls
WHERE Production_Date between '09-01-2011' and '09-30-2011'
and actual_volume <> 0