Scalar Function in Where Clause Really Slow? How to use Cross Apply Instead?

I have some data, some of which was imported with different separators such as * - . or a space. Some separators were removed on import, some were not, and some of the external values being compared against it have the same issue. So we remove all separators and compare that way; I don't want to just update the columns yet as the data isn't "mine".
Since I see this pattern over and over in the code I am moving to stored procedures, I wrote a user-defined function to do it for me.
ALTER FUNCTION [dbo].[fn_AccountNumber_Format2]
(@parAcctNum NVARCHAR(50))
RETURNS NVARCHAR(50)
AS
BEGIN
SET @parAcctNum = REPLACE(REPLACE(REPLACE(REPLACE(@parAcctNum, '.', ''), '*', ''), '-', ''), ' ', '');
RETURN @parAcctNum
END
Normally the queries looked something like this, and it takes less than a second to run on a few million rows:
SELECT name1, accountID FROM tblAccounts WHERE (Replace(Replace(Replace(accountnumber, '.', ''), '*', ''), '-', '') = Replace(Replace(Replace('123-456-789', '.', ''), '*', ''), '-', ''));
My first attempt with the function takes 24 seconds to execute:
SELECT name1, accountID FROM tblAccounts WHERE (dbo.fn_AccountNumber_Format2 ([accountnumber])) = Replace(Replace(Replace('123-456-789', '.', ''), '*', ''), '-', '');
This one 43 seconds:
SELECT name1, accountID FROM tblAccounts WHERE (dbo.fn_AccountNumber_Format2(accountnumber)) = (dbo.fn_AccountNumber_Format2 ('123-456-789'));
The drastic slowdown came as a complete shock, as I expected the user-defined function to run just the same as the built-in REPLACE. After some research on Stack Exchange and Google, it seems that using CROSS APPLY with a table-valued version of the function may be a better solution, but I have no idea how that works. Can anyone help me with that?

Inline Function
CREATE FUNCTION [dbo].[uspAccountNumber_Format3]
(
@parAcctNum NVARCHAR(50))
RETURNS TABLE
AS
RETURN
(
SELECT REPLACE(REPLACE(REPLACE(REPLACE(@parAcctNum, '.', ''), '*', ''),'-', ''), ' ', '') AS AccountNumber
)
Usage
SELECT name1 ,
accountID
FROM tblAccounts
CROSS APPLY dbo.uspAccountNumber_Format3(accountnumber) AS a
CROSS APPLY dbo.uspAccountNumber_Format3('123-456-789') AS b
WHERE a.AccountNumber = b.AccountNumber
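A possible refinement, offered as a sketch rather than a measured result: since the search value is a constant, it can be normalised once into a variable so that only the column side goes through the inline function. The variable name @target is an assumption for illustration; the table, columns, and function are the ones used above.

```sql
-- Sketch: clean the literal once, outside the per-row work.
-- @target is a hypothetical name introduced for this example.
DECLARE @target NVARCHAR(50) =
    REPLACE(REPLACE(REPLACE(REPLACE(N'123-456-789', '.', ''), '*', ''), '-', ''), ' ', '');

SELECT t.name1,
       t.accountID
FROM tblAccounts AS t
CROSS APPLY dbo.uspAccountNumber_Format3(t.accountnumber) AS a
WHERE a.AccountNumber = @target;
```

Comparing the actual execution plans of both versions is the safest way to check whether this helps on your data.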

Related

Formatting Phone Number to US Format (###) ###-####

I am trying to reformat around 1000 phone numbers in a SQL Server database to US format (###) ###-####
Currently the phone numbers are formatted in all sorts of ways ranging from ##########, ###-###-####, one is ###)-###-####. There is also one with only six digits.
As a first step I've been attempting to isolate the numbers in all of these rows, but it's just returning the same values as they were already.
select SUBSTRING(phone, PATINDEX('%[0-9]%', phone), LEN(phone)) from people
How could I best go about writing a query which would format them all as (###) ###-####?
expected output:
(555) 222-3333
(555) 444-3030
(555) 092-0920
(555) 444-4444
Since one suggestion was made already and the suggestion there to isolate numbers in a string uses a while loop I need to post an alternative to that which doesn't use any looping. Instead it utilizes a tally or numbers table. There are lots of solutions for those. I like to use a view which is lightning fast and has zero reads.
Here is my version of a tally table.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
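As a quick sanity check of the view above, the following sketch lists the first few numbers and counts the rows. It assumes the view was created in the dbo schema exactly as shown.

```sql
-- First five numbers produced by the tally view.
SELECT TOP (5) N
FROM dbo.cteTally
ORDER BY N;

-- Total rows available: E4 cross joins 100 x 100 rows, so 10,000.
SELECT COUNT(*) AS TotalRows
FROM dbo.cteTally;
```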
Next we need a table valued function to remove the characters that are not numbers using our tally table. This is also super fast because we are using our tally table instead of looping.
create function GetOnlyNumbers
(
@SearchVal varchar(8000)
) returns table as return
with MyValues as
(
select substring(@SearchVal, N, 1) as number
, t.N
from cteTally t
where N <= len(@SearchVal)
and substring(@SearchVal, N, 1) like '[0-9]'
)
select distinct NumValue = STUFF((select number + ''
from MyValues mv2
order by mv2.N
for xml path('')), 1, 0, '')
from MyValues mv
Now that we have all the legwork done we can focus on the task at hand. Since you didn't provide any sample data I just made up some stuff. I am not really sure if this is representative of your data or not but this works on the sample data I created.
if OBJECT_ID('tempdb..#Something') is not null
drop table #Something
create table #Something(SomeVal varchar(100))
insert #Something values
('Maybe you have other stuff in here. 5552223333 additional characters can cause grief')
, ('321-654-9878')
, ('123)-333-4444')
, ('1234567')
select replace(format(try_convert(bigint, n.NumValue), '(###) ###-####'), '() ', '')
, n.NumValue
from #Something s
cross apply dbo.GetOnlyNumbers(s.SomeVal) n
The output for the formatted data looks like this:
(555) 222-3333
(321) 654-9878
(123) 333-4444
123-4567
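To run the same logic against the table from the question, something like the following sketch could be used. It assumes the table is people with a phone column (as in the question's query) and only formats values that contain exactly ten digits, leaving the rest for manual review. The FormattedPhone alias is introduced here for illustration.

```sql
-- Sketch: preview the reformatted values before deciding on an UPDATE.
SELECT p.phone,
       FORMAT(TRY_CONVERT(bigint, n.NumValue), '(###) ###-####') AS FormattedPhone
FROM people AS p
CROSS APPLY dbo.GetOnlyNumbers(p.phone) AS n
WHERE LEN(n.NumValue) = 10;  -- skip short or overlong values such as the six-digit one
```

Rows whose phone contains no digits at all drop out of the result, because GetOnlyNumbers returns no row for them and CROSS APPLY discards unmatched rows.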
If this reformatting is something that is going to be used repeatedly, then creating a UDF as suggested by @GSerg would be the way to go.
If this is just a one-time clean-up you could give this a try.
First, strip all of the digits with a series of nested REPLACE() functions so that you can see which non-numeric characters occur in the data.
DECLARE @PhoneNumbers TABLE (
Number varchar (20))
INSERT INTO @PhoneNumbers VALUES ('(888-239/1239')
INSERT INTO @PhoneNumbers VALUES ('222.1234')
SELECT
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(Number, '0', '')
, '1', '')
, '2', '')
, '3', '')
, '4', '')
, '5', '')
, '6', '')
, '7', '')
, '8', '')
, '9', '')
FROM @PhoneNumbers
Then take the resulting non-numeric characters, put each of them in its own nested REPLACE() function, and format the result. You will have to deal with each length individually: if you have only 7 digits and you want to format the value as 10 digits, what do you want those extra 3 digits to be? This will handle the 10-digit phone numbers.
SELECT FORMAT(x.NumbersOnly, '(###) ###-####')
FROM
(
SELECT
CONVERT(BIGINT,
REPLACE(
REPLACE(
REPLACE(
REPLACE(Number, '(', '')
, '-', '')
, '/', '')
, '.', '')
) AS NumbersOnly
FROM @PhoneNumbers
) x
WHERE LEN(x.NumbersOnly) = 10
Here is the dbfiddle.

SQL : Retrieve first three characters from sql column value

For example, if the column value is sa,123k, the output should be the first three letters, i.e. sak.
Digits and any special characters need to be eliminated so that only the first three letters are returned. How do we do this?
You can use recursive CTEs for this purpose:
with t as (
select 'sa,123k' as str
),
cte as (
select str, left(str, 1) as c, stuff(str, 1, 1, '') as rest, 1 as lev,
convert(varchar(max), (case when left(str, 1) like '[a-zA-Z]' then left(str, 1) else '' end)) as chars
from t
union all
select str, left(rest, 1) as c, stuff(rest, 1, 1, '') as rest, lev + 1,
convert(varchar(max), (case when left(rest, 1) like '[a-zA-Z]' then chars + left(rest, 1) else chars end))
from cte
where rest > '' and len(chars) < 3
)
select str, max(chars)
from cte
where len(chars) <= 3
group by str;
Here is a db<>fiddle.
This might help
DECLARE @VAR VARCHAR(100)= 'sa,1235JSKL', @RESULT VARCHAR(MAX)=''
SELECT @RESULT = @RESULT+
CASE WHEN RESULT LIKE '[a-zA-Z]' THEN RESULT ELSE '' END
FROM (
SELECT NUMBER, SUBSTRING(@VAR,NUMBER,1) AS RESULT
FROM MASTER..spt_values
WHERE TYPE = 'P' AND NUMBER BETWEEN 1 AND LEN(@VAR)
)A
ORDER BY NUMBER
SELECT SUBSTRING(@RESULT,1,3)
If you want to apply this to a table's column, you need to create a scalar function with the same logic. You can find plenty of articles on how to create a scalar function by Googling.
You can use this function which is written by G Mastros to do this.
Create Function [dbo].[RemoveNonAlphaCharacters](@Temp nvarchar(MAX))
Returns nvarchar(MAX)
AS
Begin
Declare @KeepValues as nvarchar(MAX)
Set @KeepValues = '%[^a-z]%'
While PatIndex(@KeepValues, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
Return @Temp
End
Then simply call the function like this
SELECT LEFT(dbo.RemoveNonAlphaCharacters(colName), 3)
FROM TableName
Reference: G Mastros answer on "How to strip all non-alphabetic characters from string in SQL Server" question.
Well, this is ugly, but you could replace all the characters you don't like.
In your example, this would be:
SELECT REPLACE (REPLACE (REPLACE (REPLACE ('sa,123k', '1', ''), '2', ''), '3', ''), ',', '')
Obviously, this needs a lot of replaces if you need all numbers and other sorts of characters replaced.
Edited, based on your comment:
SELECT REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE ('123456gh,.!879k', '1', ''), '2', ''), '3', ''), ',', ''), '4', ''), '5', ''), '6', ''), '.', ''), '!', ''), '7', ''), '8', ''), '9', '')

How can I CONCAT portions of three columns to one new column

I am trying to create a new column in my results that is made up of the first 3 characters of "PrimaryName", all of "VendorCity", and the first 5 characters of "VendorZip"
SELECT VendorName
,replace(PrimaryVendorLocationName,' ','') as PrimaryName
,replace(PrimaryVendorLocationCity,' ','') as VendorCity
,replace(PrimaryVendorLocationZipCode,' ','') as VendorZip
FROM [table]
As you can see I also need to remove spaces to ensure a cleaner return. I would like to call the new column "NewVendorCode". So a record that originates like this:
R A Slack
Chicago Heights
60654-1234
Will return this:
RASChicagoHeights60654
You can use the following, using LEFT (MySQL / TSQL):
SELECT CONCAT(
LEFT(REPLACE(PrimaryVendorLocationName, ' ', ''), 3),
REPLACE(PrimaryVendorLocationCity, ' ', ''),
LEFT(REPLACE(PrimaryVendorLocationZipCode, ' ', ''), 5)
) FROM table_name
... or you can use SUBSTRING (MySQL / TSQL) (instead of LEFT):
SELECT CONCAT(
SUBSTRING(REPLACE(PrimaryVendorLocationName, ' ', ''), 1, 3),
REPLACE(PrimaryVendorLocationCity, ' ', ''),
SUBSTRING(REPLACE(PrimaryVendorLocationZipCode, ' ', ''), 1, 5)
) FROM table_name
Note: As you can see, the SELECT queries work on MySQL and TSQL without change.
demo (MySQL): https://www.db-fiddle.com/f/wTuKzosFgkEuKXtruCTCxg/0
demo (TSQL): http://sqlfiddle.com/#!18/dbc98/1/1
You can use the following code:
SELECT VendorName+
replace(PrimaryVendorLocationName,' ','') +
replace(PrimaryVendorLocationCity,' ','') +
replace(PrimaryVendorLocationZipCode,' ','') as NewVendorCode
FROM [table]
SELECT VendorName
,PrimaryName
,VendorCity
,VendorZip
,CONCAT(LEFT(PrimaryName,3),VendorCity,LEFT(VendorZip,5)) As NewVendorCode
FROM (
SELECT VendorName
,replace(PrimaryVendorLocationName,' ','') as PrimaryName
,replace(PrimaryVendorLocationCity,' ','') as VendorCity
,replace(PrimaryVendorLocationZipCode,' ','') as VendorZip
FROM [table]
) AS src -- a derived table requires an alias

Need Help in creating Dynamic SQL Query

I am using a SQL Server database. I have a SQL query which I have to write inside a stored procedure using SQL string and I am unable to write it.
The SQL query is
SELECT TOP (1000)
[OfficeNo], [CustNo], [SAPNo],
[Name1], [Name2],
[HomePhone], [OtherPhone], [FaxPhone], [cellPhone], [workPhone]
FROM
[dbo].[tblCustomers]
WHERE
OfficeNo = '1043'
AND (REPLACE(REPLACE(REPLACE(REPLACE(HomePhone,'(',''),' ',''),'-',''),')','') = '6147163987' )
OR (REPLACE(REPLACE(REPLACE(REPLACE(OtherPhone,'(',''),' ',''),'-',''),')','') = '6147163987'
OR (REPLACE(REPLACE(REPLACE(REPLACE(FaxPhone,'(',''),' ',''),'-',''),')','') = '6147163987'
OR (REPLACE(REPLACE(REPLACE(REPLACE(cellPhone,'(',''),' ',''),'-',''),')','') = '6147163987'
OR (REPLACE(REPLACE(REPLACE(REPLACE(workPhone,'(',''),' ',''),'-',''),')','') = '6147163987'))))
The above SQL query works, but I am unable to convert the above REPLACE statements inside a dynamic SQL string due to lot of single quotes and colons. And it is throwing errors.
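One way to sidestep most of the quoting trouble described above, sketched under assumptions: inside a dynamic SQL string every single quote is doubled, and the search values are passed as parameters through sp_executesql instead of being concatenated in. The variable names, the VARCHAR(10) type for OfficeNo, and the reduced column list are assumptions for illustration; only two of the five phone columns are shown.

```sql
-- Hypothetical variables; adjust types to match the real columns.
DECLARE @office VARCHAR(10) = '1043',
        @phone  VARCHAR(20) = '6147163987';

-- Each quote inside the string literal is doubled: '(' becomes ''('' and '' becomes ''''.
DECLARE @sql NVARCHAR(MAX) = N'
SELECT TOP (1000) OfficeNo, CustNo, SAPNo, Name1, Name2, HomePhone, cellPhone
FROM dbo.tblCustomers
WHERE OfficeNo = @office
  AND (   REPLACE(REPLACE(REPLACE(REPLACE(HomePhone, ''('', ''''), '' '', ''''), ''-'', ''''), '')'', '''') = @phone
       OR REPLACE(REPLACE(REPLACE(REPLACE(cellPhone, ''('', ''''), '' '', ''''), ''-'', ''''), '')'', '''') = @phone);';

EXEC sys.sp_executesql
     @sql,
     N'@office VARCHAR(10), @phone VARCHAR(20)',
     @office = @office,
     @phone  = @phone;
```

Printing @sql before executing it is a simple way to confirm that the doubled quotes collapse to the statement you intended.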
Here is another option. This is using an inline table valued function which is a whole lot better for performance than a scalar function. There are several ways this could work but I chose to pass in both the stored (or formatted) value in addition to the desired clean value. This lets us use cross apply to filter out those rows that don't match.
create function PhoneNumberCheck
(
@StoredValue varchar(20)
, @CleanValue varchar(20)
) returns table as return
select CleanValue = @CleanValue
where @CleanValue = REPLACE(REPLACE(REPLACE(REPLACE(@StoredValue, '(', ''),' ', ''), '-', ''), ')', '')
Then to use this function we simply need to call it for each column of phone number values. One thing I should mention is in your original query you have top 1000 but you do not have an order by. This means you have no way of ensuring which rows you get back. If you use top you almost always need to include an order by.
SELECT TOP (1000) [OfficeNo]
,[CustNo]
,[SAPNo]
,[Name1]
,[Name2]
,[HomePhone]
,[OtherPhone]
,[FaxPhone]
,[cellPhone]
,[workPhone]
FROM [dbo].[tblCustomers] c
cross apply dbo.PhoneNumberCheck(HomePhone, '6147163987') hp
cross apply dbo.PhoneNumberCheck(OtherPhone, '6147163987') op
cross apply dbo.PhoneNumberCheck(FaxPhone, '6147163987') fp
cross apply dbo.PhoneNumberCheck(cellPhone, '6147163987') cp
cross apply dbo.PhoneNumberCheck(workPhone, '6147163987') wp
where OfficeNo = '1043'
--order by ???
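One caveat, as I read the function above: chaining five CROSS APPLY calls keeps a row only when every phone column matches, whereas the original query used OR. A sketch of one way to keep the OR semantics is to switch to OUTER APPLY and test whether any call returned a row. This is a suggested variation, not part of the answer as posted, and the ORDER BY column is an assumption.

```sql
SELECT TOP (1000)
       c.OfficeNo, c.CustNo, c.SAPNo, c.Name1, c.Name2,
       c.HomePhone, c.OtherPhone, c.FaxPhone, c.cellPhone, c.workPhone
FROM dbo.tblCustomers AS c
-- OUTER APPLY keeps the customer row and yields NULL when a column does not match.
OUTER APPLY dbo.PhoneNumberCheck(c.HomePhone,  '6147163987') AS hp
OUTER APPLY dbo.PhoneNumberCheck(c.OtherPhone, '6147163987') AS op
OUTER APPLY dbo.PhoneNumberCheck(c.FaxPhone,   '6147163987') AS fp
OUTER APPLY dbo.PhoneNumberCheck(c.cellPhone,  '6147163987') AS cp
OUTER APPLY dbo.PhoneNumberCheck(c.workPhone,  '6147163987') AS wp
WHERE c.OfficeNo = '1043'
  AND (   hp.CleanValue IS NOT NULL
       OR op.CleanValue IS NOT NULL
       OR fp.CleanValue IS NOT NULL
       OR cp.CleanValue IS NOT NULL
       OR wp.CleanValue IS NOT NULL)
ORDER BY c.CustNo;  -- assumed ordering column so TOP is deterministic
```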
Depending on what version of SQL server you are using there are better ways to do this now, but here is a function I have to clean phones for 2012 and earlier.
Create FUNCTION [dbo].[fn_CleanPhone] (
@phone VARCHAR(20))
RETURNS VARCHAR(10)
AS
BEGIN
RETURN CASE WHEN ISNUMERIC(LEFT(NULLIF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@phone,
'`', 1), '{', ''), '}', ''),'_', ''), ' ', ''), '-', ''), '.', ''), '(', ''), ')', ''), '/', ''), ''), 10)) = 1
THEN LEFT(NULLIF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@phone,
'`', 1), '{', ''), '}', ''), '_', ''), ' ', ''), '-', ''), '.', ''), '(', ''), ')', ''), '/', ''), ''), 10)
ELSE NULL
END
END
Then call the function like this in place of the nested REPLACE calls, since the NULLIF, ISNUMERIC, and LEFT handling all happens inside the function above:
SELECT dbo.fn_CleanPhone('1234-22(23)')
So this goes in your WHERE clause:
where OfficeNo = '1043'
AND (
dbo.fn_CleanPhone(HomePhone) = '6147163987'
OR dbo.fn_CleanPhone(OtherPhone) = '6147163987'
OR dbo.fn_CleanPhone(FaxPhone) = '6147163987'
OR dbo.fn_CleanPhone(cellPhone) = '6147163987'
OR dbo.fn_CleanPhone(workPhone) = '6147163987'
) -- end for and
Create a function to return only the digits of the input:
CREATE FUNCTION dbo.GetCleaned(@INPUT VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @INDEX INT = 1 -- SUBSTRING positions are 1-based
DECLARE @CLEANED VARCHAR(MAX)=''
WHILE(@INDEX<=LEN(@INPUT)) -- <= so the last character is not skipped
BEGIN
IF(SUBSTRING(@INPUT,@INDEX,1) LIKE '[0-9]') -- ISNUMERIC would also accept characters such as '-' and '.'
SET @CLEANED = @CLEANED + SUBSTRING(@INPUT,@INDEX,1)
SET @INDEX = @INDEX + 1
END
RETURN @CLEANED
END
SELECT TOP (1000) [OfficeNo]
,[CustNo]
,[SAPNo]
,[Name1]
,[Name2]
,[HomePhone]
,[OtherPhone]
,[FaxPhone]
,[cellPhone]
,[workPhone]
FROM [dbo].[tblCustomers]
where OfficeNo = '1043' and
(dbo.GetCleaned(HomePhone) = '6147163987'
or dbo.GetCleaned(OtherPhone) = '6147163987'
or dbo.GetCleaned(FaxPhone) = '6147163987'
or dbo.GetCleaned(cellPhone) = '6147163987'
or dbo.GetCleaned(workPhone) = '6147163987')
When you mix OR and AND conditions in your WHERE clause, you should wrap the OR conditions in parentheses.
Perhaps the easiest way is to let TSQL do the loop without the performance hit when using a Row-By-Row query.
I have created a little test query hoping it will be easier for you to implement it in your case.
declare @table table
(
id int
,PhoneNr nvarchar(18)
)
insert into @table
values(1,'(123) 4567')
,(2,'123, 4567')
,(3,'123 4567')
,(4,'123 - 4567');
;with t1 as
(
select PhoneNr, id from @table
union all
select cast(replace(PhoneNr, substring(PhoneNr, PatIndex('%[^a-z0-9]%', PhoneNr), 1), '') as nvarchar(18)), id
from t1
where PatIndex('%[^a-z0-9]%', PhoneNr) > 0
)
select t1.PhoneNr from t1
where PatIndex('%[^a-z0-9]%', t1.PhoneNr) = 0
option (maxrecursion 0)
I would:
Replace the values entered in my test table with your test cases
Run the query and alter the pattern if needed
Integrate it into your table structure and cast the phone column to int; if there is no error, you will have achieved your transformation.
When you have your setup working, compare the execution plans and pick your winner ;-)

How to convert this string into 10 different nodes in SQL Server 2012?

My string is:
[10, 1],[7, 3],[15, 4],[10, 1],[14, 1]
How do I convert it into 10 different nodes/values? My current attempt looks like this:
select CAST('<A>'+REPLACE(REPLACE( REPLACE(REPLACE('[10, 1],[7, 3],[15, 4],[10, 1],[14, 1]', '[', ''), ']', ''),',',''),' ','</A><A>')+'</A>' AS XML) AS Data
Result:
<A>10</A><A>17</A><A>315</A><A>410</A><A>114</A><A>1</A>
I want 10 nodes/values instead of the above. How should I do it in SQL Server 2012?
This is too long for comments
select REPLACE(REPLACE(REPLACE(@data, '],[', ''), '[', ''), ']', '')
Result :
10, 17, 315, 410, 114, 1
EDIT :
It seems you are looking for the values only
select LTRIM(REPLACE(REPLACE(a.value('.', 'VARCHAR(30)'), '[', ''), ']', '')) [Data] from
(
select CAST('<A>'+REPLACE('[10, 1],[7, 3],[15, 4],[10, 1],[14, 1]', ',', '</A><A>')+'</A>' AS xml) AS Data
)a cross apply Data.nodes ('/A') as split(a)
Result :
Data
10
1
7
3
15
4
10
1
14
1
Already provided answers seem to work well, but I thought about a more versatile one (may work in more complex scenarios) using regular expressions:
Install sql-server-regex (e.g. for Sql Server 2014)
Use a "split" method
select Match from dbo.RegexSplit(@data, '\D') where Match <> ''
Performance testing
I noticed that using CLR functions is much faster than REPLACE as indicated below:
Using RegexSplit (about 20s for 1M elements)
declare @baseMsg varchar(max) = '[10, 1],[7, 3],[15, 4],[10, 1],[14, 1],'
declare @data varchar(max) = replicate(@baseMsg, 1000000)
select Match from dbo.RegexSplit(@data, '\D') where Match <> ''
Using REPLACE (about 15s for 2K elements)
declare @baseMsg varchar(max) = '[10, 1],[7, 3],[15, 4],[10, 1],[14, 1],'
declare @data varchar(max) = replicate(@baseMsg, 200)
select LTRIM(REPLACE(REPLACE(a.value('.', 'VARCHAR(30)'), '[', ''), ']', '')) [Data] from
(
select CAST('<A>'+REPLACE(@data, ',', '</A><A>')+'</A>' AS xml) AS Data
)a cross apply Data.nodes ('/A') as split(a)
So, we are talking about a difference of three orders of magnitude.
Of course, the solution should be chosen based on string length and security permissions (SQLCLR may not be allowed, or the external library may need to be analyzed before it is allowed to run within SQL Server).
I found the answer, SQL should be as below :
select CAST('<A>'+REPLACE(REPLACE( REPLACE(
REPLACE('[10, 1],[7, 3],[15, 4],[10, 1],[14, 1]', '[', ''),
']', ' '),
',',''),
' ','</A><A>') +'</A>' AS XML) AS Data