I have a column with datatype nvarchar and I want to sort it in ascending order. How do I achieve it in SSRS?

This is what I'm getting
abc 1
abc 12
abc 15
abc 2
abc 3
And this is how I want
abc 1
abc 2
abc 3
abc 12
abc 15
Query that I use:
select *
from view_abc
order by col1

Use one function to strip out the numeric characters and leave just the text, and another to strip out the non-numeric characters and leave just the value. You can then sort on the two returned values.
It seems like a bit of work at first, but once the functions are in place you can reuse them in the future. Here are two functions I use regularly when we get data in from external sources and it's not very normalised.
They may not be the most efficient functions in the world, but they work for my purposes.
First, a function that leaves just the alphabetic portion.
CREATE FUNCTION [fn].[StripToAlpha]
(
@inputString nvarchar(4000)
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE @Counter as int
DECLARE @strReturnVal varchar(4000)
DECLARE @Len as int
DECLARE @ASCII as int
SET @Counter=0
SET @Len=LEN(@inputString)
SET @strReturnVal = ''
-- Walk the string one character at a time
WHILE @Counter<@Len
BEGIN
SET @Counter = @Counter +1
SET @ASCII= ASCII(SUBSTRING(@inputString,@Counter,1))
-- Keep A-Z and a-z only
IF(@ASCII BETWEEN 65 AND 90) OR (@ASCII BETWEEN 97 AND 122)
BEGIN
SET @strReturnVal = @strReturnVal + (SUBSTRING(@inputString,@Counter,1))
END
END
RETURN @strReturnVal
END
Second, a function to extract the value from a text field. It also handles percentages (e.g. abc 23% comes out as 0.23), but that is not required in your case.
You'll need to create an 'fn' schema, or change the schema name, first...
CREATE FUNCTION [fn].[ConvertToValue]
(
@inputString nvarchar(4000)
)
RETURNS Float
AS
BEGIN
DECLARE @Counter as int
DECLARE @strReturnVal varchar(4000)
DECLARE @ReturnVal Float
DECLARE @Len as int
DECLARE @ASCII as int
SET @Counter=0
SET @Len=LEN(@inputString)
SET @strReturnVal = ''
IF @inputString IS NULL
BEGIN
Return NULL
END
IF @Len = 0 OR LEN(LTRIM(RTRIM(@inputString))) = 0
BEGIN
SET @ReturnVal=0
END
ELSE
BEGIN
WHILE @Counter<@Len
BEGIN
SET @Counter = @Counter +1
SET @ASCII= ASCII(SUBSTRING(@inputString,@Counter,1))
-- Keep digits, the decimal point (46) and the percent sign (37)
IF(@ASCII BETWEEN 48 AND 57) OR (@ASCII IN (46,37))
BEGIN
SET @strReturnVal = @strReturnVal + (SUBSTRING(@inputString,@Counter,1))
END
END
if RIGHT(@strReturnVal,1)='%'
BEGIN
SET @strReturnVal = LEFT(@strReturnVal,len(@strReturnVal)-1)
SET @strReturnVal = CAST((CAST(@strReturnVal AS FLOAT)/100) AS nvarchar(4000))
END
SET @ReturnVal = ISNULL(@strReturnVal,0)
END
RETURN @ReturnVal
END
Now we have the two functions created you can simply do
SELECT *
FROM view_abc
ORDER BY fn.StripToAlpha(Col1), fn.ConvertToValue(Col1)
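To check the expected ordering outside the database, the two-key idea can be sketched in Python. This is only an illustration under assumptions: strip_to_alpha and convert_to_value are hypothetical stand-ins for the two T-SQL functions, and decimal points and percentages are left out.

```python
# Hypothetical Python stand-ins for fn.StripToAlpha and fn.ConvertToValue.
def strip_to_alpha(text):
    """Keep only ASCII letters."""
    return "".join(ch for ch in text if ch.isascii() and ch.isalpha())


def convert_to_value(text):
    """Keep only ASCII digits and read them as a whole number (0 if none)."""
    kept = "".join(ch for ch in text if ch.isascii() and ch.isdigit())
    return int(kept) if kept else 0


def natural_sort(values):
    """Sort on the text part first, then on the numeric part."""
    return sorted(values, key=lambda v: (strip_to_alpha(v), convert_to_value(v)))


rows = ["abc 1", "abc 12", "abc 15", "abc 2", "abc 3"]
print(natural_sort(rows))
```

Sorting on a (text, number) pair is what the ORDER BY with two function calls does: ties on the text part are broken by the numeric part.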

Try this:
SELECT CAST(SUBSTRING(ColumnNameToOrder, CHARINDEX(' ', ColumnNameToOrder, 0), LEN(ColumnNameToOrder)) AS INT) AS IntColumn,
SUBSTRING(ColumnNameToOrder, 0, CHARINDEX(' ', ColumnNameToOrder, 0)) AS CharColumn,
*
FROM view_abc
ORDER BY CharColumn, IntColumn
Instead of ColumnNameToOrder, you can put your column name which contains the data like 'abc 123'...
Tell me if it works please.

This is what I have come up with. Maybe it can help you, or at least point you in the right direction.
I tested the values. When the numbers have a leading zero, they sort in the order you want. Like this:
abc 01
abc 02
abc 03
abc 12
abc 15
So you can run this query to update the existing values and add the leading zero.
UPDATE abc
SET col1 = 'abc 0' + SUBSTRING(col1, 5, 1)
WHERE LEN(col1) = 5
Or, if the first three characters can vary, you can write the query like this:
UPDATE abc
SET col1 = (SUBSTRING(col1, 1, 3) + ' 0' + SUBSTRING(col1, 5, 1))
WHERE col1 LIKE '___ [0-9]'
This overwrites the existing value in the col1 column only when the current string is five characters long: three characters, a space and a single digit.
Then you can run the following query to get the results:
SELECT col1
FROM abc
ORDER BY col1 ASC
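The padding idea can be generalised so that it does not depend on the value being exactly five characters long. A minimal Python sketch, assuming every value is a text part, a space, and a whole number; the name pad_number and the width of 5 are illustrative choices, not part of the answer above:

```python
def pad_number(value, width=5):
    """Left-pad the number after the last space with zeros to a fixed width."""
    text, _, number = value.rpartition(" ")
    return f"{text} {number.zfill(width)}"


rows = ["abc 1", "abc 12", "abc 15", "abc 2", "abc 3"]
# Sorting on the padded form gives numeric order; the original values are unchanged.
print(sorted(rows, key=pad_number))
```

Padding only the sort key, rather than updating the stored data, keeps the displayed values as they were.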

=CInt(Right(Fields!Col1.Value, Len(Fields!Col1.Value) - InStrRev(Fields!Col1.Value, " ")))
This works in SSRS and sorts correctly, but only if Col1 always contains a space and the characters after the last space can be converted to an integer.

Related

in SQL, how can I find duplicate string values within the same record?

Sample table
Record Number | Filter | Filters_Applied
----------------------------------------------
1 | yes | red, blue
2 | yes | green
3 | no |
4 | yes | red, red, blue
Is it possible to query all records where there are duplicate string values? For example, how could I query to pull record 4 where the string "red" appeared twice? Except in the table that I am dealing with, there are far more string values that can populate in the "filters_applied" column.
CLARIFICATION I am working out of Periscope and pulling data using SQL.
I assume you will have to check this in your application logic.
You can query the table with like '%red%'.
select Filters_Applied from table where Filters_Applied like '%red%';
This returns the rows that contain red at least once. You can then do some string analysis in the application logic.
In PHP, you can use the substr_count function to determine the number of occurrences of the string.
// loop over the rows returned by the query
// ($result is assumed to be a mysqli result object)
while ($row = $result->fetch_assoc()) {
$number = substr_count($row['Filters_Applied'], 'red');
if ($number > 1) {
echo "this " . $row['Filters_Applied'] . " > 1";
}
}
For SQL Server, or other DBMSs that provide these functions, apply this logic:
declare @val varchar(100) = 'yellow,blue,white,green'
DECLARE @find varchar(100) = 'green'
select @val = replace(@val,' ','') -- remove spaces
select @val;
select (len(@val)-len(replace(@val,@find,'')))/len(@find) [recurrence]
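The length arithmetic above counts occurrences by measuring how much shorter the string gets once the search term is removed. A small Python sketch of the same calculation, using the sample values from the question (count_occurrences is a hypothetical name):

```python
def count_occurrences(value, find):
    """Count how often `find` occurs, using the same length arithmetic."""
    value = value.replace(" ", "")      # remove spaces, as in the T-SQL
    removed = value.replace(find, "")   # drop every occurrence of the term
    return (len(value) - len(removed)) // len(find)


print(count_occurrences("red, red, blue", "red"))             # 2
print(count_occurrences("yellow,blue,white,green", "green"))  # 1
```

Note that this counts substrings, so a term such as red would also be counted inside a longer value such as darkred.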
Create this function, which parses a string into rows, and write the query as shown below. This works for SQL Server.
CREATE FUNCTION [dbo].[StrParse]
(@delimiter CHAR(1),
@csv NTEXT)
RETURNS @tbl TABLE(Keys NVARCHAR(255))
AS
BEGIN
DECLARE @len INT
SET @len = Datalength(@csv)
IF NOT @len > 0
RETURN
DECLARE @l INT
DECLARE @m INT
SET @l = 0
SET @m = 0
DECLARE @s VARCHAR(255)
DECLARE @slen INT
WHILE @l <= @len
BEGIN
SET @l = @m + 1--current position
SET @m = Charindex(@delimiter,Substring(@csv,@l + 1,255))--next delimiter or 0
IF @m <> 0
SET @m = @m + @l
--insert @tbl(keys) values(@m)
SELECT @slen = CASE
WHEN @m = 0 THEN 255 --returns the remainder of the string
ELSE @m - @l
END --returns number of characters up to next delimiter
IF @slen > 0
BEGIN
SET @s = Substring(@csv,@l,@slen)
INSERT INTO @tbl
(Keys)
SELECT @s
END
SELECT @l = CASE
WHEN @m = 0 THEN @len + 1 --breaks the loop
ELSE @m + 1
END --sets current position to 1 after next delimiter
END
RETURN
END
GO
CREATE TABLE Table1# (RecordNumber int, [Filter] varchar(5), Filters_Applied varchar(100))
GO
INSERT INTO Table1# VALUES
(1,'yes','red, blue')
,(2,'yes','green')
,(3,'no ','')
,(4,'yes','red, red, blue')
GO
--This query will return what you are expecting
SELECT t.RecordNumber,[Filter],Filters_Applied,ltrim(rtrim(keys)), count(*)NumberOfRows
FROM Table1# t
CROSS APPLY dbo.StrParse (',', t.Filters_Applied)
GROUP BY t.RecordNumber,[Filter],Filters_Applied,ltrim(rtrim(keys)) HAVING count(*) >1
You didn't state your DBMS, but in Postgres this isn't that complicated:
select st.*
from sample_table st
join lateral (
select count(*) <> count(distinct trim(item)) as has_duplicates
from unnest(string_to_array(filters_applied,',')) as t(item)
) x on true
where x.has_duplicates;
Online example: http://rextester.com/TJUGJ44586
With the exception of string_to_array() the above is actually standard SQL
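The test used in the Postgres query (total item count differs from distinct item count) can be tried on the sample rows with a short Python sketch; has_duplicates is a hypothetical helper name:

```python
def has_duplicates(filters_applied):
    """True when the comma-separated list contains the same item twice."""
    items = [item.strip() for item in filters_applied.split(",")]
    # Mirrors count(*) <> count(distinct trim(item))
    return len(items) != len(set(items))


rows = {1: "red, blue", 2: "green", 3: "", 4: "red, red, blue"}
print([record for record, filters in rows.items() if has_duplicates(filters)])
```

Only record 4 is reported, because it is the only row where an item appears more than once after trimming.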

Return all words starting with a character in a column

I have a VARCHAR column with data like this:
abc = :abc and this = :that
I need a query to find all of the special "words" that start with a colon in this column of data. I don't really need any other data (IDs or otherwise) and duplicates would be OK. I can remove duplicates in Excel later if need be. So if this was the only row, I'd like something like this as the output:
SpecialWords
:abc
:that
I'm thinking it'll require a CHARINDEX or something like that. But since there could be more than one special word in the column, I can't just find the first : and strip out the rest.
Any help is greatly appreciated! Thanks in advance!
You have to split this value on spaces and return only the fields that start with a colon (:). I have provided two solutions, depending on the result type you need (table or single value).
Table-Valued Function
You can create a TV function to split this column into a table:
CREATE FUNCTION [dbo].[GETVALUES]
(
@DelimitedString varchar(8000)
)
RETURNS @tblArray TABLE
(
ElementID int IDENTITY(1,1), -- Array index
Element varchar(1000) -- Array element contents
)
AS
BEGIN
-- Local Variable Declarations
-- ---------------------------
DECLARE @Index smallint,
@Start smallint,
@DelSize smallint
SET @DelSize = 1
-- Loop through source string and add elements to destination table array
-- ----------------------------------------------------------------------
WHILE LEN(@DelimitedString) > 0
BEGIN
SET @Index = CHARINDEX(' ', @DelimitedString)
IF @Index = 0
BEGIN
IF ((LTRIM(RTRIM(@DelimitedString))) LIKE ':%')
INSERT INTO
@tblArray
(Element)
VALUES
(LTRIM(RTRIM(@DelimitedString)))
BREAK
END
ELSE
BEGIN
IF (LTRIM(RTRIM(SUBSTRING(@DelimitedString, 1,@Index - 1)))) LIKE ':%'
INSERT INTO
@tblArray
(Element)
VALUES
(LTRIM(RTRIM(SUBSTRING(@DelimitedString, 1,@Index - 1))))
SET @Start = @Index + @DelSize
SET @DelimitedString = SUBSTRING(@DelimitedString, @Start , LEN(@DelimitedString) - @Start + 1)
END
END
RETURN
END
And you can use it as follows:
DECLARE @SQLStr varchar(100)
SELECT @SQLStr = 'abc = :abc and this = :that and xyz = :asd'
SELECT
*
FROM
dbo.GETVALUES(@SQLStr)
Result:
Scalar-Valued Function
If you need to return a single value (not a table), you can use this function, which returns all values separated by a carriage return + line feed (CHAR(13) + CHAR(10)).
CREATE FUNCTION dbo.GetValues2
(
@DelimitedString varchar(8000)
)
RETURNS varchar(8000)
AS
BEGIN
DECLARE @Index smallint,
@Start smallint,
@DelSize smallint,
@Result varchar(8000)
SET @DelSize = 1
SET @Result = ''
WHILE LEN(@DelimitedString) > 0
BEGIN
SET @Index = CHARINDEX(' ', @DelimitedString)
IF @Index = 0
BEGIN
if (LTRIM(RTRIM(@DelimitedString))) LIKE ':%'
SET @Result = @Result + char(13) + char(10) + (LTRIM(RTRIM(@DelimitedString)))
BREAK
END
ELSE
BEGIN
IF (LTRIM(RTRIM(SUBSTRING(@DelimitedString, 1,@Index - 1)))) LIKE ':%'
SET @Result = @Result + char(13) + char(10) + (LTRIM(RTRIM(SUBSTRING(@DelimitedString, 1,@Index - 1))))
SET @Start = @Index + @DelSize
SET @DelimitedString = SUBSTRING(@DelimitedString, @Start , LEN(@DelimitedString) - @Start + 1)
END
END
return @Result
END
GO
You can use it as follows:
DECLARE @SQLStr varchar(100)
SELECT @SQLStr = 'abc = :abc and this = :that and xyz = :asd'
SELECT dbo.GetValues2(@SQLStr)
Result
In the grid results the line feeds are not visible; copy the data to an editor and each value will appear on its own line.
References
Splitting the string in sql server
One way is to write a specialized SPLIT function. I would suggest getting a TSQL Split function off the internet and see if you can adapt the code to your needs.
Working from scratch, you could write a function that loops over the column value using CHARINDEX until it doesn't find any more : characters.
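The loop described above can be sketched in Python, with str.find playing the role of CHARINDEX. This is a minimal illustration, assuming a word ends at the next space or at the end of the string; colon_words is a hypothetical name.

```python
def colon_words(text):
    """Return every run of characters that starts at a ':' and ends at a space."""
    words = []
    start = text.find(":")
    while start != -1:
        end = text.find(" ", start)      # the word ends at the next space
        if end == -1:
            end = len(text)              # or at the end of the string
        words.append(text[start:end])
        start = text.find(":", end)      # look for the next colon
    return words


print(colon_words("abc = :abc and this = :that"))
```

The loop stops as soon as no further colon is found, so a value without any colon simply yields an empty list.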
How about using a charindex?
rextester sample:
create table mytable (testcolumn varchar(20))
insert into mytable values ('this = :that'),('yes'), (':no'), ('abc = :abc')
select right(testcolumn, charindex(':', reverse(testcolumn)) - 1) from mytable
where testcolumn like '%:%'
reference:
SQL Select everything after character
Update
Addressing Sami's comment:
I didn't see that there could be more than one word after the colon. How about this?
select replace(substring(testcolumn, charindex(':', testcolumn), len(testcolumn)), ':', '')
from mytable
where testcolumn like '%:%'
Update again
I see, the actual statement is this = :that and that = :this
If performance is important then you want to use an inline table valued function to split the string and extract what you need. You could use delimitedSplit8K or delimitedSplit8K_lead for this.
declare @string varchar(8000) = 'abc = :abc and this = :that';
select item
from dbo.DelimitedSplit8K(@string, ' ')
where item like ':%';
returns:
item
------
:abc
:that
And for even better performance than what I posted above you could use ngrams8k like so:
declare @string varchar(8000) = 'abc = :abc and this = :that';
select position, item =
substring(@string, position,
isnull(nullif(charindex(' ',@string,position+1),0),8000)-position)
from dbo.ngrams8k(@string, 1)
where token = ':';
This even gives you the location of the item you are searching for:
position item
---------- -------
7 :abc
23 :that

How to update values using case statement

I have created update statement like below
UPDATE dbo.S_Item
SET SalePrice3 = CASE WHEN Price <0 THEN '-1'
when Price=1 then 11
when Price=2 then 22
when Price=3 then 33
when Price=4 then 44
when Price=5 then 55
when Price=6 then 66
when Price=7 then 77
when Price=8 then 88
when Price=9 then 99
when Price=0 then 00
end
But I want to update more values using the above statement. For example, if price=123 it should be updated to 112233, if price=456 to 445566, and if price=725 to 772255. How can I achieve this?
Create Function ReplicateDigits (
@Number Int)
Returns BigInt
Begin
Declare @Step SmallInt = 1,
@Result nVaRchar(100) = N''
While (@Step <= Len(@Number))
Begin
Select @Result = @Result + Replicate(SubString(Cast(@Number As Varchar), @Step, 1), 2)
Select @Step = @Step + 1
End
Return Cast(@Result As BigInt)
End
Go
Then:
UPDATE dbo.S_Item
SET SalePrice3 = CASE
WHEN Price <0 THEN '-1'
Else dbo.ReplicateDigits(Price)
End
Let me know if it was useful.
If the point is just in duplication of every digit, here's another implementation of the duplication method:
CREATE FUNCTION dbo.DuplicateDigits(@Input int)
RETURNS varchar(20)
AS
BEGIN
DECLARE @Result varchar(20) = CAST(@Input AS varchar(20));
DECLARE @Pos int = LEN(@Result);
WHILE @Pos > 0
BEGIN
SET @Result = STUFF(@Result, @Pos, 0, SUBSTRING(@Result, @Pos, 1));
SET @Pos -= 1;
END;
RETURN @Result;
END;
The method consists of iterating through the digits backwards, extracting each one using SUBSTRING and duplicating it using STUFF.
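The backwards walk can be traced in a few lines of Python. This is only a sketch of the same idea, with string slicing standing in for SUBSTRING and STUFF; duplicate_digits is a hypothetical name and the input is assumed to be a non-negative whole number.

```python
def duplicate_digits(number):
    """Double every digit, working from the last digit to the first."""
    result = str(number)
    pos = len(result)                    # 1-based position, as in T-SQL
    while pos > 0:
        digit = result[pos - 1]
        # Insert a copy of the digit in front of position `pos`
        result = result[:pos - 1] + digit + result[pos - 1:]
        pos -= 1
    return result


print(duplicate_digits(123))  # 112233
print(duplicate_digits(725))  # 772255
```

Working backwards means the positions of the digits that have not been visited yet never shift, so no index adjustment is needed after each insertion.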
And you would be using this function same as in Meysam Tolouee's answer:
UPDATE dbo.S_Item
SET SalePrice3 = CASE
WHEN Price < 0 THEN '-1'
ELSE dbo.DuplicateDigits(Price)
END;
To explain a little why the function's returned type is varchar, it is because that guarantees that the function returns the result no matter what the input's [reasonable] length is. The maximum length of 20 has been chosen merely because the input is [assumed to be] int and positive int values consist of up to 10 digits.
However, whether varchar(20) converts to the type of SalePrice3 is another matter, which should be considered separately.
You must create a procedure to achieve the desired result rather than use a single query.

Is it possible to compare rows for similar data in SQL server

Is it possible to compare rows for similar data in SQL Server? I have a company name column in a table where company names could be somewhat similar. Here is an example of the different 8 values that represent the same 4 companies:
ANDORRA WOODS
ANDORRA WOODS HEALTHCARE CENTER
ABC HEALTHCARE, JOB #31181
ABC HEALTHCARE, JOB #31251
ACTION SERVICE SALES, A SUBSIDIARY OF SINGER EQUIPMENT
ACTION SERVICE SALES, A SUBSIDIARY OF SINGER EQUIPMENT COMPANY
APEX SYSTEMS
APEX SYSTEMS, INC
The way I clean it right now is using Google refine where I can identify clusters of similar data values and make them all as one.
Using this example I only need 4 names not 8 so I need to replace similar ones with only one since I will be assigning indexes to those names later on. Any help is greatly appreciated.
I have a few UDFs I converted from some VB code some time ago that take in two varchar() values and return an int between 0 and 100 (0 = not similar, 100 = same), if you're interested.
-- Description: Removes any special characters from a string
CREATE FUNCTION [dbo].[SimReplaceSpecial]
(
-- Add the parameters for the function here
@String varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
-- Declare the return variable here
DECLARE @Result varchar(max) = ''
-- Add the T-SQL statements to compute the return value here
DECLARE @Pos int = 1
DECLARE @Asc int
DECLARE @WorkingString varchar(max)
SET @WorkingString = upper(@String)
WHILE @Pos <= LEN(@WorkingString)
BEGIN
SET @Asc = ascii(substring(@WorkingString,@Pos,1))
If (@Asc >= 48 And @Asc <= 57) Or (@Asc >= 65 And @Asc <= 90)
SET @Result = @Result + Char(@Asc)
SET @Pos = @Pos + 1
--IF @Pos + 1 > len(@String)
-- BREAK
--ELSE
-- CONTINUE
END
-- Return the result of the function
RETURN @Result
END
-- Description: DO NOT CALL DIRECTLY - Used by the Similar function
-- Finds longest common substring (other than single
-- characters) in String1 and String2, then recursively
-- finds longest common substring in left-hand
-- portion and right-hand portion. Updates the
-- cumulative score.
CREATE FUNCTION [dbo].[SimFindCommon]
(
-- Add the parameters for the function here
@String1 varchar(max),
@String2 varchar(max),
@Score int
)
RETURNS int
AS
BEGIN
-- Declare the return variable here
--DECLARE @Result int
DECLARE @Longest Int = 0
DECLARE @StartPos1 Int = 0
DECLARE @StartPos2 Int = 0
DECLARE @J Int = 0
DECLARE @HoldStr varchar(max)
DECLARE @TestStr varchar(max)
DECLARE @LeftStr1 varchar(max) = ''
DECLARE @LeftStr2 varchar(max) = ''
DECLARE @RightStr1 varchar(max) = ''
DECLARE @RightStr2 varchar(max) = ''
-- Add the T-SQL statements to compute the return value here
SET @HoldStr = @String2
WHILE LEN(@HoldStr) > @Longest
BEGIN
SET @TestStr = @HoldStr
WHILE LEN(@TestStr) > 1
BEGIN
SET @J = CHARINDEX(@TestStr,@String1)
If @J > 0
BEGIN
--Test string is sub-set of the other string
If Len(@TestStr) > @Longest
BEGIN
--Test string is longer than previous
--longest. Store its length and position.
SET @Longest = Len(@TestStr)
SET @StartPos1 = @J
SET @StartPos2 = CHARINDEX(@TestStr,@String2)
END
--No point in going further with this string
BREAK
END
ELSE
--Test string is not a sub-set of the other
--string. Discard final character of test
--string and try again.
SET @TestStr = Left(@TestStr, LEN(@TestStr) - 1)
END
--Now discard first char of test string and
--repeat the process.
SET @HoldStr = Right(@HoldStr, LEN(@HoldStr) - 1)
END
--Update the cumulative score with the length of
--the common sub-string.
SET @Score = @Score + @Longest
--We now have the longest common sub-string, so we
--can isolate the sub-strings to the left and right
--of it.
If @StartPos1 > 3 And @StartPos2 > 3
BEGIN
SET @LeftStr1 = Left(@String1, @StartPos1 - 1)
SET @LeftStr2 = Left(@String2, @StartPos2 - 1)
If RTRIM(LTRIM(@LeftStr1)) <> '' And RTRIM(LTRIM(@LeftStr2)) <> ''
BEGIN
--Get longest common substring from left strings
SET @Score = dbo.SimFindCommon(@LeftStr1, @LeftStr2,@Score)
END
END
ELSE
BEGIN
SET @LeftStr1 = ''
SET @LeftStr2 = ''
END
If @Longest > 0
BEGIN
SET @RightStr1 = substring(@String1, @StartPos1 + @Longest, LEN(@String1))
SET @RightStr2 = substring(@String2, @StartPos2 + @Longest, LEN(@String2))
If RTRIM(LTRIM(@RightStr1)) <> '' And RTRIM(LTRIM(@RightStr2)) <> ''
BEGIN
--Get longest common substring from right strings
SET @Score = dbo.SimFindCommon(@RightStr1, @RightStr2,@Score)
END
END
ELSE
BEGIN
SET @RightStr1 = ''
SET @RightStr2 = ''
END
-- Return the result of the function
RETURN @Score
END
-- Description: Compares two not-empty strings regardless of case.
-- Returns a numeric indication of their similarity
-- (0 = not at all similar, 100 = identical)
CREATE FUNCTION [dbo].[Similar]
(
-- Add the parameters for the function here
@String1 varchar(max),
@String2 varchar(max)
)
RETURNS int
AS
BEGIN
-- Declare the return variable here
DECLARE @Result int
DECLARE @WorkingString1 varchar(max)
DECLARE @WorkingString2 varchar(max)
-- Add the T-SQL statements to compute the return value here
if isnull(@String1,'') = '' or isnull(@String2,'') = ''
SET @Result = 0
ELSE
BEGIN
--Convert each string to simplest form (letters
--and digits only, all upper case)
SET @WorkingString1 = dbo.SimReplaceSpecial(@String1)
SET @WorkingString2 = dbo.SimReplaceSpecial(@String2)
If RTRIM(LTRIM(@WorkingString1)) = '' Or RTRIM(LTRIM(@WorkingString2)) = ''
BEGIN
--One or both of the strings is now empty
SET @Result = 0
END
ELSE
BEGIN
If @WorkingString1 = @WorkingString2
BEGIN
--Strings are identical
SET @Result = 100
END
ELSE
BEGIN
--Find all common sub-strings
SET @Result = dbo.SimFindCommon(@WorkingString1, @WorkingString2,0)
--We now have the cumulative score. Return this
--as a percent of the maximum score. The maximum
--score is the average length of the two strings.
SET @Result = @Result * 200 / (Len(@WorkingString1) + Len(@WorkingString2))
END
END
END
-- Return the result of the function
RETURN @Result
END
--Usage--------------------------------------------------------------------
--Call the "Similar" Function only
SELECT dbo.Similar('ANDORRA WOODS','ANDORRA WOODS HEALTHCARE CENTER')
--Result = 60
SELECT dbo.Similar('ABC HEALTHCARE, JOB #31181','ABC HEALTHCARE, JOB #31251')
--Result = 85
SELECT dbo.Similar('ACTION SERVICE SALES, A SUBSIDIARY OF SINGER EQUIPMENT','ACTION SERVICE SALES, A SUBSIDIARY OF SINGER EQUIPMENT COMPANY')
--Result = 92
SELECT dbo.Similar('APEX SYSTEMS','APEX SYSTEMS, INC')
--Result = 88
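For comparison outside SQL Server, Python's standard difflib module offers a closely related measure: SequenceMatcher.ratio() is twice the number of matching characters divided by the combined length of the two strings, with matches found by repeatedly taking the longest common block. The sketch below is not a port of the UDFs above (it also counts single-character matches, has no position threshold, and rounds instead of truncating), so scores can differ. The names simplify and similar are assumptions modelled on SimReplaceSpecial and Similar.

```python
from difflib import SequenceMatcher


def simplify(text):
    """Upper-case the text and keep only ASCII letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isascii() and ch.isalnum())


def similar(first, second):
    """Return a 0-100 similarity score for two strings."""
    a, b = simplify(first or ""), simplify(second or "")
    if not a or not b:
        return 0
    return round(SequenceMatcher(None, a, b).ratio() * 100)


print(similar("ANDORRA WOODS", "ANDORRA WOODS HEALTHCARE CENTER"))  # 60
```

In the first example the shorter name (12 characters after clean-up) is wholly contained in the longer one (28 characters), so the score is 2 * 12 / 40 = 60 percent.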
SSIS/Data Tools has a Fuzzy Grouping transformation that is very helpful in situations like this. It doesn't actually group your data, rather it gives you similarity scores that you can use to determine when items should be grouped together.
Plenty of tutorials out there, here's one: The Fuzzy Grouping Transformation

T-SQL trim &nbsp (and other non-alphanumeric characters)

We have some input data that sometimes appears with &nbsp characters on the end.
The data comes in from the source system as varchar() and our attempts to cast as decimal fail b/c of these characters.
Ltrim and Rtrim don't remove the characters, so we're forced to do something like:
UPDATE myTable
SET myColumn = replace(myColumn,char(160),'')
WHERE charindex(char(160),myColumn) > 0
This works for the &nbsp, but is there a good way to do this for any non-alphanumeric (or in this case numeric) characters?
This will remove all non-alphanumeric characters (underscores are kept):
CREATE FUNCTION [dbo].[fnRemoveBadCharacter]
(
@BadString nvarchar(20)
)
RETURNS nvarchar(20)
AS
BEGIN
DECLARE @nPos INTEGER
SELECT @nPos = PATINDEX('%[^a-zA-Z0-9_]%', @BadString)
WHILE @nPos > 0
BEGIN
SELECT @BadString = STUFF(@BadString, @nPos, 1, '')
SELECT @nPos = PATINDEX('%[^a-zA-Z0-9_]%', @BadString)
END
RETURN @BadString
END
Use the function like:
UPDATE TableToUpdate
SET ColumnToUpdate = dbo.fnRemoveBadCharacter(ColumnToUpdate)
WHERE whatever
This page has a sample of how you can remove non-alphanumeric chars:
-- Put something like this into a user function:
DECLARE @cString VARCHAR(32)
DECLARE @nPos INTEGER
SELECT @cString = '90$%45623 *6%}~:#'
SELECT @nPos = PATINDEX('%[^0-9]%', @cString)
WHILE @nPos > 0
BEGIN
SELECT @cString = STUFF(@cString, @nPos, 1, '')
SELECT @nPos = PATINDEX('%[^0-9]%', @cString)
END
SELECT @cString
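The PATINDEX/STUFF loop removes one unwanted character per pass until none are left. The effect can be checked quickly with a Python sketch that uses the same character classes in a regular expression; keep_digits and keep_alphanumeric are hypothetical names.

```python
import re


def keep_digits(text):
    """Remove every character that is not 0-9."""
    return re.sub(r"[^0-9]", "", text)


def keep_alphanumeric(text):
    """Remove everything except ASCII letters, digits and underscores."""
    return re.sub(r"[^a-zA-Z0-9_]", "", text)


print(keep_digits("90$%45623 *6%}~:#"))   # 90456236
print(keep_alphanumeric("12.50\u00a0"))   # 1250
```

The second call shows a side effect to watch for when the target type is decimal: the non-breaking space is removed, but so is the decimal point, so the allowed set would need to include it.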
How is the table being populated? While it is possible to scrub this in sql a better approach would be to change the column type to int and scrub the data before it's loaded into the database (SSIS). Is this an option?
For large datasets I have had better luck with this function, which checks the ASCII value. I have added options to keep only alpha, only numeric, or alphanumeric characters based on the parameters.
--CleanType 1 - Keep alphanumeric characters
-- 2 - Keep only alpha characters
-- 3 - Keep only numeric characters
CREATE FUNCTION [dbo].[fnCleanString] (
@InputString varchar(8000)
, @CleanType int
, @LeaveSpaces bit
) RETURNS varchar(8000)
AS
BEGIN
-- // Declare variables
-- ===========================================================
DECLARE @Length int
, @CurLength int = 1
, @ReturnString varchar(8000)=''
SELECT @Length = len(@InputString)
-- // Begin looping through each char checking ASCII value
-- ===========================================================
WHILE (@CurLength <= (@Length+1))
BEGIN
IF (ASCII(SUBSTRING(@InputString,@CurLength,1)) between 48 and 57 AND @CleanType in (1,3) )
or (ASCII(SUBSTRING(@InputString,@CurLength,1)) between 65 and 90 AND @CleanType in (1,2) )
or (ASCII(SUBSTRING(@InputString,@CurLength,1)) between 97 and 122 AND @CleanType in (1,2) )
or (ASCII(SUBSTRING(@InputString,@CurLength,1)) = 32 AND @LeaveSpaces = 1 )
BEGIN
SET @ReturnString = @ReturnString + SUBSTRING(@InputString,@CurLength,1)
END
SET @CurLength = @CurLength + 1
END
RETURN @ReturnString
END
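The three clean types and the space flag are easier to see side by side in a small Python sketch of the same character test. It is an illustration only: clean_string is a hypothetical name and the ASCII ranges are taken from the function above.

```python
def clean_string(text, clean_type, leave_spaces):
    """clean_type 1 keeps letters and digits, 2 keeps letters, 3 keeps digits."""
    kept = []
    for ch in text:
        code = ord(ch)
        is_digit = 48 <= code <= 57
        is_alpha = 65 <= code <= 90 or 97 <= code <= 122
        if (
            (is_digit and clean_type in (1, 3))
            or (is_alpha and clean_type in (1, 2))
            or (ch == " " and leave_spaces)
        ):
            kept.append(ch)
    return "".join(kept)


print(clean_string("ab 12-cd", 1, True))   # ab 12cd
print(clean_string("ab 12-cd", 2, False))  # abcd
print(clean_string("ab 12-cd", 3, False))  # 12
```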
If the mobile number could start with a plus (+), I would use a function like this:
CREATE FUNCTION [dbo].[Mobile_NoAlpha](@Mobile VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @StartsWithPlus BIT = 0
--check if the mobile starts with a plus(+)
IF LEFT(@Mobile, 1) = '+'
BEGIN
SET @StartsWithPlus = 1
--Take out the plus before using the pattern to eliminate invalid characters
SET @Mobile = RIGHT(@Mobile, LEN(@Mobile)-1)
END
WHILE PatIndex('%[^0-9]%', @Mobile) > 0
SET @Mobile = Stuff(@Mobile, PatIndex('%[^0-9]%', @Mobile), 1, '')
IF @StartsWithPlus = 1
SET @Mobile = '+' + @Mobile
RETURN @Mobile
END
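The same rule (drop everything that is not a digit, but remember a leading plus) can be sketched in Python for quick testing. The function name and the sample numbers are illustrative, not taken from the answer.

```python
import re


def mobile_no_alpha(mobile):
    """Strip non-digits but keep a leading plus sign."""
    starts_with_plus = mobile.startswith("+")
    digits = re.sub(r"[^0-9]", "", mobile)
    return "+" + digits if starts_with_plus else digits


print(mobile_no_alpha("+44 (0)20 7946-0958"))  # +4402079460958
print(mobile_no_alpha("020 7946 0958"))        # 02079460958
```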