I have a column that contain these value from a table.
Colum1: dhd-29229 Table: Test
dhd-29199
dhd-00011
My goal is to write a select sql statement that will return the following value: All numeric value and not take into account dhd-
Use like:
where column1 like '%29229'
If you want exactly XXX-29199 in this format, you've to use :
where column1 like '___-29229'
If you want just the 29229, you can use : where column1 like '%29229' but in this case the query will return all values that ends with "29229". you can think to controle the length.
https://www.w3schools.com/sql/sql_like.asp
in T-SQL (sql server):
select substring(colum1, charindex('-',colum1)+1,5) as sub_string from test
If you want a strict 'numbers only' that does not reply upon format or inconsistencies... try writing a UDF :
Create FUNCTION [test].[numOnly]
(#text nvarchar (max))
RETURNS nvarchar (max)
AS
BEGIN
DECLARE
#char nvarchar(1),
#i_idx int,
#temp int,
#out nvarchar(max)
While len(#text) > 0
Begin
Set #char = left(#text,1)
if ascii(#char) between 49 and 57
set #out = isnull(#out,'') + #char
Set #text = RIGHT(#text,len(#text)-1)
End
Return #out
END
Use it thus ...
select [test].[numOnly]('44--(*)"*()£hllk564kkj3') as 'lala'
Result : 445643
Related
I have problem in optimizing an SQL query to do some data cleansing.
In fact, I have a table which is a sort of referential of a multiple special characters and word. Let's call it ABNORMAL(ID,PATTERN)
I have also another table INDIVIDUALS containing a column (NAME) which I want to clean by removing from it all characters that exist in the table ABNORMAL.
Currently, I have tried to use update statements, but I'm not sure if there is a better way to do this.
Approach one
Use a while loop to build a replace containing all characters from ABNORMALS by a blank '' and do one update using the built-in REPLACE
DECLARE #REPLACE_EXPRESSION nvarchar(max) ='REPLACE(NAME,'''','''')'
DECLARE #i int = 1
DECLARE #nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE #CURRENT_CHARAC nvarchar(max)
-- NEW REPLACE EXPRESSION TO IMBRICATE INTO THE REPLACE EXPRESSION VARIABLE
DECLARE #CURR_REP NVARCHAR(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE #UPDATE_QUERY nvarchar(max)
WHILE #i < #nbr
BEGIN
SELECT #CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=#i ;
SET #REPLACE_EXPRESSION = REPLACE(#REPLACE_EXPRESSION ,'NAME','REPLACE(NAME,'+''''+#CURRENT_CHARAC+''''+','''')')
set #i=#i+1
END
SET #UPDATE_QUERY = 'UPDATE INDIVIDUAL SET NAME ='+ #REPLACE_EXPRESSION
EXEC sp_executesql #UPDATE_QUERY
Approach two
Use a while loop to select every character in abnormal and do an update using replace containing the characters to remove:
DECLARE #i int = 1
DECLARE #nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE #CURRENT_CHARAC nvarchar(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE #UPDATE_QUERY nvarchar(max)
WHILE #i < #nbr
BEGIN
SELECT #CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=#i ;
UPDATE INDIVIDUAL
SET NAME = REPLACE(NAME,#CURRENT_CHARAC,'')
SET #i=#i+1
END
I already tested both approaches for 2 millions records, and I found that the first approach is faster than the second. I would know if you have already done something similar and new (better) ideas to try.
If you are using SQL Server 2017 you could use TRANSLATE and avoid dynamic SQL:
SELECT i.*
, REPLACE(TRANSLATE(i.NAME, f, REPLICATE('!', s.l)), '!', '') AS cleansed
FROM INDIVIDUALS i
OUTER APPLY (SELECT STRING_AGG(PATTERN, '') AS f
,LEN(STRING_AGG(PATTERN,'')) AS l
FROM ABNORMAL) AS s
DBFiddle Demo
Anyway 1st approach is better becasue you do one UPDATE, with second approach you remove characters one character at time (so you will have multiple UPDATE).
I would also track transaction log growth with both approaches.
If there's not too many characters that to be cleaned, then this trick might work.
Basically, you build 1 big update statement with a replace for each value in the table with the characters to be removed.
Example code:
Test data (using temp tables)
create table #ABNORMAL_CHARACTERS (id int identity(1,1), chr varchar(30));
insert into #ABNORMAL_CHARACTERS (chr) values ('!'),('&'),('#');
create table #INDIVIDUAL (id int identity(1,1), name varchar(30));
insert into #INDIVIDUAL (name) values ('test 1 &'),('test !'),('test 3');
Code:
declare #FieldName varchar(30) = 'name';
declare #Replaces varchar(max) = #FieldName;
declare #UpdateSQL varchar(max);
select #Replaces = concat('replace('+#Replaces+', ', ''''+chr+''','''')') from #ABNORMAL_CHARACTERS order by id;
set #UpdateSQL = 'update #INDIVIDUAL
set name = '+#Replaces + '
where exists (select 1 from #ABNORMAL_CHARACTERS where charindex(chr,name)>0)';
exec (#UpdateSQL);
select * from #INDIVIDUAL;
A test here on rextester
And if you would have a UDF that can do a regex replace.
For example here
Then the #Replaces variable could be simplified with only 1 RegexReplace function and a pattern.
I'm asking this question for SQL Server 2008 R2
I'd like to know if there is a way to create multiple functions in a single batch statement.
I've made the following code as an example; suppose I want to take a character string and rearrange its letters in alphabetical order. So, 'Hello' would become 'eHllo'
CREATE FUNCTION char_split (#string varchar(max))
RETURNS #characters TABLE
(
chars varchar(2)
)
AS
BEGIN
DECLARE #length int,
#K int
SET #length = len(#string)
SET #K = 1
WHILE #K < #length+1
BEGIN
INSERT INTO #characters
SELECT SUBSTRING(#string,#K,1)
SET #K = #K+1
END
RETURN
END
CREATE FUNCTION rearrange (#string varchar(max))
RETURNS varchar(max)
AS
BEGIN
DECLARE #SplitData TABLE (
chars varchar(2)
)
INSERT INTO #SplitData SELECT * FROM char_split(#string)
DECLARE #Output varchar(max)
SELECT #Output = coalesce(#Output,' ') + cast(chars as varchar(10))
from #SplitData
order by chars asc
RETURN #Output
END
declare #string varchar(max)
set #string = 'Hello'
select dbo.rearrange(#string)
When I try running this code, I get this error:
'CREATE FUNCTION' must be the first statement in a query batch.
I tried enclosing each function in a BEGIN END block, but no luck. Any advice?
Just use a GO statement between the definition of the UDFs
Not doable. SImple like that.
YOu can make it is one statement using a GO between them.
But as the GO is a batch delimiter.... this means you send multiple batches, which is explicitly NOT Wanted in your question.
So, no - it is not possible to do that in one batch as the error clearly indicates.
I am having a small problem with the IN SQL statement. I was just wondering if anyone could help me?
#Ids = "1,2,3,4,5"
SELECT * FROM Nav WHERE CONVERT(VARCHAR,NavigationID) IN (CONVERT(VARCHAR,#Ids))
This is coming back with the error below, I am sure this is pretty simple!
Conversion failed when converting the varchar value '1,' to data type int.
The SQL IN clause does not accept a single variable to represent a list of values -- no database does, without using dynamic SQL. Otherwise, you could use a Table Valued Function (SQL Server 2000+) to pull the values out of the list & return them as a table that you can join against.
Dynamic SQL example:
EXEC('SELECT *
FROM Nav
WHERE NavigationID IN ('+ #Ids +')')
I recommend reading The curse and blessings of dynamic SQL before using dynamic SQL on SQL Server.
Jason:
First create a function like this
Create FUNCTION [dbo].[ftDelimitedAsTable](#dlm char, #string varchar(8000))
RETURNS
--------------------------------------------------------------------------*/
/*------------------------------------------------------------------------
declare #dlm char, #string varchar(1000)
set #dlm=','; set #string='t1,t2,t3';
-- tHIS FUNCION RETUNRS IN THE ASCENDING ORDER
-- 19TH Apr 06
------------------------------------------------------------------------*/
--declare
#table_var TABLE
(id int identity(1,1),
r varchar(1000)
)
AS
BEGIN
declare #n int,#i int
set #n=dbo.fnCountChars(#dlm,#string)+1
SET #I =1
while #I <= #N
begin
insert #table_var
select dbo.fsDelimitedString(#dlm,#string,#i)
set #I= #I+1
end
if #n =1 insert #TABLE_VAR VALUES(#STRING)
delete from #table_var where r=''
return
END
And then
set quoted_identifier off
declare #ids varchar(max)
select #Ids = "1,2,3,4,5"
declare #nav table ( navigationid int identity(1,1),theother bigint)
insert #nav(theother) select 10 union select 11 union select 15
SELECT * FROM #Nav WHERE CONVERT(VARCHAR,NavigationID) IN (select id from dbo.ftDelimitedAsTable(',',#Ids))
select * from dbo.ftDelimitedAsTable(',',#Ids)
What you're doing is not possible with the SQL IN statement. You cannot pass a string to it and expect that string to be parsed. IN is for specific, hard-coded values.
There are two ways to do what you want to do here.
One is to create a 'dynamic sql' query and execute it, after substituting in your IN list.
DECLARE #query varchar(max);
SET #query = 'SELECT * FROM Nav WHERE CONVERT(VARCHAR,NavigationID) IN (' + #Ids + ')'
exec (#query)
This can have performance impacts and other complications. Generally I'd try to avoid it.
The other method is to use a User Defined Function (UDF) to split the string into its component parts and then query against that.
There's a post detailing how to create that function here
Once the function exists, it's trivial to join onto it
SELECT * FROM Nav
CROSS APPLY dbo.StringSplit(#Ids) a
WHERE a.s = CONVERT(varchar, Nav.NavigationId)
NB- the 'a.s' field reference is based on the linked function, which stores the split value in a column named 's'. This may differ based on the implementation of your string split function
This is nice because it uses a set based approach to the query rather than an IN subquery, but a CROSS JOIN may be a little complex for the moment, so if you want to maintain the IN syntax then the following should work:
SELECT * FROM Nav
WHERE Nav.NavigationId IN
(SELECT CONVERT(int, a.s) AS Value
FROM dbo.StringSplit(#Ids) a
Suppose I have a table a with one column b and three rows(1,2,3), I would like to create a function that will return '1,2,3' that would be called like this : SELECT FUNC(f), ... FROM ...
In other words, I have a linked table that have more than one rows linked to each rows of the first table and would like to concatenate the content of one column from the second table. In this case, it's a list of names associated with a specific observation.
I was thinking of using a SQL function for that, but I can't remember how... :(
Thanks
Here is an example for SQL Server:
CREATE FUNCTION ConcatenateMyTableValues
(#ID int)
RETURNS varchar(max)
AS
BEGIN
declare #s as varchar(max);
select #s = isnull(#s + ',', '') + MyColumn from MyTable where ID = #ID;
return #s
end
And then you could use it like this:
select t.ID, t.Name, dbo.ConcatenateMyTableValues(t.ID)
from SomeTable t
you can use COALESCE to convert values in one column to csv
http://www.sqlteam.com/article/using-coalesce-to-build-comma-delimited-string
I am not sure how you will be able to create a function
unless you do not mind creating a dynamic sql which can accept a column name and then build sql accordingly at run time.
edit.
this works for SQL Server only.
didn't realize that you have not mentioned any db.
DECLARE #list AS varchar(MAX)
SELECT #list = ISNULL(#list + ',', '') + b
FROM MyTable
SELECT #list AS Result
Here a way to do this recursive (ms sql)
Depends on your technology.
If it is Oracle, check out the stragg function.
http://www.sqlsnippets.com/en/topic-11591.html
If it is MSSQL use the XML PATH trick
http://sqlblogcasts.com/blogs/tonyrogerson/archive/2009/03/29/creating-an-output-csv-using-for-xml-and-multiple-rows.aspx
I guess I'm going to answer my own question since I actually got a way to do it using CURSOR (that keyword I was missing in my thoughs..)
ALTER FUNCTION [dbo].[GET_NOM_MEDECIN_REVISEURS] (#NoAs810 int)
RETURNS NVARCHAR(1000)
AS
BEGIN
IF #NoAs810 = 0
RETURN ''
ELSE
BEGIN
DECLARE #NomsReviseurs NVARCHAR(1000)
DECLARE #IdReviseurs AS TABLE(IdReviseur int)
DECLARE #TempNomReviseur NVARCHAR(50)
SET #NomsReviseurs = ''
DECLARE CurReviseur CURSOR FOR
SELECT DISTINCT Nom FROM T_Ref_Reviseur R INNER JOIN T_Signature S ON R.IdReviseur = S.idReviseur WHERE NoAs810 = #NoAs810
OPEN CurReviseur
FETCH FROM CurReviseur INTO #TempNomReviseur
WHILE ##FETCH_STATUS = 0
BEGIN
SET #NomsReviseurs = #NomsReviseurs + #TempNomReviseur
FETCH NEXT FROM CurReviseur INTO #TempNomReviseur
IF ##FETCH_STATUS = 0
SET #NomsReviseurs = #NomsReviseurs + ' - '
END
CLOSE CurReviseur
RETURN #NomsReviseurs
END
RETURN ''
END
Any one know a good way to remove punctuation from a field in SQL Server?
I'm thinking
UPDATE tblMyTable SET FieldName = REPLACE(REPLACE(REPLACE(FieldName,',',''),'.',''),'''' ,'')
but it seems a bit tedious when I intend on removing a large number of different characters for example: !##$%^&*()<>:"
Thanks in advance
Ideally, you would do this in an application language such as C# + LINQ as mentioned above.
If you wanted to do it purely in T-SQL though, one way make things neater would be to firstly create a table that held all the punctuation you wanted to removed.
CREATE TABLE Punctuation
(
Symbol VARCHAR(1) NOT NULL
)
INSERT INTO Punctuation (Symbol) VALUES('''')
INSERT INTO Punctuation (Symbol) VALUES('-')
INSERT INTO Punctuation (Symbol) VALUES('.')
Next, you could create a function in SQL to remove all the punctuation symbols from an input string.
CREATE FUNCTION dbo.fn_RemovePunctuation
(
#InputString VARCHAR(500)
)
RETURNS VARCHAR(500)
AS
BEGIN
SELECT
#InputString = REPLACE(#InputString, P.Symbol, '')
FROM
Punctuation P
RETURN #InputString
END
GO
Then you can just call the function in your UPDATE statement
UPDATE tblMyTable SET FieldName = dbo.fn_RemovePunctuation(FieldName)
I wanted to avoid creating a table and wanted to remove everything except letters and digits.
DECLARE #p int
DECLARE #Result Varchar(250)
DECLARE #BadChars Varchar(12)
SELECT #BadChars = '%[^a-z0-9]%'
-- to leave spaces - SELECT #BadChars = '%[^a-z0-9] %'
SET #Result = #InStr
SET #P =PatIndex(#BadChars,#Result)
WHILE #p > 0 BEGIN
SELECT #Result = Left(#Result,#p-1) + Substring(#Result,#p+1,250)
SET #P =PatIndex(#BadChars,#Result)
END
I am proposing 2 solutions
Solution 1: Make a noise table and replace the noises with blank spaces
e.g.
DECLARE #String VARCHAR(MAX)
DECLARE #Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
SET #String = 'hello! how * > are % u (: . I am ok :). Oh nice!'
INSERT INTO #Noise(Noise,ReplaceChars)
SELECT '!',SPACE(1) UNION ALL SELECT '#',SPACE(1) UNION ALL
SELECT '#',SPACE(1) UNION ALL SELECT '$',SPACE(1) UNION ALL
SELECT '%',SPACE(1) UNION ALL SELECT '^',SPACE(1) UNION ALL
SELECT '&',SPACE(1) UNION ALL SELECT '*',SPACE(1) UNION ALL
SELECT '(',SPACE(1) UNION ALL SELECT ')',SPACE(1) UNION ALL
SELECT '{',SPACE(1) UNION ALL SELECT '}',SPACE(1) UNION ALL
SELECT '<',SPACE(1) UNION ALL SELECT '>',SPACE(1) UNION ALL
SELECT ':',SPACE(1)
SELECT #String = REPLACE(#String, Noise, ReplaceChars) FROM #Noise
SELECT #String Data
Solution 2: With a number table
DECLARE #String VARCHAR(MAX)
SET #String = 'hello! & how * > are % u (: . I am ok :). Oh nice!'
;with numbercte as
(
select 1 as rn
union all
select rn+1 from numbercte where rn<LEN(#String)
)
select REPLACE(FilteredData,' ',SPACE(1)) Data from
(select SUBSTRING(#String,rn,1)
from numbercte
where SUBSTRING(#String,rn,1) not in('!','*','>','<','%','(',')',':','!','&','#','#','$')
for xml path(''))X(FilteredData)
Output(Both the cases)
Data
hello how are u . I am ok . Oh nice
Note- I have just put some of the noises. You may need to put the noises that u need.
Hope this helps
You can use regular expressions in SQL Server - here is an article based on SQL 2005:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
I'd wrap it in a simple scalar UDF so all string cleaning is in one place if it's needed again.
Then you can use it on INSERT too...
I took Ken MC's solution and made it into an function which can replace all punctuation with a given string:
----------------------------------------------------------------------------------------------------------------
-- This function replaces all punctuation in the given string with the "replaceWith" string
----------------------------------------------------------------------------------------------------------------
IF object_id('[dbo].[fnReplacePunctuation]') IS NOT NULL
BEGIN
DROP FUNCTION [dbo].[fnReplacePunctuation];
END;
GO
CREATE FUNCTION [dbo].[fnReplacePunctuation] (#string NVARCHAR(MAX), #replaceWith NVARCHAR(max))
RETURNS NVARCHAR(MAX)
BEGIN
DECLARE #Result Varchar(max) = #string;
DECLARE #BadChars Varchar(12) = '%[^a-z0-9]%'; -- to leave spaces - SELECT #BadChars = '%[^a-z0-9] %'
DECLARE #p int = PatIndex(#BadChars,#Result);
DECLARE #searchFrom INT;
DECLARE #indexOfPunct INT = #p;
WHILE #indexOfPunct > 0 BEGIN
SET #searchFrom = LEN(#Result) - #p;
SET #Result = Left(#Result, #p-1) + #replaceWith + Substring(#Result, #p+1,LEN(#Result));
SET #IndexOfPunct = PatIndex(#BadChars, substring(#Result, (LEN(#Result) - #SearchFrom)+1, LEN(#Result)));
SET #p = (LEN(#Result) - #searchFrom) + #indexOfPunct;
END
RETURN #Result;
END;
GO
-- example:
SELECT dbo.fnReplacePunctuation('This is, only, a tést-really..', '');
Output:
Thisisonlyatéstreally
If it's a one-off thing, I would use a C# + LINQ snippet in LINQPad to do the job with regular expressions.
It is quick and easy and you don't have to go through the process of setting up a CLR stored procedure and then cleaning up after yourself.
Can't you use PATINDEX to only include NUMBERS and LETTERS instead of trying to guess what punctuation might be in the field? (Not trying to be snarky, if I had the code ready, I'd share it...but this is what I'm looking for).
Seems like you need to create a custom function in order to avoid a giant list of replace functions in your queries - here's a good example:
http://www.codeproject.com/KB/database/SQLPhoneNumbersPart_2.aspx?display=Print