I want to remove from my value / string this character " but I can not remove it using my UPDATE statement here:
UPDATE dbo].[Tablename]
SET [columnname] = REPLACE([columnname], '"', '')
Do you have any idea how to remove this character?
Thank you for opinions
You could simply use the ASCII character codes to replace the wildcard characters. There might be other good solution as well. It worked for me!
UPDATE dbo].[Tablename] SET [columnname] = REPLACE([columnname], char(47), '')
Char(47) is the ASCII code for the forward slash. You can find the full list here.
So first make 100% sure of the character you are trying to eliminate.
Just because it 'looks' like a double quote, doesn't mean that it is.
Alter the commented line in the code below, to select a single record from your dataset.
It will spit out the charcode for each character in the source string.
(obv change table and field names also !)
declare
#sample nvarchar(max),
#char nvarchar(1),
#i_idx int,
#temp int
create table #RtnValue(
Id int identity(1,1),
A nvarchar(max),
[uni] int,
[ascii] int
)
-- set #sample to a single value from your dataset here
select #sample = (select top (1) [sample] from test.test)
While len(#sample) > 0
Begin
Set #char = left(#sample,1)
Insert Into #RtnValue (A, uni, [ascii])
Select A = #char, UNI = UNICODE(#char), [ASCII] = ASCII(#char)
Set #sample = RIGHT(#sample,len(#sample)-1)
End
Select * from #RtnValue
Related
I have problem in optimizing an SQL query to do some data cleansing.
In fact, I have a table which is a sort of referential of a multiple special characters and word. Let's call it ABNORMAL(ID,PATTERN)
I have also another table INDIVIDUALS containing a column (NAME) which I want to clean by removing from it all characters that exist in the table ABNORMAL.
Currently, I have tried to use update statements, but I'm not sure if there is a better way to do this.
Approach one
Use a while loop to build a replace containing all characters from ABNORMALS by a blank '' and do one update using the built-in REPLACE
DECLARE #REPLACE_EXPRESSION nvarchar(max) ='REPLACE(NAME,'''','''')'
DECLARE #i int = 1
DECLARE #nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE #CURRENT_CHARAC nvarchar(max)
-- NEW REPLACE EXPRESSION TO IMBRICATE INTO THE REPLACE EXPRESSION VARIABLE
DECLARE #CURR_REP NVARCHAR(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE #UPDATE_QUERY nvarchar(max)
WHILE #i < #nbr
BEGIN
SELECT #CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=#i ;
SET #REPLACE_EXPRESSION = REPLACE(#REPLACE_EXPRESSION ,'NAME','REPLACE(NAME,'+''''+#CURRENT_CHARAC+''''+','''')')
set #i=#i+1
END
SET #UPDATE_QUERY = 'UPDATE INDIVIDUAL SET NAME ='+ #REPLACE_EXPRESSION
EXEC sp_executesql #UPDATE_QUERY
Approach two
Use a while loop to select every character in abnormal and do an update using replace containing the characters to remove:
DECLARE #i int = 1
DECLARE #nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE #CURRENT_CHARAC nvarchar(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE #UPDATE_QUERY nvarchar(max)
WHILE #i < #nbr
BEGIN
SELECT #CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=#i ;
UPDATE INDIVIDUAL
SET NAME = REPLACE(NAME,#CURRENT_CHARAC,'')
SET #i=#i+1
END
I already tested both approaches for 2 millions records, and I found that the first approach is faster than the second. I would know if you have already done something similar and new (better) ideas to try.
If you are using SQL Server 2017 you could use TRANSLATE and avoid dynamic SQL:
SELECT i.*
, REPLACE(TRANSLATE(i.NAME, f, REPLICATE('!', s.l)), '!', '') AS cleansed
FROM INDIVIDUALS i
OUTER APPLY (SELECT STRING_AGG(PATTERN, '') AS f
,LEN(STRING_AGG(PATTERN,'')) AS l
FROM ABNORMAL) AS s
DBFiddle Demo
Anyway 1st approach is better becasue you do one UPDATE, with second approach you remove characters one character at time (so you will have multiple UPDATE).
I would also track transaction log growth with both approaches.
If there's not too many characters that to be cleaned, then this trick might work.
Basically, you build 1 big update statement with a replace for each value in the table with the characters to be removed.
Example code:
Test data (using temp tables)
create table #ABNORMAL_CHARACTERS (id int identity(1,1), chr varchar(30));
insert into #ABNORMAL_CHARACTERS (chr) values ('!'),('&'),('#');
create table #INDIVIDUAL (id int identity(1,1), name varchar(30));
insert into #INDIVIDUAL (name) values ('test 1 &'),('test !'),('test 3');
Code:
declare #FieldName varchar(30) = 'name';
declare #Replaces varchar(max) = #FieldName;
declare #UpdateSQL varchar(max);
select #Replaces = concat('replace('+#Replaces+', ', ''''+chr+''','''')') from #ABNORMAL_CHARACTERS order by id;
set #UpdateSQL = 'update #INDIVIDUAL
set name = '+#Replaces + '
where exists (select 1 from #ABNORMAL_CHARACTERS where charindex(chr,name)>0)';
exec (#UpdateSQL);
select * from #INDIVIDUAL;
A test here on rextester
And if you would have a UDF that can do a regex replace.
For example here
Then the #Replaces variable could be simplified with only 1 RegexReplace function and a pattern.
I'm asking this question for SQL Server 2008 R2
I'd like to know if there is a way to create multiple functions in a single batch statement.
I've made the following code as an example; suppose I want to take a character string and rearrange its letters in alphabetical order. So, 'Hello' would become 'eHllo'
CREATE FUNCTION char_split (#string varchar(max))
RETURNS #characters TABLE
(
chars varchar(2)
)
AS
BEGIN
DECLARE #length int,
#K int
SET #length = len(#string)
SET #K = 1
WHILE #K < #length+1
BEGIN
INSERT INTO #characters
SELECT SUBSTRING(#string,#K,1)
SET #K = #K+1
END
RETURN
END
CREATE FUNCTION rearrange (#string varchar(max))
RETURNS varchar(max)
AS
BEGIN
DECLARE #SplitData TABLE (
chars varchar(2)
)
INSERT INTO #SplitData SELECT * FROM char_split(#string)
DECLARE #Output varchar(max)
SELECT #Output = coalesce(#Output,' ') + cast(chars as varchar(10))
from #SplitData
order by chars asc
RETURN #Output
END
declare #string varchar(max)
set #string = 'Hello'
select dbo.rearrange(#string)
When I try running this code, I get this error:
'CREATE FUNCTION' must be the first statement in a query batch.
I tried enclosing each function in a BEGIN END block, but no luck. Any advice?
Just use a GO statement between the definition of the UDFs
Not doable. SImple like that.
YOu can make it is one statement using a GO between them.
But as the GO is a batch delimiter.... this means you send multiple batches, which is explicitly NOT Wanted in your question.
So, no - it is not possible to do that in one batch as the error clearly indicates.
I have a stored procedure that takes as an input a string of GUIDs and selects from table where table GUID IN (#Param).
#Param = 'b2e16cdc-1f1b-40e2-a979-f87a6a2457af,
c275dd13-bb54-4b8c-aa12-220b5980cabd,
af3552ec-37b1-4a76-81ad-1bd6b8c4cd6c,
3a7fda02-558b-49a9-a870-30350254d8c0,'
SELECT * FROM dbo.Table1 WHERE
TableGUID IN (#Param)
However, I noticed that the query return values, only if the first GUID matches, otherwise it will not return anything. which means that it only compares with the first GUID in the string.
anyone knows how solve the problem?
declare #sql varchar(max)
set #sql='SELECT * FROM dbo.Table1 WHERE
TableGUID IN ('+#Param+') '
exec (#sql)
We can't do it, because SQL has no concept of Lists, or array or other useful data structures - it only knows about tables (and table based information) so it converts the string list into a table structure when it compiles the command - and it can't compile a variable string, so it complains and you get annoyed. Or at least, I do.
What we have to do is convert the comma separated values into a table first. My initial version was inline, and rather messy, so I re-worked it to a user function and made it a bit more general purpose.
USE [Testing] GO
CREATE FUNCTION [dbo].[VarcharToTable] (#InStr NVARCHAR(MAX))
RETURNS #TempTab TABLE
(id UNIQUEIDENTIFIER NOT NULL)
AS
BEGIN
;-- Ensure input ends with comma
SET #InStr = REPLACE(#InStr + ',', ',,', ',')
DECLARE #SP INT
DECLARE #VALUE NVARCHAR(MAX)
WHILE PATINDEX('%,%', #INSTR ) <> 0
BEGIN
SELECT #SP = PATINDEX('%,%',#INSTR)
SELECT #VALUE = LEFT(#INSTR , #SP - 1)
SELECT #INSTR = STUFF(#INSTR, 1, #SP, '')
INSERT INTO #TempTab(id) VALUES (#VALUE)
END
RETURN
END
GO
This creates a user function that takes a comma separated value string and converts it into a table that SQL does understand - just pass it the sting, and it works it all out. It's pretty obvious how it works, the only complexity is the REPLACE part which ensures the string is terminated with a single comma by appending one, and removing all double commas from the string. Without this, while loop becomes harder to process, as the final number might or might not have a terminating comma and that would have to be dealt with separately.
DECLARE #LIST NVARCHAR(MAX)
SET #LIST = '973150D4-0D5E-4AD0-87E1-037B9D4FC03B,973150d4-0d5e-4ad0-87e1-037b9d4fc03c'
SELECT Id, Descr FROM TableA WHERE Id IN (SELECT * FROM dbo.VarcharToTable(#LIST))
In addition to MikkaRin's answer: a GUID has to be unclosed in apostrophes, so the value in the parameter should look like
'b2e16cdc-1f1b-40e2-a979-f87a6a2457af',
'c275dd13-bb54-4b8c-aa12-220b5980cabd',
'af3552ec-37b1-4a76-81ad-1bd6b8c4cd6c',
'3a7fda02-558b-49a9-a870-30350254d8c0'
In the end, you have to pass something like:
#Param = '''b2e16cdc-1f1b-40e2-a979-f87a6a2457af'',
''c275dd13-bb54-4b8c-aa12-220b5980cabd'',
''af3552ec-37b1-4a76-81ad-1bd6b8c4cd6c'',
''3a7fda02-558b-49a9-a870-30350254d8c0'''
Pay attention to the last comma of the list. It should be removed.
UPDATE:
Someone marked this question as duplicate of
How do I split a string so I can access item x.
But it's different, my question is about Sybase SQL Anywhere, the other is about MS SQL Server. These are two different SQL engines, even if they have the same origin, they have different syntax. So it's not duplicate. I wrote in the first place in description and tags that it's all about Sybase SQL Anywhere.
I have field id_list='1234,23,56,576,1231,567,122,87876,57553,1216'
and I want to use it to search IN this field:
SELECT *
FROM table1
WHERE id IN (id_list)
id is integer
id_list is varchar/text
But in this way this doesn't work, so I need in some way to split id_list into select query.
What solution should I use here? I'm using the T-SQL Sybase ASA 9 database (SQL Anywhere).
Way I see this, is to create own function with while loop through,
and each element extract based on split by delimiter position search,
then insert elements into temp table which function will return as result.
This can be done without using dynamic SQL but you will need to create a couple of supporting objects. The fist object is a table valued function that will parse your string and return a table of integers. The second object is a stored procedure that will have a parameter where you can pass the string (id_list), parse it to a table, and then finally join it to your query.
First, create the function to parse the string:
CREATE FUNCTION [dbo].[String_To_Int_Table]
(
#list NVARCHAR(1024)
, #delimiter NCHAR(1) = ',' --Defaults to CSV
)
RETURNS
#tableList TABLE(
value INT
)
AS
BEGIN
DECLARE #value NVARCHAR(11)
DECLARE #position INT
SET #list = LTRIM(RTRIM(#list))+ ','
SET #position = CHARINDEX(#delimiter, #list, 1)
IF REPLACE(#list, #delimiter, '') <> ''
BEGIN
WHILE #position > 0
BEGIN
SET #value = LTRIM(RTRIM(LEFT(#list, #position - 1)));
INSERT INTO #tableList (value)
VALUES (cast(#value as int));
SET #list = RIGHT(#list, LEN(#list) - #position);
SET #position = CHARINDEX(#delimiter, #list, 1);
END
END
RETURN
END
Now create your stored procedure:
CREATE PROCEDURE ParseListExample
#id_list as nvarchar(1024)
AS
BEGIN
SET NOCOUNT ON;
--create a temp table to hold the list of ids
CREATE TABLE #idTable (ID INT);
-- use the table valued function to parse the ids into a table.
INSERT INTO #idTable(ID)
SELECT Value FROM dbo.String_to_int_table(#id_list, ',');
-- join the temp table of ids to the table you want to query...
SELECT T1.*
FROM table1 T1
JOIN #idTable T2
on T1.ID = T2.ID
Execution Example:
exec ParseListExample #id_list='1234,23,56,576,1231,567,122,87876,57553,1216'
I hope this helps...
Like Mikael Eriksson said, there is answer at dba.stackexchange.com with two very good solutions, first with use of sa_split_list system procedure, and second slower with CAST statement.
For the Sybase SQL Anywhere 9 sa_split_list system procedure not exist, so I have made sa_split_list system procedure replacement (I used parts of the code from bsivel answer):
CREATE PROCEDURE str_split_list
(in str long varchar, in delim char(10) default ',')
RESULT(
line_num integer,
row_value long varchar)
BEGIN
DECLARE str2 long varchar;
DECLARE position integer;
CREATE TABLE #str_split_list (
line_num integer DEFAULT AUTOINCREMENT,
row_value long varchar null,
primary key(line_num));
SET str = TRIM(str) || delim;
SET position = CHARINDEX(delim, str);
separaterows:
WHILE position > 0 loop
SET str2 = TRIM(LEFT(str, position - 1));
INSERT INTO #str_split_list (row_value)
VALUES (str2);
SET str = RIGHT(str, LENGTH(str) - position);
SET position = CHARINDEX(delim, str);
end loop separaterows;
select * from #str_split_list order by line_num asc;
END
Execute the same way as sa_split_list with default delimiter ,:
select * from str_split_list('1234,23,56,576,1231,567,122,87876,57553,1216')
or with specified delimiter which can be changed:
select * from str_split_list('1234,23,56,576,1231,567,122,87876,57553,1216', ',')
You use text in your query and this is not going to work.
Use dynamic query.
Good contribution from bsivel answer, but to generalise it (for other separators than a comma), then the line
SET #list = LTRIM(RTRIM(#list))+ ','
must become
SET #list = LTRIM(RTRIM(#list))+ #delimiter
The first version will only work for comma-separated lists.
The dynamic query approach would look like this:
create procedure ShowData #IdList VarChar(255)
as
exec ('use yourDatabase; select * from MyTable where Id in ('+#IdList+')')
Any one know a good way to remove punctuation from a field in SQL Server?
I'm thinking
UPDATE tblMyTable SET FieldName = REPLACE(REPLACE(REPLACE(FieldName,',',''),'.',''),'''' ,'')
but it seems a bit tedious when I intend on removing a large number of different characters for example: !##$%^&*()<>:"
Thanks in advance
Ideally, you would do this in an application language such as C# + LINQ as mentioned above.
If you wanted to do it purely in T-SQL though, one way make things neater would be to firstly create a table that held all the punctuation you wanted to removed.
CREATE TABLE Punctuation
(
Symbol VARCHAR(1) NOT NULL
)
INSERT INTO Punctuation (Symbol) VALUES('''')
INSERT INTO Punctuation (Symbol) VALUES('-')
INSERT INTO Punctuation (Symbol) VALUES('.')
Next, you could create a function in SQL to remove all the punctuation symbols from an input string.
CREATE FUNCTION dbo.fn_RemovePunctuation
(
#InputString VARCHAR(500)
)
RETURNS VARCHAR(500)
AS
BEGIN
SELECT
#InputString = REPLACE(#InputString, P.Symbol, '')
FROM
Punctuation P
RETURN #InputString
END
GO
Then you can just call the function in your UPDATE statement
UPDATE tblMyTable SET FieldName = dbo.fn_RemovePunctuation(FieldName)
I wanted to avoid creating a table and wanted to remove everything except letters and digits.
DECLARE #p int
DECLARE #Result Varchar(250)
DECLARE #BadChars Varchar(12)
SELECT #BadChars = '%[^a-z0-9]%'
-- to leave spaces - SELECT #BadChars = '%[^a-z0-9] %'
SET #Result = #InStr
SET #P =PatIndex(#BadChars,#Result)
WHILE #p > 0 BEGIN
SELECT #Result = Left(#Result,#p-1) + Substring(#Result,#p+1,250)
SET #P =PatIndex(#BadChars,#Result)
END
I am proposing 2 solutions
Solution 1: Make a noise table and replace the noises with blank spaces
e.g.
DECLARE #String VARCHAR(MAX)
DECLARE #Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
SET #String = 'hello! how * > are % u (: . I am ok :). Oh nice!'
INSERT INTO #Noise(Noise,ReplaceChars)
SELECT '!',SPACE(1) UNION ALL SELECT '#',SPACE(1) UNION ALL
SELECT '#',SPACE(1) UNION ALL SELECT '$',SPACE(1) UNION ALL
SELECT '%',SPACE(1) UNION ALL SELECT '^',SPACE(1) UNION ALL
SELECT '&',SPACE(1) UNION ALL SELECT '*',SPACE(1) UNION ALL
SELECT '(',SPACE(1) UNION ALL SELECT ')',SPACE(1) UNION ALL
SELECT '{',SPACE(1) UNION ALL SELECT '}',SPACE(1) UNION ALL
SELECT '<',SPACE(1) UNION ALL SELECT '>',SPACE(1) UNION ALL
SELECT ':',SPACE(1)
SELECT #String = REPLACE(#String, Noise, ReplaceChars) FROM #Noise
SELECT #String Data
Solution 2: With a number table
DECLARE #String VARCHAR(MAX)
SET #String = 'hello! & how * > are % u (: . I am ok :). Oh nice!'
;with numbercte as
(
select 1 as rn
union all
select rn+1 from numbercte where rn<LEN(#String)
)
select REPLACE(FilteredData,' ',SPACE(1)) Data from
(select SUBSTRING(#String,rn,1)
from numbercte
where SUBSTRING(#String,rn,1) not in('!','*','>','<','%','(',')',':','!','&','#','#','$')
for xml path(''))X(FilteredData)
Output(Both the cases)
Data
hello how are u . I am ok . Oh nice
Note- I have just put some of the noises. You may need to put the noises that u need.
Hope this helps
You can use regular expressions in SQL Server - here is an article based on SQL 2005:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
I'd wrap it in a simple scalar UDF so all string cleaning is in one place if it's needed again.
Then you can use it on INSERT too...
I took Ken MC's solution and made it into an function which can replace all punctuation with a given string:
----------------------------------------------------------------------------------------------------------------
-- This function replaces all punctuation in the given string with the "replaceWith" string
----------------------------------------------------------------------------------------------------------------
IF object_id('[dbo].[fnReplacePunctuation]') IS NOT NULL
BEGIN
DROP FUNCTION [dbo].[fnReplacePunctuation];
END;
GO
CREATE FUNCTION [dbo].[fnReplacePunctuation] (#string NVARCHAR(MAX), #replaceWith NVARCHAR(max))
RETURNS NVARCHAR(MAX)
BEGIN
DECLARE #Result Varchar(max) = #string;
DECLARE #BadChars Varchar(12) = '%[^a-z0-9]%'; -- to leave spaces - SELECT #BadChars = '%[^a-z0-9] %'
DECLARE #p int = PatIndex(#BadChars,#Result);
DECLARE #searchFrom INT;
DECLARE #indexOfPunct INT = #p;
WHILE #indexOfPunct > 0 BEGIN
SET #searchFrom = LEN(#Result) - #p;
SET #Result = Left(#Result, #p-1) + #replaceWith + Substring(#Result, #p+1,LEN(#Result));
SET #IndexOfPunct = PatIndex(#BadChars, substring(#Result, (LEN(#Result) - #SearchFrom)+1, LEN(#Result)));
SET #p = (LEN(#Result) - #searchFrom) + #indexOfPunct;
END
RETURN #Result;
END;
GO
-- example:
SELECT dbo.fnReplacePunctuation('This is, only, a tést-really..', '');
Output:
Thisisonlyatéstreally
If it's a one-off thing, I would use a C# + LINQ snippet in LINQPad to do the job with regular expressions.
It is quick and easy and you don't have to go through the process of setting up a CLR stored procedure and then cleaning up after yourself.
Can't you use PATINDEX to only include NUMBERS and LETTERS instead of trying to guess what punctuation might be in the field? (Not trying to be snarky, if I had the code ready, I'd share it...but this is what I'm looking for).
Seems like you need to create a custom function in order to avoid a giant list of replace functions in your queries - here's a good example:
http://www.codeproject.com/KB/database/SQLPhoneNumbersPart_2.aspx?display=Print