Removing white spaces and special characters from SQL - sql

I have a table with a column, ColumnA, whose data contains white spaces and special characters. I want to generate ColumnB from the data in ColumnA with the white spaces and special characters removed.
For example, ColumnA has values like:
N/A
#email
Hot-topic
#sql#%
White paper.
I want a new column with values:
NA
email
HotTopic
sql
Whitepaper
I tried the SQL below in SSMS, but it does not work for every row. Could someone help me out?
SELECT code,
REPLACE(REPLACE(code, TRIM(TRANSLATE(code,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',' '))
,'') ,' ','')
FROM SAMP
It is not working for the record with value: #sql#%

Added as a wiki answer in order to retain the comment made by #lptr. Query by #lptr, explanation mine (#DaleK).
Your attempt was close, but it only works when the value contains a single run of special characters. The row that failed has special characters in more than one place; after the letters are translated away, the leftover characters no longer match any substring of the original value, so the REPLACE removes nothing.
This answer cleverly replaces all the letter characters with a "*" using TRANSLATE as step 1. Step 2 uses TRANSLATE again on the original column value, this time with the step 1 result as the list of characters to replace, so every non-letter character becomes a "*". Finally, REPLACE removes all "*" characters.
Note also the use of REPLICATE to avoid typing the same character multiple times. Because LEN ignores trailing spaces, the query appends '.' before measuring and then subtracts 1.
create table samp(code varchar(50));
insert into samp(code)
values
('N/A'),
('#email'),
('Hot-topic'),
('#sql#%'),
('White paper. ');
select s.code, n.nonletters, l.letters
from samp as s
cross apply (values(translate(s.code, 'abcdefghijklmnopqrstuvwxyz', replicate('*', 26)))) as n (nonletters)
cross apply (values(replace(translate(s.code, n.nonletters, replicate('*', len(n.nonletters+'.')-1)), '*', ''))) as l (letters);
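To make the two steps concrete, here is a minimal trace for the one value that failed in the question. It is only a sketch: the variable names are made up for illustration, TRANSLATE requires SQL Server 2017 or later, and the lowercase letter list only covers uppercase letters under a case-insensitive collation.

```sql
-- Hypothetical variables; the value is the row that failed in the question.
DECLARE @code varchar(50) = '#sql#%';

-- Step 1: mask every letter, so only the unwanted characters stay visible.
DECLARE @nonletters varchar(50) =
    TRANSLATE(@code, 'abcdefghijklmnopqrstuvwxyz', REPLICATE('*', 26));

-- Step 2: use the step 1 result as the list of characters to mask in the original.
DECLARE @masked varchar(50) =
    TRANSLATE(@code, @nonletters, REPLICATE('*', LEN(@nonletters + '.') - 1));

SELECT @nonletters               AS nonletters,  -- #***#%
       @masked                   AS masked,      -- *sql**
       REPLACE(@masked, '*', '') AS letters;     -- sql
```

Any "*" already present in the data is removed as well, which is harmless here because "*" is not a letter.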

Related

Getting unwanted data in select statement of NChar column

On running the below query:
SELECT DISTINCT [RULE], LEN([RULE]) FROM MYTBL WHERE [RULE]
LIKE 'Trademarks, logos, slogans, company names, copyrighted material or brands of any third party%'
I am getting the output as:
The column datatype is NCHAR(120) and the collation used is SQL_Latin1_General_CP1_CI_AS
The data was inserted with an extra space at the end, but I am not able to trim it even with the RTRIM function. I am not sure which kind of (encoded) space character was inserted here.
Can you please suggest an alternative to RTRIM for getting rid of the extra white space at the end, given that the column is NCHAR?
Below are the things which I have already tried:
RTRIM(LTRIM(CAST([RULE] as VARCHAR(200))))
RTRIM([RULE])
Update to Question
Please download the Database from here TestDB
Please use below query for your reference:
SELECT DISTINCT [RULE], LEN([RULE]) FROM [TestDB].[BI].[DimRejectionReasonData]
WHERE [RULE]
LIKE 'Trademarks, logos, slogans, company names, copyrighted material or brands of any third party%'
You may have a non-breaking space, NCHAR(160), inside the string.
You can convert it to a regular space and then use the usual trim functions:
LTRIM(RTRIM(REPLACE([RULE], NCHAR(160), ' ')))
The same character written with its Unicode code point in hexadecimal (0x00A0 = 160):
LTRIM(RTRIM(REPLACE([RULE], NCHAR(0x00A0), ' ')))
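Before replacing anything, it can help to confirm which character is actually sitting at the end. The sketch below uses a made-up variable and a shortened sample value rather than the real table, and assumes a default Latin1 collation in which a non-breaking space does not compare equal to a regular space.

```sql
-- Hypothetical sample: 25 visible characters followed by a non-breaking space,
-- stored in a fixed-length nchar(120) like the column in the question.
DECLARE @rule nchar(120) = N'brands of any third party' + NCHAR(160);

SELECT LEN(@rule)                                    AS RawLen,        -- 26: the NBSP is counted
       UNICODE(RIGHT(RTRIM(@rule), 1))               AS LastCharCode,  -- 160
       LEN(RTRIM(REPLACE(@rule, NCHAR(160), N' ')))  AS CleanedLen;    -- 25
```

RTRIM is applied before RIGHT because an nchar value is padded with regular spaces up to its declared length, so the last stored character would otherwise always be a padding space.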
I am not sure, but I guess this is what you are looking for. Try this approach, which removes carriage returns and line feeds:
SELECT REPLACE(REPLACE([RULE], CHAR(13), ''), CHAR(10), '')
Change the column type from nchar to varchar; the result should then come back without the padding spaces that a fixed-length nchar column adds.

how not to replace "]" when using regex_replace for removing special characters

I'm trying to remove a few special characters from a comment column in my table. I used the statement below, but it seems to remove the ']' even though it is in the negated [^...] list.
UPDATE TEST
set comments=REGEXP_REPLACE(
comments,
'[^[a-z,A-Z,0-9,[:space:],''&'','':'',''/'',''.'',''?'',''!'','']'']]*',
' '
);
The table data contains the following:
[SYSTEM]:Do you have it in stock? 😊
My requirement is to have:
[SYSTEM]:Do you have it in stock?
You have two mistakes in your regex:
Do not put characters in quotes, and do not separate them with commas.
Remove the inner square brackets.
Also, place the closing square bracket first in the list, just after the initial circumflex. Fixed regex:
UPDATE TEST set comments=REGEXP_REPLACE(comments,'[^]a-zA-Z0-9[:space:]&:/.?!]*',' ');
My try: I just removed the commas and put the "accepted" characters after the initial "not" (no inner brackets).
A special case are the brackets: https://dba.stackexchange.com/a/109294/6228
select REGEXP_REPLACE(
'[ION] are varză murată.',
'[^][a-zA-Z0-9[:space:]&:/,.?!]+',
' ')
from dual;
Result:
[ION] are varz murat .
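Applied to the sample row from the question, the same bracket placement can be sketched as follows. This is an illustration only: the '~' stands in for the emoji, and the replacement is an empty string rather than a space. Note that '[' has to be in the list as well if the opening bracket of [SYSTEM] is to survive.

```sql
-- Oracle: keep letters, digits, white space, & : / . ? ! and both square brackets.
-- ']' comes first after '^' so it is read as a literal; '[' can follow it directly.
-- '+' makes the pattern match only where at least one unwanted character exists.
SELECT REGEXP_REPLACE(
         '[SYSTEM]:Do you have it in stock? ~',
         '[^][a-zA-Z0-9[:space:]&:/.?!]+',
         '') AS cleaned
FROM dual;
```

With this pattern only the '~' should be removed, leaving the rest of the comment, including both brackets, untouched.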

Query for blank white space before AND after a number string

How would I go about constructing a query that returns all material numbers that have a blank space either BEFORE or AFTER the number string? We export straight from SSMS to Excel, and we see the problem in the spreadsheet. If I could return all of the material numbers with spaces, I could go in and edit them, or do a replace, to fix this issue prior to exporting. (The material numbers are imported via a Windows application to which users upload an Excel template. This template has all of this data, and users sometimes place spaces in or after the material number.) The query we have used to work, but now it does not return anything. Yet upon export we identify the problems you see highlighted in the screenshot (left screenshot), and when we then query for that material number in the table (right screenshot), it does indeed have a space before the 1.
Currently the query we use looks like:
SELECT Mtrl
FROM dbo.Source
WHERE Mtrl LIKE '% %'
Since you are getting the data from a query, you should just have that query remove any potential spaces using LTRIM and RTRIM:
LTRIM(RTRIM([MTRL]))
Keep in mind that these two commands remove only spaces, not tabs or returns or other white-space characters.
Doing the above will make sure that the data for the entire set of data is fine, whether or not you find it and/or fix it.
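If the goal is only to list the rows that carry a leading or trailing regular space, comparing byte lengths before and after trimming is one option. The sketch below uses a made-up table variable in place of dbo.Source and assumes Mtrl is a varchar column.

```sql
-- Hypothetical stand-in for dbo.Source with a varchar Mtrl column.
DECLARE @Source TABLE (Mtrl varchar(20));
INSERT INTO @Source (Mtrl)
VALUES ('1001'), (' 1002'), ('1003 '), ('10 04');

-- DATALENGTH counts trailing spaces (LEN does not), so a difference means
-- LTRIM/RTRIM removed something from one of the ends.
SELECT Mtrl
FROM @Source
WHERE DATALENGTH(Mtrl) <> DATALENGTH(LTRIM(RTRIM(Mtrl)));
-- returns ' 1002' and '1003 '; the embedded space in '10 04' is not flagged
```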
Or, since you are copying-and-pasting from the Results Grid into Excel, you can just CONVERT the value to a number which will naturally remove any spaces:
SELECT CONVERT(INT, ' 12 ');
Returns:
12
So you would just use:
CONVERT(INT, [Mtrl])
Now, if you want to find the data that has anything that is not a digit in it, you would use this:
SELECT Mtrl
FROM dbo.Source
WHERE [Mtrl] LIKE '%[^0-9]%'; -- any single non-digit character
If the issue is with non-space white-space characters, you can find out which ones they are via the following (to find them at the beginning instead of at the end, change the RIGHT to be LEFT):
;WITH cte AS
(
SELECT UNICODE(RIGHT([MTRL], 1)) AS [CharVal]
FROM dbo.Source
)
SELECT *
FROM cte
WHERE cte.[CharVal] NOT BETWEEN 48 AND 57 -- digits 0 - 9
AND cte.[CharVal] <> 32; -- space
And you can fix in one shot using the following, which removes regular spaces (char 32 via LTRIM/RTRIM), tabs (char 9), and non-breaking spaces (char 160):
UPDATE src
SET src.[Mtrl] = REPLACE(
REPLACE(
LTRIM(RTRIM(src.[Mtrl])),
CHAR(160),
''),
CHAR(9),
'')
FROM dbo.Source src
WHERE src.[Mtrl] LIKE '%[' -- find rows with any of the following characters
+ CHAR(9) -- tab
+ CHAR(32) -- space
+ CHAR(160) -- non-breaking space
+ ']%';
Here I used the same WHERE condition that you have since if there can't be any spaces then it doesn't matter if you check both ends or for any at all (and maybe it is faster to have a single LIKE instead of two).

SQL Server 2000 - How to remove the hidden characters in the column?

I'm trying to remove hidden characters from a varchar column. These hidden characters (i.e. period, space) came from a scanned bar code and are not visible in the result set once the query is executed. I have tried the script below, but it failed to remove the hidden characters (see attached screenshot for reference).
Any help is highly appreciated.
SELECT Replace(Replace(LTrim(RTrim(mycolumn)), '.', ''), ' ', '')
FROM MyTable
WHERE serialno = '123456789'
One thing that has worked for me is to select the column with the special characters, paste the data into Notepad++, and turn on View > Show Symbol > Show All Characters. Then I could copy the special characters from Notepad++ into the second argument of the REPLACE() function in SQL.
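The same inspection can be done without leaving the query window by printing the code of every character in the value. This is a rough sketch that reuses the table, column, and serial number from the question and sticks to syntax that should also work on SQL Server 2000 (no CTEs, no inline variable initialization).

```sql
DECLARE @value varchar(100), @i int, @line varchar(50);

-- Assumes a single matching row; with several, one of them is picked.
SELECT @value = mycolumn
FROM MyTable
WHERE serialno = '123456789';

SET @i = 1;
WHILE @i <= DATALENGTH(@value)
BEGIN
    -- Print "position: character code" for each character, visible or not.
    SET @line = CAST(@i AS varchar(10)) + ': '
              + CAST(ASCII(SUBSTRING(@value, @i, 1)) AS varchar(10));
    PRINT @line;
    SET @i = @i + 1;
END
```

Any code outside the expected range can then be removed with REPLACE(mycolumn, CHAR(n), ''), where n is the code that was printed.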

Regular expressions in SQL

I'm curious if and how you can use regular expressions to find white space in SQL statements.
I have a string that can have an unlimited amount of white space after the actual string.
For example:
"STRING "
"STRING "
would match, but
"STRING A"
"STRINGB"
would not.
Right now I have:
like 'STRING%'
which doesn't quite return the results I would like.
I am using Sql Server 2008.
A simple like can find any string with spaces at the end:
where col1 like '% '
To also allow tabs, carriage returns or line feeds:
where col1 like '%[ ' + char(9) + char(10) + char(13) + ']'
Per your comment, to find "string" followed by any number of whitespace:
where rtrim(col1) = 'string'
You could try
where len(col1) <> len(rtrim(col1))
Andomar's answer will find the strings for you, but my spidey sense tells me maybe the scope of the problem is bigger than simply finding the whitespace.
If, as I suspect, you are finding the whitespace so that you can then clean it up, a simple
UPDATE Table1
SET col1 = RTRIM(col1)
will remove any trailing whitespace from the column.
Or RTRIM(LTRIM(col1)) to remove both leading and trailing whitespace.
Or REPLACE(col1, ' ', '') to remove all spaces, including spaces within the string.
Note that RTRIM and LTRIM only work on spaces, so to remove tabs/CRs/LFs you would have to use REPLACE. To remove those only from the leading/trailing portion of the string is feasible but not entirely simple. Bug your database vendor to implement the ANSI SQL 99 standard TRIM function that would make this much easier.
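One way to strip tabs, carriage returns, and line feeds from the trailing end only is to locate the last non-whitespace character with PATINDEX on the reversed string. The following is a minimal sketch with made-up variable names, assuming a varchar value on SQL Server 2008 or later.

```sql
-- Hypothetical sample: 'STRING' followed by space, tab, CR, LF.
DECLARE @s  varchar(100) = 'STRING ' + CHAR(9) + CHAR(13) + CHAR(10);
DECLARE @ws varchar(10)  = ' ' + CHAR(9) + CHAR(10) + CHAR(13);

-- Position of the first non-whitespace character in the reversed string (0 if none).
DECLARE @p int = PATINDEX('%[^' + @ws + ']%', REVERSE(@s));

SELECT CASE WHEN @p = 0 THEN ''                       -- nothing but white space
            ELSE LEFT(@s, DATALENGTH(@s) - @p + 1)    -- cut after the last real character
       END AS trimmed;                                -- STRING
```

DATALENGTH is used instead of LEN because LEN would not count trailing spaces; for nvarchar data the byte count would need to be halved.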
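LEN ignores trailing spaces, so comparing LEN(col1) with LEN(RTRIM(col1)) never finds a difference. Append a character before measuring instead:
where len(col1 + 'x') <> len(rtrim(col1)) + 1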
BOL provides workarounds for LEN() with trailing spaces : http://msdn.microsoft.com/en-us/library/ms190329.aspx
LEN(Column + '_') - 1
or using DATALENGTH
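The difference between these length functions is easiest to see side by side. The sketch below uses a made-up table variable with a varchar column; for nvarchar, DATALENGTH would report two bytes per character.

```sql
-- Hypothetical sample data; the second value has two trailing spaces.
DECLARE @t TABLE (col1 varchar(20));
INSERT INTO @t (col1)
VALUES ('STRING'), ('STRING  '), ('STRING A');

SELECT col1,
       LEN(col1)           AS len_plain,     -- 6: trailing spaces ignored
       LEN(col1 + '_') - 1 AS len_sentinel,  -- 8: trailing spaces counted
       DATALENGTH(col1)    AS bytes          -- 8
FROM @t
WHERE DATALENGTH(col1) <> DATALENGTH(RTRIM(col1));  -- only the 'STRING  ' row
```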