SQL - How to remove a space character from a string

I have a table where the max length of a column (varchar) is 12. Someone has loaded a value with a trailing space, so rather than 'SPACE' it's 'SPACE '.
I want to remove the space using a script. I was positive RTRIM or REPLACE(myValue, ' ', '') would work, but LEN(myValue) shows there is still an extra character.

As mentioned by a couple of folks, it may not be a space. Grab a copy of ngrams8k and you can use it to identify the issue. For example, here we have the text " SPACE" with a leading space and a trailing CHAR(160) (a non-breaking space, HTML &nbsp;). CHAR(160) looks like a space in SSMS but isn't "trimmable". For example, consider this query:
DECLARE @string VARCHAR(100) = ' SPACE'+CHAR(160);
SELECT '"'+@string+'"'
Using ngrams8k you could do this:
DECLARE @string VARCHAR(100) = ' SPACE'+CHAR(160);
SELECT
  ng.position,
  ng.token,
  asciival = ASCII(ng.token)
FROM dbo.ngrams8k(@string,1) AS ng;
Returns:
position   token   asciival
---------- ------- -----------
1                  32
2          S       83
3          P       80
4          A       65
5          C       67
6          E       69
7                  160
As you can see, the first character (position 1) is CHAR(32), which is a space. The last character (position 7) is CHAR(160), which is not a space, so RTRIM leaves it in place.
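For readers without ngrams8k to hand, the same per-character inspection can be sketched outside the database. This is a minimal Python illustration, not part of the original answer; `char_report` is a hypothetical helper name, and it assumes the value has already been fetched into a Python string.

```python
def char_report(text):
    """Return (position, character, code point) tuples, 1-based like the SQL output."""
    return [(pos, ch, ord(ch)) for pos, ch in enumerate(text, start=1)]


# Same sample as the T-SQL above: a leading space and a trailing CHAR(160).
sample = " SPACE" + chr(160)

for pos, ch, code in char_report(sample):
    # repr() makes the invisible characters distinguishable: ' ' versus '\xa0'.
    print(pos, repr(ch), code)
```

Position 1 reports code point 32 and position 7 reports 160, matching the table above.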
Knowing that CHAR(160) is the issue you could fix it like so:
SET @string = REPLACE(LTRIM(@string),CHAR(160),'')
If you are using SQL Server 2017+ you can also use TRIM, which does a whole lot more than just LTRIM-and-RTRIM-ing. For example, this will remove leading and trailing tabs, spaces, carriage returns, line feeds and non-breaking spaces:
SET @string = TRIM(CHAR(32)+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160) FROM @string);
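The difference between the two fixes can be sketched in Python, purely as an illustration: a replace removes the character wherever it occurs, while a trim with an explicit character set only touches the ends. The sample string is an assumption, not taken from the question.

```python
NBSP = chr(160)  # non-breaking space, CHAR(160) in T-SQL

raw = "\t SPACE" + NBSP + "\r\n"

# Analogue of REPLACE(..., CHAR(160), ''): removes every occurrence, wherever it is.
replaced = raw.replace(NBSP, "")

# Analogue of TRIM(<chars> FROM ...): removes only leading/trailing characters in the set.
trimmed = raw.strip(" \t\n\r" + NBSP)

print(repr(replaced))
print(repr(trimmed))
```

Here the replace still leaves the tab, space and line break behind, while the trim with the full character set returns just the word.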

Odds are it is some other non-printing character; a carriage return is a common one when moving data between *nix and other operating systems. One way to tell is to use the DUMP function (Oracle). So you could start with a query like:
SELECT dump(column_name)
FROM your_table
WHERE column_name LIKE 'SPACE%'
That should help you find the offending character; however, that doesn't fix your problem. Instead, I would use something like REGEXP_REPLACE:
SELECT REGEXP_REPLACE(column_name, '[^A-Za-z]')
FROM your_table
That strips everything that is not a letter, which takes care of any non-printing characters. You may need to adjust the regular expression if you expect numbers or symbols in your string. You could switch to a character class like:
SELECT REGEXP_REPLACE(column_name, '[[:cntrl:]]')
FROM your_table
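To see how the choice of character range changes the result, here is a small sketch using Python's `re` module. It is only meant to show the range logic: Oracle's regex engine may order ranges differently depending on collation settings, and Python has no `[:cntrl:]` class, so an explicit control-character range stands in for it. The sample value is hypothetical.

```python
import re

value = "SPACE_^\r"  # hypothetical value: letters, two symbols and a carriage return

# A-z spans code points 65-122, so [^A-z] also keeps [ \ ] ^ _ and the backtick.
loose = re.sub(r"[^A-z]", "", value)

# [^A-Za-z] keeps ASCII letters only.
strict = re.sub(r"[^A-Za-z]", "", value)

# Removing only ASCII control characters keeps the symbols but drops the \r.
no_control = re.sub(r"[\x00-\x1f\x7f]", "", value)

print(repr(loose), repr(strict), repr(no_control))
```

The letters-only pattern is the most aggressive; the control-character pattern is the safer choice when digits or symbols are legitimate data.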

Related

Update or replace a special character text in my database field

I am facing an issue where I cannot update or replace some characters in my database.
Here is how the text looks in my column when I retrieve it:
As you can see, there is an unknown character between 'master' and 'degree' which I cannot even paste here.
I also tried to update and replace it with the code below (I cannot paste those two vertical lines here since they are not supported in any browser and I am not sure what they are; please see the picture above to see what is in my SQL statement).
begin transaction
update gm_desc set projdesc=replace(projdesc,'%â s%','') where projdesc like '%âs%' and proposalno = '15-149-01'
You can see the real SQL statement here:
I tried to update or replace it but I cannot do it. The update statement runs successfully but I still see those weird special characters. I would appreciate any help.
Here's a scalar-valued function which removes all non-alphanumeric characters (preserves spaces) from a string.
Hopefully it helps!
dbfiddle
create function dbo.get_alphanumeric_str
(
  @string varchar(max)
)
returns varchar(max)
as
begin
  declare @ret varchar(max);
  with nums as (
    select 1 as n
    union all select n+1 from nums
    where n < 256
  )
  select @ret = replace(stuff(
    (
      select '' + substring(@string, nums.n, 1)
      from nums
      where patindex('%[^0-9A-Za-z ]%', substring(@string, nums.n,1)) = 0
      for xml path('')
    ), 1, 0, ''
  ), '&#x20;', ' ')
  option (MAXRECURSION 256)
  return @ret;
end
Usage
select dbo.get_alphanumeric_str('Helloᶄ âWorld 1234⅊⅐')
Returns: Hello World 1234
How it works
The nums CTE is just there to get a list of numbers (you can raise the maximum of 256 if your strings are longer; n.b. option (MAXRECURSION n) applies to this CTE but has to be placed on the outer query).
The subquery inside stuff essentially iterates through the string using the list of numbers above, extracting a substring of length 1 at each position; each of these characters is checked against the pattern [^0-9A-Za-z ], a negated set (0-9 all digits, A-Za-z all letters in both cases, and a single space character).
If the character is in the allowed set, the negated pattern finds nothing and patindex() returns 0, so the character is kept.
Use replace(string, '&#x20;', ' ') for the space character, as for xml path returns spaces in an encoded form; see this question.
Use a binary collation for accented characters; see this answer.
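The filtering idea itself is independent of T-SQL. As a rough sketch of the same logic in Python (an illustration only, with the allowed set restricted to ASCII letters, digits and the space, which is an assumption about what "alphanumeric" should mean here):

```python
import string

# Characters that survive the filter: ASCII letters, digits and the space.
ALLOWED = set(string.ascii_letters + string.digits + " ")


def get_alphanumeric_str(text):
    """Keep ASCII letters, digits and spaces; drop every other character."""
    return "".join(ch for ch in text if ch in ALLOWED)


print(get_alphanumeric_str("Helloᶄ âWorld 1234⅊⅐"))  # prints: Hello World 1234
```

Like the SQL version, this walks the string one character at a time and keeps only the characters in the allowed set.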

Identify Hidden Characters

In my SQL tables I have text containing hidden characters that are only visible when I copy and paste it into Notepad++.
How can I find the rows which have hidden characters using SQL Server queries?
I have tried comparing the lengths using DATALENGTH and LEN, but it did not work.
DATALENGTH(name) AS BinaryLength != LEN(name)
I want the rows which have hidden characters.
I am assuming this is caused by control characters, some of which are invisible, though the group also includes tabs and newlines. Here is an example to illustrate the problem and how to get such rows to appear.
--DROP TABLE #SillyTemp
DECLARE @InvisibleChar1 NCHAR(1) = NCHAR(28), @InvisibleChar2 NCHAR(1) = NCHAR(30), @NonControlChar NCHAR(1) = NCHAR(33);
DECLARE @InputString NVARCHAR(500) = N'Some |' + @InvisibleChar1 +'| random string |' + @InvisibleChar2 + '|' + '; Thank god Finally a normal character |' + @NonControlChar + '|';
SELECT @InputString AS OhNoInvisibleCharacters
DECLARE @ControlCharRange NVARCHAR(50) = N'%[' + NCHAR(1) + '-' + NCHAR(31) + ']%';
CREATE TABLE #SillyTemp
(
input nvarchar(500)
)
INSERT INTO #SillyTemp(input)
VALUES (@InputString),(N'A normal string')
SELECT @ControlCharRange;
SELECT input FROM #SillyTemp AS SI WHERE input LIKE @ControlCharRange;
This produces 3 result sets. The first shows the string with the invisible characters in it, like so:
Some || random string ||; Thank god Finally a normal character |!|
Note that they are actually invisible inside SQL Server, even though Stack Overflow shows them. The output in SQL Server is simply:
Some || random string ||; Thank god Finally a normal character |!|
But these characters still have a corresponding (N)CHAR(X) value. (N)CHAR(0) is the NULL character, which is highly unlikely to be in a string; in my setup it also causes problems when building a range, so I leave it out. (N)CHAR(32) is the ' ' space character.
The [X-Y] range operator in LIKE is also based on the (N)CHAR numbers, so we can build a range of [NCHAR(1)-NCHAR(31)].
The last select goes through the temporary table, one row of which has invisible characters. Since we're looking for any NCHAR between 1 and 31, only rows with invisible characters (and often invalid characters or tabs/newlines) satisfy the WHERE condition, so only they are returned. In this case only the 'faulty' string is returned by my select statement.
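The same range test can be expressed outside SQL Server. The following Python sketch is an illustration only; `has_control_char` is a hypothetical helper, and it checks code points directly rather than relying on any collation.

```python
def has_control_char(text):
    """True if any character has a code point from 1 to 31 (the range used above)."""
    return any(1 <= ord(ch) <= 31 for ch in text)


# Two sample rows modelled on the temp table above.
rows = [
    "Some |" + chr(28) + "| random string |" + chr(30) + "|",
    "A normal string",
]

# Only the row that contains control characters is kept.
faulty = [row for row in rows if has_control_char(row)]
print(len(faulty))
```

As in the SQL version, tabs and line breaks also fall in this range, so a row containing them is reported as well.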

SQL Command to replace embedded spaces with another character

I have a relational database with several fixed-length character fields. I need to permanently replace all the embedded spaces with another character like -, so JOHN DOE would become JOHN-DOE and ITSY BISTSY SPIDER would become ITSY-BISTSY-SPIDER. I can search beforehand to make sure there are no strings that would conflict. I just need to be able to print the requested files with no embedded spaces. I would do the replacement in the C code, but I want to make sure that there is never a future case where there is a JANE DOE and a JANE-DOE in the DB.
By the way, I have already made sure that there are no strings with leading spaces or with more than one consecutive embedded space; there are only trailing spaces to fill the fixed-length fields.
Edit: thanks for all the help!
It looks like when I cut and pasted my question from Word to StackOverflow the trailing spaces got lost, so the meaning of my question was lost a bit.
I need to replace only the embedded spaces not the trailing spaces!
Note: I am using middle dot to stand in for spaces that don't show well.
Using:
SELECT REPLACE(operator_name, ' ', '-') FROM operator_info ;
the string JOHN·DOE············ became JOHN-DOE------------.
I need JOHN-DOE············.
I am thinking I need to use aliasing and the TRIM command but not sure how.
Use whatever REPLACE function is built into your particular database.
MySQL:
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_replace
Oracle:
http://psoug.org/reference/translate_replace.html
SQLServer:
http://msdn.microsoft.com/en-us/library/ms186862.aspx
Edits below based on your comment.
I've done this in SQLServer syntax so please modify the example as needed. The first example really breaks down what's going on and the second one bunches it all into a single ugly query :D
@output in this case contains your final value.
DECLARE @input VARCHAR (100) = ' some test ';
DECLARE @trimmed VARCHAR (100);
DECLARE @replaced VARCHAR (100);
DECLARE @output VARCHAR (100);
-- Get just the inner text without the preceding / trailing spaces.
SET @trimmed = LTRIM (RTRIM (@input));
-- Replace the spaces *inside* the trimmed text with a dash.
SET @replaced = REPLACE (@trimmed, ' ', '-');
-- Take the original text and replace the trimmed version (with the inner spaces) with the dash version.
SET @output = REPLACE (@input, @trimmed, @replaced);
-- Show each step of the process!
SELECT @input AS INPUT,
       @trimmed AS TRIMMED,
       @replaced AS REPLACED,
       @output AS OUTPUT;
And as a SELECT statement.
DECLARE @inputTable TABLE (Value VARCHAR (100) NOT NULL);
INSERT INTO @inputTable (Value)
VALUES (' some test '),
       (' another test ');
SELECT REPLACE (Value,
                LTRIM (RTRIM (Value)),
                REPLACE (LTRIM (RTRIM (Value)), ' ', '-'))
FROM @inputTable;
If you are using MSSQL:
SELECT REPLACE(field_name,' ','-');
Edit: After the requirement about skipping the trailing spaces.
You can try this one-liner (DATALENGTH is used as the upper bound for SUBSTRING because LEN ignores trailing spaces and would cut the padding short):
SELECT REPLACE(RTRIM(@name), ' ', '-') + SUBSTRING(@name, LEN(RTRIM(@name)) + 1, DATALENGTH(@name))
However, I would recommend that you put it into a user-defined function instead.
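The idea behind the one-liner (split off the trailing padding, replace spaces in the rest, then glue the padding back on) can be sketched in Python. This is an illustration under the asker's stated assumptions of no leading spaces and fixed-width padding; `dash_embedded_spaces` is a hypothetical name.

```python
def dash_embedded_spaces(value):
    """Replace embedded spaces with '-' while keeping trailing padding intact."""
    core = value.rstrip(" ")        # the text without its trailing padding
    padding = value[len(core):]     # the trailing spaces that were stripped
    return core.replace(" ", "-") + padding


field = "JOHN DOE" + " " * 12       # a 20-character fixed-length field
result = dash_embedded_spaces(field)
print(repr(result))
```

The result keeps its original width, which is what matters for a fixed-length column.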
assuming SQL Server:
update TABLE set column = replace (column, ' ','-')

Remove leading zeros

Given data in a column which look like this:
00001 00
00026 00
I need to use SQL to remove anything after the space and all leading zeros from the values so that the final output will be:
1
26
How can I best do this?
Btw I'm using DB2
This was tested on DB2 for Linux/Unix/Windows and z/OS.
You can use the LOCATE() function in DB2 to find the character position of the first space in a string, and then send that to SUBSTR() as the end location (minus one) to get only the first number of the string. Casting to INT will get rid of the leading zeros, but if you need it in string form, you can CAST again to CHAR.
SELECT CAST(SUBSTR(col, 1, LOCATE(' ', col) - 1) AS INT)
FROM tab
In DB2 (Express-C 9.7.5) you can use the SQL standard TRIM() function:
db2 => CREATE TABLE tbl (vc VARCHAR(64))
DB20000I The SQL command completed successfully.
db2 => INSERT INTO tbl (vc) VALUES ('00001 00'), ('00026 00')
DB20000I The SQL command completed successfully.
db2 => SELECT TRIM(TRIM('0' FROM vc)) AS trimmed FROM tbl
TRIMMED
----------------------------------------------------------------
1
26
2 record(s) selected.
The inner TRIM() removes leading and trailing zero characters, while the outer trim removes spaces.
This worked for me on the AS400 DB2.
The "L" stands for Leading.
You can also use "T" for Trailing.
I am assuming the field type is currently VARCHAR, do you need to store things other than INTs?
If the field type was INT, they would be removed automatically.
Alternatively, to select the values:
SELECT CAST(CAST(Col1 AS int) AS varchar) AS Col1
I found this thread for some reason and find it odd that no one actually answered the question. It seems that the goal is to return a left adjusted field:
SELECT
TRIM(L '0' FROM SUBSTR(trim(col) || ' ',1,LOCATE(' ',trim(col) || ' ') - 1))
FROM tab
One option is implicit casting: SELECT SUBSTR(column, 1, 5) + 0 AS column_as_number ...
That assumes that the structure is nnnnn nn, ie exactly 5 characters, a space and two more characters.
Explicit casting, ie SUBSTR(column,1,5)::INT is also a possibility, but exact syntax depends on the RDBMS in question.
Use the following to achieve this when the space location is variable, or even when it's fixed and you want to make a more robust query (in case it moves later):
SELECT CAST(SUBSTR(LTRIM('00123 45'), 1, CASE WHEN LOCATE(' ', LTRIM('00123 45')) <= 1 THEN LEN('00123 45') ELSE LOCATE(' ', LTRIM('00123 45')) - 1 END) AS BIGINT)
If you know the column will always contain a blank space after the start:
SELECT CAST(SUBSTR(LTRIM('00123 45'), 1, LOCATE(' ', LTRIM('00123 45')) - 1) AS BIGINT)
Both of these result in:
123
So your query would be:
SELECT CAST(SUBSTR(LTRIM(myCol1), 1, CASE WHEN LOCATE(' ', LTRIM(myCol1)) <= 1 THEN LEN(myCol1) ELSE LOCATE(' ', LTRIM(myCol1)) - 1 END) AS BIGINT)
FROM myTable1
This removes any content after the first space character (ignoring leading spaces), and then converts the remainder to a 64bit integer which will then remove all leading zeroes.
If you want to keep all the numbers and just remove the leading zeroes and any spaces you can use:
SELECT CAST(REPLACE('00123 45', ' ', '') AS BIGINT)
My answer might seem quite verbose compared to simply SELECT CAST(SUBSTR(myCol1, 1, 5) AS BIGINT) FROM myTable1, but it allows for the space character not always being there and for values of myCol1 that are not of the form nnnnn nn; if the string is nn nn, the simple version's conversion to an integer will fail.
Remember to be careful if you use the TRIM function to remove the leading zeroes; in all cases you should test your code with data like 00120 00 and check that it returns the correct value of 120 rather than 12.
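The 00120 pitfall is easy to reproduce outside DB2. The Python sketch below is only an illustration: `first_number` is a hypothetical helper that takes the text before the first space and converts it to an integer, and the last lines contrast trimming zeros from both ends with trimming leading zeros only.

```python
def first_number(value):
    """Return the integer before the first space, ignoring leading spaces."""
    head = value.lstrip(" ").split(" ", 1)[0]
    return int(head)  # raises ValueError if there are no digits to convert


print(first_number("00001 00"))
print(first_number("00120 00"))

# Trimming '0' from both ends damages values that end in zero...
both_ends = "00120".strip("0")
# ...while trimming leading zeros only keeps them intact.
leading_only = "00120".lstrip("0")
print(both_ends, leading_only)
```

Converting to an integer sidesteps the problem entirely, which is why the cast-based answers above are the safer route.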

SQL Server TRIM character

I have the following string: 'BOB*'. How do I trim the * so it shows up as 'BOB'?
I tried RTRIM('BOB*','*') but it does not work; it says it needs only 1 parameter.
Another pretty good way to implement Oracle's TRIM char FROM string in MS SQL Server is the following:
First, you need to identify a char that will never be used in your string, for example ~
You replace all spaces with that character
You replace the character * you want to trim with a space
You LTrim + RTrim the obtained string
You replace back all spaces with the trimmed character *
You replace back all never-used characters with a space
For example:
REPLACE(REPLACE(LTrim(RTrim(REPLACE(REPLACE(string,' ','~'),'*',' '))),' ','*'),'~',' ')
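The steps above can be traced in a short Python sketch. It is an illustration of the swap trick only, not a recommendation to do this in application code; `trim_char_via_swap` is a hypothetical name, and the explicit check for the placeholder is an addition that the SQL expression does not perform.

```python
def trim_char_via_swap(text, trim_char="*", placeholder="~"):
    """Trim trim_char from both ends using only space-trimming, as in the SQL trick."""
    if placeholder in text:
        # The trick silently corrupts data if the placeholder already occurs.
        raise ValueError("placeholder already present in text")
    swapped = text.replace(" ", placeholder)    # protect real spaces
    swapped = swapped.replace(trim_char, " ")   # turn the trim character into spaces
    stripped = swapped.strip(" ")               # LTRIM + RTRIM
    restored = stripped.replace(" ", trim_char) # put embedded trim characters back
    return restored.replace(placeholder, " ")   # restore the real spaces


print(trim_char_via_swap("**B*O B**"))
```

The embedded asterisk and the embedded space both survive; only the leading and trailing asterisks are removed.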
CREATE FUNCTION dbo.TrimCharacter
(
  @Value NVARCHAR(4000),
  @CharacterToTrim NVARCHAR(1)
)
RETURNS NVARCHAR(4000)
AS
BEGIN
  SET @Value = LTRIM(RTRIM(@Value))
  SET @Value = REVERSE(SUBSTRING(@Value, PATINDEX('%[^'+@CharacterToTrim+']%', @Value), LEN(@Value)))
  SET @Value = REVERSE(SUBSTRING(@Value, PATINDEX('%[^'+@CharacterToTrim+']%', @Value), LEN(@Value)))
  RETURN @Value
END
GO
--- Example
----- SELECT dbo.TrimCharacter('***BOB*********', '*')
----- returns 'BOB'
If you want to remove all asterisks then it's obvious:
SELECT REPLACE('Hello*', '*', '')
However, if you have more than one asterisk at the end and several throughout, but are only interested in trimming the trailing ones, then I'd use this:
DECLARE @String VarChar(50) = '**H*i****'
SELECT LEFT(@String, LEN(REPLACE(@String, '*', ' '))) --Returns: **H*i
I updated this answer to show how to remove leading characters:
SELECT RIGHT(@String, LEN(REPLACE(REVERSE(@String), '*', ' '))) --Returns: H*i****
LEN() has a "feature" (that looks a lot like a bug) where it does not count trailing spaces.
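To make the role of LEN() explicit, here is a Python sketch that models it as "length after dropping trailing spaces" and rebuilds the two expressions on top of that. It is an illustration only; `sql_len`, `rtrim_char` and `ltrim_char` are hypothetical names.

```python
def sql_len(text):
    """Model of T-SQL LEN(): the length not counting trailing spaces."""
    return len(text.rstrip(" "))


def rtrim_char(text, ch):
    """LEFT(s, LEN(REPLACE(s, ch, ' '))): drop trailing ch characters."""
    keep = sql_len(text.replace(ch, " "))
    return text[:keep]


def ltrim_char(text, ch):
    """RIGHT(s, LEN(REPLACE(REVERSE(s), ch, ' '))): drop leading ch characters."""
    keep = sql_len(text[::-1].replace(ch, " "))
    return text[len(text) - keep:]


print(rtrim_char("**H*i****", "*"))
print(ltrim_char("**H*i****", "*"))
```

One caveat follows directly from the model: because the trim character is turned into spaces, a genuine space sitting just before the trailing characters is trimmed along with them.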
LEFT('BOB*', LEN('BOB*')-1)
should do it.
If you wanted behavior similar to how RTRIM handles spaces i.e. that "B*O*B**" would turn into "B*O*B" without losing the embedded ones then something like -
REVERSE(SUBSTRING(REVERSE('B*O*B**'), PATINDEX('%[^*]%',REVERSE('B*O*B**')), LEN('B*O*B**') - PATINDEX('%[^*]%', REVERSE('B*O*B**')) + 1))
Should do it.
If you only want to remove a single '*' character from the value when the value ends with a '*', a simple CASE expression will do that for you:
SELECT CASE WHEN RIGHT(foo,1) = '*' THEN LEFT(foo,LEN(foo)-1) ELSE foo END AS foo
FROM (SELECT 'BOB*' AS foo) AS t
To remove all trailing '*' characters, then you'd need a more complex expression, making use of the REVERSE, PATINDEX, LEN and LEFT functions.
NOTE: Be careful with the REPLACE function, as that will replace all occurrences of the specified character within the string, not just the trailing ones.
How about.. (in this case to trim off trailing comma or period)
For a variable:
-- Trim commas and full stops from end of City
WHILE RIGHT(@CITY, 1) IN (',', '.')
  SET @CITY = LEFT(@CITY, LEN(@CITY)-1)
For table values:
-- Trim commas and full stops from end of City
WHILE EXISTS (SELECT 1 FROM [sap_out_address] WHERE RIGHT([CITY], 1) IN (',', '.'))
UPDATE [sap_out_address]
SET [CITY] = LEFT([CITY], LEN([CITY])-1)
WHERE RIGHT([CITY], 1) IN (',', '.')
Another approach, ONLY if you want to remove leading and trailing characters, is to use the TRIM function.
By default it removes white space, but it can remove other characters if you specify them.
SELECT TRIM('=' FROM '=SPECIALS=') AS Result;
Result
--------
SPECIALS
Unfortunately, LTRIM and RTRIM do not work in the same way; they only remove spaces and cannot take a set of characters to remove as TRIM can.
Reference and more examples:
https://database.guide/how-to-remove-leading-and-trailing-characters-in-sql-server/
RTRIM() and LTRIM() only remove spaces; try http://msdn.microsoft.com/en-us/library/ms186862.aspx
Basically, just replace the * with an empty string
REPLACE('TextWithCharacterToReplace','CharacterToReplace','CharacterToReplaceWith')
So you want
REPLACE ('BOB*','*','')
I really like Teejay's answer, and almost stopped there. It's clever, but I got the "almost too clever" feeling, as, somehow, your string at some point will actually have a ~ (or whatever) in it on purpose. So that's not defensive enough for me to put into production.
I like Chris' too, but the PATINDEX call seems like overkill.
Though it's probably a micro-optimization, here's one without PATINDEX:
CREATE FUNCTION dbo.TRIMMIT(@stringToTrim NVARCHAR(MAX), @charToTrim NCHAR(1))
RETURNS NVARCHAR(MAX)
AS
BEGIN
  DECLARE @retVal NVARCHAR(MAX)
  SET @retVal = @stringToTrim
  WHILE 1 = charindex(@charToTrim, reverse(@retVal))
    SET @retVal = SUBSTRING(@retVal,0,LEN(@retVal))
  WHILE 1 = charindex(@charToTrim, @retVal)
    SET @retVal = SUBSTRING(@retVal,2,LEN(@retVal))
  RETURN @retVal
END
--select dbo.TRIMMIT('\\trim\asdfds\\\', '\')
--trim\asdfds
Returning a MAX nvarchar bugs me a little, but that's the most flexible way to do this..
I've used a similar approach to some of the above answers of using pattern matching and reversing the string to find the first non-trimmable character, then cutting that off. The difference is this version does less work than those above, so should be a little more efficient.
This creates RTRIM functionality for any specified character.
It includes an additional step, set @charToFind = case..., to escape the chosen character.
There is currently an issue if @charToFind is a right square bracket (]), as there appears to be no way to escape this.
declare @stringToSearch nvarchar(max) = '****this is****a ** demo*****'
, @charToFind nvarchar(5) = '*'
--escape @charToFind so it doesn't break our pattern matching
set @charToFind = case @charToFind
when ']' then '[]]' --*this does not work / can't find any info on escaping right crotchet*
when '^' then '\^'
--when '%' then '%' --doesn't require escaping in this context
--when '[' then '[' --doesn't require escaping in this context
--when '_' then '_' --doesn't require escaping in this context
else @charToFind
end
select @stringToSearch
, left
(
@stringToSearch
,1
+ len(@stringToSearch)
- patindex('%[^' + @charToFind + ']%',reverse(@stringToSearch))
)
SQL Server 2017 has a new way to do it: https://learn.microsoft.com/en-us/sql/t-sql/functions/trim-transact-sql?view=sql-server-2017
SELECT TRIM('0' FROM '00001900'); -> 19
SELECT TRIM( '.,! ' FROM '# test .'); -> # test
SELECT TRIM('*' FROM 'BOB*'); --> BOB
Unfortunately, RTRIM does not support trimming a specific character.
SELECT REPLACE('BOB*', '*', '')
SELECT REPLACE('B*OB*', '*', '')
-------------------------------------
Result : BOB
-------------------------------------
This will remove all asterisks from the text.
Trim with many cases
--id = 100 101 102 103 104 105 106 107 108 109 110 111
select right(id,2)+1 from ordertbl -- 1 2 3 4 5 6 7 8 9 10 11 -- last two positions are taken
select LEFT('BOB', LEN('BOB')-1) -- BO
select LEFT('BOB*',1) --B
select LEFT('BOB*',2) --BO
Try this:
Original
select replace('BOB*','*','')
Fixed to be an exact replacement
select replace('BOB*','BOB*','BOB')
Solution for one char parameter:
rtrim('0000100','0') ->
select left('0000100',len(rtrim(replace('0000100','0',' '))))
ltrim('0000100','0') ->
select right('0000100',len(replace(ltrim(replace('0000100','0',' ')),' ','.')))
@teejay's solution is great, but the code below may be easier to understand:
declare @X nvarchar(max)='BOB *'
set @X=replace(@X,' ','^')
set @X=replace(@X,'*',' ')
set @X= ltrim(rtrim(@X))
set @X=replace(@X,'^',' ')
Here's a function I used in the past. Note that while you can make it more general purpose by having extra parameters like the character(s) you wish to remove and what you will be replacing the space character(s) with, this greatly increases execution time. Here, I used a pipe to replace spaces AFTER pre-trimming the input. Change varchar to nvarchar if required.
CREATE FUNCTION [dbo].[TrimColons]
(
  @strToTrim varchar(500)
)
RETURNS varchar(500)
AS
BEGIN
RETURN REPLACE(REPLACE(LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(@strToTrim)),' ','|'),':',' '))),' ',':'),'|',' ')
/*
Here's a breakdown of this fancy, schmancy, trimmer
LTRIM(RTRIM(@strToTrim)) trims leading & trailing spaces first
REPLACE(LTRIM(RTRIM(@strToTrim)),' ','|') replaces inside spaces with pipe char
REPLACE(REPLACE(LTRIM(RTRIM(@strToTrim)),' ','|'),':',' ') replaces demarc character, the colon, with spaces
LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(@strToTrim)),' ','|'),':',' '))) trims the leading & trailing converted-to-space demarc char (colon)
REPLACE(LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(@strToTrim)),' ','|'),':',' '))),' ',':') replaces the inner space characters back to demarc char (colon)
REPLACE(REPLACE(LTRIM(RTRIM(REPLACE(REPLACE(LTRIM(RTRIM(@strToTrim)),' ','|'),':',' '))),' ',':'),'|',' ') replaces the pipe characters back to original space characters
*/
END
DECLARE @String VarChar(50) = '**H*i****', @String2 VarChar(50)
--Assign to new variable @String2
;WITH X AS (
SELECT LEFT(@String, LEN(REPLACE(@String, '*', ' '))) [V1]
)
SELECT TOP 1 @String2 = RIGHT(V1, LEN(REPLACE(REVERSE(V1), '*', ' '))) FROM X
SELECT @String [@String], @String2 [@String2]
--See the intermediate values: v0 is the original, v1 trims the end, and v2 trims the leading characters of v1
;WITH X AS (
SELECT @String V0, LEFT(@String, LEN(REPLACE(@String, '*', ' '))) [V1]
)
SELECT [V0], [V1], RIGHT([V1], LEN(REPLACE(REVERSE([V1]), '*', ' '))) [v2] FROM X