How to trim a string (with Ideographic space U+3000) in SQL Server?

I have to trim a string of Japanese characters that has a double-byte (full-width) space at the start and at the end.
I have to do this in a stored procedure on SQL Server 2016.
For example,
SELECT LTRIM(RTRIM(' A A '))
works perfectly.
But the problem is with the line below:
SELECT LTRIM(RTRIM('　A A　'))
I want the output of this one to be 'A A'.
Any idea how to do this?

Adapted SQL from OP's post:
SELECT LTRIM(RTRIM(REPLACE(N'　A A　', N'　', N' ')))

The space in that string is the Ideographic space (U+3000) Unicode character, which LTRIM and RTRIM don't recognize as whitespace. Even TRIM in SQL Server 2017 won't recognize it unless it's specified explicitly.
Another problem is that this character is outside the normal range of characters and can't appear in a varchar field or value. This leads to inconsistent results between SQL Server versions. In SQL Server 2014 it will even appear as a ?. In later versions LTRIM/RTRIM may or may not work without emitting the error character. I don't have access to all versions to test this.
In SQL Server 2017 it's possible to explicitly specify the trimmed character, e.g.:
select trim(N'　' from N'　A A　')
This produces A A.
In previous versions, PATINDEX can be used to find the positions of the first non-space character and of the trailing space:
declare @str nvarchar(10)=N'　A A　';
declare @start int=PATINDEX(N'%[^　]%',@str)
declare @end int=PATINDEX(N'%　',@str)
SELECT SUBSTRING(@str,@start,@end-@start)
The pattern N'%[^　]%' finds the first non-U+3000 character in the string. N'%　' matches the U+3000 at the end of the string. SUBSTRING(@str,@start,@end-@start) extracts the content between the two positions.
The result is:
A A
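The sample above has a single U+3000 on each side. A minimal sketch that trims whole runs of U+3000 and ordinary spaces on both sides without TRIM could look like the following. The variable names and the choice of Latin1_General_BIN2 (used so the bracket class compares by code point) are assumptions for illustration, not part of the original answer.

```sql
-- Sketch only: trim leading/trailing U+3000 and U+0020 on versions without TRIM().
DECLARE @ws  nchar(1)     = NCHAR(0x3000);               -- ideographic space
DECLARE @str nvarchar(20) = @ws + @ws + N'A A' + @ws;    -- sample value (assumed)
DECLARE @pat nvarchar(10) = N'%[^ ' + @ws + N']%';       -- first char that is neither space

-- Position of the first non-space character, counted from the start and from the end.
DECLARE @start   int = PATINDEX(@pat, @str COLLATE Latin1_General_BIN2);
DECLARE @fromEnd int = PATINDEX(@pat, REVERSE(@str) COLLATE Latin1_General_BIN2);

SELECT CASE
         WHEN @start = 0 THEN N''                        -- nothing but spaces
         ELSE SUBSTRING(@str, @start,
                        DATALENGTH(@str) / 2 - @start - @fromEnd + 2)
       END AS trimmed;                                   -- N'A A' for the sample value
```

DATALENGTH / 2 is used instead of LEN because LEN ignores trailing U+0020 characters, which would throw off the length calculation.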

I found a solution.
Thank you so much for your efforts.
Please use this function to remove trailing double-byte spaces.
CREATE FUNCTION [RTRIMBYTE](@AV_VALUE NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @AV_RETURN NVARCHAR(MAX) = @AV_VALUE;
-- strip trailing U+0020 and U+3000 one character at a time;
-- the 'X' padding keeps LEN from ignoring trailing spaces
WHILE DATALENGTH(@AV_RETURN) > 0 AND RIGHT(@AV_RETURN, 1) IN (N' ', N'　')
SET @AV_RETURN = LEFT(@AV_RETURN, LEN(N'X' + @AV_RETURN + N'X') - 3);
RETURN @AV_RETURN;
END;

Related

LTRIM RTRIM not working for Chinese string SQL

I have a column named Text which receives from the end user the following string:
'复合模头滤网 φ245 120目*300目 24×120目　'
It ends with a strange space that looks wider than a regular space. However, it has the same ASCII code as the normal space, 32.
I used this SQL code to trim my string, but it always returns the same string without trimming:
LTRIM(RTRIM([Text]))
The solution is to trim the character with the ASCII code 32. The following code works perfectly:
TRIM(CHAR(32) from [ShortText])
To check whether it works, I tried it this way:
DECLARE @t TABLE(txt nvarchar(255));
INSERT INTO @t VALUES (TRIM(CHAR(32) from '复合模头滤网 φ245 120目*300目 24×120目　'));
SELECT txt, LEN((txt)), ASCII(RIGHT(txt,1)) AS ASCII_Char
-- 32 = space, 13 = CR, 10 = LF, 9 = tab
FROM @t
This character is U+3000 IDEOGRAPHIC SPACE, and as documented, SQL Server by default only removes U+0020 SPACE.
You can use TRIM(... FROM ...) in SQL Server 2017 and later:
DECLARE @t nvarchar(1000) = N'复合模头滤网 φ245 120目*300目 24×120目　';
SELECT
DATALENGTH(@t) / 2 totalCharacters,
LEN(@t) totalCharactersTrimmed,
TRIM(@t) trimmedNormal,
DATALENGTH(TRIM(@t)) / 2 totalTrimmedNormal,
TRIM(NCHAR(0x3000) FROM @t) trimmedIdeographic,
TRIM(N'　' FROM @t) trimmedIdeographicLiteral,
DATALENGTH(TRIM(NCHAR(0x3000) FROM @t)) / 2 totalTrimmedIdeographic;
SELECT
UNICODE(NCHAR(0x3000)) unicodeNum,
ASCII(NCHAR(0x3000)) asciiNum;
db<>fiddle
You claim it has the same ASCII code, however that is just because ASCII does not have an exact character for it. If you use the UNICODE function, you will see the difference, as the fiddle shows.
For such characters as these, you must make sure to use the nvarchar data type, and the NCHAR and UNICODE functions.

What is the difference when parsing between Tab and Spaces in sql server 2008 R2

I have encountered the scenario below:
Declare @var int = ' 123'
select @var
Declare @var1 int = '	123' -- the leading character here is a tab
select @var1
In the first case I used spaces in front of the value, and when executed it returns the value 123.
In the second case I used a tab instead of a space in front of the value, and when executed it throws a conversion error.
Can anyone tell me what the difference is between these two scenarios?
Even though the two strings look the same (spaces in one, a tab in the other), their character codes are different: a space is CHAR(32) and a tab is CHAR(9). That is why SQL Server treats a space and a tab differently.
More information about character codes and character encoding can be found at the two links below:
https://www.computerhope.com/jargon/c/charcode.htm
https://www.pcmag.com/encyclopedia/term/51983/standards-character-codes
Also, if you think about it mathematically and logically, spaces before an integer carry no meaning. It's like having zeros before a number.
For example, ' 123' (5 spaces and then 123) is like 00000123.
That is one more reason why leading spaces are trimmed before the string is converted to an integer.
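To make the difference visible, a small sketch along these lines inspects the first character code and strips tabs before converting. The variable name @raw is hypothetical, and removing every tab with REPLACE is an assumption about what the input should allow.

```sql
-- Sketch only: a leading tab is CHAR(9), which the int conversion does not skip.
DECLARE @raw varchar(10) = CHAR(9) + '123';

SELECT ASCII(LEFT(@raw, 1))                    AS firstCharCode,  -- 9 = tab
       CAST(REPLACE(@raw, CHAR(9), '') AS int) AS parsed;         -- 123 once the tab is removed
```

Casting @raw directly, without the REPLACE, raises the same conversion error as in the question.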

Remove last x characters until a specific character

I have this string /uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains and I need to get just the last part of the URL (everything after the last /).
Then I want to replace '-' with a space. The strings do not all have the same number of characters.
How can I do this?
Thank you!
Solution using BigQuery functions:
select regexp_replace(last(split(x, "/")), "-", " ") from
(select
"/uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains"
as x)
Here is what I tried in SQL Server
DECLARE @s VARCHAR(max)= '/uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains'
SELECT REVERSE(SUBSTRING(REVERSE(@s),CHARINDEX('/',REVERSE(@s)),LEN(REVERSE(@s))))+REVERSE(REPLACE(SUBSTRING(REVERSE(@s),1,CHARINDEX('/',REVERSE(@s))-1),'-',' '))
Sorry, this was for SQL Server.
Did you try using SPLIT in BigQuery?
SPLIT('str' [, 'delimiter']) Returns a set of substrings as a repeated string. If delimiter is specified, the SPLIT function breaks str into substrings, using delimiter as the delimiter.
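For the SQL Server variant in this thread, a shorter sketch that returns only the last segment (rather than the whole path) might look like this. It assumes the string contains at least one '/'; without one, CHARINDEX returns 0 and RIGHT receives a negative length.

```sql
-- Sketch only: keep the text after the last '/' and turn hyphens into spaces.
DECLARE @s varchar(max) = '/uk-en/contact-us/frequently-asked-questions/your-trip/there-wi-fi-access-in-the-eurostar-terminals-and-board-your-trains';

-- CHARINDEX on the reversed string gives the distance of the last '/' from the end.
SELECT REPLACE(RIGHT(@s, CHARINDEX('/', REVERSE(@s)) - 1), '-', ' ') AS lastPart;
-- there wi fi access in the eurostar terminals and board your trains
```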

Transact SQL replace part of string

Is it possible to delete part of a string using a regexp (or something else; maybe something like CHARINDEX could help) in a SQL query?
I use MS SQL Server (2008 most likely).
Example: I have strings like "[some useless info] Useful part of string". I want to delete the bracketed parts if they are present.
Use REPLACE, for example:
UPDATE authors SET city = replace(city, 'To Remove', 'With BLACK or Whatever')
WHERE city LIKE 'Salt%'; -- with a WHERE condition
You can use the PATINDEX function. It's not a complete regular expression implementation, but you can use it for simple things.
PATINDEX (Transact-SQL): Returns the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found, on all valid text and character data types.
Or you can use the CLR to extend SQL Server with a complete regular expression implementation.
SQL Server 2005: CLR Integration
SELECT * FROM temp where replace(replace(replace(url,'http://',''),'www.',''),'https://','')='"+url+"';
You can use STUFF to insert a string into another string. It deletes a specified length of characters in the first string at the start position and then inserts the second string into the first string at the start position.
For example, the code below, replaces the 5 with 666666:
DECLARE @Variable NVARCHAR(MAX) = '12345678910'
SELECT STUFF(@Variable, 5, 1, '666666')
Note that the second argument is not a string; it is a position, and you can calculate that position using CHARINDEX, for example.
Here is your case:
DECLARE @Variable NVARCHAR(MAX) = '[some useless info] Useful part of string'
SELECT STUFF(
@Variable
,CHARINDEX('[', @Variable)
,LEN(SUBSTRING(@Variable, CHARINDEX('[', @Variable), CHARINDEX(']', @Variable) - LEN(SUBSTRING(@Variable, 0, CHARINDEX('[', @Variable)))))
,''
)
In the end, REPLACE, SUBSTRING and PATINDEX helped:
REPLACE(t.badString, Substring(t.badString , Patindex('%[%' , t.badString)+1 , Patindex('%]%' , t.badString)), '')
Thanks to all.
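The STUFF approach can also be written with an explicit guard for strings that contain no brackets, since STUFF returns NULL when the start position is 0. This is a minimal sketch; the variable names and the added LTRIM (to drop the space left behind) are assumptions, not part of the answers above.

```sql
-- Sketch only: remove the first [bracketed] part, leave other strings unchanged.
DECLARE @v     nvarchar(100) = N'[some useless info] Useful part of string';
DECLARE @open  int = CHARINDEX(N'[', @v);
DECLARE @close int = CHARINDEX(N']', @v, @open + 1);   -- search after the opening bracket

SELECT CASE
         WHEN @open > 0 AND @close > @open
           THEN LTRIM(STUFF(@v, @open, @close - @open + 1, N''))
         ELSE @v                                        -- no complete [..] pair found
       END AS cleaned;                                  -- N'Useful part of string'
```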

Why does CHARINDEX function return an index for 'Œ' in string 'manoeuvre'?

I have this SQL code
declare @s varchar(8000) = 'manoeuvre'
select CHARINDEX(char(140), @s, 0)
char(140) = Œ, which does not exist in the string 'manoeuvre'.
Yet SQL Server returns the following:
4 (indicating it had located char(140) in this string)
If I replace 'Œ' with a '*' I get
man*uvre
It seems like SQL Server has replaced the 'o' and 'e' with the one character, but why?
Why is it replacing 'oe' with 'Œ'?
The same effect can be seen with the string 'mass' and 'ß' (which I believe is German for double s). Replacing on this character returns the string 'ma*'.
Is SQL Server trying to do something "smart" under the covers?
EDIT
Extra information:
SQL server 2008 R2.
collation of database is Latin1_General_CI_AS.
If you look up that character (code 140, which belongs to the Windows-1252 extension rather than to ASCII itself), it is described as
capital OE ligature
See www.table-ascii.com for instance
Try
select CHARINDEX(char(140), @s COLLATE Latin1_General_BIN, 0)
which performs a binary comparison instead of a linguistic one.
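To see the two behaviours side by side, a sketch like the following runs the same search under both collations. It assumes the database default code page is 1252, so that CHAR(140) is Œ; the column aliases are hypothetical.

```sql
-- Sketch only: the same search under a linguistic and a binary collation.
DECLARE @s varchar(8000) = 'manoeuvre';

SELECT CHARINDEX(CHAR(140), @s COLLATE Latin1_General_CI_AS) AS linguisticMatch, -- 4, as reported in the question
       CHARINDEX(CHAR(140), @s COLLATE Latin1_General_BIN)   AS binaryMatch;     -- 0, no byte 140 in the string
```

The linguistic collation treats the ligature as equivalent to the letter pair 'oe', while the binary collation compares raw code values.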