What is the difference between Tab and Spaces when parsing in SQL Server 2008 R2?

I have encountered the scenario below:
Declare @var int = '     123'
select @var
Declare @var1 int = '	123' -- the leading whitespace here is a tab character
select @var1
In the first case I used spaces in front of the value, and on execution it returns the value 123.
In the second case I used a tab instead of spaces in front of the value, and on execution it throws a conversion error.
Can anyone explain the difference between these two scenarios?

Even though the two strings look the same on screen, a space and a tab have different character codes (32 and 9 respectively), and that is why SQL Server treats them differently: leading spaces are ignored when the string is converted to int, but a tab is not.
More information about character codes and character encoding can be found at the two links below:
https://www.computerhope.com/jargon/c/charcode.htm
https://www.pcmag.com/encyclopedia/term/51983/standards-character-codes
Also, if you think about it mathematically and logically, spaces before an integer do not make sense. It is like having zeros before a number.
For example, '     123' (5 spaces and then 123) is like 00000123.
That is one more reason why spaces are trimmed before integer values.
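A minimal sketch of the point above, assuming the input may contain tabs that should be discarded. The variable @raw is a hypothetical sample value, not something from the question:

```sql
-- Character codes of the two whitespace characters
SELECT ASCII(' ')     AS space_code,  -- 32
       ASCII(CHAR(9)) AS tab_code;    -- 9

-- @raw is a hypothetical input: a tab followed by digits
DECLARE @raw varchar(20) = CHAR(9) + '123';

-- Strip tabs (and trim ordinary spaces) before converting to int
SELECT CAST(LTRIM(RTRIM(REPLACE(@raw, CHAR(9), ''))) AS int) AS cleaned_value;
```

Whether silently dropping tabs is acceptable depends on where the data comes from; rejecting such input may be the safer choice in some cases.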

Related

LTRIM RTRIM not working for Chinese string SQL

I have a column named Text which receives from the end user the following string:
'复合模头滤网 φ245 120目*300目 24×120目　'
It includes an odd space at the end, slightly wider than a regular space. However, it reports the same ASCII code as the normal space, 32.
I used this SQL code to trim my string, but it always returns the same string, untrimmed:
LTRIM(RTRIM([Text]))
The solution is to try trimming the character with ASCII code 32. The following code works perfectly:
TRIM(CHAR(32) from [ShortText])
To check whether it works, I tried it this way:
DECLARE @t TABLE(txt nvarchar(255));
INSERT INTO @t VALUES (TRIM(CHAR(32) from '复合模头滤网 φ245 120目*300目 24×120目　'));
SELECT txt, LEN((txt)), ASCII(RIGHT(txt,1)) AS ASCII_Char
-- 32 = space, 13 = CR, 10 = LF, 9 = tab
FROM @t
This character is U+3000 IDEOGRAPHIC SPACE, and as documented, SQL Server by default only removes U+0020 SPACE.
You can use TRIM(... FROM ...) in modern versions of SQL Server (2017 and later):
DECLARE @t nvarchar(1000) = N'复合模头滤网 φ245 120目*300目 24×120目　'; -- the trailing character is U+3000
SELECT
DATALENGTH(@t) / 2 totalCharacters,
LEN(@t) totalCharactersTrimmed,
TRIM(@t) trimmedNormal,
DATALENGTH(TRIM(@t)) / 2 totalTrimmedNormal,
TRIM(NCHAR(0x3000) FROM @t) trimmedIdeographic,
TRIM(N'　' FROM @t) trimmedIdeographicLiteral,
DATALENGTH(TRIM(NCHAR(0x3000) FROM @t)) / 2 totalTrimmedIdeographic;
SELECT
UNICODE(NCHAR(0x3000)) unicodeNum,
ASCII(NCHAR(0x3000)) asciiNum;
db<>fiddle
You say it has the same ASCII code, but that is only because ASCII has no exact character for it. If you use the UNICODE function, you will see the difference, as the fiddle shows.
For such characters as these, you must make sure to use the nvarchar data type, and the NCHAR and UNICODE functions.
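As a small illustration of that advice, the trailing character can be inspected by code point. This sketch builds a hypothetical sample (@s) with an explicit NCHAR(0x3000), so the invisible character cannot be lost when the code is copied:

```sql
-- @s is a hypothetical sample ending in U+3000 IDEOGRAPHIC SPACE
DECLARE @s nvarchar(100) = N'24×120目' + NCHAR(0x3000);

SELECT UNICODE(RIGHT(@s, 1)) AS lastCodePoint,    -- 12288, i.e. 0x3000
       DATALENGTH(@s) / 2    AS storedCharacters; -- 8 UTF-16 code units
```

A result of 32 from UNICODE would indicate a regular space; 12288 indicates the ideographic one.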

How to trim string (with Ideographic space U+3000) in sql server?

I have to trim a string of Japanese characters that has a double-byte space at the start and at the end of the string.
I have to do this in a stored procedure on SQL Server 2016.
For example,
SELECT LTRIM(RTRIM(' A A '))
works perfectly (the spaces here are regular spaces).
The problem is with the line below, where the spaces are double-byte:
SELECT LTRIM(RTRIM('　A　A　'))
I want the output of this one to be 'A A'.
Any idea how to do this?
Adapted SQL from OP's post:
SELECT LTRIM(RTRIM(REPLACE(N'　A　A　', N'　', ' ')))
The space in that string is the Ideographic space (U+3000) Unicode character, which LTRIM and RTRIM don't recognize as whitespace. Even TRIM in SQL Server 2017 won't recognize it unless it's specified explicitly.
Another problem is that this character is outside the normal range of characters and can't appear in a varchar field or value. This leads to inconsistent results between SQL Server versions. In SQL Server 2014 it will even appear as a ?. In later versions LTRIM/RTRIM may or may not work without emitting the error character. I don't have access to all versions to test this.
In SQL Server 2017 it's possible to explicitly specify the character to trim, e.g.:
select trim(N'　' from N'　A　A　')
This produces A A.
In previous versions, PATINDEX can be used to find the locations of the first and last non-space positions:
declare @str nvarchar(10)=N'　A　A　';
declare @start int=PATINDEX(N'%[^　]%',@str)
declare @end int=PATINDEX(N'%　',@str)
SELECT SUBSTRING(@str,@start,@end-@start)
The pattern N'%[^　]%' finds the first non-U+3000 character in the string. N'%　' finds the position of the last one. SUBSTRING(@str,@start,@end-@start) extracts the content between the two positions.
The result is:
A A
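The PATINDEX idea can be extended to cope with several leading or trailing spaces of either kind. The following is only a sketch under assumptions: the sample value and the names @ideo, @pattern, @first and @last are hypothetical, and the result may depend on how the collation in use compares U+3000 with a regular space.

```sql
DECLARE @ideo nchar(1) = NCHAR(0x3000);

-- Hypothetical sample: one leading and two trailing ideographic spaces
DECLARE @str nvarchar(100) = @ideo + N'A' + @ideo + N'A' + @ideo + @ideo;

-- Matches the first character that is neither a regular nor an ideographic space
DECLARE @pattern nvarchar(20) = N'%[^ ' + @ideo + N']%';

DECLARE @first int = PATINDEX(@pattern, @str);

-- Position of the last non-space character, found by scanning the reversed string.
-- LEN(@str + N'x') - 1 is the full length, including trailing blanks.
DECLARE @last int = LEN(@str + N'x') - PATINDEX(@pattern, REVERSE(@str));

SELECT CASE WHEN @first = 0 THEN N''   -- the string contains only spaces
            ELSE SUBSTRING(@str, @first, @last - @first + 1)
       END AS trimmed;
```

Spaces inside the string are left untouched; only the leading and trailing runs are removed.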
I found a solution.
Thank you so much for your efforts.
Please use this function to remove trailing double-byte spaces.
CREATE FUNCTION [RTRIMBYTE](@AV_VALUE NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @AV_RETURN NVARCHAR(MAX) = @AV_VALUE;
-- Loop while the last character is a regular space or U+3000
WHILE DATALENGTH(@AV_RETURN) > 0 AND RIGHT(@AV_RETURN, 1) IN (N' ', N'　')
-- LEN('X' + ... + 'X') - 2 is the true length, so - 3 drops the last character
SET @AV_RETURN = LEFT(@AV_RETURN, LEN('X' + @AV_RETURN + 'X') - 3);
RETURN @AV_RETURN;
END;

Min length constraint preventing from inserting spaces into column [duplicate]

I have the following test table in SQL Server 2005:
CREATE TABLE [dbo].[TestTable]
(
[ID] [int] NOT NULL,
[TestField] [varchar](100) NOT NULL
)
Populated with:
INSERT INTO TestTable (ID, TestField) VALUES (1, 'A value'); -- Len = 7
INSERT INTO TestTable (ID, TestField) VALUES (2, 'Another value      '); -- Len = 13 + 6 spaces
When I try to find the length of TestField with the SQL Server LEN() function it does not count the trailing spaces - e.g.:
-- Note: The results grid view of TestField also does not show trailing spaces (SQL Server 2005).
SELECT
ID,
TestField,
LEN(TestField) As LenOfTestField -- Does not include trailing spaces
FROM
TestTable
How do I include the trailing spaces in the length result?
This is clearly documented by Microsoft in MSDN at http://msdn.microsoft.com/en-us/library/ms190329(SQL.90).aspx, which states LEN "returns the number of characters of the specified string expression, excluding trailing blanks". It is, however, an easy detail to miss if you're not wary.
You need to instead use the DATALENGTH function - see http://msdn.microsoft.com/en-us/library/ms173486(SQL.90).aspx - which "returns the number of bytes used to represent any expression".
Example:
SELECT
ID,
TestField,
LEN(TestField) As LenOfTestField, -- Does not include trailing spaces
DATALENGTH(TestField) As DataLengthOfTestField -- Shows the true length of data, including trailing spaces.
FROM
TestTable
You can use this trick:
LEN(Str + 'x') - 1
I use this method:
LEN(REPLACE(TestField, ' ', '.'))
I prefer this over DATALENGTH because this works with different data types, and I prefer it over adding a character to the end because you don't have to worry about the edge case where your string is already at the max length.
Note: I would test the performance before using it against a very large data set; though I just tested it against 2M rows and it was no slower than LEN without the REPLACE...
"How do I include the trailing spaces in the length result?"
You get someone to file a SQL Server enhancement request/bug report, because nearly all the workarounds listed here for this amazingly simple issue have some deficiency or are inefficient. This still appears to be true in SQL Server 2012. The auto-trimming feature may stem from ANSI/ISO SQL-92, but there seem to be some holes (or a lack of counting them).
Please vote up "Add setting so LEN counts trailing whitespace" here:
https://feedback.azure.com/forums/908035-sql-server/suggestions/34673914-add-setting-so-len-counts-trailing-whitespace
Retired Connect link:
https://connect.microsoft.com/SQLServer/feedback/details/801381
There are problems with the two top-voted answers. The answer recommending DATALENGTH is prone to programmer errors. The result of DATALENGTH must be divided by 2 for NVARCHAR types, but not for VARCHAR types. This requires knowledge of the type you're getting the length of, and if that type changes, you have to diligently change the places where you used DATALENGTH.
There is also a problem with the most upvoted answer (which I admit was my preferred way to do it until this problem bit me). If the thing you are getting the length of is of type NVARCHAR(4000), and it actually contains a string of 4000 characters, SQL will ignore the appended character rather than implicitly cast the result to NVARCHAR(MAX). The end result is an incorrect length. The same thing will happen with VARCHAR(8000).
What I've found that works, is nearly as fast as plain old LEN, is faster than LEN(@s + 'x') - 1 for large strings, and does not assume the underlying character width, is the following:
DATALENGTH(@s) / DATALENGTH(LEFT(LEFT(@s, 1) + 'x', 1))
This gets the datalength, and then divides by the datalength of a single character from the string. The append of 'x' covers the case where the string is empty (which would give a divide by zero in that case). This works whether @s is VARCHAR or NVARCHAR. Doing the LEFT of 1 character before the append shaves some time when the string is large. The problem with this, though, is that it does not work correctly with strings containing surrogate pairs.
There is another way mentioned in a comment to the accepted answer, using REPLACE(@s,' ','x'). That technique gives the correct answer, but is a couple of orders of magnitude slower than the other techniques when the string is large.
Given the problems introduced by surrogate pairs on any technique that uses DATALENGTH, I think the safest method that gives correct answers that I know of is the following:
LEN(CONVERT(NVARCHAR(MAX), @s) + 'x') - 1
This is faster than the REPLACE technique, and much faster with longer strings. Basically this technique is the LEN(@s + 'x') - 1 technique, but with protection for the edge case where the string has a length of 4000 (for nvarchar) or 8000 (for varchar), so that the correct answer is given even for that. It also should handle strings with surrogate pairs correctly.
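To make the difference between these measurements concrete, here is a small sketch. The variables @v and @n are hypothetical samples, each holding 'abc' followed by three trailing spaces:

```sql
DECLARE @v varchar(20)  = 'abc   ';
DECLARE @n nvarchar(20) = N'abc   ';

SELECT LEN(@v)                                   AS len_v,       -- 3: trailing spaces ignored
       DATALENGTH(@v)                            AS bytes_v,     -- 6: one byte per character
       LEN(CONVERT(nvarchar(max), @v) + 'x') - 1 AS true_len_v,  -- 6
       LEN(@n)                                   AS len_n,       -- 3
       DATALENGTH(@n)                            AS bytes_n,     -- 12: two bytes per character
       LEN(CONVERT(nvarchar(max), @n) + 'x') - 1 AS true_len_n;  -- 6
```

The CONVERT-based expression gives the same character count for both types, which is why it does not need to know whether the input is VARCHAR or NVARCHAR.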
LEN cuts trailing spaces by default, so I found this worked as you move them to the front
LEN(REVERSE(TestField))
So if you wanted to, you could say
SELECT
t.TestField,
LEN(REVERSE(t.TestField)) AS [Reverse],
LEN(t.TestField) AS [Count]
FROM TestTable t
WHERE LEN(REVERSE(t.TestField)) <> LEN(t.TestField)
Don't use this for leading spaces of course.
You also need to ensure that your data is actually saved with the trailing blanks. When ANSI_PADDING is OFF (non-default):
"Trailing blanks in character values inserted into a varchar column are trimmed."
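One way to check how an existing column was created is the is_ansi_padded flag in sys.columns. This is a sketch that assumes the question's table lives in the dbo schema:

```sql
-- is_ansi_padded = 1 means the column was created with ANSI_PADDING ON,
-- so trailing blanks inserted into a varchar column are kept
SELECT c.name, c.is_ansi_padded
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.TestTable');
```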
You should define a CLR function that returns the String's Length field, if you dislike string concatenation.
I use LEN('x' + @string + 'x') - 2 in my production use-cases.
If you dislike DATALENGTH because of n/varchar concerns, how about:
select DATALENGTH(@var)/isnull(nullif(DATALENGTH(left(@var,1)),0),1)
which is just
select DATALENGTH(@var)/DATALENGTH(left(@var,1))
wrapped with divide-by-zero protection.
By dividing by the DATALENGTH of a single char, we get the length normalised.
(Of course, still issues with surrogate-pairs if that's a concern.)
This is the best algorithm I've come up with which copes with the maximum length and variable byte count per character issues:
ISNULL(LEN(STUFF(@Input, 1, 1, '') + '.'), 0)
This is a variant of the LEN(@Input + '.') - 1 algorithm, but by using STUFF to remove the first character we ensure that the modified string doesn't exceed the maximum length, and we remove the need to subtract 1.
ISNULL(..., 0) is added to deal with the case where @Input = '', which causes STUFF to return NULL.
This does have the side effect that the result is also 0 when @Input is NULL, which is inconsistent with LEN(NULL), which returns NULL, but this could be dealt with by logic outside this function if need be.
Here are the results using LEN(@Input), LEN(@Input + '.') - 1, LEN(REPLACE(@Input, ' ', '.')) and the above STUFF variant, using a sample of @Input = CAST(' S' + SPACE(3998) AS NVARCHAR(4000)) over 1000 iterations:
Algorithm   DataLength  ExpectedResult  Result  ms
LEN         8000        4000            2       14
+DOT-1      8000        4000            1       13
REPLACE     8000        4000            4000    514
STUFF+DOT   8000        4000            4000    0
In this case the STUFF algorithm is actually faster than LEN()!
I can only assume that internally SQL looks at the last character and if it is not a space then optimizes the calculation
But that's a good result eh?
Don't use the REPLACE option unless you know your strings are small - it's hugely inefficient
Use:
SELECT DATALENGTH('string ')

Cut a certain string from a bigger one which could have a variable length (SQL Server)

I have a small problem with a string that I need to take a specific part of.
I know exactly where it begins, but it can have a variable length, so I cannot use SUBSTRING with a fixed number of characters.
Example:
EREF+322345 KARR CUSTOMER1 ....
EREF+3211234 KARR CUSTROMER2....
I need to take the number after the + sign up to the space before the word KARR.
In the first line it is 322345 (6 chars) and in the second it is 3211234, which is 7 chars, so the length is variable.
SUBSTRING doesn't help me in this case because I do not have a fixed number of characters to cut out of the string.
Any suggestions?
This will work:
declare @a varchar(255)='EREF+322345 KARR CUSTOMER1'
-- length = position of the first space minus position of '+', minus 1
select substring(@a,charindex('+',@a,1)+1,charindex(' ',@a,1)-charindex('+',@a,1)-1)
-------------------------------
declare @b varchar(255)='EREF+3211234 KARR CUSTROMER2.'
select substring(@b,charindex('+',@b,1)+1,charindex(' ',@b,1)-charindex('+',@b,1)-1)
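The same idea can be applied to a whole set of rows. This is a sketch under assumptions: the table variable @lines and its column line are hypothetical stand-ins for the real table, and the search for the space starts at the + sign so that an earlier space in the text cannot interfere.

```sql
-- @lines stands in for the real table (hypothetical name)
DECLARE @lines TABLE (line varchar(255));
INSERT INTO @lines (line)
VALUES ('EREF+322345 KARR CUSTOMER1'),
       ('EREF+3211234 KARR CUSTROMER2');

SELECT l.line,
       -- Appending ' ' guarantees a space is found even if the number ends the string
       SUBSTRING(l.line,
                 p.plus_pos + 1,
                 CHARINDEX(' ', l.line + ' ', p.plus_pos) - p.plus_pos - 1) AS ref_number
FROM @lines AS l
CROSS APPLY (SELECT CHARINDEX('+', l.line) AS plus_pos) AS p
WHERE p.plus_pos > 0;   -- skip rows without a '+' sign
```

For the two sample rows this returns 322345 and 3211234.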

Reverse characters in string with mixed Left-to-right and Right-to-left languages using SQL?

I have string values in my table which include Hebrew characters (or those of any RTL language, for that matter) and English ones (or numbers).
The problem is that the English characters are reversed, so the value looks like:
בדיקה 123456 esrever sti fI kcehC.
The numbers and the English characters are reversed; the Hebrew ones are fine.
How can I use built-in SQL functions to identify the English substring (and the numbers) and reverse it while maintaining the order of the other RTL characters? Any workaround will do :-) ...
thanks
I believe that your entire string is reversed and the fact that the Hebrew words are displaying in the correct order is actually the result of a different problem. What I suspect is that the Hebrew words are stored in a non-lexical order.
In theory you should be able to resolve your problem by simply reversing the string and then forcing SQL Server to display the Hebrew words from left to right. This is done by adding a special character to the front and back of your string as follows:
DECLARE @sourceString NVARCHAR(100) = N'123456 בדיקה esrever sti fI kcehC';
-- nchar(8237) is U+202D LEFT-TO-RIGHT OVERRIDE, nchar(8236) is U+202C POP DIRECTIONAL FORMATTING
DECLARE @reversedString NVARCHAR(4000) = nchar(8237) + REVERSE(@sourceString) + nchar(8236)
SELECT @reversedString;
I've never worked with Hebrew characters, so I'm not sure if this will work.
However, I think you can implement a function with a WHILE loop using PATINDEX.
You'll need a variable for holding the reversed English part, @EngTemp
a variable to hold the substring currently being processed, @SubTemp
a variable to hold the remaining text in the string that still needs to be processed, @SubNext
a variable to hold the length of the current substring, @Len
an output variable, @Out
Steps:
Take a string input, put it into @SubNext
while PatIndex('%[^a-z]%', @SubNext) > 0
take the substring up to the PATINDEX position and store it in @SubTemp; also trim @SubNext to the PATINDEX position
store the length of @SubTemp in @Len
if @Len > 1; set @Out = @Out + @EngTemp + @SubTemp; Set @EngTemp = ''
(This step allows for cases where the English string is not at the end of the line)
if @Len = 1; set @EngTemp = @SubTemp + @EngTemp
if @Len = 0; set @Out = @Out + @EngTemp
(At this point the loop should also end)
I'm going to play with this when I have some time and post actual code; sorry if my scribbles don't make any sense.
You can use the ASCII function in SQL Server to get the ASCII value of each character in the text field. Once you have the ASCII value, compare it against the valid range of visible English characters and numerals. Anything else can be considered a Hebrew character.
SQL Server also has a built-in REVERSE function for reversing the string as required.
Following link has some sample code.
http://www.sqlservercurry.com/2009/09/how-to-find-ascii-value-of-each.html