I am trying to write a WHERE clause for where a certain string variable is not null or empty. The problem I am running into is that certain non-empty strings equal the N'' literal. For instance:
declare @str nvarchar(max) = N'㴆';
select case when @str = N'' then 1 else 0 end;
This yields 1. From what I can gather on Wikipedia, this particular Unicode character is a pictograph for submerging something, which is not semantically equal to an empty string. Also, the string length is 1, at least in T-SQL.
Is there a better (accurate) way to check a T-SQL variable for the empty string?
I found a blog, https://bbzippo.wordpress.com/2013/09/10/sql-server-collations-and-string-comparison-issues/
which explained that
The problem is because the “default” collation setting
(SQL_Latin1_General_CP1_CI_AS) for SQL Server cannot properly compare
Unicode strings that contain so called Supplementary Characters
(4-byte characters).
A fix is to use a collation that doesn't have problems with the supplementary characters. For example:
select case when N'㴆' COLLATE Latin1_General_100_CI_AS_KS_WS = N'' then 1 else 0 end;
will return 0. See the blog for more examples.
Since you are comparing to the empty string, another solution would be to test the string length.
declare @str1 nvarchar(max) = N'㴆';
select case when len(@str1) = 0 then 1 else 0 end;
This will return 0 as expected.
This also yields 0 when the string is null.
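One caveat with the LEN approach: LEN ignores trailing spaces, so a string made up only of spaces also reports a length of 0. If that distinction matters, a byte-count check with DATALENGTH is a possible alternative. The following is only a sketch; the variable names are illustrative and not from the question.

```sql
-- Illustrative variables, not taken from the original question.
DECLARE @pictograph nvarchar(max) = N'㴆';
DECLARE @spaces     nvarchar(max) = N'   ';
DECLARE @empty      nvarchar(max) = N'';
DECLARE @missing    nvarchar(max) = NULL;

-- DATALENGTH counts bytes and does not compare strings, so the collation is not involved.
-- For NULL it returns NULL, the comparison is not true, and the CASE falls through to 0.
SELECT
    CASE WHEN DATALENGTH(@pictograph) > 0 THEN 1 ELSE 0 END AS PictographHasData,
    CASE WHEN DATALENGTH(@spaces)     > 0 THEN 1 ELSE 0 END AS SpacesHasData,
    CASE WHEN DATALENGTH(@empty)      > 0 THEN 1 ELSE 0 END AS EmptyHasData,
    CASE WHEN DATALENGTH(@missing)    > 0 THEN 1 ELSE 0 END AS NullHasData,
    LEN(@spaces)                                            AS LenOfSpaces; -- LEN drops trailing spaces
```

Whether a spaces-only string should count as "empty" is a design decision; LEN treats it as empty, DATALENGTH does not.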
EDIT:
Thanks to devio's comment, I dug a bit deeper and found a comment from Erland Sommarskog https://groups.google.com/forum/#!topic/microsoft.public.sqlserver.server/X8UhQaP9KF0
that in addition to not supporting Supplementary Characters, the Latin1_General_CP1_CI_AS collation doesn't handle new Unicode characters correctly. So I'm guessing that the 㴆 character is a new Unicode character.
Specifying the collation Latin1_General_100_CI_AS will also fix this issue.
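Applied to the original "not null or empty" test, the collation fix might look like the sketch below. It assumes, as described above, that Latin1_General_100_CI_AS compares this character correctly; the first query only reports which collation the current database uses by default.

```sql
-- Report the current database's default collation (local variables use it).
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

DECLARE @str nvarchar(max) = N'㴆';

-- Not-null-and-not-empty test, with the collation forced for this comparison only.
SELECT CASE
         WHEN @str IS NOT NULL
          AND @str COLLATE Latin1_General_100_CI_AS <> N''
         THEN 1 ELSE 0
       END AS IsNonEmpty;
```

The COLLATE clause affects only this expression, so nothing has to change at the database or column level.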
Related
When searching for a string in our database where the column is of type nvarchar, specifying the 'N' prefix in the query returns some results. Leaving it out does not. I am trying to search for a Simplified Chinese string in a database that previously did not store any Chinese strings.
The EntityFramework application that uses the database correctly retrieves the strings, and the LINQ queries also work in the application. However, in SQL Server 2014 Management Studio, when I run an SQL query for the string, it does not show up unless I specify the 'N' prefix for Unicode (even though the column is of type nvarchar).
Works:
var text = from asd in Translations.TranslationStrings
where asd.Text == "嗄法吖无上几"
select asd;
MessageBox.Show(text.FirstOrDefault().Text);
Does not work:
SELECT *
FROM TranslationStrings
where Text = '嗄法吖无上几'
If I prefix the Chinese characters with 'N' it works.
Works:
SELECT *
FROM TranslationStrings
where Text = N'嗄法吖无上几'
Please excuse the Chinese characters, I just typed something random. My question is, is there something I can do to not have to include the 'N' prefix when doing a query?
Thank you very much!
As @sworkalot has mentioned below:
The default for .Net is Unicode, that's why you don't need to specify
it. This is not the case for Sql Manager.
If not specified Sql will assume that you work with ASCII according to
the collation specified in your DB.
Hence, when working from Sql Server you need to use N'
https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/
Check out these examples, pay close attention to the data types and the values being assigned:
DECLARE @Varchar VARCHAR(100) = '嗄'
DECLARE @VarcharWithN VARCHAR(100) = N'嗄' -- Has N prefix
DECLARE @NVarchar NVARCHAR(100) = '嗄'
DECLARE @NVarcharWithN NVARCHAR(100) = N'嗄' -- Has N prefix
SELECT
Varchar = @Varchar,
VarcharWithN = @VarcharWithN,
NVarchar = @NVarchar,
NVarcharWithN = @NVarcharWithN
SELECT
Varchar = CONVERT(VARBINARY, @Varchar),
VarcharWithN = CONVERT(VARBINARY, @VarcharWithN),
NVarchar = CONVERT(VARBINARY, @NVarchar),
NVarcharWithN = CONVERT(VARBINARY, @NVarcharWithN)
Results:
Varchar  VarcharWithN  NVarchar  NVarcharWithN
?        ?             ?         嗄

Varchar  VarcharWithN  NVarchar  NVarcharWithN
0x3F     0x3F          0x3F00    0xC455
The NVARCHAR data type stores 2 bytes for each of these characters, while VARCHAR stores only 1 (you can see this in the VARBINARY conversion in the 2nd SELECT). Since the representation of Chinese characters needs 2 bytes, you have to use NVARCHAR to store them. If you try to stuff them into a VARCHAR, they will be stored as ? and you will lose the original character information. This also happens in the 3rd example, because the literal doesn't have the N prefix, so it's converted to VARCHAR before the value is actually assigned to the variable.
This is why you need to add the N prefix when typing these characters as literals, so the SQL engine knows that you are typing characters that need a 2-byte representation. So if you are comparing against an NVARCHAR column, always add the N prefix. You can change the database collation, but it's recommended to always use the proper data type independent of the collation, so you don't have problems when running the code on different databases.
If you could explain the reason why you want to omit the N prefix we might address that, although I believe there is no work around in this particular case.
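To see for yourself how the N prefix changes the type of a literal, a quick check along these lines may help. It is a minimal sketch using only ASCII text, so the result does not depend on the database's code page.

```sql
-- The same three characters, once as a varchar literal and once as an nvarchar literal.
SELECT
    SQL_VARIANT_PROPERTY('abc',  'BaseType') AS TypeWithoutN,   -- varchar
    SQL_VARIANT_PROPERTY(N'abc', 'BaseType') AS TypeWithN,      -- nvarchar
    DATALENGTH('abc')                        AS BytesWithoutN,  -- 3 bytes
    DATALENGTH(N'abc')                       AS BytesWithN;     -- 6 bytes
```

A literal without the prefix is already varchar by the time it is compared or assigned, which is why the Chinese characters are lost before they ever reach the NVARCHAR column or variable.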
The default for .Net is Unicode, that's why you don't need to specify it.
This is not the case for Sql Manager.
If not specified, SQL Server will assume that you are working with ASCII according to the collation specified in your DB.
Hence, when working from Sql Server you need to use N'
https://sqlquantumleap.com/2018/09/28/native-utf-8-support-in-sql-server-2019-savior-false-prophet-or-both/
I am working on some string manipulation with PATINDEX to fix some incorrect time formatting in XML e.g. (2018-12-20T17:00:00-05:00).
The issue I am having is that PATINDEX is finding a match to @Pattern in the @IncorrectMatchIndex string.
You can recreate the issue by running the following:
DECLARE @Pattern nvarchar(36) = '%<EstmatedTime>%T%-%</EstmatedTime>%',
@CorrectMatchIndex nvarchar(100) = '<DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00-05:00</EstmatedTime></Rate>',
@CorrectMatchIndex2 nvarchar(94) = '<DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00</EstmatedTime></Rate>',
@IncorrectMatchIndex nvarchar(296) = '<DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00</EstmatedTime></Rate><Rate><Carrier>FedEx Freight</Carrier><Service>FEDEX_FREIGHT_PRIORITY</Service><PublishedRate>520.6</PublishedRate><DiscountedRate>272.04</DiscountedRate><EstmatedTime>2018-12-18T17:00:00</EstmatedTime>'
SELECT
PATINDEX(@Pattern, @CorrectMatchIndex) AS CorrectMatchIndex,
PATINDEX(@Pattern, @CorrectMatchIndex2) AS CorrectMatchIndex2,
PATINDEX(@Pattern, @IncorrectMatchIndex) AS IncorrectMatchIndex
At a pure guess, I suspect you want:
DECLARE @Pattern nvarchar(300) = '%<EstmatedTime>[1-2][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]-[0-9][0-9]:[0-9][0-9]</EstmatedTime>%'
This then returns 0 for IncorrectMatchIndex.
Of course, the comments are right, you should really be using XQuery for this. I can't provide a sample for this, however, as none of the XML data you have supplied is valid XML (for example, @CorrectMatchIndex ends with '</Rate>' but that node is never opened).
The @IncorrectMatchIndex string does not contain a match to %<EstmatedTime>%T%-%</EstmatedTime>% as far as I can see. There is no dash between the T and the closing </EstmatedTime>.
Yes there is, because there is a second set of <EstmatedTime> tags later in the string, and there most certainly is a '-' character between the first T and the last </EstmatedTime>.
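The effect is easier to see on a cut-down case. The sketch below uses shortened, made-up element names and values (not the real data) to show how % happily spans several elements, while a pattern made of single-character classes cannot.

```sql
-- Hypothetical, shortened stand-ins for the patterns and data in the question.
DECLARE @Loose nvarchar(50) = N'%<t>%T%-%</t>%';
DECLARE @Tight nvarchar(50) = N'%<t>[0-9]T[0-9]-[0-9]</t>%';

DECLARE @OneElement   nvarchar(50) = N'<t>1T2-3</t>';
DECLARE @TwoElements  nvarchar(50) = N'<t>1T2</t><d>2018-12-18</d><t>3T4</t>';

SELECT
    PATINDEX(@Loose, @OneElement)  AS LooseOnOne,   -- non-zero: genuine match
    PATINDEX(@Tight, @OneElement)  AS TightOnOne,   -- non-zero: genuine match
    PATINDEX(@Loose, @TwoElements) AS LooseOnTwo,   -- non-zero: % runs from the first <t> to the last </t>
    PATINDEX(@Tight, @TwoElements) AS TightOnTwo;   -- 0: no single <t> element has the T...-... shape
```

In the loose pattern, the '-' is found inside the date of a different element, which is the same thing that happens with the second <EstmatedTime> in @IncorrectMatchIndex.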
I have a varchar column, set as varchar(255), but I cannot query it using an = operator.
I know there is data in this set where the field (UOM) = 'PK', but when I query this I get no results. If I query UOM LIKE '%PK%', I get results, but not when using a straight equals operator. I have tried changing the datatype to nvarchar, and also tried seeing if there were spaces in the column throwing it off, but no luck.
Has anyone run into anything like this, and how did you solve? Could the column be corrupted?
Thanks for the replies all! I found this helpful article which showed some hidden ASCII characters in the field. A quick replace statement and we're back to functioning! Thanks again everyone for the quick replies.
Most likely a leading space... try where ltrim(UOM) = 'PK'. Most often, trailing spaces do not affect equality operations, but you could also do where ltrim(rtrim(UOM)) = 'PK'. I wouldn't expect there to be a case sensitivity issue, but keep that in mind when comparing strings as well, and this is where you may want to use the UPPER() or LOWER() methods.
Next, you'll want to start looking for carriage returns, line feeds, tabs, etc.
declare @var varchar(64) = char(10) + --LF
char(13) + --CR
'PK'
select
case when replace(replace(@var,char(10),''),char(13),'') = 'PK' then 1 else 0 end
,case when @var = 'PK' then 1 else 0 end
Naturally, you could just clean your data if this is the case, or continue to use LIKE.
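If you are not sure which hidden character is in the field, looking at the raw bytes usually settles it. The following is a sketch on a single illustrative variable; for the real table, you would run the same expressions against the UOM column.

```sql
-- Illustrative value: 'PK' followed by a tab character.
DECLARE @uom varchar(255) = 'PK' + CHAR(9);

SELECT
    CONVERT(varbinary(16), @uom) AS RawBytes,  -- 0x504B09: 'P', 'K', then the tab (0x09)
    CASE
        -- Any character outside the printable ASCII range (space through ~);
        -- the binary collation makes the range compare by code point.
        WHEN @uom LIKE '%[^ -~]%' COLLATE Latin1_General_BIN THEN 1
        ELSE 0
    END AS HasNonPrintable;
```

Once you know the offending byte, a targeted REPLACE with the matching CHAR() value cleans it up.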
declare @test varchar(50)
set @test='sad#fd'
if @test LIKE '%[a-zA-Z0-9 ./,()?''+-]%'
print 'yes'
else
print 'no'
The code above prints yes, but it should print no, since I am not allowing '#' in the regular expression. Is there anything wrong?
I want to handle this in my stored procedure where string is alpha numeric with specified list of special character allowed. What should I do?
The result is "yes" because the string contains the letter s, which matches the condition.
To make this clearer, try running the code below:
declare @test varchar(1000)
set @test='####'
if @test LIKE '%[a-zA-Z0-9 ./,()?''+-]%'
print 'yes'
else
print 'no'
SQL Server doesn't really have native regular expressions [1], but what you're trying to achieve can still be done with LIKE by introducing a double negative:
declare @test varchar(50)
set @test='sad#fd'
if @test NOT LIKE '%[^a-zA-Z0-9 ./,()?''+-]%'
print 'yes'
else
print 'no'
% matches any number of characters. ^ inverts a character range. So, now we're asking - is the string any number of characters, then a character not in the set a-zA-Z0-9 ./,()?''+-, then any number of characters? - or, to put it another way, does this string contain any characters outside of the given set of characters?
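To check the double negative against a few inputs at once, something like the following sketch can be used. The sample values are made up for illustration; note that an empty string also passes, because it contains no disallowed character.

```sql
-- Made-up sample values run through the same NOT LIKE test.
SELECT
    v.s AS Input,
    CASE
        WHEN v.s NOT LIKE '%[^a-zA-Z0-9 ./,()?''+-]%' THEN 'yes'
        ELSE 'no'
    END AS OnlyAllowedCharacters
FROM (VALUES
        ('sadfd'),     -- yes: letters only
        ('sad#fd'),    -- no: '#' is outside the allowed set
        ('a+b (c)'),   -- yes: '+', space and parentheses are in the set
        ('')           -- yes: nothing to reject
     ) AS v(s);
```

If empty input should be rejected in the stored procedure, that needs a separate check alongside the NOT LIKE.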
[1] You can access a fully featured regex engine from the .NET framework by using the CLR integration. It's one of the usual samples given when talking about CLR integration. But it's not really needed here.
The field in my SKU table (BI.dbo.SKU.phl5) is varchar(15).
However, the code below returns just 3 characters, 'Unc', for the null fields in my table, while it should return 'Uncategorized'. How can I solve that?
ISNULL(SUBSTRING(BI.dbo.SKU.phl5,0,3),'Uncategorized') AS phl1
ISNULL(CAST(SUBSTRING(BI.dbo.SKU.phl5,0,3) AS VARCHAR(13)),'Uncategorized') AS phl1
The size of the return type of SUBSTRING isn't clearly documented that I can find, but the problem is that the type of ISNULL is the type of the first expression, which is clearly coming back as VARCHAR(3) since you are truncating it to 3 characters.
ISNULL docs
Try this
CASE WHEN BI.dbo.SKU.phl5 IS NULL THEN 'Uncategorized'
ELSE SUBSTRING(BI.dbo.SKU.phl5,0,3)
END AS phl1
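Another option, not mentioned above, is COALESCE, which works out its return type from all of its arguments rather than just the first one. The sketch below uses a NULL variable as a stand-in for the phl5 column.

```sql
-- Stand-in for a NULL value in BI.dbo.SKU.phl5.
DECLARE @phl5 varchar(15) = NULL;

SELECT
    -- ISNULL takes the type of its first argument, so the fallback is cut short
    -- (to 'Unc', as reported in the question).
    ISNULL(SUBSTRING(@phl5, 0, 3), 'Uncategorized')   AS WithIsNull,
    -- COALESCE picks a type wide enough for both arguments, so the fallback survives.
    COALESCE(SUBSTRING(@phl5, 0, 3), 'Uncategorized') AS WithCoalesce;
```

As a side note, SUBSTRING positions start at 1, so a start of 0 with a length of 3 yields only the first two characters; if the first three are wanted, SUBSTRING(BI.dbo.SKU.phl5, 1, 3) is probably what was intended.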