Differentiate Exponents in T-SQL - sql

In SQL Server 2017 (14.0.2)
Consider the following table:
CREATE TABLE expTest
(
someNumbers [NVARCHAR](10) NULL
)
And let's say you populate the table with some values:
INSERT INTO expTest VALUES ('²'), ('2')
Why does the following SELECT return both rows?
SELECT *
FROM expTest
WHERE someNumbers = '2'
Shouldn't nvarchar realize that '²' is a distinct Unicode character, separate from '2'? How (without using the UNICODE() function) could I identify these values as nonequivalent?

Here is a db<>fiddle. This shows the following:
Your observation is true even when the values are entered as national character set constants.
The "ASCII" versions of the characters are actually different.
The problem goes away with a case-sensitive collation.
I think the exponent is just being treated as a different "case" of the number, so the two are considered equal in a case-insensitive collation and compare as you expect in a case-sensitive one.
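As a rough sketch of the collation approach described above, the comparison can be given an explicit collation in the query itself. The table variable @expTest stands in for the question's table, and the two collation names are assumptions chosen for illustration; a binary collation compares code points, so it is included as a stricter alternative.

```sql
-- Stand-in for the question's expTest table.
DECLARE @expTest TABLE (someNumbers NVARCHAR(10) NULL);
INSERT INTO @expTest VALUES (N'²'), (N'2');

-- Case-sensitive collation: per the answer above, this should
-- no longer treat the superscript as equal to the plain digit.
SELECT someNumbers
FROM @expTest
WHERE someNumbers = N'2' COLLATE Latin1_General_CS_AS;

-- Binary collation: compares code points, so U+00B2 and U+0032 differ.
SELECT someNumbers
FROM @expTest
WHERE someNumbers = N'2' COLLATE Latin1_General_BIN2;
```

Applying COLLATE in the query leaves the column's stored collation unchanged, which keeps the change local to this one comparison.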

Related

Reading Unicode strings from SQL Server

I know strings need to be prefixed with N' in SQL Server (2012) INSERT statements to store them as UNICODE but do they have to be retrieved (SELECT statement) in a certain way as well so they are in UNICODE?
I am able to store international strings correctly with N notation but when I run SELECT query to fetch the records back, it comes as question marks. My query is very simple.
SELECT COLUMN1, COLUMN2 FROM TABLE1
I am looking at other possible reasons that may have caused this issue but at least I want to eliminate the SQL statement above. Should it read COLUMN1 and COLUMN2 columns correctly when they both store UNICODE strings using N notation? Do I have to do anything to the statement to tell it they are UNICODE?
Within Management Studio you should not need to do anything special to display the correct values. Make sure that the columns in your table are defined as Unicode strings (NVARCHAR) instead of ANSI strings (VARCHAR).
The following example demonstrates the concept:
CREATE TABLE UnicodeExample
(
MyUnicodeColumn NVARCHAR(100)
,MYANSIColumn VARCHAR(100)
)
INSERT INTO UnicodeExample
(
MyUnicodeColumn
,MYANSIColumn
)
VALUES
(
N'איש'
,N'איש'
)
SELECT *
FROM UnicodeExample
DROP TABLE UnicodeExample
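In the above example, the column MyUnicodeColumn is defined as NVARCHAR(100) and MYANSIColumn as VARCHAR(100). The query will correctly return the value for MyUnicodeColumn but will return ??? for MYANSIColumn, because the Hebrew characters cannot be represented in the VARCHAR column's code page (unless the database uses a collation whose code page covers Hebrew).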
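The same loss can happen before the data ever reaches an NVARCHAR column, if the literal itself lacks the N prefix. The following is a minimal sketch of that case; the table variable @t is a made-up name, and the exact result depends on the database's default collation.

```sql
DECLARE @t TABLE (MyUnicodeColumn NVARCHAR(100));

-- Without N the literal is VARCHAR and is converted through the
-- database's code page before it is stored.
INSERT INTO @t VALUES ('איש');

-- With N the literal is NVARCHAR and is stored as-is.
INSERT INTO @t VALUES (N'איש');

-- UNICODE() returns the code point of the first character, which makes
-- a substituted '?' (code point 63) easy to spot.
SELECT MyUnicodeColumn, UNICODE(MyUnicodeColumn) AS FirstCodePoint
FROM @t;
```

On a database whose default code page has no Hebrew letters, the first row would be expected to come back as question marks even though the column is NVARCHAR.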

Considering spaces in a SQL row as null

I was supposed to get all data from the table where the column "Address" is not null,
so I wrote a statement that looks like this:
Select * from Table where Address is not null
Unfortunately, some rows in the "Address" column contain only spaces, so SQL does not treat them as null.
How can I display rows where Address is not null?
Thanks :)
Most database systems have a NULLIF() function. It was defined together with COALESCE() in the ANSI SQL-92 standard. It is implemented in at least SQL Server, Oracle, PostgreSQL, MySQL, SQLite, DB2 and Firebird.
Select * from Table where NULLIF(Address,'') is not null
But for me, I like this more
Select * from Table where Address > ''
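It eliminates NULLs and empty strings in one go. In SQL Server it will even exclude strings made up entirely of spaces (' ', '  ', etc.), because trailing spaces are ignored in comparisons. It also remains SARGable, so an index on Address can still be used.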
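To see both predicates side by side, here is a small sketch for SQL Server. The table variable @Addresses and its sample rows are invented for illustration.

```sql
DECLARE @Addresses TABLE (Id INT, Address VARCHAR(50) NULL);
INSERT INTO @Addresses VALUES
    (1, '12 Main St'),
    (2, NULL),
    (3, ''),
    (4, '   ');   -- spaces only

-- NULLIF turns '' into NULL; in SQL Server '   ' = '' is also true
-- because the shorter string is padded before comparison.
SELECT Id FROM @Addresses WHERE NULLIF(Address, '') IS NOT NULL;

-- NULL > '' is unknown, and '' or '   ' are not greater than ''.
SELECT Id FROM @Addresses WHERE Address > '';
```

In SQL Server both queries should return only Id 1. Other database systems may compare trailing spaces differently, so the spaces-only row could behave differently there.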

Oracle varchar2 equivalent in sql server

create table #temp(name nvarchar(10))
insert into #temp values('one')
select * from #temp where name = 'one'
select * from #temp where name = 'one ' --one with space at end
drop table #temp
In the above I have used nvarchar for name.
My requirement is that the first select query should return the row and the second should not. I do not want to trim the name. Which data type can I use for this in SQL Server?
It's not the data type that can resolve this issue. You need to see this article:
INF: How SQL Server Compares Strings with Trailing Spaces
SQL Server follows the ANSI/ISO SQL-92 specification (Section 8.2,
<Comparison Predicate>, General rules #3) on how to compare strings
with spaces. The ANSI standard requires padding for the character
strings used in comparisons so that their lengths match before
comparing them. The padding directly affects the semantics of WHERE
and HAVING clause predicates and other Transact-SQL string
comparisons. For example, Transact-SQL considers the strings 'abc' and
'abc ' to be equivalent for most comparison operations.
There are several ways to overcome this; one is to use LIKE, which does not ignore trailing spaces in the pattern:
select * from #temp where name like 'one ' --one with space at end
This will return no result.
You should see this blog post: Testing strings for equality counting trailing spaces by AnthonyBloesch
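Another option, not taken from the answer above but commonly used alongside it, is to keep the equality test and additionally compare byte lengths with DATALENGTH(), which counts trailing spaces. A minimal sketch using the question's temp table:

```sql
CREATE TABLE #temp (name NVARCHAR(10));
INSERT INTO #temp VALUES (N'one');

-- Same value, same byte length: the row is returned.
SELECT * FROM #temp
WHERE name = N'one'
  AND DATALENGTH(name) = DATALENGTH(N'one');

-- Equal after padding, but the byte lengths differ (6 vs 8),
-- so no row is returned.
SELECT * FROM #temp
WHERE name = N'one '
  AND DATALENGTH(name) = DATALENGTH(N'one ');

DROP TABLE #temp;
```

Keeping the plain equality predicate first means an index on name can still narrow the search before the length check is applied.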

Unicode characters in Sql table

I am using SQL Server 2008 R2 Enterprise. I am coding an application capable of inserting, updating, deleting and selecting records from SQL tables. The application produces errors when it comes to records that contain special characters such as ć, č, š, đ and ž.
Here's what happens:
The command:
INSERT INTO Account (Name, Person)
VALUES ('Boris Borenović', 'True')
inserts a new record but the Name field is Boris Borenovic, so character ć is changed to c.
The command:
SELECT * FROM Account
WHERE Name = 'Boris Borenović'
returns the correct record, so again the character ć is replaced by c and the record is returned.
Questions:
Is it possible to make Sql Server save the ć and other special characters mentioned earlier?
Is it still possible, if the previous question is resolved, to make Sql be able to return the Boris Borenović record even if the query asks for Boris Borenovic?
So, when saving records I want SQL Server to save exactly what is given, but when retrieving the records, I want it to be able to ignore the special characters. Thanks for all the help.
1) Make sure the column is of type nvarchar rather than varchar (or nchar for char)
2) Use N' at the start of string literals containing such strings, e.g. N'Boris Borenović'
3) If you're using a client library (e.g. ADO.NET), it should handle Unicode text, so long as, again, the parameters are marked as being nvarchar/nchar instead of varchar/char
4) If you want to query and ignore accents, then you can add a COLLATE clause to your select. E.g.:
SELECT * FROM Account
WHERE Name = 'Boris Borenovic' COLLATE Latin1_General_CI_AI
Here _CI_AI means Case Insensitive, Accent Insensitive, so the query should return all rows with any variant of the "c" at the end.
5) If the column in the table is part of a UNIQUE/PK constraint, and you need it to contain both "Boris Borenović" and "Boris Borenovic", then add a COLLATE clause to the column definition, but this time use a collation with "_AS" at the end, which says that it's accent sensitive.
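Points 1, 2, 4 and 5 can be combined in one small sketch. The temp table #Account is a simplified, hypothetical version of the question's table, and the collation names are illustrative choices rather than requirements.

```sql
CREATE TABLE #Account
(
    Id   INT PRIMARY KEY,
    -- Accent-sensitive collation so the UNIQUE constraint treats
    -- "ć" and "c" as different values.
    Name NVARCHAR(100) COLLATE Latin1_General_CI_AS NOT NULL UNIQUE
);

-- N prefix keeps the literal as Unicode.
INSERT INTO #Account (Id, Name)
VALUES (1, N'Boris Borenović'),
       (2, N'Boris Borenovic');

-- Accent-insensitive comparison at query time: both rows should match.
SELECT Id, Name
FROM #Account
WHERE Name = N'Boris Borenovic' COLLATE Latin1_General_CI_AI;

DROP TABLE #Account;
```

Storing with an accent-sensitive collation and relaxing it only in the query keeps the saved data exact while still allowing the looser search the question asks for.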
To allow SQL Server to store special characters, use nvarchar instead of varchar for the column type.
When retrieving, you can force an accent-insensitive collation so that it ignores the different C's:
WHERE Name = N'Boris Borenović' COLLATE Cyrillic_General_CI_AI
Here, CI stands for Case Insensitive, and AI for Accent Insensitive.
I faced the same problem, and after some research:
https://dba.stackexchange.com/questions/139551/how-do-i-set-a-sql-server-unicode-nvarchar-string-to-an-emoji-or-supplementary
What is the difference between varchar and nvarchar?
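I altered the type of the affected columns. Specify a length that fits your data (100 below is only an example); without one, nvarchar in a column definition defaults to a length of 1:
ALTER TABLE [table_name] ALTER COLUMN column_name NVARCHAR(100)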
GO
And it works!

Achieving properties of binary and collation at the same time

I have a varchar field in my database which I use for two significantly different things. In one scenario I compare it case-sensitively to ensure no duplicates are inserted. To achieve this I've set the comparison to binary. However, I also want to be able to search case-insensitively on the same column values. Is there any way I can do this without simply creating a redundant column with a case-insensitive collation instead of binary?
CREATE TABLE t_search (value VARCHAR(50) NOT NULL COLLATE UTF8_BIN PRIMARY KEY);
INSERT
INTO t_search
VALUES ('test');
INSERT
INTO t_search
VALUES ('TEST');
SELECT *
FROM t_search
WHERE value = 'test' COLLATE UTF8_GENERAL_CI;
The second query will return both rows.
Note, however, that anything with an explicit COLLATE applied to it has the lowest coercibility value (0), so its collation takes precedence.
This means that it is the column value that will be converted to UTF8_GENERAL_CI for comparison purposes, not the other way round. As a result, the index on value will not be used for the search and the condition will not be sargable.
If you need good performance on case-insensitive searching, you should create an additional column with case-insensitive collation, index it and use in the searches.
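The additional-column approach might look roughly like the following in MySQL. The column name value_ci and index name ix_value_ci are hypothetical, and the sketch assumes the application writes the same text to both columns.

```sql
CREATE TABLE t_search (
    -- Binary collation: case-sensitive uniqueness.
    value    VARCHAR(50) NOT NULL COLLATE utf8_bin PRIMARY KEY,
    -- Same text, case-insensitive collation, used only for searching.
    value_ci VARCHAR(50) NOT NULL COLLATE utf8_general_ci,
    KEY ix_value_ci (value_ci)
);

INSERT INTO t_search (value, value_ci)
VALUES ('test', 'test'),
       ('TEST', 'TEST');

-- No COLLATE needed here: the column's own collation is
-- case-insensitive, so the index on value_ci can be used.
SELECT value
FROM t_search
WHERE value_ci = 'test';
```

The trade-off is extra storage and the need to keep the two columns in sync, in exchange for an indexable case-insensitive search.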
You can use the COLLATE clause to change the collation of a column within a query. See this manual page for extensive examples.