Achieving properties of binary and collation at the same time - sql

I have a varchar field in my database which i use for two significantly different things. In one scenario i use it for evaluating with case sensitivity to ensure no duplicates are inserted. To achieve this I've set the comparison to binary. However, I want to be able to search case-insensitively on the same column values. Is there any way I can do this without simply creating a redundant column with collation instead of binary?

CREATE TABLE t_search (value VARCHAR(50) NOT NULL COLLATE UTF8_BIN PRIMARY KEY);
INSERT
INTO t_search
VALUES ('test');
INSERT
INTO t_search
VALUES ('TEST');
SELECT *
FROM t_search
WHERE value = 'test' COLLATE UTF8_GENERAL_CI;
The second query will return both rows.
Note, however, that anything with COLLATE applied to it has the lowest coercibility.
This means that it's value that will be converted to UTF8_GENERAL_CI for the comparision purposes, not the other way round, which means that the index on value will not be used for searching and the condition in the query will be not sargable.
If you need good performance on case-insensitive searching, you should create an additional column with case-insensitive collation, index it and use in the searches.

you can use the COLLATE statement to change the collation on a column in a query. see this manual page for extensive examples.

Related

Differentiate Exponents in T-SQL

In SQL Server 2017 (14.0.2)
Consider the following table:
CREATE TABLE expTest
(
someNumbers [NVARCHAR](10) NULL
)
And let's say you populate the table with some values:
INSERT INTO expTest VALUES('²', '2')
Why does the following SELECT return both rows?
SELECT *
FROM expTest
WHERE someNumbers = '2'
Shouldn't nvarchar realize that '²' is unicode, while '2' is a separate value? How (without using the UNICODE() function) could I identify this data as being nonequivalent?
Here is a db<>fiddle. This shows the following:
Your observation is true even when the values are entered as national character set constants.
The "ASCII" versions of the characters are actually different.
The problem goes away with a case-sensitive collation.
I think the exponent is just being treated as a different "case" of the number, so they are considered the same in a case-insensitive collation.
The comparison is what you expect with a case-sensitive collation.

How to determine if SQLite column created with COLLATE NOCASE

A column in a SQLite db must be COLLATE NOCASE. I assume there is no way to add that capability to an existing table, so I'm prepare to recreate the table with it. How can I determine if the existing column is COLLATE NOCASE in order to avoid recreating the table every time it is opened?
How can I determine if the existing column is COLLATE NOCASE
The query
SELECT sql FROM sqlite_master WHERE type='table' AND tbl_name='my_table'
will give you the CREATE TABLE statement for that table. You could inspect the DDL to determine if the column is already defined as COLLATE NOCASE.
You might not need to do that at all if it is sufficient to change the collations in the query. I mean you can just overwrite it in the query. It won't affect constraints or index, but depending on your use case, it might be good enough.
To be clear: the collate clause in the table definition is just a default for the queries. You can overwrite this in the queries.
e.g.
WHERE column = 'term' COLLATE NOCASE
or
ORDER BY column COLLATE NOCASE
However, not that SQLite's LIKE doesn't honor collate clause (use pragma case_sensitive_like instead).
The easiest and most general way is store a version number somewhere (in another table, or with PRAGMA user_version).
If you want to check the column itself, use a query with a comparison that is affected by the column's collation:
SELECT Col = upper(Col)
FROM (SELECT Col
FROM MyTable
WHERE 0 -- don't actually return any row from MyTable
UNION ALL
SELECT 'x' -- lowercase; same collation as Col
);

Unicode in SQL Server unique constraints

Consider the following script - the second INSERT statement throws a primary key violation.
BEGIN TRAN
CREATE TABLE UnicodeQuestion
(
UnicodeCol NVARCHAR(100)
COLLATE Latin1_General_CI_AI
)
CREATE UNIQUE INDEX UX_UnicodeCol
ON UnicodeQuestion ( UnicodeCol )
INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'ae')
INSERT INTO UnicodeQuestion (UnicodeCol) VALUES (N'æ')
ROLLBACK
As I understand it, if I want to have my index treat these values separately, I need to use a binary collation. But there are many binary collations, and they have individual cultures in their names! I don't want culture-sensitive treatment...
Which collation should I use when storing arbitrary Unicode data in nvarchar columns?
For Unicode data it is irrelevant what binary collation you choose.
For Unicode data types, data comparisons are based on the Unicode
code points. For binary collations on Unicode data types, the locale
is not considered in data sorts. For example, Latin_1_General_BIN and
Japanese_BIN yield identical sorting results when used on Unicode
data.
The reason for having locale specific BIN collations is that this determines the code page used when dealing with non Unicode data.

Is the LIKE operator case-sensitive with SQL Server?

In the documentation about the LIKE operator, nothing is told about the case-sensitivity of it. Is it? How to enable/disable it?
I am querying varchar(n) columns, on an Microsoft SQL Server 2005 installation, if that matters.
It is not the operator that is case sensitive, it is the column itself.
When a SQL Server installation is performed a default collation is chosen to the instance. Unless explicitly mentioned otherwise (check the collate clause bellow) when a new database is created it inherits the collation from the instance and when a new column is created it inherits the collation from the database it belongs.
A collation like sql_latin1_general_cp1_ci_as dictates how the content of the column should be treated. CI stands for case insensitive and AS stands for accent sensitive.
A complete list of collations is available at https://msdn.microsoft.com/en-us/library/ms144250(v=sql.105).aspx
(a) To check a instance collation
select serverproperty('collation')
(b) To check a database collation
select databasepropertyex('databasename', 'collation') sqlcollation
(c) To create a database using a different collation
create database exampledatabase
collate sql_latin1_general_cp1_cs_as
(d) To create a column using a different collation
create table exampletable (
examplecolumn varchar(10) collate sql_latin1_general_cp1_ci_as null
)
(e) To modify a column collation
alter table exampletable
alter column examplecolumn varchar(10) collate sql_latin1_general_cp1_ci_as null
It is possible to change a instance and database collations but it does not affect previously created objects.
It is also possible to change a column collation on the fly for string comparison, but this is highly unrecommended in a production environment because it is extremely costly.
select
column1 collate sql_latin1_general_cp1_ci_as as column1
from table1
All this talk about collation seem a bit over-complicated. Why not just use something like:
IF UPPER(##VERSION) NOT LIKE '%AZURE%'
Then your check is case insensitive whatever the collation
If you want to achieve a case sensitive search without changing the collation of the column / database / server, you can always use the COLLATE clause, e.g.
USE tempdb;
GO
CREATE TABLE dbo.foo(bar VARCHAR(32) COLLATE Latin1_General_CS_AS);
GO
INSERT dbo.foo VALUES('John'),('john');
GO
SELECT bar FROM dbo.foo
WHERE bar LIKE 'j%';
-- 1 row
SELECT bar FROM dbo.foo
WHERE bar COLLATE Latin1_General_CI_AS LIKE 'j%';
-- 2 rows
GO
DROP TABLE dbo.foo;
Works the other way, too, if your column / database / server is case sensitive and you don't want a case sensitive search, e.g.
USE tempdb;
GO
CREATE TABLE dbo.foo(bar VARCHAR(32) COLLATE Latin1_General_CI_AS);
GO
INSERT dbo.foo VALUES('John'),('john');
GO
SELECT bar FROM dbo.foo
WHERE bar LIKE 'j%';
-- 2 rows
SELECT bar FROM dbo.foo
WHERE bar COLLATE Latin1_General_CS_AS LIKE 'j%';
-- 1 row
GO
DROP TABLE dbo.foo;
You have an option to define collation order at the time of defining your table. If you define a case-sensitive order, your LIKE operator will behave in a case-sensitive way; if you define a case-insensitive collation order, the LIKE operator will ignore character case as well:
CREATE TABLE Test (
CI_Str VARCHAR(15) COLLATE Latin1_General_CI_AS -- Case-insensitive
, CS_Str VARCHAR(15) COLLATE Latin1_General_CS_AS -- Case-sensitive
);
Here is a quick demo on sqlfiddle showing the results of collation order on searches with LIKE.
The like operator takes two strings. These strings have to have compatible collations, which is explained here.
In my opinion, things then get complicated. The following query returns an error saying that the collations are incompatible:
select *
from INFORMATION_SCHEMA.TABLES
where 'abc' COLLATE SQL_Latin1_General_CP1_CI_AS like 'ABC' COLLATE SQL_Latin1_General_CP1_CS_AS
On a random machine here, the default collation is SQL_Latin1_General_CP1_CI_AS. The following query is successful, but returns no rows:
select *
from INFORMATION_SCHEMA.TABLES
where 'abc' like 'ABC' COLLATE SQL_Latin1_General_CP1_CS_AS
The values "abc" and "ABC" do not match in a case-sensitve world.
In other words, there is a difference between having no collation and using the default collation. When one side has no collation, then it is "assigned" an explicit collation from the other side.
(The results are the same when the explicit collation is on the left.)
Try running,
SELECT SERVERPROPERTY('COLLATION')
Then find out if your collation is case sensitive or not.
You can change from the property of every item.
You can easy change collation in Microsoft SQL Server Management studio.
right click table -> design.
choose your column, scroll down i column properties to Collation.
Set your sort preference by check "Case Sensitive"

Unicode characters in Sql table

I am using Sql Server 2008 R2 Enterprise. I am coding an application capable of inserting, updating, deleting and selecting records from a Sql tables. The application is making errors when it comes to the records that contain special characters such as ć, č š, đ and ž.
Here's what happens:
The command:
INSERT INTO Account (Name, Person)
VALUES ('Boris Borenović', 'True')
WHERE Id = '1'
inserts a new record but the Name field is Boris Borenovic, so character ć is changed to c.
The command:
SELECT * FROM Account
WHERE Name = 'Boris Borenović'
returns the correct record, so again the character ć is replaced by c and the record is returned.
Questions:
Is it possible to make Sql Server save the ć and other special characters mentioned earlier?
Is it still possible, if the previous question is resolved, to make Sql be able to return the Boris Borenović record even if the query asks for Boris Borenovic?
So, when saving records I want Sql to save exactly what is given, but when retrieving the records, I want it to be able to ingnore the special characters. Thanks for all the help.
1) Make sure the column is of type nvarchar rather than varchar (or nchar for char)
2) Use N' at the start of string literals containing such strings, e.g. N'Boris Borenović'
3) If you're using a client library (e.g. ADO.Net), it should handle Unicode text, so long as, again, the parameters are marked as being nvarchar/nchar instead of varchar/char
4) If you want to query and ignore accents, then you can add a COLLATE clause to your select. E.g.:
SELECT * FROM Account
WHERE Name = 'Boris Borenovic' COLLATE Latin1_General_CI_AI
Where _CI_AI means Case Insensitive, Accent Insensitive, should return all rows with all variants of the "c" at the end.
5) If the column in the table is part of a UNIQUE/PK constraint, and you need it to contain both "Boris Borenović" and "Boris Borenovic", then add a COLLATE clause to the column definition, but this time use a collation with "_AS" at the end, which says that it's accent sensitive.
To allow SQL Server to store special characters, use nvarchar instead of varchar for the column type.
When retrieving, you can force a accent-insensitve collation so that it ignores the different C's:
WHERE Name = 'Boris Borenović' COLLATE Cyrillic_General_CI_AI
Here, CI stands for Case Insensitive, and AS for Accent Insensitive.
I've faced with the same problem and after some researching:
https://dba.stackexchange.com/questions/139551/how-do-i-set-a-sql-server-unicode-nvarchar-string-to-an-emoji-or-supplementary
What is the difference between varchar and nvarchar?
I altered type of needed fields:
ALTER TABLE [table_name] ALTER COLUMN column_name [nvarchar]
GO
And it works!