I just realized that SQL Server's '=' comparator is case insensitive when used for text comparison. I have a few questions regarding this functionality:
Is this the same for all databases or specific to SQL Server?
I have been using the lower function to ensure the text comparison is insensitive till now. Is it still a good idea to follow the same?
How can we do case-sensitive comparisons in SQL Server?
Why does the '=' operator default to a case-insensitive comparison?
No, case sensitivity has nothing to do with the equals sign.
Case sensitivity is determined by the collation in effect, which defaults to the collation of the database; see the documentation for details.
Case sensitivity depends only on the collation.
You can specify the collation within each '=' operation:
SELECT *
FROM [Table_1] a inner join
[Table_2] b on a.Col1=b.Col2 collate Modern_Spanish_CS_AI
I have been using the lower function to ensure the text comparison is insensitive till now. Is it still a good idea to follow the same?
Absolutely not. You will generally preclude the use of an index if you do this. Plain old = (or < or > or whatever) will either work or not depending on the collation you have chosen. Do not do this "just to be safe". Testing will make sure you've got it right.
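To make the index point concrete, here is a minimal sketch. It uses SQLite (through Python's sqlite3 module) as a stand-in, not SQL Server, so the plan text will differ from what SQL Server shows; the table and index names are hypothetical. The general effect is the same in most engines: wrapping the column in a function hides it from a plain index on that column.

```python
import sqlite3

# SQLite is only a stand-in for SQL Server; "people" and "ix_people_name"
# are hypothetical names used for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX ix_people_name ON people (name)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("Alice",), ("bob",), ("Carol",)],
)

def plan(sql):
    """Return the plan detail text SQLite reports for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plain_plan = plan("SELECT id FROM people WHERE name = 'bob'")
wrapped_plan = plan("SELECT id FROM people WHERE lower(name) = 'bob'")

print(plain_plan)    # an index SEARCH on ix_people_name
print(wrapped_plan)  # a SCAN over every row; the index cannot be searched on lower(name)
```

The exact wording of the plan varies between SQLite versions, so treat the printed text as indicative only.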
The case sensitivity of operations in SQL Server is determined by the collation, whose server-wide default you choose during installation. At that point you can choose to install SQL Server as case insensitive (the default) or case sensitive.
http://msdn.microsoft.com/en-us/library/aa197951(v=sql.80).aspx
How the comparison is done depends on the collation that you have chosen for the field. If you change the field to use a case-sensitive collation, the comparisons will be case sensitive.
By default, fields use the collation set for the database, but each field can have its own collation setting.
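A small sketch of per-column collation follows. It runs against SQLite, not SQL Server, because SQLite ships with Python; the collation names NOCASE and BINARY are SQLite's, and the table and column names are hypothetical. The idea carries over: the same '=' gives different answers depending on the collation attached to the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical columns holding the same text under different collations.
conn.execute(
    "CREATE TABLE t (ci_name TEXT COLLATE NOCASE, cs_name TEXT COLLATE BINARY)"
)
conn.execute("INSERT INTO t VALUES ('Smith', 'Smith')")

# The same '=' operator, compared against the same literal.
ci_hits = conn.execute("SELECT COUNT(*) FROM t WHERE ci_name = 'SMITH'").fetchone()[0]
cs_hits = conn.execute("SELECT COUNT(*) FROM t WHERE cs_name = 'SMITH'").fetchone()[0]

print(ci_hits, cs_hits)  # 1 0
```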
Today I viewed some query examples, and I found some string comparisons in the WHERE condition.
The comparison was made using the greater than (>) and less than (<) symbols. Is this a possible way to compare strings in SQL? And how does it act? Does a string that is less than another one come before it in dictionary order? For example, is ball less than water? And is this comparison case sensitive? For example, in BALL < water, do the uppercase characters affect the comparison?
I've googled for hours but I was not able to find anything that clears up these doubts.
The comparison operators (including < and >) "work" with string values as well as numbers.
For MySQL
By default, string comparisons are not case sensitive and use the current character set. The default is latin1 (cp1252 West European), which also works well for English.
String comparisons will be case sensitive when the collation of the strings being compared is case sensitive, i.e. the name of the collation ends in _cs rather than _ci. There's really no point in repeating all of the information that's available in the MySQL Reference Manual here.
MySQL Comparison Operators Reference: http://dev.mysql.com/doc/refman/5.5/en/comparison-operators.html
More information about MySQL charactersets/collations: http://dev.mysql.com/doc/refman/5.5/en/charset.html
To answer the specific questions you asked:
Q: is this a possible way to compare strings in SQL?
A: Yes, in both MySQL and SQL Server
Q: and how does it act?
A: A comparison operator returns a boolean, either TRUE, FALSE or NULL.
Q: a string less than another one comes before in dictionary order? For example, ball is less than water?
A: Yes, because 'b' comes before 'w' in the character set collation, the expression
'ball' < 'water'
will return TRUE. (This depends on the character set and on the collation.)
Q: and this comparison is case sensitive?
A: Whether a particular comparison is case sensitive or not depends on the collation in effect; by default, both SQL Server and MySQL use case-insensitive collations.
In MySQL it is possible to make string comparisons case sensitive by specifying a collation that is case sensitive (the collation name will end in _cs rather than _ci).
Q: For example, in BALL < water, do the uppercase characters affect the comparison?
A: By default, in both SQL Server and MySQL, the expression
'BALL' < 'water'
would return TRUE.
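The answers above can be checked with a short sketch. It uses SQLite rather than MySQL or SQL Server, with SQLite's built-in BINARY (case-sensitive, byte order) and NOCASE (ASCII case-insensitive) collations standing in for _cs and _ci collations; the helper name is hypothetical. Note that 'BALL' < 'water' is true under both, while a pair such as 'ball' and 'Water' shows where case starts to matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def is_less(left, right, collation):
    """Evaluate left < right in SQLite under the named built-in collation."""
    # Collation names cannot be bound as parameters, so only pass fixed names here.
    sql = f"SELECT ? < ? COLLATE {collation}"
    return bool(conn.execute(sql, (left, right)).fetchone()[0])

print(is_less("ball", "water", "NOCASE"))  # True
print(is_less("BALL", "water", "NOCASE"))  # True
print(is_less("BALL", "water", "BINARY"))  # True: 'B' sorts before 'w' by code point as well
print(is_less("ball", "Water", "NOCASE"))  # True
print(is_less("ball", "Water", "BINARY"))  # False: uppercase 'W' sorts before lowercase 'b'
```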
In Microsoft SQL Server, collation determines the dictionary rules for comparing and sorting character data with regard to:
case sensitivity
accent sensitivity
width sensitivity
kana sensitivity
SQL Server also includes binary collations where comparison and sorting are done by binary code point rather than dictionary rules. One can choose from many collations according to the desired sensitivity behavior. The default collation selected for Latin-based language locales during SQL Server installation is case insensitive and accent sensitive.
Collation is specified at the instance (during installation), database, and column level. Instance collation determines the collation of Instance-level objects like logins and database names as well as identifiers for variables, GOTO labels and temporary tables. Database collation (same as instance collation by default), determines the collation of database identifiers like table and column names as well as literal expressions. Column collation (same as database collation by default) determines the collation of that column.
It is certainly possible to compare strings using '<', '>', '<>', LIKE, BETWEEN, etc.
If you are using MyBatis or another XML-based technique to execute the SQL query, you have to wrap the operator in <![CDATA[your_symbol-here]]> so that the XML parser does not treat '<' as markup.
'ball' <![CDATA[<]]> 'water'
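Look at this interesting output from SQL Server. The code was meant to compare dates; it works as long as both dates fall in the same year, but fails when the year changes, because the values are compared as strings, character by character, rather than as dates.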
SELECT TOP 1 'The ResultSet should be empty' FROM SYS.columns
WHERE '01/04/2023' < '07/11/2022'
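The same pitfall can be reproduced outside the database. The sketch below is in Python rather than T-SQL and assumes the literals are in month/day/year order (the query does not say); it contrasts the text comparison with a comparison of parsed dates and of year-first strings.

```python
from datetime import datetime

a, b = "01/04/2023", "07/11/2022"

def parse(text):
    """Parse a date literal, assuming month/day/year order."""
    return datetime.strptime(text, "%m/%d/%Y")

as_text = a < b                  # compares '0' with '0', then '1' with '7', and stops
as_dates = parse(a) < parse(b)   # compares actual points in time
as_iso_text = parse(a).strftime("%Y-%m-%d") < parse(b).strftime("%Y-%m-%d")

print(as_text)      # True  (the surprising result from the query)
print(as_dates)     # False
print(as_iso_text)  # False (year-first text sorts the same way as the dates)
```

In the database, the usual remedy is to compare values of a date type, or at least to store text dates in a year-first format.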
I need to create a WHERE-IN query (using Oracle) that is case insensitive. I've tried this way:
select user from users where lower(user) in lower('userNaMe1', 'useRNAmE2');
but I get ORA-00909: invalid number of arguments
The list is dynamically generated in my Spring app. That's why I can't add lower() to every single list's value.
Is there any other way to achieve it?
lower() takes a single argument, so you can use:
where lower(user) in (lower('userNaMe1'), lower('useRNAmE2'))
You could also express this using regular expressions (regexp_like() accepts a case sensitivity argument) if you prefer:
where regexp_like(user, '^(userNaMe1|useRNAmE2)$', 'i')
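Since the list is generated in the application, another option is to lower-case the values there and bind them as parameters. The sketch below shows the idea in Python against SQLite, not in Spring against Oracle; the function name, the table layout, and the column name username are hypothetical (username is used instead of user to avoid a reserved word).

```python
import sqlite3

def find_users(conn, names):
    """Return usernames matching any of the given names, ignoring case."""
    if not names:
        return []
    # One placeholder per value; the values are lower-cased in the application.
    placeholders = ", ".join("?" for _ in names)
    sql = f"SELECT username FROM users WHERE lower(username) IN ({placeholders})"
    return [row[0] for row in conn.execute(sql, [n.lower() for n in names])]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("UserName1",), ("USERNAME2",), ("someoneElse",)],
)

matches = find_users(conn, ["userNaMe1", "useRNAmE2"])
print(sorted(matches))  # ['USERNAME2', 'UserName1']
```

One caveat: the application's lower-casing and the database's lower() may disagree on non-ASCII characters, so this is safest when the names are plain ASCII.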
There is another, more drastic approach, which is to make your session or your searches in the database case insensitive.
You can find how to do it in this answer:
Case insensitive searching in Oracle
When ordering data in SQL Developer, why does the data with all lowercase letters appear last?
for example
Adam, Ben, Charlotte, Matthew, emily
Why isn't it: Adam, Ben, Charlotte, emily, Matthew?
I'm not just looking for how to change it; I want to know why it happens. Is there a setting that is ticked to make it happen, or does it do it by default unless you write a statement telling it not to?
Ordering in a database uses a collation. Typically, the collation is specified at the database level, but can be at the table field level and the query level.
A collation is an ordering for the characters used by a culture in a writing system script. If the human writing system itself wouldn't define an ordering between two characters, the collation would likely fall back to a lexicographic ordering based on the character set of the collation. (Humans expect consistency even in the absence of rules that they are aware of.)
Many systems of collations include both case sensitive and case insensitive collations as well as accent sensitive and accent insensitive collations. (So, as many as 2 x 2 collations for the same culture and character set.)
So, somewhere your system has specified case sensitivity. You could order for your user (yourself, in this case?) by the preferred culture, case sensitivity, and accent sensitivity. But choose from collations for the same character set as the data because character set conversions can be lossy unless the source is a subset of the target.
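The two orderings from the question can be reproduced with a short sketch. It is in Python, not Oracle: the default sort compares code points (much like a binary collation, where every uppercase ASCII letter precedes every lowercase one), and sorting on a case-folded key approximates a case-insensitive collation.

```python
names = ["emily", "Charlotte", "Adam", "Matthew", "Ben"]

# Code point order: 'M' (77) comes before 'e' (101), so "emily" lands last.
by_code_point = sorted(names)

# Case-insensitive order: compare on a case-folded key instead.
ignoring_case = sorted(names, key=str.casefold)

print(by_code_point)  # ['Adam', 'Ben', 'Charlotte', 'Matthew', 'emily']
print(ignoring_case)  # ['Adam', 'Ben', 'Charlotte', 'emily', 'Matthew']
```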
See PL/SQL's documentation on collations.
How do I construct a SQL query (MS SQL Server) where the "where" clause is case-insensitive?
SELECT * FROM myTable WHERE myField = 'sOmeVal'
I want the results to come back ignoring the case
In the default configuration of a SQL Server database, string comparisons are case-insensitive. If your database overrides this setting (through the use of an alternate collation), then you'll need to specify what sort of collation to use in your query.
SELECT * FROM myTable WHERE myField = 'sOmeVal' COLLATE SQL_Latin1_General_CP1_CI_AS
Note that the collation I provided is just an example (though it will more than likely function just fine for you). A more thorough outline of SQL Server collations can be found here.
Usually, string comparisons are case-insensitive. If your database is configured with a case-sensitive collation, you need to force the use of a case-insensitive one:
SELECT balance FROM people WHERE email = 'billg@microsoft.com'
COLLATE SQL_Latin1_General_CP1_CI_AS
I found another solution elsewhere; that is, to use
upper(@yourString)
but everyone here is saying that, in SQL Server, it doesn't matter because it's ignoring case anyway? I'm pretty sure our database is case-sensitive.
The top 2 answers (from Adam Robinson and Andrejs Cainikovs) are kinda, sorta correct, in that they do technically work, but their explanations are wrong and so could be misleading in many cases. For example, while the SQL_Latin1_General_CP1_CI_AS collation will work in many cases, it should not be assumed to be the appropriate case-insensitive collation. In fact, given that the O.P. is working in a database with a case-sensitive (or possibly binary) collation, we know that the O.P. isn't using the collation that is the default for so many installations (especially any installed on an OS using US English as the language): SQL_Latin1_General_CP1_CI_AS. Sure, the O.P. could be using SQL_Latin1_General_CP1_CS_AS, but when working with VARCHAR data, it is important to not change the code page as it could lead to data loss, and that is controlled by the locale / culture of the collation (i.e. Latin1_General vs French vs Hebrew etc). Please see the point below about using the case-insensitive collation that otherwise matches the column's collation.
The other four answers are wrong to varying degrees.
I will clarify all of the misunderstandings here so that readers can hopefully make the most appropriate / efficient choices.
Do not use UPPER(). That is completely unnecessary extra work. Use a COLLATE clause. A string comparison needs to be done in either case, but using UPPER() also has to check, character by character, to see if there is an upper-case mapping, and then change it. And you need to do this on both sides. Adding COLLATE simply directs the processing to generate the sort keys using a different set of rules than it was going to by default. Using COLLATE is definitely more efficient (or "performant", if you like that word :) than using UPPER(), as proven in this test script (on PasteBin).
There is also the issue noted by @Ceisc on @Danny's answer:
In some languages case conversions do not round-trip. i.e. LOWER(x) != LOWER(UPPER(x)).
The Turkish upper-case "İ" is the common example.
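The round-trip problem is easy to see outside the database as well. The sketch below uses Python's Unicode case mappings, which are not guaranteed to match SQL Server's UPPER() and LOWER() under any particular collation, so it only illustrates the phenomenon. It uses the dotless "ı" (U+0131) and the German "ß" (U+00DF) because, under Python's rules, those are the characters for which lower(x) differs from lower(upper(x)).

```python
samples = {
    "dotless i": "\u0131",  # Turkish lowercase dotless i
    "sharp s": "\u00df",    # German sharp s
    "plain a": "a",
}

def round_trips(x):
    """True when lower(x) equals lower(upper(x)) under Python's case mappings."""
    return x.lower() == x.upper().lower()

results = {name: round_trips(ch) for name, ch in samples.items()}
print(results)  # {'dotless i': False, 'sharp s': False, 'plain a': True}
```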
No, collation is not a database-wide setting, at least not in this context. There is a database-level default collation, and it is used as the default for altered and newly created columns that do not specify the COLLATE clause (which is likely where this common misconception comes from), but it does not impact queries directly unless you are comparing string literals and variables to other string literals and variables, or you are referencing database-level meta-data.
No, collation is not per query.
Collations are per predicate (i.e. something operand something) or expression, not per query. And this is true for the entire query, not just the WHERE clause. This covers JOINs, GROUP BY, ORDER BY, PARTITION BY, etc.
No, do not convert to VARBINARY (e.g. convert(varbinary, myField) = convert(varbinary, 'sOmeVal')) for the following reasons:
that is a binary comparison, which is not case-insensitive (which is what this question is asking for)
if you do want a binary comparison, use a binary collation. Use one that ends with _BIN2 if you are using SQL Server 2008 or newer, else you have no choice but to use one that ends with _BIN. If the data is NVARCHAR then it doesn't matter which locale you use as they are all the same in that case, hence Latin1_General_100_BIN2 always works. If the data is VARCHAR, you must use the same locale that the data is currently in (e.g. Latin1_General, French, Japanese_XJIS, etc) because the locale determines the code page that is used, and changing code pages can alter the data (i.e. data loss).
using a variable-length datatype without specifying the size will rely on the default size, and there are two different defaults depending on the context where the datatype is being used. It is either 1 or 30 for string types. When used with CONVERT() it will use the 30 default value. The danger is, if the string can be over 30 bytes, it will get silently truncated and you will likely get incorrect results from this predicate.
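The silent-truncation risk can be pictured with a small simulation. This is plain Python, not SQL Server: the helper below merely mimics an unsized conversion by keeping the first 30 bytes (the default length stated above), and its name and the sample strings are hypothetical.

```python
DEFAULT_CONVERT_LENGTH = 30  # default length stated above for CONVERT() without a size

def to_varbinary_default(text):
    """Mimic an unsized conversion: encode the text, then keep only 30 bytes."""
    return text.encode("latin-1")[:DEFAULT_CONVERT_LENGTH]

# Two different values that only differ after the 30th byte.
a = "x" * 30 + "-first"
b = "x" * 30 + "-second"

truncated_equal = to_varbinary_default(a) == to_varbinary_default(b)
full_equal = a.encode("latin-1") == b.encode("latin-1")

print(truncated_equal)  # True: the difference was cut off, so the predicate matches wrongly
print(full_equal)       # False
```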
Even if you want a case-sensitive comparison, binary collations are not case-sensitive (another very common misconception).
No, LIKE is not always case-sensitive. It uses the collation of the column being referenced, or the collation of the database if a variable is compared to a string literal, or the collation specified via the optional COLLATE clause.
LCASE is not a SQL Server function. It appears to be either Oracle or MySQL. Or possibly Visual Basic?
Since the context of the question is comparing a column to a string literal, neither the collation of the instance (often referred to as "server") nor the collation of the database have any direct impact here. Collations are stored per each column, and each column can have a different collation, and those collations don't need to be the same as the database's default collation or the instance's collation. Sure, the instance collation is the default for what a newly created database will use as its default collation if the COLLATE clause wasn't specified when creating the database. And likewise, the database's default collation is what an altered or newly created column will use if the COLLATE clause wasn't specified.
You should use the case-insensitive collation that is otherwise the same as the collation of the column. Use the following query to find the column's collation (change the table's name and schema name):
SELECT col.*
FROM sys.columns col
WHERE col.[object_id] = OBJECT_ID(N'dbo.TableName')
AND col.[collation_name] IS NOT NULL;
Then just change the _CS to be _CI. So, Latin1_General_100_CS_AS would become Latin1_General_100_CI_AS.
If the column is using a binary collation (ending in _BIN or _BIN2), then find a similar collation using the following query:
SELECT *
FROM sys.fn_helpcollations() col
WHERE col.[name] LIKE N'{CurrentCollationMinus"_BIN"}[_]CI[_]%';
For example, assuming the column is using Japanese_XJIS_100_BIN2, do this:
SELECT *
FROM sys.fn_helpcollations() col
WHERE col.[name] LIKE N'Japanese_XJIS_100[_]CI[_]%';
For more info on collations, encodings, etc, please visit: Collations Info
No, only using LIKE will not work. LIKE searches values matching exactly your given pattern. In this case LIKE would find only the text 'sOmeVal' and not 'someval'.
A practicable solution is to use the LCASE() function. LCASE('sOmeVal') returns the lowercase form of your text: 'someval'. If you use this function on both sides of your comparison, it works:
SELECT * FROM myTable WHERE LCASE(myField) LIKE LCASE('sOmeVal')
The statement compares two lowercase strings, so that your 'sOmeVal' will match every other notation of 'someval' (e.g. 'Someval', 'sOMEVAl' etc.).
You can force a case-sensitive comparison by casting to varbinary, like this:
SELECT * FROM myTable
WHERE convert(varbinary, myField) = convert(varbinary, 'sOmeVal')
What database are you on? With MS SQL Server, it's a database-wide setting, or you can override it per query with the COLLATE keyword.