UPPER() and LOWER() not required? - sql

For a while I thought that, for the WHERE criteria to be evaluated correctly, I needed to account for case sensitivity. I would use UPPER() and LOWER() when case didn't matter. However, I am finding that the queries below produce the same result.
SELECT * FROM ATable WHERE UPPER(part) = 'SOMEPARTNAME'
SELECT * FROM ATable WHERE part = 'SOMEPARTNAME'
SELECT * FROM ATable WHERE part = 'somepartname'
SQL Case Sensitive String Compare explains how to use case-sensitive collations. Is this the only way to force case sensitivity? Also, if you had a case-insensitive collation, when would UPPER() and LOWER() be necessary?
Thanks for help.

The common SQL Server default of a case-insensitive collation means that UPPER() and LOWER() are not required when comparing strings.
In fact an expression such as
SELECT * FROM Table WHERE UPPER(part) = 'SOMEPARTNAME'
is also non-sargable, i.e. it won't use available indexes, because a function is applied to the part column on the left-hand side of the comparison.
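To see the sargability point concretely, here is a minimal sketch in Python using the standard-library sqlite3 module instead of SQL Server (an assumption, chosen only so the example runs anywhere). The qty column and the idx_part index are hypothetical. SQLite's planner is not SQL Server's, but it shows the same effect: wrapping the indexed column in UPPER() stops the index from being used for the lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: qty exists only so the index cannot cover SELECT *.
conn.execute("CREATE TABLE ATable (id INTEGER PRIMARY KEY, part TEXT, qty INTEGER)")
conn.execute("CREATE INDEX idx_part ON ATable (part)")

def query_plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

# Bare column: the planner can seek into idx_part.
plain_plan = query_plan("SELECT * FROM ATable WHERE part = 'SOMEPARTNAME'")
# Column wrapped in a function: the planner falls back to scanning the table.
wrapped_plan = query_plan("SELECT * FROM ATable WHERE UPPER(part) = 'SOMEPARTNAME'")

print(plain_plan)
print(wrapped_plan)
```

The exact plan text varies between SQLite versions, so check for the index name rather than the full string.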

The query below performs a case-sensitive search:
SELECT Column1
FROM Table1
WHERE Column1 COLLATE Latin1_General_CS_AS = 'casesearch'
UPPER() and LOWER() are simply functions that change the case of letters, so if you have a case-insensitive collation they are only useful in the SELECT list, to change how a value is returned:
SELECT UPPER('qwerty'), LOWER('Dog')
returns
QWERTY, dog

Related

Case insensitive search without using a function in the where clause

Is there any way to make a case-insensitive search without using a function in the WHERE clause?
Please specify the database you are talking about when/if you reply. I am aware that MySQL is already case insensitive by default. What about Oracle or MSSQL or HANA?
select * from mytable WHERE upper(fieldname) = 'VALUE'
In SQL Server, use the COLLATE clause, for example collate SQL_Latin1_General_CP1_CS_AS.
The default collation is SQL_Latin1_General_CP1_CI_AS, which is case-insensitive. If you need a case-sensitive comparison, adding a collation such as COLLATE Latin1_General_CS_AS makes the search case-sensitive.
Query
select * from [mytable]
where [fieldname] = 'VALUE' collate SQL_Latin1_General_CP1_CS_AS;
Since your question is tagged Oracle, I will provide a solution which works in Oracle.
You can set these session parameters for case insensitive searching
SQL> alter session set NLS_COMP=ANSI;
SQL> alter session set NLS_SORT=BINARY_CI;
SQL> select 1 from DUAL where 'abc' = 'ABC';
1
----------
1
Read more at Linguistic Sorting and String Searching
As @mathguy points out,
ALTER SESSION SET NLS_COMP=LINGUISTIC;
is more common than using ANSI.
Converting the column to upper (or lower) case before comparing it, as you are showing, is the standard way of making a case-insensitive comparison. (UPPER and LOWER are functions defined in the SQL standard.)
If you don't want to apply a function on the column, then you can of course write a recursive query to generate all upper/lower case permutations of the value ('VALUE', 'vALUE', 'VaLUE', ..., 'value') and check whether your column value is in this set. Standard SQL provides the SUBSTRING function for accessing substrings (e.g. the nth letter) and CHAR_LENGTH for getting the string's length.
It depends on the DBMS you are using and its version to what extent the standard is supported. In Oracle for example it's SUBSTR instead of SUBSTRING and LENGTH instead of CHAR_LENGTH. MySQL on the other hand features both SUBSTRING and CHAR_LENGTH directly, but only supports recursive queries as of version 8.0.
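To illustrate what that permutation set looks like (and why it is rarely practical), here is a small Python sketch rather than a recursive SQL query. The function name case_permutations is hypothetical. It builds the same set the recursive query would produce: every upper/lower-case variant of the value.

```python
from itertools import product

def case_permutations(value):
    """Return every upper/lower-case variant of value as a set."""
    # For each character, offer its lower- and upper-case forms.
    # Using a set means digits and punctuation contribute a single option.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in value]
    return {"".join(combo) for combo in product(*choices)}

variants = case_permutations("VALUE")
print(len(variants))        # 32 variants for five letters
print("vALUE" in variants)  # True
```

The set doubles with every letter (2^n variants for n letters), so this approach only suits short values.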
this will work:
SELECT *
FROM mytable
WHERE REGEXP_LIKE (column_name, 'value', 'i');
Oracle 12c answer
select * from mytable WHERE fieldname='Value' collate binary_ci
SAP HANA does not seem to have a way other than using upper or lower.
SQL Server and MySQL do not distinguish between upper and lower case letters—they are case-insensitive by default.
One could use the CONTAINS function. For example, a Microsoft SQL Server query such as:
SELECT *
FROM TableName
WHERE ColumnName LIKE 'Abc%'
might be written in SAP HANA as:
SELECT *
FROM TableName
WHERE CONTAINS(ColumnName,'Abc%');
https://help.sap.com/viewer/05c9edaee7fe4d28ab3627d0b1583df6/2021_01_QRC/en-US/b45ff4c0e9ab4ba7a9e18a2552adeb3d.html

Case insensitive searching in Oracle

The default behaviour of LIKE and the other comparison operators (=, etc.) is case-sensitive.
Is it possible to make them case-insensitive?
There are 3 main ways to perform a case-insensitive search in Oracle without using full-text indexes.
Ultimately what method you choose is dependent on your individual circumstances; the main thing to remember is that to improve performance you must index correctly for case-insensitive searching.
1. Case your column and your string identically.
You can force all your data to be the same case by using UPPER() or LOWER():
select * from my_table where upper(column_1) = upper('my_string');
or
select * from my_table where lower(column_1) = lower('my_string');
If column_1 is not indexed on upper(column_1) or lower(column_1), as appropriate, this may force a full table scan. In order to avoid this you can create a function-based index.
create index my_index on my_table ( lower(column_1) );
If you're using LIKE then you have to concatenate a % around the string you're searching for.
select * from my_table where lower(column_1) LIKE lower('my_string') || '%';
This SQL Fiddle demonstrates what happens in all these queries. Note the Explain Plans, which indicate when an index is being used and when it isn't.
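As a rough, runnable analogue of the function-based index idea, here is a Python sketch using the standard-library sqlite3 module. SQLite is an assumption made for portability; it supports indexes on expressions, which play the same role as Oracle's function-based indexes here. The note column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, column_1 TEXT, note TEXT)")
# Index the lower-cased value, mirroring the function-based index above.
conn.execute("CREATE INDEX my_index ON my_table (lower(column_1))")
conn.executemany(
    "INSERT INTO my_table (column_1, note) VALUES (?, ?)",
    [("My_String", "a"), ("MY_STRING", "b"), ("other", "c")],
)

# Lower-case both sides, exactly as the index expression does.
sql = "SELECT column_1 FROM my_table WHERE lower(column_1) = lower(?)"
matches = sorted(row[0] for row in conn.execute(sql, ("my_string",)))
plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("my_string",))
)

print(matches)  # ['MY_STRING', 'My_String']
print(plan)     # plan text mentions my_index when the index is used
```

The query expression has to match the indexed expression for the index to be considered, which is why both use lower(column_1).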
2. Use regular expressions.
From Oracle 10g onwards REGEXP_LIKE() is available. You can specify the match_parameter 'i' in order to perform case-insensitive searching.
In order to use this as an equality operator you must specify the start and end of the string, which are denoted by the caret and the dollar sign.
select * from my_table where regexp_like(column_1, '^my_string$', 'i');
In order to perform the equivalent of LIKE, these can be removed.
select * from my_table where regexp_like(column_1, 'my_string', 'i');
Be careful with this as your string may contain characters that will be interpreted differently by the regular expression engine.
This SQL Fiddle shows you the same example output except using REGEXP_LIKE().
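The caution about special characters is easy to demonstrate. The sketch below uses Python's re module, not Oracle's REGEXP_LIKE (an assumption for the sake of a runnable example), and the helper name equals_ignore_case is hypothetical. It shows how an unescaped dot in the search string matches more than intended, and how escaping the string first avoids that.

```python
import re

def equals_ignore_case(column_value, search_string):
    """Case-insensitive equality via a regex, with the search string escaped."""
    pattern = re.escape(search_string)  # treat every character literally
    return re.fullmatch(pattern, column_value, flags=re.IGNORECASE) is not None

# Unescaped, the dot matches any character, so 'a.c' wrongly matches 'ABC'.
print(re.fullmatch("a.c", "ABC", flags=re.IGNORECASE) is not None)  # True
print(equals_ignore_case("ABC", "a.c"))  # False
print(equals_ignore_case("A.C", "a.c"))  # True
```

The same concern applies to any regular expression engine: a search string taken from user input should have its metacharacters neutralised before it is used as a pattern.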
3. Change it at the session level.
The NLS_SORT parameter governs the collation sequence for ordering and the various comparison operators, including = and LIKE. You can specify a binary, case-insensitive sort by altering the session. This means that every query performed in that session will use case-insensitive comparisons.
alter session set nls_sort=BINARY_CI
There's plenty of additional information around linguistic sorting and string searching if you want to specify a different language, or do an accent-insensitive search using BINARY_AI.
You will also need to change the NLS_COMP parameter; to quote:
The exact operators and query clauses that obey the NLS_SORT parameter
depend on the value of the NLS_COMP parameter. If an operator or
clause does not obey the NLS_SORT value, as determined by NLS_COMP,
the collation used is BINARY.
The default value of NLS_COMP is BINARY, but LINGUISTIC specifies that Oracle should pay attention to the value of NLS_SORT:
Comparisons for all SQL operations in the WHERE clause and in PL/SQL
blocks should use the linguistic sort specified in the NLS_SORT
parameter. To improve the performance, you can also define a
linguistic index on the column for which you want linguistic
comparisons.
So, once again, you need to alter the session
alter session set nls_comp=LINGUISTIC
As noted in the documentation you may want to create a linguistic index to improve performance
create index my_linguistc_index on my_table
(NLSSORT(column_1, 'NLS_SORT = BINARY_CI'));
Since 10gR2, Oracle allows you to fine-tune the behaviour of string comparisons by setting the NLS_COMP and NLS_SORT session parameters:
SQL> SET HEADING OFF
SQL> SELECT *
2 FROM NLS_SESSION_PARAMETERS
3 WHERE PARAMETER IN ('NLS_COMP', 'NLS_SORT');
NLS_SORT
BINARY
NLS_COMP
BINARY
SQL>
SQL> SELECT CASE WHEN 'abc'='ABC' THEN 1 ELSE 0 END AS GOT_MATCH
2 FROM DUAL;
0
SQL>
SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;
Session altered.
SQL> ALTER SESSION SET NLS_SORT=BINARY_CI;
Session altered.
SQL>
SQL> SELECT *
2 FROM NLS_SESSION_PARAMETERS
3 WHERE PARAMETER IN ('NLS_COMP', 'NLS_SORT');
NLS_SORT
BINARY_CI
NLS_COMP
LINGUISTIC
SQL>
SQL> SELECT CASE WHEN 'abc'='ABC' THEN 1 ELSE 0 END AS GOT_MATCH
2 FROM DUAL;
1
You can also create case insensitive indexes:
create index
nlsci1_gen_person
on
MY_PERSON
(NLSSORT
(PERSON_LAST_NAME, 'NLS_SORT=BINARY_CI')
)
;
This information was taken from Oracle case insensitive searches. The article mentions REGEXP_LIKE but it seems to work with good old = as well.
In versions older than 10gR2 it can't really be done and the usual approach, if you don't need accent-insensitive search, is to just UPPER() both the column and the search expression.
Maybe you can try using:
SELECT user_name
FROM user_master
WHERE upper(user_name) LIKE '%ME%'
From Oracle 12c R2 you can use the COLLATE operator:
The COLLATE operator determines the collation for an expression. This operator enables you to override the collation that the database would have derived for the expression using standard collation derivation rules.
The COLLATE operator takes one argument, collation_name, for which you can specify a named collation or pseudo-collation. If the collation name contains a space, then you must enclose the name in double quotation marks.
Demo:
CREATE TABLE tab1(i INT PRIMARY KEY, name VARCHAR2(100));
INSERT INTO tab1(i, name) VALUES (1, 'John');
INSERT INTO tab1(i, name) VALUES (2, 'Joe');
INSERT INTO tab1(i, name) VALUES (3, 'Billy');
--========================================================================--
SELECT /*csv*/ *
FROM tab1
WHERE name = 'jOHN' ;
-- no rows selected
SELECT /*csv*/ *
FROM tab1
WHERE name COLLATE BINARY_CI = 'jOHN' ;
/*
"I","NAME"
1,"John"
*/
SELECT /*csv*/ *
FROM tab1
WHERE name LIKE 'j%';
-- no rows selected
SELECT /*csv*/ *
FROM tab1
WHERE name COLLATE BINARY_CI LIKE 'j%';
/*
"I","NAME"
1,"John"
2,"Joe"
*/
db<>fiddle demo
The COLLATE operator also works if you put it at the end of the expression, and that seems cleaner to me.
So you can use this:
WHERE name LIKE 'j%' COLLATE BINARY_CI
instead of this:
WHERE name COLLATE BINARY_CI LIKE 'j%'
Anyhow, I like the COLLATE operator solution for the following reasons:
you put it only once in the expression and you don't need to worry about multiple UPPER or LOWER, and where to put them
it is isolated to the exact statement and expression where you need it, unlike ALTER SESSION solution that makes it applicable to everything. And your query will work consistently regardless of the DB or session NLS_SORT setting.
select user_name
from my_table
where nlssort(user_name, 'NLS_SORT = Latin_CI') = nlssort('%AbC%', 'NLS_SORT = Latin_CI')
You can do something like this:
where regexp_like(name, 'string$', 'i');

How to find rows that have a value that contains a lowercase letter

I'm looking for an SQL query that gives me all rows where ColumnX contains any lowercase letter (e.g. "1234aaaa5789"). Same for uppercase.
SELECT * FROM my_table
WHERE UPPER(some_field) != some_field
This should work with funny characters like åäöøüæï. You might need to use a language-specific utf-8 collation for the table.
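The logic behind the UPPER(some_field) != some_field test can be sketched in a few lines of Python; the function names here are hypothetical. Python string comparison is always case-sensitive, which is the property the SQL comparison needs from its collation for this trick to work.

```python
def contains_lowercase(value):
    """True if upper-casing changes the string, i.e. it has a lowercase letter."""
    return value.upper() != value

def contains_uppercase(value):
    """True if lower-casing changes the string, i.e. it has an uppercase letter."""
    return value.lower() != value

print(contains_lowercase("1234aaaa5789"))  # True
print(contains_lowercase("1234AAAA5789"))  # False
print(contains_lowercase("ÅÄÖå"))          # True
print(contains_uppercase("1234aaaa5789"))  # False
```

Digits and punctuation are unaffected by case conversion, so they never trigger a match on their own.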
SELECT * FROM my_table WHERE my_column = 'my string'
COLLATE Latin1_General_CS_AS
This would make a case sensitive search.
EDIT
As stated in kouton's comment here and tormuto's comment here, anyone who has problems with the collation below
COLLATE Latin1_General_CS_AS
should first check the default collation of their SQL Server instance, their database and the column in question, and pass in the default collation with the query expression. A list of collations can be found here.
SELECT * FROM Yourtable
WHERE UPPER([column_NAME]) COLLATE Latin1_General_CS_AS !=[Column_NAME]
This is how I did it for utf8 encoded table and utf8_unicode_ci column, which doesn't seem to have been posted exactly:
SELECT *
FROM table
WHERE UPPER(column) != BINARY(column)
To find all rows containing a lowercase letter:
SELECT *
FROM Test
WHERE col1
LIKE '%[abcdefghijklmnopqrstuvwxyz]%'
collate Latin1_General_CS_AS
Thanks Manesh Joseph
In MS SQL Server, use the COLLATE clause.
SELECT Column1
FROM Table1
WHERE Column1 COLLATE Latin1_General_CS_AS = 'casesearch'
Adding COLLATE Latin1_General_CS_AS makes the search case-sensitive.
The default collation of a SQL Server installation, SQL_Latin1_General_CP1_CI_AS, is not case-sensitive.
To change the collation of a column permanently, run the following query.
ALTER TABLE Table1
ALTER COLUMN Column1 VARCHAR(20)
COLLATE Latin1_General_CS_AS
To find the collation of a table's columns, run the following stored procedure, passing the table name.
EXEC sp_help 'Table1'
Source : SQL SERVER – Collate – Case Sensitive SQL Query Search
I've done something like this to find out the lower cases.
SELECT *
FROM YourTable
where BINARY_CHECKSUM(lower(ColumnName)) = BINARY_CHECKSUM(ColumnName)
mysql> SELECT '1234aaaa578' REGEXP '^[a-z]';
I had to add BINARY to ColumnX to get a case-sensitive result:
SELECT * FROM MyTable WHERE BINARY(ColumnX) REGEXP '^[a-z]';
I'm not an expert on MySQL, but I would suggest you look at REGEXP.
SELECT * FROM MyTable WHERE ColumnX REGEXP '^[a-z]';
In PostgreSQL you could use the ~ operator.
For example, you could search for all rows where col_a contains any lowercase letter:
select * from your_table where col_a ~ '[a-z]';
You can modify the regular expression according to your needs.
Regards,
--For Sql
SELECT *
FROM tablename
WHERE tablecolumnname LIKE '%[a-z]%';
Logically speaking Rohit's solution should have worked, but it didn't. I think SQL Management Studio messed up when trying to optimize this.
But by modifying the string before comparing them I was able to get the right results. This worked for me:
SELECT [ExternalId]
FROM [EquipmentSerialsMaster] where LOWER('0'+[ExternalId]) COLLATE Latin1_General_CS_AS != '0'+[ExternalId]
This works in Firebird SQL, it should work in any SQL queries I believe, unless the underlying connection is not case sensitive.
To find records with any lower case letters:
select * from tablename where upper(fieldname) <> fieldname
To find records with any upper case letters:
select * from tablename where lower(fieldname) <> fieldname

Finding all caps in columns?

When working with MySQL, how can I fetch all rows where the name column is all uppercase?
Since equality is case insensitive, I'm not quite sure how to do this.
If your column collation is case insensitive, you can override it in your query:
SELECT * FROM my_table WHERE my_column COLLATE latin1_bin = UPPER(my_column);
COLLATE clause syntax.
SELECT * FROM my_table WHERE my_column REGEXP '^[[:upper:]]+$';
SELECT * FROM table where binary your_field REGEXP '^[[:upper:]]+$'
The 'binary' casts the field to binary which is necessary for REGEXP to be case-sensitive with most data types (except binary, of course).
[:character_class:] notation is documented here - there are several other useful character classes.
'binary' operator is documented here.
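For comparison, here is the same "all uppercase" test as a minimal Python sketch. It uses the ASCII range [A-Z] in place of the POSIX class [[:upper:]], which is an assumption that narrows the match to unaccented letters. The name is_all_caps and the sample names are hypothetical.

```python
import re

# ASCII letters only; an assumption made for this sketch.
ALL_UPPER = re.compile(r"[A-Z]+")

def is_all_caps(name):
    """True when name is non-empty and consists solely of the letters A-Z."""
    return ALL_UPPER.fullmatch(name) is not None

names = ["SMITH", "Smith", "O'NEIL", "JONES"]
print([n for n in names if is_all_caps(n)])  # ['SMITH', 'JONES']
```

As with the anchored MySQL pattern, a value containing spaces, digits or punctuation is rejected even if all of its letters are uppercase.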

Differentiating between "AB" and "Ab" in a character Database Field

Specifically, Sql Server 2005/T-Sql. I have a field that is mostly a series of two characters, and they're all supposed to be upper case but there's some legacy data that predates the current DB/System, and I need to figure out which records are in violation of the upper casing covenant.
I thought this would work:
select * from tbl where ascii(field1) <> ascii(upper(field1))
And indeed it returned me a handful of records. They've since been corrected, and now that query returns no data. But I've got people telling me there is still mixed case data in the DB, and I just found an example: 'FS' and 'Fs' are both reporting the same ascii value.
Why is this approach flawed? What is a better way to go about this, or how can I fix this approach to work correctly?
If all the data should have been in upper case, just do an update:
update tbl
set field1 = upper(field1)
But to answer your original question, this query should give you the results that you expect:
select * from tbl
where field1 COLLATE Latin1_General_CS_AS <> upper(field1)
Edit: just noticed that the suggestion to use COLLATE was also posted by Ian
ASCII is only comparing the first letter. You'd have to compare each letter, or change the database collation to be case sensitive.
You can change collation on an entire database level, or just on one column for a specific query, so:
SELECT myColumn
FROM myTable
WHERE myColumn COLLATE Latin1_General_CS_AS <> upper(myColumn)
The ascii() function will only return the ascii number for the first character in an expression if you pass it a multiple character string. To do the comparison you want you need to look at individual characters, not entire fields.
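A short Python sketch makes the flaw visible. It assumes non-empty values, and the function names are hypothetical: first_char_check mimics the ascii(field1) <> ascii(upper(field1)) test by looking only at the first character, while whole_string_check compares the entire value case-sensitively.

```python
def first_char_check(field):
    """Mimic ascii(field1) <> ascii(upper(field1)): only the first character counts."""
    return ord(field[0]) != ord(field.upper()[0])

def whole_string_check(field):
    """Case-sensitive comparison of the entire value against its upper-cased form."""
    return field != field.upper()

for value in ("FS", "Fs", "fs"):
    print(value, first_char_check(value), whole_string_check(value))
# FS False False
# Fs False True
# fs True True
```

'Fs' slips past the first-character test because its first letter is already uppercase, which matches the behaviour described in the question.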
The ASCII() function returns only the ASCII code value of the leftmost character of a character expression. Use UPPER() instead.
This might work:
select * from tbl
where cast(field1 as varbinary(256)) <> cast(upper(field1) as varbinary(256))
The methods described at Case sensitive search in SQL Server queries might be useful to you.
According to the documentation for ASCII(), it only returns the leftmost character.
I think you're going about this wrong.
You could simply:
select * from tbl where field1 <> upper(field1)
if the collation rules were set correctly, so why not fix the collation rules? If you can't change them permanently, try:
select * from tbl where
(field1 collate Latin1_General_CS_AS)
<> upper(field1 collate Latin1_General_CS_AS)