Differentiating between "AB" and "Ab" in a character Database Field - sql

Specifically, SQL Server 2005/T-SQL. I have a field that is mostly a series of two characters, and they're all supposed to be upper case, but there's some legacy data that predates the current DB/system, and I need to figure out which records are in violation of the upper casing covenant.
I thought this would work:
select * from tbl where ascii(field1) <> ascii(upper(field1))
And indeed it returned me a handful of records. They've since been corrected, and now that query returns no data. But I've got people telling me there is still mixed case data in the DB, and I just found an example: 'FS' and 'Fs' are both reporting the same ascii value.
Why is this approach flawed? What is a better way to go about this, or how can I fix this approach to work correctly?

If all the data should have been in upper case, just do an update:
update tbl
set field1 = upper(field1)
But to answer your original question, this query should give you the results that you expect:
select * from tbl
where field1 COLLATE Latin1_General_CS_AS <> upper(field1)
Edit: just noticed that the suggestion to use COLLATE was also posted by Ian

ASCII is only comparing the first letter. You'd have to compare each letter, or change the database collation to be case sensitive.
You can change collation on an entire database level, or just on one column for a specific query, so:
SELECT myColumn
FROM myTable
WHERE myColumn COLLATE Latin1_General_CS_AS <> upper(myColumn)

The ascii() function will only return the ascii number for the first character in an expression if you pass it a multiple character string. To do the comparison you want you need to look at individual characters, not entire fields.
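To see why the original query missed 'Fs', here is a minimal Python sketch of the same logic. It is not SQL Server code: `first_char_code` is a hypothetical stand-in for what ASCII() does with a multi-character string, and `has_lowercase` is a hypothetical helper that compares the whole value instead.

```python
def first_char_code(value: str) -> int:
    """Mimic the behaviour described above: only the leftmost character is inspected."""
    return ord(value[0])


def has_lowercase(value: str) -> bool:
    """Compare the whole string with its upper-cased form, character by character."""
    return value != value.upper()


for sample in ("FS", "Fs", "fS"):
    # The flawed check only notices a lowercase letter in the first position.
    flawed = first_char_code(sample) != first_char_code(sample.upper())
    print(sample, flawed, has_lowercase(sample))
# FS False False
# Fs False True
# fS True True
```

The middle row is the case from the question: the first letter is already upper case, so the first-character check sees no difference.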

The ASCII() function returns only the ASCII code value of the leftmost character of a character expression. Compare the whole value against UPPER() instead, using a case-sensitive collation.

This might work:
select * from tbl
where cast(field1 as varbinary(256)) <> cast(upper(field1) as varbinary(256))

The methods described at Case sensitive search in SQL Server queries might be useful to you.

According to the documentation for ASCII(), it only returns the code of the leftmost character.
I think you're going about this wrong.
You could simply:
select * from tbl where field1 <> upper(field1)
if the collation rules were set correctly, so why not fix the collation rules? If you can't change them permanently, try:
select * from tbl where
(field1 collate Latin1_General_CS_AS)
<> upper(field1 collate Latin1_General_CS_AS)
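The effect of the collation on this comparison can be reproduced in a small, self-contained sketch. This one uses Python's built-in sqlite3 module as a stand-in, so it is an illustration of the idea rather than SQL Server behaviour: NOCASE and BINARY are SQLite collation names, playing the roles of a case-insensitive column collation and a case-sensitive override such as Latin1_General_CS_AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COLLATE NOCASE makes comparisons on field1 case-insensitive for ASCII letters,
# standing in for a case-insensitive column collation.
conn.execute("CREATE TABLE tbl (field1 TEXT COLLATE NOCASE)")
conn.executemany("INSERT INTO tbl (field1) VALUES (?)", [("FS",), ("Fs",), ("AB",)])

# Under the column's case-insensitive collation, 'Fs' equals 'FS', so nothing is found.
insensitive = conn.execute(
    "SELECT field1 FROM tbl WHERE field1 <> upper(field1)"
).fetchall()

# Overriding the collation for this one comparison makes it case-sensitive.
sensitive = conn.execute(
    "SELECT field1 FROM tbl WHERE field1 COLLATE BINARY <> upper(field1)"
).fetchall()

print(insensitive)  # []
print(sensitive)    # [('Fs',)]
```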

Related

Case insensitive search without using a function in the where clause

Is there any way to make a case-insensitive search without using a function in the WHERE clause?
Please specify the database you are talking about when/if you reply. I am aware that MySQL is already case insensitive by default. What about Oracle or MSSQL or HANA?
select * from mytable WHERE upper(fieldname) = 'VALUE'
collate SQL_Latin1_General_CP1_CS_AS.
Default Collation is SQL_Latin1_General_CP1_CI_AS which is case insensitive. And if we need to make it case sensitive, then adding COLLATE Latin1_General_CS_AS makes the search case sensitive.
Query
select * from [mytable]
where [fieldname] = 'VALUE' collate SQL_Latin1_General_CP1_CS_AS;
Find a demo here
Since your question is tagged Oracle, I will provide a solution which works in Oracle.
You can set these session parameters for case insensitive searching
SQL> alter session set NLS_COMP=ANSI;
SQL> alter session set NLS_SORT=BINARY_CI;
SQL> select 1 from DUAL where 'abc' = 'ABC';
1
----------
1
Read more at Linguistic Sorting and String Searching
As @mathguy points out,
ALTER SESSION SET NLS_COMP=LINGUISTIC;
is more common than using ANSI
Making the column upper (or lower) case, as you are showing, in order to compare it is the standard way of making a case-insensitive comparison. (UPPER and LOWER are functions defined in the SQL standard.)
If you don't want to apply a function on the column, then you can of course write a recursive query to generate all upper/lower case permutations of the value ('VALUE', 'vALUE', 'VaLUE', ..., 'value') and check whether your column value is in this set. Standard SQL provides the SUBSTRING function for accessing substrings (e.g. the nth letter) and CHAR_LENGTH for getting the string's length.
It depends on the DBMS you are using and its version to what extent the standard is supported. In Oracle for example it's SUBSTR instead of SUBSTRING and LENGTH instead of CHAR_LENGTH. MySQL on the other hand features both SUBSTRING and CHAR_LENGTH directly, but only supports recursive queries as of version 8.0.
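To give a feel for how large that set of permutations gets, here is a short Python sketch (not the recursive SQL query itself) that generates every upper/lower case variant of a value. The helper name `case_permutations` is made up for this illustration.

```python
from itertools import product


def case_permutations(value):
    """Return the set of all upper/lower case variants of value."""
    # For each character, offer both its lower-case and upper-case form.
    pairs = [(ch.lower(), ch.upper()) for ch in value]
    return {"".join(combo) for combo in product(*pairs)}


variants = case_permutations("VALUE")
print(len(variants))          # 32
print("VaLUE" in variants)    # True
print("value" in variants)    # True
```

The set doubles with every additional letter (2^5 = 32 here), which is worth keeping in mind before choosing this route over UPPER() or a collation.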
This will work:
SELECT *
FROM mytable
WHERE REGEXP_LIKE (column_name, '^value$', 'i');
Oracle 12c answer
select * from mytable WHERE fieldname='Value' collate binary_ci
SAP HANA does not seem to have a way other than using upper or lower.
SQL Server and MySQL are case-insensitive by default, so they do not distinguish between upper and lower case letters in comparisons.
One could use the CONTAINS function. For example, Microsoft SQL Server query:
SELECT *
FROM TableName
WHERE ColumnName LIKE 'Abc%'
Maybe written in SAP HANA as:
SELECT *
FROM TableName
WHERE CONTAINS(ColumnName,'Abc%');
https://help.sap.com/viewer/05c9edaee7fe4d28ab3627d0b1583df6/2021_01_QRC/en-US/b45ff4c0e9ab4ba7a9e18a2552adeb3d.html

Regex to get data with special characters

I have some data in my table's column upn.
Here is a small sample set of this data.
Pasquale.Rombolà#it.eurw.domain.net
JuanMaria.RomanGonçalves#eurs.domain.net
Santo.Paternò#it.eurw.domain.net
Peter.Browne#UK.EURW.domain.net
François.ESTIN#fr.eurw.domain.net
Frédéric.Huynh#fr.eurw.domain.net
Frédérique.Psaume#fr.eurw.domain.net
Laura.PiñeiroGomez#eurs.domain.net
Maria.AranzabalSaldaña#eurs.domain.net
Alberto.RubioMuñoz#eurs.domain.net
Peter.Brüggemann#UK.EURW.domain.net
Russel.Peters#CA.domain.net
I want to query this table for UPN values where I have some special characters in the UPN. So my query should not return upns such as:
Peter.Browne#UK.EURW.domain.net
and
Russel.Peters#CA.domain.net
But returns everything else with special characters such as [à,ò,ñ,ü ...etc]
I have tried this query but it doesn't work.
Select * from TableName
Where [UPN] like %[a-z,0-9,#,\.,-,A-Z]%
It returns everything including those which don't have any special characters.
Please help.
If I understand correctly, I think you'll just need to add a "^" as the first character inside the square brackets.
At present you're saying you want to return all those UPNs where one or more characters is in the list you give (i.e. the "ordinary" characters). The "^" should reverse that and give you all the UPNs where at least one of the characters is not in the list you give.
Update: After testing locally ... Make sure your collation is "Accent Sensitive" (if necessary, add "Latin1_General_CI_AS" or similar after your "like" clause).
I found it only worked if rather than "A-Z", I actually typed out the whole alphabet.
You need to add a binary COLLATE clause to it. Choose the collation that suits your data; for the given sample data, Latin1_General_BIN works. Here is the link for collation in SQL Server.
This snippet worked for me on my machine:
create table #t (name varchar(100));
insert into #t values
('Pasquale.Rombolà#it.eurw.domain.net'),
('JuanMaria.RomanGonçalves#eurs.domain.net'),
('Santo.Paternò#it.eurw.domain.net'),
('Peter.Browne#UK.EURW.domain.net'),
('François.ESTIN#fr.eurw.domain.net'),
('Frédéric.Huynh#fr.eurw.domain.net'),
('Frédérique.Psaume#fr.eurw.domain.net'),
('Laura.PiñeiroGomez#eurs.domain.net'),
('Maria.AranzabalSaldaña#eurs.domain.net'),
('Alberto.RubioMuñoz#eurs.domain.net'),
('Peter.Brüggemann#UK.EURW.domain.net'),
('Russel.Peters#CA.domain.net');
select * from #t where name not like '%[^a-zA-Z0-9#.]%' COLLATE Latin1_General_BIN;
Output:
Peter.Browne#UK.EURW.domain.net
Russel.Peters#CA.domain.net
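The negated character class can be tried outside the database as well. The following Python sketch uses the standard `re` module on a few of the sample values from the question; it is only an analogy for the LIKE pattern, and the names `NOT_ALLOWED`, `with_special` and `without_special` are made up here. Python compares character ranges by code point, which roughly corresponds to the binary collation used above.

```python
import re

upns = [
    "Pasquale.Rombolà#it.eurw.domain.net",
    "Peter.Browne#UK.EURW.domain.net",
    "Peter.Brüggemann#UK.EURW.domain.net",
    "Russel.Peters#CA.domain.net",
]

# Matches any single character that is NOT an ASCII letter, a digit, '#' or '.'.
NOT_ALLOWED = re.compile(r"[^a-zA-Z0-9#.]")

# Rows containing at least one character outside the allowed list.
with_special = [u for u in upns if NOT_ALLOWED.search(u)]
# Rows made up only of allowed characters (what the NOT LIKE query returns).
without_special = [u for u in upns if not NOT_ALLOWED.search(u)]

print(with_special)
print(without_special)
```

Note the direction of the test: the NOT LIKE form keeps only the plain values, while dropping the NOT keeps the values that contain a special character, which is what the question asked for.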

Verify if the second character is a letter in SQL

I want to add a condition to my query requiring that the second character of a column's value be a letter.
How can I achieve this?
I've tried _[A-Z]% in the WHERE clause, but it is not working. I've also tried [A-Z]%.
Any input, please?
I think you want a MySQL query, like this:
SELECT * FROM table WHERE column REGEXP '^.[A-Za-z]'
or, for SQL Server:
select * from table where column like '_[a-zA-Z]%'
You can use regular expression matching in your query. For example:
SELECT * FROM `test` WHERE `name` REGEXP '^.[a-zA-Z].*';
That would match the name column from the test table against a regex that verifies if the second character is either a lowercase or uppercase alphabet letter.
Also see this SQL Fiddle for an example of data it does and doesn't match.
I agree with @Gordon Linoff; your ('_[A-Z]%') should work.
If it does not, kindly add some sample data to your question.
Declare @Table Table
(
TextCol Varchar(20)
)
Insert Into @Table(TextCol) Values
('23423cvxc43f')
,('2eD97S9')
,('sAgsdsf')
,('3Ss08008')
Select *
From @Table As t
Where t.TextCol Like '_[A-Z]%'
The use of '%[A-Z]%' suggests that you are using SQL Server. If so, you can do this using LIKE:
where col like '_[A-Z]%'
For LIKE patterns, _ matches any single character. If the first character needs to be a digit:
where col like '[0-9][A-Z]%'
EDIT:
The above doesn't work in DB2. Instead:
where substr(col, 2, 1) between 'A' and 'Z'
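The rule that all of these patterns express (any first character, then a letter) can be checked in a few lines of Python. This is only a sketch of the logic using the sample strings from the answer above; `second_char_is_letter` is a hypothetical helper, and it treats both upper and lower case ASCII letters as letters, as a case-insensitive collation would.

```python
import string


def second_char_is_letter(value):
    """True when the value has at least two characters and the second is an ASCII letter."""
    return len(value) >= 2 and value[1] in string.ascii_letters


samples = ["23423cvxc43f", "2eD97S9", "sAgsdsf", "3Ss08008"]
matches = [s for s in samples if second_char_is_letter(s)]
print(matches)  # ['2eD97S9', 'sAgsdsf', '3Ss08008']
```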

UPPER() and LOWER() not required?

For a while I thought, in order for the WHERE criteria to be evaluated correctly, I need to account for case sensitivity. I would use UPPER() and LOWER() when case didn't matter. However, I am finding the below queries produce the same result.
SELECT * FROM ATable WHERE UPPER(part) = 'SOMEPARTNAME'
SELECT * FROM ATable WHERE part = 'SOMEPARTNAME'
SELECT * FROM ATable WHERE part = 'somepartname'
SQL Case Sensitive String Compare explains to use case-sensitive collations. Is this the only way to force case sensitivity? Also, if you had a case-insensitive collation when would UPPER() and LOWER() be necessary?
Thanks for help.
The common SQL Server default of a case-insensitive collation means that UPPER() and LOWER() are not required when comparing strings.
In fact an expression such as
SELECT * FROM Table WHERE UPPER(part) = 'SOMEPARTNAME'
is also non-sargable, i.e. it won't use available indexes, because a function is applied to the part column on the left-hand side of the comparison.
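The sargability point can be observed in a small experiment. The sketch below uses Python's built-in sqlite3 module rather than SQL Server, so the plan text is SQLite's and differs from what SQL Server would show; the index name `idx_part` and the extra columns are assumptions made for the example. The idea carries over: wrapping the column in a function keeps the optimizer from seeking on the column's index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ATable (id INTEGER PRIMARY KEY, part TEXT, qty INTEGER)")
conn.execute("CREATE INDEX idx_part ON ATable (part)")


def plan(sql):
    """Return SQLite's query plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


wrapped = plan("SELECT * FROM ATable WHERE upper(part) = 'SOMEPARTNAME'")
bare = plan("SELECT * FROM ATable WHERE part = 'SOMEPARTNAME'")

print(wrapped)  # a full scan; idx_part is not mentioned
print(bare)     # a search that uses idx_part
```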
The query below performs a case-sensitive search:
SELECT Column1
FROM Table1
WHERE Column1 COLLATE Latin1_General_CS_AS = 'casesearch'
UPPER() and LOWER() only change the case of the letters, so if you have a case-insensitive collation, they are only useful in the SELECT list:
SELECT UPPER('qwerty'), LOWER('Dog')
returns
QWERTY, dog

How to find rows that have a value that contains a lowercase letter

I'm looking for an SQL query that gives me all rows where ColumnX contains any lowercase letter (e.g. "1234aaaa5789"). Same for uppercase.
SELECT * FROM my_table
WHERE UPPER(some_field) != some_field
This should work with funny characters like åäöøüæï. You might need to use a language-specific utf-8 collation for the table.
SELECT * FROM my_table WHERE my_column = 'my string'
COLLATE Latin1_General_CS_AS
This would make a case sensitive search.
EDIT
As stated in kouton's comment here and tormuto's comment here, anyone who has problems with the collation below
COLLATE Latin1_General_CS_AS
should first check the default collation of their SQL Server instance, of the database, and of the column in question, and pass in the default collation with the query expression. A list of collations can be found here.
SELECT * FROM Yourtable
WHERE UPPER([column_NAME]) COLLATE Latin1_General_CS_AS !=[Column_NAME]
This is how I did it for utf8 encoded table and utf8_unicode_ci column, which doesn't seem to have been posted exactly:
SELECT *
FROM table
WHERE UPPER(column) != BINARY(column)
To find all rows that contain a lowercase letter:
SELECT *
FROM Test
WHERE col1
LIKE '%[abcdefghijklmnopqrstuvwxyz]%'
collate Latin1_General_CS_AS
Thanks Manesh Joseph
In MS SQL Server, use the COLLATE clause.
SELECT Column1
FROM Table1
WHERE Column1 COLLATE Latin1_General_CS_AS = 'casesearch'
Adding COLLATE Latin1_General_CS_AS makes the search case sensitive.
The default collation of a SQL Server installation, SQL_Latin1_General_CP1_CI_AS, is not case sensitive.
To change the collation of a column permanently, run the following query:
ALTER TABLE Table1
ALTER COLUMN Column1 VARCHAR(20)
COLLATE Latin1_General_CS_AS
To find the collation of the columns of a table, run the following stored procedure with the table name:
EXEC sp_help 'Table1'
Source : SQL SERVER – Collate – Case Sensitive SQL Query Search
I've done something like this to find out the lower cases.
SELECT *
FROM YourTable
where BINARY_CHECKSUM(lower(ColumnName)) = BINARY_CHECKSUM(ColumnName)
mysql> SELECT '1234aaaa578' REGEXP '^[a-z]';
I had to add BINARY to ColumnX to get a case-sensitive result:
SELECT * FROM MyTable WHERE BINARY(ColumnX) REGEXP '^[a-z]';
I'm not an expert on MySQL, but I would suggest you look at REGEXP.
SELECT * FROM MyTable WHERE ColumnX REGEXP '^[a-z]';
In PostgreSQL you could use the ~ operator.
For example, you could search for all rows where col_a contains any lowercase letter:
select * from your_table where col_a ~ '[a-z]';
You could modify the Regex expression according your needs.
Regards,
--For Sql
SELECT *
FROM tablename
WHERE tablecolumnname LIKE '%[a-z]%';
Logically speaking Rohit's solution should have worked, but it didn't. I think SQL Management Studio messed up when trying to optimize this.
But by modifying the string before comparing them I was able to get the right results. This worked for me:
SELECT [ExternalId]
FROM [EquipmentSerialsMaster] where LOWER('0'+[ExternalId]) COLLATE Latin1_General_CS_AS != '0'+[ExternalId]
This works in Firebird SQL; I believe it should work in any SQL dialect, unless the underlying connection is not case sensitive.
To find records with any lower case letters:
select * from tablename where upper(fieldname) <> fieldname
To find records with any upper case letters:
select * from tablename where lower(fieldname) <> fieldname