Spark SQL - Case insensitive string comparison - apache-spark-sql

In Spark SQL, is there any way to make string comparison case insensitive globally? That is, when filtering on columns, I would like to avoid the "lcase" function calls.

Related

How to use the smaller-than operator on strings? [duplicate]

Today I viewed some query examples, and I found some string comparisons in the WHERE condition.
The comparison was made using the greater than (>) and less than (<) symbols. Is this a possible way to compare strings in SQL? And how does it act? Does a string that is less than another one come before it in dictionary order? For example, is ball less than water? And is this comparison case sensitive? For example, in BALL < water, does the uppercase character affect the comparison?
I've googled for hours but I was not able to find anything that clears up these doubts.
The comparison operators (including < and >) "work" with string values as well as numbers.
For MySQL
By default, string comparisons are not case sensitive and use the current character set. In MySQL 5.5 (the version documented at the links below) the default is latin1 (cp1252 West European), which also works well for English; MySQL 8.0 changed the default to utf8mb4.
String comparisons will be case sensitive when the collation of the strings being compared is case sensitive, i.e. the name of the collation ends in _cs or _bin rather than _ci. There's really no point in repeating all of the information that's available in the MySQL Reference Manual here.
MySQL Comparison Operators Reference: http://dev.mysql.com/doc/refman/5.5/en/comparison-operators.html
More information about MySQL charactersets/collations: http://dev.mysql.com/doc/refman/5.5/en/charset.html
To answer the specific questions you asked:
Q: is this a possible way to compare strings in SQL?
A: Yes, in both MySQL and SQL Server
Q: and how does it act?
A: A comparison operator returns a boolean, either TRUE, FALSE or NULL.
Q: a string less than another one comes before in dictionary order? For example, ball is less than water?
A: Yes, because 'b' comes before 'w' in the collation, the expression
'ball' < 'water'
will return TRUE. (This depends on the character set and on the collation.)
Q: and this comparison is case sensitive?
A: Whether a particular comparison is case sensitive or not depends on the database server; by default, both SQL Server and MySQL are case insensitive.
In MySQL it is possible to make string comparisons case sensitive by specifying a collation that is case sensitive (the collation name will end in _cs or _bin rather than _ci).
Q: For example BALL < water, does the upper case character affect the comparison?
A: By default, in both SQL Server and MySQL, the expression
'BALL' < 'water'
would return TRUE.
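To make the difference between a case-insensitive comparison and a binary one concrete, here is a minimal Python sketch. It is only an analogy: Python compares strings by code point, which resembles a binary collation, and folding case first roughly imitates a _ci collation. Real database collations also handle accents, width and locale-specific ordering, which this sketch ignores. The function names are made up for illustration.

```python
def binary_less_than(a: str, b: str) -> bool:
    """Compare by code point, similar in spirit to a binary collation."""
    return a < b


def case_insensitive_less_than(a: str, b: str) -> bool:
    """Fold case before comparing, similar in spirit to a _ci collation."""
    return a.casefold() < b.casefold()


pairs = [("ball", "water"), ("BALL", "water"), ("ball", "Water")]
for left, right in pairs:
    print(
        f"{left!r} < {right!r}:",
        "binary =", binary_less_than(left, right),
        "| case-insensitive =", case_insensitive_less_than(left, right),
    )
```

The first two pairs compare as TRUE either way, because 'B' and 'b' both sort before 'w' by code point. The third pair is where the two approaches disagree: by code point, uppercase 'W' sorts before lowercase 'b', so only the case-insensitive comparison puts ball before Water.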
In Microsoft SQL Server, collation determines the dictionary rules for comparing and sorting character data with regard to:
case sensitivity
accent sensitivity
width sensitivity
kana sensitivity
SQL Server also includes binary collations where comparison and sorting are done by binary code point rather than dictionary rules. One can choose from many collations according to the desired sensitivity behavior. The default collation selected for Latin-based language locales during SQL installation is case insensitive and accent sensitive.
Collation is specified at the instance (during installation), database, and column level. Instance collation determines the collation of Instance-level objects like logins and database names as well as identifiers for variables, GOTO labels and temporary tables. Database collation (same as instance collation by default), determines the collation of database identifiers like table and column names as well as literal expressions. Column collation (same as database collation by default) determines the collation of that column.
It is certainly possible to compare strings using '<', '>', '<>', LIKE, BETWEEN, etc.
If you are using MyBatis or another XML-based technique to execute the SQL query, you have to wrap the symbol in <![CDATA[your_symbol-here]]> so that the XML parser does not treat '<' as the start of a tag.
'ball' <![CDATA[<]]> 'water'
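SQL Server can give a surprising result here. The query below was meant to compare dates, but both operands are string literals, so they are compared character by character: '0' equals '0', then '1' sorts before '7', and the condition is TRUE even though 2023 is later than 2022. A comparison like this appears to work within a single year and fails when the year changes.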
SELECT TOP 1 'The ResultSet should be empty' FROM SYS.columns
WHERE '01/04/2023' < '07/11/2022'
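The same pitfall can be reproduced outside the database. The Python sketch below compares the two literals first as strings and then as parsed dates. It assumes the literals are in MM/DD/YYYY format (the original post does not say), and `parse_us_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_us_date(text: str) -> datetime:
    """Parse a date written as MM/DD/YYYY (assumed format)."""
    return datetime.strptime(text, "%m/%d/%Y")


later = "01/04/2023"
earlier = "07/11/2022"

# Character-by-character comparison: '0' == '0', then '1' < '7'.
compared_as_strings = later < earlier

# Comparison of the actual dates the strings represent.
compared_as_dates = parse_us_date(later) < parse_us_date(earlier)

print("as strings:", compared_as_strings)
print("as dates:  ", compared_as_dates)
```

The string comparison reports that the 2023 value is "less than" the 2022 value, while the date comparison does not. In SQL the equivalent fix is to compare values of a date type rather than character literals.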

Case insensitive where-in SQL query in Oracle

I need to create a WHERE-IN query (using Oracle) that is case insensitive. I've tried this way:
select user from users where lower(user) in lower('userNaMe1', 'useRNAmE2');
but I get ORA-00909: invalid number of arguments
The list is dynamically generated in my Spring app. That's why I can't add lower() to every single list's value.
Is there any other way to achieve it?
lower() takes a single argument, so you can use:
where lower(user) in (lower('userNaMe1'), lower('useRNAmE2'))
You could also express this using regular expressions (regexp_like() accepts a case sensitivity argument) if you prefer:
where regexp_like(user, '^(userNaMe1|useRNAmE2)$', 'i')
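Since the list is generated dynamically, another option is to lowercase the values in application code before binding them, so that only the column needs lower() in the SQL. The sketch below shows the idea in Python rather than in the asker's Spring code. `build_case_insensitive_in` and the `:v0`, `:v1` placeholder names are hypothetical, and the column name is assumed to come from trusted code, never from user input.

```python
def build_case_insensitive_in(column: str, values: list[str]) -> tuple[str, dict[str, str]]:
    """Build a 'lower(col) in (:v0, :v1, ...)' fragment plus its bind values.

    The values are lowercased here, so the SQL only has to lowercase the column.
    """
    if not values:
        raise ValueError("values must not be empty")
    names = [f"v{i}" for i in range(len(values))]
    placeholders = ", ".join(f":{name}" for name in names)
    sql = f"lower({column}) in ({placeholders})"
    params = {name: value.lower() for name, value in zip(names, values)}
    return sql, params


sql, params = build_case_insensitive_in("user", ["userNaMe1", "useRNAmE2"])
print(sql)
print(params)
```

One caveat: lowercasing in the application and in the database may not agree for every non-ASCII character, so this is safest for plain ASCII identifiers such as user names.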
There is another, more drastic approach: make your session, or your searches in the database, case insensitive.
You can find how to do it in this answer:
Case insensitive searching in Oracle

Determine if substring corresponds to specific code (character types) in SQL

I have a collection of strings and want to filter out those where the last four characters are: (alpha)(alpha)(number)(number).
I know I can take a substring of each of these and check the characters separately, but what is the method to determine the types of the characters in the sequence?
This is for SQL in Hive.
You can use regular expressions. Something like:
where col regexp '[a-zA-Z]{2}[0-9]{2}$'
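To check what that pattern accepts before running it against the table, it can be tried on a few sample values. The sketch below uses Python's `re` module with the same pattern; Hive uses Java regular expressions, so behaviour should be the same for this simple pattern, but that is an assumption worth verifying. The sample strings are invented.

```python
import re

# Same pattern as in the query: two letters followed by two digits at the end.
LAST_FOUR = re.compile(r"[a-zA-Z]{2}[0-9]{2}$")


def ends_with_alpha_alpha_digit_digit(value: str) -> bool:
    """Return True when the last four characters are letter, letter, digit, digit."""
    return LAST_FOUR.search(value) is not None


samples = ["itemAB12", "item1234", "AB1", "xxAB12x", "zz99"]
for sample in samples:
    print(sample, ends_with_alpha_alpha_digit_digit(sample))
```

Because the pattern is anchored only at the end with `$`, anything may precede the four characters, and a string of exactly four characters such as zz99 also matches.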

equalignorecase in oracle sql

From the following question,
SQL server ignore case in a where expression
Is it possible with Oracle?
Also, is it possible to compare "your,text" with "your text"?
I want to convert all characters other than A-Z0-9 into spaces and then compare the strings.
I can do it with Java methods using regex, but I'd prefer not to write unnecessary code.
Yes, with the UPPER() function.
select whatever from your_table where UPPER(col) = UPPER('YourText');
(Or LOWER() if you prefer that.)
Performance warning: that won't play well with indexes, unless you've indexed on UPPER(col) also and are careful with NULLs.
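The answer above covers the case-insensitive part but not the second part of the question, comparing "your,text" with "your text". A minimal Python sketch of the intended semantics follows: uppercase first, then replace every character outside A-Z0-9 with a space. The function names are hypothetical. In Oracle itself, REGEXP_REPLACE combined with UPPER is one candidate for expressing the same normalization, though that is not shown or tested here.

```python
import re


def normalize(text: str) -> str:
    """Uppercase the text and turn every character outside A-Z0-9 into a space."""
    return re.sub(r"[^A-Z0-9]", " ", text.upper())


def equals_ignoring_case_and_punctuation(a: str, b: str) -> bool:
    """Compare two strings after normalizing both sides the same way."""
    return normalize(a) == normalize(b)


print(normalize("your,text"))
print(equals_ignoring_case_and_punctuation("your,text", "Your Text"))
print(equals_ignoring_case_and_punctuation("your,text", "yourtext"))
```

Each non-matching character becomes exactly one space, so "your,text" and "your text" compare equal, while "yourtext" does not because it has no separator at all.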