Flatten national characters in SQL Server - sql

I have a column that contains pet names with national characters. How do I write the query to match them all in one condition?
|PetName|
Ćin
ćin
Ĉin
ĉin
Ċin
ċin
Čin
čin
Something like a FLATTEN function here:
...WHERE LOWER(FLATTEN(PetName)) = 'cin'
I tried casting it from NVARCHAR to VARCHAR, but it didn't help. I'd like to avoid using REPLACE for every character.

This should work because an accent-insensitive Cyrillic collation compares characters with diacritics (Đ, Ž, Ć, Č, Š, etc.) as their base letters.
-- a table variable is declared with @, not #
declare @t table(PetName nvarchar(100))
insert into @t
SELECT N'Ćin' union all
SELECT N'ćin' union all
SELECT N'Ĉin' union all
SELECT N'ĉin' union all
SELECT N'Ċin' union all
SELECT N'ċin' union all
SELECT N'Čin' union all
SELECT N'čin'
SELECT *
FROM @t
-- lower() handles case, the _AI collation ignores accents
WHERE lower(PetName) = 'cin' COLLATE Cyrillic_General_CS_AI

You can change the collation used for the comparison:
WHERE PetName COLLATE Cyrillic_General_CI_AI = 'cin'

There isn't really a way or built-in function that will strip accents from characters.
If you are doing comparisons (LIKE, IN, PATINDEX etc), you can just force COLLATE if the column/db is not already accent insensitive.
Normally, a query like this
with test(col) as (
select 'Ćin' union all
select 'ćin')
select * from test
where col='cin'
will return both rows, since the default collation (unless you change it) is insensitive. This won't work for FULLTEXT indexes though.
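
If the comparison can be done on the client side instead, the FLATTEN behaviour from the question can be approximated by decomposing each character and dropping the combining marks. This is only a sketch in Python, not a SQL Server feature; the name flatten is borrowed from the question and is hypothetical. Letters without a canonical decomposition, such as Đ, are left unchanged.

```python
import unicodedata

def flatten(text):
    """Decompose to NFD and drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

pet_names = ["Ćin", "ćin", "Ĉin", "ĉin", "Ċin", "ċin", "Čin", "čin"]

# Equivalent of: WHERE LOWER(FLATTEN(PetName)) = 'cin'
matches = [name for name in pet_names if flatten(name).lower() == "cin"]
```

With the sample data from the question, every name ends up in matches.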

Related

How to make Oracle and SQL Server ORDER BY the same?

I need to compare table counts for an Oracle schema to a SQL Server database. However, when I make my query, the results are always off because of the way each handles the underscore ('_') in terms of ordering. I've included an example of what I'm seeing below.
In Oracle:
SELECT FIELD1 FROM ORACLE_ORDER ORDER BY FIELD1 ASC;
Result:
'ABC'
'ABCD'
'ABC_D'
In SQL Server:
SELECT FIELD1 FROM SQL_ORDER ORDER BY FIELD1 ASC;
Result:
'ABC'
'ABC_D'
'ABCD'
As you can see from the above, Oracle and SQL Server treat the underscore differently when it comes to ordering. How can I modify either of the queries (or environments) to make them order the same as the other?
On the SQL Server side, use the following:
Select * from SQL_ORDER
ORDER BY FIELD1 Collate SQL_Latin1_General_CP850_BIN
SQL_Latin1_General_CP850_BIN is a binary collation, so strings are ordered by the code values of their characters. In this case the code of the underscore is 95, with A being 65 and Z being 90. Remember that lowercase "a" has a higher value than uppercase "A", and so on.
A simple way to achieve this is to use the COLLATE SQL_Latin1_General_CP850_BIN clause in the ORDER BY:
SELECT * FROM (
SELECT 'ABC' AS TAB UNION
SELECT 'ABC_D' UNION
SELECT 'ABCD' UNION
SELECT 'ABC_' UNION
SELECT 'ABC' UNION
SELECT 'A_C' UNION
SELECT 'ABC_DE_FGH' UNION
SELECT 'ABCXDEYFGH') AS X
ORDER BY X.TAB COLLATE SQL_Latin1_General_CP850_BIN
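
To see what order a code-value comparison produces for these sample strings, here is a small Python sketch. Python compares strings by code point, which for plain ASCII data is the same idea as a binary collation; it is an illustration only and does not call SQL Server or Oracle.

```python
# The distinct sample values from the query above.
values = ["ABC", "ABC_D", "ABCD", "ABC_", "A_C", "ABC_DE_FGH", "ABCXDEYFGH"]

# sorted() compares character by character using code points,
# so 'D' (68) and 'X' (88) sort before '_' (95).
binary_order = sorted(values)

# The three rows from the question, in code-point order.
question_rows = sorted(["ABC", "ABC_D", "ABCD"])
```

Here question_rows comes out as ABC, ABCD, ABC_D, which is the order shown for Oracle in the question.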

DB2 efficient select query with like operator for many values (~200)

I have written the following query:
SELECT TBSPACE FROM SYSCAT.TABLES WHERE TYPE='T' AND (TABNAME LIKE '%_ABS_%' OR TABNAME LIKE '%_ACCT_%')
This gives me a certain amount of results. Now the problem is that I have multiple TABNAME to select using the LIKE operator (~200). Is there an efficient way to write the query for the 200 values without repeating the TABNAME LIKE part (because there are 200 such values which would result in a really huge query) ?
(If it helps, I have stored all required TABNAME values in a table TS to retrieve from)
If you are just looking for substrings, you could use LOCATE. E.g.
WITH SS(S) AS (
VALUES
('_ABS_')
, ('_ACCT_')
)
SELECT DISTINCT
TABNAME
FROM
SYSCAT.TABLES, SS
WHERE
TYPE='T'
AND LOCATE(S,TABNAME) > 0
Or, if your substrings are in a table created with CREATE TABLE TS(S VARCHAR(64)):
SELECT DISTINCT
TABNAME
FROM
SYSCAT.TABLES, TS
WHERE
TYPE='T'
AND LOCATE(S,TABNAME) > 0
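
The join-against-a-pattern-table idea can be tried out without a DB2 instance. The sketch below uses Python's sqlite3 module, where instr(string, substring) plays the role of LOCATE (note the reversed argument order). The table tables_demo and its rows are made up for illustration and stand in for SYSCAT.TABLES.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tables_demo (tabname TEXT);  -- hypothetical stand-in for SYSCAT.TABLES
    CREATE TABLE ts (s TEXT);                 -- one row per substring to search for
    INSERT INTO tables_demo VALUES
        ('X_ABS_1'), ('Y_ACCT_2'), ('Z_OTHER_3'), ('W_ABS_ACCT_4');
    INSERT INTO ts VALUES ('_ABS_'), ('_ACCT_');
""")

# DISTINCT removes the duplicate produced when a name matches several substrings.
found = [row[0] for row in conn.execute("""
    SELECT DISTINCT tabname
    FROM tables_demo, ts
    WHERE instr(tabname, s) > 0
    ORDER BY tabname
""")]
conn.close()
```

Adding more substrings only means adding rows to ts; the query itself stays the same size.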
You could try REGEXP_LIKE. E.g.
SELECT DISTINCT
TABNAME
FROM
SYSCAT.TABLES
WHERE
TYPE='T'
AND REGEXP_LIKE(TABNAME,'.*_((ABS)|(ACCT))_.*')
Just in case, note that the '_' character has a special meaning in the pattern expression of the LIKE predicate:
The underscore character (_) represents any single character.
The percent sign (%) represents a string of zero or more characters.
Any other character represents itself.
So, if you really need to find _ABS_ substring, you should use something like below.
If you use the commented-out pattern instead, you get both rows in the result, which may not be desired.
with
pattern (str) as (values
'%\_ABS\_%'
--'%_ABS_%'
)
, tables (tabname) as (values
'A*ABS*A'
, 'A_ABS_A'
)
select tabname
from tables t
where exists (
select 1
from pattern p
where t.tabname like p.str escape '\'
);
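
The same wildcard behaviour can be reproduced with Python's sqlite3 module, since SQLite's LIKE also treats '_' as a single-character wildcard and accepts an ESCAPE clause. This is a sketch against SQLite, not DB2; the table tabs and the helper like_matches are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabs (tabname TEXT)")
conn.executemany("INSERT INTO tabs VALUES (?)", [("A*ABS*A",), ("A_ABS_A",)])

def like_matches(pattern, escape=None):
    """Return the table names matching a LIKE pattern, optionally with ESCAPE."""
    if escape is None:
        sql = "SELECT tabname FROM tabs WHERE tabname LIKE ? ORDER BY tabname"
        args = (pattern,)
    else:
        sql = "SELECT tabname FROM tabs WHERE tabname LIKE ? ESCAPE ? ORDER BY tabname"
        args = (pattern, escape)
    return [row[0] for row in conn.execute(sql, args)]

unescaped = like_matches("%_ABS_%")                    # '_' matches any character
escaped = like_matches(r"%\_ABS\_%", escape="\\")      # '_' matches only itself
```

The unescaped pattern matches both rows, while the escaped one matches only A_ABS_A.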

Querying SQL IN for list of strings

I have the following query, which returns 0 results, which I know is wrong. However, I'm not sure what is off with my syntax.
select * from SJT_USER where SJT_USER_NAME in
( select USER_NAME from NON_MEMBER);
SJT_USER_NAME type NCHAR(255 CHAR)
USER_NAME type NVARCHAR2(255 CHAR)
I'm guessing I need to do some conversion from NVARCHAR2 to NCHAR.
Try a SQL Cast so you're comparing apples to apples. I'm not certain which DBMS you're using, but the syntax should be similar to this:
select * from SJT_USER where SJT_USER_NAME in ( select CAST(USER_NAME AS NCHAR(255)) from NON_MEMBER);
A CHAR or NCHAR is fixed length, so SJT_USER_NAME is padded with spaces until it fills up the 255 chars. When comparing the two, there are two options. You can either TRIM or use RPAD:
select * from SJT_USER where TRIM(SJT_USER_NAME) in ( select USER_NAME from NON_MEMBER);
or
select * from SJT_USER where SJT_USER_NAME in ( select RPAD(USER_NAME,255) from NON_MEMBER);
The latter might be preferable if you have an index on SJT_USER_NAME, because the column is then compared without a function applied to it. If performance is not a concern, I usually prefer to have a TRIM on both sides of the comparison just to be on the safe side.
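
Why the padding matters can be shown with plain strings. The Python sketch below only mimics a fixed-width column by right-padding with spaces; it does not connect to a database, and the helper nchar and the value "alice" are made up for illustration.

```python
def nchar(value, width=255):
    """Mimic a fixed-width CHAR/NCHAR column by right-padding with spaces."""
    return value.ljust(width)

sjt_user_name = nchar("alice")   # stored as 'alice' followed by 250 spaces
user_name = "alice"              # variable-length value, no padding

plain_equal = sjt_user_name == user_name             # padded vs. unpadded
trim_equal = sjt_user_name.rstrip(" ") == user_name  # the TRIM option
rpad_equal = sjt_user_name == user_name.ljust(255)   # the RPAD option
```

Only the trimmed and the padded comparisons succeed; the plain comparison fails because the trailing spaces are part of the stored value.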

SQL Server sort order with nonprintable characters

I have a scalar-valued function that returns a varchar of data containing the ASCII unit separator Char(31). I am using this result as part of an Order By clause and attempting to sort in ascending order.
My scalar value function returns results like the following (nonprintable character spelled out for reference)
ABC
ABC (CHAR(31)) DEF
ABC (CHAR(31)) DEF (CHAR(31)) HIJ
I would expect that when I order by ascending the results would be the following:
ABC
ABCDEF
ABCDEFHIJ
instead I am seeing the results as the complete opposite:
ABCDEFHIJ
ABCDEF
ABC
Now I am fairly certain that this has to do with the non-printable characters, but I am not sure why. Any idea as to why that is the case?
Thanks
The sort order can be influenced by your COLLATION settings. The following script, which explicitly uses Latin1_General_CI_AS as the collation, orders the items as you would expect.
;WITH q (Col) AS (
SELECT 'ABC' UNION ALL
SELECT 'ABC' + CHAR(31) + 'DEF' UNION ALL
SELECT 'ABC' + CHAR(31) + 'DEF' + CHAR(31) + 'HIJ'
)
SELECT *
FROM q
ORDER BY
Col COLLATE Latin1_General_CI_AS
What collation are you using? You can verify your current database collation settings with
SELECT DATABASEPROPERTYEX('master', 'Collation') SQLCollation;
I am able to duplicate this behavior in SQL Server 2008 R2 with collation set to SQL_Latin1_General_CP1_CI_AS.
If you cannot change your collation settings, set the field to nvarchar instead of varchar. This solved the issue for me.

Non-latin-characters ordering in database with "order by"

I just found some strange behavior in the database's "order by" clause. In string comparison, I expected characters such as '[' and '_' to be greater than Latin characters/digits such as 'I' or '2', considering their order in the ASCII table. However, the sorting results from the database's "order by" clause differ from my expectation. Here's my test:
SQLite version 3.6.23
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table products(name varchar(10));
sqlite> insert into products values('ipod');
sqlite> insert into products values('iphone');
sqlite> insert into products values('[apple]');
sqlite> insert into products values('_ipad');
sqlite> select * from products order by name asc;
[apple]
_ipad
iphone
ipod
select * from products order by name asc;
name
...
[B#
_ref
123
1ab
...
This behavior is different from Java's string comparison (which cost me some time to track down). I can reproduce it in both SQLite 3.6.23 and Microsoft SQL Server 2005. I did some web searching but could not find any related documentation. Could someone shed some light on it? Is it part of the SQL standard? Where can I find some information about this? Thanks in advance.
The concept of comparing and ordering the characters in a database is called collation.
How the strings are sorted and compared depends on the collation, which is usually set in the server, client or session properties.
In MySQL:
SELECT *
FROM (
SELECT 'a' AS str
UNION ALL
SELECT 'A' AS str
UNION ALL
SELECT 'b' AS str
UNION ALL
SELECT 'B' AS str
) q
ORDER BY
str COLLATE UTF8_BIN
--
'A'
'B'
'a'
'b'
and
SELECT *
FROM (
SELECT 'a' AS str
UNION ALL
SELECT 'A' AS str
UNION ALL
SELECT 'b' AS str
UNION ALL
SELECT 'B' AS str
) q
ORDER BY
str COLLATE UTF8_GENERAL_CI
--
'a'
'A'
'b'
'B'
UTF8_BIN sorts characters according to their Unicode code points. Capital letters have lower code points and therefore go first.
UTF8_GENERAL_CI sorts characters according to their alphabetical position, disregarding case.
Collation is also important for indexes, since the indexes rely heavily on sorting and comparison rules.
The important keyword in this case is 'collation'. I have no experience with SQLite, but would expect it to be similar to other database engines in that you can define the collation to use for whole databases, single tables, per connection, etc.
Check your DB documentation for the options available to you.
The ASCII codes for lower-case characters such as 'i' are greater than the ones for '[' and '_':
'i': 105
'[': 91
'_': 95
However, try inserting upper-case characters, e.g. "IPOD" or "Iphone": those will come before "_" and "[" with the default binary collation.
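
The effect is easy to check with Python's built-in sqlite3 module. The sketch below reuses the products table from the question, adds the two upper-case names suggested above, and orders the rows with SQLite's built-in BINARY and NOCASE collations. The helper ordered is a hypothetical name, and the second sort key in the NOCASE query is only there to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name VARCHAR(10))")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("ipod",), ("iphone",), ("[apple]",), ("_ipad",), ("IPOD",), ("Iphone",)],
)

def ordered(order_by):
    """Return all product names sorted by the given ORDER BY expression."""
    sql = "SELECT name FROM products ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]

# BINARY compares raw byte values: 'I' (73) < '[' (91) < '_' (95) < 'i' (105).
binary_order = ordered("name COLLATE BINARY")

# NOCASE folds ASCII letters before comparing, so 'I' sorts like 'i'.
nocase_order = ordered("name COLLATE NOCASE, name COLLATE BINARY")
```

With BINARY the upper-case names come first, followed by [apple], _ipad and the lower-case names; with NOCASE, [apple] and _ipad lead and the two spellings of each product sit next to each other.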