SQL Server sort order with nonprintable characters - sql

I have a scalar value function that returns a varchar of data containing the ASCII unit seperator Char(31). I am using this result as part of an Order By clause and attempting to sort in ascending order.
My scalar value function returns results like the following (nonprintable character spelled out for reference)
ABC
ABC (CHAR(31)) DEF
ABC (CHAR(31)) DEF (CHAR(31)) HIJ
I would expect that when I order by ascending the results would be the following:
ABC
ABCDEF
ABCDEFHIJ
instead I am seeing the results as the complete opposite:
ABCDEFHIJ
ABCDEF
ABC
Now I am fairly certain that this has to do with the non-printable characters, but I am not sure why. Any idea as to why that is the case?
Thanks

The sortorder can be influenced by your COLLATION settings. Following script, explicitly using Latin1_General_CI_AS as collation orders the items as you would expect.
;WITH q (Col) AS (
SELECT 'ABC' UNION ALL
SELECT 'ABC' + CHAR(31) + 'DEF' UNION ALL
SELECT 'ABC' + CHAR(31) + 'DEF' + CHAR(31) + 'HIJ'
)
SELECT *
FROM q
ORDER BY
Col COLLATE Latin1_General_CI_AS
What collation are you using? You can verify your current database collation settings with
SELECT DATABASEPROPERTYEX('master', 'Collation') SQLCollation;

I am able to duplicate this behavior in SQL Server 2008 R2 with collation set to SQL_Latin1_General_CP1_CI_AS.
If you cannot change your collation settings, set the field to nvarchar instead of varchar. This solved the issue for me.

Related

How to make Oracle and SQL Server ORDER BY the same?

I need to compare table counts for an Oracle schema to a SQL Server database. However, when I make my query, the results are always off because of the way each handles the underscore ('_') in terms of ordering. I've included an example of what I'm seeing below.
In Oracle:
SELECT FIELD1 FROM ORACLE_ORDER ORDER BY FIELD1 ASC;
Result:
'ABC'
'ABCD'
'ABC_D'
In SQL Server:
SELECT FIELD1 FROM SQL_ORDER ORDER BY FIELD1 ASC;
Result:
'ABC'
'ABC_D'
'ABCD'
As you can see from above, oracle and sql server both treat the underscore differently when it comes to ordering. How can I modify either of the queries (or environments) to make them order the same as the other?
In the SQL Server Side use the following
Select * from SQL_ORDER
ORDER BY FIELD1 Collate SQL_Latin1_General_CP850_BIN
The collation SQL_Latin1_General_CP850_BIN makes it to be used with ASCII values. In this case ASCII of underscore is 95, A being 65, and Z being 90. Remember lower case "a" will have a higher value than upper case "A" and so on.
Here is the fiddle
Simple way is to use Collate SQL_Latin1_General_CP850_BIN function in ORDER BY to achieve this
SELECT * FROM (
SELECT 'ABC' AS TAB UNION
SELECT'ABC_D'UNION
SELECT'ABCD'UNION
SELECT'ABC_'UNION
SELECT 'ABC' UNION
SELECT'A_C' UNION
SELECT'ABC_DE_FGH'UNION
SELECT'ABCXDEYFGH') AS X
ORDER BY X.Tab Collate SQL_Latin1_General_CP850_BIN

How to perform Like in SQL on a column with '%' in the data?

I am using oracle database and writing the likeness filter for the persistence layer and want to perform likeness on a column that can possibly have '%' in it's data.
To filter data I am writing the query for likeness using LIKE clause as
select * from table where columnName like '%%%';
which is returning all the values but I only want the rows that contains '%' in the columnName.
Not sure what escape character to use or what to do to filter on the '%' symbol. Any suggestions??
Also, I have to do the same thing using Criteria api in java and have no clues about putting escape character there.
You can use an escape character.
where columnName like '%$%%' escape '$'
REGEXP_LIKE might help in a rather simple manner.
SQL> with test (col) as
2 (select 'abc%def' from dual union all
3 select '%12345&' from dual union all
4 select '%abc12%' from dual union all
5 select '1234567' from dual
6 )
7 select *
8 from test
9 where regexp_like(col, '%');
COL
-------
abc%def
%12345&
%abc12%
SQL>
If I have understood it correctly the answer from the #Littlefoot is correct(and I will up-vote it now). I will just add more details because you are looking for name of the columns of your table "I only want the rows that contains '%' in the columnName".
Here is the table I have created:
CREATE TABLE "table_WITH%" ("numer%o" number
, name varchar2(50)
, "pric%e" number
, coverage number
, activity_date date);
Then this query gives me the correct answer:
SELECT column_name
FROM ALL_TAB_COLUMNS
WHERE TABLE_NAME = 'table_WITH%'
AND regexp_like(COLUMN_NAME, '%');
Gordon's answer using the escape clause of the like command is correct in the general case.
In your specific case (searching for a single character, anywhere in a string), a simpler method is to use instr, e.g.
where instr(columnname, '%') > 0

Flatten national characters in SQL Server

I have a column that contains pet names with national characters. How do I write the query to match them all in one condition?
|PetName|
Ćin
ćin
Ĉin
ĉin
Ċin
ċin
Čin
čin
sth like FLATTEN funciton here:
...WHERE LOWER(FLATTEN(PetName)) = 'cin'
Tried to cast it to from NVARCHAR to VARCHAR but it didn't help. I'd like to avoid using REPLACE for every character.
this should work because cyrillic collation base cases all diacritics like Đ,Ž,Ć,Č,Š,etc...
declare #t table(PetName nvarchar(100))
insert into #t
SELECT N'Ćin' union all
SELECT N'ćin' union all
SELECT N'Ĉin' union all
SELECT N'ĉin' union all
SELECT N'Ċin' union all
SELECT N'ċin' union all
SELECT N'Čin' union all
SELECT N'čin'
SELECT *
FROM #t
WHERE lower(PetName) = 'cin' COLLATE Cyrillic_General_CS_AI
You can change the collation used for the comparison:
WHERE PetName COLLATE Cyrillic_General_CI_AI = 'cin'
There isn't really a way or built-in function that will strip accents from characters.
If you are doing comparisons (LIKE, IN, PATINDEX etc), you can just force COLLATE if the column/db is not already accent insensitive.
Normally, a query like this
with test(col) as (
select 'Ćin' union all
select 'ćin')
select * from test
where col='cin'
will return both columns, since the default collation (unless you change it) is insensitive. This won't work for FULLTEXT indexes though.

SQL Server queries case sensitivity

I have this database:
abcDEF
ABCdef
abcdef
if I write: select * from MyTbl where A='ABCdef'
how to get: ABCdef
and how to get:
abcDEF
ABCdef
abcdef
Thanks in advance
forgot to write - sqlCE
You can make your query case sensitive by making use of the COLLATE keyword.
SELECT A
FROM MyTbl
WHERE A COLLATE Latin1_General_CS_AS = 'ABCdef'
If you have abcDEF, ABCdef, abcdef already in the database then it's already case sensitive or you have no constraint.
You'd have to add a COLLATE on both sides to make sure it's truly case sensitive (for a non case sensitive database) which will invalidate index usage
SELECT TheColumn
FROM MyTable
WHERE TheColumn COLLATE Latin1_General_CS_AS = 'ABCdef' COLLATE Latin1_General_CS_AS
What about accents too? Latin1_General_CS_AI, Latin1_General_Bin?
Try this just add binary keyword after where:
select * from MyTbl where binary A = 'ABCdef';
It's all about collation. Each one has a suffix (CI and CS, meaning Case Insensitive, and Case Sensitive).
http://www.databasejournal.com/features/mssql/article.php/10894_3302341_2/SQL-Server-and-Collation.htm
SQL is non-case-sensitive by default, so you will get all three items if doing a simple string comparison. To make it case-sensitive, you can cast the value of the field and your search value as varbinary:
SELECT * FROM MyTbl WHERE CAST(A AS varbinary(20)) = CAST('ABCdef' as varbinary(20))
The above assumes your varchar field is sized at 20. For nvarchar double it (thanks #ps2goat).

Non-latin-characters ordering in database with "order by"

I just found some strange behavior of database's "order by" clause. In string comparison, I expected some characters such as '[' and '_' are greater than latin characters/digits such as 'I' or '2' considering their orders in the ASCII table. However, the sorting results from database's "order by" clause is different with my expectation. Here's my test:
SQLite version 3.6.23
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table products(name varchar(10));
sqlite> insert into products values('ipod');
sqlite> insert into products values('iphone');
sqlite> insert into products values('[apple]');
sqlite> insert into products values('_ipad');
sqlite> select * from products order by name asc;
[apple]
_ipad
iphone
ipod
select * from products order by name asc;
name
...
[B#
_ref
123
1ab
...
This behavior is different from Java's string comparison (which cost me some time to find this issue). I can verify this in both SQLite 3.6.23 and Microsoft SQL Server 2005. I did some web search but cannot find any related documentation. Could someone shed me some light on it? Is it a SQL standard? Where can I find some information about this? Thanks in advance.
The concept of comparing and ordering the characters in a database is called collation.
How the strings are stored depends on the collation which is usually set in the server, client or session properties.
In MySQL:
SELECT *
FROM (
SELECT 'a' AS str
UNION ALL
SELECT 'A' AS str
UNION ALL
SELECT 'b' AS str
UNION ALL
SELECT 'B' AS str
) q
ORDER BY
str COLLATE UTF8_BIN
--
'A'
'B'
'a'
'b'
and
SELECT *
FROM (
SELECT 'a' AS str
UNION ALL
SELECT 'A' AS str
UNION ALL
SELECT 'b' AS str
UNION ALL
SELECT 'B' AS str
) q
ORDER BY
str COLLATE UTF8_GENERAL_CI
--
'a'
'A'
'b'
'B'
UTF8_BIN sorts characters according to their unicode. Caps have lower unicodes and therefore go first.
UTF8_GENERAL_CI sorts characters according to their alphabetical position, disregarding case.
Collation is also important for indexes, since the indexes rely heavily on sorting and comparison rules.
The important keyword in this case is 'collation'. I have no experience with SQLite, but would expect it to be similar to other database engines in that you can define the collation to use for whole databases, single tables, per connection, etc.
Check your DB documentation for the options available to you.
The ASCII codes for lower-case characters such as 'i' are greater than the ones for '[' and '_':
'i': 105
'[': 91
'_': 95
However, try to insert upper-case characters, eg. try with "IPOD" or "Iphone", those will become before "_" and "[" with the default binary collation.