How to select rows with Chinese-Japanese characters? - sql

How can I filter a column by Chinese or Japanese characters? I am trying to do something like:
SELECT * FROM my_table WHERE column LIKE '%[A-Za-z]%'
Is it possible for Chinese or Japanese characters?

When working with Unicode strings in SQL Server, prefix string literals with N so that the literal is treated as Unicode (nvarchar); without the prefix, characters outside the database's code page can be lost. This applies to INSERT, UPDATE, SELECT and DELETE alike.
In your case, prefix the search string in the WHERE clause with N. Something like this:
SELECT *
FROM my_table
WHERE column LIKE N'%[A-Z]%' --<-- put the Japanese characters here instead of A-Z
OR column LIKE N'%[a-z]%' --<-- put the Japanese characters here instead of a-z
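To make "the range of Japanese characters" concrete, here is a minimal Python sketch (not T-SQL) that tests strings against a few Unicode blocks. The chosen blocks (Hiragana, Katakana, CJK Unified Ideographs) and the names `CJK_PATTERN` and `contains_cjk` are assumptions made for illustration; real data may need more ranges, and whether the same ranges behave this way inside a SQL Server LIKE pattern depends on the collation.

```python
import re

# Assumed ranges (Unicode blocks): Hiragana U+3040-U+309F,
# Katakana U+30A0-U+30FF, CJK Unified Ideographs U+4E00-U+9FFF.
CJK_PATTERN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")

def contains_cjk(text: str) -> bool:
    """Return True if text has at least one character in the ranges above."""
    return CJK_PATTERN.search(text) is not None

rows = ["hello", "日本語", "カタカナ", "mixed 中文 text", "Ελληνικά"]
matching = [row for row in rows if contains_cjk(row)]
print(matching)
```

Filtering on the client like this can also serve as a cross-check for whatever pattern ends up in the WHERE clause.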

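The query below may work, as it did for me:
SELECT * FROM my_table WHERE LEN(RTRIM(my_column)) <> DATALENGTH(RTRIM(my_column))
LEN ignores trailing spaces while DATALENGTH counts them, so trim the value before comparing the two. As far as I can tell, this only works for a varchar column stored in a double-byte code page; in an nvarchar column every character takes at least two bytes, so the two values differ for any non-empty string.
This came from advice on a Japanese web page.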
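The idea behind comparing LEN with DATALENGTH is that a double-byte character counts as one character but two bytes. A rough Python sketch of the same comparison follows; the Shift_JIS encoding and the helper name `has_multibyte` are assumptions chosen for the example, not something taken from the answer.

```python
# Sketch: compare the character count with the encoded byte count,
# the same idea as LEN(...) <> DATALENGTH(...) on a varchar column.
# Assumption: the data lives in a double-byte code page; Shift_JIS is
# used here only as an example.

def has_multibyte(text: str, encoding: str = "shift_jis") -> bool:
    """Return True if any character needs more than one byte in `encoding`."""
    trimmed = text.rstrip(" ")  # mirrors RTRIM in the query
    return len(trimmed) != len(trimmed.encode(encoding))

rows = ["abc", "abc  ", "東京", "Tokyo 東京"]
matches = [row for row in rows if has_multibyte(row)]
print(matches)
```

Characters that take a single byte in the chosen code page are not detected by this check.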

Related

How to select rows containing ONLY cyrillic characters in UPPERCASE from the table using LIKE statement in MS SQL

I want to select rows from the table where column [Name] contains ONLY uppercase Cyrillic characters, commas and hyphens, using LIKE:
SELECT *
FROM Clients
WHERE NAME LIKE '%[А-Я][,-]%' COLLATE Cyrillic_General_CS_AS
Or using explicit pattern:
SELECT *
FROM Clients
WHERE NAME LIKE '%[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ][,-]%' COLLATE Cyrillic_General_CS_AS
But these select rows in which at least one character matches the pattern, and they allow any other characters that are not in the pattern.
Maybe I could use ^ (negation) to exclude all other characters, like this:
SELECT *
FROM Clients
WHERE NAME LIKE '%[^A-Z][./=+]%' COLLATE Cyrillic_General_CS_AS
But this requires enumerating a large number of unwanted characters.
What is the best way to make this selection?
Use a double negative. Select the rows where the column does not contain any character outside the set you're interested in:
SELECT *
FROM Clients
WHERE NAME NOT LIKE '%[^-АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ,]%' COLLATE Cyrillic_General_CS_AS
(I'm not quite sure what you were attempting to do by placing the hyphen and comma in a separate grouping, but I've moved them into the same group for now, since that seems to make some sense)
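The double negative can be sketched outside SQL as well: a value consists only of allowed characters exactly when a search for a disallowed character finds nothing. The Python below is an illustration only; the sample names and the helper `only_allowed` are made up, and Ё is listed separately because it falls outside the А-Я code point range.

```python
import re

# Matches any single character that is NOT uppercase Cyrillic
# (А-Я plus Ё), a comma, or a hyphen.
DISALLOWED = re.compile(r"[^А-ЯЁ,\-]")

def only_allowed(name: str) -> bool:
    """True when no character outside the allowed set is present."""
    return DISALLOWED.search(name) is None

# Hypothetical sample values, not taken from the Clients table.
names = ["ИВАНОВ", "ПЕТРОВ-ВОДКИН", "ИВАНОВ,ПЕТРОВ", "Иванов", "IVANOV", "ИВАНОВ 2"]
valid = [name for name in names if only_allowed(name)]
print(valid)
```

As with NOT LIKE, an empty string passes this check, because it contains no disallowed character; add a separate length test if empty values should be rejected.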

How to retrieve a SQL column that includes special characters and letters

How can I retrieve a column containing special characters as well as letters in a SQL query? I have a column like 'abc%def', and I want to retrieve the '%'-based columns from that table.
Please help me in this regard.
Is abc%def the column name or a column value? I'm not sure what you are asking, but if you mean that your column name contains a special character, you can quote it; the syntax depends on the RDBMS you are using.
SQL Server uses []:
select [abc%def] from tab
MySQL uses backticks:
select `abc%def` from tab
EDIT:
Try the query below to fetch column values containing the % character (checked, it works in Ingres as well):
select * from tab where col like '%%%'
Others suggest that like '%%%' works in Ingres, so this seems to be something special in Ingres. In other DBMSs the pattern is just three wildcards and matches any non-NULL string.
In standard SQL you have to declare an escape character. I think this should work in Ingres, too:
select * from mytable where str like '%!%%' escape '!';
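The difference between the two patterns can be tried out with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for the DBMS here, and the sample rows are invented. SQLite does support the ESCAPE clause, but other engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (str TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("abc%def",), ("abcdef",), ("100%",)],
)

# Three unescaped wildcards: every non-NULL row matches.
unescaped = [row[0] for row in conn.execute(
    "SELECT str FROM mytable WHERE str LIKE '%%%' ORDER BY str"
)]

# '!' turns the middle '%' into a literal percent sign.
escaped = [row[0] for row in conn.execute(
    "SELECT str FROM mytable WHERE str LIKE '%!%%' ESCAPE '!' ORDER BY str"
)]

print(unescaped)
print(escaped)
```

Only the escaped pattern leaves out the row without a percent sign.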

Why does using an Underscore character in a LIKE filter give me all the results?

I wrote the below SQL query with a LIKE condition:
SELECT * FROM Manager
WHERE managerid LIKE '_%'
AND managername LIKE '%_%'
In the LIKE I want to search for any underscores ('%_%'), but I know that my columns' data has no underscore characters.
Why does the query give me all the records from the table?
Sample data:
create table Manager(
id int
,managerid varchar(3)
,managername varchar(50)
);
insert into Manager(id,managerid,managername)values(1,'A1','Mangesh');
insert into Manager(id,managerid,managername)values(2,'A2','Sagar');
insert into Manager(id,managerid,managername)values(3,'C3','Ahmad');
insert into Manager(id,managerid,managername)values(4,'A4','Mango');
insert into Manager(id,managerid,managername)values(5,'B5','Sandesh');
Modify your WHERE condition like this:
WHERE mycolumn LIKE '%\_%' ESCAPE '\'
This is one of the ways in which Oracle supports escape characters: you define the escape character with the ESCAPE keyword. For details, see the Oracle documentation on LIKE.
The '_' and '%' characters are wildcards in a SQL LIKE pattern.
The '_' character matches exactly one character, whatever it is. If you search with columnName LIKE '_abc', you get rows with 'aabc', 'xabc', '1abc' or '#abc', but NOT 'abc', 'abcc', 'xabcd' and so on.
The '%' character matches zero or more characters. That means that if you search with columnName LIKE '%abc', you get rows with 'abc', 'aabc', 'xyzabc' and so on, but not 'xyzabcd', 'xabcdd' or any other string that does not end with 'abc'.
In your case you searched with '%_%'. This returns every row whose column value has one or more characters of any kind. This is why you are getting all the rows even though there is no _ in your column values.
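The wildcard rules above can be mimicked by translating a LIKE pattern into a regular expression. The Python sketch below is an approximation for illustration: the helpers `like_to_regex` and `like` are hypothetical, no ESCAPE clause or bracket class is handled, and matching is case-sensitive, whereas a real database follows its collation.

```python
import re

def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern: '%' -> any run of characters, '_' -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("aabc", "_abc"), like("abc", "_abc"))       # one extra character is required
print(like("xyzabc", "%abc"), like("xyzabcd", "%abc"))  # must end with 'abc'
print(like("Sagar", "%_%"), like("", "%_%"))            # any non-empty value matches
```

The last line shows why '%_%' matched every manager name: the pattern only asks for at least one character.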
The underscore is the wildcard in a LIKE query for one arbitrary character.
Hence LIKE '%_%' means "give me all records with at least one arbitrary character in this column".
You have to escape the wildcard character; in SQL Server you can do that by wrapping it in []:
SELECT m.*
FROM Manager m
WHERE m.managerid LIKE '[_]%'
AND m.managername LIKE '%[_]%'
See: LIKE (Transact-SQL)
Since you want to search specifically for a wildcard character, you need to escape it.
This is done by adding the ESCAPE clause to your LIKE expression. The character specified in the ESCAPE clause "invalidates" the wildcard character that follows it, so that it is matched literally.
You can use any character you like (just not a wildcard character). Most people use a \ because that is what many programming languages also use.
So your query would become:
select *
from Manager
where managerid LIKE '\_%' escape '\'
and managername like '%\_%' escape '\';
But you can just as well use any other character:
select *
from Manager
where managerid LIKE '#_%' escape '#'
and managername like '%#_%' escape '#';
Here is an SQLFiddle example: http://sqlfiddle.com/#!6/63e88/4
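For readers without access to the fiddle, the behaviour can be reproduced locally with Python's sqlite3 module. This is a sketch under assumptions: SQLite replaces SQL Server, '#' is used as the escape character, and row 6 is a made-up addition to the question's sample data so that the escaped query has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Manager (id INT, managerid VARCHAR(3), managername VARCHAR(50))"
)
conn.executemany(
    "INSERT INTO Manager VALUES (?, ?, ?)",
    [
        (1, "A1", "Mangesh"), (2, "A2", "Sagar"), (3, "C3", "Ahmad"),
        (4, "A4", "Mango"), (5, "B5", "Sandesh"),
        (6, "_X6", "Test_User"),  # hypothetical row containing underscores
    ],
)

# Unescaped: '_' means "any one character", so every row qualifies.
unescaped_count = conn.execute(
    "SELECT COUNT(*) FROM Manager "
    "WHERE managerid LIKE '_%' AND managername LIKE '%_%'"
).fetchone()[0]

# Escaped: '#_' means a literal underscore.
escaped_ids = [row[0] for row in conn.execute(
    "SELECT id FROM Manager "
    "WHERE managerid LIKE '#_%' ESCAPE '#' AND managername LIKE '%#_%' ESCAPE '#'"
)]

print(unescaped_count, escaped_ids)
```

With the original five rows only, the escaped query would return nothing, which is the result the question expected.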
The underscore is a wildcard that matches any single character.
For example,
'A_%' will match every value that starts with 'A' and has at least one more character after it.
In case people are searching how to do it in BigQuery:
An underscore "_" matches a single character or byte.
You can escape "\", "_", or "%" using two backslashes. For example, "\%". If you are using raw strings, only a single backslash is required. For example, r"\%".
WHERE mycolumn LIKE '%\\_%'
Source: https://cloud.google.com/bigquery/docs/reference/standard-sql/operators
You can write the query as below:
SELECT * FROM Manager
WHERE managerid LIKE '\_%' escape '\'
AND managername LIKE '%\_%' escape '\';
It will solve your problem.

How to compare a varchar field having "(" character

If a field value in a SQL Server table is like A(B) and I write a query
SELECT * FROM MyTable WHERE MyField = 'A(B)'
it is not returning any result. How to handle this situation?
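Your query should work fine as written; parentheses have no special meaning in an = comparison or in a LIKE pattern. If you do need to escape a character in a LIKE pattern, you can name an escape character with ESCAPE: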
WHERE column LIKE '%A#(B#)%' ESCAPE '#'
Also, if you want to match anything that contains "A(B)", don't forget to surround it with percent signs.

Differentiating between "AB" and "Ab" in a character Database Field

Specifically, Sql Server 2005/T-Sql. I have a field that is mostly a series of two characters, and they're all supposed to be upper case but there's some legacy data that predates the current DB/System, and I need to figure out which records are in violation of the upper casing covenant.
I thought this would work:
select * from tbl where ascii(field1) <> ascii(upper(field1))
And indeed it returned me a handful of records. They've since been corrected, and now that query returns no data. But I've got people telling me there is still mixed case data in the DB, and I just found an example: 'FS' and 'Fs' are both reporting the same ascii value.
Why is this approach flawed? What is a better way to go about this, or how can I fix this approach to work correctly?
If all the data should have been in upper case, just do an update:
update tbl
set field1 = upper(field1)
But to answer your original question, this query should give you the results that you expect:
select * from tbl
where field1 COLLATE Latin1_General_CS_AS <> upper(field1)
Edit: just noticed that the suggestion to use COLLATE was also posted by Ian
ASCII is only comparing the first letter. You'd have to compare each letter, or change the database collation to be case sensitive.
You can change collation on an entire database level, or just on one column for a specific query, so:
SELECT myColumn
FROM myTable
WHERE myColumn COLLATE Latin1_General_CS_AS <> upper(myColumn)
The ascii() function will only return the ascii number for the first character in an expression if you pass it a multiple character string. To do the comparison you want you need to look at individual characters, not entire fields.
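Why 'FS' and 'Fs' look identical to the original query can be shown with a small Python sketch. It is an analogy rather than T-SQL: `ord` on the first character plays the role of ASCII(), the two helper functions are hypothetical, and values are assumed to be non-empty.

```python
def first_char_differs(value: str) -> bool:
    """Mimics ascii(field1) <> ascii(upper(field1)): only the first character is compared."""
    return ord(value[0]) != ord(value.upper()[0])

def any_char_differs(value: str) -> bool:
    """Compares the whole value case-sensitively, like a CS collation would."""
    return value != value.upper()

values = ["FS", "Fs", "fs", "fS"]
caught_by_first = [v for v in values if first_char_differs(v)]
caught_by_full = [v for v in values if any_char_differs(v)]

print(caught_by_first)
print(caught_by_full)
```

'Fs' slips through the first check because its first letter is already upper case; only the full comparison flags it.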
The ASCII() function returns only the ASCII code value of the leftmost character of a character expression. Use UPPER() instead.
This might work:
select * from tbl
where cast(field1 as varbinary(256)) <> cast(upper(field1) as varbinary(256))
The methods described at Case sensitive search in SQL Server queries might be useful to you.
According to the documentation for ASCII(), it only returns the code of the leftmost character.
I think you're going about this wrong.
You could simply:
select * from tbl where field1 <> upper(field1)
if the collation rules were set correctly, so why not fix the collation rules? If you can't change them permanently, try:
select * from tbl where
(field1 collate Latin1_General_CS_AS)
<> upper(field1 collate Latin1_General_CS_AS)