Using Unicode in the "like" operator - SQL

In Sybase ASE 15, how can I use Unicode to search for records containing a certain character using the like operator? For example, U+001E.
pseudo-code:
select * from table where field like "%{U+001E}%"

Make sure the collation on your table supports Unicode.
Use nvarchar/nchar. There used to be an ntext datatype as well,
but it's deprecated now in favour of nvarchar.
These columns take up twice as much space as their non-Unicode counterparts
(char and varchar).
Then when manually inserting into them, use N to indicate it's Unicode text:
INSERT INTO MyTable(SomeNvarcharColumn) VALUES (N'français')
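To tie this back to the original search question: U+001E is code point 30, so rather than typing the control character into the query you can build the LIKE pattern from a character function. A minimal sketch, assuming a hypothetical table mytable with a column field; SQL Server's nchar() returns the Unicode character for a code point, and Sybase ASE has a similar char() function:

-- hedged sketch: find rows whose field contains U+001E (code point 30)
select *
from mytable
where field like '%' + nchar(30) + '%'   -- on Sybase ASE, try char(30) instead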

Related

How to translate and what could cause characters such as å¿è€

Goal:
I can only run select statements against the databases I have access to.
One of the columns is supposed to store legible English sentences, but some values contain strange characters. I would like to find a way to translate those special characters back into legible text.
My question is twofold:
Can I translate the following strings into a legible format, or is the stored data basically lost in translation?
How can I ensure that the data is stored correctly?
Column Collation: SQL_Latin1_General_CP1_CI_AS
Column Data Type: NVARCHAR(300)
Data Examples:
å¿è€
ÐžÐ±Ð°Ð¶Ð´Ð°Ð½Ð¸Ñ Ð·Ð°
Use a prefix of N when you insert into the table, e.g. when the statement is built in client code (Elanat here is a host-language variable):
"INSERT INTO TownMessage_Tbl (elanat) VALUES (N'" + Elanat + "')"
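For the rows already stored: the samples look like UTF-8 bytes that were decoded as code page 1252 before being saved (for instance, ÐžÐ±Ð°Ð¶Ð´Ð°Ð½Ð¸Ñ Ð·Ð° is a Cyrillic phrase read that way). If that is indeed the cause, the text is largely recoverable. A hedged sketch, assuming SQL Server 2019 or later (for the UTF-8 collations) and using a variable in place of your column:

-- hedged repair sketch: undo UTF-8 text that was decoded as code page 1252
DECLARE @moji nvarchar(300) = N'ÐžÐ±Ð°Ð¶Ð´Ð°Ð½Ð¸Ñ Ð·Ð°';  -- mojibake as stored
-- 1) nvarchar -> varchar under a cp1252 collation recovers the raw UTF-8 bytes
DECLARE @bytes varbinary(600) =
    CAST(CAST(@moji COLLATE SQL_Latin1_General_CP1_CI_AS AS varchar(300)) AS varbinary(600));
-- 2) reinterpret those bytes as UTF-8 by assigning them to a UTF-8-collated varchar
DECLARE @utf8 varchar(600) COLLATE Latin1_General_100_CI_AS_SC_UTF8;
SET @utf8 = @bytes;
SELECT CAST(@utf8 AS nvarchar(300)) AS Repaired;

Note that a few cp1252 code points are undefined, so strings containing those bytes may not survive the round trip intact.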

SQL NVARCHAR(MAX) returning ASCII and Weird Characters instead of Text

I have a SQL table and I'm trying to return the values as a string.
The values should be city names like Sydney, Melbourne, Port Macquarie, etc.
But when I run a select I either get blank results or some strange backwards-L characters. The column is an NVARCHAR(MAX).
SELECT ctGlobalName FROM Crm.Cities
Then I tried using MSSQL's Edit Top 200 Rows feature and I could see the names of the cities, but also all these weird characters around them.
Now, I didn't create the database; I'm just running queries on it. Some things I've read suggest it is a problem with the collation, but the table's collation is SQL_Latin1_General_CP1_CI_AS, which matches the server collation.
Is there something I can add to my select query to return the values as an ordinary string, without the weird characters?
An NVARCHAR datatype can store Unicode characters, which are needed for languages not covered by the ASCII character set, i.e. non-English (or related) languages such as Chinese or Indonesian. If your SQL Server or Windows doesn't have the relevant language support installed, you might see strange-looking representations of the data.
On the other hand, it could also be that the application that updates this table has simply stored bad data in that column.
Either way, you might need to do some string manipulation to strip out the characters you don't want.
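If stripping is the way to go, here is a hedged sketch (the function name is made up; SQL Server syntax) that removes the ASCII control characters using PATINDEX over a binary collation:

CREATE FUNCTION dbo.StripControlChars (@s nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
    -- pattern matching any character in the control range U+0001..U+001F
    DECLARE @pat nvarchar(10) = N'%[' + NCHAR(1) + N'-' + NCHAR(31) + N']%';
    DECLARE @i bigint = PATINDEX(@pat, @s COLLATE Latin1_General_BIN);
    WHILE @i > 0
    BEGIN
        SET @s = STUFF(@s, @i, 1, N'');  -- delete the offending character
        SET @i = PATINDEX(@pat, @s COLLATE Latin1_General_BIN);
    END;
    RETURN @s;
END;

You could then try something like SELECT dbo.StripControlChars(ctGlobalName) FROM Crm.Cities; and compare the output with the raw column.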

SQL column collation change

I would like to change a column collation to some Polish collation and be able to view Polish characters properly. The original column, table, and database all use SQL_Scandinavian_CP850_CS_AS.
For column collation change I tried:
SELECT CAST([ColumnName] AS nvarchar(50)) COLLATE Polish_CI_AS FROM t1
These three example letters appear in the Scandinavian table:
SELECT 'ØùÒ' COLLATE Polish_CI_AS
This should return łŚń in the results; instead it shows 'OuO'.
Unfortunately SQL Server does not support OEM code page 852 which is what you need to convert code page 850 data into if you want to convert 'ØùÒ' to 'łŚń'. You can change the collation of data without SQL Server doing character mapping by CASTing through varbinary, but this only works with supported collations.
An alternative approach might be to create a user-defined function that takes a string and maps characters one at a time, so Ø maps to ł and so on. Fiddly to do, as there are (up to) 127 characters to map, but not difficult.
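A hedged sketch of that mapping idea, using TRANSLATE (available from SQL Server 2017; on older versions you would need nested REPLACE calls or a character-by-character UDF). Extend the two character lists with the full code page 850 to 852 mapping you need:

SELECT TRANSLATE(CAST([ColumnName] AS nvarchar(50)),
                 N'ØùÒ',   -- characters as they appear under the Scandinavian collation
                 N'łŚń')   -- the intended Polish characters
FROM t1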

Store string with special characters like quotes or backslash in a PostgreSQL table

I have a string with value
'MAX DATE QUERY: SELECT iso_timestamp(MAX(time_stamp)) AS MAXTIME FROM observation WHERE offering_id = 'HOBART''
But on inserting into postgresql table i am getting error:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "HOBART".
This is probably because my string contains single quotes. I don't know the string value in advance; it keeps changing and may contain special characters like \, since I am reading from a file and saving into a Postgres database.
Please give a general solution to escape such characters.
As per the SQL standard, quotes are escaped by doubling them, i.e.:
insert into table (column) values ('I''m OK')
If you replace every single quote in your text with two single quotes, it will work.
Normally, a backslash escapes the following character, but literal backslashes are similarly escaped by doubling them:
insert into table (column) values ('Look in C:\\Temp')
You can use dollar quoting to escape the special characters in your string.
The query above, insert into table (column) values ('I'm OK'),
changes to insert into table (column) values ($$I'm OK$$).
To make the delimiter unique so that it doesn't get confused with the values, you can add characters between the two dollars, such as
insert into table (column) values ($aesc6$I'm OK$aesc6$).
Here $aesc6$ is the unique delimiter, so that even if $$ is part of the value, it will be treated as part of the value and not as a delimiter.
You appear to be using Java and JDBC. Please read the JDBC tutorial, which describes how to use parameterized queries to safely insert data without risking SQL injection problems.
See in particular the prepared statements section of the JDBC tutorial, along with its simple examples in various languages, including Java.
Since you're having issues with backslashes, not just single quotes, I'd say you're running PostgreSQL 9.0 or older, which defaults to standard_conforming_strings = off. In newer versions backslashes are only special if you use the PostgreSQL extension E'escape strings'. (This is why you should always include your PostgreSQL version in questions.)
You might also want to examine:
Why you should use prepared statements.
The PostgreSQL documentation on the lexical structure of SQL queries.
While it is possible to explicitly quote values, doing so is error-prone, slow and inefficient. You should use parameterized queries (prepared statements) to safely insert data.
In future, please include a code snippet that you're having a problem with and details of the language you're using, the PostgreSQL version, etc.
If you really must manually escape strings, make sure that standard_conforming_strings is on and double your quotes (e.g. don''t), or use PostgreSQL-specific E'escape strings where you \'backslash escape\' quotes'. But really, use prepared statements; it's much easier.
Some possible approaches are:
use prepared statements
convert all special characters to their equivalent HTML entities.
use base64 encoding while storing the string, and base64 decoding while reading the string from the db table.
Approach 1 (prepared statements) can be combined with approaches 2 and 3.
Approach 3 (base64 encoding) converts the whole string to plain ASCII text without losing any information, but you may not be able to do full-text search with this approach.
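As a hedged sketch of approach 1 using PostgreSQL's server-side PREPARE/EXECUTE (the table and column names are made up; client drivers such as JDBC expose the same mechanism through ? placeholders, where the value is bound separately and never needs quoting):

-- a server-side prepared statement; the value is passed as a parameter,
-- not spliced into the SQL text
PREPARE ins_msg (text) AS
    INSERT INTO log_messages (message) VALUES ($1);
-- at the psql prompt the argument is still a literal, so quotes are doubled here;
-- through a driver, the bound value needs no escaping at all
EXECUTE ins_msg ('MAX DATE QUERY: SELECT iso_timestamp(MAX(time_stamp)) AS MAXTIME FROM observation WHERE offering_id = ''HOBART''');
DEALLOCATE ins_msg;

For approach 3, PostgreSQL's built-in encode(..., 'base64') and decode(..., 'base64') functions can do the conversion on the database side.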
String literals in SQL Server start with N, and embedded quotes are doubled in the same way:
update table set stringField = N'/;l;sldl;''mess'

When should I use the SQL Server Unicode 'N' Constant?

I've been looking into the use of the Unicode 'N' constant within my code, for example:
select object_id(N'VW_TABLE_UPDATE_DATA', N'V');
insert into SOME_TABLE (Field1, Field2) values (N'A', N'B');
After doing some reading around when to use it, I'm still not entirely clear as to the circumstances under which it should and should not be used.
Is it as simple as using it when data types or parameters expect a Unicode data type (as per the above examples), or is it more sophisticated than that?
The following Microsoft site gives an explanation, but I'm also a little unclear as to some of the terms it uses:
http://msdn.microsoft.com/en-us/library/ms179899.aspx
Or to precis:
Unicode constants are interpreted as Unicode data, and are not evaluated by using a code page. Unicode constants do have a collation. This collation primarily controls comparisons and case sensitivity. Unicode constants are assigned the default collation of the current database, unless the COLLATE clause is used to specify a collation.
What does it mean by:
'evaluated by using a code page'?
Collation?
I realise this is quite a broad question, but any links or help would be appreciated.
Thanks
Is it as simple as using it when data types or parameters expect a unicode data type?
Pretty much.
To answer your other points:
A code page is another name for an encoding of a character set. For example, Windows code page 1255 encodes Hebrew. Code pages are normally used for 8-bit encodings of characters. In terms of your question, strings may be evaluated using different code pages (so the same bit pattern may be interpreted as a Japanese character or an Arabic one, depending on which code page was used to evaluate it).
Collation is about how SQL Server is to order strings - this depends on code page, as you would order strings in different languages differently. See this article for an example.
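A small illustration of a collation controlling comparison behaviour, here case sensitivity:

-- case-insensitive vs case-sensitive collations compare the same strings differently
SELECT CASE WHEN N'a' = N'A' COLLATE Latin1_General_CI_AS THEN 'equal' ELSE 'different' END AS case_insensitive,
       CASE WHEN N'a' = N'A' COLLATE Latin1_General_CS_AS THEN 'equal' ELSE 'different' END AS case_sensitive;
-- returns: case_insensitive = 'equal', case_sensitive = 'different'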
The national character types nchar() and nvarchar() use two bytes per character and support international character sets -- think internet.
The N prefix marks a string constant as Unicode, stored as two bytes per character. So if you have people from different countries and would like their names stored properly -- something like:
CREATE TABLE SomeTable (
id int
,FirstName nvarchar(50)
);
Then use:
INSERT INTO SomeTable
( Id, FirstName )
VALUES ( 1, N'Guðjón' );
and
SELECT *
FROM SomeTable
WHERE FirstName = N'Guðjón';
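To see what the prefix protects you from, a hedged illustration (the exact fallout depends on the database's default collation and code page):

-- without N, the literal is first interpreted in the database's code page,
-- so characters outside that code page can be replaced (e.g. with '?')
SELECT 'Guðjón' AS WithoutN,   -- may lose the 'ð' if the code page lacks it
       N'Guðjón' AS WithN;     -- always round-trips as Unicode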