Finding character values outside ASCII range in an NVARCHAR column

Is there a simple way of finding rows in an Oracle table where a specific NVARCHAR2 column has one or more characters which wouldn't fit into the standard ASCII range?
(I'm building a warehousing and data extraction process which takes the Oracle data, drags it into SQL Server -- UCS-2 NVARCHAR -- and then exports it to a UTF-8 XML file. I'm pretty sure I'm doing all the translation properly, but I'd like to find a bunch of real data to test with that's more likely to cause problems.)

Not sure how to tackle this in Oracle, but here is something I've done in MS-SQL to deal with the same issue...
create table #temp (id int, descr nvarchar(200))
insert into #temp values(1,'Now is a good time')
insert into #temp values(2,'So is yesterday')
insert into #temp values(3,'But not '+NCHAR(2012))
select *
from #temp
where CAST(descr as varchar(200)) <> descr
drop table #temp

Sparky's example for SQL Server was enough to lead me to a pretty simple Oracle solution, once I'd found the handy ASCIISTR() function.
SELECT
    *
FROM
    test_table
WHERE
    test_column != ASCIISTR(test_column)
...seems to find any data outside the standard 7-bit ASCII range, and appears to work for NVARCHAR2 and VARCHAR2.
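For anyone curious how ASCIISTR() behaves, here is a quick hedged sketch (the UNISTR escape is just a way to build a non-ASCII string inline):
-- Non-ASCII characters come back as \XXXX escape sequences, so any row
-- containing one fails the equality comparison above.
SELECT ASCIISTR(UNISTR('caf\00E9')) FROM dual;
-- returns 'caf\00E9'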


Best way to compress xml text column using SQL?

Using Microsoft SQL Server 2019.
I have two columns, one text representing some xml, another varbinary(max) representing already compressed xml, that I need to compress.
Please assume I cannot change the source data, but conversions can be made as necessary in the code.
I'd like to compress the text column, and initially it works fine, but if I try to save it into a temp table to be used further along in the process, I get weird characters like ‹ or tŠÌK'À3û€Í‚;jw. Again, the first temp table I make stores it just fine; I can select from the initial table and it displays the compressed data correctly. But if I need to pull it into a secondary temp table or variable from there, it turns into a mess.
I've tried converting into several different formats, converting later in the process, and bringing in the source data for the column at the very last stage, but my end goal is to populate a variable that will be converted into JSON, and it always ends up weird there as well. I just need the compressed version of the columns to display properly when viewing the JSON variable I've made.
Any suggestions on how to tackle this?
Collation issue?
This smells like a collation issue. tempdb is actually its own database, with its own default collation and other settings.
In one database with default CollationA you call COMPRESS(NvarcharData) and that produces some VARBINARY.
In another database (tempdb) with default CollationB you call CONVERT(NVARCHAR(MAX), DECOMPRESS(CompressedData)). Now, what happens under the hood is:
CompressedData gets decompressed into VARBINARY representing NvarcharData in CollationA
that VARBINARY is converted to NVARCHAR assuming the binary data represents NVARCHAR data in CollationB, which is not true!
Try to be more explicit (collation, data type) with conversions between XML, VARBINARY and (N)VARCHAR.
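For instance, here is a minimal hedged sketch of an explicit round trip (the variable names are just examples): convert to VARBINARY(MAX) yourself before compressing, and convert back to NVARCHAR(MAX), never VARCHAR, after decompressing, so the byte stream is always interpreted as UTF-16:
DECLARE @source NVARCHAR(MAX) = N'<doc>пример</doc>';
DECLARE @compressed VARBINARY(MAX) = COMPRESS(@source);
-- DECOMPRESS returns VARBINARY(MAX); converting it to NVARCHAR(MAX)
-- reinterprets the bytes as UTF-16, matching what COMPRESS received.
SELECT CONVERT(NVARCHAR(MAX), DECOMPRESS(@compressed)) AS RoundTripped;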
Double compression?
I have also noticed "representing already compressed xml, that I need to compress". If you are double-compressing, maybe you forgot to double-decompress?
Example?
You are sadly missing an example, but I have produced a minimal example of converting between XML and compressed data that works for me.
BEGIN TRANSACTION
GO
CREATE TABLE dbo.XmlData_Base (
    PrimaryKey INTEGER NOT NULL IDENTITY(1, 1),
    XmlCompressed VARBINARY(MAX) NULL
);
GO
CREATE OR ALTER VIEW dbo.XmlData
WITH SCHEMABINDING
AS
SELECT
    BASE.PrimaryKey,
    CONVERT(XML, DECOMPRESS(BASE.XmlCompressed)) AS XmlData
FROM
    dbo.XmlData_Base AS BASE;
GO
CREATE OR ALTER TRIGGER dbo.TR_XmlData_instead_I
ON dbo.XmlData
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.XmlData_Base
        (XmlCompressed)
    SELECT
        COMPRESS(CONVERT(VARBINARY(MAX), I.XmlData))
    FROM
        Inserted AS I;
END;
GO
CREATE OR ALTER TRIGGER dbo.TR_XmlData_instead_U
ON dbo.XmlData
INSTEAD OF UPDATE
AS
BEGIN
    UPDATE BASE
    SET
        BASE.XmlCompressed = COMPRESS(CONVERT(VARBINARY(MAX), I.XmlData))
    FROM
        dbo.XmlData_Base AS BASE
        JOIN Inserted AS I ON I.PrimaryKey = BASE.PrimaryKey;
END;
GO
INSERT INTO dbo.XmlData
    (XmlData)
VALUES
    (CONVERT(XML, N'<this><I>I call upon thee!</I></this>'));
SELECT
    *
FROM
    dbo.XmlData;
SELECT
    PrimaryKey,
    XmlCompressed,
    CONVERT(XML, DECOMPRESS(XmlCompressed))
FROM
    dbo.XmlData_Base;
UPDATE dbo.XmlData
SET
    XmlData = CONVERT(XML, N'<that><I>I call upon thee!</I></that>');
SELECT
    *
FROM
    dbo.XmlData;
SELECT
    PrimaryKey,
    XmlCompressed,
    CONVERT(XML, DECOMPRESS(XmlCompressed))
FROM
    dbo.XmlData_Base;
GO
ROLLBACK TRANSACTION;

Select cyrillic character in SQL

When a user inserts a Russian word like 'пример' into the database, the database saves it as '??????'. If they insert it with the 'N' prefix, or I select it with the 'N' prefix, e.g. exec Table_Name N'иытание', there is no problem. But I don't want to use 'N' in every query, so is there any solution for this? I will be using a stored procedure, by the way.
UPDATE:
Now I can use Russian letters after altering the collation. But I can't alter the collation for every language, and I just want to know whether there is any trigger or function that automatically adds N in front of the text on insert. E.g. when I insert 'пример', SQL should treat it as N'пример' automatically.
You have to use the NVARCHAR datatype for the column to store Unicode letters, and you also have to use N'value' when inserting.
You can test it in following:
CREATE TABLE #test
(
    varcharCol varchar(40),
    nvarcharCol nvarchar(40)
)
INSERT INTO #test VALUES (N'иытание', N'иытание')
SELECT * FROM #test
OUTPUT
varcharCol    nvarcharCol
???????       иытание
As you can see, the column of datatype varchar returns question marks (???????) and the column of datatype nvarchar returns the Russian characters иытание.
UPDATE
The problem is that your database collation does not support Russian letters. To change it:
1. In Object Explorer, connect to an instance of the SQL Server Database Engine, expand that instance, and then expand Databases.
2. Right-click the database that you want and click Properties.
3. Click the Options page, and select a collation from the Collation drop-down list.
4. After you are finished, click OK.
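If you prefer T-SQL over Object Explorer, a minimal hedged sketch of the same change (the database name and collation here are only examples; pick the collation that matches your language):
-- Changing the database default collation; this needs exclusive access
-- to the database while it runs, and existing columns keep their old collation.
ALTER DATABASE MyDatabase COLLATE Cyrillic_General_CI_AS;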
MORE INFO
It would be very difficult to put this in a comment, so I would recommend this link: Info
declare @test TABLE
(
    Col1 varchar(40),
    Col2 varchar(40),
    Col3 nvarchar(40),
    Col4 nvarchar(40)
)
INSERT INTO @test VALUES
('иытание', N'иытание', 'иытание', N'иытание')
SELECT * FROM @test
RESULT (with a non-Cyrillic database collation, only Col4 keeps the Russian text):
Col1       Col2       Col3       Col4
???????    ???????    ???????    иытание
To store and select Unicode characters in the database you have to use NVARCHAR instead of VARCHAR, and to insert Unicode data you have to use the N prefix.
See this link https://technet.microsoft.com/en-us/library/ms191200%28v=sql.105%29.aspx
The n prefix for these data types comes from the ISO standard for National (Unicode) data types.
Change the type of your columns (those containing Russian) from varchar to nvarchar.
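A minimal sketch of that change (the table and column names are placeholders; keep your real length and nullability):
ALTER TABLE dbo.MyTable ALTER COLUMN MyColumn NVARCHAR(40) NULL;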

SSIS / SQL Server - dealing with various money type notations

In a SQL Server money column, how can I deal with different currency notations coming in from country-specific Excel files via SSIS (in varchar, transformed to money), taking care of comma and dot representation to make sure the values stay correct?
For example if these are three column values in Excel:
22,333.44
22.333,44
22333,44
the second notation above will result in 22.3334, which of course is incorrect.
What do I need to do with the data? Is it a string replace or something more elegant?
Thank you.
UPDATED:
After discussion in the comments the problem has been clarified: the values in the Excel column can be in many different regional formats (English using commas to separate thousands and '.' for the decimal point, German using '.' for separating thousands and a comma for the decimal point).
Assuming that the destination format is English and you don't have an accompanying column to indicate the format, you're going to have to implement a kludge of a workaround. If you can guarantee there will always be 2 digits after the "decimal place" (a comma in the German format), then REPLACE(REPLACE(@Value,',',''),'.','') will get rid of every comma/point. Then you will have to get the length of the resulting varchar and manually insert a decimal point (or comma) before the last 2 characters. Here's a sample implementation:
declare @number varchar(12), @trimmednumber varchar(12), @inserteddecimal varchar(12)
set @number = '22.333,44'
select @trimmednumber = REPLACE(REPLACE(@number, ',', ''), '.', '')
select @inserteddecimal = LEFT(@trimmednumber, LEN(@trimmednumber) - 2) + '.' + RIGHT(@trimmednumber, 2)
select @number AS [Original], @trimmednumber AS [Trimmed], @inserteddecimal AS [Result]
And the results:
Original Trimmed Result
------------ ------------ ------------
22.333,44 2233344 22333.44
Original Answer:
I may be misunderstanding your question but if you take in those values as VARCHAR and insert them into MONEY columns then the implicit conversion should be correct.
Here's what I've knocked together to test:
declare @money_varchar1 varchar(12), @money_varchar2 varchar(12), @money_varchar3 varchar(12)
set @money_varchar1 = '22,333.44'
set @money_varchar2 = '22.333,44'
set @money_varchar3 = '22333,22'
declare @table table (Value money)
insert into @table values (@money_varchar1)
insert into @table values (@money_varchar2)
insert into @table values (@money_varchar3)
select * from @table
And the results:
Value
---------------------
22333.44
22.3334
2233322.00
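If you ever do know each file's locale (from the file name, say), a hedged alternative on SQL Server 2012+ is TRY_PARSE with an explicit culture, which removes the guesswork; the culture codes below are only examples:
-- Both return 22333.44 when the stated culture matches the notation;
-- TRY_PARSE returns NULL rather than erroring on a value it cannot parse.
SELECT TRY_PARSE('22,333.44' AS money USING 'en-US') AS EnglishNotation,
       TRY_PARSE('22.333,44' AS money USING 'de-DE') AS GermanNotation;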

Unicode - VARCHAR and NVARCHAR

-- Creating Table
Create Table Test1
(
id Varchar(8000)
)
-- Inserting a record
Insert into Test1 Values ('我們的鋁製車架採用最新的合金材料所製成,不但外型輕巧、而且品質優良。為了達到強化效果,骨架另外經過焊接和高溫處理。創新的設計絕對能充分提升踏乘舒適感和單車性能。');
As I have defined the data type of id as Varchar, the data is stored as ?????.
Do I have to use NVARCHAR? What is the difference between VARCHAR and NVARCHAR? Please explain Unicode as well.
The column type nvarchar allows you to store Unicode characters, which basically means almost any character from almost any language (including modern languages and some obsolete languages), and a good number of symbols too.
Also, you are required to prefix your value with N, for example: Insert into Test1 Values (N'我們的鋁製車架採用最新的合金材料所製成,不但外型輕巧、而且品質優良。為了達到強化效果,骨架另外經過焊接和高溫處理。創新的設計絕對能充分提升踏乘舒適感和單車性能。'); Alternatively, programmatically use a prepared statement with bind values for inserting and updating national character set data.
NVARCHAR supports Unicode, so yes, you need to have the column as nvarchar and not varchar.
Regardless of the collation of your database, use nvarchar to store Unicode.
Embed your Unicode value in N'[value]':
INSERT INTO ... VALUES
('Azerbaijani (Cyrillic)', N'Aзәрбајҹан (кирил әлифбасы)', 'az-cyrl')
In DB: 59 Azerbaijani (Cyrillic) Aзәрбајҹан (кирил әлифбасы) az-cyrl
The N prefix is the important part!
This is valid for MS SQL 2014, which I am using. Hope this helps.
Yes, you have to use nvarchar, or use a collation for the language set you want; nvarchar is preferred. Google can tell you what this stuff means.
Varchar uses the single-byte code page of the column's collation (Windows-1252 for the default English collations), which is for all practical purposes standard ASCII.
As others have noted, nvarchar allows the storage of unicode characters.
You can get the ASCII translations from either data type, as shown here:
IF OBJECT_ID('TEST1') IS NOT NULL
DROP TABLE TEST1
GO
CREATE TABLE TEST1(VARCHARTEST VARCHAR(8000), NVARCHARTEST NVARCHAR(4000))
-- Inserting a record
INSERT INTO TEST1 VALUES ('ABC','DEF')
SELECT
VARCHARTEST
,NVARCHARTEST
,ASCII(SUBSTRING(VARCHARTEST,1,1))
,ASCII(SUBSTRING(VARCHARTEST,2,1))
,ASCII(SUBSTRING(VARCHARTEST,3,1))
,ASCII(SUBSTRING(NVARCHARTEST,1,1))
,ASCII(SUBSTRING(NVARCHARTEST,2,1))
,ASCII(SUBSTRING(NVARCHARTEST,3,1))
FROM
TEST1
DROP TABLE TEST1
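A hedged aside on the example above: for NVARCHAR data, the UNICODE() function returns the actual code point, whereas ASCII() first maps the character through the varchar code page and returns 63 (a '?') for anything it cannot represent:
SELECT UNICODE(N'我') AS CodePoint, -- 25105, the real UTF-16 code point
       ASCII(N'我') AS AsciiValue   -- 63, i.e. '?', since 我 has no ASCII mapping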

TSQL - Case on Ntext (SQL 2005)

Stored Procedures in SQL 2005 - with field type NText
I'm writing a stored procedure to tidy up some data before importing it into Microsoft CRM.
So far all works fine.
However, I need to do a CASE statement on an ntext field. It needs to check this field against about 3 or 4 text values and set a new field (already in the destination table), which is also an ntext field.
However, I am getting the error:
"The data types ntext and varchar are incompatible in the equal to operator."
I have come across a few articles; however, their solutions all seem very complex.
Thanks for your help and advice in advance.
I recommend, if at all possible, replacing the NTEXT type with NVARCHAR(MAX), since NTEXT is not a first class type and NVARCHAR is. This should be easy to do with an ALTER TABLE statement.
Most higher level code shouldn't care about the type change. Any procedural code that uses READTEXT, WRITETEXT, etc. to deal with the NTEXT columns can be simplified to just basic selects and updates.
If the type change is not possible you may have to wrap the comparisons and assignments with CAST() or CONVERT() operators, which is ugly.
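A minimal sketch of that type change (the table and column names are placeholders):
-- NVARCHAR(MAX) is a first-class type, so CASE, =, and the usual string
-- functions all work on it, unlike NTEXT.
ALTER TABLE dbo.MyTable ALTER COLUMN MyTextColumn NVARCHAR(MAX);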
NTEXT is deprecated in SQL Server 2005. You should use NVARCHAR(MAX) instead (NVARCHAR(MAX) can be used in CASE). Is it possible for you to change the type?
This works as well:
CREATE TABLE #TEMP
(
    MyDummy NTEXT
)
INSERT INTO #TEMP (MyDummy) VALUES ('test')
SELECT
    CASE CAST(MyDummy AS NVARCHAR(MAX)) WHEN 'test' THEN 'ok' ELSE 'NOK' END AS MyTest
FROM #TEMP
DROP TABLE #TEMP