Data length issue from SQL Server to Oracle with non-English characters

We have two applications: one uses SQL Server as its backend and the other uses Oracle.
In the first application the user enters some information; the second application reads that data from SQL Server and inserts it into Oracle.
The problem is that the user can type in any language. The following tables show sample data.
Table in SQL Server
For instance, the user has entered Chinese characters in the address field and its length there is 10.
Oracle Table
The address is not inserted here because Oracle counts its length as 12, which exceeds the column size; Oracle counts each special character as 3.
I want to take a substring of a value that mixes English and non-English characters. How can I achieve that? I have written a function that returns the number of special characters in a string.
How do I get only 5 characters' worth (by Oracle's count) from the string?

Try defining the column in Oracle as VARCHAR2(10 CHAR). That changes the length semantics from bytes to characters, so the column will accept 10 characters rather than just 10 bytes, which may be too short if the string contains special (multi-byte) characters.
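As a rough illustration of the difference, the Oracle sketch below uses a hypothetical table customer_address (not from the question) and assumes an AL32UTF8 database character set, where a Chinese character takes 3 bytes. LENGTH counts characters and LENGTHB counts bytes, so comparing the two shows why a byte-sized column rejects a value that looks short enough.

```sql
-- Oracle SQL; the table and column names are hypothetical.
CREATE TABLE customer_address (
    address_bytes VARCHAR2(10 BYTE),  -- limit counted in bytes
    address_chars VARCHAR2(10 CHAR)   -- limit counted in characters
);

-- An existing column can be switched to character semantics:
ALTER TABLE customer_address MODIFY (address_bytes VARCHAR2(10 CHAR));

-- Compare the character count and the byte count of stored values:
SELECT address_chars,
       LENGTH(address_chars)  AS char_count,
       LENGTHB(address_chars) AS byte_count
FROM   customer_address;
```

Even with CHAR semantics, a VARCHAR2 column is still capped at 4000 bytes under the default MAX_STRING_SIZE setting, so very long multi-byte values can still be rejected.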

declare @nstring nvarchar(max) = N'理,551'
declare @lenSQL int = len(@nstring)
-- each non-English character is counted as 3 in Oracle: 1 already counted by len() plus 2 extra
declare @oracleLen int = @lenSQL + (2 * [dbo].[CountNonEnglishfromString](@nstring))
declare @OracleMaxLen int = 5; -- change as per length required
declare @newString nvarchar(max);
if (@OracleMaxLen < @oracleLen)
begin
    declare @olen int = 0
    declare @count int = 1;
    declare @ch nvarchar(1);
    while (@count <= @lenSQL)
    begin
        -- take exactly one character at position @count
        set @ch = substring(@nstring, @count, 1);
        if ([dbo].[CountNonEnglishfromString](@ch) = 1)
            set @olen = @olen + 3;
        else
            set @olen = @olen + 1;
        -- stop before the Oracle length would exceed the limit
        if (@OracleMaxLen < @olen)
            break
        set @newString = concat(@newString, @ch)
        set @count = @count + 1
    end
end
else
begin
    set @newString = @nstring
end
select isnull(@newString, '') as 'new string';
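The helper dbo.CountNonEnglishfromString is not shown in the thread, so here is a minimal sketch of the same truncation idea that does not depend on it. It assumes the Oracle database uses a UTF-8 character set (AL32UTF8) and that the text contains only Basic Multilingual Plane characters, and it derives the byte width of each character from its code point. The variable names and the @maxBytes limit are illustrative only.

```sql
DECLARE @nstring  nvarchar(max) = N'理,551';
DECLARE @maxBytes int = 5;             -- byte limit of the Oracle column (assumed)
DECLARE @result   nvarchar(max) = N'';
DECLARE @i int = 1, @bytes int = 0, @cp int, @width int;

-- LEN ignores trailing spaces, so append a sentinel character and subtract 1
WHILE @i <= LEN(@nstring + N'x') - 1
BEGIN
    SET @cp = UNICODE(SUBSTRING(@nstring, @i, 1));
    -- UTF-8 width for BMP code points: 1, 2 or 3 bytes
    SET @width = CASE WHEN @cp < 128 THEN 1
                      WHEN @cp < 2048 THEN 2
                      ELSE 3 END;
    IF @bytes + @width > @maxBytes BREAK;  -- the next character would not fit

    SET @result += SUBSTRING(@nstring, @i, 1);
    SET @bytes  += @width;
    SET @i      += 1;
END;

SELECT @result AS truncated, @bytes AS utf8_bytes;  -- N'理,5' and 5
```

Characters outside the BMP (stored as surrogate pairs in nvarchar) would need extra handling, because this loop could split a pair and would count it as 6 bytes instead of 4.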

Related

SQL Server update statement performance

I have a problem optimizing an SQL query that does some data cleansing.
I have a table that acts as a reference list of special characters and words. Let's call it ABNORMAL(ID, PATTERN).
I also have another table, INDIVIDUALS, with a column NAME that I want to clean by removing from it every pattern that exists in ABNORMAL.
Currently I am using UPDATE statements, but I'm not sure whether there is a better way to do this.
Approach one
Use a while loop to build a nested REPLACE expression that replaces every pattern from ABNORMAL with an empty string '', then run a single UPDATE with it:
DECLARE @REPLACE_EXPRESSION nvarchar(max) ='REPLACE(NAME,'''','''')'
DECLARE @i int = 1
DECLARE @nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE @CURRENT_CHARAC nvarchar(max)
-- NEW REPLACE EXPRESSION TO NEST INTO THE REPLACE EXPRESSION VARIABLE
DECLARE @CURR_REP NVARCHAR(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE @UPDATE_QUERY nvarchar(max)
WHILE @i < @nbr
BEGIN
    SELECT @CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=@i ;
    SET @REPLACE_EXPRESSION = REPLACE(@REPLACE_EXPRESSION ,'NAME','REPLACE(NAME,'+''''+@CURRENT_CHARAC+''''+','''')')
    SET @i=@i+1
END
SET @UPDATE_QUERY = 'UPDATE INDIVIDUAL SET NAME ='+ @REPLACE_EXPRESSION
EXEC sp_executesql @UPDATE_QUERY
Approach two
Use a while loop to select every character in abnormal and do an update using replace containing the characters to remove:
DECLARE @i int = 1
DECLARE @nbr int = (SELECT COUNT(*) FROM ABNORMAL)
-- CURRENT_CHARAC
DECLARE @CURRENT_CHARAC nvarchar(max)
-- STRING TO BUILD AN SQL QUERY CONTAINING THE REPLACE EXPRESSION
DECLARE @UPDATE_QUERY nvarchar(max)
WHILE @i < @nbr
BEGIN
    SELECT @CURRENT_CHARAC=PATTERN FROM CLEANSING_STG_PRISM_FRA_REF_UNSIGNIFICANT_VALUES WHERE ID_PATTERN=@i ;
    UPDATE INDIVIDUAL
    SET NAME = REPLACE(NAME,@CURRENT_CHARAC,'')
    SET @i=@i+1
END
I have already tested both approaches on 2 million records and found that the first is faster than the second. I would like to know whether you have done something similar and have new (better) ideas to try.

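If you are using SQL Server 2017 or later, you could use TRANSLATE and avoid dynamic SQL (this covers single-character patterns only, since TRANSLATE maps one character to one character):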
SELECT i.*
, REPLACE(TRANSLATE(i.NAME, f, REPLICATE('!', s.l)), '!', '') AS cleansed
FROM INDIVIDUALS i
OUTER APPLY (SELECT STRING_AGG(PATTERN, '') AS f
,LEN(STRING_AGG(PATTERN,'')) AS l
FROM ABNORMAL) AS s
In any case, the first approach is better because it performs a single UPDATE; the second approach removes one pattern at a time, so it issues multiple UPDATE statements.
I would also track transaction log growth with both approaches.
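The SELECT above only previews the cleansed value. A minimal sketch of the same trick written as an UPDATE might look like the following; it uses table variables with made-up sample rows instead of the real ABNORMAL and INDIVIDUALS tables, and it assumes every pattern is a single character and that the placeholder '!' never appears in data you want to keep.

```sql
-- Requires SQL Server 2017 or later (TRANSLATE, STRING_AGG).
DECLARE @abnormal TABLE (pattern nchar(1) NOT NULL);      -- stand-in for ABNORMAL
INSERT INTO @abnormal (pattern) VALUES (N'&'), (N'#'), (N'$');

DECLARE @individuals TABLE (name nvarchar(30) NOT NULL);  -- stand-in for INDIVIDUALS
INSERT INTO @individuals (name) VALUES (N'te&st #1$'), (N'plain');

-- Concatenate all single-character patterns into one string, e.g. N'&#$'
DECLARE @chars nvarchar(4000) = (SELECT STRING_AGG(pattern, N'') FROM @abnormal);
DECLARE @placeholder nchar(1) = N'!';

-- Map every unwanted character to the placeholder, then strip the placeholder
UPDATE @individuals
SET    name = REPLACE(
                  TRANSLATE(name, @chars, REPLICATE(@placeholder, LEN(@chars))),
                  @placeholder, N'');

SELECT name FROM @individuals;  -- N'test 1' and N'plain'
```

Note that LEN ignores trailing spaces, so if a space is among the patterns the two TRANSLATE arguments can end up with different lengths and the call will fail; DATALENGTH(@chars) / 2 is one way around that for nvarchar.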

If there are not too many characters to be cleaned, this trick might work.
Basically, you build one big UPDATE statement with a nested REPLACE for each value in the table of characters to be removed.
Example code:
Test data (using temp tables)
create table #ABNORMAL_CHARACTERS (id int identity(1,1), chr varchar(30));
insert into #ABNORMAL_CHARACTERS (chr) values ('!'),('&'),('#');
create table #INDIVIDUAL (id int identity(1,1), name varchar(30));
insert into #INDIVIDUAL (name) values ('test 1 &'),('test !&#2'),('test 3');
Code:
declare @FieldName varchar(30) = 'name';
declare @Replaces varchar(max) = @FieldName;
declare @UpdateSQL varchar(max);
select @Replaces = concat('replace('+@Replaces+', ', ''''+chr+''','''')') from #ABNORMAL_CHARACTERS order by id;
set @UpdateSQL = 'update #INDIVIDUAL
set name = '+@Replaces + '
where exists (select 1 from #ABNORMAL_CHARACTERS where charindex(chr,name)>0)';
exec (@UpdateSQL);
select * from #INDIVIDUAL;
This was tested on rextester.
If you have a UDF that can do a regex replace, the @Replaces variable could be simplified to a single RegexReplace call with a pattern.

SQL Server pass in string value and see if it is in a hard-coded list of values and return int

I have created a stored procedure that, for now, does not touch a database.
Pseudo code
CREATE PROCEDURE [dbo].[CustomerKeyChecker]
    @ldccode VARCHAR(12)
AS
BEGIN
    DECLARE @result int = 0
    DECLARE @listofCodes varchar(200)
    SET @listofCodes = 'CLP, UIC, NSTAR, NSTARB, NSTARC, PSNH'
    -- So let's say that the passed-in @ldccode is "CLP"; then I want to
    -- set @result = 1
END
What is a decent way to search for these codes inside the list, something like a substring match?
Pseudo code:
substring(@ldccode, @listofCodes) ?
Again, for now this has nothing to do with the database, and I understand that someone would say "why even pass this data to SQL"; SQL tables will be added later, which is why.

Note that we add a space to the beginning and a comma to the end of both the code and the list. This prevents false positives such as STAR matching inside NSTAR. Note also SIGN(): it returns 1 when CHARINDEX finds a match (a positive position) and 0 when nothing is found.
Declare @ldccode VARCHAR(12) = 'CLP'
Declare @result int = 0
declare @listofCodes varchar(200)
SET @listofCodes = 'CLP, UIC, NSTAR, NSTARB, NSTARC, PSNH'
Select @result=sign(charindex(' '+@ldccode+',',' '+@listofCodes+','))
Returns
1
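If the codes really are hard-coded, one alternative to string searching is to compare against the values directly. The sketch below is only an illustration of that idea using the same codes; it avoids the delimiter handling altogether, and the IN list can later be swapped for a lookup against a real table.

```sql
DECLARE @ldccode varchar(12) = 'CLP';
DECLARE @result  int;

-- Exact match against each hard-coded value; no substring false positives
SELECT @result = CASE
                     WHEN @ldccode IN ('CLP', 'UIC', 'NSTAR', 'NSTARB', 'NSTARC', 'PSNH')
                     THEN 1
                     ELSE 0
                 END;

SELECT @result AS result;  -- 1 for 'CLP'; 'STAR' would give 0
```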

Find all ERDs containing table with specified name

Is it possible to query for the names of all ERDs (Entity Relationship Diagrams) that contain a table with a name
Like '%mytable%'
Something like this:
select *
from <ERD objects>
where tableName like '%%'
I have a large database with lots of ERDs, so to understand the scope of a table I want to browse the ERDs that include that specific table.

As I understand it, you need a list of the tables contained in the database diagrams. There is an article that can help with this; I will reproduce part of it here:
The diagram itself is stored in a binary column and cannot easily be converted into readable form. To deserialize this column (called definition) we need two functions:
CREATE FUNCTION [dbo].[Tool_VarbinaryToVarchar_Text]
(
    @VarbinaryValue VARBINARY(max),
    @bitASCIIOnly BIT = 0
)
RETURNS VARCHAR(max) AS
BEGIN
    DECLARE @NumberOfBytes INT
    SET @NumberOfBytes = DATALENGTH(@VarbinaryValue)
    -- PART ONE --
    IF (@NumberOfBytes > 4)
    BEGIN
        DECLARE @FirstHalfNumberOfBytes INT
        DECLARE @SecondHalfNumberOfBytes INT
        SET @FirstHalfNumberOfBytes = @NumberOfBytes/2
        SET @SecondHalfNumberOfBytes = @NumberOfBytes - @FirstHalfNumberOfBytes
        -- Call this function recursively with the two parts of the input split in half
        RETURN dbo.Tool_VarbinaryToVarchar_Text(CAST(SUBSTRING(@VarbinaryValue, 1 , @FirstHalfNumberOfBytes) AS VARBINARY(max)),@bitASCIIOnly)
             + dbo.Tool_VarbinaryToVarchar_Text(CAST(SUBSTRING(@VarbinaryValue, @FirstHalfNumberOfBytes+1 , @SecondHalfNumberOfBytes) AS VARBINARY(max)),@bitASCIIOnly)
    END
    IF (@NumberOfBytes = 0)
    BEGIN
        RETURN '' -- No bytes found, therefore an empty string is returned
    END
    -- PART TWO --
    DECLARE @HighByte INT
    -- @NumberOfBytes <= 4 (four or fewer bytes/8 hex digits were input)
    -- eg. 88887777 66665555 44443333 22221111
    -- We'll process ONLY the right-most (least-significant) byte, which consists
    -- of eight bits
    -- 2. Carve off the rightmost eight bits (ie 22221111) by masking with 255
    SET @HighByte = CAST(@VarbinaryValue AS INT) & 255
    -- In ASCII-only mode, non-printable bytes become CHAR(13) so they can be stripped later
    IF @bitASCIIOnly = 1 AND (@HighByte < 32 OR @HighByte > 126) SET @HighByte=13;
    -- 3. Trim the byte (two hex digits) from the right (least significant) end of the input
    -- in preparation for further parsing
    SET @VarbinaryValue = SUBSTRING(@VarbinaryValue, 1, (@NumberOfBytes-1))
    -- 4. Recursively call this function on the remaining binary data, appending the
    -- byte we just decoded as its ASCII character
    -- ie. we pass 88887777 66665555 44443333 back to this function, adding X to the result string
    RETURN dbo.Tool_VarbinaryToVarchar_Text(@VarbinaryValue,@bitASCIIOnly) +
           CHAR(@HighByte)
END
And:
CREATE FUNCTION [dbo].[fnTool_ScriptDiagram2005_Text]()
RETURNS
@tblOut TABLE
(
    -- Add the column definitions for the TABLE variable here
    diagramname NVARCHAR(128),
    diagram_id INT PRIMARY KEY,
    diagram_text VARCHAR(MAX),
    diagram_ASCII VARCHAR(MAX)
)
AS
BEGIN
    DECLARE @name NVARCHAR(128);
    DECLARE @diagram_id INT;
    DECLARE @index INT;
    DECLARE @size INT;
    DECLARE @chunk INT;
    DECLARE @line VARCHAR(MAX);
    DECLARE @lineASC VARCHAR(MAX);
    DECLARE @CurrentPos INT;
    SELECT @CurrentPos = MIN(diagram_id) FROM dbo.sysdiagrams;
    WHILE (@CurrentPos IS NOT NULL)
    BEGIN
        -- Set start index, and chunk 'constant' value
        SET @index = 1; --
        SET @chunk = 32; -- values that work: 2, 6
                         -- values that fail: 15,16, 64
        SELECT @diagram_id = diagram_id,
               @size = DATALENGTH(definition),
               @name = name
        FROM dbo.sysdiagrams
        WHERE diagram_id = @CurrentPos;
        -- Now with the diagram_id, do all the work
        SET @line = '';
        SET @lineASC = '';
        WHILE @index < @size
        BEGIN
            -- Decode the diagram binary data chunk by chunk and append it:
            -- @line keeps every byte, @lineASC keeps printable ASCII only
            SELECT @line = @line + dbo.Tool_VarbinaryToVarchar_Text(SUBSTRING (definition, @index, @chunk),0),
                   @lineASC = @lineASC + dbo.Tool_VarbinaryToVarchar_Text(SUBSTRING (definition, @index, @chunk),1)
            FROM dbo.sysdiagrams
            WHERE diagram_id = @CurrentPos;
            SET @index = @index + @chunk;
        END
        INSERT INTO @tblOut (diagramname, diagram_id, diagram_text, diagram_ASCII)
        VALUES (@name, @diagram_id, @line, REPLACE(@lineASC,CHAR(13),''));
        SELECT @CurrentPos = MIN(diagram_id)
        FROM dbo.sysdiagrams
        WHERE diagram_id > @CurrentPos;
    END
    RETURN;
END
After that you can run:
SELECT *
FROM [dbo].[fnTool_ScriptDiagram2005_Text] ()
WHERE diagram_ASCII LIKE '%TableToFind%'
For example, I created a diagram TestDiagram with two tables named whatever and IE_Stat. The query returns:
Root EntrypfoBCompObj_ !"#$%&'()*+,-.123456789:(}5n]4o[\0V?[?i???V?[?i??T,,,4") -bH''Uu94941#xV4XdboIE_StatMicrosoft DDS Form 2.0Embedded Object9q&sch_labels_visibled(ActiveTableViewMode1 TableViewMode:0:4,0,28DdsStreamSchema UDV Default&/DSREF-SCHEMA-CONTENTS,0Schema UDV Default Post V66;4,0,2310,1,1890,5,1260 TableViewMode:12,0,284,0,2805 TableViewMode:22,0,284,0,2310 TableViewMode:32,0,284,0,2310 TableViewMode:4>4,0,284,0,2310,12,2730,11,1680(ActiveTableViewMode1 TableViewMode:0:4,0,284,0,2310,1,1890,5,1260 TableViewMode:12,0,284,0,2805 TableViewMode:22,0,284,0,2310 TableViewMode:32,0,284,0,2310 TableViewMode:4>4,0,284,0,2310,12,2730,11,1680NaQW9 LHEData Source=********;Initial Catalog=test;Integrated Security=True;MultipleActiveResultSets=False;TrustServerCertificate=True;Packet Size=4096;Application Name="Microsoft SQL Server Management Studio"TestDiagram&whateverdbo$IE_StatdbokE7d2pN{1634CDD7-0888-42E3-9FA2-B6D32563B91D}bR
You can see both table names in the output.

Creating multiple UDFs in one batch - SQL Server

I'm asking this question for SQL Server 2008 R2
I'd like to know if there is a way to create multiple functions in a single batch statement.
I've made the following code as an example; suppose I want to take a character string and rearrange its letters in alphabetical order. So, 'Hello' would become 'eHllo'
CREATE FUNCTION char_split (@string varchar(max))
RETURNS @characters TABLE
(
    chars varchar(2)
)
AS
BEGIN
    DECLARE @length int,
            @K int
    SET @length = len(@string)
    SET @K = 1
    WHILE @K < @length+1
    BEGIN
        INSERT INTO @characters
        SELECT SUBSTRING(@string,@K,1)
        SET @K = @K+1
    END
    RETURN
END
CREATE FUNCTION rearrange (@string varchar(max))
RETURNS varchar(max)
AS
BEGIN
    DECLARE @SplitData TABLE (
        chars varchar(2)
    )
    INSERT INTO @SplitData SELECT * FROM char_split(@string)
    DECLARE @Output varchar(max)
    SELECT @Output = coalesce(@Output,' ') + cast(chars as varchar(10))
    from @SplitData
    order by chars asc
    RETURN @Output
END
declare @string varchar(max)
set @string = 'Hello'
select dbo.rearrange(@string)
When I try running this code, I get this error:
'CREATE FUNCTION' must be the first statement in a query batch.
I tried enclosing each function in a BEGIN END block, but no luck. Any advice?

Just use a GO statement between the definition of the UDFs
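As a small sketch of what that looks like in a script, the example below defines two throwaway functions (dbo.add_one and dbo.add_two are hypothetical names, not from the question) separated by GO. Keep in mind that GO is not a T-SQL statement but a batch separator understood by client tools such as SSMS and sqlcmd, so each CREATE FUNCTION is still sent to the server as its own batch.

```sql
CREATE FUNCTION dbo.add_one (@n int)
RETURNS int
AS
BEGIN
    RETURN @n + 1;
END
GO  -- ends the first batch; the next CREATE FUNCTION starts a new one

CREATE FUNCTION dbo.add_two (@n int)
RETURNS int
AS
BEGIN
    -- the second function can call the first because it already exists
    RETURN dbo.add_one(dbo.add_one(@n));
END
GO

SELECT dbo.add_two(40) AS result;  -- 42
```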

Not doable; it is as simple as that.
You can put them in one script by using GO between them.
But GO is a batch delimiter, which means you send multiple batches, and that is explicitly NOT what your question asks for.
So no, it is not possible to do that in one batch, as the error clearly indicates.

Perform string comparison ignoring the diacritics

I'm trying to search Arabic text in SQL Server and need to ignore the Arabic diacritics.
So I'm using the Arabic_100_CI_AI collation, but it does not work.
For example, the query below should return 1, but it returns no result!
select 1
where (N'مُحَمَّد' Collate Arabic_100_CI_AI) = (N'محمّد' Collate Arabic_100_CI_AI)
What is the problem and how can I perform diacritics insensitive comparison in Arabic text?

It seems the AI flag is NOT working for Arabic. You can build your own Unicode normalization function.
CREATE FUNCTION [dbo].[NormalizeUnicode]
(
    -- Add the parameters for the function here
    @unicodeWord nvarchar(max)
)
RETURNS nvarchar(max)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Result nvarchar(max)
    -- Add the T-SQL statements to compute the return value here
    declare @l int;
    declare @i int;
    -- LEN ignores trailing spaces, so append a character and subtract 1
    SET @l = len(@unicodeWord + '-') - 1
    SET @i = 1;
    SET @Result = '';
    WHILE (@i <= @l)
    BEGIN
        DECLARE @c nvarchar(1);
        SET @c = SUBSTRING(@unicodeWord, @i, 1);
        -- 0x064B to 0x065F, 0x0670 are Combining Characters
        -- You may need to perform tests for this character range
        IF NOT (unicode(@c) BETWEEN 0x064B AND 0x065F or unicode(@c) = 0x0670)
            SET @Result = @Result + @c;
        SET @i = @i + 1;
    END
    -- Return the result of the function
    RETURN @Result
END
The following test should work correctly:
select 1
where dbo.NormalizeUnicode(N'بِسمِ اللہِ الرَّحمٰنِ الرَّحیم') = dbo.NormalizeUnicode(N'بسم اللہ الرحمن الرحیم');
Notes:
You may experience slow performance with this solution
The character range I've used in the function is NOT thoroughly tested.
For a complete reference on Arabic Unicode Character Set, see this document http://www.unicode.org/charts/PDF/U0600.pdf

Your use of collation is correct, but if you look carefully at the two Arabic words in your query, they are completely different strings even though their meaning is the same, and hence you are not getting a result (the comparison fails):
N'مُحَمَّد' and N'محمّد'
I am pretty sure that if you inspect their code points with the UNICODE() function, the results will differ.
If you try the query below, it will succeed:
select 1
where N'مُحَمَّد' Collate Arabic_100_CI_AI like '%%'
See this post for a better explanation
Treating certain Arabic characters as identical
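One way to check the claim about differing code points is to list them side by side. The sketch below is only an illustration: it walks the first ten positions of each string using an inline list of numbers and prints the code point found at each position (NULL once a string has run out). Values in the range 0x064B to 0x065F (1611 to 1631 in decimal) are the combining marks that the NormalizeUnicode function above strips.

```sql
DECLARE @a nvarchar(50) = N'مُحَمَّد';
DECLARE @b nvarchar(50) = N'محمّد';

-- One row per character position; UNICODE returns NULL past the end of a string
SELECT n.pos,
       UNICODE(SUBSTRING(@a, n.pos, 1)) AS a_code_point,
       UNICODE(SUBSTRING(@b, n.pos, 1)) AS b_code_point
FROM  (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS n(pos)
WHERE  n.pos <= LEN(@a) OR n.pos <= LEN(@b)
ORDER BY n.pos;
```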
