Removing hidden character at end of SQL server field

Removing hidden character at end of SQL server field - sql

I have a strange situation displaying value from SQL server. There is a value stored in SQL server 2008 field which is hidden when queried from server and shown in Management Studio (see below).
Test template 2
But when displayed on a screen in HTML editor it is showing as ? (see below)
Test template 2?
When I check for ascii value it shows 63. Not sure how user got this special value into this field in SQL server. When I test by entering ? into input field and display it works fine without any issues.
I don't want to blindly remove last character from this field. I am trying to determine a solution to identify this invisible value and remove it either while storing or displaying.
Any solution is greatly appreciated.
As comments below suggests this turned out to be Unicode 8203 (zero width space).
My next question is how to replace this Unicode 8203 in one statement in T-SQL without parsing through each character?

Use REPLACE to remove the zero-width space character:
-- setup unicode string containing zero-width character
DECLARE #UnicodeReplace NVARCHAR(5) = N'Test' + NCHAR(8203);
-- check that unicode string length is 5,
-- and prove existence of zero-width space character matching unicode 8203
SELECT #UnicodeReplace AS String,
LEN(#UnicodeReplace) AS Length,
UNICODE(SUBSTRING(#UnicodeReplace, 5, 1)) AS UnicodeValue
-- replace and prove the unicode string length is reduced to 4
SELECT REPLACE(#UnicodeReplace, NCHAR(8203), N''),
LEN(REPLACE(#UnicodeReplace, NCHAR(8203), N'')) AS Length;
SQL Fiddle

Such characters could not be replaced if database collation has default values like this: SQL_Latin1_General_CP1_CI_AS. In such cases this command could work:
set #word=replace(#word collate Latin1_General_100_BIN2, nchar(8205),N'')

Related

How to not lose unicode characters in Stored Procedure parameter

Problem Statement:
We are having a legacy application with backend as SQL Server. Till now, we did not face any issues in passing non-unicode values. Now, we are getting unicode characters from user interface.
The unicode character is getting passed as given below, in UI. These data are being inserted into table.
Currently, we pass unicode characters like below and we are losing non-english characters.
EXEC dbo.ProcedureName #IN_DELIM_VALS = '삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~'
What we tried:
If we pass unicode with N prefix, the non-english characters are being inserted into table properly.
EXEC dbo.ProcedureName #IN_DELIM_VALS = N'삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~'
But, adding N prefix, requires UI code change. As it is legacy application, we want to avoid UI change. We want to handle in the sql server side.
when I read about passing parameter without N prefix, the data is implicitly converted to default code page and korean characters are getting lost. Reference
Prefix a Unicode character string constants with the letter N to
signal UCS-2 or UTF-16 input, depending on whether an SC collation is
used or not. Without the N prefix, the string is converted to the
default code page of the database that may not recognize certain
characters. Starting with SQL Server 2019 (15.x), when a UTF-8 enabled
collation is used, the default code page is capable of storing UNICODE
UTF-8 character set.
Our Ask:
Is there a way to add N prefix to the stored procedure parameter, before being assigned to stored procedure parameter and so, we are not losing unicode characters.

As it is not possible to add N prefix, after parameter is being passed to SQL Server, we are going with below application code change. Ideally, the application should pass the parameter with right nvarchar datatype, so that it is having N prefix.
SqlCommand cmd = new SqlCommand("EXEC dbo.ProcedureName #IN_DELIM_VALS", myConnection);
cmd.Parameters.Add(new SqlParameter("#IN_DELIM_VALS", SqlDbType.NVarChar,400)).Value
= "삼성~AX~Aland Islands~ALLTest1~Aland Islands~~~~";
cmd.ExecuteNonQuery();

Removing replacement character � from column

Based on my research so far this character indicates bad encoding between the database and front end. Unfortunately, I don't have any control over either of those. I'm using Teradata Studio.
How can I filter this character out? I'm trying to perform a REGEX_SUBSTR function on a column that occasionally contains �, which throws the error "The string contains an untranslatable character".
Here is my SQL. AIRCFT_POSITN_ID is the column that contains the replacement character.
SELECT DISTINCT AIRCFT_POSITN_ID,
REGEXP_SUBSTR(AIRCFT_POSITN_ID, '[0-9]+') AS AUTOROW
FROM PROD_MAE_MNTNC_VW.FMR_DISCRPNCY_DFRL
WHERE DFRL_CREATE_TMS > CURRENT_DATE -25

Your diagnostic is correct, so first of all, you might want to check the Session Character Set (it is part of the connection definition).
If it is ASCII change it to UTF8 and you will be able to see the original characters instead of the substitute character.
And in case the character is indeed part of the data and not just an indication for encoding translations issues:
The substitute character AKA SUB (DEC: 26 HEX: 1A) is quite unique in Teradata.
you cannot use it directly -
select '�';
-- [6706] The string contains an untranslatable character.
select '1A'XC;
-- [6706] The string contains an untranslatable character.
If you are using version 14.0 or above you can generate it with the CHR function:
select chr(26);
If you're below version 14.0 you can generate it like this:
select translate (_unicode '05D0'XC using unicode_to_latin with error);
Once you have generated the character you can now use it with REPLACE or OTRANSLATE
create multiset table t (i int,txt varchar(100) character set latin) unique primary index (i);
insert into t (i,txt) values (1,translate ('Hello שלום world עולם' using unicode_to_latin with error));
select * from t;
-- Hello ���� world ����
select otranslate (txt,chr(26),'') from t;
-- Hello world
select otranslate (txt,translate (_unicode '05D0'XC using unicode_to_latin with error),'') from t;
-- Hello world
BTW, there are 2 versions for OTRANSLATE and OREPLACE:
The functions under syslib works with LATIN.
the functions under TD_SYSFNLIB works with UNICODE.

In addition to Dudu's excellent answer above, I wanted to add the following now that I've encountered the issue again and had more time to experiment. The following SELECT command produced an untranslatable character:
SELECT IDENTIFY FROM PROD_MAE_MNTNC_VW.SCHD_MNTNC;
IDENTIFY
24FEB1747659193DC330A163DCL�ORD
Trying to perform a REGEXP_REPLACE or OREPLACE directly on this character produces an error:
Failed [6706 : HY000] The string contains an untranslatable character.
I changed the CHARSET property in my Teradata connection from UTF8 to ASCII and I could now see the offending character, looks like a tab
IDENTIFY
Using the TRANSLATE_CHK command using this specific conversion succeeds and identifies the position of the offending character (Note that this does not work using the UTF8 charset):
TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) AS BADCHAR
BADCHAR
28
Now this character can be dealt with using some CASE statements to remove the bad character and retain the remainder of the string:
CASE WHEN TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE) = 0 THEN IDENTIFY
ELSE SUBSTR(IDENTIFY, 1, TRANSLATE_CHK(IDENTIFY USING KANJI1_SBC_TO_UNICODE)-1)
END AS IDENTIFY
Hopes this helps someone out.

DB2/iSeries SQL clean up CR/LF, tabs etc

I need to find and clean up line breaks, carriage returns, tabs and "SUB"-characters in a set of 400k+ string records, but this DB2 environment is taking a toll on me.
Thought I could do some search and replacing with the REPLACE() and CHR() functions, but it seems CHR() is not available on this system (Error: CHR in *LIBL type *N not found). Working with \t, \r, \n etc doesn't seem to be working either. The chars can be in the middle of strings or at the end of them.
DBMS = DB2
System = iSeries
Language = SQL
Encoding = Not sure, possibly EBCDIC
Any hints on what I can do with this?

I used this SQL to find x'25' and x'0D':
SELECT
<field>
, LOCATE(x'0D', <field>) AS "0D"
, LOCATE(x'25', <field>) AS "25"
, length(trim(<field>)) AS "Length"
FROM <file>
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0
And I used this SQL to replace them:
UPDATE <file>
SET <field> = REPLACE(REPLACE(<field>, x'0D', ' '), x'25', ' ')
WHERE LOCATE(x'25', <field>) > 0
OR LOCATE(x'0D', <field>) > 0

If you want to clear up specific characters like carriage return (EBCDIC x'0d') and line feed (EBCDIC x'25') you should find the translated character in EBCDIC then use the TRANSLATE() function to replace them with space.
If you just want to remove undisplayable characters then look for anything under x'40'.

Here is an sample script that replaces X'41' by X'40'. Something that was creating issues at our shop:
UPDATE [yourfile] SET [yourfield] = TRANSLATE([yourfield], X'40',
X'41') WHERE [yourfield] like '%' concat X'41' concat '%'
If you need to replace more than one character, extend the "to" and "from" hexadecimal strings to the values you need in the TRANSLATE function.

Try TRANSLATE or REPLACE.
The brute force method involves using POSITION to find the errant character, then SUBSTR before and after it. CONCAT the two substrings (less the undesirable character) to re-form the column.
The character encoding is almost certainly one of the EBCDIC character sets. Depending on how the table got loaded in the first place, the CR may be x'0d' and the LF x'15', x'25'. An easy way to find out is to get to a green screen and do a DSPPFM against the table. Press F10 then F11 to view the table is raw, hexadecimal (over/under) format.

For details on the available functions see the
DB2 for i5/OS SQL Reference.

Perhaps the TRANSLATE() function will serve your needs.
TRANSLATE( data, tochars, fromchars )
...where fromchars is the set of characters you don't want, and tochars is the corresponding characters you want them replaced with. You may have to write this out in hex format, as x'nnnnnn...' and you will need to know what character set you are working with.
Using the DSPFFD command on your table should show the CCSID of your fields.

we struggled a lot to replace the new line char and carriage return from flat file.
Finally we used below sql to sort the issue.
REPLACE(REPLACE(COLUMN_NAME, CHR(13), ''), CHR(10), '')
Try it out
CR = CHR(13)
LF = CHR(10)

Unable to replace Char(63) by SQL query

I am having some rows in table with some unusual character. When I use ascii() or unicode() for that character, it returns 63. But when I try this -
update MyTable
set MyColumn = replace(MyColumn,char(63),'')
it does not replace. The unusual character still exists after the replace function. Char(63) incidentally looks like a question mark.
For example my string is 'ddd#dd ddd' where # it's my unusual character and
select unicode('#')
return me 63.But this code
declare #str nvarchar(10) = 'ddd#dd ddd'
set #char = char(unicode('#'))
set #str = replace(#str,#char,'')
is working!
Any ideas how to resolve this?
Additional information:
select ascii('�') returns 63, and so does select ascii('?'). Finally select char(63) returns ? and not the diamond-question-mark.
When this character is pasted into Excel or a text editor, it looks like a space, but in an SQL Server Query window (and, apparently, here on StackOverflow as well), it looks like a diamond containing a question mark.

Not only does char(63) look like a '?', it is actually a '?'.
(As a simple test ensure you have numlock on your keyboard on, hold down the alt key andtype '63' into the number pad - you can all sorts of fun this way, try alt-205, then alt-206 and alt-205 again: ═╬═)
Its possible that the '?' you are seeing isn't a char(63) however, and more indicitive of a character that SQL Server doesn't know how to display.
What do you get when you run:
select ascii(substring('[yourstring]',[pos],1));
--or
select unicode(substring('[yourstring]',[pos],1));
Where [yourstring] is your string and [pos] is the position of your char in the string
EDIT
From your comment it seems like it is a question mark. Have you tried:
replace(MyColumn,'?','')
EDIT2
Out of interest, what does the following do for you:
replace(replace(MyColumn,char(146),''),char(63),'')

char(63) is a question mark. It sounds like these "unusual" characters are displayed as a question mark, but are not actually characters with char code 63.
If this is the case, then removing occurrences of char(63) (aka '?') will of course have no effect on these "unusual" characters.

I believe you actually didn't have issues with literally CHAR(63), because that should be just a normal character and you should be able to properly work with it.
What I think happened is that, by mistake, an UTF character (for example, a cyrilic "А") was inserted into the table - and either your:
columns setup,
the SQL code,
or the passed in parameters
were not prepared for that.
In this case, the sign might be visible to you as ?, and its CHAR() function would actually give 63, but you should really use the NCHAR() to figure out the real code of it.
Let me give a specific example, that I had multiple times - issues
with that Cyrilic "А", which looks identical to the Latin one, but has
a unicode of 1040.
If you try to use the non-UTF CHAR function on that 1040 character,
you would get a code 63, which is not true (and is probably just an
info about the first byte of multibyte character).
Actually, run this to make the differences in my example obvious:
SELECT NCHAR(65) AS Latin_A, NCHAR(1040) Cyrilic_A, ASCII(NCHAR(1040)) Latin_A_Code, UNICODE(NCHAR(1040)) Cyrilic_A_Code;

That empty string Which shows us '?' in substring.
Gives us Ascii value as 63.
It's a Zero Width space which gets appended if you copy data from ui and insert into the database.
To replace the data, you can use below query
**set MyColumn = replace(MyColumn,NCHAR(8203),'')**

It's an older question, but I've run into this problem as well. I found the solution somewhere else on internet, but I thought it would be good to share it here as well. Have a good day.
Replace(YourString, nchar(65533) COLLATE Latin1_General_BIN2, '')

This should work as well:
UPDATE TABLE
SET [FieldName] = SUBSTRING([FieldName], 2, LEN([FieldName]))
WHERE ASCII([FieldName]) = 63

Replace character in SQL results

This is from a Oracle SQL query. It has these weird skinny rectangle shapes in the database in places where apostrophes should be. (I wish we would could paste screen shots in here)
It looks like this when I copy and paste the results.
spouse�s
is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?
Edit: I need to change only the results in a SELECT statement for reporting purposes, I can't change the Database.
I ran this
select dump('�') from dual;
which returned
Typ=96 Len=3: 239,191,189
This seems to work so far
select translate('What is your spouse�s first name?', '�', '''') from dual;
but this doesn't work
select translate(Fieldname, '�', '''') from TableName
Select FN from TN
What is your spouse�s first name?
SELECT DUMP(FN, 1016) from TN
Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f
EDIT:
So I have established that is the backquote character. I can't get the DB updated so I'm trying this code
SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN
and I"m getting ORA-00904:"Regex_Replace":invalid identifier

This seems a problem with your charset configuracion. Check your NLS_LANG and others NLS_xxx enviroment/regedit values. You have to check the oracle server, your client and the client of the inserter of that data.
Try to DUMP the value. you can do it with a select as simple as:
SELECT DUMP(the_column)
FROM xxx
WHERE xxx
UPDATE: I think that before try to replace, look for the root of the problem. If this happens because a charset trouble you can get big problems with bad data.
UPDATE 2: Answering the comments. The problem may be is not on the database server side, may be is in the client side. The problem (if this is the problem) can be a translation on server to/from client comunication. It's for a server-client bad configuracion-coordination. For instance if the server has defined UTF8 charset and your client uses US7ASCII, then all acutes will appear as ?.
Another approach can be that if the server has defined UTF8 charset and your client also UTF8 but the application is not able to show UTF8 chars, then the problem is in the application side.
UPDATE 3: On your examples:
select translate('What. It works because the � is exactly the same char: You have pasted on both sides.
select translate(Fieldname. It does not work because the � is not stored on database, it's the char that the client receives may be because some translation occurs from the data table until it's showed to you.
Next step: Look in DUMP syntax and try to extract the codes for the mysterious char (from the table not pasting �!).

I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are characters 91-94 (using a Windows encoding), or Unicode U+2018, U+2019, U+201C, and U+201D.

I'm going to propose a front-end application-based, client-side approach to the problem:
I suspect that this problem has more to do with a mismatch between the font you are trying to display the word spouse�s with, and the character �. That icon appears when you are trying to display a character in a Unicode font that doesn't have the glyph for the character's code.
The Oracle database will dutifully return whatever characters were INSERTed into its' column. It's more up to you, and your application, to interpret what it will look like given the font you are trying to display your data with in your application, so I suggest investigating as to what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().
Try running the following query to get the character code:
SELECT DUMP(<column with weird character>, 1016)
FROM <your table>
WHERE <column with weird character> like '%spouse%';
If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause to actually grab the offending column.
Once you've found the code for the character, you could just replace the character by using the regex_replace() built-in function by determining the raw hex code of the character and then supplying the ASCII / C0 Controls and Basic Latin character 0x0027 ('), using code similar to this:
UPDATE <table>
set <column with offending character>
= REGEX_REPLACE(<column with offending character>,
"<character code of �>",
"'")
WHERE regex_like(<column with offending character>,"<character code of �>");
If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.
EDIT: If your'e seeing 0x92, there's likely a charset mismatch here:
0x92 in CP-1252 (default Windows code page) is a backquote character, which looks kinda like an apostrophe. This code isn't a valid ASCII character, and it isn't valid in IS0-8859-1 either. So probably either the database is in CP-1252 encoding (don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your db client connection isn't expecting CP-1252. Hence, the wierd question mark.
And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the update above (seems like you can't), or doing this:
INSERT (<normal table columns>,...,<column with offending character>) INTO <table>
SELECT <all normal columns>, REGEX_REPLACE(<column with offending character>,
"\0092",
"\0027") -- for ASCII/ISO-8859-1 apostrophe
FROM <table>
WHERE regex_like(<column with offending character>,"\0092");
DELETE FROM <table> WHERE regex_like(<column with offending character>,"\0092");

Before you do this you need to understand what actually happened. It looks to me that someone inserted non-ascii strings in the database. For example Unicode or UTF-8. Before you fix this, be very sure that this is actually a bug. The apostrophe comes in many forms, not just the "'".

TRANSLATE() is a useful function for replacing or eliminating known single character codes.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas