TBlobField(<Field>).SaveToStream cuts the last NULL char with Delphi 2010 - blob

If a blob contains an RTF that ends with a NULL char, Delphi 2010 cuts this character when I do SaveToStream or SaveToFile.
Is there a way to change this behavior?
It is a problem for me because I calculate a hash on those fields.
With Delphi 2010 I have this problem but not with Delphi 2007, and so the hash result changes.

OK, I found out when the bug appears:
Delphi 2010
Firebird DBMS
The TIBDatabase object has the charset property set (in
my case, for example, lc_ctype=ISO8859_1)
If I remove the charset, the save method works fine!
Other information:
- I use Firebird database
- The blob field is a blob of sub_type text
- The value is an RTF and its terminator is a null character
NOTE:
When the charset is removed, the text in the blob fields is not read correctly; the only workaround I have found is to cast the blob to varchar directly in the query.
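For illustration, a minimal sketch of that workaround in Firebird SQL, using hypothetical DOCS / RTF_BODY names (the VARCHAR length must be large enough for the longest value):
-- cast the text blob to VARCHAR directly in the query so the client receives character data
SELECT ID, CAST(RTF_BODY AS VARCHAR(8192)) AS RTF_TEXT
FROM DOCS;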

Related

How to convert string with German characters to Blob in Firebird?

I want to convert a string into a blob with the f_strblob(CSTRING) function of FreeAdhocUDF. At this point I cannot find a way to get my special characters like ß or ä to show up in the blob.
The result of f_strblob('Gemäß') is Gem..
I tried changing the character set of my variables to UTF8, but that does not help.
Is there a masking option which I did not find?
You don't need that function, and the FreeAdhocUDF documentation also marks it as obsolete for that reason.
In a lot of situations, Firebird will automatically convert string literals to blobs (e.g. in statements where a string literal is assigned to a blob column), and otherwise you can explicitly cast using cast('your string' as blob sub_type text).
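A small sketch of both routes, assuming a hypothetical NOTES table with a BODY column declared as BLOB SUB_TYPE TEXT:
-- implicit conversion: the string literal is converted to a text blob on assignment
INSERT INTO NOTES (ID, BODY) VALUES (1, 'Gemäß');
-- explicit cast, for contexts where the conversion is not done automatically
UPDATE NOTES SET BODY = CAST('Gemäß Anhang A' AS BLOB SUB_TYPE TEXT) WHERE ID = 1;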

Removing hidden character at end of SQL server field

I have a strange situation displaying a value from SQL Server. There is a value stored in a SQL Server 2008 field which contains a character that is hidden when queried from the server and shown in Management Studio (see below).
Test template 2​
But when it is displayed on screen in an HTML editor it shows as ? (see below)
Test template 2?
When I check the ASCII value it shows 63. I am not sure how the user got this special value into this field in SQL Server. When I test by entering ? into the input field and displaying it, it works fine without any issues.
I don't want to blindly remove the last character from this field. I am trying to find a way to identify this invisible value and remove it either while storing or while displaying.
Any solution is greatly appreciated.
As the comments below suggest, this turned out to be Unicode 8203 (zero-width space).
My next question is how to replace this Unicode 8203 in one statement in T-SQL without parsing through each character?
Use REPLACE to remove the zero-width space character:
-- set up a unicode string containing the zero-width space character
DECLARE @UnicodeReplace NVARCHAR(5) = N'Test' + NCHAR(8203);
-- check that the unicode string length is 5,
-- and prove the existence of the zero-width space character matching unicode 8203
SELECT @UnicodeReplace AS String,
       LEN(@UnicodeReplace) AS Length,
       UNICODE(SUBSTRING(@UnicodeReplace, 5, 1)) AS UnicodeValue;
-- replace and prove the unicode string length is reduced to 4
SELECT REPLACE(@UnicodeReplace, NCHAR(8203), N''),
       LEN(REPLACE(@UnicodeReplace, NCHAR(8203), N'')) AS Length;
SQL Fiddle
Such characters may not be replaced if the database collation is a default one such as SQL_Latin1_General_CP1_CI_AS. In that case this command, which forces a binary collation, can work:
SET @word = REPLACE(@word COLLATE Latin1_General_100_BIN2, NCHAR(8203), N'');

Inserting UTF-32 characters

I'm testing UTF-32 characters (specifically emojis) with SQL Server (2008 R2, 10.5), and at this stage I'm checking whether the server supports a given code point.
For this case I'm using the :rose: emoji with the following query
SELECT '' + nchar(0x1F339) + 'test'
which comes back in Management Studio as (NULL).
In what format do I need to encode the character so that it does not come back as NULL in SQL Server?
SQL Server only supports UCS-2, which is currently (almost) the same as UTF-16: exactly 2 bytes per code unit.
An idea, if I may: you can store the data in a BINARY or VARBINARY field, which doesn't care about encoding. You can then use a mapping table or an external script to parse the binary into a text field, replacing 0x1F339 with :rose: or your own custom format, for example.
Since it's outside the UCS-2 range, it has to be written as two UTF-16 code units (a surrogate pair):
-- Returns: 🌹test
SELECT '' + nchar(0xD83C) + nchar(0xDF39) + 'test'
You can find this code under the "UTF-16 Hex (C Syntax)" heading, following your link.
Also I have to recommend this article, because it was very helpful during my investigation: Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)
A couple of options for those who are looking for answers:
SQL Server technically does not have character escape sequences, but you can still create characters using either byte sequences or Code Points using the CHAR() and NCHAR() functions. We are only concerned with Unicode here, so we will only be using NCHAR().
All versions:
- NCHAR(0 - 65535) for BMP Code Points (using an int/decimal value)
- NCHAR(0x0 - 0xFFFF) for BMP Code Points (using a binary/hex value)
- NCHAR(0 - 65535) + NCHAR(0 - 65535) for a Surrogate Pair / Two UTF-16 Code Units
- NCHAR(0x0 - 0xFFFF) + NCHAR(0x0 - 0xFFFF) for a Surrogate Pair / Two UTF-16 Code Units
- CONVERT(NVARCHAR(size), 0xHHHH) for one or more characters in UTF-16 Little Endian (“HHHH” is 1 or more sets of 4 hex digits)
Starting in SQL Server 2012:
If the database's default collation supports Supplementary Characters (collation name ends in _SC, or starting in SQL Server 2017 the name contains 140 but does not end in _BIN*, or starting in SQL Server 2019 the name ends in _UTF8 but does not contain _BIN2), then NCHAR() can be given Supplementary Character Code Points:
- decimal value can go up to 1114111
- hex value can go up to 0x10FFFF
Starting in SQL Server 2019:
“_UTF8” collations enable CHAR and VARCHAR data to use the UTF-8 encoding:
- CONVERT(VARCHAR(size), 0xHH) for one or more characters in UTF-8 (“HH” is 1 or more sets of 2 hex digits)
NOTE: The CHAR() function does not work for this purpose. It can only produce a single byte, and UTF-8 is only a single byte for values 0 – 127 / 0x00 – 0x7F.
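A short sketch pulling those options together; the last statement assumes a database whose default collation supports Supplementary Characters as described above, otherwise it returns NULL:
-- surrogate pair: works on any version
SELECT NCHAR(0xD83C) + NCHAR(0xDF39) AS RoseFromSurrogates;
-- the same character from its UTF-16 Little Endian bytes (3C D8 39 DF)
SELECT CONVERT(NVARCHAR(10), 0x3CD839DF) AS RoseFromBytes;
-- a single Supplementary Character Code Point (needs an _SC / 140 / _UTF8 collation)
SELECT NCHAR(0x1F339) AS RoseFromCodePoint;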

PL/SQL Chinese garbled in Oracle

My Oracle version is 11g, installed on Linux. The client is XP.
Now, when I query data with PL/SQL Developer, the Chinese is garbled, like this (field NAME):
In PL/SQL Developer I execute the command "select userenv('language') from dual;" and it shows
SIMPLIFIED CHINESE_CHINA.AL32UTF8 (I think this is the server-side character set)
So I looked at the Windows XP registry: HKEY_LOCAL_MACHINE->SOFTWARE->Oracle->NLS_LANG.
It shows:
SIMPLIFIED CHINESE_CHINA.ZHS16GBK (I think this is the client-side character set)
I changed it to
SIMPLIFIED CHINESE_CHINA.AL32UTF8
But the Chinese is still garbled.
This "NAME" field should actually show "北京市".
I execute the command:
select dump(name,1016) from MN_C11_SM_S31 where objectid=1;
and it shows:
Does that mean that the data itself is stored incorrectly?
What should I do?
Supplementary: I used C# code to parse this string as UTF-8: "e58c97e4baace5b882",
and it shows "北京市". I think this proves the data itself is not wrong.
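A sketch of the same check done directly in SQL, using the table and column names from the question (北京市 is U+5317 U+4EAC U+5E02); if this returns the row, the stored data is correct and only the client display is wrong:
-- UNISTR builds the expected value from code points, independently of the client character set
SELECT objectid
FROM MN_C11_SM_S31
WHERE objectid = 1
AND name = UNISTR('\5317\4EAC\5E02');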
You need to be careful.
SQL*Plus and Oracle SQL Developer might actually display Chinese characters incorrectly.
And you need to use an NVARCHAR (Unicode) field for a Chinese string.
Try using a C# Windows application to input, insert and display the data; it uses Unicode internally, so you can at least rule out that bug source. Don't use a console application for display, as console applications can't display Unicode characters correctly.
Then you need to store your string in an NVARCHAR field (not VARCHAR) and use a parameter of type NVARCHAR when you insert your string.
INSERT INTO YOUR_TABLE (newname)
VALUES (:UnicodeString)
command.Parameters.Add (":UnicodeString", OracleType.NVarChar).Value = stringToSave;

Replace character in SQL results

This is from an Oracle SQL query. It has these weird skinny rectangle shapes in the database in places where apostrophes should be. (I wish we could paste screenshots in here.)
It looks like this when I copy and paste the results.
spouse�s
Is there a way to write a SQL SELECT statement that searches for this character in the field and replaces it with an apostrophe in the results?
Edit: I need to change only the results in a SELECT statement for reporting purposes; I can't change the database.
I ran this
select dump('�') from dual;
which returned
Typ=96 Len=3: 239,191,189
This seems to work so far
select translate('What is your spouse�s first name?', '�', '''') from dual;
but this doesn't work
select translate(Fieldname, '�', '''') from TableName
Select FN from TN
What is your spouse�s first name?
SELECT DUMP(FN, 1016) from TN
Typ=1 Len=33 CharacterSet=US7ASCII: 57,68,61,74,20,69,73,20,79,6f,75,72,20,73,70,6f,75,73,65,92,73,20,66,69,72,73,74,20,6e,61,6d,65,3f
EDIT:
So I have established that it is the 0x92 smart-quote character. I can't get the DB updated, so I'm trying this code
SELECT REGEX_REPLACE(FN,"\0092","\0027") FROM TN
and I'm getting ORA-00904: "Regex_Replace": invalid identifier
This seems to be a problem with your charset configuration. Check your NLS_LANG and other NLS_xxx environment/registry values. You have to check the Oracle server, your client, and the client used to insert that data.
Try to DUMP the value. You can do it with a select as simple as:
SELECT DUMP(the_column)
FROM xxx
WHERE xxx
UPDATE: I think that before trying to replace, you should look for the root of the problem. If this happens because of charset trouble, you can end up with big problems with bad data.
UPDATE 2: Answering the comments. The problem may not be on the database server side; it may be on the client side. The problem (if this is the problem) can be a translation in the server-to/from-client communication, caused by a bad server-client configuration. For instance, if the server is defined with a UTF8 charset and your client uses US7ASCII, then all accented characters will appear as ?.
Another possibility is that the server is defined with a UTF8 charset and your client is also UTF8, but the application is not able to show UTF8 chars; then the problem is on the application side.
UPDATE 3: On your examples:
select translate('What is your spouse�s first name?', ...): it works because the � is exactly the same char; you have pasted it on both sides.
select translate(Fieldname, ...): it does not work because the � is not what is stored in the database; it's the char that the client receives, maybe because some translation occurs between the data table and what is shown to you.
Next step: look at the DUMP syntax and try to extract the codes for the mysterious char (from the table, not by pasting �!).
I would say there's a good chance the character is a single-tick "smart quote" (I hate the name). The smart quotes are characters 0x91-0x94 (in the Windows CP-1252 encoding), or Unicode U+2018, U+2019, U+201C, and U+201D.
I'm going to propose a front-end application-based, client-side approach to the problem:
I suspect that this problem has more to do with a mismatch between the font you are trying to display the word spouse�s with, and the character �. That icon appears when you are trying to display a character in a Unicode font that doesn't have the glyph for the character's code.
The Oracle database will dutifully return whatever characters were INSERTed into its columns. It's more up to you, and your application, to interpret what it will look like given the font you are trying to display your data with, so I suggest investigating what this mysterious � character is that is replacing your apostrophes. Start by using FerranB's recommended DUMP().
Try running the following query to get the character code:
SELECT DUMP(<column with weird character>, 1016)
FROM <your table>
WHERE <column with weird character> like '%spouse%';
If that doesn't grab your actual text from the database, you'll need to modify the WHERE clause to actually grab the offending column.
Once you've found the code for the character, you could just replace the character using the REGEXP_REPLACE() built-in function: determine the raw code of the character and then supply the plain ASCII / Basic Latin apostrophe 0x0027 ('), using code similar to this:
UPDATE <table>
SET <column with offending character>
    = REGEXP_REPLACE(<column with offending character>,
                     CHR(146),  -- <character code of �>, 0x92 here
                     '''')
WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
If you aren't familiar with Unicode and different ways of character encoding, I recommend reading Joel's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!). I wasn't until I read that article.
EDIT: If you're seeing 0x92, there's likely a charset mismatch here:
0x92 in CP-1252 (the default Windows code page) is the right single quotation mark, which looks like an apostrophe. This code isn't a valid ASCII character, and it isn't valid in ISO-8859-1 either. So probably either the database is in CP-1252 encoding (I don't find that likely), or a database connection which spoke CP-1252 inserted it, or somehow the apostrophe got converted to 0x92. The database is returning values that are valid in CP-1252 (or some other charset where 0x92 is valid), but your DB client connection isn't expecting CP-1252. Hence the weird question mark.
And FerranB is likely right. I would talk with your DBA or some other admin about this to get the issue straightened out. If you can't, I would try either doing the update above (seems like you can't), or doing this:
INSERT INTO <table> (<normal table columns>, ..., <column with offending character>)
SELECT <all normal columns>,
       REGEXP_REPLACE(<column with offending character>,
                      CHR(146),  -- the 0x92 character
                      CHR(39))   -- the ASCII/ISO-8859-1 apostrophe
FROM <table>
WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
DELETE FROM <table> WHERE REGEXP_LIKE(<column with offending character>, CHR(146));
Before you do this you need to understand what actually happened. It looks to me like someone inserted non-ASCII strings into the database, for example Unicode or UTF-8. Before you fix this, be very sure that this is actually a bug. The apostrophe comes in many forms, not just "'".
TRANSLATE() is a useful function for replacing or eliminating known single character codes.
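For the reporting-only case above, a sketch using the FN / TN names from the question and assuming the offending byte really is 0x92 (CHR(146)) as the DUMP output suggests:
-- replace the 0x92 byte with a plain apostrophe in the result set only; the stored data is untouched
SELECT TRANSLATE(FN, CHR(146), '''') AS FN_CLEAN
FROM TN;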