HANA: Unknown Characters in Database column of datatype BLOB

I need help resolving characters of unknown type from a database field into a readable format, because I need to overwrite this value at the database level with another valid value (in the exact format the application stores it in) to automate system copy activities.
I have a proprietary application that also allows users to configure it via the frontend. This configuration data gets stored in a table, and the values of a configuration property are stored in a column of type "BLOB". For the value in question, I provide a valid URL in the application frontend (like http://myserver:8080). However, what gets stored in the database is not readable (some square characters). I tried all sorts of HANA conversion functions (HEX, binary), both on their own and cascaded (e.g. first to binary, then to varchar), to make it readable. I also tried the other way around, making the value that I want to insert appear in the correct format (conversion to BLOB via hex or binary), but this does not work either. I copied the value to the clipboard and compared it to all sorts of character set tables (although I am not sure if this can work at all).
My conversion attempts look something like this:
SELECT TO_ALPHANUM('') FROM DUMMY;
where the brackets would contain the characters in question. I can't even print them here.
How can one approach this and maybe find out the character set that is used by this application? I would be grateful for some more ideas.

What you have in your BLOB column is a series of bytes. As you mentioned, these bytes have been written by an application that uses an unknown character set.
In order to interpret those bytes correctly, you need to know the character set, as this is literally the mapping of bytes to characters or character identifiers (e.g. code points in Unicode).
Now, HANA doesn't come with a whole lot of options to work on LOB data in the first place and for C(haracter)LOB data most manipulations implicitly perform a conversion to a string data type.
So, what I would recommend is to write a custom application that is able to read out the BLOB bytes and perform the conversion in that custom app. Once the bytes are successfully converted into a string, you can store the data in a new NCLOB field that keeps it as Unicode.
You will have to know the character set in the first place, though. No way around that.
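A minimal Python sketch of that custom-app approach, assuming the BLOB bytes have already been fetched by some client driver and that the application stored plain encoded text. The list of candidate encodings and the helper names `try_decodings` and `find_encoding` are illustrative assumptions, not something the question confirms. Since the URL entered in the frontend is known, each candidate decoding can be compared against it:

```python
# Candidate encodings are guesses; extend the list as needed.
CANDIDATE_ENCODINGS = ["utf-8", "utf-16-le", "utf-16-be", "latin-1"]


def try_decodings(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> dict:
    """Return {encoding: decoded text} for every encoding that decodes without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            pass
    return results


def find_encoding(raw: bytes, expected: str, encodings=CANDIDATE_ENCODINGS):
    """Return the first encoding under which raw decodes to the expected text, else None."""
    for enc, text in try_decodings(raw, encodings).items():
        if text == expected:
            return enc
    return None


if __name__ == "__main__":
    # Hypothetical stored value: the URL from the question, encoded as UTF-16 LE.
    raw = "http://myserver:8080".encode("utf-16-le")
    print(raw.hex())  # a hex dump is often the most useful first look
    print(find_encoding(raw, "http://myserver:8080"))
```

If a match is found, `new_value.encode(found_encoding)` would produce the bytes to write back. If nothing matches, the value may not be plain text at all (for instance a serialized object or compressed data), and the hex dump is the next thing to inspect.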

I assume you are on Oracle. You can convert BLOB to CLOB as described here.
http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm
In case of your example try this query:
select UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(<your_blob_value>)) from dual;
Obviously this only works for values below 32767 characters.

Related

XML text in Varchar(max) mysterious question mark

I have a function in a VB .NET class library, which inserts XML text into a VARCHAR(MAX) column.
The column results in an extra "?" at the front of the data in the column. I do not want that character in my data.
The column data starts like :
?<?xml version="1.0" encoding="utf-8"?><Registration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"....
The insert function is :
INSERT INTO Table (Data) OUTPUT Inserted.ID VALUES (@Data)
The table has 2 columns, Data and ID.
Am I doing something wrong? The XML is created by the .NET XmlSerializer.
Thanks
First, all XML in SQL Server is in Unicode (UCS-2, to be precise), and data access libraries probably know that. So storing their output in a varchar column isn't the best idea: you might run into various issues with implicit conversion and the like. Try switching the column data type to nvarchar and see whether that helps.
Second, it might be a byte order mark (BOM), the marker bytes usually found at the start of files stored in UTF-8. Since SQL Server doesn't support this encoding, these bytes might have been converted (again, implicitly) into something unreadable. Try something like this query:
select cast(substring(XMLField, 1, 10) as varbinary)
from dbo.MyTable;
It will show you the byte values of those characters, at least.
My best guess, however, would be to get rid of UTF-8 completely: the only way to store such data in SQL Server is via varbinary columns, but I doubt you will like the resulting overhead. Try switching to UTF-16; it's backward compatible with UCS-2 (unless you deal in something truly exotic).
Varchar can only hold characters from the column's code page. My guess would be that you have some Unicode character at the beginning of that string.
Switch to nvarchar: you won't get rid of that initial character, but you won't lose it either.
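To illustrate how a leading marker can turn into a "?", here is a small Python sketch. It assumes (this is a guess, as in the answers above) that the serialized XML starts with a UTF-8 byte order mark and that the varchar column uses a single-byte code page such as Windows-1252. The sample XML and the helper `strip_utf8_bom` are made up for the example:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical serializer output: BOM followed by the XML declaration.
xml_bytes = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><Registration/>'

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
with_bom = xml_bytes.decode("utf-8")

# U+FEFF has no equivalent in a single-byte code page, so it is replaced.
as_varchar = with_bom.encode("cp1252", errors="replace")

# Either strip the bytes up front or decode with "utf-8-sig", which drops the BOM.
clean_text = xml_bytes.decode("utf-8-sig")

if __name__ == "__main__":
    print(xml_bytes[:3].hex())  # the three BOM bytes
    print(as_varchar[:6])       # starts with a question mark
    print(clean_text[:5])       # starts with the XML declaration
```

Checking the first bytes of the value before it is sent to the database would confirm or rule out this explanation.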
I had some difficulty with this using System.Xml.Serialization.XmlSerializer. I also wanted the XML to be stored as a human-readable string (because of reasons).
Here's the code I used in case it is useful to someone:
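// Requires: using System.IO; using System.Xml.Serialization;
var ser = new XmlSerializer(typeof(Model.SomeRootType));
using var ms = new MemoryStream();
// StreamWriter(Stream) writes UTF-8 without a byte order mark by default
using var writer = new StreamWriter(ms);
ser.Serialize(writer, myObjectModel);
// make sure everything has reached the MemoryStream before reading it
writer.Flush();
var xml = System.Text.Encoding.UTF8.GetString(ms.ToArray());
// format the XML (XDocument.ToString() indents and omits the XML declaration)
var doc = System.Xml.Linq.XDocument.Parse(xml);
var niceXmlForTheDB = doc.ToString();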
Notes:
Needs C# 8 for the using statements without the braces
Doesn't strictly need using for MemoryStream, but hey...
Can convert to VB if needed using one of the many online tools (you'll be used to if you code in VB!)
This code should probably be in a try-catch which does something sensible if it fails!

Storing and returning emojis

What's the simplest way to write and, then, read Emoji symbols in Oracle table?
Currently I have this situation:
iOS client pass encoded Emojis: One%20more%20time%20%F0%9F%98%81%F0%9F%98%94%F0%9F%98%8C%F0%9F%98%92. For example, %F0%9F%98%81 means 😁;
Column type is nvarchar2(2000), so when I view the saved text via Oracle SQL Developer it looks like: One more time ????????.
This seems more a client problem than a database problem. Certain iOS programs are capable of interpreting that string and showing an image instead of that string.
SQL Developer does not do that.
As long as the data stored in the database is the same as the data retrieved from the database, you have no problem.
In the end, we went with Base64 encoding/decoding of the text. It's suitable for small texts.
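A short Python sketch of what the client is sending, using the exact string from the question. It shows that the percent-escaped bytes are UTF-8, that each emoji is one code point but two UTF-16 code units (which may be why eight question marks show up for four emojis), and how a Base64 round trip could look. The helper names `to_base64` and `from_base64` are made up for the example:

```python
import base64
from urllib.parse import quote, unquote

encoded = "One%20more%20time%20%F0%9F%98%81%F0%9F%98%94%F0%9F%98%8C%F0%9F%98%92"

# Percent-decoding interprets the escaped bytes as UTF-8 by default.
text = unquote(encoded)

# Everything after the ASCII prefix is emoji.
emojis = text[len("One more time "):]

code_points = len(emojis)                            # one per emoji
utf16_units = len(emojis.encode("utf-16-le")) // 2   # surrogate pairs: two per emoji


def to_base64(s: str) -> str:
    """Encode text as UTF-8 bytes, then as an ASCII-only Base64 string."""
    return base64.b64encode(s.encode("utf-8")).decode("ascii")


def from_base64(b64: str) -> str:
    """Reverse of to_base64."""
    return base64.b64decode(b64).decode("utf-8")


if __name__ == "__main__":
    print(text)
    print(code_points, utf16_units)
    print(to_base64(text))
```

The Base64 form is pure ASCII, so it survives any column character set, at the cost of roughly a third more storage and text that is not searchable in the database.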
In MySQL the character set needs to be one that supports 4-byte characters (typically utf8mb4; utf16 also works) to be able to save emojis, I assume Oracle would need the same ch

Error Inserting Entry With Text Column That Contains New Line And Quotes

I have an Informix 11.70 database. I am unable to successfully execute this insert statement on a table.
INSERT INTO some_table(
col1,
col2,
text_col,
col3)
VALUES(
5,
50,
CAST('"id","title1","title2"
"row1","some data","some other data"
"row2","some data","some other"' AS TEXT),
3);
The error I receive is:
[Error Code: -9634, SQL State: IX000] No cast from char to text.
I found that I should add this statement in order to allow using new lines in text literals, so I added this above the same query I have already written:
EXECUTE PROCEDURE IFX_ALLOW_NEWLINE('t');
Still, I receive the same error.
I have also read the IBM documentation that says: to alternatively allow new lines, I could set the ALLOW_NEWLINE parameter in the ONCONFIG file. I suppose the last one requires administrative access to the server to alter that config file, which I do not have, and I prefer not to take advantage of this setting.
Informix's TEXT (and BYTE) columns pre-date any standard, and are in many ways very peculiar types. TEXT in Informix is very different from TEXT found in other DBMS. One of the long-standing (over 20 years) problems with them is that there isn't a string literal notation that can be used to insert data into them. The 'No cast from char to text' is saying there is no explicit conversion from string literal to TEXT, either.
You have a variety of options:
Use LVARCHAR in the table (good if your values won't be longer than a few KiB, because the total row length is approximately 32 KiB). Maximum size of an LVARCHAR column is just under 32 KiB.
Use a programming language which can handle Informix 'locator' structures — in ESQL/C, the type used to hold a TEXT is loc_t.
Consider using CLOB instead. However, this has the same limitation (no string to CLOB conversion), but you'd be able to use the FILETOCLOB() function to get the information from a file on the client to the database (and LOTOFILE transfers information from the DB to a file on the client).
If you can use LVARCHAR, that is by far the simplest alternative.
I forgot to mention an important detail in the question - I use Java and the Hibernate ORM to access my Informix database, thus some of the suggested approaches (the loc_t handling in particular) in Jonathan Leffler's answer are unfortunately not applicable. Also, I need to store large data of dynamic length and I fear the LVARCHAR column would not be sufficient to hold it.
The way I got it working was to follow Michał Niklas's suggestion from his comment and use PreparedStatement. This could potentially be explained by Informix handling the TEXT data type in its own manner.
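The idea behind the PreparedStatement fix is that the value is bound as a parameter instead of being written as a string literal, so newlines, quotes and casts never enter the SQL text. The following Python sketch shows the same pattern using the standard-library `sqlite3` module purely as a stand-in. It does not reproduce Informix's TEXT behaviour, and SQLite's TEXT type is unrelated to Informix's:

```python
import sqlite3

# The multi-line, quote-laden value from the question.
csv_text = (
    '"id","title1","title2"\n'
    '"row1","some data","some other data"\n'
    '"row2","some data","some other"'
)

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (col1 INTEGER, col2 INTEGER, text_col TEXT, col3 INTEGER)"
)

# The ? placeholders are filled by the driver; no literal or CAST is needed.
conn.execute(
    "INSERT INTO some_table (col1, col2, text_col, col3) VALUES (?, ?, ?, ?)",
    (5, 50, csv_text, 3),
)

stored = conn.execute("SELECT text_col FROM some_table").fetchone()[0]

if __name__ == "__main__":
    print(stored)
```

In Java the equivalent would be a PreparedStatement with the same four placeholders, which is also what Hibernate generates when the value is mapped as an entity property.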

operations on blob data in informix

How can we use substring, trim, and length operations on text of blob datatype? And how can we update a column of blob datatype using a query?
Thanks,
With difficulty!
First of all, which of the four types of blob are you discussing:
BYTE
TEXT
BLOB
CLOB
These come in pairs (like Sith Lords): there is a binary version (BYTE, BLOB) and a text version (TEXT, CLOB). There's also another pairing: old (BYTE, TEXT) and newer (BLOB, CLOB). The BYTE and TEXT types were introduced with Informix OnLine 4.00 in about 1989. The BLOB and CLOB types were introduced with Informix Universal Server 9.00 in 1996, and are also known as SmartBlobs.
However, there's a very real sense in which it doesn't matter which of the types you are referring to.
There are very few operations that can be performed on BYTE and TEXT blobs. They can be fetched and stored, but for all practical purposes, that's all. I believe you can use LENGTH to determine the length of a TEXT blob. I don't believe there are any methods available to update part of BYTE or TEXT blob; it is an all-or-nothing replacement. Further, the replacement is from a host variable of the appropriate type - there are no BYTE or TEXT literals.
The situation is a bit better with SmartBlobs, but I'm not an expert on them. There are mechanisms for obtaining a LO (large object) handle and then manipulating that, but I don't think those are available server-side (from SQL or SPL). I may be willfully not understanding what's available with the SmartBlobs, but I think the operations are only available from programming APIs and not within SQL. There are no BLOB or CLOB literals either. However, you can use SQL to load from files (FILETOBLOB, FILETOCLOB) and write to files (LOTOFILE) - with the files either on the server or on the client.
I have already answered your question about substring: "substring operation on blob text in informix". With BLOBs you can use the substring operator, but not the SUBSTRING() or SUBST() functions.
You can also use LENGTH(), but not TRIM().
Example code:
CREATE TABLE _text_test (id serial, txt_vch varchar(200), txt_text text);
INSERT INTO _text_test (txt_vch, txt_text) VALUES ('1234567890', '1234567890');
SELECT txt_vch, txt_text, txt_vch[3,5], txt_text[3,5], length(txt_text) FROM _text_test;
In my example I used the TEXT blob type (Jonathan listed the other blob types; you should tell us which kind of blob you are using). The last SELECT shows usage of the substring operator and the LENGTH() function. You can replace the LENGTH() function with other functions like TRIM() to test it in your environment. In my case the TRIM() test ends with:
ODBC Error: -880 [Informix][Informix ODBC Driver][Informix]
Trim character and trim source must be of string data type.
The last SELECT works well with the JDBC 3.70JC1 driver, but it seems that the ODBC 3.70TC1 driver has a bug and shows the first 3 chars, 123, instead of 345. Test it yourself.
In a recent version (12.10) there is a DBMS_LOB package.
However, it doesn't work as documented: for example, there is no dbms_lob.get_length function. Instead, I've found that dbms_lob_get_length works as expected.
So for CLOB fields you have the following useful operations:
dbms_lob_get_length;
dbms_lob_instr;
dbms_lob_substr (unfortunately it gets data after get_length too);
I've also found one undocumented but very, very useful function: dbms_lob_new_clob, which takes an lvarchar argument and converts it to a CLOB.
I know that this answer is very late. I think it can be useful for other people searching for ways to handle blobs in Informix (I found this post a few days ago when I was starting some mini-research about using blobs for storing XML).

is there a downside to putting N in front of strings in scripts? Is it considered a "best practice"?

Let's say I have a table that has a varchar field. If I do an insert like this:
INSERT MyTable
SELECT N'the string goes here'
Is there any fundamental difference between that and:
INSERT MyTable
SELECT 'the string goes here'
My understanding was that you'd only have a problem if the string contained a Unicode character and the target column wasn't unicode. Other than that, SQL deals with it just fine and converts the string with the N'' into a varchar field (basically ignores the N).
I was under the impression that N in front of strings was a good practice, but I'm unable to find any discussion of it that I'd consider definitive.
You should prefix strings with N when they are destined for an nvarchar(...) column or parameter. If they are destined for a varchar(...) column or parameter, then omit it, otherwise you end up with an unnecessary conversion.
It's definitely not a "best practice" to stick N in front of every string regardless of what it's for.
Short answer: fine for scripts, bad for production code.
It is not considered a best practice. There is a downside: it creates a minuscule performance hit, as 2-byte characters are converted to 1-byte characters.
If one doesn't know where the insert is going, or doesn't know where the source text is coming from (say this is a general purpose data insertion utility that generates insert statements for an unknown target, say when exporting data), N'foo' might be the more defensive coding style.
So the downside is small and the upside is that your script code is much more adaptable to changes in database structure. Which is probably why you see it in bulk data-insert scripts.
However, if the code in question is something meant for re-use in an environment where you care about the quality of the code, you should not use N'the string' because you are adding a conversion where none is necessary.
From INSERT (Transact-SQL):
"When referencing the Unicode character data types nchar, nvarchar, and ntext, 'expression' should be prefixed with the capital letter 'N'."
Also have a read at Why do some SQL strings have an 'N' prefix?
And from Server-Side Programming with Unicode:
"Unicode string constants that appear in code executed on the server, such as in stored procedures and triggers, must be preceded by the capital letter N. This is true even if the column being referenced is already defined as Unicode. Without the N prefix, the string is converted to the default code page of the database. This may not recognize certain characters."
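As a rough model of that conversion to a code page, the following Python sketch round-trips a string through a single-byte encoding. It assumes Windows-1252 as the database code page and uses a made-up helper, `to_code_page`. SQL Server may also substitute a similar-looking character rather than "?", so treat this as an approximation:

```python
def to_code_page(text: str, code_page: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent come back as '?'.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


if __name__ == "__main__":
    # Plain ASCII survives unchanged, so N'' makes no visible difference here.
    print(to_code_page("the string goes here"))
    # Accented Latin characters exist in Windows-1252 and also survive.
    print(to_code_page("café"))
    # Cyrillic does not exist in Windows-1252 and is lost.
    print(to_code_page("Привет"))
```

This matches the point made in the answers: the prefix only matters when the literal contains characters outside the target code page, and in that case the target should be an nvarchar column or parameter anyway.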