XML text in Varchar(max) mysterious question mark - sql

I have a function in a VB .NET class library, which inserts XML text into a VARCHAR(MAX) column.
The column results in an extra "?" at the front of the data in the column. I do not want that character in my data.
The column data starts like this:
?<?xml version="1.0" encoding="utf-8"?><Registration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"....
The insert statement is:
INSERT INTO Table (Data) OUTPUT Inserted.ID VALUES (@Data)
The table has 2 columns, Data and ID.
Am I doing something wrong? The XML is created by the .NET XmlSerializer.
Thanks

First, all XML in SQL Server is in Unicode (UCS-2, to be precise), and data access libraries probably know that. So storing their output in a varchar column isn't the best idea: you might run into various issues with implicit conversion. Try switching the column data type to nvarchar and see whether that helps.
Second, it might be the byte order mark (BOM) bytes that are usually found at the start of disk files stored in UTF-8. Since SQL Server doesn't support this encoding, these bytes might have been converted (again, implicitly) into something unreadable. Try something like this query:
select cast(substring(XMLField, 1, 10) as varbinary)
from dbo.MyTable;
It will show you the byte values of those characters, at least.
My best guess, however, would be to get rid of UTF-8 completely. The only way to store such data in SQL Server is via varbinary columns, but I doubt you will like the resulting overhead. Try switching to UTF-16; it's backward compatible with UCS-2 (unless you deal in something truly exotic).
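To illustrate the byte order mark theory above, here is a minimal Python sketch (not the asker's VB .NET code). It assumes the serializer output starts with the UTF-8 BOM and uses cp1252 as a stand-in for whatever code page the varchar column really uses; both are assumptions for illustration only.

```python
import codecs

xml = '<?xml version="1.0" encoding="utf-8"?><Registration/>'

# Hypothetical serializer output: UTF-8 bytes preceded by the BOM (EF BB BF).
raw = codecs.BOM_UTF8 + xml.encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as the character U+FEFF.
with_bom = raw.decode("utf-8")

# Squeezing the string into a single-byte code page (cp1252 here, standing in
# for the varchar column's code page) replaces the unmappable U+FEFF with "?".
as_varchar = with_bom.encode("cp1252", errors="replace").decode("cp1252")

# Decoding with "utf-8-sig" strips the BOM before the text is stored.
clean = raw.decode("utf-8-sig")

print(repr(as_varchar[:6]))
print(repr(clean[:5]))
```

If this is what is happening, stripping the BOM on the client before the insert (or writing the XML without one) should make the leading "?" disappear.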

Varchar can only hold characters from the column's (non-Unicode) code page. My guess would be that you have some Unicode character at the beginning of that string.
Switch to nvarchar. You won't get rid of that initial character, but you won't lose it either.

I had some difficulty with this using System.Xml.Serialization.XmlSerializer. I also wanted the XML to be stored as a human-readable string (because of reasons).
Here's the code I used in case it is useful to someone:
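// Serialize to an in-memory stream; StreamWriter defaults to UTF-8 without a BOM
var ser = new XmlSerializer(typeof(Model.SomeRootType));
using var ms = new MemoryStream();
using var writer = new StreamWriter(ms);
ser.Serialize(writer, myObjectModel);
writer.Flush(); // make sure everything has reached the MemoryStream
var xml = System.Text.Encoding.UTF8.GetString(ms.ToArray());
// format the XML (XDocument.ToString() indents and omits the XML declaration)
var doc = System.Xml.Linq.XDocument.Parse(xml);
var niceXmlForTheDB = doc.ToString();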
Notes:
Needs C# 8 for the using declarations without the braces
Doesn't strictly need using for MemoryStream, but hey...
Can be converted to VB if needed using one of the many online tools (you'll be used to that if you code in VB!)
This code should probably be in a try-catch which does something sensible if it fails!

Related

HANA: Unknown Characters in Database column of datatype BLOB

I need help resolving characters of unknown type from a database field into a readable format, because I need to overwrite this value at the database level with another valid value (in the exact format the application stores it in) to automate system copy activities.
I have a proprietary application that also allows users to configure it via the frontend. This configuration data gets stored in a table, and the values of a configuration property are stored in a column of type "BLOB". For the value in question, I provide a valid URL in the application frontend (like http://myserver:8080). However, what gets stored in the database is not readable (some square characters). I tried all sorts of HANA conversion functions (HEX, binary), both on their own and cascaded (e.g. first to binary, then to varchar), to make it readable. I also tried it the other way around, making the value that I want to insert appear in the correct format (conversion to BLOB via hex or binary), but this does not work either. I copied the value to the clipboard and compared it to all sorts of character set tables (although I am not sure if this can work at all).
My conversion tries look somewhat like this:
SELECT TO_ALPHANUM('') FROM DUMMY;
where the quotes would contain the characters in question. I can't even print them here.
How can one approach this and maybe find out the character set that is used by this application? I would be grateful for some more ideas.
What you have in your BLOB column is a series of bytes. As you mentioned, these bytes have been written by an application that uses an unknown character set.
In order to interpret those bytes correctly, you need to know the character set as this is literally the mapping of bytes to characters or character identifiers (e.g. code points in UTF).
Now, HANA doesn't come with a whole lot of options to work on LOB data in the first place and for C(haracter)LOB data most manipulations implicitly perform a conversion to a string data type.
So, what I would recommend is to write a custom application that reads out the BLOB bytes and performs the conversion itself. Once the bytes are successfully converted into a string, you can store the data in a new NCLOB field that keeps it in UTF-8 encoding.
You will have to know the character set in the first place, though. No way around that.
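As a rough sketch of what such a custom app could do when you already know what the value should read as (the URL typed into the frontend), you can try a handful of candidate character sets and see which one reproduces it. The sample bytes, the candidate list, and the helper name `matching_encodings` are all made up for illustration; the real application may well use something else entirely.

```python
def matching_encodings(blob: bytes, expected: str, candidates: list[str]) -> list[str]:
    """Return the candidate encodings under which `blob` decodes to `expected`."""
    matches = []
    for name in candidates:
        try:
            if blob.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding at all.
            continue
    return matches


# Hypothetical BLOB content: in reality this would be fetched from the table.
sample_blob = "http://myserver:8080".encode("utf-16-le")

candidates = ["utf-8", "latin-1", "utf-16-le", "utf-16-be"]
print(sample_blob.hex())
print(matching_encodings(sample_blob, "http://myserver:8080", candidates))
```

Looking at the hex dump first is often enough: a zero byte after every ASCII character, for example, points strongly at a little-endian 16-bit encoding.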
I assume you are on Oracle. You can convert BLOB to CLOB as described here.
http://www.dba-oracle.com/t_convert_blob_to_clob_script.htm
In case of your example try this query:
select UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(<your_blob_value>)) from dual;
Obviously this only works for values below 32767 characters.

Update multiple rows in same query using mapping table in Postgres 8.4

In the question "Update multiple rows in same query using PostgreSQL", Roman Peckar gave an answer similar to this; I have modified it for the purpose of my question:
update test as t set
column_a = c.column_a,
column_b = c.column_b
from (values
('123', bytea1),
('345', bytea2)
) as c(column_a, column_b)
where c.column_a = t.column_a;
In my case table test has a column of type bytea, say column_b. However, this does not work as c.column_b is of type text and thus an error is produced saying there is no conversion from text to bytea and hinting to use a cast. Well, using a cast does not help either as another error occurs about encoding referring to a LATIN encoding. I apologise for the imprecise reporting of the errors but I do not presently have access to the machine on which this work was carried out.
It seems that the default type of the c.column_b is text. Cannot the type of a column be dictated in the 'as' clause say, 'as c(column_a, column_b type bytea)' or in some other way? If not I assume I must resort to using some binary string function which seems a bit inelegant to say the least.
Because text type is for text. It needs properly encoded text in your client encoding and which can be saved with no data loss in your server encoding (so for example in latin1 no “ or €, as characters like this can not be saved using this encoding).
So if you need to save text, which can contain characters outside of latin1 (like anything typed to a web form) you'd need to change database encoding to utf-8. Or, as a last resort, use encode(data,'base64').
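For the bytea side of the question, one possible workaround (a sketch, not something taken from the answers above) is to ship the binary values as base64 text, which is plain ASCII and so survives any client encoding, and rebuild them on the server with decode(..., 'base64'). The helper name `bytea_expr` and the sample rows are hypothetical, and the keys are assumed to be trusted, quote-free strings.

```python
import base64


def bytea_expr(data: bytes) -> str:
    """Return a SQL expression that rebuilds `data` as bytea on the server."""
    b64 = base64.b64encode(data).decode("ascii")
    return f"decode('{b64}', 'base64')"


# Hypothetical (column_a, column_b) pairs for the VALUES list.
rows = [("123", b"\x00\xff"), ("345", b"\x80binary")]

# Keys are interpolated directly here only because they are assumed safe.
values = ",\n".join(f"('{key}', {bytea_expr(blob)})" for key, blob in rows)
print(values)
```

The printed lines can be dropped into the `from (values ...) as c(column_a, column_b)` part of the update, so that c.column_b is bytea rather than text.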

Is there a quick way to re-format all values in a ColdFusion query column?

I am at the mercy of a certain database, which stores date values as integers (e.g. 20121119). I have several queries that retrieve these values for reporting displays, so I need to convert these values to m/d/yyyy format.
I see several ways to do this:
Do the conversion in the display, using an existing global UDF. The drawback here, is that if the query is re-used, I need to duplicate the code necessary to convert the display value.
Parse the value in the SQL to return a properly formatted value. I'm reading from a DB2/iSeries, which does not (as far as I have found) have a built-in function for this.
Loop over the result set and convert each value one at a time. This is what I am currently doing, however for larger data-sets, performance is an issue:
<cfscript>
var i = 1;
var _query = ARGUMENTS.query;
if ( !Len(Trim(ARGUMENTS.column))
|| !ListFindNoCase(_query.ColumnList, ARGUMENTS.column))
return _query;
for (i=1; i<=_query.RecordCount; i++) {
_query[ARGUMENTS.column][i] =
VARIABLES.Library.DateTime.ParseAS400Date(
_query[ARGUMENTS.column][i]
);
}
return _query;
</cfscript>
Is there an easy/quick way to apply a formatting function to an entire column in a ColdFusion query object?
As noted by @Dan, there is a built-in function that will convert a string to a timestamp representation. Since you have an int, it would be something like this:
SELECT VARCHAR_FORMAT(TIMESTAMP_FORMAT(CAST(20121119 AS CHAR(8)), 'YYYYMMDD'), 'MM/DD/YYYY')
FROM SYSIBM.SYSDUMMY1
This page might help you with the DB2 parsing functions you didn't know about: "Format date to string".
Edit: Oops, you said it was an integer. The cast() function will convert it to character, and then you can use concat() and substr() to format it.
Using that function would be my approach.
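For reference, the conversion itself is small whichever layer ends up doing it. Here is a minimal Python sketch of the same integer-to-m/d/yyyy logic (the function name `format_as400_date` is made up and is not the ParseAS400Date UDF from the question):

```python
from datetime import datetime


def format_as400_date(value: int) -> str:
    """Turn an integer like 20121119 into an m/d/yyyy string."""
    # strptime validates the date; an impossible value raises ValueError.
    parsed = datetime.strptime(str(value), "%Y%m%d")
    # No zero padding, to match the m/d/yyyy format asked for.
    return f"{parsed.month}/{parsed.day}/{parsed.year}"


print(format_as400_date(20121119))
print(format_as400_date(20120105))
```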
Do the conversion in the display, using an existing global UDF.
This is the right way to handle this problem. Typically, formatting should not be applied to a data model (which is essentially what your query result is). Formatting should be applied when the data is displayed. This allows you to format the data differently when it is displayed in different contexts. Also, it improves code readability.
The drawback here, is that if the query is re-used, I need to duplicate the code necessary to convert the display value.
This is not a drawback. You will be formatting the data for presentation any time you are displaying data. Formatting a date is no different.
Calling a function that formats your data more than once is not "code duplication". You are simply using the formatting function as it was intended to be used.

Error Inserting Entry With Text Column That Contains New Line And Quotes

I have an Informix 11.70 database. I am unable to successfully execute this insert statement on a table.
INSERT INTO some_table(
col1,
col2,
text_col,
col3)
VALUES(
5,
50,
CAST('"id","title1","title2"
"row1","some data","some other data"
"row2","some data","some other"' AS TEXT),
3);
The error I receive is:
[Error Code: -9634, SQL State: IX000] No cast from char to text.
I found that I should add this statement in order to allow using new lines in text literals, so I added this above the same query I have already written:
EXECUTE PROCEDURE IFX_ALLOW_NEWLINE('t');
Still, I receive the same error.
I have also read the IBM documentation that says: to alternatively allow new lines, I could set the ALLOW_NEWLINE parameter in the ONCONFIG file. I suppose the last one requires administrative access to the server to alter that config file, which I do not have, and I prefer not to take advantage of this setting.
Informix's TEXT (and BYTE) columns pre-date any standard, and are in many ways very peculiar types. TEXT in Informix is very different from TEXT found in other DBMS. One of the long-standing (over 20 years) problems with them is that there isn't a string literal notation that can be used to insert data into them. The 'No cast from char to text' is saying there is no explicit conversion from string literal to TEXT, either.
You have a variety of options:
Use LVARCHAR in the table (good if your values won't be longer than a few KiB, because the total row length is approximately 32 KiB). Maximum size of an LVARCHAR column is just under 32 KiB.
Use a programming language which can handle Informix 'locator' structures — in ESQL/C, the type used to hold a TEXT is loc_t.
Consider using CLOB instead. However, this has the same limitation (no string to CLOB conversion), but you'd be able to use the FILETOCLOB() function to get the information from a file on the client to the database (and LOTOFILE transfers information from the DB to a file on the client).
If you can use LVARCHAR, that is by far the simplest alternative.
I forgot to mention an important detail in the question - I use Java and the Hibernate ORM to access my Informix database, thus some of the suggested approaches (the loc_t handling in particular) in Jonathan Leffler's answer are unfortunately not applicable. Also, I need to store large data of dynamic length and I fear the LVARCHAR column would not be sufficient to hold it.
The way I got it working was to follow Michał Niklas's suggestion from his comment and use PreparedStatement. This could potentially be explained by Informix handling the TEXT data type in its own manner.
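The idea behind the PreparedStatement fix is that the multi-line, quote-laden value is bound as a parameter instead of being written as a string literal. A minimal sketch of that idea, using Python's built-in sqlite3 purely as a stand-in (this is neither Informix nor the Java/Hibernate code actually used, and SQLite has none of the TEXT literal restrictions):

```python
import sqlite3

# The value from the question: embedded double quotes and newlines.
csv_text = (
    '"id","title1","title2"\n'
    '"row1","some data","some other data"\n'
    '"row2","some data","some other"'
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE some_table (col1 INTEGER, col2 INTEGER, text_col TEXT, col3 INTEGER)"
)

# The text travels as a bound parameter, so no literal syntax or cast is involved.
conn.execute(
    "INSERT INTO some_table (col1, col2, text_col, col3) VALUES (?, ?, ?, ?)",
    (5, 50, csv_text, 3),
)

stored = conn.execute("SELECT text_col FROM some_table").fetchone()[0]
print(stored == csv_text)
```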

operations on blob data in informix

How can we use substring, trim, and length operations on text of a blob datatype? And how can we update a column of a blob datatype using a query?
Thanks,
With difficulty!
First of all, which of the four types of blob are you discussing:
BYTE
TEXT
BLOB
CLOB
These come in pairs (like Sith Lords): there is a binary version (BYTE, BLOB) and a text version (TEXT, CLOB). There's also another pairing: old (BYTE, TEXT) and newer (BLOB, CLOB). The BYTE and TEXT types were introduced with Informix OnLine 4.00 in about 1989. The BLOB and CLOB types were introduced with Informix Universal Server 9.00 in 1996, and are also known as SmartBlobs.
However, there's a very real sense in which it doesn't matter which of the types you are referring to.
There are very few operations that can be performed on BYTE and TEXT blobs. They can be fetched and stored, but for all practical purposes, that's all. I believe you can use LENGTH to determine the length of a TEXT blob. I don't believe there are any methods available to update part of BYTE or TEXT blob; it is an all-or-nothing replacement. Further, the replacement is from a host variable of the appropriate type - there are no BYTE or TEXT literals.
The situation is a bit better with SmartBlobs, but I'm not an expert on them. There are mechanisms for obtaining a LO (large object) handle and then manipulating that, but I don't think those are available server-side (from SQL or SPL). I may be willfully not understanding what's available with the SmartBlobs, but I think the operations are only available from programming APIs and not within SQL. There are no BLOB or CLOB literals either. However, you can use SQL to load from files (FILETOBLOB, FILETOCLOB) and write to files (LOTOFILE) - with the files either on the server or on the client.
I have already answered your question about substring: "substring operation on blob text in informix". With BLOBs you can use the substring operator, but not the SUBSTRING() or SUBSTR() functions.
You can also use LENGTH(), but not TRIM().
Example code:
CREATE TABLE _text_test (id serial, txt_vch varchar(200), txt_text text);
INSERT INTO _text_test (txt_vch, txt_text) VALUES ('1234567890', '1234567890');
SELECT txt_vch, txt_text, txt_vch[3,5], txt_text[3,5], length(txt_text) FROM _text_test;
In my example I used the TEXT blob type (Jonathan showed you more blob types; you should tell us which kind of blob you use in your question). The last select shows usage of the substring operator and the LENGTH() function. You can replace the LENGTH() function with other functions like TRIM() to test it in your environment. In my case the TRIM() test ends with:
ODBC Error: -880 [Informix][Informix ODBC Driver][Informix]
Trim character and trim source must be of string data type.
The last select works well with the JDBC 3.70JC1 driver, but it seems that the ODBC 3.70TC1 driver has a bug and shows the first 3 chars (123) instead of 345. Test it yourself.
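To make the expected result explicit: the substring operator [3,5] counts from 1 and includes both ends, which is why 345 is the right answer for '1234567890'. A tiny Python sketch of those semantics (the helper name `informix_substring` is made up for illustration):

```python
def informix_substring(text: str, first: int, last: int) -> str:
    """Mimic a [first,last] substring: 1-based, both ends inclusive."""
    return text[first - 1:last]


print(informix_substring("1234567890", 3, 5))
```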
In a recent version (12.10) there is a DBMS_LOB package.
However, it doesn't work as documented: for example, there is no dbms_lob.get_length function. Instead, I've found that dbms_lob_get_length works as expected.
So for CLOB fields you have the following useful operations:
dbms_lob_get_length;
dbms_lob_instr;
dbms_lob_substr (unfortunately it gets data after get_length too);
I've also found one undocumented but very, very useful function: dbms_lob_new_clob, which takes an lvarchar argument and converts it to a CLOB.
I know that this answer is very late, but I think it can be useful for other people searching for ways to handle blobs in Informix (I found this post a few days ago when I was starting some mini-research on using blobs for storing XML).