Issue with blobs (Cassandra driver, Python)

As part of testing, I'm using the Cassandra Python driver to select and then delete all rows in one of the tables generated by the Cassandra stress tool (Standard1, in Keyspace1). Standard1 consists of several blob columns.
My approach is to extract the (primary) key for the rows and then run a loop to delete the rows based on that.
The problem I'm facing is that the Cassandra driver appears to convert the blobs (hex bytes) to strings, so when I try to pass one of those to the delete statement it fails with "cannot parse 'XXXXXX' as hex bytes".
The data in the table in CQLSH looks like "0x303038333830343432", whereas the select below extracts the keys as, e.g., '069672027'.
Is there any way of preventing the hex bytes from being converted into strings? Any other approach I should be using?
Thanks!
query = SimpleStatement("SELECT (key) FROM \"Standard1\" LIMIT 10", consistency_level=ConsistencyLevel.LOCAL_QUORUM)
rows = session.execute(query)
for row in rows:
query = SimpleStatement("DELETE FROM \"Standard1\" WHERE key = %s", consistency_level=ConsistencyLevel.LOCAL_QUORUM)
session.execute(query, (row.key, ))

When using simple (unprepared) statements, you need to create a buffer from the string in order for the encoder to recognize it as a blob type.
http://datastax.github.io/python-driver/getting_started.html#type-conversions
Try this:
session.execute(query, (buffer(row.key),))
Alternatively, binding a prepared statement would do that implicitly.
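For illustration, here is a rough sketch of the prepared-statement route (it assumes the same session and stress-tool table as the question; the driver then serializes the blob key without any manual wrapping):

select_stmt = session.prepare('SELECT key FROM "Standard1" LIMIT 10')
delete_stmt = session.prepare('DELETE FROM "Standard1" WHERE key = ?')
delete_stmt.consistency_level = ConsistencyLevel.LOCAL_QUORUM

# row.key is bound as a blob automatically, no buffer() wrapping needed
for row in session.execute(select_stmt):
    session.execute(delete_stmt, (row.key,))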

A little late to the party, but you can also convert it directly by encoding in hex and prepending 0x, for example:
'0x{}'.format(row.key.encode('hex'))
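Note that .encode('hex') only exists on Python 2 strings; on Python 3 the key comes back as bytes, and the equivalent (same idea, small sketch) is:

# Python 3: blobs are returned as bytes, which have a .hex() method
'0x{}'.format(row.key.hex())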


What should I use for a default value for a json column in SQL Server?

Background: I have been tasked with grafting some simple key/value data pairs onto an existing database table in SQL Server (in Azure). The nature of the KvP data is simply some extended data that may or may not exist for all rows.
Further, the data is somewhat freeform, as not all rows will have the same key/value pairs. This is very much bolt-on data that (in my opinion) doesn't merit the complexity of a related table. Instead, I've decided to try using JSON to hold the data, and so, to get my feet wet, I've tried the following:
First, I created a new column on my table as follows:
ALTER TABLE [TheTable]
ADD [ExtendedData] NVARCHAR(512) NOT NULL DEFAULT('')
Second, I picked a few records at random and added some additional JSON in the newly created column, for example:
{ "Color":"Red", "Size":"Big", "Shape":"Round" }
Finally, I expected to be able to query this extra data, by using the JSON_VALUE function in SQL, like this:
SELECT
    Field1,
    Field2,
    JSON_VALUE(ExtendedData, '$.Color') AS Color,
    JSON_VALUE(ExtendedData, '$.Size') AS Size
FROM
    [TheTable]
I expected my output to be a result set with 4 columns (Field1, Field2, Color, Size) where some (most) of the Color and Size fields were NULL (because the majority of rows simply do not have any JSON data), but instead I got an error complaining:
JSON text is not properly formatted
This led me to suspect that ALL of my ExtendedData should be properly formatted JSON for my new query to work, and so replacing my default column value of '' (an empty string) with '{}' seemingly fixes my problem.
But I am left wondering if this is the correct solution. Should I indeed default my new ExtendedData column to use an empty json object '{}', or is it safe to use an empty string '' and I am missing something syntactically in my query?
Without any evidence to the contrary, and working within the rules established for this database, I've decided to use a default value of '{}' for my JSON data.
If anyone else does this, be careful: some APIs / parsers / IDEs might not like the string '{}' and require you to escape the sequence as '{{}}'.
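As a small illustration of that escaping caveat (this example is mine, not from the original answer): in Python's str.format, for instance, literal braces in a generated SQL string must be doubled:

# str.format treats braces as placeholders, so a literal DEFAULT of '{}'
# has to be written as '{{}}' in the template string
template = "ADD ExtendedData NVARCHAR(512) NOT NULL DEFAULT('{{}}')"
print(template.format())  # -> ADD ExtendedData NVARCHAR(512) NOT NULL DEFAULT('{}')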

What happens if I send integers to a BigQuery field "string"?

One of the columns I send (in my code) to BigQuery contains integers. I added the columns to BigQuery but, moving too fast, I defined them as type STRING.
Will they be automatically converted? Or will the data be totally corrupted (i.e., I cannot trust the resulting string at all)?
Data shouldn't be automatically converted, as that would defeat the purpose of having a table schema.
What I've seen people do is save a whole JSON line as a string and then process that string inside BigQuery. Other than that, if you try to save values that don't correspond to the field's schema definition, you should see an error being thrown.
If you need to change a table schema's definition, you can check this tutorial on updating a table schema.
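For reference, a rough sketch of an additive schema change with the BigQuery Python client (the project, dataset, table, and column names here are made up):

# Add a new INTEGER column to an existing table; only additive changes
# like this can be applied in place with update_table.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.my_dataset.my_table")

schema = list(table.schema)
schema.append(bigquery.SchemaField("my_int_column", "INTEGER"))
table.schema = schema

client.update_table(table, ["schema"])  # push only the schema change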
Actually, BigQuery automatically converted the integers I sent it to strings, so my table populates OK.

Dynamic type cast in select query

I have totally rewritten my question because of an inaccurate description of the problem!
We have to store a lot of different information about a specific region. For this we need a flexible data structure which does not limit the possibilities for the user.
So we've created a key-value table for this additional data, which is described through a meta table containing the datatype of the value.
We already use this information for queries over our REST API: we automatically wrap the requested field in a cast.
SQL Fiddle
We return this data together with information from other tables as a JSON object. We convert the corresponding rows from the data table into a JSON object with array_agg and json_object:
...
CASE
WHEN count(prop.name) = 0 THEN '{}'::json
ELSE json_object(array_agg(prop.name), array_agg(prop.value))
END AS data
...
This works very well. The problem we now have is that if we store data like a floating-point number in this field, we get back a string representation of that number:
e.g. 5.231 returns as "5.231"
Now we would like to CAST this value during our select statement into the right data type so the JSON result would be correctly formatted. We have all the information we need, so we tried the following:
SELECT
    json_object(array_agg(data.name),
                -- here I cast the value into the right datatype!
                -- results in an error
                array_agg(CAST(value AS datatype))) AS data
FROM data
JOIN (
    SELECT name, datatype
    FROM meta
) AS info
    ON info.name = data.name
The error message is following:
ERROR: type "datatype" does not exist
LINE 3: array_agg(CAST(value AS datatype))) AS data
^
Query failed
PostgreSQL said: type "datatype" does not exist
So is it possible to dynamically cast the text in the value column to the PostgreSQL type named in the datatype column, so that we return a well-formatted JSON object?
First, that's a terrible abuse of SQL, and ought to be avoided in practically all scenarios. If you have a scenario where this is legitimate, you probably already know your RDBMS so intimately, that you're writing custom indexing plugins, and wouldn't even think of asking this question...
If you tell us what you're actually trying to do, there's about a 99.9% chance we can tell you a better way to do it.
Now with that disclaimer aside:
This is not possible without dynamic SQL. With a sufficiently recent version of PostgreSQL, you can accomplish this with EXECUTE inside a PL/pgSQL function, which you can read about in the manual under dynamic commands. It basically boils down to building the query as a string and executing it.
Note, however, that even using this method, the result for every row fetched in the same query must have the same data type. In other words, you can't expect that row 1 will have a data type of VARCHAR, and row 2 will have INT. That is completely impossible.
The problem you have is that json_object creates an object out of a string array for the keys and another string array for the values. So if you feed your JSON objects into this method, it will always return an error.
So the first problem is that you have to use a JSON or JSONB column for the values, or convert the values from string to json with to_json().
The second problem is that you need another function to create your JSON object, because you want to feed it a string array for the keys and a json-object array for the values. For this there is a function called json_object_agg.
Then your output should be like the one you expected! Here is the full query:
SELECT
json_object_agg(data.name, to_json(data.value)) AS data
FROM data

XML text in Varchar(max) mysterious question mark

I have a function in a VB.NET class library that inserts XML text into a VARCHAR(MAX) column.
The column ends up with an extra "?" at the front of its data. I do not want that character in my data.
The column data starts like :
?<?xml version="1.0" encoding="utf-8"?><Registration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"....
The insert function is :
INSERT INTO Table (Data) OUTPUT Inserted.ID VALUES (@Data)
The table has 2 columns, Data and ID.
Am I doing something wrong? The XML is created by the .NET XmlSerializer.
Thanks
First, all XML in SQL Server is stored in Unicode (UCS-2, to be precise), and data access libraries probably know that. So storing their output in a varchar column isn't the best idea - you might run into various issues with implicit conversion, and such. Try switching the column data type to nvarchar and see whether that helps.
Second, it might be the byte-order mark (BOM) bytes usually found at the start of disk files stored in UTF-8. Since SQL Server doesn't support this encoding, these bytes might have been converted (again, implicitly) into something unreadable. Try something like this query:
select cast(substring(XMLField, 1, 10) as varbinary)
from dbo.MyTable;
It will at least show you the raw byte values of those characters.
My best guess, however, would be to get rid of UTF-8 completely - the only way to store such data in SQL Server is via varbinary columns, but I doubt you will like the resulting overhead. Try switching to UTF-16 - it's backward compatible with UCS-2 (unless you deal in something truly exotic).
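As a quick illustration of those mark bytes (not from the original thread; shown here in Python), the UTF-8 byte-order mark is the three bytes EF BB BF, which is what tends to surface as a stray leading character after a lossy conversion:

# The UTF-8 BOM is b'\xef\xbb\xbf'; pushed through a single-byte code page
# it degrades into an unreadable leading character such as "?".
import codecs
print(codecs.BOM_UTF8)                                     # b'\xef\xbb\xbf'
print('\ufeff<?xml version="1.0"?>'.encode('utf-8')[:3])   # b'\xef\xbb\xbf'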
Varchar can only hold characters from the database's single-byte code page. My guess would be that you have some Unicode character at the beginning of that string.
Switch to nvarchar; you won't get rid of that initial character, but you won't lose it either.
I had some difficulty with this using System.Xml.Serialization.XmlSerializer. I also wanted the XML to be stored as a human-readable string (because of reasons).
Here's the code I used in case it is useful to someone:
var ser = new XmlSerializer(typeof(Model.SomeRootType));
using var ms = new MemoryStream();
// StreamWriter defaults to UTF-8 without a byte-order mark, which avoids the stray "?"
using var writer = new StreamWriter(ms);
ser.Serialize(writer, myObjectModel);
writer.Flush();   // make sure buffered output reaches the MemoryStream before reading it
var xml = System.Text.Encoding.UTF8.GetString(ms.ToArray());
// re-parse and pretty-print the XML so it is human-readable in the database
var doc = System.Xml.Linq.XDocument.Parse(xml);
var niceXmlForTheDB = doc.ToString();
Notes:
Needs C# 8 for the using statements without the braces
Doesn't strictly need using for MemoryStream, but hey...
Can be converted to VB if needed using one of the many online tools (you'll be used to that if you code in VB!)
This code should probably be in a try-catch which does something sensible if it fails!

operations on blob data in informix

How can we use substring, trim, and length operations on text stored in a blob datatype? And how can we update a column of blob datatype using a query?
Thanks,
With difficulty!
First of all, which of the 4 various types of blob are you discussing:
BYTE
TEXT
BLOB
CLOB
These come in pairs (like Sith Lords): there is a binary version (BYTE, BLOB) and a text version (TEXT, CLOB). There's also another pairing: old (BYTE, TEXT) and newer (BLOB, CLOB). The BYTE and TEXT types were introduced with Informix OnLine 4.00 in about 1989. The BLOB and CLOB types were introduced with Informix Universal Server 9.00 in 1996, and are also known as SmartBlobs.
However, there's a very real sense in which it doesn't matter which of the types you are referring to.
There are very few operations that can be performed on BYTE and TEXT blobs. They can be fetched and stored, but for all practical purposes, that's all. I believe you can use LENGTH to determine the length of a TEXT blob. I don't believe there are any methods available to update part of BYTE or TEXT blob; it is an all-or-nothing replacement. Further, the replacement is from a host variable of the appropriate type - there are no BYTE or TEXT literals.
The situation is a bit better with SmartBlobs, but I'm not an expert on them. There are mechanisms for obtaining a LO (large object) handle and then manipulating that, but I don't think those are available server-side (from SQL or SPL). I may be willfully not understanding what's available with the SmartBlobs, but I think the operations are only available from programming APIs and not within SQL. There are no BLOB or CLOB literals either. However, you can use SQL to load from files (FILETOBLOB, FILETOCLOB) and write to files (LOTOFILE) - with the files either on the server or on the client.
I have already answered your question about substring: substring operation on blob text in informix. With BLOBs you can use the substring operator, but not the SUBSTRING() or SUBSTR() functions.
You can also use LENGTH(), but not TRIM().
Example code:
CREATE TABLE _text_test (id serial, txt_vch varchar(200), txt_text text);
INSERT INTO _text_test (txt_vch, txt_text) VALUES ('1234567890', '1234567890');
SELECT txt_vch, txt_text, txt_vch[3,5], txt_text[3,5], length(txt_text) FROM _text_test;
In my example I used the TEXT blob type (Jonathan described more blob types; you should say in the question which kind of blob you use). The last select shows usage of the substring operator and the LENGTH() function. You can replace the LENGTH() function with other functions such as TRIM() to test them in your environment. In my case the TRIM() test ends with:
ODBC Error: -880 [Informix][Informix ODBC Driver][Informix]
Trim character and trim source must be of string data type.
The last select works well with the JDBC 3.70JC1 driver, but it seems that the ODBC 3.70TC1 driver has a bug and shows the first 3 chars, 123, instead of 345. Test it yourself.
In recent versions (12.10) there is a DBMS_LOB package.
However, it doesn't work as documented: for example, there is no dbms_lob.get_length function. Instead I've found that dbms_lob_get_length works as expected.
So for CLOB fields you have the following useful operations:
dbms_lob_get_length;
dbms_lob_instr;
dbms_lob_substr (unfortunately it gets data after get_length too);
I've also found one undocumented but very, very useful function: dbms_lob_new_clob, which takes an lvarchar argument and converts it to a CLOB.
I know that this answer is very late, but I think it can be useful for other people searching for ways to handle blobs in Informix (I found this post a few days ago when I was starting a mini-research effort on using blobs to store XML).