Inserting text string with hex into PostgreSQL as a bytea

I have a text file with several strings of hex in it:
013d7d16d7ad4fefb61bd95b765c8ceb
007687fc64b746569616414b78c81ef1
I would like to store these in the database as a bytea, instead of a varchar. That is, I would like the database to store 01 as the single byte 00000001, not characters '0' & '1'.
I can easily run this file through sed to format/escape it any way I need to.
This is what I have tried:
create table mytable (testcol BYTEA);
This works:
insert into mytable (testcol) values (E'\x7f\x7f');
However, as soon as I have a byte that goes above \x7f, I get this error:
insert into mytable (testcol) values (E'\x7f\x80');
ERROR: invalid byte sequence for encoding "UTF8": 0x80
Any ideas, or am I approaching things wrong?

You can convert a hex string to bytea using the decode function (where "encoding" means encoding a binary value to some textual value). For example:
select decode('DEADBEEF', 'hex');
decode
------------------
\336\255\276\357
which is more understandable with 9.0's default output:
decode
------------
\xdeadbeef
The reason you can't just say E'\xDE\xAD\xBE\xEF' is that this is intended to make a text value, not a bytea, so PostgreSQL will try to convert it from the client encoding to the database encoding. You could write the bytea escape format like that, but you need to double the backslashes: E'\\336\\255\\276\\357'::bytea. I think you can see why the bytea format is being changed. IMHO the decode() function is a reasonable way of writing inputs, even though there is some overhead involved.
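For readers more comfortable outside SQL, here is a minimal Python sketch of the same idea. It is only an analogy and does not talk to PostgreSQL: bytes.fromhex stands in for decode(..., 'hex'), the helper names to_octal_escape and is_valid_utf8 are made up for illustration, and the UTF-8 check mirrors why a text literal containing the byte 0x80 is rejected.

```python
def to_octal_escape(data: bytes) -> str:
    """Render every byte as a backslash plus three octal digits.

    Hypothetical helper: the real escape output leaves printable ASCII
    bytes as they are, which does not matter for this all-high-byte input.
    """
    return "".join("\\%03o" % b for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = bytes.fromhex("DEADBEEF")      # analogous to decode('DEADBEEF', 'hex')
octal_form = to_octal_escape(raw)    # same shape as the older escape output
hex_form = "\\x" + raw.hex()         # same shape as the 9.0 hex output

print(octal_form)
print(hex_form)
print(is_valid_utf8(b"\x7f\x7f"), is_valid_utf8(b"\x7f\x80"))
```

The last line shows the same split the question ran into: the two 0x7f bytes are valid UTF-8, while a lone 0x80 is not.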

INSERT INTO
mytable (testcol)
VALUES
(decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'))
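Since the question mentions preprocessing the file with sed, the same step could be sketched in Python. This is only a sketch under assumptions: the function name hex_line_to_insert is hypothetical, the table and column names are taken from the question, and the output is meant to be saved as a script and run with a SQL client.

```python
import string


def hex_line_to_insert(line: str, table: str = "mytable", column: str = "testcol") -> str:
    """Build one INSERT statement that stores a hex string as bytea via decode()."""
    hex_text = line.strip()
    # Accept only a non-empty, even-length run of hex digits, so the value
    # can sit inside the SQL literal without any further escaping.
    if not hex_text or len(hex_text) % 2 or any(c not in string.hexdigits for c in hex_text):
        raise ValueError(f"not a valid hex string: {line!r}")
    return f"INSERT INTO {table} ({column}) VALUES (decode('{hex_text}', 'hex'));"


lines = [
    "013d7d16d7ad4fefb61bd95b765c8ceb",
    "007687fc64b746569616414b78c81ef1",
]
statements = [hex_line_to_insert(line) for line in lines]
for statement in statements:
    print(statement)
```

When inserting from application code rather than generating a script, a parameterized query through the database driver is usually the safer route.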

The Ruby Way
I recently needed to read/write binary data from/to Postgres, but via Ruby. Here's how I did it using the Pg library.
Although not strictly Postgres-specific, I thought I'd include this Ruby-centric answer for reference.
Postgres DB Setup
require 'pg'
DB = PG::Connection.new(host: 'localhost', dbname:'test')
DB.exec "CREATE TABLE mytable (testcol BYTEA)"
BINARY = 1
Insert Binary Data
sql = "INSERT INTO mytable (testcol) VALUES ($1)"
param = {value: binary_data, format: BINARY}
DB.exec_params(sql, [param]) {|res| res.cmd_tuples == 1 }
Select Binary Data
sql = "SELECT testcol FROM mytable LIMIT 1"
DB.exec_params(sql, [], BINARY) {|res| res.getvalue(0,0) }

Introduction
This is an updated answer that covers both how to insert and how to query.
You can convert the hex into a bytea value using the decode function, and you should use it for both inserting and querying.
Querying Existing Data
SELECT * FROM mytable WHERE testcol = (decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'));
Encode vs Decode for Querying
A user had asked the following:
How do I search the bytea field by hex value after inserting it?
SELECT * FROM my_table WHERE myHexField =
(encode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'));
does not work.
In the documentation Binary String Functions and Operators, they have the description of both encode and decode.
+==================================+=============+=======================================================================================================+=======================================+============+
| Function | Return Type | Description | Example | Result |
+==================================+=============+=======================================================================================================+=======================================+============+
| decode(string text, format text) | bytea | Decode binary data from textual representation in string. Options for format are same as in encode. | decode('123\000456', 'escape') | 123\000456 |
+----------------------------------+-------------+-------------------------------------------------------------------------------------------------------+---------------------------------------+------------+
| encode(data bytea, format text) | text | Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape  | encode('123\000456'::bytea, 'escape') | 123\000456 |
| | | converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. | | |
+----------------------------------+-------------+-------------------------------------------------------------------------------------------------------+---------------------------------------+------------+
So you will notice that encode is for encoding binary data into a textual string and returns text. However, since we are storing bytea, we have to use decode for both inserting and querying.
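The direction of the two functions can be illustrated with a small Python analogy (not PostgreSQL itself): bytes.fromhex plays the role of decode(..., 'hex') and bytes.hex plays the role of encode(..., 'hex'). The variable names are illustrative, and the "wrong" case assumes the hex string is treated as the bytes of its own characters.

```python
hex_text = "013d7d16d7ad4fefb61bd95b765c8ceb"

# decode(text, 'hex') analogue: hex text -> raw bytes (what the bytea column holds)
stored = bytes.fromhex(hex_text)

# encode(bytea, 'hex') analogue: raw bytes -> hex text
round_trip = stored.hex()

# Hex-encoding the characters of the hex string itself gives the hex of
# their ASCII codes, which is a different value from the stored bytes.
wrong = hex_text.encode("ascii").hex()

print(len(stored))
print(round_trip == hex_text)
print(wrong[:8])
```

The stored value is 16 bytes long and round-trips back to the original hex text, while the "wrong" value starts with the ASCII codes of '0', '1', '3', 'd'.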
Inserting
create table mytable (testcol BYTEA);
INSERT INTO
mytable (testcol)
VALUES
(decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'));
From: see previous answer

From: https://www.postgresql.org/docs/current/functions-binarystring.html
INSERT INTO
mytable (testcol)
VALUES
('\x013d7d16d7ad4fefb61bd95b765c8ceb'::bytea);

More and sundry options where testcol is of type bytea:
-- how to insert the string "123[a char of value zero]abc456"
insert into mytable (testcol) values (decode(E'123\\000abc456', 'escape'));
-- how to insert the string "123abc456"
insert into mytable (testcol) values (decode(E'123abc456', 'escape'));
-- how to insert in base64: insert string "abc456"
insert into mytable (testcol) values (decode('YWJjNDU2', 'base64'));
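To check what bytes these inputs stand for, here is a rough Python sketch. The function decode_escape is a hypothetical, simplified analogue of decode(..., 'escape') that handles only three-digit octal escapes and doubled backslashes; base64.b64decode is the standard-library counterpart of the base64 case.

```python
import base64
import re


def decode_escape(text: str) -> bytes:
    """Simplified analogue of the 'escape' format (hypothetical helper).

    A backslash followed by three octal digits is one byte; two
    backslashes are one literal backslash; everything else is kept.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out += text[i].encode("utf-8")
            i += 1
        elif text.startswith("\\\\", i):
            out.append(0x5C)
            i += 2
        elif re.fullmatch(r"[0-3][0-7]{2}", text[i + 1:i + 4]):
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
        else:
            raise ValueError(f"invalid escape at position {i}")
    return bytes(out)


# The text the function receives after the E'' literal has collapsed
# each doubled backslash into a single one.
with_zero = decode_escape("123\\000abc456")
plain = decode_escape("123abc456")
from_b64 = base64.b64decode("YWJjNDU2")

print(with_zero, plain, from_b64)
```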

Related

BigQuery - Cast HEX string to NUMERIC or BIGNUMERIC?

I've got a data string in a hex format. Something like
'0x00000000000000000000000000000000000000000000000000000000000000006cc09155dd769741d7cd1c6a3334a1aeef62da2d0e92a39230becd6e56c2ad490000000000000000000000000000000000000000000000007ce66c50e2840000' as data
I know that substring(data, 131) is a large number.
I can pass SAFE_CAST(CONCAT('0x', SUBSTRING(data, 131)) AS INT64) just fine on the smaller numbers.
SAFE_CAST(CONCAT('0x', SUBSTRING(data, 131)) AS NUMERIC) (or bignumeric) won't work.
I tried something like FROM_HEX(SUBSTRING(data, 131)) to get a byte format. But couldn't find any good options for getting BYTE to NUMERIC either.
For numbers this big, not even BIGNUMERIC will fit them, so you will have to work with them as strings. Regular BigQuery functions cannot handle such numbers, so I suggest using a UDF:
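CREATE TEMP FUNCTION from_hex_to_intstring(hex STRING)
RETURNS STRING
LANGUAGE js AS r"""
// BigInt() takes a single argument; the 0x prefix is what makes it parse the string as hex.
// Convert explicitly to a string so the result matches RETURNS STRING.
return BigInt(hex).toString();
""";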
select from_hex_to_intstring('0x00000000000000000000000000000000000000000000000000000000000000006cc09155dd769741d7cd1c6a3334a1aeef62da2d0e92a39230becd6e56c2ad490000000000000000000000000000000000000000000000007ce66c50e2840000') data;
select from_hex_to_intstring('0x00000000000000000000000000000000000000000009ed194db19b238c000000') data
Results:
-------------------------------
Row | data
1 | 5695815805094697319662327076913960577653781492348607706655047793592681546373383993595483025021696631917691807178407718241565809060633202962632700189736960
-------------------------------
Row | data
1 | 12000000000000000000000000
-------------------------------
Bonus 1:
If the hex is not that big you can return it as NUMERIC or BIGNUMERIC with:
select cast(from_hex_to_intstring(<hex string>) as NUMERIC)
Bonus 2:
If you want to trim the leading zeros from your hex string, use the following (but it's not required for the function above):
select concat("0x",ltrim('0x00000000000000000000000000000000000000000009ed194db19b238c000000',"0x")) as data
-------------------------------
Row | data
1 | 0x9ed194db19b238c000000
-------------------------------
I recommend working only with strings rather than casting to NUMERIC.
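The conversion itself can be cross-checked outside BigQuery. The following Python sketch mirrors the UDF using Python's arbitrary-precision integers. The function names are hypothetical, and the digit limits (29 integer digits for NUMERIC, 38 for BIGNUMERIC) are assumptions based on my reading of the BigQuery data type documentation, so verify them before relying on the check.

```python
# Assumed limits on integer digits; check the BigQuery docs before relying on them.
NUMERIC_INT_DIGITS = 29
BIGNUMERIC_INT_DIGITS = 38


def hex_to_int_string(hex_text: str) -> str:
    """Convert a hex string (with or without a 0x prefix) to decimal digits."""
    return str(int(hex_text, 16))


def fits_integer_digits(decimal_text: str, max_digits: int) -> bool:
    """True if a non-negative decimal string has at most max_digits digits."""
    return len(decimal_text.lstrip("0") or "0") <= max_digits


small = hex_to_int_string(
    "0x00000000000000000000000000000000000000000009ed194db19b238c000000"
)
# A made-up 256-bit value, used only to show a number that does not fit.
large = hex_to_int_string("0x" + "f" * 64)

print(small, fits_integer_digits(small, NUMERIC_INT_DIGITS))
print(len(large), fits_integer_digits(large, BIGNUMERIC_INT_DIGITS))
```

The first value matches the second result shown above and is short enough for a cast; the 256-bit value has 78 digits and would have to stay a string.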

Additional 0 in varbinary insert in SSMS

I have a problem when I am trying to move a varbinary(max) field from one DB to another.
If I insert like this:
0xD0CF11E0A1B11AE10000000
The result begins with an additional '0':
0x0D0CF11E0A1B11AE10000000
And I cannot get rid of this. I've tried many tools, like the SSMS export tool or BCP, but without any success. And it would be better for me to solve it in a script anyway.
I don't have much knowledge about varbinary (a program generates it); my only goal is to copy it :)
0xD0CF11E0A1B11AE10000000
This value contains an odd number of hexadecimal characters. Varbinary stores bytes, and each byte is represented by exactly two hexadecimal characters. You're either missing a character, or you're not storing bytes.
Here, SQL Server is guessing that the most significant digit is a zero, which would not change the numeric value of the string. For example:
select 0xD0C "value"
,cast(0xD0C as int) "as_integer"
,cast(0x0D0C as int) "leading_zero"
,cast(0xD0C0 as int) "trailing_zero"
value as_integer leading_zero trailing_zero
---------- --------- --------------- ----------------
0x0D0C 3340 3340 53440
Or:
select 1 "test"
where 0xD0C = 0x0D0C
test
-------
1
It's just a difference of SQL Server assuming that varbinary always represents bytes.
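The padding behaviour described above can be reproduced in a short Python sketch, which may help when checking exported values in a script. The helper name normalize_hex_literal is hypothetical; it left-pads an odd-length literal with one zero, which is what the answer says SQL Server does.

```python
def normalize_hex_literal(literal: str) -> str:
    """Left-pad an odd-length 0x literal with one zero so it describes whole bytes."""
    if not literal.lower().startswith("0x"):
        raise ValueError("expected a literal starting with 0x")
    digits = literal[2:]
    if len(digits) % 2:
        digits = "0" + digits
    return "0x" + digits


original = "0xD0CF11E0A1B11AE10000000"   # 23 hex digits, so not whole bytes
padded = normalize_hex_literal(original)

print(padded)
print(int(original, 16) == int(padded, 16))   # same numeric value
print(len(bytes.fromhex(padded[2:])))         # number of bytes after padding
```

If the source value was really meant to be whole bytes, an odd digit count is a hint that a character was lost during export, and padding will not recover it.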

How to do a count of fields in SQL with wrong datatype

I am trying to import legacy data from another system into our system. The problem I am having is that the legacy data is dirty- very dirty! We have a field which should be an integer, but sometimes is a varchar, and the field is defined as a varchar...
In SQL Server, how can I do a select to show those records where the data is varchar instead of int?
Thanks
If you want to find rows [1] where a column contains any non-digit characters or is longer than 9 characters (either condition means that we cannot assume it would fit in an int), use something like:
SELECT * FROM Table WHERE LEN(ColumnName) > 9 or ColumnName LIKE '%[^0-9]%'
Note that there's a negative in the LIKE condition: we're trying to find a string that contains at least one non-digit character.
A more modern approach would be to use TRY_CAST or TRY_CONVERT. But note that a failed conversion returns NULL and NULL is perfectly valid for an int!
SELECT * FROM Table WHERE ColumnName is not null and try_cast(ColumnName as int) is null
ISNUMERIC isn't appropriate. It answers a question nobody has ever wanted to ask (IMO) - "Can this string be converted to any of the numeric data types (I don't care which ones and I don't want you to tell me which ones either)?"
ISNUMERIC('$,,,,,,,.') is 1. That should tell you all you need to know about this function.
[1] If you just want a count, as per the title of the question, then substitute COUNT(*) for *.
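If the legacy data passes through a script before it reaches SQL Server, a similar check can be sketched in Python. This is an approximation and not a faithful reimplementation of TRY_CAST: the function name fits_sql_int and the sample rows are made up for illustration, and the rule used is "an optionally signed run of ASCII digits within the 32-bit int range".

```python
import re

INT_MIN, INT_MAX = -2**31, 2**31 - 1
_INTEGER = re.compile(r"[+-]?[0-9]+")


def fits_sql_int(value):
    """True if value is a string holding a signed integer within the 32-bit range."""
    if value is None or not _INTEGER.fullmatch(value.strip()):
        return False
    return INT_MIN <= int(value) <= INT_MAX


# Hypothetical sample rows; None stands in for NULL and is skipped, not flagged.
rows = ["102", None, "11Blah", "5", "Unknown", "1ThinkPad123", "-11", "99999999999"]
bad = [r for r in rows if r is not None and not fits_sql_int(r)]

print(bad)
print(len(bad))
```

Unlike the LIKE pattern, this accepts a leading sign, so '-11' is treated as a valid int here.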
In SQL Server, how can I do a select to show those records where the data is varchar instead of int?
I would do it like
CREATE TABLE T
(
Data VARCHAR(50)
);
INSERT INTO T VALUES
('102'),
(NULL),
('11Blah'),
('5'),
('Unknown'),
('1ThinkPad123'),
('-11');
SELECT Data -- Per the title COUNT(Data)
FROM
(
SELECT Data,
cast('' as xml).value('sql:column("Data") cast as xs:int ?','int') Result
FROM T --You can add WHERE Data IS NOT NULL to exclude NULLs
) TT
WHERE Result IS NULL;
Returns:
+----+--------------+
| | Data |
+----+--------------+
| 1 | NULL |
| 2 | 11Blah |
| 3 | Unknown |
| 4 | 1ThinkPad123 |
+----+--------------+
That is for when you can't use the TRY_CAST() function. If you are working on SQL Server 2012 or later, I recommend using TRY_CAST() instead:
SELECT Data
FROM T
WHERE Data IS NOT NULL
AND
TRY_CAST(Data AS INT) IS NULL;
Finally, I would say do not use ISNUMERIC() function because of (from docs) ...
Note
ISNUMERIC returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($). For a complete list of currency symbols, see money and smallmoney (Transact-SQL).

How to insert bytestrings into HANA

I am trying to insert two byte strings into a HANA table with VARBINARY columns, but I keep getting a syntax error, e.g.
SAP DBTech JDBC: [257]: sql syntax error: incorrect syntax near "G\xa2ac\xa0av\xf6": line 1 col 98 (at pos 98)
My two byte strings look like this:
STRING1 = b'G\xa2ac\xa0av\xf6'
type(STRING1) == <class 'bytes'>
STRING2 = b'708ca7fbb701799bb387f2e50deaca402e8502abe229f705693d2d4f350e1ad6'
type(STRING2) == <class 'bytes'>
My query to insert the values looks like this:
INSERT INTO testTable VALUES(
CAST(b'708ca7fbb701799bb387f2e50deaca402e8502abe229f705693d2d4f350e1ad6' AS VARBINARY),
CAST(b'G\xa2ac\xa0av\xf6' AS VARBINARY));
I've also tried writing the query the way the documentation suggests:
INSERT INTO testTable VALUES(
CAST(x'708ca7fbb701799bb387f2e50deaca402e8502abe229f705693d2d4f350e1ad6' AS VARBINARY),
CAST(x'G\xa2ac\xa0av\xf6' AS VARBINARY));
As well as:
INSERT INTO testTable VALUES(
b'708ca7fbb701799bb387f2e50deaca402e8502abe229f705693d2d4f350e1ad6',
b'G\xa2ac\xa0av\xf6');
But all of these give me some syntax error. Any help would be greatly appreciated. Thanks!
The problem here lies with your STRING1 value ( b'G\xa2ac\xa0av\xf6' ).
It is not a valid hexadecimal string that can represent a binary value in SAP HANA. That's why any type casting will fail here.
Instead, it seems that it is actually a string in which some of the characters are represented as hexadecimal values (Unicode code points, maybe?).
At least that's what I make of the \x escape sequences in the string.
So, you can do different things now.
You can store the string as-is, with the escape sequences, in the VARBINARY column. To do that, you can use to_binary('G\xa2ac\xa0av\xf6') in the insert statement.
You can convert this string into a valid Unicode string in your application code and store the data in an NVARCHAR column instead.
As far as I am aware, HANA does not understand Python's bytes notation, so I think the mix-up comes from using that representation within the SQL console. When Python prints b'G\xa2ac\xa0av\xf6', any byte that is not printable in ASCII (your local encoding?) is shown as a \x escape.
If you want to do that you might first want to convert that to a hex representation in python
>>> import binascii
>>> binascii.hexlify(b'G\xa2ac\xa0av\xf6')
b'47a26163a06176f6'
This will give you a uniform representation of your byte array in hex, which you can now use in your SQL console (such as HANA Studio and the like):
INSERT INTO TestTable VALUES(x'47a26163a06176f6');
-- OR
INSERT INTO TestTable VALUES(HEXTOBIN('47a26163a06176f6'));
Note that the prefix b changes to x in the first case to indicate to HANA that it should treat this as binary data in hexadecimal representation.
To insert the value from Python 2 as prepared statement:
>>> cursor.execute("INSERT INTO TestTable Values(?)", \
parameters=[binascii.hexlify(b'G\xa2ac\xa0av\xf6')])
PyHDB seems to expect a string to cope correctly, but in Python 3 hexlify will yield a byte array so you need to turn the result into a string again
>>> param = str(binascii.hexlify(b'G\xa2ac\xa0av\xf6'), 'ascii')
>>> cursor.execute("INSERT INTO TestTable Values(?)", parameters=[param])
I guess this could be considered a bug in PyHDB or at least an inconsistency. Just for completeness sake, in SAP's dbapi client there is a Binary class to wrap bytearrays for this purpose.
Now query that with your client
>>> import pyhdb
>>> con = pyhdb.connect(....)
>>> cursor = con.cursor()
>>> cursor.execute('SELECT * FROM TestTable')
>>> cursor.fetchall()
[(b'G\xa2ac\xa0av\xf6',)]
To sum the entire thing up: b'G\xa2ac\xa0av\xf6' is not a representation HANA understands as such when using it in a SQL statement. We need to find a common ground, for that we converted the bytearray to a hex representation (hexlify) and told HANA to handle it as such (x-prefix / HEXTOBIN).
As Lars Br. mentioned, if those are indeed Unicode literals you might want to consider NVARCHAR as the datatype.
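The hex conversion for both values from the question can be sketched as follows. The helper name to_hana_hex_literal is hypothetical, and the treatment of STRING2 is an assumption: it looks like ASCII hex text, so the sketch decodes it to 32 raw bytes first. If the 64 ASCII characters themselves are the payload, skip that step.

```python
def to_hana_hex_literal(data: bytes) -> str:
    """Format raw bytes as an x'...' hexadecimal literal (hypothetical helper)."""
    return "x'" + data.hex() + "'"


STRING1 = b'G\xa2ac\xa0av\xf6'
STRING2 = b'708ca7fbb701799bb387f2e50deaca402e8502abe229f705693d2d4f350e1ad6'

literal1 = to_hana_hex_literal(STRING1)

# Assumption: STRING2 is hex text stored as ASCII bytes, so turn it into
# the raw bytes it describes before formatting it as a literal.
raw2 = bytes.fromhex(STRING2.decode("ascii"))
literal2 = to_hana_hex_literal(raw2)

print(literal1)
print(len(raw2), literal2)
```

These literals are meant for pasting into a SQL console; from application code, a prepared statement as shown above avoids building SQL strings by hand.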

How do I cast a type to a bigint in MySQL?

CAST() seems to only work for BINARY, CHAR, DATE, DATETIME, DECIMAL, TIME, SIGNED, UNSIGNED.
I need to convert a hex string to a bigint, that is, I'd want:
SELECT CAST(CONV('55244A5562C5566354',16,10) AS BIGINT)
CONV() returns a string, so that's why I'm trying to convert it. I have 2 uses for this:
Inserting data, e.g. INSERT INTO a(foo) SELECT CONV(bar,16,10) FROM ... Here foo is a bigint column, bar a varchar. Perhaps I could get away with the select statement being a string and let MySQL take care of it (?)
Returning data where the client will dynamically learn the data type of the column, SELECT CONV(bar,16,10) is no good as the client will handle it as a string.
SELECT CAST(CONV('55244A5562C5566354',16,10) AS UNSIGNED INTEGER);
What seems to be the problem? I've tested this conversion on both 64-bit and 32-bit systems, and it works fine. Note that instead of converting with CONV(), you can just write the number as a hexadecimal literal.
mysql> SELECT CAST(X'55244A5562C5566354' AS UNSIGNED);
+-----------------------------------------+
| CAST(X'55244A5562C5566354' AS UNSIGNED) |
+-----------------------------------------+
| 2614996416347923284 |
+-----------------------------------------+
1 row in set (0.00 sec)
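One thing worth checking on your side: the literal in the question has 18 hex digits, which is more than a 64-bit BIGINT can hold. The Python sketch below (plain integer arithmetic, not MySQL) shows how many bits the value needs and that the number printed above equals its low 64 bits. How a given MySQL version handles the overflow (truncating, saturating, or warning) is something to verify rather than assume.

```python
value = int("55244A5562C5566354", 16)

bits = value.bit_length()        # bits needed to represent the full value
low_64 = value & (2**64 - 1)     # what remains if only the low 8 bytes are kept

print(bits)
print(low_64)
```

If the source strings can exceed 16 hex digits, storing them in a binary or character column instead of BIGINT avoids losing the high bytes.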