BigQuery - Cast HEX string to NUMERIC or BIGNUMERIC? - sql

I've got a data string in a hex format. Something like
'0x00000000000000000000000000000000000000000000000000000000000000006cc09155dd769741d7cd1c6a3334a1aeef62da2d0e92a39230becd6e56c2ad490000000000000000000000000000000000000000000000007ce66c50e2840000' as data
I know that substring(data, 131) is a large number.
SAFE_CAST(CONCAT('0x', SUBSTRING(data, 131)) AS INT64) works just fine on the smaller numbers.
SAFE_CAST(CONCAT('0x', SUBSTRING(data, 131)) AS NUMERIC) (or BIGNUMERIC) won't work.
I tried something like FROM_HEX(SUBSTRING(data, 131)) to get a BYTES value, but I couldn't find any good options for getting BYTES to NUMERIC either.

For numbers this large, not even BIGNUMERIC can hold them, so you will have to work with strings. Regular BigQuery functions will not be able to handle numbers of that size, so I suggest you use a JavaScript UDF:
CREATE TEMP FUNCTION from_hex_to_intstring(hex STRING)
RETURNS STRING
LANGUAGE js AS r"""
  // BigInt() parses the '0x' prefix itself (it does not take a radix argument),
  // and the value is returned as a decimal string of arbitrary length.
  return BigInt(hex).toString();
""";
select from_hex_to_intstring('0x00000000000000000000000000000000000000000000000000000000000000006cc09155dd769741d7cd1c6a3334a1aeef62da2d0e92a39230becd6e56c2ad490000000000000000000000000000000000000000000000007ce66c50e2840000') data;
select from_hex_to_intstring('0x00000000000000000000000000000000000000000009ed194db19b238c000000') data
Results:
-------------------------------
Row | data
1 | 5695815805094697319662327076913960577653781492348607706655047793592681546373383993595483025021696631917691807178407718241565809060633202962632700189736960
-------------------------------
Row | data
1 | 12000000000000000000000000
-------------------------------
Bonus 1:
If the hex is not that big you can return it as NUMERIC or BIGNUMERIC with:
select cast(from_hex_to_intstring(<hex string>) as NUMERIC)
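For instance, the second value above (12000000000000000000000000) is only 26 digits, so it fits in NUMERIC and the cast succeeds:
select cast(from_hex_to_intstring('0x00000000000000000000000000000000000000000009ed194db19b238c000000') as NUMERIC) as data;
-- returns 12000000000000000000000000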
Bonus 2:
If you want to trim the leading zeros on your hex, use the following (it's not required for the function above; note that LTRIM here strips any leading '0' and 'x' characters):
select concat("0x",ltrim('0x00000000000000000000000000000000000000000009ed194db19b238c000000',"0x")) as data
-------------------------------
Row | data
1 | 0x9ed194db19b238c000000
-------------------------------
I recommend working only with strings rather than casting to NUMERIC.

Related

Calculating hash integer from a string in Athena

I'm trying to calculate a hash from a string for best-effort ordering and partitioning purposes in Athena. There is no equivalent of String's hashCode() in Athena, so as a best effort I try to get the 2nd character, calculate its codepoint, and take the modulus. (As I said, best effort, maybe a nice effort.)
Consider the query:
SELECT
  doc_id,
  substring(doc_id, 2, 1),
  typeof(substring(doc_id, 2, 1))
FROM events LIMIT 100
The third column shows that the substring comes back as a plain varchar, but the codepoint function expects a varchar(1), and casting does not work either: cast(substring(doc_id, 2, 1) as varchar(1)).
FUNCTION_NOT_FOUND: line 6:5: Unexpected parameters (varchar) for function codepoint. Expected: codepoint(varchar(1))
How can I accomplish this task without modifying the data source? I'm open to ideas.
You can compute a hash code with the xxhash64 function. It takes a varbinary as input, so first cast the string to that type. Since the function also returns its 64-bit result as a varbinary, you can convert it to a bigint via the from_big_endian_64 function:
WITH t(x) AS (VALUES 'hello')
SELECT from_big_endian_64(xxhash64(cast(x AS varbinary)))
FROM t
output:
_col0
---------------------
2794345569481354659
(1 row)
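If the end goal from the question is a small bucket number for best-effort partitioning, you can take the modulus of that bigint. A minimal sketch, assuming the events table and doc_id column from the question, with an arbitrary bucket count of 16:
SELECT
  doc_id,
  abs(from_big_endian_64(xxhash64(cast(doc_id AS varbinary))) % 16) AS bucket
FROM events
LIMIT 100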

How to do a count of fields in SQL with wrong datatype

I am trying to import legacy data from another system into our system. The problem I am having is that the legacy data is dirty - very dirty! We have a field which should be an integer, but sometimes is a varchar, and the field is defined as a varchar...
In SQL Server, how can I do a select to show those records where the data is varchar instead of int?
Thanks
If you want to find rows¹ where a column contains any non-digit characters or is longer than 9 characters (either condition means that we cannot assume it would fit in an int), use something like:
SELECT * FROM Table WHERE LEN(ColumnName) > 9 OR ColumnName LIKE '%[^0-9]%'
Note that there's a negative in the LIKE condition - we're trying to find a string that contains at least one non-digit character.
A more modern approach would be to use TRY_CAST or TRY_CONVERT. But note that a failed conversion returns NULL and NULL is perfectly valid for an int!
SELECT * FROM Table WHERE ColumnName is not null and try_cast(ColumnName as int) is null
ISNUMERIC isn't appropriate. It answers a question nobody has ever wanted to ask (IMO) - "Can this string be converted to any of the numeric data types (I don't care which ones and I don't want you to tell me which ones either)?"
ISNUMERIC('$,,,,,,,.') is 1. That should tell you all you need to know about this function.
¹ If you just want a count, as per the title of the question, then substitute COUNT(*) for *.
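Spelled out, that count version looks like this (same placeholder names as the queries above, with Table bracketed since TABLE is a reserved word):
SELECT COUNT(*) AS SuspectRows
FROM [Table]
WHERE LEN(ColumnName) > 9 OR ColumnName LIKE '%[^0-9]%';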
In SQL Server, how can I do a select to show those records where the data is varchar instead of int?
I would do it like this:
CREATE TABLE T
(
Data VARCHAR(50)
);
INSERT INTO T VALUES
('102'),
(NULL),
('11Blah'),
('5'),
('Unknown'),
('1ThinkPad123'),
('-11');
SELECT Data -- Per the title, COUNT(Data)
FROM
(
  SELECT Data,
         -- The XQuery "cast as xs:int ?" yields an empty sequence (NULL) instead of erroring when the cast fails
         cast('' as xml).value('sql:column("Data") cast as xs:int ?', 'int') Result
  FROM T -- You can add WHERE Data IS NOT NULL to exclude NULLs
) TT
WHERE Result IS NULL;
Returns:
+----+--------------+
| | Data |
+----+--------------+
| 1 | NULL |
| 2 | 11Blah |
| 3 | Unknown |
| 4 | 1ThinkPad123 |
+----+--------------+
That approach is for when you can't use the TRY_CAST() function. If you are working on a 2012+ version, I recommend that you use TRY_CAST() instead, like:
SELECT Data
FROM T
WHERE Data IS NOT NULL
AND
TRY_CAST(Data AS INT) IS NULL;
Finally, I would say do not use the ISNUMERIC() function, because (from the docs):
Note
ISNUMERIC returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($). For a complete list of currency symbols, see money and smallmoney (Transact-SQL).
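A quick way to see that behaviour for yourself (a small ad-hoc query, not part of the original answers):
SELECT ISNUMERIC('$') AS DollarSign,          -- 1
       ISNUMERIC('-') AS MinusSign,           -- 1
       ISNUMERIC('$,,,,,,,.') AS Punctuation, -- 1
       TRY_CAST('$' AS int) AS TryCastResult; -- NULL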

LEFT returns different results for the same values

Can someone explain to me why the results of LEFT are different? For:
Declare @f as float
set @f = 40456510.
select LEFT(cast(@f as float), LEN(4045.)), LEFT(404565., LEN(4045.))
I got:
     |
-----+------
4.04 | 4045
Is there a default cast which causes this?
When you call LEFT(...) on the FLOAT value, you are implicitly converting it to a string representation of the number, since LEFT is a string function. If you convert the value to a varchar, for example, you'll see what that string is:
SELECT CAST(CAST(@f as float) AS VARCHAR(100))
You get: '4.04565e+007'
So the first 4 characters of that are: '4.04'
The first one is taking the left 4 characters of that exponential representation: the default float-to-string conversion produces at most six digits and switches to scientific notation when the value needs more.
You are applying a string function [Left()] to a float variable.
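If the goal is the leading digits of the number itself rather than of its scientific-notation string, one workaround (a small sketch, not from the original answers) is to convert explicitly with STR(), which renders the float in fixed-point notation:
Declare @f as float
set @f = 40456510.
-- STR() avoids the scientific-notation form, so LEFT() sees the digits
select LEFT(LTRIM(STR(@f, 20, 0)), LEN(4045.))  -- 4045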

How do I cast a type to a bigint in MySQL?

CAST() seems to only work for BINARY, CHAR, DATE, DATETIME, DECIMAL, TIME, SIGNED, and UNSIGNED.
I need to convert a hex string to a bigint, that is, I'd want:
SELECT CAST(CONV('55244A5562C5566354', 16, 10) AS BIGINT)
CONV() returns a string, so that's why I'm trying to convert it. I have two uses for this:
1. Inserting data, e.g. INSERT INTO a(foo) SELECT CONV(bar,16,10) FROM ... Here foo is a bigint column and bar a varchar. Perhaps I could get away with the select statement producing a string and letting MySQL take care of it (?)
2. Returning data where the client will dynamically learn the data type of the column; SELECT CONV(bar,16,10) is no good, as the client will handle it as a string.
SELECT CAST(CONV('55244A5562C5566354',16,10) AS UNSIGNED INTEGER);
What seems to be the problem? I've tested this conversion on both 64-bit and 32-bit systems and it works fine. Note that instead of doing the base conversion with CONV(), you can just treat the number as a hexadecimal literal:
mysql> SELECT CAST(X'55244A5562C5566354' AS UNSIGNED);
+-----------------------------------------+
| CAST(X'55244A5562C5566354' AS UNSIGNED) |
+-----------------------------------------+
|                     2614996416347923284 |
+-----------------------------------------+
1 row in set (0.00 sec)
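For the first use case in the question (populating the bigint column foo from the hex varchar bar), the same cast can be used inline. A sketch in which src stands in for the source table the question leaves unnamed:
-- CONV() works with 64-bit precision, so this is fine as long as each hex value fits in 64 bits
INSERT INTO a (foo)
SELECT CAST(CONV(bar, 16, 10) AS UNSIGNED)
FROM src;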

Inserting text string with hex into PostgreSQL as a bytea

I have a text file with several strings of hex in it:
013d7d16d7ad4fefb61bd95b765c8ceb
007687fc64b746569616414b78c81ef1
I would like to store these in the database as a bytea, instead of a varchar. That is, I would like the database to store 01 as the single byte 00000001, not characters '0' & '1'.
I can easily run this file through sed to format/escape it any way I need to.
This is what I have tried:
create table mytable (testcol BYTEA);
This works:
insert into mytable (testcol) values (E'\x7f\x7f');
However, as soon as I have a byte that goes above \x7f, I get this error:
insert into mytable (testcol) values (E'\x7f\x80');
ERROR: invalid byte sequence for encoding "UTF8": 0x80
Any ideas, or am I approaching things wrong?
You can convert a hex string to bytea using the decode function (where "encoding" means encoding a binary value to some textual value). For example:
select decode('DEADBEEF', 'hex');
decode
------------------
\336\255\276\357
which is more understandable with 9.0's default output:
decode
------------
\xdeadbeef
The reason you can't just say E'\xDE\xAD\xBE\xEF' is that this syntax produces a text value, not a bytea, so PostgreSQL will try to convert it from the client encoding to the database encoding. You could write the bytea escape format like that, but you need to double the backslashes: E'\\336\\255\\276\\357'::bytea. I think you can see why the bytea format is being changed... IMHO the decode() function is a reasonable way of writing inputs, even though there is some overhead involved.
INSERT INTO
mytable (testcol)
VALUES
(decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'))
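As a quick sanity check that the doubled-backslash escape form and decode() describe the same bytes (a small illustrative query, not part of the original answer):
select E'\\336\\255\\276\\357'::bytea = decode('DEADBEEF', 'hex') as same_value;
-- returns t: \336\255\276\357 is the octal spelling of the bytes DE AD BE EF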
The Ruby Way
I recently needed to read/write binary data from/to Postgres, but via Ruby. Here's how I did it using the Pg library.
Although not strictly Postgres-specific, I thought I'd include this Ruby-centric answer for reference.
Postgres DB Setup
require 'pg'
DB = PG::Connection.new(host: 'localhost', dbname:'test')
DB.exec "CREATE TABLE mytable (testcol BYTEA)"
BINARY = 1 # libpq format code: 0 = text, 1 = binary
Insert Binary Data
sql = "INSERT INTO mytable (testcol) VALUES ($1)"
param = {value: binary_data, format: BINARY}
DB.exec_params(sql, [param]) {|res| res.cmd_tuples == 1 }
Select Binary Data
sql = "SELECT testcol FROM mytable LIMIT 1"
DB.exec_params(sql, [], BINARY) {|res| res.getvalue(0,0) }
Introduction
This is an updated answer that covers both how to insert and how to query.
It is possible to convert the hex into a bytea value using the decode function; the same approach works for both inserting and querying.
Querying Existing Data
SELECT * FROM mytable WHERE testcol = (decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'));
Encode vs Decode for Querying
A user had asked the following:
How does searching the bytea field by hex value work after inserting it?
SELECT * FROM my_table WHERE myHexField =
(encode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'));
does not work.
In the documentation Binary String Functions and Operators, they have the description of both encode and decode.
decode(string text, format text) -> bytea
    Decode binary data from textual representation in string. Options for format are the same as in encode.
    Example: decode('123\000456', 'escape') -> 123\000456
encode(data bytea, format text) -> text
    Encode binary data into a textual representation. Supported formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
    Example: encode('123\000456'::bytea, 'escape') -> 123\000456
So you will notice that Encode is for encoding binary data into a textual string and returns text. However, since we are storing bytea we have to use decode for both inserting and querying.
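Going the other direction, encode is what turns a stored bytea back into readable hex text when you select it; a small sketch using the same mytable and testcol as above:
SELECT encode(testcol, 'hex') AS testcol_hex FROM mytable;
-- e.g. 013d7d16d7ad4fefb61bd95b765c8ceb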
Inserting
create table mytable (testcol BYTEA);
INSERT INTO
mytable (testcol)
VALUES
(decode('013d7d16d7ad4fefb61bd95b765c8ceb', 'hex'));
From: the previous answer
From: https://www.postgresql.org/docs/current/functions-binarystring.html
INSERT INTO
mytable (testcol)
VALUES
('\x013d7d16d7ad4fefb61bd95b765c8ceb'::bytea);
Various other options, where testcol is of type bytea:
-- how to insert the string "123[a char of value zero]abc456"
insert into mytable (testcol) values (decode(E'123\\000abc456', 'escape'));
-- how to insert the string "123abc456"
insert into mytable (testcol) values (decode(E'123abc456', 'escape'));
-- how to insert in base64: insert the string "abc456"
insert into mytable (testcol) values (decode('YWJjNDU2', 'base64'));