BigQuery casting int64 to uint64

I am storing uint64 values as the INTEGER type in BigQuery (values > 2^63 become negative). Is there a way to cast them back to their correct values while querying BigQuery?

You will need a type big enough to hold a UINT64; in BigQuery, the easy option is the NUMERIC type. That said, I think the correct way to convert it back is:
CREATE TEMP FUNCTION int64_to_uint64(x INT64) AS
  (IF (x < 0,
       NUMERIC "18446744073709551616" /* 2^64 */ + x,
       /* ELSE */ CAST(x AS NUMERIC)));
-- Check 2 boundaries:
SELECT int64_to_uint64(-1); -- returns 18446744073709551615, 0xFFFFFFFFFFFFFFFF
SELECT int64_to_uint64(-9223372036854775808); -- returns 9223372036854775808, 0x8000000000000000
SELECT int64_to_uint64(12345); -- positive number also works
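Going the other way (for example, before writing values back into the INTEGER column), the inverse mapping is symmetrical. A minimal sketch; uint64_to_int64 is a hypothetical name, not part of the answer above:
CREATE TEMP FUNCTION uint64_to_int64(x NUMERIC) AS
  (IF (x >= NUMERIC "9223372036854775808", /* 2^63 */
       CAST(x - NUMERIC "18446744073709551616" AS INT64), /* wraps to negative */
       CAST(x AS INT64)));
SELECT uint64_to_int64(NUMERIC "18446744073709551615"); -- returns -1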

Related

Cast a hexadecimal string to an array of bigint in hive

I have a column that contains a length-16 hexadecimal string. I would like to convert it to a bigint. Is there any way to accomplish that? The usual approach returns null, since the input string could represent a number > 2^63-1.
select
  cast(conv(hash_col, 16, 10) as bigint) as p0,
  conv(hash_col, 16, 10) as c0
from mytable limit 10
I have also tried using unhex(..),
select cast(unhex(hash_col) as bigint) as p0 from mytable limit 10
but got the following error:
No matching method for class org.apache.hadoop.hive.ql.udf.UDFToLong
with (binary). Possible choices: FUNC(bigint) FUNC(boolean)
FUNC(decimal(38,18)) FUNC(double) FUNC(float) FUNC(int) FUNC(smallint) FUNC(string) FUNC(timestamp) FUNC(tinyint) FUNC(void)
If I don't do the cast(.. as bigint) part, I get some undisplayable binary value for p0. It seems unhex is not exactly the inverse of hex in hive.
Your values are out of range for bigint.
Ref: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
The max value for bigint is 9,223,372,036,854,775,807.
Use decimal(20,0) instead.
select cast(conv('85A58F8B014692CA',16,10) as decimal(20,0))
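Applied to the query from the question (hash_col and mytable are the names used there), that would be something like:
select cast(conv(hash_col, 16, 10) as decimal(20,0)) as p0
from mytable limit 10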

BitTest search in BigQuery (by position)

I keep a number representation of binary data in a BigQuery table.
I need to be able to search by BitPos and find out whether the bit at a given position is 0 or 1.
The Oracle analog is BitTest:
Use this function to return TRUE (1) if the specified bit in a value is a 1; otherwise return FALSE (0).
Syntax: BitTest(Value1, BitPos)
Example: the number in the DB is 1099511627780,
which in binary is 10000000000000000000000000000000000000100.
Thus the results are:
BitTest(1099511627780, 1) = 0;
BitTest(1099511627780, 2) = 0;
BitTest(1099511627780, 3) = 1;
Can you help me find a native implementation in BigQuery? I was looking through the docs with no luck:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
You can create a temporary function that performs this computation using a bit shift and a bitwise AND. Here is an example:
CREATE TEMP FUNCTION BitTest(value INT64, bit INT64) AS (
  value >> (bit - 1) & 0x1 = 1
);
SELECT
  value,
  bit,
  BitTest(value, bit) AS result
FROM (
  SELECT 1099511627780 AS value, bit
  FROM UNNEST(GENERATE_ARRAY(1, 42)) AS bit
)
ORDER BY bit;
The BitTest function checks whether the bit at the given 1-based index is set. The FROM clause in this example generates bit indexes between 1 and 42 to demonstrate the output.
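As a quick sanity check against the three sample results from the question (reusing the BitTest temp function defined above):
SELECT
  BitTest(1099511627780, 1) AS bit1, -- false (bit 1 is 0)
  BitTest(1099511627780, 2) AS bit2, -- false (bit 2 is 0)
  BitTest(1099511627780, 3) AS bit3; -- true (bit 3 is 1)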

BigQuery: INTEGER type overflow

I'm experiencing a problem with the INTEGER type. It overflows and there is no way to prevent it (as it's a 64-bit signed int). The worst thing is that it overflows with no error, just becoming a negative number:
SELECT 9223372036854775807 + 1
Is there any possibility to overcome this issue (maybe Google has plans to introduce new int types)?
BigQuery will provide an option for SQL to raise an error in such cases (integer overflow, division by zero, etc.).
You can detect such conditions and use e.g. NULL as an error indicator, at the cost of more typing.
Something like this (assuming you are adding two non-negative values):
select if(a + b >= a, a + b, NULL) from
( -- sample data
  select 9223372036854775807 as a, 1 as b
)
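In current BigQuery standard SQL you can also sidestep the 64-bit limit by widening to NUMERIC (38 decimal digits) before adding; a minimal sketch, not part of the original answer:
SELECT CAST(9223372036854775807 AS NUMERIC) + 1; -- 9223372036854775808, no wraparound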

Hex string to integer conversion in Amazon Redshift

Amazon Redshift is based on ParAccel, which is based on Postgres. From my research it seems that the preferred way to perform hexadecimal string to integer conversion in Postgres is via a bit field, as outlined in this answer.
In the case of bigint, this would be:
select ('x'||lpad('123456789abcdef',16,'0'))::bit(64)::bigint
Unfortunately, this fails on Redshift with:
ERROR: cannot cast type text to bit [SQL State=42846]
What other ways are there to perform this conversion in Postgres 8.1ish (that's close to the Redshift level of compatibility)? UDFs are not supported in Redshift, and neither are arrays, regex functions, or set-generating functions...
It looks like they added a function for this at some point: STRTOL
Syntax
STRTOL(num_string, base)
Return type
BIGINT. If num_string is null, returns NULL.
For example
SELECT strtol('deadbeef', 16);
Returns: 3735928559
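For the 16-digit case from the question, the same call works; the result below is computed from the hex digits (it matches what the bit(64) cast above would produce) rather than verified on Redshift:
SELECT strtol(lpad('123456789abcdef', 16, '0'), 16);
Returns: 81985529216486895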
Assuming that you want a simple digit-by-digit ordinal position conversion (i.e. you're not worried about two's complement negatives, etc.), I think this should work on an 8.1-equivalent DB:
CREATE OR REPLACE FUNCTION hex2dec(text) RETURNS bigint AS $$
SELECT sum(CASE WHEN v >= ascii('a') THEN v - ascii('a') + 10
                ELSE v - ascii('0') END * 16^ordpos)::bigint
FROM (
  SELECT n-1, ascii(substring(reverse($1), n, 1))
  FROM generate_series(1, length($1)) n
) AS x(ordpos, v);
$$ LANGUAGE sql IMMUTABLE;
The function form is optional; it just makes it easier to avoid repeating the argument a bunch of times, and it should get inlined anyway. Efficiency will probably be awful, but most of the tools available to do this smarter don't seem to be available on versions that old, and this at least works:
regress=> CREATE TABLE t AS VALUES ('c13b'), ('a'), ('f');
regress=> SELECT hex2dec(column1) FROM t;
 hex2dec
---------
   49467
      10
      15
(3 rows)
If you can use regexp_split_to_array and generate_subscripts it might be faster. Or slower; I haven't tried. Another possible trick is to use a digit-mapping array instead of the CASE, like:
'[48:102]={0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,11,12,13,14,15}'::integer[]
which you can use with:
CREATE OR REPLACE FUNCTION hex2dec(text) RETURNS bigint AS $$
SELECT sum(
  ('[48:102]={0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,11,12,13,14,15}'::integer[])[ v ]
  * 16^ordpos
)::bigint
FROM (
  SELECT n-1, ascii(substring(reverse($1), n, 1))
  FROM generate_series(1, length($1)) n
) AS x(ordpos, v);
$$ LANGUAGE sql IMMUTABLE;
Personally, I'd do it client-side instead, rather than wrangling the limited capabilities of an old PostgreSQL fork, especially one you can't load your own sensible user-defined C functions on, or use PL/Perl, etc.
In real PostgreSQL I'd just use this:
hex2dec.c:
#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "errno.h"
#include "limits.h"
#include <stdlib.h>
PG_MODULE_MAGIC;
Datum from_hex(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(hex2dec);
Datum
hex2dec(PG_FUNCTION_ARGS)
{
char *endpos;
const char *hexstr = text_to_cstring(PG_GETARG_TEXT_PP(0));
long decval = strtol(hexstr, &endpos, 16);
if (endpos[0] != '\0')
{
ereport(ERROR, (ERRCODE_INVALID_PARAMETER_VALUE, errmsg("Could not decode input string %s as hex", hexstr)));
}
if (decval == LONG_MAX && errno == ERANGE)
{
ereport(ERROR, (ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, errmsg("Input hex string %s overflows int64", hexstr)));
}
PG_RETURN_INT64(decval);
}
Makefile:
MODULES = hex2dec
DATA = hex2dec--1.0.sql
EXTENSION = hex2dec
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
hex2dec.control:
comment = 'Utility function to convert hex strings to decimal'
default_version = '1.0'
module_pathname = '$libdir/hex2dec'
relocatable = true
hex2dec--1.0.sql:
CREATE OR REPLACE FUNCTION hex2dec(hexstr text) RETURNS bigint
AS 'hex2dec','hex2dec'
LANGUAGE c IMMUTABLE STRICT;
COMMENT ON FUNCTION hex2dec(hexstr text)
IS 'Decode the hex string passed, which may optionally have a leading 0x, as a bigint. Does not attempt to consider negative hex values.';
Usage:
CREATE EXTENSION hex2dec;
postgres=# SELECT hex2dec('7fffffffffffffff');
hex2dec
---------------------
9223372036854775807
(1 row)
postgres=# SELECT hex2dec('deadbeef');
hex2dec
------------
3735928559
(1 row)
postgres=# SELECT hex2dec('12345');
hex2dec
---------
74565
(1 row)
postgres=# select hex2dec(to_hex(-1));
hex2dec
------------
4294967295
(1 row)
postgres=# SELECT hex2dec('8fffffffffffffff');
ERROR: Input hex string 8fffffffffffffff overflows int64
postgres=# SELECT hex2dec('0x7abcz123');
ERROR: Could not decode input string 0x7abcz123 as hex
The performance difference is ... noteworthy. Given sample data:
CREATE TABLE randhex AS
SELECT '0x'||to_hex( abs(random() * (10^((random()-.5)*10)) * 10000000)::bigint) AS h
FROM generate_series(1,1000000);
conversion from hex to decimal takes about 1.3 s from a warm cache using the C extension, which isn't great for a million rows. Reading the same rows without any transformation takes 0.95 s. The SQL-based hex2dec approach took 36 seconds to process the same rows. Frankly, I'm really impressed the SQL approach was as fast as that, and surprised the C extension was that slow.
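For reference, a sketch of how such timings can be taken with psql's \timing meta-command (the original harness isn't shown; numbers will vary by machine):
\timing on
SELECT count(hex2dec(h)) FROM randhex; -- C extension version; strtol accepts the '0x' prefix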
A likely explanation is that the cast from text to bit(n) relies on undocumented behavior. To repeat the quote from Tom Lane:
This is relying on some undocumented behavior of the bit-type input
converter, but I see no reason to expect that would break. A possibly
bigger issue is that it requires PG >= 8.3 since there wasn't a text
to bit cast before that.
And the Amazon derivative obviously does not allow this undocumented feature. Not surprising, since it is based off Postgres 8.1, where there was no text-to-bit cast at all.
Previously quoted in this closely related answer:
Convert hex in text representation to decimal number

How to reduce the float length

Using SQL Server 2000
I want to reduce the decimal length
Query
Select 23/12 as total
Output is showing as 1.99999999999
I don't want to round the value, I want to display it like this: 1.99
Tried Query
Select LEFT(23/12, LEN(23/12) - 3) as total
The above query works only if there is a decimal value like 12.444444, but if the total is a whole number like 12 or 4 or 11, I get an error at run time.
How can I do this?
Need query help.
There is a very simple solution; you can find it in BOL. ROUND takes an optional third argument, which is the round type. The values are round or truncate.
ROUND ( numeric_expression , length [ ,function ] )
...
function Is the type of operation to perform. function must be
tinyint, smallint, or int. When function is omitted or has a value of
0 (default), numeric_expression is rounded. When a value other than 0
is specified, numeric_expression is truncated.
So just do
Select ROUND(cast(23 as float)/12, 2, 1) as total
That gives 1.91. Note: if you were really seeing 1.999..., something is really wrong with your computer. 23/12 = 1.916666... (ad infinitum). You need to cast one of the numbers as float, since SQL Server otherwise assumes they're integers and does integer division. You can of course cast them both as float, but as long as one is a float the other will be converted too.
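For comparison, the default rounding behavior next to the truncating form described in the BOL quote above:
Select ROUND(cast(23 as float)/12, 2, 0) as rounded,   -- 1.92 (function = 0: round)
       ROUND(cast(23 as float)/12, 2, 1) as truncated  -- 1.91 (function <> 0: truncate)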
Not terribly elegant, but works for all cases:
Select CONVERT(float,LEFT(CONVERT(nvarchar, 23.0/12.0),CHARINDEX('.',CONVERT(nvarchar, 23.0/12.0)) + 2)) as total
Scalar Function
-- Description: Truncate instead of rounding a float
-- SELECT dbo.TruncateNumber(23.0/12.0,2)
-- =============================================
CREATE FUNCTION TruncateNumber
(
    -- Add the parameters for the function here
    @inFloat float,
    @numDecimals smallint
)
RETURNS float
AS
BEGIN
    IF (@numDecimals < 0)
    BEGIN
        SET @numDecimals = 0
    END
    -- Truncate by cutting the string @numDecimals places past the decimal point
    RETURN CONVERT(float, LEFT(CONVERT(nvarchar, @inFloat), CHARINDEX('.', CONVERT(nvarchar, @inFloat)) + @numDecimals))
END
GO
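A quick usage check, per the example in the function's header comment (expected to return 1.91):
SELECT dbo.TruncateNumber(23.0/12.0, 2) AS total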