BitTest search in BigQuery (by position)

I keep a numeric representation of binary data in a BigQuery table.
I need to be able to search by bit position and find out whether the bit at a given position is 0 or 1.
The Oracle analog is BitTest:
Use this function to return TRUE (1) if the specified bit in a value is a 1; otherwise return FALSE (0).
Syntax: BitTest(Value1, BitPos)
Example: the number in the DB is 1099511627780,
so its binary form is 10000000000000000000000000000000000000100.
Thus the results are:
BitTest(1099511627780, 1) = 0;
BitTest(1099511627780, 2) = 0;
BitTest(1099511627780, 3) = 1;
Can you help me find a native implementation in BigQuery?
I was looking through the docs with no luck:
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

You can create a temporary function that performs this computation using a bit shift and a bitwise AND. Here is an example:
CREATE TEMP FUNCTION BitTest(value INT64, bit INT64) AS (
  -- Shift the target bit into the lowest position, then mask everything else off.
  ((value >> (bit - 1)) & 0x1) = 1
);
SELECT
  value,
  bit,
  BitTest(value, bit) AS result
FROM (
  SELECT 1099511627780 AS value, bit
  FROM UNNEST(GENERATE_ARRAY(1, 42)) AS bit
)
ORDER BY bit;
The function BitTest checks whether the bit at the 1-based index is set. The FROM clause in this example generates bit indexes between 1 and 42 to demonstrate the output; for the sample value, result is true only at bits 3 and 41 (1099511627780 = 2^40 + 2^2).
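If you only need a one-off check rather than a reusable function, the same expression can be written inline; a minimal sketch using the sample value from the question:

SELECT ((1099511627780 >> (3 - 1)) & 0x1) = 1 AS bit3_is_set;  -- returns true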

Related

BigQuery casting int64 to uint64

I am storing uint64 as INTEGER type in BigQuery (values > 2^63 become negative). Is there a way to cast it to its correct value while querying BigQuery?
You will need a type which is big enough to hold UINT64; in BigQuery, the easy option is the NUMERIC type. That said, I think the correct way to convert it back is:
CREATE TEMP FUNCTION int64_to_uint64(x INT64) AS (
  IF(x < 0,
     NUMERIC "18446744073709551616" /* 2^64 */ + x,
     /* ELSE */ CAST(x AS NUMERIC))
);
-- Check 2 boundaries:
SELECT int64_to_uint64(-1); -- returns 18446744073709551615, 0xFFFFFFFFFFFFFFFF
SELECT int64_to_uint64(-9223372036854775808); -- returns 9223372036854775808, 0x8000000000000000
SELECT int64_to_uint64(12345); -- positive number also works
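Applying it over a table column works the same way, as long as the temp function is defined in the same script; a quick usage sketch (the dataset, table, and column names here are hypothetical):

SELECT raw_value, int64_to_uint64(raw_value) AS uint64_value
FROM my_dataset.my_table;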

How to process bitand operation in Informix with column in hex string format

In a table I have a string column which contains a hex value. For example, the value '000000000000000a' means 10. Now I need to perform a bitand operation: bitand(tableName.hexColumn, ?). The Informix specification of this function says it needs 2 ints. So my question is: what is the simplest way to perform this operation?
PS: There is probably no built-in solution in Informix, so I will have to create my own bitandhexstring function whose inputs will be 2 strings in hex form, but I have no idea where to start.
There are a variety of issues to be dealt with:
Your hex string has 16 digits, so the values are presumably (in general) 64-bit quantities. That means you need to be sure that the BITAND function has a variant that handles BIGINT (or perhaps INT8 — I'm not going to mention INT8 again, but it is nominally an option when BIGINT is mentioned) data.
You need to convert your hex string to a BIGINT.
It is not clear whether you'll need to convert the result BIGINT back to a hex string.
Some testing with Informix 11.70.FC6 on Mac OS X 10.10.4 shows that BITAND is safe with 64-bit numbers. That's good news!
The HEX function, when passed a BIGINT, returns a CHAR(20) string that starts with 0x and contains a hex representation of the number, so that more or less addresses point 3. The residual issue is 'how to convert 16-byte strings of hex digits to a BIGINT value'. Nominally, a cast operation like:
CAST('0xde3962e8c68a8001' AS BIGINT)
should do the job (but see below). There may be a better way of doing it than a brute-force and ignorance stored procedure, but I'm not immediately sure what it is.
Caveat Lector.
While testing this, I tried two queries:
SELECT bi, HEX(bi) FROM Test_BigInt;
SELECT bi, HEX(bi), SUBSTR(HEX(bi), 3, 16) FROM Test_BigInt;
on a table Test_BigInt with a single column bi of type BIGINT (not null, as it happened, but that's not material).
The first query worked fine. The type of the HEX(bi) expression was CHAR(20) and the values were like
0 0x0000000000000000
6898532535585831936 0x5fbc82ca87117c00
-2300268458811555839 0xe013ce0628808001
The second query sort of worked for small values of bi (0, 1, 2), but generated an error -1215: Value exceeds limit of INTEGER precision when the values got large. The problem is not the SUBSTR function directly. This was testing with Informix 11.70.FC6 on Mac OS X 10.10.4 — tested on 2015-07-08. The following pair of queries worked as expected (which is my justification for claiming that the problem is not in the SUBSTR function per se).
SELECT bi, HEX(bi) AS hex_bi FROM Test_BigInt INTO TEMP t;
SELECT bi, hex_bi, SUBSTR(hex_bi, 3, 16) FROM t;
It seems to be an interaction problem when the result of HEX is used in a string operation context. I first got the problem when trying to concatenate an empty string to the result of HEX: HEX(bi) || ''. That turns out to be unnecessary given that the result of HEX is reported as CHAR(20), but also indicates SUBSTR is not directly at fault.
I also tried CAST to get the hex string converted to BIGINT:
SELECT CAST('0xde3962e8c68a8001' AS BIGINT) FROM dual;
BIGINT
-964001791
SELECT HEX(CAST('0xde3962e8c68a8001' AS BIGINT)) FROM dual;
CHAR(18)
0xffffffffc68a8001
Grrr! Something is mishandling the conversion. This is not new software (well over 2 years old), but the chances are that unless someone else has spotted the bug, it has not yet been fixed, even in the latest version.
I've reported this through back-channels to IBM/Informix.
Stored procedures to convert hex string to BIGINT
CREATE PROCEDURE hexval(c CHAR(1)) RETURNING INTEGER;
    RETURN INSTR("0123456789abcdef", LOWER(c)) - 1;
END PROCEDURE;

CREATE PROCEDURE hexstr_to_bigint(ival VARCHAR(18)) RETURNING BIGINT;
    DEFINE oval DECIMAL(20,0);
    DEFINE i, j, len INTEGER;

    LET ival = LOWER(ival);
    IF (ival[1,2] = '0x') THEN LET ival = ival[3,18]; END IF;
    LET len = LENGTH(ival);
    LET oval = 0;
    FOR i = 1 TO len
        LET j = hexval(SUBSTR(ival, i, 1));
        LET oval = oval * 16 + j;
    END FOR;
    -- Values above 2^63 - 1 wrap around into the negative BIGINT range.
    IF (oval > 9223372036854775807) THEN
        LET oval = oval - 18446744073709551616;
    END IF;
    RETURN oval;
END PROCEDURE;
Casual testing:
execute procedure hexstr_to_bigint('000A');
10
execute procedure hexstr_to_bigint('FFff');
65535
execute procedure hexstr_to_bigint('FFFFffffFFFFffff');
-1
execute procedure hexstr_to_bigint('0XFFFFffffFFFFffff');
-1
execute procedure hexstr_to_bigint('000000000000000A');
10
Those values are correct.
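With those procedures in place, the operation from the question can be written directly (a sketch using the question's placeholder names, and relying on the BIGINT-safe BITAND tested above):

SELECT BITAND(hexstr_to_bigint(t.hexColumn), ?) AS masked
FROM tableName t;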

Hex string to integer conversion in Amazon Redshift

Amazon Redshift is based on ParAccel which is based on Postgres. From my research it seems that the preferred way to perform hexadecimal string to integer conversion in Postgres is via a bit field, as outlined in this answer.
In the case of bigint, this would be:
select ('x'||lpad('123456789abcdef',16,'0'))::bit(64)::bigint
Unfortunately, this fails on Redshift with:
ERROR: cannot cast type text to bit [SQL State=42846]
What other ways are there to perform this conversion in Postgres 8.1ish (that's close to the Redshift level of compatibility)? UDFs are not supported in Redshift, and neither are arrays, regex functions, or set-generating functions...
It looks like they added a function for this at some point: STRTOL.
Syntax: STRTOL(num_string, base)
Return type: BIGINT. If num_string is null, returns NULL.
For example:
SELECT strtol('deadbeef', 16);
Returns: 3735928559
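For the question's own bigint example, that would be along these lines (a sketch; the expected value is just 0x123456789abcdef written in decimal):

SELECT strtol('123456789abcdef', 16);  -- expected 81985529216486895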
Assuming that you want a simple digit-by-digit ordinal position conversion (i.e. you're not worried about two's complement negatives, etc.) I think this should work on an 8.1-equivalent DB:
CREATE OR REPLACE FUNCTION hex2dec(text) RETURNS bigint AS $$
SELECT sum(
         CASE WHEN v >= ascii('a') THEN v - ascii('a') + 10
              ELSE v - ascii('0')
         END * 16^ordpos
       )::bigint
FROM (
  SELECT n-1, ascii(substring(reverse($1), n, 1))
  FROM generate_series(1, length($1)) n
) AS x(ordpos, v);
$$ LANGUAGE sql IMMUTABLE;
The function form is optional; it just makes it easier to avoid repeating the argument a bunch of times, and it should get inlined anyway. Efficiency will probably be awful, but most of the tools available to do this smarter don't seem to be available on versions that old, and this at least works:
regress=> CREATE TABLE t AS VALUES ('c13b'), ('a'), ('f');
regress=> SELECT hex2dec(column1) FROM t;
hex2dec
---------
49467
10
15
(3 rows)
If you can use regexp_split_to_array and generate_subscripts it might be faster. Or slower. I haven't tried. Another possible trick is to use a digit mapping array instead of the CASE, like:
'[48:102]={0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,11,12,13,14,15}'::integer[]
which you can use with:
CREATE OR REPLACE FUNCTION hex2dec(text) RETURNS bigint AS $$
SELECT sum(
         ('[48:102]={0,1,2,3,4,5,6,7,8,9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,11,12,13,14,15}'::integer[])[ v ]
         * 16^ordpos
       )::bigint
FROM (
  SELECT n-1, ascii(substring(reverse($1), n, 1))
  FROM generate_series(1, length($1)) n
) AS x(ordpos, v);
$$ LANGUAGE sql IMMUTABLE;
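A quick sanity check that the array-mapped version agrees with the CASE-based one (lowercase input assumed, as in the original):

SELECT hex2dec('deadbeef');  -- should return 3735928559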
Personally, I'd do it client-side instead, rather than wrangling the limited capabilities of an old PostgreSQL fork, especially one you can't load your own sensible user-defined C functions on, or use PL/Perl, etc.
In real PostgreSQL I'd just use this:
hex2dec.c:
#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "errno.h"
#include "limits.h"
#include <stdlib.h>
PG_MODULE_MAGIC;
Datum from_hex(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(hex2dec);
Datum
hex2dec(PG_FUNCTION_ARGS)
{
char *endpos;
const char *hexstr = text_to_cstring(PG_GETARG_TEXT_PP(0));
long decval = strtol(hexstr, &endpos, 16);
if (endpos[0] != '\0')
{
ereport(ERROR, (ERRCODE_INVALID_PARAMETER_VALUE, errmsg("Could not decode input string %s as hex", hexstr)));
}
if (decval == LONG_MAX && errno == ERANGE)
{
ereport(ERROR, (ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, errmsg("Input hex string %s overflows int64", hexstr)));
}
PG_RETURN_INT64(decval);
}
Makefile:
MODULES = hex2dec
DATA = hex2dec--1.0.sql
EXTENSION = hex2dec
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
hex2dec.control:
comment = 'Utility function to convert hex strings to decimal'
default_version = '1.0'
module_pathname = '$libdir/hex2dec'
relocatable = true
hex2dec--1.0.sql:
CREATE OR REPLACE FUNCTION hex2dec(hexstr text) RETURNS bigint
AS 'hex2dec','hex2dec'
LANGUAGE c IMMUTABLE STRICT;
COMMENT ON FUNCTION hex2dec(hexstr text)
IS 'Decode the hex string passed, which may optionally have a leading 0x, as a bigint. Does not attempt to consider negative hex values.';
Usage:
CREATE EXTENSION hex2dec;
postgres=# SELECT hex2dec('7fffffffffffffff');
hex2dec
---------------------
9223372036854775807
(1 row)
postgres=# SELECT hex2dec('deadbeef');
hex2dec
------------
3735928559
(1 row)
postgres=# SELECT hex2dec('12345');
hex2dec
---------
74565
(1 row)
postgres=# select hex2dec(to_hex(-1));
hex2dec
------------
4294967295
(1 row)
postgres=# SELECT hex2dec('8fffffffffffffff');
ERROR: Input hex string 8fffffffffffffff overflows int64
postgres=# SELECT hex2dec('0x7abcz123');
ERROR: Could not decode input string 0x7abcz123 as hex
The performance difference is ... noteworthy. Given sample data:
CREATE TABLE randhex AS
SELECT '0x'||to_hex( abs(random() * (10^((random()-.5)*10)) * 10000000)::bigint) AS h
FROM generate_series(1,1000000);
conversion from hex to decimal takes about 1.3 seconds from a warm cache using the C extension, which isn't great for a million rows. Reading them without any transformation takes 0.95s. It took 36 seconds for the SQL-based hex2dec approach to process the same rows. Frankly I'm really impressed that the SQL approach was as fast as that, and surprised the C extension was that slow.
A likely explanation is that the cast from text to bit(n) relies on undocumented behavior; I repeat the quote from Tom Lane:
This is relying on some undocumented behavior of the bit-type input converter, but I see no reason to expect that would break. A possibly bigger issue is that it requires PG >= 8.3 since there wasn't a text to bit cast before that.
And Amazon's derivative obviously does not allow this undocumented feature. Not surprising, since it is based on Postgres 8.1, where there was no such cast at all.
Previously quoted in this closely related answer:
Convert hex in text representation to decimal number

SQL Server - evaluate a function in a dynamic query

I have a piece of dynamic SQL, part of which retrieves a function dependent on other results from the query, but also uses these results to evaluate this function. I know eval() does not exist in SQL, so what do I use?
A very simplified version
select reading, functiontype, @result = eval(f.functionformula)
from readingstables r
join functiontable f on (r.functiontype = f.functiontype)
So basically (note these are only example formulae) I want to use the functionformula which is related to a set of readings via the functiontype:
if f.functiontype == 'A' then f.functionformula = reading * reading
if f.functiontype == 'B' then f.functionformula = reading * constant / anothervalue
// etc etc
The real version is a huge piece of dynamic SQL in a stored procedure that drives a cursor. I would prefer to do it in one query but suspect I might have to compromise and have a second dynamic query driven from the first.
Why not simply use the POWER function:
Case functionType
When 'A' Then Power( reading, 2 )
When 'B' Then Power( reading, 3 )
...
End
You could even get super fancy like so:
Power( reading, Ascii( functionType ) - Ascii('A') + 2 )
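To see why the trick works: for functionType 'A' the exponent is Ascii('A') - Ascii('A') + 2 = 2, for 'B' it is 3, and so on up the alphabet. A quick check (the reading value 4 is arbitrary):

Select Power( 4, Ascii( 'A' ) - Ascii( 'A' ) + 2 );  -- 16, i.e. squared
Select Power( 4, Ascii( 'B' ) - Ascii( 'A' ) + 2 );  -- 64, i.e. cubed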
Edit
Given your change to your OP: beyond dynamic SQL, there is no way to dynamically execute a function call. You could create a UDF which takes the function type parameter and executes the correct expression; however, the UDF itself would need to be a large Case expression.
Create Function FunctionTypeExpression( @FunctionType char(1) )
Returns float
As
Return Case @FunctionType
When 'A' Then ..expression 1
When 'B' Then ..expression 2
...
One note on this: you will need to make the return value of the function compatible with any possible return type from the expressions. Hopefully, they are all numeric. If they are not all numeric (or all text), then a more detailed explanation of why that is not the case would be needed.
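For completeness, here is a hedged sketch of what such a UDF might look like in full. Note that the reading would presumably have to be passed in as a second parameter, and the two expressions are merely placeholders standing in for the real formulae:

Create Function FunctionTypeExpression( @FunctionType char(1), @Reading float )
Returns float
As
Begin
    Return Case @FunctionType
               When 'A' Then @Reading * @Reading   -- placeholder expression 1
               When 'B' Then Power( @Reading, 3 )  -- placeholder expression 2
           End
End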

SQL update to a table based on a flag word?

I've got a field in my DB that's an arbitrary value on a per-row basis, and I'd like to add X to this. I'd only like to add X if a flag word (held as an int in this row) has the 2nd and 10th bits set true. Is it possible to create an SQL statement to do this for every row in the table? Or do I have to iterate through my entire table?
Using MySQL (5.5)
Bonus points question: I say add X based on a flag, but there's also a scaling factor. For example, based on a value of bits 20-12 interpreted as a short unsigned integer, I'd really like to assign:
value = value + ('X' * thatShort * (bit2 and bit10));
In MS SQL:
update MyTable
set Field1 = Field1 + 'X'
where Field2 & 0x202 = 0x202
[EDIT]
value = value + (X * ((field & 0xFF800) >> 11))
0xFF800 is the mask for bits 12 to 20, using the same numbering as above (bit2 = 0x2, bit10 = 0x200).
>> 11 shifts the masked value down, removing bits 1 to 11, so what remains is that bit field read as a small unsigned integer.
Since the WHERE clause already filters on bit2 and bit10 being set, the (bit2 and bit10) factor is always 1 inside the update and can be dropped.
Hope this will answer your question. Not sure how you are going to grant 'bonus points' though :).
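The same bitwise operators and hex literals also exist in MySQL 5.5, so putting the flag test and the scaling factor together should look something like this (a sketch; X and the column names are placeholders):

update MyTable
set value = value + (X * ((Field2 & 0xFF800) >> 11))
where Field2 & 0x202 = 0x202;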