How to get BigQuery to CAST BYTES to STRING?

BigQuery Standard SQL documentation suggests that BYTES values can be coerced into STRINGs.
We have a bytes field that is the result of SHA256 hashing a field using BigQuery itself.
We now want to coerce it to a STRING, yet when we run CAST(field_name AS STRING) we get an error:
Query Failed Error: Invalid cast of bytes to UTF8 string
What is preventing us from getting a string from this byte field? Is it surmountable? If so, what is the solution?

The example below should give you an idea:
#standardSQL
WITH t AS (
  SELECT SHA256('abc') x
)
SELECT x, TO_BASE64(x)
FROM t
In short: you can use TO_BASE64() for this.

If you want to see the "traditional" representation of the hash as a string, you have to use the TO_HEX() function.
WITH table AS (
  SELECT SHA256('abc') AS bytes_field
)
SELECT bytes_field, TO_HEX(bytes_field) AS string_field
FROM table
By default, the BigQuery UI shows you the base64 representation, but if you want to compare the hash with the output of a SHA-256 function in another language, for example, you have to use TO_HEX().
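For example, the hex digest of 'abc' is a well-known SHA-256 test vector, so you can check it against any other implementation:
#standardSQL
SELECT TO_HEX(SHA256('abc')) AS hex_digest
-- ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad,
-- the same value that e.g. Python's hashlib.sha256(b'abc').hexdigest() returns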

You can try the SAFE_CONVERT_BYTES_TO_STRING() function.
reference: SAFE_CONVERT_BYTES_TO_STRING
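A minimal sketch of its behavior; note that SAFE_CONVERT_BYTES_TO_STRING() returns NULL for bytes that are not valid UTF-8, which a SHA256 digest almost never is, so for hashes TO_HEX()/TO_BASE64() remain the practical choices:
#standardSQL
SELECT
  SAFE_CONVERT_BYTES_TO_STRING(b'abc') AS valid_utf8,    -- 'abc'
  SAFE_CONVERT_BYTES_TO_STRING(SHA256('abc')) AS digest  -- NULL: not valid UTF-8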

How to convert a hash string to an integer in Snowflake?

I'm trying to get a hash of a decimal value and convert it to an integer. But the query results in the following error:
Numeric value 'b902cc4550838229a710bfec4c38cbc7eb11082367a409df9135e7f007a96bda' is not recognized
SELECT (CAST(sha2(TO_VARCHAR(ABS(12.5)), 256) AS INTEGER) % 100) AS temp_value
What is the correct way to convert a hash string to an integer in Snowflake?
I cannot use any user-defined functions and have to go with Snowflake native functions.
The hash value contains alphabetic characters, so the cast to INTEGER will throw an error. With the CAST commented out, the query runs and shows the offending string value:
SELECT --(CAST(
    sha2(
        TO_VARCHAR(
            ABS(12.5)), 256)-- AS INTEGER) % 100)
AS temp_value;
You need to convert the hex value from the hash encoding to an int.
I've not been able to find a function built into Snowflake that does this, but if you have a look in the following link, it will explain how to create a javascript function to do the conversion for you:
https://snowflakecommunity.force.com/s/article/faq-does-snowflake-have-a-hex-to-int-type-function
If you use the function in the link, then your code becomes something like this:
SELECT (CAST(js_hextoint(sha2(TO_VARCHAR(ABS(12.5)), 256)) AS INTEGER) % 100) AS temp_value
I've not been able to test the above code I'm afraid, so there may be a bracket in the wrong place...
You have a 64-digit hexadecimal number. It's not going to fit into the maximum numeric precision of 38 digits. You could use a floating point number, but that will lose precision.
create or replace function CONV(VALUE_IN string, OLD_BASE float, NEW_BASE float)
returns string
language javascript
as
$$
// Usage note: Loses precision for very large inputs
return parseInt(VALUE_IN, Math.floor(OLD_BASE)).toString(Math.floor(NEW_BASE));
$$;
select conv('b902cc4550838229a710bfec4c38cbc7eb11082367a409df9135e7f007a96bda', 16, 10);
--Returns 8.368282050700398e+76
For hex to integer, check Does snowflake have a ‘hex’ to ‘int’ type native function?. My guess is that most people checking this question (1k views) are looking for that.
But this specific question wants to convert a sha2 digest to integer for comparison purposes. My advice for that specific question is "don't".
That's because the hex string in the question represents the integer 83682820507003986697271120393377917644380327831689630185856829040117055843290, which far exceeds Snowflake's maximum numeric precision of 38 digits.
Instead, just compare strings/binary to check if the values match or not.
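A minimal sketch of that comparison (trivially true here, just to show the shape):
-- compare the digests directly as strings; no numeric conversion needed
select sha2(TO_VARCHAR(ABS(12.5)), 256) = sha2(TO_VARCHAR(ABS(12.5)), 256) as hashes_match;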

parameterized query with long string

I have a parameterized SQL query that I want to execute from (local) R against an Exasol database, as described here:
https://db.rstudio.com/best-practices/run-queries-safely/#parameterized-queries.
with tab as (
  select
    t.*,
    position(value in ?) as pos
  from MY_TABLE t
)
select * from tab where pos > 0;
The value that is passed to ? is a (long) string. When this string is 2000 characters long or less, everything works fine. When I increase it to 2001 characters, I get an error:
Error in result_bind(res#ptr, as.list(params)) :
nanodbc/nanodbc.cpp:1587: 40001: [EXASOL][EXASolution driver]GlobalTransactionRollback
msg: data exception - string data, right truncation. (Session: 1640027176042911503)
I guess the source of the problem is that my parameter is recognized as CHAR and not as VARCHAR.
The Exasol User Manual states:
"The length of both types is limited to 2,000 characters (CHAR) and 2,000,000 characters (VARCHAR), respectively".
Is there any way to cast ? to VARCHAR?
If you establish your db connection via ODBC you could try having a look at these parameters:
MAXPARAMSIZE and DEFAULTPARAMSIZE.
Setting DEFAULTPARAMSIZE to a higher value in the ODBC config will probably fix it:
https://docs.exasol.com/connect_exasol/drivers/odbc/using_odbc.htm?Highlight=varchar
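For example, a DSN entry along these lines (a sketch only; the exact keys and file location are described in the linked docs, and if DEFAULTPARAMSIZE defaults to 2,000 that would explain the observed cutoff):
; hypothetical odbc.ini entry for an Exasol DSN
[exasolution]
DRIVER = /path/to/libexaodbc.so
EXAHOST = exasol-host:8563
DEFAULTPARAMSIZE = 2000000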
The problem above occurred when I tried the first method for running parameterized queries suggested in the tutorial here: https://db.rstudio.com/best-practices/run-queries-safely/. This first approach uses a combination of the functions dbSendQuery() and dbBind().
My problem with long strings was solved when I switched to the second (less safe) method, which uses the sqlInterpolate() function.
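For reference, sqlInterpolate() escapes the value and splices it into the statement as an ordinary string literal, so the database receives plain SQL shaped like this (search string shortened here):
with tab as (
  select
    t.*,
    position(value in 'the full 2001+ character search string ...') as pos
  from MY_TABLE t
)
select * from tab where pos > 0;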

Convert String to array and validate size on Vertica

I need to execute a SQL query which converts a String column to an array and then validates the size of that array.
I was able to do it easily with PostgreSQL:
e.g.
select
  cardinality(string_to_array('a$b', '$')),
  cardinality(string_to_array('a$b$', '$')),
  cardinality(string_to_array('a$b$$$$$', '$'));
But for some reason, converting a String to an array on Vertica is not that simple. I saw these links:
https://www.vertica.com/blog/vertica-quick-tip-dynamically-split-string/
https://forum.vertica.com/discussion/239031/how-to-create-an-array-in-vertica
And many more, but none of them helped.
I also tried using:
select REGEXP_COUNT('a$b$$$$$','$')
But I get an incorrect value: 1.
How can I convert a String to an array on Vertica and get its length?
$ has a special meaning in a regular expression. It represents the end of the string.
Try escaping it:
select REGEXP_COUNT('a$b$$$$$', '[$]')
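Since the delimiter count is one less than the number of parts, a sketch of the PostgreSQL cardinality() behavior from the question would be:
select REGEXP_COUNT('a$b$$$$$', '[$]') + 1;
-- Returns 7, matching cardinality(string_to_array('a$b$$$$$', '$')) in PostgreSQL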
You could create a UDx scalar function (UDSF) in Java, C++, R or Python. The input would be a string and the output would be an integer. https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/ExtendingVertica/UDx/ScalarFunctions/ScalarFunctions.htm
This will allow you to use language-specific array logic on the strings passed in. For example, in Python, you could include this logic:
input_list = input.split("$")                         # split the UDSF's string argument on the delimiter
filtered_input_list = list(filter(None, input_list))  # drop empty tokens from consecutive delimiters
list_count = len(filtered_input_list)                 # number of non-empty values
These examples are a good starting point for writing UDx's for Vertica. https://github.com/vertica/UDx-Examples
I wasn't able to convert to an array, but I am able to get the number of values.
What I do is convert the string to rows and use count. It's not the best performance-wise,
but this way I can also do manipulations like filtering each value between delimiters, and I don't need to wrap characters like $ in [].
select (select count(1)
        from (select StringTokenizerDelim('a$b$c', '$') over ()) t);
Returns 3
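Because the tokens come back as rows, the filtering mentioned above is a plain WHERE clause. Assuming the tokenizer's output column is named words (as in the blog post linked earlier), a sketch for counting only non-empty values:
select (select count(1)
        from (select StringTokenizerDelim('a$b$$$$$', '$') over ()) t
        where words <> '');
-- Returns 2 ('a' and 'b')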

ERROR: function regexp_matches(jsonb, unknown) does not exist in Tableau but works elsewhere

I have a column called "Bakery Activity" whose values are all JSONs that look like this:
{"flavors": [
{"d4js95-1cc5-4asn-asb48-1a781aa83": "chocolate"},
{"dc45n-jnsa9i-83ysg-81d4d7fae": "peanutButter"}],
"degreesToCook": 375,
"ingredients": {
"d4js95-1cc5-4asn-asb48-1a781aa83": [
"1nemw49-b9s88e-4750-bty0-bei8smr1eb",
"98h9nd8-3mo3-baef-2fe682n48d29"]
},
"numOfPiesBaked": 1,
"numberOfSlicesCreated": 6
}
I'm trying to extract the number of pies baked with a regex function in Tableau. Specifically, this one:
REGEXP_EXTRACT([Bakery Activity], '"numOfPiesBaked":"?([^\n,}]*)')
However, when I try to throw this calculated field into my text table, I get an error saying:
ERROR: function regexp_matches(jsonb, unknown) does not exist;
Error while executing the query
Worth noting: my data source is PostgreSQL, which Tableau's regex functions support; not all of my entries have numOfPiesBaked in them; and when I run this in a regex simulator I get the correct extraction (actually, I get "numOfPiesBaked": 1, but removing the field name is a problem for another time).
What might be causing this error?
In short: Wrong data type, wrong function, wrong approach.
REGEXP_EXTRACT is obviously an abstraction layer of your client (Tableau), which is translated to regexp_matches() for Postgres. But that function expects text input. Since there is no assignment cast for jsonb -> text (for good reasons) you have to add an explicit cast to make it work, like:
SELECT regexp_matches("Bakery Activity"::text, '"numOfPiesBaked":"?([^\n,}]*)')
(The second argument can be an untyped string literal; Postgres function type resolution can derive the suitable data type text.)
Modern versions of Postgres also have regexp_match() returning a single row (unlike regexp_matches), which would seem like the better translation.
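A sketch of that variant (Postgres 10+); regexp_match() returns a text array, so the first capture group is element [1]:
SELECT (regexp_match("Bakery Activity"::text, '"numOfPiesBaked":"?([^\n,}]*)'))[1];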
But regular expressions are the wrong approach to begin with.
Use the simple json/jsonb operator ->>:
SELECT "Bakery Activity"->>'numOfPiesBaked';
Returns '1' in your example.
If you know the value to be a valid integer, you can cast it right away:
SELECT ("Bakery Activity"->>'numOfPiesBaked')::int;
I found an easier way to handle JSONB data in Tableau.
First, make a calculated field from the JSONB field and convert it to a string using the STR([Field_Name]) function.
Then, on top of that calculated field, make another calculated field and use the function:
REGEXP_EXTRACT([String_Field_Name], '"Key_to_be_extracted":"?([^\n,}]*)')
The required key-value pair will form the second calculated field.

BigQuery COALESCE/IFNULL type mismatch with literals

In SQL I usually use COALESCE and IFNULL to ensure that I get numbers and not NULL when my queries contain aggregate functions like COUNT and SUM, for example:
SELECT IFNULL(COUNT(foo), 0) AS foo_count FROM …
However, in BigQuery I run into an error:
Argument type mismatch in function IFNULL: 'f0_' is type uint64, '0' is type int32.
Is there a way to make BigQuery understand that a literal 0 should be interpreted as a uint64 in this context?
I've tried using CAST, but there's no uint64 type I can cast to, so I try INTEGER:
SELECT IFNULL(COUNT(foo), CAST(0 AS INTEGER)) AS foo_count FROM …
That gives me basically the same error, but at least I've successfully gotten a 64-bit zero instead of a 32-bit:
Argument type mismatch in function IFNULL: 'f0_' is type uint64, '0' is type int64.
The same happens if I use INTEGER(0).
I can get it to work if I cast both arguments to INTEGER:
SELECT IFNULL(INTEGER(COUNT(foo)), INTEGER(0)) AS foo_count FROM …
But now it starts to be verbose. Is this really how you're supposed to do it in BigQuery?
This is a bug in BigQuery which has been around for quite some time. For the time being you need to force the conversion of the COUNT, but you shouldn't need to do it for your "0".
The following should work:
SELECT IFNULL(INTEGER(COUNT(foo)), 0) AS foo_count FROM …
Thanks @Kinaan Khan Sherwani for the link to the official bug report.
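For what it's worth, this mismatch is specific to legacy SQL. Under #standardSQL, COUNT returns INT64 and an integer literal is also INT64, so the straightforward form works unchanged:
#standardSQL
SELECT IFNULL(COUNT(foo), 0) AS foo_count FROM …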