I'm trying to calculate a hash from a string for best-effort ordering and partitioning purposes in Athena. Athena has nothing similar to String.hashCode(), so as a best effort I take the 2nd character, calculate its code point, and take the modulus. (As I said, best effort, or maybe just a nice effort.)
Consider the query:
SELECT
doc_id,
substring(doc_id, 2, 1),
typeof(split(substring(doc_id, 2, 1)))
FROM events LIMIT 100
The 3rd column returns varchar, but the codepoint function expects a varchar(1), and casting it with cast(substring(doc_id, 2, 1) as varchar(1)) does not work:
FUNCTION_NOT_FOUND: line 6:5: Unexpected parameters (varchar) for function codepoint. Expected: codepoint(varchar(1))
How can I accomplish this task without modifying the data source? I'm open to ideas.
You can compute a hash code with the xxhash64 function. It takes a varbinary as input, so first cast the string to that type. Since the function also returns a 64-bit varbinary value, you can convert it to a bigint with the from_big_endian_64 function:
WITH t(x) AS (VALUES 'hello')
SELECT from_big_endian_64(xxhash64(cast(x AS varbinary)))
FROM t
output:
_col0
---------------------
2794345569481354659
(1 row)
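To use this for partitioning, the 64-bit hash can be reduced to a small bucket number. The following is only a sketch: the events table and doc_id column come from the question, while the bucket count of 16 and the bucket alias are assumptions. bitwise_and with 15 keeps the low 4 bits, so the result is always in the range 0 to 15 even when the hash is negative.

```sql
-- Sketch: derive a bucket number (0..15) from the hash of doc_id.
-- 16 buckets is an arbitrary, assumed choice; use (N - 1) as the mask
-- for any other power-of-two bucket count N.
SELECT
  doc_id,
  bitwise_and(
    from_big_endian_64(xxhash64(cast(doc_id AS varbinary))),
    15
  ) AS bucket
FROM events
LIMIT 100
```

Masking the low bits avoids taking abs() of a negative hash, which can fail on the smallest bigint value.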
I'm trying to display a table in DataStudio connected to a BigQuery table that has a String field and a Struct of 2 Arrays. This is where my issue is.
When I include both arrays from the struct, the table seems to time out and shows a connection error, whereas when I include either one of them on its own there are no issues.
Is this kind of struct not supported in DataStudio, or am I doing something wrong? Thank you.
It doesn't support it. You have to transform it on the fly in the SELECT clause.
If you want to concatenate all strings from a repeated string field, you can use ARRAY_TO_STRING, which also requires a delimiter:
ARRAY_TO_STRING(recos.reco_sku, ',')
For integers, you have to cast them to strings first and then concatenate them:
ARRAY_TO_STRING(
ARRAY(
SELECT
CAST(i AS STRING)
FROM
UNNEST(recos.nb_asso) AS i WITH OFFSET o
ORDER BY
o
),
','
)
Otherwise, you can explode your array with LEFT/CROSS JOIN + UNNEST and make rows flat for each array entry.
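A minimal sketch of the flattening approach follows. The recos.reco_sku and recos.nb_asso fields come from the snippets above; the table name `my_project.my_dataset.my_table` and the string column id are hypothetical placeholders. It also assumes the two arrays are aligned by position, which the question does not state.

```sql
-- Sketch: one output row per array position.
-- `my_project.my_dataset.my_table` and `id` are hypothetical names.
-- Assumes reco_sku and nb_asso line up element by element.
SELECT
  t.id,
  sku,
  t.recos.nb_asso[SAFE_OFFSET(pos)] AS nb_asso  -- NULL if nb_asso is shorter
FROM `my_project.my_dataset.my_table` AS t
CROSS JOIN UNNEST(t.recos.reco_sku) AS sku WITH OFFSET AS pos
ORDER BY t.id, pos
```

With CROSS JOIN, rows whose reco_sku array is empty disappear from the result, so this is only suitable if that is acceptable for the report.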
Is there a way in BigQuery to convert a hex string to a decimal value?
Something like:
select hex("ff")
CAST now supports converting hexadecimal strings to INT64 or FLOAT64 values, even though this is not specified in the reference documentation.
Here's how you use it:
SELECT
CAST(columnA as FLOAT64) as float,
CAST(columnB as INT64) as int
FROM table
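As a concrete illustration with a literal instead of a column: the INT64 cast expects the 0x prefix, so a bare hex string such as the 'ff' from the question needs the prefix added first. This is a sketch in standard SQL; the alias value is just an illustrative name.

```sql
-- Sketch: convert the hex string 'ff' to its decimal value.
-- CONCAT adds the '0x' prefix that the hex form of the cast expects.
SELECT CAST(CONCAT('0x', 'ff') AS INT64) AS value  -- 255
```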
This should work, but it doesn't (I'm filing a feature request):
SELECT INTEGER('0xffff')
In the meantime, this does work:
SELECT FLOAT('0xffff')
65535.0
For integer results:
SELECT INTEGER(FLOAT('0xffff'))
65535
Looking into the query reference, I'd say no.
You have "HEX_STRING()", which does the opposite, but none of the string-to-number functions seem to accept hex.
I have a set of data in a PostgreSQL DB where one of the columns stores string data in float format, and now I need to remove the decimal component of the string. How can I do this using an SQL update statement in my DB console? Is that possible?
For example:
"25.3" -> "25"
If it is not possible, how else can I do this?
Thanks in advance.
You would be better off casting the text column to numeric and then to integer, so that rounding is taken into account, e.g.:
SELECT '25.3'::numeric::integer AS num1, '25.5'::numeric::integer AS num2
which would return integers of 25 and 26 respectively.
If you are not concerned with the digits after the decimal point, floor(column_name::numeric)::integer or a substring, as mentioned, should be fine.
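Since the question asks for an update statement, the floor variant could be written roughly as follows. This is a sketch: table_name and column_name are placeholders, and it assumes every value containing a '.' is a valid number, otherwise the cast to numeric raises an error.

```sql
-- Sketch: rewrite '25.3' as '25' in place.
-- table_name / column_name are placeholders.
-- Assumes all values containing '.' are valid numerics.
UPDATE table_name
SET column_name = floor(column_name::numeric)::integer::text
WHERE column_name LIKE '%.%';
```

Note that floor rounds negative values down ('-25.3' becomes '-26'); trunc(column_name::numeric) simply drops the fraction instead.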
Since it is a string, you can use string functions to drop the digits after the decimal point.
If you do not want to round the values and just want to drop the decimal part, use the following (the where clause skips values without a '.', which would otherwise produce a negative substring length):
update table_name
set column_name = substring(column_name from 1 for position('.' in column_name)-1)
where position('.' in column_name) > 0;
If you want rounding, you can use the cast as mentioned by @mlinth.
I have a VARCHAR of numbers inside my stored procedure, these numbers are organized as arrays, I will show an example below:
{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9} -- This is a STRING
I want to do a FOR loop to select every time a substring from this set between {} and convert this to an array of integers.
So at first time inside my loop I will have:
{1,2,3,4,5,6,7,8,9}
So I will use string_to_array to convert this to an integer[]
At second time I will have:
{1,2,3,4,5}
and keep going using string_to_array
Any tips? Careful, because unfortunately I'm using PostgreSQL 8.3!
You could do it in a single statement:
SELECT string_to_array(unnest(string_to_array(
trim('{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9}', '{}')
, '},{')), ',')::int[]
... in Postgres 8.4 or later. Postgres 8.3 has reached EOL, so urgently consider an upgrade.
However, there is regexp_split_to_table() in 8.3 already:
SELECT string_to_array(regexp_split_to_table(
trim('{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9}', '{}')
, '},{'), ',')::int[]
-> SQLfiddle demo for Postgres 8.3.
For looping the array, consider this related answer:
Postgres - array for loop
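For the loop itself, a PL/pgSQL function along the following lines could iterate over the groups. This is only a sketch and has not been verified on 8.3: the function name process_groups and the variables grp and arr are hypothetical, it assumes the plpgsql language is installed in the database, and it uses btrim, which is equivalent to the two-argument trim above.

```sql
-- Sketch: loop over each {...} group and convert it to int[].
-- process_groups, grp and arr are hypothetical names.
CREATE OR REPLACE FUNCTION process_groups(input text) RETURNS void AS $$
DECLARE
    grp text;   -- one group without braces, e.g. '1,2,3'
    arr int[];  -- the same group as an integer array
BEGIN
    FOR grp IN
        SELECT regexp_split_to_table(btrim(input, '{}'), '},{')
    LOOP
        arr := string_to_array(grp, ',')::int[];
        RAISE NOTICE 'array: %', arr;  -- replace with the real per-array work
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Usage:
SELECT process_groups('{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9}');
```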