Convert String to array<int> in Hive

I have an integer array represented as a string. For example,
"[1,2,2,3]"
The field type in the Hive table is array<int>. Is there a built-in Hive UDF that can cast the above string into an array of integers?
Thanks

tl;dr I do not know of a Hive UDF that will do this for you, and casting on your own can be gross.
No, there isn't a UDF. As for building your own solution:
Casting to array<string> would be easy enough: just drop the square brackets using regexp_replace and split the resulting string on commas.
The problem is converting an array<string> to array<int> for arrays of arbitrary size. You can cast the array elements individually:
hive> select id, my_array from array_table limit 3;
OK
10023307 ["0.20296966","0.17753501","-0.03543373"]
100308007 ["0.16155224","0.1945944","0.09167781"]
100384207 ["0.025892768","0.023214806","-0.003712816"]
hive> select array(cast(my_array[0] as double), cast(my_array[1] as double), cast(my_array[2] as double)) from array_table limit 3;
OK
[0.20296966,0.17753501,-0.03543373]
[0.16155224,0.1945944,0.09167781]
[0.025892768,0.023214806,-0.003712816]
but this approach only works because I know I have arrays of length 3.
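For arrays of arbitrary length, the per-element logic is simple once you are outside plain HiveQL. The following is only a sketch, in Python, of what a custom UDF or a preprocessing step might do; the function name parse_int_array is hypothetical and not part of Hive.

```python
def parse_int_array(text: str) -> list[int]:
    """Parse a string such as "[1,2,2,3]" into a list of ints."""
    # Drop surrounding whitespace and the square brackets.
    inner = text.strip().strip("[]")
    # An empty array ("[]") has nothing to split.
    if not inner.strip():
        return []
    # Split on commas and cast each element, whatever the array length.
    return [int(part) for part in inner.split(",")]


if __name__ == "__main__":
    print(parse_int_array("[1,2,2,3]"))
```

Unlike the indexed casts above, this does not depend on knowing the array length in advance.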

Related

Calculating hash integer from a string in Athena

I'm trying to calculate a hash from a string for best-effort ordering and partitioning purposes in Athena. There is nothing similar to a string hashCode() in Athena, so as a best effort I try to take the 2nd character, compute its codepoint, and take the modulus. (As I said, best effort, maybe a nice effort)
Consider the query:
SELECT
doc_id,
substring(doc_id, 2, 1),
typeof(split(substring(doc_id, 2, 1)))
FROM events LIMIT 100
The third column returns a varchar, but the codepoint function expects a varchar(1), and casting it with cast(substring(doc_id, 2, 1) as varchar(1)) does not work:
FUNCTION_NOT_FOUND: line 6:5: Unexpected parameters (varchar) for function codepoint. Expected: codepoint(varchar(1))
How can I accomplish this task without modifying the data source? I'm open to ideas.

You can compute a hash code with the xxhash64 function. It takes a varbinary as input, so first cast the string to that type. Since the function also returns a 64-bit varbinary value, you can convert it to a bigint via the from_big_endian_64 function:
WITH t(x) AS (VALUES 'hello')
SELECT from_big_endian_64(xxhash64(cast(x AS varbinary)))
FROM t
output:
_col0
---------------------
2794345569481354659
(1 row)
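To make the last step concrete, here is a rough Python sketch of interpreting 8 bytes as a signed big-endian 64-bit integer and reducing it to a bucket number. Note the assumptions: Python's standard library has no xxhash64, so blake2b with an 8-byte digest is used purely as a stand-in and will not reproduce the value above; bucket_for and its buckets parameter are hypothetical names.

```python
import hashlib


def from_big_endian_64(raw: bytes) -> int:
    """Decode 8 bytes as a signed (two's complement) big-endian integer."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(raw, byteorder="big", signed=True)


def bucket_for(doc_id: str, buckets: int) -> int:
    """Map a string to a bucket in [0, buckets) via a 64-bit hash."""
    # Stand-in hash (assumption): blake2b truncated to 8 bytes, NOT xxhash64.
    digest = hashlib.blake2b(doc_id.encode("utf-8"), digest_size=8).digest()
    # Python's % with a positive modulus never returns a negative number.
    return from_big_endian_64(digest) % buckets


if __name__ == "__main__":
    print(bucket_for("hello", 16))
```

In SQL the decoded bigint can be negative, so take care with the sign when applying a modulus for partitioning.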

Google DataStudio on BigQuery Data, How to display Struct of Arrays

I'm trying to display a table in DataStudio connected to a BigQuery table. The table has a String field and a Struct of 2 Arrays, and this is where my issue is.
When I include both arrays from the struct, the table times out and shows a connection error, whereas when I include only one of them there are no issues.
Is this kind of struct not supported in DataStudio? Or am I doing something wrong? Thank you.

It doesn't support it. You have to transform it on the fly in the SELECT clause.
If you want to concatenate all strings from a repeated string field, you can use ARRAY_TO_STRING, which takes the array and a delimiter:
ARRAY_TO_STRING(recos.reco_sku, ',')
For integers, you have to cast them to strings first and then concatenate them:
ARRAY_TO_STRING(
ARRAY(
SELECT
CAST(i AS STRING)
FROM
UNNEST(recos.nb_asso) AS i WITH OFFSET o
ORDER BY
o
),
','
)
Otherwise, you can explode your array with LEFT/CROSS JOIN + UNNEST and make rows flat for each array entry.
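The two shapes described here (one joined string per row, or one flat row per array entry) can be illustrated outside SQL. This is only a Python sketch of the resulting data; join_values and explode are hypothetical helper names, and the sample values are made up for illustration.

```python
def join_values(values, delimiter=","):
    """Concatenate array entries into one string, casting each to str first."""
    return delimiter.join(str(v) for v in values)


def explode(row_key, values):
    """Produce one flat (key, value) row per array entry."""
    return [(row_key, v) for v in values]


if __name__ == "__main__":
    nb_asso = [3, 1, 2]  # made-up sample for an integer array field
    print(join_values(nb_asso))
    print(explode("item-1", nb_asso))
```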

BigQuery - function to handle hex string

Is there a way in BigQuery to convert a hex string to a decimal value?
Something like:
select hex("ff")

CAST now supports converting hexadecimal strings to INT64 or FLOAT64 values, even though this is not mentioned in the reference documentation.
Here's how you use it:
SELECT
CAST(columnA as FLOAT64) as float,
CAST(columnB as INT64) as int
FROM table

This should work, but it doesn't (I'm filing a feature request):
SELECT INTEGER('0xff')
In the meantime, this does work:
SELECT FLOAT('0xff')
255.0
For integer results:
SELECT INTEGER(FLOAT('0xff'))
255

Looking into the query reference, I'd say no.
You have "HEX_STRING()" which does the opposite, but all the string to number functions seem to not take hex.
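As a quick cross-check of what a hex-to-decimal conversion should return, independent of BigQuery, the same arithmetic can be sketched in Python. The helper name hex_to_int is hypothetical.

```python
def hex_to_int(text: str) -> int:
    """Convert a hex string, with or without a 0x prefix, to an integer."""
    # int() with base 16 accepts both "ff" and "0xff".
    return int(text, 16)


if __name__ == "__main__":
    print(hex_to_int("ff"))
    print(hex_to_int("0xffff"))
```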

Parse float to int in postgresql

I have a set of data in a PostgreSQL DB where one of the columns stores float values as strings, and I now need to remove the decimal component of each string. How can I do this using an SQL update statement in my DB console? Is that possible?
For example:
"25.3" -> "25"
If it is not possible, how else can I do this?
Thanks in advance.

You would be better off casting the text column to numeric and then to integer, so that rounding is taken into account, e.g.
SELECT '25.3'::numeric::integer AS num1, '25.5'::numeric::integer AS num2
which would return the integers 25 and 26 respectively.
If you are not concerned with the digits following the decimal point, floor(column_name::numeric)::integer or a substring, as mentioned, should be fine.

Since it is a string, you can use string functions to drop the digits after the decimal point.
If you do not want rounding and just want to drop the decimal part, use:
update table_name
set column_name = substring(column_name from 1 for position('.' in column_name)-1)
where position('.' in column_name) > 0; -- skip values without a decimal point, which would otherwise give a negative substring length
If you want rounding, use the cast mentioned by @mlinth.
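The difference between the two approaches (rounding via a numeric cast versus dropping the decimal part as text) can be sketched in Python. This is an illustration only, not PostgreSQL code; the function names are hypothetical, and ROUND_HALF_UP is assumed here to mirror the 25.5 -> 26 result shown above.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_numeric_string(text: str) -> str:
    """Round a decimal string to the nearest integer, halves away from zero."""
    return str(Decimal(text).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def truncate_numeric_string(text: str) -> str:
    """Drop everything from the decimal point onward, without rounding."""
    # Strings without a "." are returned unchanged.
    return text.split(".", 1)[0]


if __name__ == "__main__":
    for value in ("25.3", "25.5"):
        print(value, round_numeric_string(value), truncate_numeric_string(value))
```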

Select substring from a varchar and convert to Integer array

I have a VARCHAR of numbers inside my stored procedure. The numbers are organized as arrays, as in the example below:
{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9} -- This is a STRING
I want a FOR loop that, on each iteration, selects the next substring between {} from this set and converts it to an array of integers.
So on the first iteration of my loop I will have:
{1,2,3,4,5,6,7,8,9}
So I will use array_to_string to convert this to an integer[]
On the second iteration I will have:
{1,2,3,4,5}
and keep going using array_to_string
Any tips? Careful, because unfortunately I'm using PostgreSQL 8.3!

You could do it in a single statement:
SELECT string_to_array(unnest(string_to_array(
trim('{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9}', '{}')
, '},{')), ',')::int[]
This works in Postgres 8.4 or later. 8.3 has reached EOL. Urgently consider an upgrade.
However, there is regexp_split_to_table() in 8.3 already:
SELECT string_to_array(regexp_split_to_table(
trim('{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9}', '{}')
, '},{'), ',')::int[]
-> SQLfiddle demo for Postgres 8.3.
For looping the array, consider this related answer:
Postgres - array for loop
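The splitting logic in both statements is the same: trim the outer braces, split on '},{', then split each piece on commas and cast to integers. As a sketch of that logic only (in Python, not PostgreSQL; parse_int_arrays is a hypothetical name):

```python
def parse_int_arrays(text: str) -> list[list[int]]:
    """Split '{1,2},{3}' style text into a list of integer lists."""
    # Trim the outermost braces, like trim(..., '{}') in the SQL above.
    inner = text.strip().strip("{}")
    # Each group is separated by '},{'; each number within a group by ','.
    return [[int(n) for n in group.split(",")] for group in inner.split("},{")]


if __name__ == "__main__":
    for arr in parse_int_arrays("{1,2,3,4,5,6,7,8,9},{1,2,3,4,5},{1,2,3},{9}"):
        print(arr)
```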
