Cast a hexadecimal string to an array of bigint in hive

I have a column that contains a 16-character hexadecimal string. I would like to convert it to a bigint. Is there any way to accomplish that? The usual approach returns null, since the input string can represent a number greater than 2^63-1.
select
cast(conv(hash_col, 16, 10) as bigint) as p0,
conv(hash_col, 16, 10) as c0
from mytable limit 10
I have also tried using unhex(..),
select cast(unhex(hash_col) as bigint) as p0 from mytable limit 10
but got the following error
No matching method for class org.apache.hadoop.hive.ql.udf.UDFToLong
with (binary). Possible choices: FUNC(bigint) FUNC(boolean)
FUNC(decimal(38,18)) FUNC(double) FUNC(float) FUNC(int) FUNC(smallint) FUNC(string) FUNC(timestamp) FUNC(tinyint) FUNC(void)
If I don't do the cast(.. as bigint) part, I get some undisplayable binary value for p0. It seems unhex is not exactly the inverse of hex in hive.

Your values are out of range for BigInt
Ref : https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
Max range for BigInt is 9,223,372,036,854,775,807
Use decimal(20,0) instead.
select cast(conv('85A58F8B014692CA',16,10) as decimal(20,0))
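The range argument can be checked outside Hive. The following is a minimal Python sketch, used here only for illustration and not part of the Hive solution; the helper name `fits_in_bigint` is made up for this example. It parses the same literal as an unsigned integer and compares it with the signed 64-bit limit.

```python
# Illustration only: does a 16-digit hex string fit in a signed 64-bit BIGINT?
BIGINT_MAX = 2**63 - 1  # 9,223,372,036,854,775,807

def fits_in_bigint(hex_str: str) -> bool:
    """Return True if the unsigned value of hex_str is within the BIGINT range."""
    return int(hex_str, 16) <= BIGINT_MAX

value = int("85A58F8B014692CA", 16)

print(fits_in_bigint("85A58F8B014692CA"))  # False: the top bit is set
print(len(str(value)) <= 20)               # True: 20 decimal digits are enough
```

Any 16-digit hexadecimal value is at most 2^64-1, which has 20 decimal digits, so decimal(20,0) covers the whole input range.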

Related

Calculating hash integer from a string in Athena

I'm trying to calculate a hash from a string for best-effort ordering and partitioning purposes in Athena. Athena has nothing similar to String.hashCode(), so as a best effort I try to take the 2nd character, compute its codepoint, and take the modulus. (As I said, best effort, maybe a nice effort.)
Consider the query:
SELECT
doc_id,
substring(doc_id, 2, 1),
typeof(split(substring(doc_id, 2, 1)))
FROM events LIMIT 100
The 3rd column returns varchar, but the codepoint function expects a varchar(1), and casting it with cast(substring(doc_id, 2, 1) as varchar(1)) does not work.
FUNCTION_NOT_FOUND: line 6:5: Unexpected parameters (varchar) for function codepoint. Expected: codepoint(varchar(1))
How can I accomplish this task without modifying the data source? I'm open to ideas.
You can compute a hash code with the xxhash64 function. It takes a varbinary as input, so first cast the string to that type. Since the function also returns a 64-bit varbinary value, you can convert it to a bigint via the from_big_endian_64 function:
WITH t(x) AS (VALUES 'hello')
SELECT from_big_endian_64(xxhash64(cast(x AS varbinary)))
FROM t
output:
_col0
---------------------
2794345569481354659
(1 row)
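To make the last step concrete, here is a rough Python sketch of what from_big_endian_64 does with the 8 bytes it receives: it reads them as a signed, big-endian, two's-complement integer. The Python function below only mirrors the name for illustration; xxhash64 itself is not reproduced, because Python's standard library has no xxHash implementation.

```python
import struct

def from_big_endian_64(data: bytes) -> int:
    """Interpret exactly 8 bytes as a signed big-endian 64-bit integer."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    # ">q" means big-endian, signed 64-bit
    return struct.unpack(">q", data)[0]

print(from_big_endian_64(b"\x00" * 7 + b"\x01"))  # 1
print(from_big_endian_64(b"\xff" * 8))            # -1
```

Because the result is signed, roughly half of all hash values come out negative, which matters if the value is later used with a modulus for partitioning.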

Output of cast function

I am working with the CAST function in SQL Server. If I use explicit values, as in the code below, the output is 38, which is the correct value. But I need to use field names instead of literal values (there is only one value of Base and Escalator here: Base = 1.15 and Escalator = 0.05). When I use field names, the output is 37. The data type of the Base and Escalator fields is float. I also tried using the ROUND function inside the cast, which did not solve the issue. Could someone help me with this? My query is below:
Select CAST((3.05-1.15)/0.05 AS INT) -- returns 38
Select ((3.05-1.15)/0.05) --returns 38
Select cast((3.05-base)/Escalator as int) from table1 -- I am using field names here. Returns 37
What is likely happening here is that the base and Escalator columns are some kind of non-exact floating point type. As a result, the following computation results in a value which is slightly less than 38:
(3.05-base) / Escalator = 37.999995 (for example)
Then, when casting to integer, the entire decimal component is being truncated, leaving behind just 37.
One possible workaround to prevent this from happening would be to use NUMERIC or some other exact type for the base and Escalator columns.
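The same effect can be reproduced outside SQL Server. The sketch below uses Python, whose float is an IEEE 754 double like SQL Server's default float, and the decimal module as a stand-in for an exact type such as NUMERIC; it is an illustration of the arithmetic, not of SQL Server's exact internals.

```python
from decimal import Decimal

# Binary floating point: none of 3.05, 1.15 or 0.05 is exactly representable.
float_result = (3.05 - 1.15) / 0.05

# Exact decimal arithmetic, comparable to NUMERIC/DECIMAL columns.
decimal_result = (Decimal("3.05") - Decimal("1.15")) / Decimal("0.05")

print(float_result < 38)    # True: the quotient lands just below 38
print(int(float_result))    # 37, because the fractional part is truncated
print(int(decimal_result))  # 38
```

This also suggests why the literal-only query returns 38: numeric literals such as 3.05 are treated as exact decimals rather than floats.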
You can use Decimal to get rid of the issue
DECLARE @Escalator DECIMAL(7, 5) = 0.05
Select ((3.05-1.15)/0.05) --returns 38
Select CAST(((3.05-1.15)/@Escalator) AS INT) -- returns 38
Demo on db<>fiddle
You can use the built-in ceiling or floor function, depending on your requirement:
DECLARE @Escalator float = 0.05
DECLARE @Base float = 1.66
Select ((3.05-1.66)/0.05) --returns 27.8
Select ceiling (((3.05-@Base)/@Escalator)) -- returns 28
Select floor (((3.05-@Base)/@Escalator)) -- returns 27

Numeric precision as variable

Declare @Precision INTEGER
Set @Precision = 3
-> I have a select statement here which selects the integer value.
Is there a way that I can use this @Precision in a numeric data type, for example something like this:
numeric(20,@Precision)
You want to use the str() function (documented here).
It allows you to convert numerics to strings, while controlling the precision.
For instance:
select str(3.14158165, 5, 3)
Returns '3.142'.
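The idea of passing the precision as a variable rather than baking it into the type can be sketched in Python as follows. The function name `str_like` is hypothetical, and it only approximates STR for values that fit in the requested length; it does not reproduce STR's behaviour when the number is too long for the field.

```python
def str_like(value: float, length: int, decimals: int) -> str:
    """Format value right-aligned in `length` characters with `decimals` decimal places."""
    # Both the width and the precision are taken from variables at run time.
    return f"{value:{length}.{decimals}f}"

precision = 3
print(str_like(3.14158165, 5, precision))  # 3.142
```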

SQL server 'like' against a float field produces inconsistent results

I am using LIKE to return matching numeric results against a float field. It seems that once there are more than 4 digits to the left of the decimal, values that match my search item on the right side of the decimal are not returned. Here's an example illustrating the situation:
CREATE TABLE number_like_test (
num [FLOAT] NULL
)
INSERT INTO number_like_test (num) VALUES (1234.56)
INSERT INTO number_like_test (num) VALUES (3457.68)
INSERT INTO number_like_test (num) VALUES (13457.68)
INSERT INTO number_like_test (num) VALUES (1234.76)
INSERT INTO number_like_test (num) VALUES (23456.78)
SELECT num FROM number_like_test
WHERE num LIKE '%68%'
That query does not return the record with the value of 13457.68, but it does return the record with the value of 3457.68. Also, running the query with 78 instead of 68 does not return the 23456.78 record, but using 76 returns the 1234.76 record.
So to get to the question: why does having a larger number cause these results to change? How can I change my query to get the expected results?
The like operator requires a string as a left-hand value. According to the documentation, a conversion from float to varchar can use several styles:
Value        Output
0 (default)  A maximum of 6 digits. Use in scientific notation, when appropriate.
1            Always 8 digits. Always use in scientific notation.
2            Always 16 digits. Always use in scientific notation.
The default style will work fine for the six digits in 3457.68, but not for the seven digits in 13457.68. To use 16 digits instead of 6, you could use convert and specify style 2. Style 2 represents a number like 3.457680000000000e+003. But that wouldn't work for the first two digits, and you get an unexpected +003 exponent for free.
The best approach is probably a conversion from float to decimal. That conversion allows you to specify the precision and scale. Using precision 20 and scale 10, the float is represented as 3457.6800000000:
where convert(decimal(20,10), num) like '%68%'
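The effect of the six-digit default can be imitated in Python. This is only an analogy: the `.6g` format keeps six significant digits, which matches the outputs discussed in this thread, but it is not SQL Server's conversion routine. The fixed-point format with 10 decimals plays the role of decimal(20,10).

```python
values = [1234.56, 3457.68, 13457.68, 1234.76, 23456.78]

# Six significant digits, similar in effect to the default float-to-string style.
six_digits = [f"{v:.6g}" for v in values]

# Fixed-point with 10 decimals, similar in effect to decimal(20,10).
fixed = [f"{v:.10f}" for v in values]

print(six_digits)
# ['1234.56', '3457.68', '13457.7', '1234.76', '23456.8']

print([s for s in six_digits if "68" in s])
# ['3457.68']

print([s for s in fixed if "68" in s])
# ['3457.6800000000', '13457.6800000000']
```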
When you compare a number with LIKE, it is implicitly converted to a string and then matched.
The problem here is that a float is not precise, and when it is converted you can get
13457.679999999999999 instead of 13457.68
So to avoid this, explicitly format the number in an appropriate format (I'm not sure exactly how to do this in SQL Server, but it will be something like):
SELECT num FROM number_like_test
WHERE FORMAT(num, '0.##') LIKE '%68%'
The conversion to string is rounding your values. Both CONVERT and CAST have the same behavior.
SELECT cast(num as nvarchar(50)) as s
FROM number_like_test
Or
SELECT convert(nvarchar(50), num) as s
FROM number_like_test
provide the results:
1234.56
3457.68
13457.7
1234.76
23456.8
You'll have to use the STR function and correct format parameters to try to get your results. For example,
SELECT STR(num, 10, 2) as s
FROM number_like_test
gives:
1234.56
3457.68
13457.68
1234.76
23456.78
Pretty well solved already, but you only need to CAST once, not twice like the other answer suggests, LIKE takes care of the string conversion:
SELECT *
FROM number_like_test
WHERE CAST(num AS DECIMAL(12,6)) LIKE '%68%'
And here's a SQL Fiddle showing the rounding behavior: SQL Fiddle
It's probably because a FLOAT data type represents a floating point number which is an approximation of the number and should not be relied on for exact comparisons.
If you need to do a search that includes the float value you would need to either store it in a decimal data type (which will hold the exact number) or convert it to a varchar using something like the STR() function

How to reduce the float length

Using SQL Server 2000
I want to reduce the decimal length
Query
Select 23/12 as total
Output is showing as 1.99999999999
I don't want to round the value; I want to display it like this: 1.99
Tried Query
Select LEFT(23/12, LEN(23/12) - 3) as total
The above query works only if there is a decimal value like 12.444444, but if the total is a whole number like 12 or 4 or 11, I get an error at run time.
How can I do this?
Need Query Help
There is a very simple solution. You can find it in BOL. ROUND takes an optional 3rd argument, which selects the type of operation: round or truncate.
ROUND ( numeric_expression , length [ ,function ] )
...
function Is the type of operation to perform. function must be
tinyint, smallint, or int. When function is omitted or has a value of
0 (default), numeric_expression is rounded. When a value other than 0
is specified, numeric_expression is truncated.
So just do
Select ROUND(cast(23 as float)/12, 2, 1) as total
That gives 1.91. Note, if you were really seeing 1.999 - something is really wrong with your computer. 23/12 = 1.916666666(ad infinitum). You need to cast one of the numbers as float since sql is assuming they're integers and doing integer division otherwise. You can of course cast them both as float, but as long as one is float the other will be converted too.
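The three behaviours mentioned here (integer division, truncation, and rounding) can be compared side by side in a short Python sketch. It uses the decimal module purely for illustration; ROUND_DOWN stands in for ROUND's truncate mode and ROUND_HALF_UP for its default rounding.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Integer division discards the fraction entirely.
print(23 // 12)  # 1

quotient = Decimal(23) / Decimal(12)  # 1.91666...

# Truncate to two decimal places (no rounding up).
truncated = quotient.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

# Round to two decimal places for comparison.
rounded = quotient.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(truncated)  # 1.91
print(rounded)    # 1.92
```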
Not terribly elegant, but works for all cases:
Select CONVERT(float,LEFT(CONVERT(nvarchar, 23.0/12.0),CHARINDEX('.',CONVERT(nvarchar, 23.0/12.0)) + 2)) as total
Scalar Function
-- Description: Truncate instead of rounding a float
-- SELECT dbo.TruncateNumber(23.0/12.0,2)
-- =============================================
CREATE FUNCTION TruncateNumber
(
-- Add the parameters for the function here
@inFloat float,
@numDecimals smallint
)
RETURNS float
AS
BEGIN
IF (@numDecimals < 0)
BEGIN
SET @numDecimals = 0
END
-- Declare the return variable here
RETURN CONVERT(float,LEFT(CONVERT(nvarchar, @inFloat),CHARINDEX('.',CONVERT(nvarchar, @inFloat)) + @numDecimals))
END
GO