ROUND second argument only takes constant in Hive

The following:
hive> create table t1 (val double, digit int);
hive> insert into t1 values(10,2);
hive> insert into t1 values(156660,3);
hive> insert into t1 values(8765450,4);
hive> select round(val, digit) from t1;
Gives this error:
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments
'digit': ROUND second argument only takes constant
It's working fine in Impala.
Could somebody please help me figure out where the error is coming from?

// Round `value` to `places` decimal places using half-even (banker's) rounding
BigDecimal a = new BigDecimal(value);
BigDecimal roundOff = a.setScale(places, BigDecimal.ROUND_HALF_EVEN);
return roundOff.doubleValue();
Thanks, Mark, for your quick response.
I've already used a UDF to solve this issue, since this is a known issue (HIVE-4523). I thought a patch had already been applied.
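For anyone taking the same route, registering a custom UDF in Hive typically looks like this (a sketch; the jar path, function name, and class name here are hypothetical):
-- make the jar visible to the session, then bind the class to a function name
ADD JAR /tmp/my-udfs.jar;
CREATE TEMPORARY FUNCTION round_to AS 'com.example.udf.RoundTo';
SELECT round_to(val, digit) FROM t1;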

The error says that the second argument of ROUND must be a constant, i.e. in Hive you can't use a column as the second argument of your ROUND function. If you need to do that, I'd suggest you create your own UDF.
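If you'd rather avoid a UDF, one workaround (a sketch against the t1 table from the question, subject to the usual double-precision caveats) is to scale, round with the one-argument ROUND, and scale back, since pow accepts column arguments:
-- round val to a per-row number of decimal places
select round(val * pow(10, digit)) / pow(10, digit) as rounded from t1;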

How to truncate decimal places in Databricks without rounding off?

select truncate(12.455555,2)
I was trying to truncate a decimal value in the database from Databricks, but it gave me the following error. I got the same error when I tried executing the simple statement above for trimming the decimal places.
Error-
Error in SQL statement: AnalysisException: Undefined function: 'truncate'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
Can anyone tell me how to truncate the decimal places without rounding off the decimals?
You can use substring.
spark.sql("select substring(12.455555,0, instr(12.455555,'.')+2) as out").show()
+-----+
| out|
+-----+
|12.45|
+-----+
If you want to do that just for printing the data, then you can use the format_number function, like this:
SELECT format_number(12332.123456, '#.###');
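If you need the truncated value itself rather than a formatted string, another option is to scale with floor (a sketch; this truncates only for positive values, since floor rounds toward negative infinity, so negative inputs would need ceil instead):
-- 12.455555 -> 12.45: multiply, floor, then divide back
SELECT floor(12.455555 * 100) / 100 AS out;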

Invalid digits on Redshift

I'm trying to load some data from stage to the relational environment, and something is happening that I can't figure out.
I'm trying to run the following query:
SELECT
    CAST(SPLIT_PART(some_field, '_', 2) AS BIGINT) cmt_par
FROM
    public.some_table;
The some_field column holds two numbers joined by an underscore, like this:
some_field -> 38972691802309_48937927428392
And I'm trying to get the second part.
That said, here is the error I'm getting:
[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0,
Type: Long
Details:
-----------------------------------------------
error: Invalid digit, Value '1', Pos 0, Type: Long
code: 1207
context:
query: 1097254
location: :0
process: query0_99 [pid=0]
-----------------------------------------------;
Execution time: 2.61s
Statement 1 of 1 finished
1 statement failed.
It's literally saying some numbers are not valid digits. I've already tried to find the exact data that is throwing the error, and it appears to be a normal field, just as I was expecting. It happens even if I throw out NULL fields.
I thought it might be an encoding error, but I haven't found any references to support that.
Does anyone have any idea?
Thanks everybody.
I just ran into this problem and did some digging. It seems like the Value '1' part of the error is misleading; the problem is actually that these fields are just not valid as numerics.
In my case they were empty strings. I found the solution to my problem in a blog post, which is essentially to find any fields that aren't numeric and fill them with null before casting:
-- replace non-numeric values with null before casting
-- (the subquery needs an alias in Redshift, hence the trailing t)
select cast(colname as integer)
from (
  select case when colname ~ '^[0-9]+$' then colname
              else null
         end as colname
  from tablename
) t;
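Applied to the original query, the same guard looks like this (a sketch reusing the question's table and column names):
select cast(case when split_part(some_field, '_', 2) ~ '^[0-9]+$'
                 then split_part(some_field, '_', 2)
            end as bigint) as cmt_par
from public.some_table;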
Bottom line: this Redshift error is completely confusing and really needs to be fixed.
When you are using a Glue job to upsert data from any data source to Redshift:
Glue will rearrange the data before copying it, which can cause this issue. This happened to me even after using ApplyMapping.
In my case, the datatype was not an issue at all; in the source, the fields were typecast to exactly match the fields in Redshift.
Glue was rearranging the columns in alphabetical order of their names and then copying the data into the Redshift table (which will obviously throw an error, because my first column is an ID key, not a string column like the others).
To fix the issue, I used a SQL query within Glue to run a select command with the correct order of the columns in the table.
It's weird that Glue did that even after using ApplyMapping, but the workaround helped.
For example: the source table has fields ID|EMAIL|NAME with values 1|abcd@gmail.com|abcd, and the target table also has fields ID|EMAIL|NAME. But when Glue upserts the data, it rearranges the columns by name before writing, so it tries to write abcd@gmail.com|1|abcd into ID|EMAIL|NAME. This throws an error because ID expects an int value and EMAIL expects a string. I used a SQL query transform with the query "SELECT ID, EMAIL, NAME FROM data" to rearrange the columns before writing, as shown below.
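A minimal sketch of that SQL transform (the alias data is whatever name you give the Glue transform's input node):
-- select the columns in the exact order of the Redshift target table
SELECT ID, EMAIL, NAME FROM data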
Hmmm. I would start by investigating the problem. Are there any non-digit characters?
SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';
Is the value too long for a BIGINT?
SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 18;
A BIGINT holds at most 19 digits (its maximum is 9223372036854775807), so anything longer than 18 digits may overflow. If you need more digits of precision, consider a DECIMAL rather than a BIGINT.
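For example, a cast to DECIMAL (a sketch; Redshift's DECIMAL supports up to 38 digits of precision):
SELECT CAST(SPLIT_PART(some_field, '_', 2) AS DECIMAL(38, 0)) AS cmt_par
FROM public.some_table;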
If you get an error message like "Invalid digit, Value 'O', Pos 0, Type: Integer", try executing your COPY command without the header row: use the IGNOREHEADER parameter to skip the first line of the data file.
The COPY command will then look like this:
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter '\t' IGNOREHEADER 1;
For my Redshift SQL, I had to wrap my columns with CAST(col AS datatype) to make this error go away.
For example, casting my columns to CHAR with a specific length worked:
Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))

Invalid input syntax for type numeric: "V1"

I am loading a CSV file to create a new table with a column containing a decimal value of 1.449043781.
Here's my code:
CREATE TABLE table (
v1 float
);
Postgres spits out an error saying invalid input syntax for type numeric even though the value is a float. I have tried changing the data type declaration to decimal(15,13), to no avail. What am I missing here?
Thank you for your input.
Can't reproduce - copies without errors on 9.6:
t=# CREATE TABLE t (
v1 float
);
CREATE TABLE
t=# copy t from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> 1.449043781
>> \.
COPY 1
t=# select v1,pg_typeof(v1) from t;
     v1      |    pg_typeof
-------------+------------------
 1.449043781 | double precision
(1 row)
Also, from your error it looks like you created the table with numeric, not float. They are not the same (though both would accept 1.449043781).
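One more thing to check: the failing value in the title, "V1", matches the column name, which suggests the CSV header row is being loaded as data. If that is the case, a COPY that skips the header would look like this (a sketch; the file path is hypothetical):
-- header true tells COPY to skip the first line of the file
copy t from '/tmp/data.csv' with (format csv, header true);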

Hive variable concatenation

I am facing problems concatenating the value of a variable with a string.
My script contains the below:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
CREATE TABLE default.udr_lt_bc_${hivevar:tab_dt}
(
trans_id double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
In the above, the variable tab_dt gets assigned correctly with yesterday's date in the format yyyymmdd.
But when I try to concatenate this variable with a static string in a table name, the script fails; it is not doing the concatenation.
Kindly provide a solution.
Note: I tried the below too, which errors out as well:
set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
set hivevar:tab_nm1= default.udr_lt_bc_;
set hivevar:tab_name= concat(${hivevar:tab_dt},${hivevar:tab_nm1})
CREATE TABLE ${hivevar:tab_name}
(
trans_id double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
This too is returning an error.
Hive does not evaluate expressions stored in variables; it substitutes them as-is.
Your create table expression results in this:
CREATE TABLE default.udr_lt_bc_substr(date_sub(current_date,1),1,10)...
Your second expression results in this:
CREATE TABLE concat(substr(date_sub(current_date,1),1,10),default.udr_lt_bc_)
Unfortunately, Hive does not support such expressions in DDL.
I recommend calculating this variable in a shell script and passing it to the Hive script as a hivevar.
For example, in the shell script:
table_name=udr_lt_bc_$(date +'%Y_%m_%d' --date "-1 day")
#table_name is udr_lt_bc_2017_10_31 now
#call your script
hive -hivevar table_name="$table_name" -f your_script.hql
And then in your_script you can use variable:
CREATE TABLE default.${hivevar:table_name}
Note that '-' is not allowed in table names, which is why I used '_' instead.
For better understanding how Hive substitutes variables, try this:
hive> set hivevar:tab_dt= substr(date_sub(current_date,1),1,10);
hive> select ${hivevar:tab_dt};
OK
2017-10-31
Time taken: 1.406 seconds, Fetched: 1 row(s)
hive> select '${hivevar:tab_dt}';
OK
substr(date_sub(current_date,1),1,10)
Time taken: 0.087 seconds, Fetched: 1 row(s)
Note that in the first select statement the variable was substituted as-is before execution and then evaluated in the SQL. The second select statement prevents evaluation because the variable is quoted and remains as-is: substr(date_sub(current_date,1),1,10).
Another way in Hive:
select concat("table_",date_sub(from_unixtime(unix_timestamp(current_date,'yyyy-MM-dd'),'yyyy-MM-dd'),0));
Here, we can store the above in a variable and use it as per our needs.

In Hive, I need to get the numeric value after a particular word. Is that possible?

I want to get the numeric value immediately after a particular word in a string.
In Hive, for example:
APDSGDSCRAM051, where I need to get the numeric value after the word RAM.
Is it possible in Hive?
Note: it's not a fixed-length string.
Here you go. You need to use the substr and instr pre-defined Hive functions:
create table str_testing (c string);
insert into table str_testing values ('APDSGDSCRAM051');
select substr(c, instr(c, 'RAM') + 3) from str_testing;
OK
051
Time taken: 0.243 seconds, Fetched: 1 row(s)
As explained here, you can implement it in Hive as:
select regexp_extract(name, '\\d+', 0) from <table_name>;
Note: I do not have a Hive environment configured, so please check this by running it at your end. This will only work for the first set of numbers found in your string; if your string has numbers in multiple places, this might fail.
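To anchor the match to the digits immediately following 'RAM' rather than the first digits anywhere in the string, a capture group works (a sketch against the str_testing table created above):
-- group 1 captures only the digits right after the literal 'RAM'
select regexp_extract(c, 'RAM(\\d+)', 1) from str_testing;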