hive record inserted but then get an error - hive

I created a table in Hive:
CREATE TABLE `test3`.`shop_dim` (
`shop_id` bigint,
`shop_name` string,
`shop_company_id` bigint,
`shop_url1` string,
`shop_url2` string,
`sid` string,
`shop_open_duration` string,
`date_modified` timestamp)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
"path"="hdfs://myhdfs/warehouse/tablespace/managed/hive/test3.db/shop_dim")
STORED AS PARQUET
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"date_modified\":\"true\",\"shop_company_id\":\"true\",\"shop_id\":\"true\",\"shop_name\":\"true\",\"shop_open_duration\":\"true\",\"shop_url1\":\"true\",\"shop_url2\":\"true\",\"sid\":\"true\"}}',
'bucketing_version'='2',
'numFiles'='12',
'numRows'='12',
'rawDataSize'='96',
'spark.sql.create.version'='2.3.0',
'spark.sql.sources.provider'='parquet',
'spark.sql.sources.schema.numParts'='1',
'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"Shop_id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_company_id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_url1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_url2\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"sid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_open_duration\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Date_modified\",\"type\":\"timestamp\",\"nullable\":true,\"metadata\":{}}]}',
'totalSize'='17168')
GO
Then I insert a record using the SQL below:
insert into test3.shop_dim values(11,'aaa',22,'11113','2222','sid','opend',unix_timestamp())
I can see the record is inserted, but after waiting for a long time there is an error:
>[Error] Script lines: 1-2 --------------------------
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
[Executed: 2018-10-24 12:00:03 PM] [Execution: 0ms]
I use Aqua Data Studio as a tool. Why does this error occur?

This issue can happen if the values being inserted do not match the expected column types.
In your case the "date_modified" column is of timestamp type, but unix_timestamp() returns a bigint (the current Unix timestamp in seconds).
If you execute the query
select unix_timestamp();
Output would be like: 1558547043
Instead, you need to use current_timestamp.
select current_timestamp;
Output would be like: 2019-05-22 17:50:18.803
You can refer to the Hive manual for built-in date functions at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
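For example, a corrected version of the insert (a sketch against the same table; your original statement shows that your Hive version accepts function calls in the VALUES clause):
insert into test3.shop_dim values(11,'aaa',22,'11113','2222','sid','opend',current_timestamp);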

The Hive settings given below can also help resolve org.apache.hadoop.hive.ql.exec.StatsTask failures (state=08S01,code=1):
set hive.stats.column.autogather=false; -- or: set hive.stats.autogather=false;
set hive.optimize.sort.dynamic.partition=true;
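For example, in the same session, before re-running the corrected insert:
set hive.stats.autogather=false;
set hive.stats.column.autogather=false;
These set commands only affect statements issued afterwards in that session; to change the behavior permanently, configure the properties in hive-site.xml instead.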

Related

Impala insert vs hive insert

When I tried to insert integer values into a column in a Parquet table with a Hive command, the values do not get inserted and show as null. When I used an Impala command it worked, but the partition size is smaller with the Impala insert. Also, the number of rows in the partitions (show partitions) shows as -1. What is the reason for this?
CREATE TABLE `TEST.LOGS`(
`recordtype` string,
`recordstatus` string,
`recordnumber` string,
`starttime` string,
`endtime` string,
`acctsessionid` string,
`subscriberid` string,
`framedip` string,
`servicename` string,
`totalbytes` int,
`rxbytes` int,
`txbytes` int,
`time` int,
`plan` string,
`tcpudp` string,
`intport` string)
PARTITIONED BY (`ymd` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://dev-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
TBLPROPERTIES (
'transient_lastDdlTime'='1634390569')
Insert Statement
Hive
sudo -u hdfs hive -e 'insert into table TEST.LOGS partition (ymd="20220221") select * from TEMP.LOGS;'
Impala
impala-shell --ssl -i xxxxxxxxxxx:21000 -q 'insert into table TEST.LOGS partition (ymd="20220221") select * from TEMP.LOGS;'
When I tried to insert integer values into a column in a Parquet table with a Hive command, the values do not get inserted and show as null.
Could you please share your exact insert statement and table definition for a precise answer? If I have to guess, this may be because of differences in implicit data type conversion between Hive and Impala.
HIVE - If you set hive.metastore.disallow.incompatible.col.type.changes to false, the types of columns in the Metastore can be changed from any type to any other type. After such a type change, if the data can be shown correctly with the new type, it will be displayed; otherwise it will be displayed as NULL. As per the documentation, forward conversion works (int -> bigint) whereas backward conversion (bigint -> smallint) does not and produces null.
Impala - it supports a limited set of implicit casts to avoid undesired results from unexpected casting behavior. Impala performs implicit casts among the numeric types when going from a smaller or less precise type to a larger or more precise one; for example, Impala will implicitly convert a SMALLINT to a BIGINT.
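If the source columns in TEMP.LOGS are wider (for example bigint) than the int columns in TEST.LOGS, one way to take implicit conversion differences out of the picture is to cast explicitly in the insert, so both Hive and Impala see matching types. A sketch, with the column list taken from the TEST.LOGS definition above and the TEMP.LOGS schema assumed to line up:
insert into table TEST.LOGS partition (ymd='20220221')
select recordtype, recordstatus, recordnumber, starttime, endtime,
acctsessionid, subscriberid, framedip, servicename,
cast(totalbytes as int), cast(rxbytes as int), cast(txbytes as int),
cast(`time` as int), `plan`, tcpudp, intport
from TEMP.LOGS;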
As for the number of rows in the partitions (show partitions) showing as -1: please run compute stats table_name to fix this issue.
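For example, in impala-shell (using the table from the question):
compute stats TEST.LOGS;
show partitions TEST.LOGS;
After compute stats, the #Rows column of show partitions should report real row counts instead of -1.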

Create a table in hive with timestamp as comment

I would like to create a table in Hive and include the creation date (the current_timestamp function) in the comment. Something like this:
CREATE TABLE IF NOT EXISTS ex.tb_test ( field1 int, field2 String) COMMENT current_timestamp STORED AS TEXTFILE;
But it returns an error: FAILED: ParseException line 2:8 mismatched input 'current_timestamp' expecting StringLiteral near 'COMMENT'
Do you know of any way to add the creation date of the table to the comment?
Functions are not supported in table DDL. You can pass a pre-calculated timestamp as a --hiveconf parameter and use it, for example, like this: comment '${hiveconf:ts}' (it should be quoted); such a parameter is resolved as a string literal before command execution.
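A sketch of the full round trip (the shell date format is one arbitrary choice; double quotes are used for the Hive string literal so they don't clash with the single quotes around the -e argument):
hive --hiveconf ts="$(date '+%Y-%m-%d %H:%M:%S')" -e \
'CREATE TABLE IF NOT EXISTS ex.tb_test (field1 int, field2 String) COMMENT "${hiveconf:ts}" STORED AS TEXTFILE;'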
By the way, Hive already stores CreateTime.
The describe formatted table_name command outputs CreateTime along with other table info.

Amazon Athena returning "mismatched input 'partitioned' expecting {, 'with'}" error when creating partitions

I'd like to use this query to create a partitioned table in Amazon Athena:
CREATE TABLE IF NOT EXISTS
testing.partitioned_test(order_id bigint, name string, car string, country string)
PARTITIONED BY (year int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS 'PARQUET'
LOCATION 's3://testing-imcm-into/partitions'
Unfortunately I get the following error message:
line 3:2: mismatched input 'partitioned' expecting {, 'with'}
The quotes around 'PARQUET' seemed to be causing a problem.
Try this:
CREATE EXTERNAL TABLE IF NOT EXISTS
partitioned_test (order_id bigint, name string, car string, country string)
PARTITIONED BY (year int)
STORED AS PARQUET
LOCATION 's3://testing-imcm-into/partitions/'
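Note that if data already sits under Hive-style partition folders (e.g. .../partitions/year=2020/), the partitions still have to be registered before they show up in query results; in Athena one way is:
MSCK REPAIR TABLE partitioned_test;
Individual partitions can also be added with ALTER TABLE ... ADD PARTITION.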

Error while creating table in Hive

I am new to Hadoop. I need help regarding an error encountered in Hive while creating a new table. I have gone through this: Hive FAILED: ParseException line 2:0 cannot recognize input near ''macaddress'' 'CHAR' '(' in column specification
My question: Is it necessary to specify the location of the table in the script? I am writing the table location at the start, and I am worried that specifying it wrongly could disturb the rest of my databases through some malfunctioning operation.
Here is my query:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp )
CLUSTERED BY (
tank_items_id)
INTO 8 BUCKETS
ROW FORMAT SERDE
TBLPROPERTIES (transactional=true)
STORED AS ORC;
The error I am getting is-
ParseException line 1:3 cannot recognize input near 'TBLPROPERTIES'
'(' 'transactional'
What other errors might occur here, and how can I remove them?
There is a syntax error in your create query. The error you have shared says that Hive cannot recognize input near 'TBLPROPERTIES'.
Solution:
As per Hive syntax, the key and value passed in TBLPROPERTIES should be in double quotes. It should look like this: TBLPROPERTIES ("transactional"="true")
So if I correct your query it will be:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp
) CLUSTERED BY (tank_items_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Note that a bare ROW FORMAT SERDE with no serde class after it is itself a syntax error, so it is dropped here; STORED AS ORC already implies the ORC serde.
Execute the above query; if you then get any other syntax error, make sure that the order of CLUSTERED BY, STORED AS, and TBLPROPERTIES is as per the Hive syntax.
Refer this for more details:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable
1) ROW FORMAT SERDE -> if you specify it, you should pass a serde class
2) The TBLPROPERTIES key and value should be in double quotes
3) If you use CLUSTERED BY, the value should be one of the columns given
Replace as follows:
CREATE TABLE meta_statistics.tank_items (
shop_offers_history_before bigint,
shop_offers_temp bigint,
videos_distinct_temp bigint,
deleted_temp bigint,
t_stamp timestamp)
CLUSTERED BY (shop_offers_history_before) INTO 8 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
Hope this helps.

PrestoDB: insert fails with null error message

I'm trying to insert data into a table using Presto.
I'm doing a simple insert into a table that I created in Hive, which is as simple as possible (stored as text):
CREATE TABLE `stam`(
`name` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
When I try to insert using either of the queries below:
insert into stam values('test');
insert into stam select 'test' from another_table;
Both fail - I get the following error:
Query 20141128_150952_00004_vaf59 failed: null
I see nothing in the logs, and I have no idea how I can debug this.
Any ideas will be appreciated.
Well, I've found the answer in the code:
// Hive connector currently does not support insert