PrestoDB: insert fails with null error message - hive

I'm trying to insert data into a table using Presto.
I'm doing a simple insert on a table that I created in Hive, kept as simple as possible (stored as text):
CREATE TABLE `stam`(
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
When I try to insert using either of the queries below:
insert into stam values('test');
insert into stam select 'test' from another_table;
Both fail with the following error:
Query 20141128_150952_00004_vaf59 failed: null
I see nothing in the logs, and I have no idea how I can debug this.
Any ideas will be appreciated.

Well, I've found the answer in the code:
// Hive connector currently does not support insert

Related

Impala insert vs hive insert

When I try to insert integer values into a column in a Parquet table with a Hive command, the values are not getting inserted and show up as NULL. But when I use an Impala command, it works. However, the partition size is smaller with the Impala insert, and the number of rows in the partitions (show partitions) shows as -1. What is the reason for this?
CREATE TABLE `TEST.LOGS`(
  `recordtype` string,
  `recordstatus` string,
  `recordnumber` string,
  `starttime` string,
  `endtime` string,
  `acctsessionid` string,
  `subscriberid` string,
  `framedip` string,
  `servicename` string,
  `totalbytes` int,
  `rxbytes` int,
  `txbytes` int,
  `time` int,
  `plan` string,
  `tcpudp` string,
  `intport` string)
PARTITIONED BY (`ymd` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'field.delim'=',',
  'serialization.format'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://dev-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
TBLPROPERTIES (
  'transient_lastDdlTime'='1634390569')
Insert Statement
Hive
sudo -u hdfs hive -e 'insert into table TEST.LOGS partition (ymd="20220221") select * from TEMP.LOGS;'
Impala
impala-shell --ssl -i xxxxxxxxxxx:21000 -q 'insert into table TEST.LOGS partition (ymd="20220221") select * from TEMP.LOGS;'
When I try to insert integer values into a column in a Parquet table with a Hive command, the values are not getting inserted and show up as NULL.
Could you please share your exact insert statement and table definition for a precise answer? If I have to guess, this may be because of a difference in implicit data type conversion between Hive and Impala.
HIVE - If you set hive.metastore.disallow.incompatible.col.type.changes to false, the types of columns in the Metastore can be changed from any type to any other type. After such a type change, if the data can be shown correctly with the new type, it will be displayed; otherwise, it will be displayed as NULL. As per the documentation, forward conversion works (int -> bigint) whereas backward conversion (bigint -> smallint) doesn't and produces NULL.
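To see how your session is configured, a bare set statement prints the property's current value (a minimal sketch; the property is the one referenced above):
set hive.metastore.disallow.incompatible.col.type.changes;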
Impala - It supports a limited set of implicit casts to avoid undesired results from unexpected casting behavior. Impala does perform implicit casts among the numeric types when going from a smaller or less precise type to a larger or more precise one. For example, Impala will implicitly convert a SMALLINT to a BIGINT.
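If an implicit-conversion mismatch is indeed the cause, an explicit cast on the Hive side is a hedged workaround. This sketch assumes TEMP.LOGS has the same columns in the same order as TEST.LOGS (column names taken from the DDL above):
insert into table TEST.LOGS partition (ymd='20220221')
select recordtype, recordstatus, recordnumber, starttime, endtime,
  acctsessionid, subscriberid, framedip, servicename,
  cast(totalbytes as int), cast(rxbytes as int), cast(txbytes as int),
  cast(`time` as int), `plan`, tcpudp, intport
from TEMP.LOGS;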
Regarding the number of rows in the partitions (show partitions) showing as -1:
Please run compute stats table_name to fix this issue.
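A minimal sketch of that fix in impala-shell, using the table from the question:
compute stats TEST.LOGS;
show partitions TEST.LOGS; -- row counts should no longer show -1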

SQL Union - SemanticException 3:20 Schema of both sides of union should match

I want to create a view with the following code, but I encounter errors which I can't really resolve.
The code:
create view default.table_all as
select *
from default.table_a
union select *
from default.table_b;
The Error:
Error while compiling statement: FAILED: SemanticException 3:20 Schema of both sides of union should match: Column column_42 is of type double on first table and type string on second table. Error encountered near token 'table_b'
Firstly, column_42 has the same data type, double, in both table_a and table_b (I already checked numerous times). Secondly, when unioning only column_42:
create view default.table_all as
select column_42
from default.table_a
union select column_42
from default.table_b;
It works like a charm and does not show any errors whatsoever.
I really can't figure out the problem.
@Mureinik, here are the DDLs of the two tables:
DDL - table_a:
createtab_stmt
CREATE TABLE `default.table_a`(`column_42` double, ...)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://ITEG-EG-DC2-DR01-ns/user/hive/warehouse/table_a'
TBLPROPERTIES (
  'transient_lastDdlTime'='1635342670')
DDL - table_b:
createtab_stmt
CREATE TABLE `default.table_b`(`column_42` double, ...)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://ITEG-EG-DC2-DR01-ns/user/hive/warehouse/table_b'
TBLPROPERTIES (
  'transient_lastDdlTime'='1635410688')
Thank you very much!
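One thing worth noting: UNION matches columns by position, not by name, so select * can trip over a difference in column order between the two tables even when every name/type pair matches. A hedged sketch of a way to rule this out, listing the columns explicitly in the same order on both sides (column_1 is a hypothetical placeholder for the remaining columns):
create view default.table_all as
select column_1, column_42
from default.table_a
union
select column_1, column_42
from default.table_b;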

Cloudera - Hive/Impala Show Create Table - Error with the syntax

I'm building some automated processes to create tables on Cloudera Hive.
For that I am using the show create table statement, which gives me (for example) the following DDL:
CREATE TABLE clsd_core.factual_player ( player_name STRING, number_goals INT ) PARTITIONED BY ( player_name STRING ) WITH SERDEPROPERTIES ('serialization.format'='1') STORED AS PARQUET LOCATION 'hdfs://nameservice1/factual_player'
What I need is to run the ddl on a different place to create a table with the same name.
However, when I run that code, it returns the following error:
Error while compiling statement: FAILED: ParseException line 1:123 missing EOF at 'WITH' near ')'
When I manually remove the part "WITH SERDEPROPERTIES ('serialization.format'='1')", it is able to create the table successfully.
Is there a better way to retrieve the table DDLs without the SERDE information?
The first issue in your DDL is that the partition column should not be listed in the columns spec, only in the PARTITIONED BY clause. A partition is a folder named partition_column=value, and this column is not stored in the table files, only in the partition directory. If you want the partition column to also be in the data files, it should be named differently.
The second issue is that SERDEPROPERTIES is part of the SERDE specification; if you do not specify a SERDE, there should be no SERDEPROPERTIES. See this manual: Storage Format and SerDe
Fixed DDL:
CREATE TABLE factual_player (number_goals INT)
PARTITIONED BY (player_name STRING)
STORED AS PARQUET
LOCATION 'hdfs://nameservice1/factual_player';
STORED AS PARQUET already implies the SERDE, INPUTFORMAT and OUTPUTFORMAT.
If you want to specify a SERDE with its properties, use this syntax:
CREATE TABLE factual_player(number_goals int)
PARTITIONED BY (player_name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format'='1') -- I believe you really do not need this
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/factual_player'

Create Hive table using “as select” and also specify TBLPROPERTIES

For example, when using the Parquet format, I'd like to be able to specify the compression scheme ("parquet.compression"="SNAPPY"). Running this query:
CREATE TABLE table_a_copy
STORED AS PARQUET
TBLPROPERTIES("parquet.compression"="SNAPPY")
AS
SELECT * FROM table_a
returns an error:
Error: Error while compiling statement: FAILED: ParseException line 1:69 cannot recognize input near 'parquet' '.' 'compression' in table properties list (state=42000,code=40000)
The same query without the TBLPROPERTIES works just fine.
This is similar to this question: Create hive table using "as select" or "like" and also specify delimiter. But I can't figure out how to make TBLPROPERTIES work with that approach. I'm using Hive 1.1.
I was able to run the same exact statement in Hive 2.1.1.
Try this workaround:
CREATE TABLE table_a_copy like table_a STORED AS PARQUET;
alter table table_a_copy set TBLPROPERTIES("parquet.compression"="SNAPPY");
insert into table table_a_copy select * from table_a;
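A hedged alternative, if you'd rather keep the single CTAS statement, is to set the codec at the session level before running it (whether this session property is honored can vary by Hive version):
set parquet.compression=SNAPPY;
CREATE TABLE table_a_copy
STORED AS PARQUET
AS
SELECT * FROM table_a;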

hive record inserted but then get an error

I created a table in Hive:
CREATE TABLE `test3`.`shop_dim` (
  `shop_id` bigint,
  `shop_name` string,
  `shop_company_id` bigint,
  `shop_url1` string,
  `shop_url2` string,
  `sid` string,
  `shop_open_duration` string,
  `date_modified` timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ("path"="hdfs://myhdfs/warehouse/tablespace/managed/hive/test3.db/shop_dim")
STORED AS PARQUET
TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"date_modified\":\"true\",\"shop_company_id\":\"true\",\"shop_id\":\"true\",\"shop_name\":\"true\",\"shop_open_duration\":\"true\",\"shop_url1\":\"true\",\"shop_url2\":\"true\",\"sid\":\"true\"}}', 'bucketing_version'='2', 'numFiles'='12', 'numRows'='12', 'rawDataSize'='96', 'spark.sql.create.version'='2.3.0', 'spark.sql.sources.provider'='parquet', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"Shop_id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_company_id\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_url1\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_url2\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"sid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Shop_open_duration\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"Date_modified\",\"type\":\"timestamp\",\"nullable\":true,\"metadata\":{}}]}', 'totalSize'='17168')
GO
Then I insert a record using the SQL below:
insert into test3.shop_dim values(11,'aaa',22,'11113','2222','sid','opend',unix_timestamp())
I can see the record is inserted, but after waiting for a long time there is an error:
>[Error] Script lines: 1-2 --------------------------
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask
[Executed: 2018-10-24 12:00:03 PM] [Execution: 0ms]
I use Aqua Studio as a tool. Why does this error occur?
This issue can happen if the values being inserted do not match the expected column type.
In your case, the "date_modified" column is of timestamp type, but unix_timestamp() returns a bigint (the current Unix timestamp in seconds).
If you execute the query
select unix_timestamp();
Output would be like: 1558547043
Instead, you need to use current_timestamp.
select current_timestamp;
Output would be like: 2019-05-22 17:50:18.803
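Applied to the insert from the question, a hedged sketch of the corrected statement (written as insert ... select, since some Hive versions restrict function calls inside a VALUES clause):
insert into test3.shop_dim
select 11, 'aaa', 22, '11113', '2222', 'sid', 'opend', current_timestamp;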
You can refer to the Hive manual for the built-in date functions at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
The Hive settings given below can help resolve the org.apache.hadoop.hive.ql.exec.StatsTask error (state=08S01,code=1):
set hive.stats.column.autogather=false; -- or: set hive.stats.autogather=false;
set hive.optimize.sort.dynamic.partition=true;