I have PySpark code that builds a query and runs an INSERT INTO command on another Hive table, which is internally mapped to an HBase table.
When I run the INSERT INTO command on the Hive table using Spark SQL, I get the following exception:
java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
I checked the data types and tblproperties but was unable to get past this exception.
The versions I am using are:
PySpark -- 1.6.0
Hive -- 1.1.0-cdh5.8.2
The table properties are:
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,colf:a')
TBLPROPERTIES ('hbase.table.name' = 'abc', 'hbase.mapred.output.outputtable' = 'abc');
I tried removing the ROW FORMAT SERDE clause, but I still get the same issue.
Am I getting this issue because of mismatched versions, or am I doing something wrong?
This is a Spark bug; see this Apache Spark pull request: https://github.com/apache/spark/pull/17989
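Until you are on a Spark release that includes that fix, one possible workaround (a sketch with hypothetical table names, not a confirmed fix for your setup) is to run the same statement through Hive itself, e.g. from beeline, since the ClassCastException is thrown in Spark's write path:

-- Run in Hive/beeline rather than through spark.sql();
-- hbase_backed_table and staging_table are hypothetical names.
INSERT INTO TABLE hbase_backed_table
SELECT key, col_a FROM staging_table;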
I was able to run a query in Presto to read the non-float columns from a Hive ORC (Snappy) table. However, when I select all float datatype columns through the Presto CLI, it gives me the error message below. Any suggestions on an alternative other than changing the field type to double in the target Hive table?
presto:sample> select * from emp_detail;
Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc (offset=0, length=1999857): Malformed ORC file. Can not read SQL type real from ORC stream .salary of type DOUBLE [hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc]
Please try to add this property
hive.orc.use-column-names=true
to presto-server/conf/catalog/hive.properties,
and restart your Presto server.
To test it without restarting the server, run this from the Presto CLI:
SET SESSION hive.orc_use_column_names=true;
See the Presto release notes regarding this attribute.
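For example, a minimal Presto CLI session (assuming the Hive catalog is named hive, and using the table from the question):

-- Map ORC columns by name for this session, then re-run the failing query.
SET SESSION hive.orc_use_column_names = true;
SELECT * FROM emp_detail;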
I have a table in Hive:
CREATE EXTERNAL TABLE sr2015(
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'field.delim'=',',
'mapkey.delim'='\u0003',
'serialization.format'=',', 'skip.header.line.count'='1',
'quoteChar'= "\"")
Data is loaded into the table this way:
LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;
Why is the table only accessible in Hive? When I attempt to access it in the Hue Impala editor, I get the following error:
AnalysisException: Could not resolve table reference: 'sr2015'
which seems to say there is no such table, even though the table does show up in the left panel.
In impala-shell, the error is different:
ERROR: AnalysisException: Failed to load metadata for table: 'sr2015'
CAUSED BY: TableLoadingException: Failed to load metadata for table:
sr2015 CAUSED BY: InvalidStorageDescriptorException: Impala does not
support tables of this type. REASON: SerDe library
'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.
I have always thought that Hive tables and Impala tables are essentially the same, the only difference being that Impala is a more efficient query engine.
Can anyone help sort it out? Thank you very much.
Assuming that sr2015 is located in a database called db, to make the table visible in Impala you need to issue either
invalidate metadata db;
or
invalidate metadata db.sr2015;
in the Impala shell.
However, in your case the problem is probably the version of Impala you're using, since it doesn't support this table format at all.
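If upgrading Impala is not an option, one possible workaround (a sketch; the Parquet table name is hypothetical) is to rewrite the data in Hive into a storage format Impala does support, then refresh Impala's metadata:

-- Run in Hive: copy the OpenCSVSerde table into a Parquet table Impala can read.
CREATE TABLE db.sr2015_parquet STORED AS PARQUET AS
SELECT * FROM db.sr2015;
-- Then, in impala-shell:
INVALIDATE METADATA db.sr2015_parquet;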
I have a table in Redshift in which I am inserting data from S3.
I viewed the table before inserting the data and it returned a blank table.
However, after inserting data into the Redshift table, I get the error below when running SELECT * FROM the table.
Command to copy data in table from S3 runs successfully without any error.
java.lang.NoClassDefFoundError: com/amazon/jdbc/utils/DataTypeUtilities$NumericRepresentation
What could be the possible cause and solution for this?
I have faced this java.lang.NoClassDefFoundError when the JDBC connection properties were set incorrectly.
If you are using the Postgres driver, make sure the URL uses the jdbc:postgresql:// prefix,
e.g.: jdbc:postgresql://HostName:5439/
Let me know if this works.
Using Cloudera 8.1. In Hive, I loaded a table in ORC format from a CSV file. I get this error when attempting to query the loaded table:
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveVarcharWritable cannot be cast to org.apache.hadoop.io.IntWritable
This is a common mistake I see a lot of people make: LOAD DATA does not convert the file format, so CSV data cannot be loaded directly into an ORC table.
Instead, create a Hive external table in CSV format, and then run
INSERT INTO TABLE final SELECT * FROM temp_table;
which copies the CSV data into the ORC table. With this method, Hive converts the CSV data into ORC using its built-in libraries.
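A minimal sketch of that flow (table names, columns, and the HDFS path are hypothetical):

-- External staging table over the raw CSV files.
CREATE EXTERNAL TABLE temp_table (
  id INT,
  name VARCHAR(50)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/staging/csv_data';

-- Final table stored as ORC, with the same schema.
CREATE TABLE final (
  id INT,
  name VARCHAR(50)
)
STORED AS ORC;

-- Hive rewrites the rows into ORC during the insert.
INSERT INTO TABLE final SELECT * FROM temp_table;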
When I create a table in Hive, I get OK, and I also get OK when I insert values into it through a text file. But when I select values from that table, I get the error message below:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.unset(Ljava/lang/String;)
This is due to version inconsistencies; you need to switch to a newer version of Hadoop 2.x.
https://stackoverflow.com/questions/27842004/issue-in-apache-hive-0-14-running-dml-queries
You can check which version of Hadoop works with which version of Hive on the Hive download page.