Hive RuntimeException NULL::character varying while doing HBase integration - hive

I get the following error
FAILED: RuntimeException java.lang.ClassNotFoundException: NULL::character varying
when I try to select from a table in the Hive shell that was created using 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'.
I can select from all the other tables that I create in Hive without any problem, so my guess is that it has something to do with a jar file related to HBaseStorageHandler.
CREATE TABLE hbase_table_emp(id int, name string, role string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:role")
TBLPROPERTIES ("hbase.table.name" = "emp");
I have been following this tutorial to accomplish my task: HBaseHiveIntegration
Versions: Hadoop 2.7.2 - Hive 2.1.0 - HBase 1.2.3
FYI, this is not a duplicate of hive-hbase integration throws classnotfoundexception NULL::character varying, because that answer does not take into account how we can solve this in a shell.

I could not fix this problem directly, so I decided to change my Hive metastore from PostgreSQL to MySQL, and now it works.
When I created my schema in PostgreSQL I made sure there was no NULL cast to character varying (NULL::character varying), but that did not resolve the issue.
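If you want to stay on PostgreSQL, one thing worth checking first is whether any storage descriptor rows in the metastore contain the literal string instead of a real NULL. A sketch of such a check, assuming the standard Hive metastore schema in PostgreSQL (table and column names may differ in your installation):
-- run against the PostgreSQL metastore database, not in the Hive shell
SELECT "SD_ID", "INPUT_FORMAT", "OUTPUT_FORMAT"
FROM "SDS"
WHERE "INPUT_FORMAT" = 'NULL::character varying'
   OR "OUTPUT_FORMAT" = 'NULL::character varying';
Any rows returned there would explain why Hive ends up trying to load 'NULL::character varying' as a class name.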

Related

Presto query error on hive ORC, Can not read SQL type real from ORC stream of type DOUBLE

I was able to run a query in Presto to read the non-float columns from a Hive ORC (Snappy) table. However, when I select the float datatype columns through the Presto CLI, I get the error message below. Any suggestions on what the alternative is, other than changing the field type to double in the target Hive table?
presto:sample> select * from emp_detail;
Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc (offset=0, length=1999857): Malformed ORC file. Can not read SQL type real from ORC stream .salary of type DOUBLE [hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc]
Please try adding this property
hive.orc.use-column-names=true
to presto-server/conf/catalog/hive.properties
and restart your Presto server.
To test it without restarting the server, run this from the Presto CLI:
SET SESSION hive.orc_use_column_names=true;
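After setting the session property, re-running the failing select on the same table is a quick way to confirm whether column-name mapping resolves the type mismatch (a sketch, re-using the column named in the error message):
select salary from emp_detail limit 10;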
Release notes from Presto regarding this attribute.

How to load data to Hive table and make it also accessible in Impala

I have a table in Hive:
CREATE EXTERNAL TABLE sr2015(
creation_date STRING,
status STRING,
first_3_chars_of_postal_code STRING,
intersection_street_1 STRING,
intersection_street_2 STRING,
ward STRING,
service_request_type STRING,
division STRING,
section STRING )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (
'colelction.delim'='\u0002',
'field.delim'=',',
'mapkey.delim'='\u0003',
'serialization.format'=',', 'skip.header.line.count'='1',
'quoteChar'= "\"")
Data is loaded into the table this way:
LOAD DATA INPATH "hdfs:///user/rxie/SR2015.csv" INTO TABLE sr2015;
Why is the table only accessible in Hive? When I attempt to access it in the HUE/Impala editor I get the following error:
AnalysisException: Could not resolve table reference: 'sr2015'
which seems to say there is no such table, although the table does show up in the left panel.
In impala-shell, the error is different, as shown below:
ERROR: AnalysisException: Failed to load metadata for table: 'sr2015'
CAUSED BY: TableLoadingException: Failed to load metadata for table:
sr2015 CAUSED BY: InvalidStorageDescriptorException: Impala does not
support tables of this type. REASON: SerDe library
'org.apache.hadoop.hive.serde2.OpenCSVSerde' is not supported.
I have always thought that Hive tables and Impala tables are essentially the same, the difference being that Impala is a more efficient query engine.
Can anyone help sort it out? Thank you very much.
Assuming that sr2015 is located in a DB called db, in order to make the table visible in Impala you need to issue either
invalidate metadata db;
or
invalidate metadata db.sr2015;
in the Impala shell.
However, in your case the reason is probably the version of Impala you're using, since it doesn't support this table format at all.
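If the unsupported SerDe is indeed the blocker, one possible workaround (a sketch, not part of the original answer; the copy table name and the Parquet format are assumptions) is to copy the data from the Hive side into a format both engines understand, then refresh Impala's metadata:
-- in Hive
CREATE TABLE sr2015_parquet STORED AS PARQUET AS SELECT * FROM sr2015;
-- then in impala-shell
INVALIDATE METADATA sr2015_parquet;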

Hive create table for json data

I am trying to create a Hive table which can read JSON data, but when I execute the create statement it throws an error.
Create statement:
CREATE TABLE employee_exp_json
( id INT,
fname STRING,
lname STRING,
profession STRING,
experience INT,
exp_service STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serede2.Jsonserede'
STORED AS TEXTFILE;
Error:
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde:
org.apache.hadoop.hive.contrib.serede2.Jsonserede
I have also added the jar hive-json-serde.jar, but I'm still facing the same issue. I am creating this table on Cloudera and the Hive version is 1.1.0.
The correct class name is
org.apache.hive.hcatalog.data.JsonSerDe
Refer: Hive SerDes
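For instance, the DDL from the question could be adjusted as below (a sketch; the ADD JAR path is an assumption and depends on where hive-hcatalog-core.jar lives in your installation):
-- make the HCatalog JSON SerDe available to the session
ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE TABLE employee_exp_json
( id INT,
fname STRING,
lname STRING,
profession STRING,
experience INT,
exp_service STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;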
As for the other JAR you added, check its documentation; it uses yet another class:
org.openx.data.jsonserde.JsonSerDe
Try adding the json-serde-with-dependencies.jar.
You can download it from Download Hive Serde.
Also try the class
'org.openx.data.jsonserde.JsonSerDe'
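With the OpenX SerDe the DDL only differs in the jar and the SerDe class (again a sketch; point ADD JAR at wherever you placed the downloaded jar):
ADD JAR /path/to/json-serde-with-dependencies.jar;
CREATE TABLE employee_exp_json
( id INT,
fname STRING,
lname STRING,
profession STRING,
experience INT,
exp_service STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;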

PySpark: java.lang.ClassCastException

I have PySpark code which builds a query and runs an INSERT INTO command on another Hive table that is internally mapped to an HBase table.
When I run the INSERT INTO command on the Hive table using Spark SQL, I get the following exception:
java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
I checked the datatypes and tblproperties but am unable to get past this exception.
The versions I am using are:
PySpark -- 1.6.0
Hive -- 1.1.0-cdh5.8.2
The table properties are:
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,colf:a")
tblproperties("hbase.table.name"="abc",'hbase.mapred.output.outputtable' = 'abc');
I tried removing the ROW FORMAT SERDE but still get the same issue.
Am I getting this issue because of mismatched versions, or am I going wrong somewhere else?
This is a bug in Spark; see this Apache Spark pull request: https://github.com/apache/spark/pull/17989

Apache Kylin Hive table schema requires Streaming cluster settings

I created a Hive table over my HBase table:
CREATE EXTERNAL TABLE test(key string, value string, value1 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, g:array, g:key")
TBLPROPERTIES ("hbase.table.name" = "close-counter-accounts");
and connected it to Kylin. But when I try to build the cube I get a strange exception about Apache Kafka.
Also, the DataSource page requires me to set up Streaming cluster settings, which sounds very strange because it is table-mapped data.
Did anybody else have this kind of problem?
Problem solved. The issue was a bad Kylin configuration; reinstalling Kylin helped.