Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe - hive

I am getting "Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe"
while creating a table in Hive. Below is the table creation script:
CREATE EXTERNAL TABLE ratings(user_id INT, movie_id INT, rating INT, rating_time STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="::")
LOCATION '/user/hive/ratings';
HDP Version : 2.1.1

You are facing this problem because your Hive lib directory does not contain the hive-contrib jar, or because hive-site.xml is not pointing to it.
Check the '/usr/lib/hive/lib' folder. There should be a hive-contrib-<version>.jar in it.
If you do not find the jar in that folder, download it (one illustrative way to fetch it is sketched below).
Take care to pick the correct version.
Then put that file into the Hive lib folder mentioned above.
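A minimal sketch of fetching the jar from Maven Central, assuming version 0.14.0 is the one you need (the version number and target path are illustrative; adjust them to your cluster):
# illustrative: MultiDelimitSerDe ships with hive-contrib 0.14.0 and later
wget -P /usr/lib/hive/lib \
  https://repo1.maven.org/maven2/org/apache/hive/hive-contrib/0.14.0/hive-contrib-0.14.0.jar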
You can add this jar to your Hive CLI in two ways.
For a single session:
add jar /usr/lib/hive/lib/hive-contrib-<version>.jar;
For a permanent solution, add this to your hive-site.xml:
<property>
  <name>hive.aux.jars.path</name>
  <value>/usr/lib/hive/lib/*</value>
</property>
P.S.: The MultiDelimitSerDe class was added after hive-contrib-0.13, so please ensure that you are using the correct version.
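A short sketch of verifying the setup in a single CLI session (the jar version in the path is illustrative):
-- register the SerDe jar for this session and confirm it is on the classpath
add jar /usr/lib/hive/lib/hive-contrib-0.14.0.jar;
list jars;
-- the CREATE EXTERNAL TABLE statement from the question should now succeed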

Related

Unable to read the parquet files present on the Amazon s3 and load into FlinkSql

I have successfully loaded CSV file data from Amazon S3 into Flink SQL on my local machine:
CREATE TABLE fs_table (
username STRING,
age STRING
) WITH (
'connector'='filesystem',
'path'='s3://d11-data-lake-load/flink/events_data/test_flink/test-csv',
'format'='csv'
);
I tried the same thing with Parquet files loaded from S3 into Flink SQL and am getting exceptions:
CREATE TABLE fs_table (
username STRING,
age STRING
) WITH (
'connector'='filesystem',
'path'='s3://d11-data-lake-load/flink/events_data/test_flink/test-parquet',
'format'='parquet'
);
[INFO] Table has been created.
Flink SQL> select * from fs_table;
[ERROR] Could not execute SQL statement. Reason:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.KerberosName
I have included the related jar in the Flink classpath, but then another exception appears, and this keeps happening.
In case Flink does not find the filesystem-related libraries it needs, it will look in the Hadoop classpath. Hence, add the classpath variable below to ~/.bash_profile.
export HADOOP_CLASSPATH=$(hadoop classpath)
On my machine, ~/.bash_profile looks like this:
export HADOOP_HOME="/usr/local/Cellar/hadoop"
export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
Also include the flink-sql-parquet jar matching your installed Flink version in the lib folder of the Flink installation. Here that version is 1.12.2:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-sql-parquet_2.11</artifactId>
  <version>1.12.2</version>
</dependency>

Hive with HBase (both Kerberos) java.net.SocketTimeoutException .. on table 'hbase:meta'

Error
Receiving Timeout errors when trying to query HBase from Hive using HBaseStorageHandler.
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68199: row 'phoenix_test310,,'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hbase-master.example.com,16020,1583728693297, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
... 3 more
I tried to follow what documentation I could and added some HBase configuration options to hive-site.xml based on this Cloudera link.
Environment:
Hadoop 2.9.2
HBase 1.5
Hive 2.3.6
Zookeeper 3.5.6
First, the Cloudera link should be ignored; Hive detects the presence of HBase through environment variables and then automatically reads the hbase-site.xml configuration settings.
There is no need to duplicate HBase settings within hive-site.xml.
Configuring Hive for HBase
Modify your hive-env.sh as follows:
# replace <hbase-install> with your installation path /etc/hbase for example
export HBASE_BIN="<hbase-install>/bin/hbase"
export HBASE_CONF_DIR="<hbase-install>/conf"
Separately, you should ensure the HADOOP_* environment variables are also set in hive-env.sh, and that the HBase lib directory is added to HADOOP_CLASSPATH, for example as sketched below.
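A minimal sketch of those hive-env.sh lines, assuming Hadoop is installed under /opt/hadoop (all paths are illustrative):
# illustrative paths; adjust to your installation
export HADOOP_HOME="/opt/hadoop"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:<hbase-install>/lib/*"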
We solved this error by adding the property hbase.client.scanner.timeout.period=600000 (HBase 1.2).
https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/admin_hbase_scanner_heartbeat.html#concept_xsl_dz1_jt
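For reference, a sketch of that setting as an hbase-site.xml property (the 600000 ms value is the timeout mentioned above):
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>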

How to add JAR for Hive custom UDF so it is available permanently on the HDInsight cluster?

I have created a custom UDF in Hive; it is tested on the Hive command line and works fine. Now that I have the jar file for the UDF, what do I need to do so that users can create a temporary function pointing to it? Ideally, from the Hive command prompt I would do this:
hive> add jar myudf.jar;
Added [myudf.jar] to class path
Added resources: [myudf.jar]
hive> create temporary function foo as 'mypackage.CustomUDF';
After this I am able to use the function properly.
But I don't want to add the jar every single time I want to execute the function. I should be able to run this function while:
executing a Hive query against the HDInsight cluster from Visual Studio
executing a Hive query from the command line through SSH (Linux) or RDP/cmd (Windows)
executing a Hive query from the Ambari (Linux) Hive view
executing a Hive query from the HDInsight Query Console Hive Editor (Windows cluster)
So, no matter how I execute the query, the JAR should already be available and added to the path. What is the process to ensure this for Linux as well as Windows clusters?
Maybe you could add the jar to the .hiverc file present in the Hive etc/conf directory. This file is loaded every time Hive starts, so from then on you will not need to add the jar separately for each session. A sketch is given below.
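A minimal sketch of such a .hiverc, reusing the jar and class names from the question (the jar location is illustrative and must be readable on the machine running the CLI):
-- .hiverc, loaded automatically whenever the Hive CLI starts
add jar /usr/lib/hive/auxlib/myudf.jar;
create temporary function foo as 'mypackage.CustomUDF';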

How to connect Spark-Notebook to Hive metastore?

This is a cluster with Hadoop 2.5.0, Spark 1.2.0, and Scala 2.10, provided by CDH 5.3.2. I used a compiled spark-notebook distro.
It seems Spark-Notebook cannot find the Hive metastore by default.
How to specify the location of hive-site.xml for spark-notebook so that it can load the Hive metastore?
Here is what I tried:
linking all files from /etc/hive/conf, including hive-site.xml, into the current directory
specifying the SPARK_CONF_DIR variable in bash
When you start the notebook, set the environment variable EXTRA_CLASSPATH to the path where your hive-site.xml is located. This works for me:
EXTRA_CLASSPATH=/path_of_my_mysql_connector/mysql-connector-java.jar:/my_hive_site.xml_directory/conf ./bin/spark-notebook
I have also passed the jar of my MySQL connector because my Hive metastore uses MySQL.
I have found some info from this link: https://github.com/andypetrella/spark-notebook/issues/351
Using the CDH 5.5.0 Quickstart VM, the solution is the following: you need to make hive-site.xml, which provides the access information for the Hive metastore, visible to the notebook. By default, spark-notebook uses an internal metastore.
You can then define the following environment variable in ~/.bash_profile:
HADOOP_CONF_DIR=$HADOOP_CONF_DIR:/etc/hive/conf.cloudera.hive/
export HADOOP_CONF_DIR
(Make sure you execute source ~/.bash_profile if you do not open a new terminal.)
(The solution is given here: https://github.com/andypetrella/spark-notebook/issues/351)

Hive HBase integration failure

I am using Hadoop 2.7.0, Hive 1.2.0, and HBase 1.0.1.1.
I have created a simple table in HBase:
hbase(main):021:0> create 'hbasetohive', 'colFamily'
0 row(s) in 0.2680 seconds
=> Hbase::Table - hbasetohive
hbase(main):022:0> put 'hbasetohive', '1s', 'colFamily:val','1strowval'
0 row(s) in 0.0280 seconds
hbase(main):023:0> scan 'hbasetohive'
ROW COLUMN+CELL
1s column=colFamily:val, timestamp=1434644858733, value=1strowval
1 row(s) in 0.0170 seconds
Now I have tried to access this HBase table through a Hive external table, but when selecting from the external table I get the error below.
hive (default)> CREATE EXTERNAL TABLE hbase_hivetable_k(key string, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "colFamily:val")
> TBLPROPERTIES("hbase.table.name" = "hbasetohive");
OK
Time taken: 1.688 seconds
hive (default)> Select * from hbase_hivetable_k;
OK
hbase_hivetable_k.key hbase_hivetable_k.value
WARN: The method class org.apache.commons.logging.impl.SLF4JLogFactory#release() was invoked.
WARN: Please see http://www.slf4j.org/codes.html#release for an explanation.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)V
at org.apache.hadoop.hive.hbase.HiveHBaseInputFormatUtil.getScan(HiveHBaseInputFormatUtil.java:123)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getRecordReader(HiveHBaseTableInputFormat.java:99)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1667)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
It drops out of the Hive prompt entirely.
Can someone please tell me what the issue is here?
I am also using the .hiverc below from the hive/conf directory:
SET hive.cli.print.header=true;
set hive.cli.print.current.db=true;
set hive.auto.convert.join=true;
SET hbase.scan.cacheblock=0;
SET hbase.scan.cache=10000;
SET hbase.client.scanner.cache=10000;
add JAR /usr/lib/hive/auxlib/zookeeper-3.4.6.jar;
add JAR /usr/lib/hive/auxlib/hive-hbase-handler-1.2.0.jar;
add JAR /usr/lib/hive/auxlib/guava-14.0.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-common-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-client-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-hadoop2-compat-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-hadoop-compat-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/commons-configuration-1.6.jar;
add JAR /usr/lib/hive/auxlib/hadoop-common-2.7.0.jar;
add JAR /usr/lib/hive/auxlib/hbase-annotations-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-it-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-prefix-tree-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-protocol-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-rest-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-server-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-shell-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/hbase-thrift-1.0.1.1.jar;
add JAR /usr/lib/hive/auxlib/high-scale-lib-1.1.1.jar;
add JAR /usr/lib/hive/auxlib/hive-serde-1.2.0.jar;
add JAR /usr/lib/hbase/lib/commons-beanutils-1.7.0.jar;
add JAR /usr/lib/hbase/lib/commons-beanutils-core-1.8.0.jar;
add JAR /usr/lib/hbase/lib/commons-cli-1.2.jar;
add JAR /usr/lib/hbase/lib/commons-codec-1.9.jar;
add JAR /usr/lib/hbase/lib/commons-collections-3.2.1.jar;
add JAR /usr/lib/hbase/lib/commons-compress-1.4.1.jar;
add JAR /usr/lib/hbase/lib/commons-digester-1.8.jar;
add JAR /usr/lib/hbase/lib/commons-el-1.0.jar;
add JAR /usr/lib/hbase/lib/commons-io-2.4.jar;
add JAR /usr/lib/hbase/lib/htrace-core-3.1.0-incubating.jar;
add JAR /usr/local/src/spark/lib/spark-assembly-1.3.1-hadoop2.6.0.jar;
I was having the same issue; it occurs because Hive 1.2.0 is not compatible with HBase 1.x.
As mentioned in the HBaseIntegration wiki:
Version information: As of Hive 0.9.0 the HBase integration requires at least HBase 0.92; earlier versions of Hive were working with HBase 0.89/0.90.
Version information: Hive 1.x will remain compatible with HBase 0.98.x and lower versions. Hive 2.x will be compatible with HBase 1.x and higher. (See HIVE-10990 for details.) Consumers wanting to work with HBase 1.x using Hive 1.x will need to compile Hive 1.x stream code themselves.
So, to make Hive 1.x work with HBase 1.x, you have to download the source code of the Hive 2.0 branch from Hive on GitHub and build it. After building, replace the hive-hbase-handler jar file with the newer version, and then it will work. A rough sketch of the steps follows.
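This is only a sketch under stated assumptions: Git and Maven are available, the branch and module names follow the Apache Hive repository layout, and the destination path reuses the auxlib directory from the .hiverc above.
# clone the Hive source and switch to the 2.0 branch (branch name assumed)
git clone https://github.com/apache/hive.git
cd hive
git checkout branch-2.0
# build only the hbase-handler module and its dependencies, skipping tests
mvn clean package -DskipTests -pl hbase-handler -am
# replace the old handler jar with the newly built one (target path is illustrative)
cp hbase-handler/target/hive-hbase-handler-*.jar /usr/lib/hive/auxlib/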