Error in installation of Hive 2.1.0 on Hadoop 2.7.2 - pseudo-distributed mode - hive

I followed the Apache Hadoop installation guides and was able to install Hadoop along with Pig. They are all working fine.
Following is the configuration:
Hadoop: 2.7.2
Hive: 2.1.0
Machine: Ubuntu 14.04 LTS 64-bit
Java: Version 9
Now I tried to install Apache Hive 2.1.0 according to this link: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation#AdminManualInstallation-InstallingfromaTarball
... and started a test run of the Hive CLI, but every time it throws the following error and exits:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.ClassCastException: jdk.internal.loader.ClassLoaders$AppClassLoader (in module: java.base) cannot be cast to java.net.URLClassLoader (in module: java.base)
at org.apache.hadoop.hive.ql.session.SessionState.<init> (SessionState.java:374)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:350)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:663)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base#9-ea/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base#9-ea/NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base#9-ea/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base#9-ea/Method.java:533)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But there is a catch: if I invoke the Beeline CLI instead, it works fine.
Could you please help with the following:
a. Are the Beeline CLI and the Hive CLI the same, or is there a specific difference?
b. How do I install/configure Hive on my machine?

A: Beeline CLI vs. Hive CLI: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_dataintegration/content/beeline-vs-hive-cli.html
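For reference on (a): Beeline is a JDBC client that connects to HiveServer2, whereas the Hive CLI runs the Hive driver in the same process. A minimal Beeline invocation, assuming HiveServer2 is running locally on the default Thrift port 10000 (host, port, and user here are assumptions - adjust to your setup):
# Connect Beeline to a local HiveServer2 (host/port/user are assumptions)
beeline -u jdbc:hive2://localhost:10000 -n $USER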
B: According to
http://openjdk.java.net/projects/jigsaw/talks/prepare-for-jdk9-j1-2015.pdf
Java 9 no longer uses java.net.URLClassLoader as the application class loader.
However, I was able to solve the issue by pointing Hive to JDK 8.
I have only just begun using Hive/Hadoop, so perhaps someone can provide a better explanation, or a workaround that lets us use JDK 9.
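A minimal sketch of that JDK 8 workaround, assuming JDK 8 is installed under /usr/lib/jvm/java-8-openjdk-amd64 (a placeholder path - substitute your own JDK 8 location) and that Hive is launched from the same shell. Since the Hive CLI is started through the hadoop launcher, a JAVA_HOME hard-coded in hadoop-env.sh may also need to point at JDK 8:
# Point the current shell at JDK 8 before starting the Hive CLI
# (JDK path below is an assumption - substitute your JDK 8 install location)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
java -version   # should now report 1.8.x
hive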

Related

Error when trying to start HiveServer2: NullPointerException in ThriftBinaryCLIService

When I start hiveserver2 with the following command:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
I receive the following error before the program exits:
2022-09-12T14:46:53,713 ERROR [Thrift Server] transport.TServerSocket: Could not set socket timeout.
java.net.SocketException: Socket is closed
at java.net.ServerSocket.setSoTimeout(ServerSocket.java:666) ~[?:1.8.0_292]
at org.apache.thrift.transport.TServerSocket.listen(TServerSocket.java:117) ~[hive-exec-3.1.3.jar:3.1.3]
at org.apache.thrift.server.TThreadPoolServer.serve(TThreadPoolServer.java:146) ~[hive-exec-3.1.3.jar:3.1.3]
at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:169) ~[hive-service-3.1.3.jar:3.1.3]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Hive Session ID = 56c28481-2b0c-4712-808d-ff7ccf31b543
Hive Session ID = 9771e219-095c-4524-b34a-b8e05c335fc0
2022-09-12T14:48:03,871 ERROR [Thrift Server] thrift.ThriftCLIService: Exception caught by ThriftBinaryCLIService. Exiting.
java.lang.NullPointerException: null
at org.apache.hive.service.cli.thrift.ThriftBinaryCLIService.run(ThriftBinaryCLIService.java:169) ~[hive-service-3.1.3.jar:3.1.3]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Here is a brief explanation of my setup:
I am using vagrant and VirtualBox to create a "virtual" cluster.
This is loosely based on this repository (it hasn't been updated in a while, so I had to make many changes to get it to work): https://github.com/njvijay/vagrant-jilla-hadoop
I have created 5 nodes (1 name node and 4 data nodes). The name node also runs YARN, Hive, Pig, Spark, MySQL, Python, etc.
I am using Ubuntu 14.04.6, Hadoop 2.10.1, Hive 3.1.3, Spark 3.3.0, and Pig 0.15.
It seems there may be a compatibility issue between Hadoop 2 and Spark 3. I was able to resolve the error after updating Hadoop, Hive, and Spark to the latest versions.
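After upgrading, it is worth confirming which versions the nodes are actually running; a quick check, assuming the standard launcher scripts are on the PATH:
# Print the versions actually resolved on this node
hadoop version
hive --version
spark-submit --version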

Databricks Connect: Unable to run scala program in databricks cluster from IntelliJ

I followed the steps mentioned in these docs. The databricks-connect test command works fine. However, when I launch the test Scala program from IntelliJ, I see the following error:
Exception in thread "main" java.lang.NoSuchMethodError: com.databricks.spark.util.MetricDefinitions$.EVENT_INITIAL_CONFIG_LOG()Lcom/databricks/spark/util/MetricDefinition;
at com.databricks.spark.util.InitialConfigLogging$.recordInitialConfigs(InitialConfigLogging.scala:47)
at org.apache.spark.sql.internal.SQLConf.recordSessionInitialConfs(SQLConf.scala:4166)
at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:150)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1033)
at com.srikanth.Demo$.main(Demo.scala:13)
at com.srikanth.Demo.main(Demo.scala)
Environment details:
python - 3.8
java - 1.8
databricks-connect - 9.1
I had the same error message; it turned out I had not added the databricks-connect library properly. You should add the library as described here: https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java
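In practice that means the JARs shipped with databricks-connect (rather than a stock Spark distribution) must be on the project's compile/runtime classpath. A hedged way to locate them so they can be attached in IntelliJ under Project Structure > Libraries:
# Print the directory containing the databricks-connect JARs,
# then add that directory as a library of the IntelliJ module
databricks-connect get-jar-dir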

Error while deploying Flink custom JAR file in AWS EMR

Basically I want to deploy a Flink custom JAR file to a new AWS EMR cluster. Here is a summary of what I did. I created a new AWS EMR cluster.
Step 1: Software and steps -
Created an AWS EMR cluster with Flink as the service (EMR release version 5.17.0) and selected Flink 1.5.2 in the software configuration.
Entered the following configuration JSON:
[
  {
    "Classification": "flink-conf",
    "Properties": {
      "jobmanager.heap.mb": "3072",
      "taskmanager.heap.mb": "51200",
      "taskmanager.numberOfTaskSlots": "2",
      "taskmanager.memory.preallocate": "false",
      "parallelism.default": "1"
    }
  }
]
Step 2: Hardware - No change in the hardware configuration. By default we have 1 master, 2 core, and 0 task instances, all of type m3.xlarge.
Step 3: General cluster settings - No change here.
Step 4: Security - Provided my EC2 key pair.
Once the cluster was ready, I SSHed into the EC2 machine and tried to deploy the custom JAR file. Below are the different errors I got every time I tried to deploy it via the CLI.
1)
flink run -m yarn-cluster -yn 2 -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-10-09 06:30:36,766 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-IPADDRESS.ec2.internal/IPADDRESS:8032
2018-10-09 06:30:36,909 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-10-09 06:30:37,168 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Killing YARN application
2)
flink run -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar
Using the result of 'hadoop classpath' to augment the Hadoop classpath: /etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/cloudwatch-sink/lib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/flink/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.deployment.ClusterRetrieveException: Couldn't retrieve standalone cluster
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:51)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:31)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:253)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.util.ConfigurationException: Config parameter 'Key: 'jobmanager.rpc.address' , default: null (deprecated keys: [])' is missing (hostname/address of JobManager to connect to).
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getJobManagerAddress(HighAvailabilityServicesUtils.java:141)
at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:81)
at org.apache.flink.client.program.ClusterClient.<init>(ClusterClient.java:158)
at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:183)
at org.apache.flink.client.program.rest.RestClusterClient.<init>(RestClusterClient.java:156)
at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:49)
... 10 more
I also tried to deploy via the AWS Web UI, and the JAR failed to deploy there as well.
So, basically, I want to deploy the custom JAR to the Flink YARN cluster. I am not sure what I am missing in the YARN/Flink configuration, or elsewhere. Thanks in advance for any help.
You should reduce the memory allocation for the task manager. Currently you are trying to allocate 51.2 GB of memory, whereas a single m3.xlarge machine has only 15 GB, and the two-machine cluster has 30 GB in total.
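As a hedged sketch, the memory can also be overridden at submit time instead of (or as well as) lowering taskmanager.heap.mb in the EMR flink-conf classification; the 4096 MB figure below is an assumption that fits comfortably on an m3.xlarge:
# -yjm / -ytm set JobManager / TaskManager container memory (in MB) for the YARN cluster
flink run -m yarn-cluster -yn 2 -yjm 1024 -ytm 4096 -c com.deepak.flink.examples.WordCount flink-examples-assembly-1.0.jar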

Hive setup issue: "SINGLETON from class org.slf4j.LoggerFactory"

The Hadoop single-node cluster is working fine; jps and the Hadoop web interface both work.
I have set up Hive.
When I enter Hive from Hadoop, it gives me the error below:
Exception in thread "main" java.lang.IllegalAccessError: tried to access field org.slf4j.impl.StaticLoggerBinder.SINGLETON from class org.slf4j.LoggerFactory
Can someone help me with this case?
If you get the slf4j IllegalAccessError shown above, you are using an older version of slf4j-api (e.g. 1.4.3) with a newer version of an slf4j binding (e.g. 1.5.6). As mentioned in the slf4j FAQ on IllegalAccessError, this typically occurs when your Maven pom.xml pulls in Hibernate 3.3.0, which declares a dependency on slf4j-api version 1.4.2. If your pom.xml then declares a dependency on an slf4j binding, say slf4j-log4j12 version 1.5.6, you will get illegal access errors.
Also refer to the solution here: IllegalAccessError slf4j
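Assuming a Maven build, a quick way to see which slf4j artifacts (API vs. binding) are actually being resolved, so the versions can be aligned or the stale one excluded in pom.xml:
# List every org.slf4j artifact in the resolved dependency tree
mvn dependency:tree -Dincludes=org.slf4j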

java.lang.AbstractMethodError: org.slf4j.impl.Log4jLoggerAdapter.log

I have written an example login application using GWT with Hibernate and Spring integration.
I am getting this error:
Initializing App Engine server
SLF4J: The requested version 1.5.8 by your slf4j binding is not compatible with [1.6]
SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further details.
Module setup completed in 579 ms
java.lang.AbstractMethodError: org.slf4j.impl.Log4jLoggerAdapter.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
at org.eclipse.jetty.util.log.JettyAwareLogger.log(JettyAwareLogger.java:620)
at org.eclipse.jetty.util.log.JettyAwareLogger.debug(JettyAwareLogger.java:206)
at org.eclipse.jetty.util.log.Slf4jLog.debug(Slf4jLog.java:89)
at org.eclipse.jetty.util.component.Container.add(Container.java:206)
at org.eclipse.jetty.util.component.Container.update(Container.java:169)
at org.eclipse.jetty.util.component.Container.update(Container.java:111)
at org.eclipse.jetty.server.Server.setConnectors(Server.java:200)
at org.eclipse.jetty.server.Server.addConnector(Server.java:174)
at com.google.gwt.dev.codeserver.WebServer.start(WebServer.java:117)
at com.google.gwt.dev.codeserver.CodeServer.start(CodeServer.java:101)
at com.google.gwt.dev.codeserver.CodeServer.main(CodeServer.java:71)
at com.google.gwt.dev.codeserver.CodeServer.main(CodeServer.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.gwt.dev.shell.SuperDevListener$1.run(SuperDevListener.java:112)
These are the JARs I have used.
Thanks in advance.
AbstractMethodError can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.
It seems the slf4j-api version you have is higher than the slf4j binding version (such as slf4j-jdk14.jar or slf4j-log4j12.jar, which bind slf4j to an underlying logging framework).
For example, slf4j-api 1.7.6 together with slf4j-log4j12 1.5.8 will throw the same error.
You need to keep the versions of the slf4j API and its binding JAR in sync.
Most likely it is a library conflict: one of your libraries uses slf4j 1.5.8, while you have imported slf4j 1.6.
Two things to try:
1. Remove the slf4j 1.6 JAR from your imported JARs.
2. Or replace the slf4j 1.6 JAR with an slf4j 1.5.8 JAR.
Hope that helps :)
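One way to spot the mismatch is to list the slf4j JARs that actually end up on the runtime classpath; a sketch, where the lib directory is a placeholder for wherever your project's dependencies are copied (e.g. war/WEB-INF/lib):
# Show every slf4j API and binding JAR (with versions) in the library directory
find path/to/your/project/lib -name 'slf4j-*.jar'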