Oozie hive script scheduling in tez mode - hive

I tried to schedule a Hive workflow to run a Hive script in Tez mode by passing the Hadoop properties that reference the Tez JAR files in the workflow XML file, as shown below.
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>${nameNode}/apps/Tez/,${nameNode}/apps/Tez/lib/</value>
</property>
</configuration>
I also changed the hive.execution.engine property in the hive-site.xml file to tez.
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
When I scheduled the workflow using Oozie, I got the following error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org/apache/tez/dag/api/SessionNotRunning
java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:479)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:306)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:290)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 24 more
Can anyone please explain how to rectify this issue so that I can schedule my workflow XML file and run the Hive script in Tez mode?

I observed the above error before, was able to resolve it, and got Hive (Tez engine) running on Oozie.
Here are the steps I followed.
Class not found error:
As the error says, the Oozie launcher container cannot find the SessionNotRunning class.
This class is part of tez-api-0.x.x.jar; you can confirm that using:
jar tvf /usr/lib/tez/tez-api-0.7.0.jar | grep SessionNotRunning
You need to make sure the Oozie launcher container (a YARN container) localizes this and the other Tez JARs so it can pass them on to the Hive client.
The expectation is that if we include the following config property in workflow.xml, Oozie should pick up all those JARs.
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
However, it may not do that (I'm not sure why).
So I copied all of the Tez JARs to the Hive action's share library in HDFS (e.g. to /user/oozie/share/lib/lib_20160405125827/hive/). The Oozie hive-action in your workflow should use the JARs present in that path and localize them.
While doing that, make sure the new JARs have the same permissions as the JARs already present in that HDFS directory. Oozie also needs a refresh of its share library.
Example commands can be:
hadoop fs -copyFromLocal /usr/lib/tez/*.jar /user/oozie/share/lib/lib_20160405125827/hive/
hadoop fs -copyFromLocal /usr/lib/tez/lib/*.jar /user/oozie/share/lib/lib_20160405125827/hive/
hadoop fs -chown oozie:oozie /user/oozie/share/lib/lib_20160405125827/hive/*.jar
oozie admin -sharelibupdate
Now, if you list your Hive share library with oozie admin -shareliblist hive, you should be able to see all of the Tez libraries.
With those steps, you should no longer see NoClassDefFoundError or ClassNotFoundException errors from the Tez JARs.
Missing Hadoop dependencies:
At this point the Tez job should be submitted, but there is another error you may encounter in the Oozie launcher:
14972 [uber-SubtaskRunner] ERROR org.apache.hadoop.hive.ql.exec.Task - Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1459860815404_0033 failed 2 times due to AM Container for appattempt_1459860815404_0033_000002 exited with exitCode: 1
Looking at the container logs, I see:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/AbstractService
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.AbstractService
This is because my Tez installation is minimal and doesn't come with the Hadoop dependencies.
https://github.com/apache/tez/blob/release-0.7.0/docs/src/site/markdown/install.md#hadoop-installation-dependent-installdeploy-instructions
So, you need to tell TEZ to use your cluster's hadoop libraries using the following property in your workflow.xml.
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
With the above steps, I was able to run a Hive script successfully on the Tez engine via Oozie. A consolidated hive-action sketch follows.
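For reference, putting the pieces together, the hive action ends up looking roughly like the sketch below (a minimal sketch based on the snippets in this thread; the script name is a placeholder, and the Tez paths and queue name must match your cluster):
<hive xmlns="uri:oozie:hive-action:0.2">
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <job-xml>hive-site.xml</job-xml>
  <configuration>
    <property>
      <name>mapred.job.queue.name</name>
      <value>${queueName}</value>
    </property>
    <!-- Tez JARs published in HDFS (also copied into the Oozie hive share library as described above) -->
    <property>
      <name>tez.lib.uris</name>
      <value>${nameNode}/apps/Tez/,${nameNode}/apps/Tez/lib/</value>
    </property>
    <!-- let Tez use the cluster's Hadoop libraries instead of bundling its own -->
    <property>
      <name>tez.use.cluster.hadoop-libs</name>
      <value>true</value>
    </property>
  </configuration>
  <script>script.q</script> <!-- placeholder: your Hive script in the workflow directory -->
</hive>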

Related

Flink 1.10 not connecting to minio (s3)

I'm running Flink and MinIO locally with Docker Compose.
When I try to connect to MinIO, I always get the following error:
caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
It seems that the plugin isn't loaded correctly.
My Flink config (flink-conf.yaml):
state.backend: filesystem
s3.endpoint: http://minio:9000
s3.path.style.access: true
s3.access-key: minio
s3.secret-key: minio123
presto.s3.access-key: minio
presto.s3.secret-key: minio123
presto.s3.endpoint: http://minio:9000
presto.s3.path-style-access: true
I've copied the required plugin as follows:
mkdir -p plugins/s3-fs-presto
cp opt/flink-s3-fs-presto-*.jar plugins/s3-fs-presto
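(One thing worth verifying: the plugin directory has to end up inside the Flink containers, under /opt/flink/plugins in the official Flink image, not only on the host. A quick check, assuming the JobManager container is named jobmanager, which is an assumption:)
docker exec jobmanager ls /opt/flink/plugins/s3-fs-presto
# should list a flink-s3-fs-presto-*.jar; if the directory is missing, the plugin was only copied on the host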
Any suggestions?
Stack trace:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7ae6657256719d8c32d76ba113fb35f0)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:262)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:338)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:326)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:413)
at org.apache.flink.api.java.DataSet.print(DataSet.java:1652)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:604)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:466)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:274)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:746)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:273)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1008)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1081)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1081)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:147)
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:259)
... 21 more
Caused by: java.io.IOException: Error opening the Input Split s3://test/test.txt [0,3243]: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:824)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:470)
at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:47)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 's3'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:450)
at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:362)
at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:995)
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:58)
at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:446)
... 2 more

Apache Hive query on Tez FileNotFoundException

I'm receiving this exception when executing a Hive query on Tez with Hive 2.3.6 and Tez 0.9.2.
I know Tez is configured correctly because I can manually run MapReduce jobs via Hadoop.
Dag submit failed due to java.io.FileNotFoundException: Path is not a file: /tmp/hive/root/_tez_session_dir/f4f4b17c-0657-41fa-8674-df83fa3ad362/lib
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:150)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1829)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:709)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:381)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
This error is seen on Hive 2.2+ and 2.3.x when either:
hive.aux.jars.path in hive-site.xml is configured with an invalid path, or
the HIVE_AUX_JARS_PATH environment variable is configured improperly (usually in hive-env.sh).
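A minimal sketch of a valid hive.aux.jars.path setting, assuming the auxiliary JARs live under a local directory; the path is a placeholder, not taken from the question:
<property>
  <name>hive.aux.jars.path</name>
  <!-- placeholder: must point at an existing location (a directory or comma-separated jar paths, depending on the Hive version) -->
  <value>file:///usr/local/hive/auxlib</value>
</property>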

java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I have Hadoop 2.7.1 and apache-hive-1.2.1 installed on Ubuntu 14.04.
Why is this error occurring?
Is a metastore installation required?
When we type the hive command in the terminal, how are the XML files called internally? What is the flow of those XMLs?
Are any other configurations required?
When I type the hive command in the Ubuntu 14.04 terminal, it throws the exception below.
$ hive
Logging initialized using configuration in jar:file:/usr/local/hive/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:520)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
... 8 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:426)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 14 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:520)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 19 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:426)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:426)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
... 48 more
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
... 66 more
Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58)
at org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
... 68 more
To avoid the above error, I created hive-site.xml with:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/local/hive-metastore-dir/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hivedb?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>user</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
</configuration>
I also provided the environment variables in the ~/.bashrc file; still the error persists.
#HIVE home directory configuration
export HIVE_HOME=/usr/local/hive/apache-hive-1.2.1-bin
export PATH="$PATH:$HIVE_HOME/bin"
Starting the Hive metastore service worked for me.
First, set up the database for the Hive metastore and start the metastore service:
$ hive --service metastore
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_installing_manually_book/content/validate_installation.html
Second, run the following commands:
$ schematool -dbType mysql -initSchema
$ schematool -dbType mysql -info
https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool
I made the modifications below and am able to start the Hive shell without any errors:
1. ~/.bashrc
Add the environment variables below at the end of the file (sudo gedit ~/.bashrc):
#Java Home directory configuration
export JAVA_HOME="/usr/lib/jvm/java-9-oracle"
export PATH="$PATH:$JAVA_HOME/bin"
# Hadoop home directory configuration
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HIVE_HOME=/usr/lib/hive
export PATH=$PATH:$HIVE_HOME/bin
2. hive-site.xml
You have to create this file (hive-site.xml) in the conf directory of Hive and add the details below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateTables</name>
<value>true</value>
</property>
</configuration>
3. You also need to put the MySQL connector JAR file (mysql-connector-java-5.1.28.jar) in the lib directory of Hive.
4. The following installations are required on your Ubuntu machine to start the Hive shell:
MySQL
Hadoop
Hive
Java
5. Execution:
Start all Hadoop services: start-all.sh
Enter the jps command to check whether all Hadoop services are up and running: jps
Enter the hive command to enter the Hive shell: hive
If you're just playing around in local mode, you can drop the metastore DB and recreate it:
rm -rf metastore_db/
$HIVE_HOME/bin/schematool -initSchema -dbType derby
In my case, when I tried
$ hive --service metastore
I got
MetaException(message:Version information not found in metastore. )
The tables required for the metastore were missing in MySQL. Manually create the tables and restart the Hive metastore.
cd $HIVE_HOME/scripts/metastore/upgrade/mysql/
< Login into MySQL >
mysql> drop database IF EXISTS <metastore db name>;
mysql> create database <metastore db name>;
mysql> use <metastore db name>;
mysql> source hive-schema-2.x.x.mysql.sql;
The metastore db name should match the database name specified in the connection property of the hive-site.xml file.
Which hive-schema-2.x.x.mysql.sql file you use depends on the versions available in the current directory. Try to go with the latest, because the directory also holds many older schema files.
Now try to execute hive --service metastore again.
If everything goes well, simply start Hive from the terminal:
>hive
I hope the above answer serves your need.
Run Hive in debug mode:
hive -hiveconf hive.root.logger=DEBUG,console
and then execute
show tables
to find the actual problem.
You just need to initialize the schema, which you can do with the commands below. I did it and am able to run Hive queries without hitting
ERROR: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
cd $HIVE_HOME
mv metastore_db metastore_db_bkup
schematool -initSchema -dbType derby
bin/hive
now run your query:
hive> show databases;
I used a MySQL DB for the Hive metastore. Please follow the steps below:
In hive-site.xml, the metastore connection URL should be correct:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
Go to MySQL: mysql -u hduser -p
then run drop database metastore;
then come out of MySQL and execute schematool -initSchema -dbType mysql
Now the error will be gone.
In the middle of the stack trace, lost in the "reflection" junk, you can find the root cause:
The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
This is probably due to a missing connection to the Hive metastore. My Hive metastore is stored in MySQL, so I need to access MySQL, so I added a dependency in my build.sbt:
libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.38"
and the problem is solved!
Maybe your Hive metastore is inconsistent! I was in this situation.
First, I ran
$ schematool -dbType mysql -initSchema
then I found this:
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Then I ran
$ schematool -dbType mysql -info
and found this error:
Hive distribution version: 2.3.0
Metastore schema version: 1.2.0
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is not compatible. Hive Version: 2.3.0, Database Schema Version: 1.2.0
So I reformatted my Hive metastore, and then it worked:
Drop the MySQL database (the database named hive_db).
Run schematool -dbType mysql -initSchema to initialize the metadata.
I also faced this problem, but I restarted Hadoop and used the command hadoop dfsadmin -safemode leave.
Now start Hive; it should work, I think.
I solved this problem by removing --deploy-mode cluster from the spark-submit command.
By default, spark-submit uses client mode, which has the following advantages:
1. It opens up a Netty HTTP server and distributes all the JARs to the worker nodes.
2. The driver program runs on the master node, which means dedicated resources for the driver process.
In cluster mode, by contrast:
1. The driver runs on a worker node.
2. All the JARs need to be placed in a common folder of the cluster so that they are accessible to all the worker nodes, or in a folder on each worker node.
Here the job could not access the Hive metastore because the Hive JARs were not available to the nodes of the cluster. An example invocation is sketched below.
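A minimal sketch of the change described above; the class name, master URL, and application JAR are placeholders, not from the original post:
# cluster mode: the driver runs on a worker node, which may not have the Hive jars available
spark-submit --class com.example.MyApp --master spark://master-host:7077 --deploy-mode cluster my-app.jar
# client mode (the default): the driver runs on the machine where spark-submit is invoked
spark-submit --class com.example.MyApp --master spark://master-host:7077 my-app.jar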
1- Append the following lines to the startup file ~/.bashrc
export HIVE_HOME=~/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$HADOOP_HOME/lib/*
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*:.
2- Edit the file $HIVE_HOME/conf/hive-env.sh
cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
3- Edit the hive-env.sh file to append the following line
export HADOOP_HOME=$HADOOP_HOME
4- There are two different versions of the "guava" library used by Hadoop and Hive. The solution is to use the same version of guava in both Hadoop and Hive. Note: my system is using guava-27, which is why I shared this example; the exact version depends on your Hadoop and Hive installations, so you need to check it.
cp ~/hadoop/share/hadoop/hdfs/lib/guava-27.0-jre.jar ~/hive/lib/
rm ~/hive/lib/guava-19.0.jar
5- In the conf directory of hive (~/hive/conf), create the metastore:
schematool -initSchema -dbType derby
6- We also need to create the configuration file of Hive, in the conf directory:
cp hive-default.xml.template hive-site.xml
In the file, hive-site.xml, at line 3215, there are some characters that should be deleted around column 96. The four characters to delete are: 
7- Edit the hive-site.xml file as follows:
Replace the occurrences of ${system:java.io.tmpdir} with /tmp/hive_io
Replace ${system:user.name} with hadoop
Note: ${system:java.io.tmpdir} occurs 3-4 times in hive-site.xml. Make sure you change every occurrence (a sed sketch for this step is shown after the list).
8- Finally, you can type hive. (Don't forget: you need to run Hadoop before Hive.) Note: make sure you start Hive from under the Hive path.
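A minimal sed sketch for step 7, assuming GNU sed and the replacement values above; back up hive-site.xml first:
cp hive-site.xml hive-site.xml.bak
# replace every occurrence of the two variables in place
sed -i 's|\${system:java\.io\.tmpdir}|/tmp/hive_io|g' hive-site.xml
sed -i 's|\${system:user\.name}|hadoop|g' hive-site.xml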
Just open the Hive terminal from the Hive folder after editing the .bashrc and hive-site.xml files.
Steps:
Open the Hive folder where it is installed.
Now open a terminal from that folder.
In my case, I stopped my Docker Hive container and ran it again, and finally it worked. I hope it will be useful for someone.
Note: this might happen because an instance is running in the background, so stopping the container stops all background instances.
This happens because you have NOT started the Hive metastore; the simplest way to do that is to use the default Derby database. You can follow this link: https://sparkbyexamples.com/apache-hive/hive-hiveexception-java-lang-runtimeexception-unable-to-instantiate-org-apache-hadoop-hive-ql-metadata-sessionhivemetastoreclient/
I fixed this problem by copying hive-default.xml.template to hive-site.xml. To do that, you can use the commands below:
cd /usr/local/Cellar/hive/2.7.1/libexec/conf (replace with your Hive version)
cp hive-default.xml.template hive-site.xml
Then I changed the values of the properties below in hive-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hive</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/hive</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.druid.metadata.db.type</name>
<value>mysql</value>
<description>
Expects one of the pattern in [mysql, postgresql, derby].
Type of the metadata database.
</description>
</property>
</configuration>
After that, I created a DB in MySQL with the name metastore, created a username and password, and granted permissions for it using the following queries:
$ mysql
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,ALTER,CREATE ON metastore.* TO 'hiveuser'@'localhost';
and ran the schema script in MySQL with the following command:
mysql> source /usr/local/Cellar/hive/3.1.2_3/libexec/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql
Also, don't forget to copy the MySQL connector JAR into the Hive package using the following commands.
Download the MySQL connector and extract it:
tar zxvf mysql-connector-java-5.1.35.tar.gz
sudo cp mysql-connector-java-5.1.35/mysql-connector-java-5.1.35-bin.jar /usr/local/Cellar/hive/2.7.1/libexec/lib/
That's it. Now I can successfully run show tables and other commands in Hive. :)

ClassNotFoundException when using the Mule Amazon SQS connector

I'm using the Amazon SQS connector in my Mule project. When I updated it from version 2.5.5 to 3.0.0 according to the user guide and set the DEBUG logging level for the com.amazonaws package, I noticed the following error right after the project starts:
DEBUG 2015-07-20 15:15:56,927 [Receiving Thread] com.amazonaws.1.9.39.shade.jmx.spi.SdkMBeanRegistry: Failed to load the JMX implementation module - JMX is disabled
java.lang.ClassNotFoundException: com.amazonaws.1.9.39.shade.jmx.SdkMBeanRegistrySupport
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at org.mule.module.launcher.FineGrainedControlClassLoader.findClass(FineGrainedControlClassLoader.java:175)
at org.mule.module.launcher.MuleApplicationClassLoader.findClass(MuleApplicationClassLoader.java:134)
at org.mule.module.launcher.FineGrainedControlClassLoader.loadClass(FineGrainedControlClassLoader.java:119)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at com.amazonaws.1.9.39.shade.jmx.spi.SdkMBeanRegistry$Factory.<clinit>(SdkMBeanRegistry.java:46)
at com.amazonaws.1.9.39.shade.metrics.AwsSdkMetrics.registerMetricAdminMBean(AwsSdkMetrics.java:351)
at com.amazonaws.1.9.39.shade.metrics.AwsSdkMetrics.<clinit>(AwsSdkMetrics.java:316)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.requestMetricCollector(AmazonWebServiceClient.java:629)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.isRMCEnabledAtClientOrSdkLevel(AmazonWebServiceClient.java:570)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.isRequestMetricsEnabled(AmazonWebServiceClient.java:562)
at com.amazonaws.1.9.39.shade.AmazonWebServiceClient.createExecutionContext(AmazonWebServiceClient.java:523)
at com.amazonaws.1.9.39.shade.services.sqs.AmazonSQSClient.listQueues(AmazonSQSClient.java:1163)
at com.amazonaws.1.9.39.shade.services.sqs.AmazonSQSClient.listQueues(AmazonSQSClient.java:1501)
at org.mule.modules.sqs.connection.strategy.SQSConnectionManagement.connect(SQSConnectionManagement.java:173)
at org.mule.modules.sqs.connectivity.SQSConnectionManagementSQSConnectorAdapter.connect(SQSConnectionManagementSQSConnectorAdapter.java:21)
at org.mule.modules.sqs.connectivity.SQSConnectionManagementSQSConnectorAdapter.connect(SQSConnectionManagementSQSConnectorAdapter.java:9)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementConnectorFactory.makeObject(ConnectionManagementConnectorFactory.java:47)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementConnectorFactory.makeObject(ConnectionManagementConnectorFactory.java:15)
at org.apache.commons.pool.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:1220)
at org.mule.modules.sqs.connectivity.SQSConnectorConfigConnectionManagementConnectionManager.acquireConnection(SQSConnectorConfigConnectionManagementConnectionManager.java:407)
at org.mule.modules.sqs.connectivity.SQSConnectorConfigConnectionManagementConnectionManager.acquireConnection(SQSConnectorConfigConnectionManagementConnectionManager.java:55)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementProcessInterceptor.execute(ConnectionManagementProcessInterceptor.java:47)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementProcessInterceptor.execute(ConnectionManagementProcessInterceptor.java:19)
at org.mule.security.oauth.process.RetryProcessInterceptor.execute(RetryProcessInterceptor.java:84)
at org.mule.devkit.3.6.1.shade.connection.management.ConnectionManagementProcessTemplate.execute(ConnectionManagementProcessTemplate.java:33)
at org.mule.modules.sqs.sources.ReceiveMessagesMessageSource.run(ReceiveMessagesMessageSource.java:134)
at java.lang.Thread.run(Thread.java:662)
It's true that the mule-module-sqs-3.0.0.jar downloaded by Maven doesn't contain such a class. I rebuilt the Amazon SQS connector's source code with a small change in the pom.xml: I set <minimizeJar>false</minimizeJar> for the maven-shade-plugin. After that the missing class appeared in the JAR and I managed to run the project without errors.
I'm not sure whether this is a bug or not, but I don't like the idea of building the connector manually. I'd really appreciate any help sorting this out. Thanks.
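For reference, the shade-plugin change described above would look roughly like the sketch below in the connector's pom.xml; only the minimizeJar setting is the relevant change, and the rest of the plugin configuration (executions, shading rules) from the connector's original pom is omitted here:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <!-- keep all classes in the shaded jar instead of stripping classes the plugin considers unused -->
    <minimizeJar>false</minimizeJar>
  </configuration>
</plugin>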

Nutch deployment on hadoop will not index to solr

I have an Oozie workflow that does a Nutch crawl, which I designed using Hue.
All steps in the process work except for indexing to Solr.
The Oozie action that defines the solrindex step is as follows:
<start to="solr-test"/>
<action name="solr-test">
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<main-class>org.apache.nutch.indexer.IndexingJob</main-class>
<java-opts>solr.server.url=http://ip-redacted:8983/solr/raw</java-opts>
<arg>hdfs://ip-redacted:8020/user/admin/c</arg>
<arg>-dir</arg>
<arg>hdfs://ip-redacted:8020/user/admin/s000</arg>
</java>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
When I run the action, I get the following error message:
Main class [org.apache.oozie.action.hadoop.JavaMain], exit code [-1]
The locations hdfs://ip-redacted:8020/user/admin/c and
hdfs://ip-redacted:8020/user/admin/s000 contain the crawldb and the segments, respectively.
The stderr of the job says:
Log Length: 122
Intercepting System.exit(-1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], exit code [-1]
The syslog says:
ERROR [main] org.apache.nutch.indexer.IndexingJob: Indexer: java.lang.RuntimeException: org.apache.nutch.indexer.IndexWriter not found.
at org.apache.nutch.indexer.IndexWriters.<init>(IndexWriters.java:51)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:100)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:55)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
I have verified that the class exists in the apache-nutch-1.7.jar file.
And if I ask Hadoop to run it as a map-reduce job from the command shell as follows:
hadoop jar apache-nutch-1.7.jar org.apache.nutch.indexer.IndexingJob -D solr.server.url=http://ip-redacted:8983/solr/raw hdfs://ip-redacted:8020/user/admin/c -dir hdfs://ip-redacted:8020/user/admin/s000
it works! But when I run it as an Oozie job created through Hue, it fails.
Also, other actions like inject, generate, fetch, and parse work fine in Hue. It's only the solrindex step that fails, and I don't know what to do to fix it. Any input on this would be great!
Did you put the Nutch jar (and dependencies if needed) in a 'lib' directory in the HDFS workspace of the workflow?
Ah, I'm beginning to loathe the packaging of Nutch!
Try extracting the classes/plugins folder from the job archive, copy it to HDFS (something like hdfs dfs -put -r plugins lib) and then add the HDFS path of the plugins folder to the "files" list of the indexing step.
Best,
Edoardo
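In an Oozie java action, extra files and directories can be attached with <file> elements (this is what the "files" list in the Hue editor corresponds to). A minimal sketch of the addition suggested above, placed after the <arg> elements of the existing solr-test action; the HDFS lib path is a placeholder, not from the original post:
<!-- inside <java> ... </java>, after the <arg> elements -->
<file>hdfs://ip-redacted:8020/user/admin/lib/apache-nutch-1.7.jar</file>
<file>hdfs://ip-redacted:8020/user/admin/lib/plugins#plugins</file>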