My task is to create an Oozie workflow that loads data into Hive tables every hour.
I am using CDH 5.7 in VirtualBox.
When I run the Hive script containing LOAD DATA INPATH '/sqoop_import_increment' INTO TABLE customer; by itself, it works perfectly: the data gets loaded into the Hive table.
But when I run the same script from an Oozie workflow, the job gets killed at 66% and the error message is Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001]
Note: the Hive script that creates the table works perfectly with the Oozie workflow.
Please help.
hive script:
use test;
create external table if not exists customer(customer_id int, name string, address string) row format delimited fields terminated by ',';
load data inpath '/sqoop_import_increment' into table customer;
workflow.xml:
<workflow-app name="hive_script" xmlns="uri:oozie:workflow:0.5">
<start to="hive-4327"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="hive-4327" cred="hcat">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>lib/hive-config.xml</job-xml>
<script>lib/impala-script.hql</script>
</hive>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
job.properties:
oozie.use.system.libpath=true
security_enabled=false
dryrun=false
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
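For reference, Oozie also needs to know where the workflow lives before job.properties can be submitted; a minimal sketch, with a hypothetical HDFS path to the directory containing workflow.xml:
oozie.wf.application.path=hdfs://quickstart.cloudera:8020/user/cloudera/hive_script
The job can then be submitted with the standard Oozie CLI:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run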
hive-config.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
The last time I ran into this problem, it turned out that the Hive client was not installed on all data nodes.
When you run the Hive query manually, you presumably do it from a node that has the Hive client installed. But when Oozie is asked to run the query, it may do so from any data node. As such, you will need to set up the Hive client on all data nodes.
This assumes that Oozie is otherwise able to run Hive queries in general (and that there is no issue specific to this particular command).
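A quick way to verify which nodes have the client, as a sketch (the hostnames below are placeholders for your actual data nodes):
for host in datanode1 datanode2 datanode3; do
    ssh "$host" 'which hive && hive --version'
done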
When I run:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
It shows
Starting HiveServer2
and nothing listens on ports 10000 or 10001.
The HiveServer2 service does not output error information, which makes the problem hard to diagnose. You can try to start the metastore service provided by Hive instead; it listens on port 9083 and might give some information when your configuration is not properly set:
hive --service metastore # not detach from terminal to see logs
In my case, this service could not be started, with the error message:
MetaException(message:Hive Schema version 3.1.0 does not match metastore's schema
version 1.2.0 Metastore is not upgraded or corrupt)
One direct solution to this error is to ignore the version difference by setting the following in hive-site.xml, provided there is only one Hive version on your machine (another solution is to upgrade the metastore_db schema version):
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
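For the schema-upgrade route instead, Hive ships the schematool utility; a sketch, assuming a MySQL-backed metastore:
schematool -dbType mysql -info            # show the Hive and metastore schema versions
schematool -dbType mysql -upgradeSchema   # upgrade the metastore schema to match the installed Hive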
After this problem is resolved, the HiveServer2 service can run and listen on port 10000.
hive --service hiveserver2 > /dev/null 2>&1 &
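To confirm it is actually listening, a quick check (assuming the default port and no authentication):
beeline -u jdbc:hive2://localhost:10000 -e 'show databases;'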
If your HiveServer2 accesses the metastore directly via the Derby or MySQL JDBC driver, then the aforementioned metastore service is not needed for HiveServer2. However, if HiveServer2 accesses the metastore via the thrift protocol, as configured in conf/hive-site.xml like
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop-master:9083</value>
<description>
Thrift URI for the remote metastore.
Used by metastore client to connect to remote metastore.
</description>
</property>
Then the metastore service must be started first.
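In that case the startup order matters; a minimal sketch:
hive --service metastore > metastore.log 2>&1 &      # must be up first
hive --service hiveserver2 > hiveserver2.log 2>&1 &  # connects to the thrift URI above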
I had a hard time setting up hive-3.1.2; I'm writing this in case it helps someone out. In order to diagnose the problem, first try to launch the metastore and hiveserver2 like this:
metastore:
hive --service metastore --hiveconf hive.root.logger=INFO,console
hiveserver2:
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.root.logger=INFO,console
Then carefully read the exceptions that are thrown.
My problem was: user hive is not allowed to perform this api call,
and to solve that I added the following property to hive-site.xml:
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
<description>
Should metastore do authorization against database notification related APIs such as get_next_notification.
If set to true, then only the superusers in proxy settings have the permission
</description>
</property>
also I add my full hive-site.xml as a sample:
<configuration>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://server-2:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>mysql_username</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>mysql_password</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://server-2:9083</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>server-2</value>
</property>
<property>
<name>hive.server2.transport.mode</name>
<value>binary</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
</configuration>
Thanks. There is a typo: the property name should be hive.metastore.event.db.notification.api.auth (set to false), not metastore.metastore.event.db.notification.api.auth.
1. I have created a jar with a custom UDF function and copied the jar into hbase.dynamic.jars.dir, so when I use my UDF function as part of a SELECT I get results without issues.
2. But when the function is part of a WHERE clause, I get an error that the class of my custom function is not found.
select PK FROM "my.custom.view" where MY_FUN(ARRAY["COLF"."COL1"], 'SOMEPARAM') limit 1;
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: BooleanExpressionFilter failed during reading: java.lang.ClassNotFoundException: com.myCompany.phoenix.MyCustomFunction
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at org.apache.phoenix.filter.BooleanExpressionFilter.readFields(BooleanExpressionFilter.java:109)
at org.apache.phoenix.filter.SingleKeyValueComparisonFilter.readFields(SingleKeyValueComparisonFilter.java:133)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
at org.apache.phoenix.filter.SingleCQKeyValueComparisonFilter.parseFrom(SingleCQKeyValueComparisonFilter.java:50)
... 16 more
hbase-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:57000/user/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>21081</value>
</property>
<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>0</value>
</property>
<!-- SEP is basically replication, so enable it -->
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
<property>
<name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
<value>128</value>
</property>
<property>
<name>hbase.fs.tmp.dir</name>
<value>/tmp/hbase</value>
</property>
<property>
<name>phoenix.functions.allowUserDefinedFunctions</name>
<value>true</value>
</property>
<property>
<name>hbase.dynamic.jars.dir</name>
<value>${hbase.rootdir}/lib/</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
</configuration>
I am manually adding the jar with:
hdfs dfs -copyFromLocal -f /my.jar hdfs:///user/hbase/lib/my.jar
For function creation I am using:
CREATE FUNCTION MY_FUN(BINARY[], VARCHAR) RETURNS BOOLEAN as 'com.myCompany.phoenix.MyCustomFunction' using jar 'hdfs://localhost:57000/user/hbase/lib/my.jar';
I ran into something similar when I upgraded to Phoenix 5.0 from 4.7. I got an exception stating that I now needed to place my UDF .jar into /apps/hbase/data/lib due to permission issues. In the old environment I was able to get away with using the /apps/hbase/lib directory. Maybe this is happening to you as well, but it's not alerting you to the new path change.
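If that is the cause, relocating the jar is straightforward; a sketch (the paths follow the two environments mentioned above and may differ in yours):
hdfs dfs -mkdir -p /apps/hbase/data/lib
hdfs dfs -copyFromLocal -f /my.jar /apps/hbase/data/lib/my.jar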
I'm totally new to Oozie, and I'm creating a workflow to run a Hive query that simply displays a table's data using a SELECT statement, but once I submit the job it gives the error below.
JA017: Unknown hadoop job [job_local1866275230_0001] associated with action [0000000-150519212325700-oozie-oozi-W#adstest]. Failing this action!
Below is my hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost.localdomain:9083</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>localhost</value>
</property>
<!-- workaround for https://issues.cloudera.org/browse/IMPALA-1416 -->
<property>
<name>hive.metastore.try.direct.sql</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.try.direct.sql.ddl</name>
<value>false</value>
</property>
</configuration>
Below is the workflow.xml
<workflow-app name="adstest" xmlns="uri:oozie:workflow:0.4">
<start to="adstest"/>
<action name="adstest">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-conf.xml</job-xml>
<script>adstest.hql</script>
<file>hive-conf.xml#hive-conf.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I didn't select any parameters, as it's just a simple SELECT query for displaying the first 20 results from the table.
Let me know if I have to make any changes in any conf file.
When an Oozie workflow is executed, Oozie checks the status of the job. While the job is running, Oozie reports its status as RUNNING. After the job completes, Oozie queries the history server for the job data; if the job id is not found at the history server, Oozie fails to get the status and marks the workflow as failed.
However, the workflow may have finished successfully and its output will be available; the Resource Manager will also report the status of the application as FINISHED / SUCCEEDED.
Ensure that the below 2 parameters are the same across all the nodes (a quick check follows the list):
mapreduce.jobhistory.intermediate-done-dir
mapreduce.jobhistory.done-dir
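A minimal sketch to compare them, run on each node (both values must match everywhere):
hdfs getconf -confKey mapreduce.jobhistory.intermediate-done-dir
hdfs getconf -confKey mapreduce.jobhistory.done-dir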
Restart the YARN services and the History Server. Please refer to this link for more details: https://support.pivotal.io/hc/en-us/articles/202530283-Oozie-logs-report-Unknown-hadoop-job-and-history-server-UI-not-populated
I know that this is one of those topics that gets asked a lot. Still, after I dug into all of the topics I could find (most of them talking about CLASSPATH), I can't solve mine.
Examples of the topics I found and tried:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
java.lang.NoClassDefFoundError with HBase Scan
I'm using Hadoop 2.5.1 with HBase 0.98.11 on Ubuntu 14.04
I set up pseudo-distributed mode and ran Hadoop with HBase successfully. Then I wanted to set up fully-distributed mode, but jobs fail with the NoClassDefFound error. I tried adding export HADOOP_CLASSPATH=$(/usr/local/hbase-0.98.11-hadoop2/bin/hbase classpath) to hadoop-env (also yarn-env); it still doesn't work.
One thing I noticed is that if I comment out the
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
I can run the jobs SUCCESSFULLY. BUT then the job seems to run on a single node, not on multiple nodes.
Here are some of the configs:
mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>hadoop1:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run
</description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
yarn-env and hadoop-env are at their defaults, except for the HADOOP_CLASSPATH entry (which doesn't change anything whether I add it or not).
Here is the error trace:
2015-04-25 23:29:25,143 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
at apriori2$FrequentItemsReduce.reduce(apriori2.java:550)
at apriori2$FrequentItemsReduce.reduce(apriori2.java:532)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1651)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Thanks a lot for any help.
With YARN, you need to set the yarn.application.classpath property to the classpath for your MapReduce job; export HADOOP_CLASSPATH does not work with YARN.
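For example, in yarn-site.xml, a sketch that appends the HBase lib directory from the question to the usual defaults (the default portion of the value varies per Hadoop version/distribution, so check yarn-default.xml for yours):
<property>
<name>yarn.application.classpath</name>
<value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,/usr/local/hbase-0.98.11-hadoop2/lib/*</value>
</property>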
I am trying to set up hive-0.9.0 in a local-mode configuration. In /conf, I have created hive-site.xml and specified the property for the warehouse folder.
But I think Hive is not using my defined location, as it is not creating the 'warehouse' folder in that location.
Also, is it necessary to have the Hadoop cluster running for a local-mode Hive configuration? Hive throws the following error when I issue any DDL commands without starting the Hadoop cluster:
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused)
The contents of hive-site.xml are as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/home/hadoopuser/hive/warehouse</value>
<description>
Local or HDFS directory where Hive keeps table contents.
</description>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
<description>
Use false if a production metastore server is used.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/home/hadoopuser/hive/metastore_db;create=true</value>
<description>
The JDBC connection URL.
</description>
</property>
</configuration>
Hive queries internally run MapReduce jobs, so Hadoop has to be up while you are trying to query Hive.
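So start Hadoop before issuing DDL; a minimal sketch for the Hadoop 1.x generation that hive-0.9.0 typically runs against (script names assume a standard install on the PATH):
start-dfs.sh      # bring up HDFS; the connection-refused error above points at the NameNode being down
start-mapred.sh   # bring up the JobTracker/TaskTrackers for queries that launch MapReduce
hive -e 'show tables;'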