getting error while submitting HIVE query through oozie - hive

I'm totally new to Oozie. I'm creating a workflow to run a Hive query that simply displays a table's data using a select statement, but once I submit the job it gives the error below.
JA017: Unknown hadoop job [job_local1866275230_0001] associated with action [0000000-150519212325700-oozie-oozi-W#adstest]. Failing this action!
Below is my hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost.localdomain:9083</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>localhost</value>
</property>
<!-- workaround for https://issues.cloudera.org/browse/IMPALA-1416 -->
<property>
<name>hive.metastore.try.direct.sql</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.try.direct.sql.ddl</name>
<value>false</value>
</property>
</configuration>
Below is the workflow.xml
<workflow-app name="adstest" xmlns="uri:oozie:workflow:0.4">
<start to="adstest"/>
<action name="adstest">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-conf.xml</job-xml>
<script>adstest.hql</script>
<file>hive-conf.xml#hive-conf.xml</file>
</hive>
<ok to="end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
I didn't select any parameters as it's just a simple select query displaying the first 20 results from the table.
Let me know if I have to make any changes in any conf file.

When an Oozie workflow is executed, Oozie checks the status of the job. While the job is running, Oozie reports the status as running; after the job completes, it queries the history server for the job's data. If the job ID is not found at the history server, Oozie fails to get the status and marks the workflow as failed.
However, the workflow may have finished successfully and the output will be available. The Resource Manager will also report the status of the executed application as FINISHED / SUCCEEDED.
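To confirm this from the command line, something like the following can be used (the application and job IDs below are placeholders):
# status of the application as seen by the Resource Manager
yarn application -status application_1432012345678_0001
# whether the MapReduce History Server knows about the corresponding job
mapred job -status job_1432012345678_0001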
Ensure that the below 2 parameters are the same across all the nodes:
mapreduce.jobhistory.intermediate-done-dir
mapreduce.jobhistory.done-dir
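For example, a mapred-site.xml fragment along these lines; the directory values are only illustrative, the point is that they must be identical on every node:
<!-- illustrative paths; keep them identical on every node running MapReduce or the History Server -->
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>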
Restart the YARN services and the History Server. Please refer to this link for more details: https://support.pivotal.io/hc/en-us/articles/202530283-Oozie-logs-report-Unknown-hadoop-job-and-history-server-UI-not-populated

Related

Getting 'org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family table does not exist in region hbase:meta'

I'm trying to integrate Hive and HBase, but when I create an (external) table in Hive with the HBase handler:
create external table entity_hbase(id bigint, value string, ts bigint, entity_type tinyint)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ('hbase.columns.mapping'=':key,f1:value,f1:timestamp,f1:entity_type')
tblproperties('hbase.table.name'='entity');
I get this error:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException:
Column family table does not exist in region hbase:meta
First of all, I don't understand why the error says that column family table (not f1) does not exist. Even if I create the table in HBase first and then try to create the external table in Hive, I get the same error.
Before all of this, my steps were:
1. start dfs
2. start yarn
3. start metastore db for hive
4. start metastore service
5. start hbase
6. using the hive shell, try to create a table with the hbase handler
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://localhost:5432/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>postgres</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value></value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/Users/home/hive/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://0.0.0.0:9083</value>
</property>
</configuration>
hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:8020/hbase</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/Users/home/hbase/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/3.1.1/hdfs/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
Hadoop version: 3.1.1
Hive version: 3.1.1
Hbase version: 1.2.9

Apache Phoenix UDF not working on server side

1. I have created a jar with a custom UDF function and copied the jar into the dynamic jars dir (hbase.dynamic.jars.dir), so when I use my UDF function as part of a SELECT I get the result without issues.
2. But when the function is part of a WHERE clause, I get an error that the class of my custom function is not found.
select PK FROM "my.custom.view" where MY_FUN(ARRAY["COLF"."COL1"], 'SOMEPARAM') limit 1;
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: BooleanExpressionFilter failed during reading: java.lang.ClassNotFoundException: com.myCompany.phoenix.MyCustomFunction
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at org.apache.phoenix.filter.BooleanExpressionFilter.readFields(BooleanExpressionFilter.java:109)
at org.apache.phoenix.filter.SingleKeyValueComparisonFilter.readFields(SingleKeyValueComparisonFilter.java:133)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
at org.apache.phoenix.filter.SingleCQKeyValueComparisonFilter.parseFrom(SingleCQKeyValueComparisonFilter.java:50)
... 16 more
hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:57000/user/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>21081</value>
</property>
<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>0</value>
</property>
<!-- SEP is basically replication, so enable it -->
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
<property>
<name>hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily</name>
<value>128</value>
</property>
<property>
<name>hbase.fs.tmp.dir</name>
<value>/tmp/hbase</value>
</property>
<property>
<name>phoenix.functions.allowUserDefinedFunctions</name>
<value>true</value>
</property>
<property>
<name>hbase.dynamic.jars.dir</name>
<value>${hbase.rootdir}/lib/</value>
</property>
<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
</configuration>
I'm manually adding the jar with:
hdfs dfs -copyFromLocal -f /my.jar hdfs:///user/hbase/lib/my.jar
For function creation I'm using:
CREATE FUNCTION MY_FUN(BINARY[], VARCHAR) RETURNS BOOLEAN as 'com.myCompany.phoenix.MyCustomFunction' using jar 'hdfs://localhost:57000/user/hbase/lib/my.jar';
I ran into something similar when I upgraded to Phoenix 5.0 from 4.7. I got an exception stating that I now needed to place my UDF .jar into /apps/hbase/data/lib due to permission issues. In the old environment I was able to get away with using the /apps/hbase/lib directory. Maybe this is happening to you as well, but it's not alerting you to the new path change.
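If that's what is happening, a sketch of the fix, assuming the new lib directory from my environment and your existing class name and NameNode port, would be:
# copy the UDF jar into the new lib location
hdfs dfs -mkdir -p /apps/hbase/data/lib
hdfs dfs -copyFromLocal -f /my.jar hdfs:///apps/hbase/data/lib/my.jar
Then re-register the function against the new jar location from sqlline:
DROP FUNCTION MY_FUN;
CREATE FUNCTION MY_FUN(BINARY[], VARCHAR) RETURNS BOOLEAN as 'com.myCompany.phoenix.MyCustomFunction' using jar 'hdfs://localhost:57000/apps/hbase/data/lib/my.jar';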

Hadoop 2.x cluster node manager not getting started in slave nodes

I am trying to set up a multi-node Hadoop 2.x cluster in virtual machines. After the setup and configuration, when I try to start the cluster, the node manager does not get started on the slave nodes; all other daemons start in the cluster. Can anyone help me resolve this issue?
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hduser/hadoop-2.6.0/data/nnode</value>
</property>
<property>
<name>dfs.datanode.name.dir</name>
<value>/home/hduser/hadop-2.6.0/data/dnode</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/home/hduser/exclude</value>
</property>
<property>
<name>dfs.hosts.include</name>
<value>/home/hduser/include</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.include-path</name>
<value>/home/hduser/include</value>
</property>
<property>
<name>yarn.resourcemanager.nodes.exclude-path</name>
<value>/home/hduser/exclude</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
I see that in your configuration:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
The Node Manager is a component of YARN; on startup, this component registers with the RM. If "master" does not resolve to a valid IP address, the Node Manager can't contact the Resource Manager.
Status is not pulled by the Resource Manager; instead, it is pushed by the node managers to the RM.
I think you should investigate this.
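A quick way to check this from one of the slave nodes; the commands and the default resource-tracker port 8031 are assumptions, adjust them to your setup:
# does "master" resolve to the correct IP on this slave?
getent hosts master
# can this node reach the Resource Manager's resource-tracker port (8031 by default)?
nc -zv master 8031
# the NodeManager log will show the registration failure if it cannot
tail -n 100 $HADOOP_HOME/logs/yarn-*-nodemanager-*.log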

test to see if HiveServer2 metastore is working correctly

I recently upgraded our cluster's HiveServer to HiveServer2. I also set up the Hive Metastore (in remote mode) and moved away from embedded mode (which we were previously running).
I want to test that things are properly configured and that the metadata is actually being stored in the remote metastore. What would be the easiest way to do this? Are there certain logs I could check to verify this behavior?
I am afraid things are not configured correctly and I am still running my metastore in local mode, because when I query the PostgreSQL database on the machine hosting the metastore, there are no rows in the metastore DB (despite the fact that I have created test tables through beeline).
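For reference, the kind of check I'm running against the metastore database looks roughly like this (the quoted table names assume the standard Hive metastore schema on PostgreSQL):
-- list the tables and databases the metastore knows about
SELECT "TBL_NAME", "TBL_TYPE" FROM "TBLS";
SELECT "NAME", "DB_LOCATION_URI" FROM "DBS";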
It might be worth mentioning that the end goal of this is to be able to query data stored in HDFS via SparkSQL. Do I need HiveServer2 to accomplish this? Apologies, I am new to a lot of this technology.
Here is my hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://w7/metastore</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://w7:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.warehouse.subdir.inherit.perms</name>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>mn</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>mn</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hive.zookeeper.namespace</name>
<value>hive_zookeeper_namespace_hive</value>
</property>
<property>
<name>hive.cluster.delegation.token.store.class</name>
<value>org.apache.hadoop.hive.thrift.MemoryTokenStore</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>false</value>
</property>
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
</configuration>

Hive script containing load data inpath not working in oozie

My task is to create an Oozie workflow to load data into Hive tables every hour.
I am using CDH 5.7 in VirtualBox.
When I run the Hive script, which contains LOAD DATA INPATH '/sqoop_import_increment' INTO TABLE customer; it works perfectly and the data gets loaded into the Hive table.
But when I run the same script from the Oozie workflow, the job gets killed at 66% and the error message is Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10001].
Note: a Hive script that only creates the table works perfectly with the Oozie workflow.
Please help.
hive script:
use test;
create external table if not exists customer(customer_id int,name string,address string)row format delimited fields terminated by ',';
load data inpath '/sqoop_import_increment' into table customer;
workflow.xml:
<workflow-app name="hive_script" xmlns="uri:oozie:workflow:0.5">
<start to="hive-4327"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="hive-4327" cred="hcat">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>lib/hive-config.xml</job-xml>
<script>lib/impala-script.hql</script>
</hive>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
job.properties:
oozie.use.system.libpath=True
security_enabled=False
dryrun=False
jobTracker=localhost:8032
nameNode=hdfs://quickstart.cloudera:8020
hive-config.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
The last time I ran into this problem, it turned out that the hive client was not installed on all data nodes.
When you run the hive query manually, you presumably do it from a node that has the hive client installed. But when Oozie is asked to run the query, it will do so from a random data node. As such, you will need to set up the hive client on all data nodes.
This assumes that you are not able to get Oozie to run Hive queries in general (and don't have a specific issue with this particular command).
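A quick sanity check, assuming shell access to the data nodes (the package name is typical for a CDH 5.x install and may differ in your environment):
# on each data node, verify the hive client is present
which hive || echo "hive client missing on $(hostname)"
# if it is missing, install it, e.g. on a yum-based CDH node:
# sudo yum install -y hive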