Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain] - apache-pig

I'm trying to run a Pig script by triggering it through Oozie. Here are the workflow.xml, job.properties, and error message. Please help me solve this issue. I am using the BigInsights VM to run this.
workflow.xml
<workflow-app name="PigApp" xmlns="uri:oozie:workflow:0.1">
<start to="PigAction"/>
<action name="PigAction">
<pig>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<prepare></prepare>
<configuration>
<property>
<name>oozie.action.external.stats.write</name>
<value>true</value>
</property>
<property>
<name>oozie.action.sharelib.for.pig</name>
<value>pig</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m -Xms1000m -Xmn100m</value>
</property>
</configuration>
</pig>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Error message[${wf:errorMessage()}]</message>
</kill>
<end name="end"/>
</workflow-app>
job.properties
#JobTracker and NameNode
jobtracker=bivm:9001
namenode=bivm:9000
#HDFS path where you need to copy workflow.xml and lib/*.jar to
oozie.wf.application.path=hdfs://bivm:9000/user/biadmin/oozieWF/
oozie.libpath=hdfs://bivm:9000/user/biadmin/oozieWF/lib
oozie.use.system.libpath=true
oozie.action.sharelib.for.pig=pig
wf_path=hdfs://bivm:9000/user/biadmin/oozieWF/
#one of the values from Hadoop mapred.queue.names
queueName=default
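(For reference, a workflow with these properties would typically be submitted with the standard Oozie CLI; the server URL below is an assumption, 11000 being the default Oozie port.)
# Submit the workflow using the properties above (Oozie URL is assumed, not from the original post)
oozie job -oozie http://bivm:11000/oozie -config job.properties -run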
Error Message:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], main() threw exception, jline.ConsoleReaderInputStream
java.lang.NoClassDefFoundError: jline.ConsoleReaderInputStream
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:219)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:619)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(AccessController.java:366)
at javax.security.auth.Subject.doAs(Subject.java:572)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:665)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:942)
at java.lang.ClassLoader.loadClass(ClassLoader.java:851)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:827)
... 18 more
If this is a problem related to the Pig jar, please point me to the right version (or a link to download it). I'm using the Pig 0.12.0 jar.
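The NoClassDefFoundError above means the jline classes Pig depends on are not on the launcher's classpath. A minimal sketch of a workaround, assuming the jline jar shipped with your local Pig install (the jar name and local path are placeholders, not from the original post), is to place it next to the other jars in the lib/ directory that oozie.libpath points to:
# Sketch only: jar name and local Pig path are placeholders
hadoop fs -put /path/to/pig-0.12.0/lib/jline-*.jar hdfs://bivm:9000/user/biadmin/oozieWF/lib/
# Confirm the jar now sits alongside the other action jars
hadoop fs -ls hdfs://bivm:9000/user/biadmin/oozieWF/lib/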

Related

Oozie shell action failing

I am trying to test an Oozie shell action in my Cloudera VM (Quickstart VM). When the script runs a simple HDFS command (hadoop fs -put ...) it works, but when I trigger a Hive script the Oozie job finishes with status "KILLED". On the Oozie console, the only error message I get is:
"Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]"
Meanwhile, the underlying job shows up as SUCCEEDED in the history server (NameNode logs). Below are the Oozie job details:
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="WorkFlow1">
<start to="shell-node" />
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${myscript}</exec>
<file>${myscriptpath}#${myscript}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}] </message>
</kill>
<end name="end" />
</workflow-app>
------------------------------------
job.properties
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=hdfs://quickstart.cloudera:8032
queueName=default
myscript=test.sh
myscriptpath=${nameNode}/oozie/sl/test.sh
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/oozie/sl/
workflowAppUri=${nameNode}/oozie/sl/
-----------------------------------------------
test.sh
hive -e "create table test2 as select * from test"
I would really appreciate it if anyone could point me in the right direction as to where I am getting this wrong.
It would be good if you had a look at the Oozie Hive action.
It's pretty easy to configure, and the Hive action takes care of setting everything up.
https://oozie.apache.org/docs/4.3.0/DG_HiveActionExtension.html
To connect to Hive, you need to explicitly add the hive-site.xml (or the Hive server details) so the action can reach the metastore.
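As a rough illustration of that suggestion (a sketch, not taken from the original answer; the script name and the hive-site.xml location are assumptions), a Hive action replacing the shell wrapper could look like this, reusing the end/fail transitions from the workflow above:
<!-- Sketch only: script name and hive-site.xml location are placeholders -->
<action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>hive-site.xml</job-xml>
        <script>create_test2.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
Here create_test2.hql would contain the same "create table test2 as select * from test" statement that test.sh currently passes to hive -e.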

Hive action failing with SLF4J error : SLF4J: Class path contains multiple SLF4J bindings

I am trying to create a simple workflow with a hive action. I'm using Cloudera Quickstart VM (CDH 5.12). The following are the components of my workflow:
1) top_n_products.hql
create table instacart.top_n as
(
select * from
(
select row_number() over (order by no_of_times_ordered desc)as num_rank, product_id, product_name, no_of_times_ordered
from
(
select A.product_id, B.product_name, count(*) as no_of_times_ordered from
instacart.order_products__train as A
left outer join
instacart.products as B
on A.product_id=B.product_id
group by A.product_id, B.product_name
)C
)D
where num_rank <= ${N}
);
2) hive-config.xml
I have basically copied the default hive-site.xml from /etc/hive/conf into my workflow workspace folder and renamed it to hive-config.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>cloudera</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/lib/hive/lib/hive-hwi-0.8.1-cdh4.0.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
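For reference, the copy step described above might look like this on the Quickstart VM (a sketch; the HDFS workspace path is a placeholder, not from the original post):
# Copy the cluster hive-site.xml into the workflow workspace and upload it to HDFS
cp /etc/hive/conf/hive-site.xml hive-config.xml
hadoop fs -put hive-config.xml /user/cloudera/workflow_workspace/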
3) Workflow properties
In the hive action, I set the following:
- Set HIVE XML and Job XML paths to my hive-config.xml
- Also added hive-config.xml to Files
- In the workflow properties, set the path to my workspace
- Defined the parameter N in my query
[Screenshot of my Hive Action properties]
When I try to run the workflow it fails, and stderr shows the following error:
Log Type: stderr
Log Upload Time: Mon Nov 20 19:49:04 -0800 2017
Log Length: 2759
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/130/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Nov 20, 2017 7:47:34 PM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
WARNING: You are attempting to use a deprecated API (specifically, attempting to #Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
Nov 20, 2017 7:47:35 PM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
.
.
.
.
INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Below are the workflow.xml and job.properties that are generated:
1) Workflow XML:
<workflow-app name="Top_N_Products" xmlns="uri:oozie:workflow:0.5">
<global>
<job-xml>hive-config.xml</job-xml>
</global>
<start to="hive-87ac"/>
<kill name="Kill">
<message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="hive-87ac" cred="hcat">
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-config.xml</job-xml>
<script>top_n_products.hql</script>
<param>N={N}</param>
<file>hive-config.xml#hive-config.xml</file>
</hive>
<ok to="End"/>
<error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>
2) job.properties
security_enabled=False
send_email=False
dryrun=False
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
N=10
Please note that the hive query runs perfectly fine through the Hive query editor. Am I missing something while configuring the workflow? Any help is appreciated!
Thanks,
Deb

Not getting output for oozie job

I am trying to run an Oozie job for a Word Count MapReduce job but I am getting a blank output file. The text file resides in the '/word' directory of HDFS and the jar file in '/map-reduce/lib'. I am running the command below to execute the Oozie job:
oozie job -oozie http://localhost:11000/oozie -config map-reduce/job.properties -run
My workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="{nameNode}/word_dir"></delete>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>MyMap</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>MyReduce</value>
</property>
<property>
<name>mapred.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/word</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/word_dir</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
and job.properties:
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
oozie.wf.application.path=${nameNode}/map-reduce
Please help.

Oozie workflow: Hive action failed because of Tez

Running my script directly on a data node that has the Hive client tools installed works. But when I schedule the Hive script using Oozie, I get the error shown below.
I've set tez.lib.uris in tez-site.xml to hdfs:///apps/tez/,hdfs:///apps/tez/lib/
What am I missing here?
Hive script:
USE av_raw;
LOAD DATA INPATH '${INPUT}' INTO TABLE alarms_stg;
INSERT INTO TABLE alarms PARTITION (year, month)
SELECT * FROM alarms_stg WHERE job_id = '${JOBID}';
Workflow action:
<!-- load processed data and store in hive -->
<action name="load-data">
<hive xmlns="uri:oozie:hive-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<script>load_data.hive</script>
<param>INPUT=${complete}</param>
<param>JOBID=${wf:actionData('stage-data')['hadoopJobs']}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
Error:
Log Type: stderr
Log Length: 3227
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/grid/5/hadoop/yarn/local/filecache/2418/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
Logging initialized using configuration in file:/grid/2/hadoop/yarn/local/usercache/hdfs/appcache/application_1417175595182_12259/container_1417175595182_12259_01_000002/hive-log4j.properties
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
java.lang.RuntimeException: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:358)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:316)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:277)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez jars, tez.lib.uris is not defined in the configurartion
at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:137)
at org.apache.tez.client.TezSession.start(TezSession.java:105)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:185)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:123)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:356)
... 19 more
Please try adding tez.lib.uris=hdfs:///apps/tez/,hdfs:///apps/tez/lib/ to the workflow.xml of your Oozie job,
e.g. workflow.xml:
<!-- load processed data and store in hive -->
<action name="load-data">
<hive xmlns="uri:oozie:hive-action:0.3">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
</configuration>
<script>load_data.hive</script>
<param>INPUT=${complete}</param>
<param>JOBID=${wf:actionData('stage-data')['hadoopJobs']}</param>
</hive>
<ok to="end"/>
<error to="fail"/>
</action>
Alternatively, you can try adding the value of "tez.lib.uris" directly in "Workflow Settings" under "Hadoop Properties":
tez.lib.uris = hdfs:///apps/tez/,hdfs:///apps/tez/lib/
Before adding it, verify the correct value in tez-site.xml.
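As a quick sanity check (a sketch; the tez-site.xml location varies by distribution), you can confirm what tez-site.xml contains and that the HDFS directories actually hold the Tez jars:
# Sketch only: the tez-site.xml path below is typical but distribution-dependent
grep -A 1 "tez.lib.uris" /etc/tez/conf/tez-site.xml
# Make sure the referenced HDFS locations actually contain the Tez jars
hadoop fs -ls hdfs:///apps/tez/ hdfs:///apps/tez/lib/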

Unable to do distcp from s3 to hdfs using shell-action in oozie

I am trying to copy data from S3 to HDFS using distcp.
The following is my shell script where I am doing the distcp.
mkdir.sh
hadoop distcp s3n://bucket-name/foldername hdfs://localhost:8020/user/hdfs/data/
The above shell script works fine when I run it manually.
But when I try to run the same script using an Oozie workflow, the distcp fails.
I am trying to run the workflow using a shell action.
The following is my job.properties file:
nameNode=hdfs://ip-172-31-34-170.us-west-2.compute.internal:8020
jobTracker=ip-172-31-34-195.us-west-2.compute.internal:8032
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
user.name=hdfs
oozie.wf.application.path=${nameNode}/user/${user.name}/oozie/
mkdirshellscript=${oozie.wf.application.path}/mkdir.sh
And my workflow.xml is as follows:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.1">
<start to="shellAction"/>
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="/user/hdfs/hari123"/>
<mkdir path="/user/hdfs/hari123"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${mkdirshellscript}</exec>
<file>${mkdirshellscript}</file>
</shell>
<ok to="end"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
The Oozie log is as follows:
2014-09-30 10:31:51,102 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[ec2-54-69-26-119.us-west-2.compute.amazonaws.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000018-140930055823135-oozie-oozi-W] ACTION[0000018-140930055823135-oozie-oozi-W#shellAction] callback for action [0000018-140930055823135-oozie-oozi-W#shellAction]
2014-09-30 10:31:51,337 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[ec2-54-69-26-119.us-west-2.compute.amazonaws.com] USER[hdfs] GROUP[-] TOKEN[] APP[WorkFlowForShellActionWithCaptureOutput] JOB[0000018-140930055823135-oozie-oozi-W] ACTION[0000018-140930055823135-oozie-oozi-W#shellAction] ERROR is considered as FAILED for SLA
I want to do the distcp using a shell action, not a distcp action, in Oozie.
Try with:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.1">
...
<start to="shellAction"/>
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="/user/hdfs/hari123"/>
<mkdir path="/user/hdfs/hari123"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>./${mkdirshellscript}</exec>
<file>${mkdirshellscript}#${mkdirshellscript}</file>
</shell>
...
</workflow-app>
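A variant of the same idea (a sketch, not from the original answer; the property names below are illustrative) follows the pattern used in the shell-action example earlier on this page: keep the bare script name in <exec> and point <file> at the full HDFS path, so the launcher creates a local symlink with that bare name.
# job.properties (illustrative property names)
mkdirshellscript=mkdir.sh
mkdirshellscriptpath=${oozie.wf.application.path}/mkdir.sh
And in workflow.xml:
<exec>${mkdirshellscript}</exec>
<file>${mkdirshellscriptpath}#${mkdirshellscript}</file>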