I am trying to run an Oozie job for a Word Count MapReduce job but am getting a blank output file. The input text file resides in the '/word' directory of HDFS and the jar file in '/map-reduce/lib'. I am running the command below to execute the Oozie job:
oozie job -oozie http://localhost:11000/oozie -config map-reduce/job.properties -run
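For reference, the status and launcher logs of the submitted job can be checked with the Oozie CLI (a sketch; the job id is a placeholder for the id printed by the submit command):
oozie job -oozie http://localhost:11000/oozie -info <job-id>
oozie job -oozie http://localhost:11000/oozie -log <job-id>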
**My workflow.xml:**
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="{nameNode}/word_dir"></delete>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
<property>
<name>mapred.mapper.class</name>
<value>MyMap</value>
</property>
<property>
<name>mapred.reducer.class</name>
<value>MyReduce</value>
</property>
<property>
<name>mapred.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapred.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapred.input.dir</name>
<value>/word</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/word_dir</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
and job.properties:
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=localhost:8032
oozie.wf.application.path=${nameNode}/map-reduce
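For reference, the HDFS layout described above was staged roughly like this (a sketch; the file names input.txt and wordcount.jar are placeholders):
hdfs dfs -mkdir -p /word /map-reduce/lib
hdfs dfs -put input.txt /word/
hdfs dfs -put workflow.xml /map-reduce/
hdfs dfs -put wordcount.jar /map-reduce/lib/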
Please help.
Related
I am trying to run an insert query and am facing the following error using MapReduce:
Application application_1609169302439_0001 failed 2 times due to AM Container for appattempt_1609169302439_0001_000002 exited with exitCode: 1. Failing this attempt.
Diagnostics: [2020-12-28 16:29:05.332] Exception from container-launch.
Container id: container_1609169302439_0001_02_000001
Exit code: 1
[2020-12-28 16:29:05.335] Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your <HADOOP_HOME>/etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
Meanwhile, my mapred-site.xml config file already looks like this:
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>localhost:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME</value>
</property>
</configuration>
My understanding is that it is a configuration issue, but I cannot find any clear and simple answer as to what is missing on my system.
I had installed Tez before, but it wasn't working either.
Any help or guidance would be appreciated. I browsed the site and could find similar issues reported, but I wasn't able to fix mine based on the solutions provided.
Best
After some research, I updated my configuration and I am now able to run MapReduce jobs from the Hive CLI.
Here's the updated part of mapred-site.xml:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/common/*,$HADOOP_MAPRED_HOME/share/hadoop/common/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/yarn/*,$HADOOP_MAPRED_HOME/share/hadoop/yarn/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/hdfs/*,$HADOOP_MAPRED_HOME/share/hadoop/hdfs/lib/*</value>
</property>
yarn-site.xml also contains the path to the jars:
<property>
<name>yarn.application.classpath</name>
<value>
%HADOOP_HOME%\etc\hadoop,
%HADOOP_HOME%\share\hadoop\common\*,
%HADOOP_HOME%\share\hadoop\common\lib\*,
%HADOOP_HOME%\share\hadoop\hdfs\*,
%HADOOP_HOME%\share\hadoop\hdfs\lib\*,
%HADOOP_HOME%\share\hadoop\mapreduce\*,
%HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
%HADOOP_HOME%\share\hadoop\yarn\*,
%HADOOP_HOME%\share\hadoop\yarn\lib\*
</value>
</property>
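Before restarting, a couple of quick sanity checks can confirm that these paths actually resolve (a sketch, assuming $HADOOP_HOME is exported in the shell that starts the daemons):
echo $HADOOP_HOME
hadoop classpath | tr ':' '\n' | grep mapreduce
ls $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-app-*.jar   # the jar that contains MRAppMaster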
After these updates and restarting the cluster, I am able to run jobs and follow their progress on port 8088.
Best
I am trying to test an Oozie shell action in my Cloudera VM (quickstart VM). When running a simple HDFS command script (hadoop fs -put ...) it works, but when I trigger a Hive script the Oozie job finishes with status "KILLED". On the Oozie console, the only error message I am getting is
"Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]"
Meanwhile, the underlying job in the history server (name node logs) shows as SUCCEEDED. Below are the Oozie job details:
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="WorkFlow1">
<start to="shell-node" />
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${myscript}</exec>
<file>${myscriptpath}#${myscript}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}] </message>
</kill>
<end name="end" />
</workflow-app>
------------------------------------
job.properties
nameNode=hdfs://quickstart.cloudera:8020
jobTracker=hdfs://quickstart.cloudera:8032
queueName=default
myscript=test.sh
myscriptpath=${nameNode}/oozie/sl/test.sh
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/oozie/sl/
workflowAppUri=${nameNode}/oozie/sl/
-----------------------------------------------
test.sh
hive -e "create table test2 as select * from test"
I would really appreciate it if anyone could point me in the direction of where I am getting it wrong.
It would be good if you had a look at the Oozie Hive action.
It's pretty easy to configure; the Hive action will take care of setting everything up.
https://oozie.apache.org/docs/4.3.0/DG_HiveActionExtension.html
To connect to Hive, you need to explicitly add the hive-site.xml or the Hive server details for it to connect.
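For example, one way to make the Hive configuration available is to stage hive-site.xml next to the workflow on HDFS (a sketch; the local path to hive-site.xml is an assumption, the HDFS path comes from your job.properties above):
hdfs dfs -put -f /etc/hive/conf/hive-site.xml hdfs://quickstart.cloudera:8020/oozie/sl/hive-site.xml
It can then be referenced from the Hive action via its job-xml element.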
I'm trying to run a Pig script by triggering it through Oozie. Here are the workflow.xml, job.properties, and the error message. Please help me to solve the issue. I am using the BigInsight VM to run this.
workflow.xml
<workflow-app name="PigApp" xmlns="uri:oozie:workflow:0.1">
<start to="PigAction"/>
<action name="PigAction">
<pig>
<job-tracker>${jobtracker}</job-tracker>
<name-node>${namenode}</name-node>
<prepare></prepare>
<configuration>
<property>
<name>oozie.action.external.stats.write</name>
<value>true</value>
</property>
<property>
<name>oozie.action.sharelib.for.pig</name>
<value>pig</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m -Xms1000m -Xmn100m</value>
</property>
</configuration>
</pig>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Error message[${wf:errorMessage()}]</message>
</kill>
<end name="end"/>
</workflow-app>
Job.properties
#JobTracker and NodeName
jobtracker=bivm:9001
namenode=bivm:9000
#HDFS path where you need to copy workflow.xml and lib/*.jar to
oozie.wf.application.path=hdfs://bivm:9000/user/biadmin/oozieWF/
oozie.libpath=hdfs://bivm:9000/user/biadmin/oozieWF/lib
oozie.use.system.libpath=true
oozie.action.sharelib.for.pig=pig
wf_path=hdfs://bivm:9000/user/biadmin/oozieWF/
#one of the values from Hadoop mapred.queue.names
queueName=default
Error Message:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], main() threw exception, jline.ConsoleReaderInputStream
java.lang.NoClassDefFoundError: jline.ConsoleReaderInputStream
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:219)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:619)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(AccessController.java:366)
at javax.security.auth.Subject.doAs(Subject.java:572)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: jline.ConsoleReaderInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:665)
at java.lang.ClassLoader.loadClassHelper(ClassLoader.java:942)
at java.lang.ClassLoader.loadClass(ClassLoader.java:851)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:827)
... 18 more
If it is a problem related to the Pig jar, then please specify the version and a link to download it. I'm using the pig 0.12.0 jar.
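In case it helps narrow this down, whether a jline jar is visible to the action can be checked from the shell (a sketch; the sharelib path and the exact jline jar name vary by Oozie/Pig build, so treat both as assumptions):
hdfs dfs -ls -R /user/oozie/share/lib | grep -i jline
hdfs dfs -put jline-1.0.jar hdfs://bivm:9000/user/biadmin/oozieWF/lib/   # only if it is missing; use the jline jar shipped with your Pig build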
I'm using Hadoop 2.5.1 with HBase 0.98.11 on Ubuntu 14.04
I could run it in pseudo-distributed mode. Now I want to run it in distributed mode. I followed the instructions from various sites and ended up with a runtime error, "Error: org/apache/hadoop/hbase/HBaseConfiguration" (while there is no error when I compile the code).
After trying a few things, I found that if I comment out mapreduce.framework.name in mapred-site.xml and also the properties in yarn-site.xml, I am able to run Hadoop successfully.
But I think it is then running as a single node (I have no real idea; I'm just guessing by comparing the running time to what I got in pseudo-distributed mode, and there is no MR process in the slave node's jps output when running the job on the master).
Here are some of my config files:
hdfs-site
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<!-- <property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>-->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
mapred-site
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<!--<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>-->
yarn-site
<!-- Site specific YARN configuration properties -->
<!--<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>10.1.1.177:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>10.1.1.177:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>10.1.1.177:8031</value>
</property>-->
Thank you so much for any help.
UPDATE: I tried making some changes to yarn-site.xml by adding yarn.application.classpath like this:
https://dl-web.dropbox.com/get/Public/yarn.png?_subject_uid=51053996&w=AABeDJfRp_D31RiVHqBWn0r9naQR_lFVJXIlwvCwjdhCAQ
The error changed to an exit code failure:
https://dl-web.dropbox.com/get/Public/exitcode.jpg?_subject_uid=51053996&w=AAAQ-bYoRSrQV3yFq36vEDPnAB9aIHnyOQfnvt2cUHn5IQ
UPDATE 2: In the syslog of the application logs, it says:
2015-04-24 20:34:59,164 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1429792550440_0035_000002
2015-04-24 20:34:59,589 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-04-24 20:34:59,610 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2015-04-24 20:34:59,616 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.http.HttpConfig.setPolicy(Lorg/apache/hadoop/http/HttpConfig$Policy;)V
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1364)
2015-04-24 20:34:59,621 INFO [Thread-1] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a signal. Signaling RMCommunicator and JobHistoryEventHandler.
Any suggestions, please?
I guess that you didn't set up your Hadoop cluster correctly. Please follow these steps:
Hadoop Configuration:
step 1: edit hadoop-env.sh as follows:
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
step 2: now create a directory and set the required ownership and permissions:
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
step 3: edit core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
step 4: edit mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
step 5: edit hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hduser/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hduser/hadoop/hadoopdata/hdfs/datanode</value>
</property>
step 6: edit yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Finally, format your HDFS (you need to do this the first time you set up a Hadoop cluster):
$ /usr/local/hadoop/bin/hadoop namenode -format
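Then start the daemons (a sketch, assuming the same /usr/local/hadoop install path as the format command above):
$ /usr/local/hadoop/sbin/start-dfs.sh
$ /usr/local/hadoop/sbin/start-yarn.sh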
HBase Configuration:
edit your hbase-site.xml:
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/hbase/zookeeper</value>
</property>
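Then start HBase (a sketch, assuming HBase is installed under /usr/local/hbase, as the zookeeper dataDir above suggests):
$ /usr/local/hbase/bin/start-hbase.sh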
Hope this helps you
After sticking with the problem for more than 3 days (maybe it comes from my misunderstanding of the concept), I was able to fix the problem by adding HADOOP_CLASSPATH (like what I did when setting up pseudo-distributed mode in hadoop-env.sh) to yarn-env.sh.
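Roughly, the kind of line in question, added at the end of yarn-env.sh (a sketch; $HBASE_HOME and the exact jars to append are assumptions, adjust to wherever your HBase/Hadoop jars actually live):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/*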
I don't know the details, but, yeah, I hope this may be able to help someone in the future.
Cheers.
I was using Spark on YARN and was getting the same error. Actually, the Spark jar had an internal dependency on hadoop-client and hadoop-mapreduce-client-* jars pointing to older 2.2.0 versions. So I included these entries in my POM with the Hadoop version that I was actually running and did a clean build.
This resolved the issue for me. Hope this helps someone.
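For anyone hitting the same thing, the stale transitive Hadoop versions can be spotted before overriding them in the POM, e.g. (a sketch; assumes Maven is the build tool, as the POM above implies):
mvn dependency:tree -Dincludes=org.apache.hadoop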
I am trying to copy data from S3 to HDFS using distcp.
The following is my shell script where I am doing the distcp.
mkdir.sh
hadoop distcp s3n://bucket-name/foldername hdfs://localhost:8020/user/hdfs/data/
The above shell script works fine when I run it manually.
But when I try to run the same script using an Oozie workflow, the distcp fails.
I am trying to run the workflow using a shell action.
The following is my job.properties file:
nameNode=hdfs://ip-172-31-34-170.us-west-2.compute.internal:8020
jobTracker=ip-172-31-34-195.us-west-2.compute.internal:8032
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
user.name=hdfs
oozie.wf.application.path=${nameNode}/user/${user.name}/oozie/
mkdirshellscript=${oozie.wf.application.path}/mkdir.sh
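The script itself is staged into the workflow directory on HDFS (a sketch; user and path follow the job.properties above):
hdfs dfs -put -f mkdir.sh /user/hdfs/oozie/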
And my workflow.xml is as follows:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.1">
<start to="shellAction"/>
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="/user/hdfs/hari123"/>
<mkdir path="/user/hdfs/hari123"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${mkdirshellscript}</exec>
<file>${mkdirshellscript}</file>
</shell>
<ok to="end"/>
<error to="killAction"/>
</action>
<kill name="killAction">
<message>"Killed job due to error"</message>
</kill>
<end name="end"/>
</workflow-app>
The Oozie log is as follows:
2014-09-30 10:31:51,102 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[ec2-54-69-26-119.us-west-2.compute.amazonaws.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000018-140930055823135-oozie-oozi-W] ACTION[0000018-140930055823135-oozie-oozi-W#shellAction] callback for action [0000018-140930055823135-oozie-oozi-W#shellAction]
2014-09-30 10:31:51,337 INFO org.apache.oozie.command.wf.ActionEndXCommand: SERVER[ec2-54-69-26-119.us-west-2.compute.amazonaws.com] USER[hdfs] GROUP[-] TOKEN[] APP[WorkFlowForShellActionWithCaptureOutput] JOB[0000018-140930055823135-oozie-oozi-W] ACTION[0000018-140930055823135-oozie-oozi-W#shellAction] ERROR is considered as FAILED for SLA
I want to do the distcp using a shell action, not a distcp action, in Oozie.
Try with:
<workflow-app name="WorkFlowForShellAction" xmlns="uri:oozie:workflow:0.1">
...
<start to="shellAction"/>
<action name="shellAction">
<shell xmlns="uri:oozie:shell-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="/user/hdfs/hari123"/>
<mkdir path="/user/hdfs/hari123"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>./${mkdirshellscript}</exec>
<file>${mkdirshellscript}#${mkdirshellscript}</file>
</shell>
...
</workflow-app>
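If the action still fails after this, the launcher's stderr usually says why distcp exited non-zero; it can be pulled with the Oozie CLI (a sketch; the Oozie server URL and port are assumptions, the job id is the one from the log above):
oozie job -oozie http://ec2-54-69-26-119.us-west-2.compute.amazonaws.com:11000/oozie -log 0000018-140930055823135-oozie-oozi-W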