Queries in hive-0.8.1-cdh4.0.1 that invoke a reducer fail with "Task failed!".
Queries that use MAPJOIN work fine, but a plain JOIN (or any query that needs a reduce phase) gives an error.
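For illustration, a hedged sketch of the two query shapes (table1, table2, and the id column are hypothetical):
-- works: map-side join, no reduce phase
select /*+ MAPJOIN(t2) */ t1.id from table1 t1 join table2 t2 on (t1.id = t2.id);
-- fails: a plain reduce-side join
select t1.id from table1 t1 join table2 t2 on (t1.id = t2.id);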
For example:
hive> select count(*) from table1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
12/10/15 23:07:02 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
12/10/15 23:07:02 WARN conf.Configuration: mapred.system.dir is deprecated. Instead, use mapreduce.jobtracker.system.dir
12/10/15 23:07:02 WARN conf.Configuration: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
12/10/15 23:07:02 WARN conf.HiveConf: hive-site.xml not found on CLASSPATH
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/XXXX
/XXXX_20121015230707_c93521d0-4a97-4972-92b9-0fdd3ab42e5f.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/XXXX/hadoop-2.0.0-cdh4.0.1/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/XXXX/hive-0.8.1-cdh4.0.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2012-10-15 23:07:04,721 null map = 0%, reduce = 0%
Ended Job = job_local_0001 with errors
Error during job, obtaining debugging information...
**Execution failed with exit status: 2**
Obtaining error information
**Task failed!**
Task ID:
Stage-1
Logs:
/tmp/XXXX/hive.log
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
The log file shows that it is due to a Java heap space problem:
**java.lang.Exception: java.lang.OutOfMemoryError: Java heap space**
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:912)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
For Hadoop 2.0.0+, set the following in etc/hadoop/mapred-site.xml:
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>1</value>
</property>
This works because the OutOfMemoryError is thrown while allocating the map-side sort buffer (MapOutputBuffer in the stack trace above); lowering mapreduce.task.io.sort.mb lets that buffer fit into the small local-mode heap.
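If editing mapred-site.xml is not convenient, the same value can usually be set for just the current Hive session before re-running the query (a sketch; assumes the session-level setting is picked up by the local-mode job):
hive> set mapreduce.task.io.sort.mb=1;
hive> select count(*) from table1;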
A map join needs more memory. Increase the MapReduce JVM heap size in conf/mapred-site.xml:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m -server</value>
</property>
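On Hadoop 2.x the per-task variants are usually preferred over the older mapred.child.java.opts; a sketch for mapred-site.xml (the -Xmx1024m values are only examples, size them for your cluster):
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024m</value>
</property>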
Currently I'm trying to run our functional tests (about 300 requests) with 10 users in parallel using the gatling-maven-plugin:
mvn clean test-compile gatling:test -Dkarate.env=test
with the following local Maven options in .mvn/jvm.config in the project folder:
-d64 -Xmx4g -Xms1g -XshowSettings:vm -Djava.awt.headless=true
At some point, while processing a large response in parallel, the Gatling process is aborted:
[ERROR] Failed to execute goal io.gatling:gatling-maven-plugin:3.0.2:test (default-cli) on project np.rest-testing: Gatling failed.: Process exited with an error: -1 (Exit value: -1) -> [Help 1]
with the following stack trace:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid25960.hprof ...
Heap dump file created [1611661680 bytes in 18.184 secs]
Uncaught error from thread [GatlingSystem-scheduler-1]: Java heap space, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[GatlingSystem]
java.lang.OutOfMemoryError: Java heap space
at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:269)
at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:235)
at java.lang.Thread.run(Thread.java:748)
I have tried to increase the heap space to 10 GB (-Xmx10g) in different ways:
via the environment variable MAVEN_OPTS=-Xmx10g
via the local project Maven options in .mvn/jvm.config
via the maven-surefire-plugin configuration, as suggested here
Although 10 GB is allocated to the Maven process, as you can see at the start of the Maven run:
VM settings:
Min. Heap Size: 1.00G
Max. Heap Size: 10.00G
Ergonomics Machine Class: client
Using VM: Java HotSpot(TM) 64-Bit Server VM
the OutOfMemoryError is still thrown during each gatling-maven-plugin execution.
When analyzing each heap dump, Eclipse Memory Analyzer always shows the same result:
84 instances of "com.intuit.karate.core.StepResult", loaded by "sun.misc.Launcher$AppClassLoader # 0xc0000000" occupy 954 286 864 (90,44 %) bytes.
Biggest instances:
•com.intuit.karate.core.StepResult # 0xfb93ced8 - 87 239 976 (8,27 %) bytes...
What can be done to reduce the heap space usage and prevent OutOfMemoryError?
Can someone share some thoughts and experience?
After some investigation I finally noticed that the heap dump always shows 1 GB. That means the increased heap space is not used by the gatling-maven-plugin.
By adding the following JVM argument to the plugin configuration, the problem is solved even with 4 GB:
<jvmArgs>
<jvmArg>-Xmx4g</jvmArg>
</jvmArgs>
So, with the following gatling-maven-plugin configuration, the error no longer appears:
<plugin>
<groupId>io.gatling</groupId>
<artifactId>gatling-maven-plugin</artifactId>
<version>${gatling.plugin.version}</version>
<configuration>
<simulationsFolder>src/test/java</simulationsFolder>
<includes>
<include>performance.test.workflow.WorkflowSimulation</include>
</includes>
<compilerJvmArgs>
<compilerJvmArg>-Xmx512m</compilerJvmArg>
</compilerJvmArgs>
<jvmArgs>
<jvmArg>-Xmx4g</jvmArg>
</jvmArgs>
</configuration>
</plugin>
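With this configuration in place, the test run is started with the same command as before:
mvn clean test-compile gatling:test -Dkarate.env=test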
You can also try this configuration:
<configuration>
<meminitial>1024m</meminitial>
<maxmem>4096m</maxmem>
</configuration>
I have created a view of an HBase table in Hive with 10 million rows. When I run the query below, DistCp is invoked and it throws the error below.
INSERT OVERWRITE DIRECTORY '/mapred/INPUT' select hive_cdper1.cid,hive_cdper1.emptyp,hive_cdper1.ethtyp,hive_cdper1.gdtyp,hive_cdseg.mrtl from hive_cdper1 join hive_cdseg on hive_cdper1.cnm=hive_cdseg.cnm;
Output: map 100% reduce 100%
2016-10-17 15:05:34,688 INFO [main]: exec.Task (SessionState.java:printInfo(951)) - Moving data to: /mapred/INPUT
from
hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 2016-10-17 15:05:34,693 INFO [main]: common.FileUtils
(FileUtils.java:copy(551)) - Source is 483335659 bytes. (MAX: 4000000)
2016-10-17 15:05:34,693 INFO [main]: common.FileUtils
(FileUtils.java:copy(552)) - Launch distributed copy (distcp) job.
2016-10-17 15:05:34,695 ERROR [main]: exec.Task
(SessionState.java:printError(960)) - Failed with exception Unable to
move source
hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 to destination /mapred/INPUT
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move
source
hdfs://mycluster/mapred/INPUT/.hive-staging_hive_2016-10-17_14-57-48_620_6609613978089243090-1/-ext-10000 to destination /mapred/INPUT
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: Cannot get DistCp constructor:
org.apache.hadoop.tools.DistCp.()
at org.apache.hadoop.hive.shims.Hadoop23Shims.runDistCp(Hadoop23Shims.java:1160)
at org.apache.hadoop.hive.common.FileUtils.copy(FileUtils.java:553)
at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2622)
... 21 more
What I wonder is: I am writing to the same cluster, so why is it invoking DistCp instead of a normal copy?
I am using Hive 1.2.1 with Hadoop 2.7.2, and my cluster name is mycluster.
Note: I have tried setting hive.exec.copyfile.maxsize=4000000, but it didn't work.
I'd appreciate your suggestions.
1) Check the permissions of your destination path /mapred/INPUT.
2) If other users do not have write permission, run hadoop fs -chmod a+w /mapred/INPUT (see the sketch below).
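Both steps from the command line (a sketch; adjust the path and permissions to your setup):
# check the current owner and mode of the destination directory
hadoop fs -ls -d /mapred/INPUT
# open up write access if it is missing
hadoop fs -chmod a+w /mapred/INPUT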
Setting the property below in hive-site.xml solved my issue.
<property>
<name>hive.exec.copyfile.maxsize</name>
<value>3355443200</value>
<description>Maximum file size (in Mb) that Hive uses to do single HDFS copies between directories.Distributed copies (distcp) will be used instead for bigger files so that copies can be done faster.</description>
</property>
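The same setting can also be tried per session from the Hive CLI before re-running the INSERT OVERWRITE query (a sketch, reusing the value from the property above):
hive> SET hive.exec.copyfile.maxsize=3355443200;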
I tried to schedule a Hive workflow XML file to run a Hive script in Tez mode, passing the Hadoop properties that refer to the Tez JAR files in the workflow XML file, as shown below.
<hive xmlns="uri:oozie:hive-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>hive-site.xml</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>tez.lib.uris</name>
<value>${nameNode}/apps/Tez/,${nameNode}/apps/Tez/lib/</value>
</property>
</configuration>
I had also changed the hive-site.xml property hive.execution.engine to tez:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
When I scheduled the workflow using Oozie, I got the following error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org/apache/tez/dag/api/SessionNotRunning
java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:479)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:306)
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:290)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 24 more
Can anyone please tell me how to rectify this issue so that I can schedule my workflow XML file and run the Hive script in Tez mode?
I have seen the above error before, resolved it, and got Hive (Tez engine) running on Oozie.
Here are the steps I followed.
Class not found error:
As the error says, the Oozie launcher container cannot find the SessionNotRunning class.
This class is part of tez-api-0.x.x.jar; you can confirm that using:
jar tvf /usr/lib/tez/tez-api-0.7.0.jar | grep SessionNotRunning
You need to make sure the Oozie launcher container (a YARN container) localizes this and the other Tez JARs so that it can pass them to the Hive client.
The expectation is that if we include the following config property in workflow.xml, Oozie should pick up all those JARs:
<property>
<name>tez.lib.uris</name>
<value>hdfs:///apps/tez/,hdfs:///apps/tez/lib/</value>
</property>
However, it may not do that (I am not sure why).
So, I copied all the Tez JARs to the Hive action's share library in HDFS (e.g. to /user/oozie/share/lib/lib_20160405125827/hive/). The Oozie hive action in your workflow should then use the JARs present in that path and localize them.
While doing that, make sure the new JARs have the same permissions as the JARs already present in that HDFS directory. Oozie also needs a refresh of its share library.
Example commands can be:
hadoop fs -copyFromLocal /usr/lib/tez/*.jar /user/oozie/share/lib/lib_20160405125827/hive/
hadoop fs -copyFromLocal /usr/lib/tez/lib/*.jar /user/oozie/share/lib/lib_20160405125827/hive/
hadoop fs -chown oozie:oozie /user/oozie/share/lib/lib_20160405125827/hive/*.jar
oozie admin -sharelibupdate
Now, if you list your Hive share library with oozie admin -shareliblist hive, you should be able to see all the Tez libraries.
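For example, a quick check that the Tez JARs are now visible (the Oozie URL is an assumption; substitute your server's address):
oozie admin -oozie http://<oozie-host>:11000/oozie -shareliblist hive | grep -i tez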
With those steps, you should no longer see NoClassDefFoundErrors or ClassNotFoundExceptions from the Tez JARs.
Missing Hadoop Dependencies:
At this point the Tez job should be submitted, but there is another error that you may encounter in the Oozie launcher:
14972 [uber-SubtaskRunner] ERROR org.apache.hadoop.hive.ql.exec.Task - Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1459860815404_0033 failed 2 times due to AM Container for appattempt_1459860815404_0033_000002 exited with exitCode: 1
Looking at the container logs, I see:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/AbstractService
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.service.AbstractService
This is because my Tez installation is minimal and does not include the Hadoop dependencies (see the install notes):
https://github.com/apache/tez/blob/release-0.7.0/docs/src/site/markdown/install.md#hadoop-installation-dependent-installdeploy-instructions
So, you need to tell Tez to use your cluster's Hadoop libraries by setting the following property in your workflow.xml:
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
With the above steps, I was able to run a Hive script successfully on the Tez engine via Oozie.
I am trying to get a Spark/Shark cluster up but keep running into the same problem. I have followed the instructions at https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster and set up Hive as stated.
Here are the details; any help would be great.
I have already installed the following packages:
Spark/Shark 1.0.0
Apache Hadoop 2.4.0
Apache Hive 0.13
Scala 2.9.3
Java 7
I configured ~/spark/conf/spark-env.sh as:
export HADOOP_HOME=/path/to/hadoop/
export HIVE_HOME=/path/to/hive/
export MASTER=spark://xxx.xxx.xxx.xxx:7077
export SPARK_HOME=/path/to/spark
export SPARK_MEM=4g
export HIVE_CONF_DIR=/path/to/hive/conf/
source $SPARK_HOME/conf/spark-env.sh
When I start Spark with "./spark-withinfo", I get the following errors:
-hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
14/07/07 16:26:57 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
14/07/07 16:26:57 [main]: WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO SessionState:
Logging initialized using configuration in jar:file:/path/to/hive/lib/hive-exec-0.13.0.jar!/hive-log4j.properties
14/07/07 16:26:57 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:344)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2444)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2456)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:338)
... 2 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
... 7 more
Caused by: java.lang.NoSuchFieldError: METASTOREINTERVAL
at org.apache.hadoop.hive.metastore.RetryingRawStore.init(RetryingRawStore.java:78)
at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:60)
at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
... 12 more
I guess Spark cannot find some of the libraries needed to connect to the Hive metastore, but I have been stuck here for a couple of days and don't know how to solve it. BTW, I use MySQL for the Hive metastore, and everything works well in Hive itself.
Any help is appreciated. Thanks in advance.
You may need to add the MySQL connector JAR to the classpath before you start Spark.
In my case, I added the MySQL connector JAR like this:
$SPARK_HOME/bin/compute-classpath.sh
CLASSPATH=$CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar
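An alternative sketch, in case editing compute-classpath.sh is not desirable: SPARK_CLASSPATH was still honoured (though deprecated) in Spark 1.0, so the same JAR can be exported in conf/spark-env.sh:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/big/hive/lib/mysql-connector-java-5.1.25-bin.jar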
I am trying to run the latest version of the Apache Giraph examples, described on the quick start page (http://giraph.apache.org/quick_start.html). I use CDH 4.4.0 (the Cloudera distribution of Hadoop).
I have built Giraph with the dependencies updated to CDH 4.4.0, and everything went OK.
When I run the examples, I get the following output:
-bash-4.1$ hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.4.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hdfs/input/tiny_graph.txt
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hdfs/output/shortestpaths -w 1
13/10/02 18:31:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/10/02 18:31:58 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
13/10/02 18:31:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/10/02 18:31:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/10/02 18:32:00 INFO job.GiraphJob: run: Tracking URL: http://hadoop57:50030/jobdetails.jsp?jobid=job_201310021452_0015
13/10/02 18:32:22 INFO mapred.JobClient: Running job: job_201310021452_0015
13/10/02 18:32:22 INFO mapred.JobClient: Job complete: job_201310021452_0015
13/10/02 18:32:22 INFO mapred.JobClient: Counters: 6
13/10/02 18:32:22 INFO mapred.JobClient: Job Counters
13/10/02 18:32:22 INFO mapred.JobClient: Failed map tasks=1
13/10/02 18:32:22 INFO mapred.JobClient: Launched map tasks=2
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=29054
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/10/02 18:32:22 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
and the job log shows an exception:
java.lang.IllegalStateException: run: Caught an unrecoverable exception
java.io.FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201310021452_0015/_zkServer does not exist.
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201310021452_0015/_zkServer does not exist.
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:792)
at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java
The file _bsp/_defaultZkManagerDir/job_201310021452_0015/_zkServer sometimes gets created and sometimes not.
Could you please give any hints on where to start hunting for this issue?
BR,
Konrad
It looks like Giraph is starting its own ZooKeeper instance. Try passing the following as a VM argument to the GiraphRunner:
-Dgiraph.zkList=<zookeeper server address>:<port>
e.g.
-Dgiraph.zkList=localhost:2181
Your command will look something like this:
-bash-4.1$ hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.0.0-cdh4.4.0-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsComputation
-Dgiraph.zkList=localhost:2181
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
-vip /user/hdfs/input/tiny_graph.txt
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hdfs/output/shortestpaths -w 1
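If you are unsure whether ZooKeeper is actually reachable at that address, a quick sanity check (a sketch; assumes nc is installed and ZooKeeper's four-letter commands are enabled):
echo ruok | nc localhost 2181
# a healthy server replies with: imok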
Best of luck!