Amazon Elastic MapReduce: Exception from FileSystem - amazon-s3

I run my application using the Ruby client:
ruby elastic-mapreduce -j j-20PEKMT9BRSUC --jar s3n://sakae55/lib/edu.cit.som.jar --main-class edu.cit.som.hadoop.SOMDriver --arg s3n://sakae55/repository/input/ecoli/ --arg s3n://sakae55/repository/output/ecoli/pl/ --arg s3n://sakae55/repository/data/ecoli/som.txt
Then I see the following error:
java.lang.IllegalArgumentException: This file system object (file:///) does not support access to the request path 'hdfs://ip-10-195-207-230.ec2.internal:9000/mnt/var/lib/hadoop/tmp/mapred/system/job_201004221221_0017/job.jar' You possibly called FileSystem.get(conf) when you should of called FileSystem.get(uri, conf) to obtain a file system supporting your path.
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:52)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:416)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:676)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:200)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1184)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1160)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1132)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:662)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:729)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at edu.cit.som.hadoop.SOMDriver.runIteration(SOMDriver.java:106)
at edu.cit.som.hadoop.SOMDriver.train(SOMDriver.java:69)
at edu.cit.som.hadoop.SOMDriver.run(SOMDriver.java:52)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at edu.cit.som.hadoop.SOMDriver.main(SOMDriver.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
I am not sure why the error references "file:///" even though none of the arguments I pass use that scheme.

It turned out that the reuse of the JobConf object was causing this issue. Changing the code to create a new instance for each iteration solved the problem.
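A minimal sketch of that change, assuming the old mapred API seen in the stack trace; the class name and job wiring below are illustrative, not the actual SOMDriver code:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SOMDriverSketch {
    // Build a fresh JobConf for every iteration instead of reusing a single instance,
    // so no configuration state set by a previous submission leaks into the next one.
    void runIteration(String inputPath, String outputPath, int iteration) throws Exception {
        JobConf conf = new JobConf(SOMDriverSketch.class);  // new instance per iteration
        conf.setJobName("som-iteration-" + iteration);
        FileInputFormat.setInputPaths(conf, new Path(inputPath));
        FileOutputFormat.setOutputPath(conf, new Path(outputPath));
        JobClient.runJob(conf);  // submit and wait for this iteration's job
    }
}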

Related

java.lang.NoClassDefFoundError with spark-submit in YARN cluster mode, cluster set up using Ambari

I'm using the spark-submit command as below:
spark-submit --class com.example.hdfs.spark.RawDataAdapter --master yarn --deploy-mode cluster --jars /home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar home/hadoop/emr-spark-1.0-SNAPSHOT.jar hdfs://111.11.11.111:8020/user/hdfsinputfile.zip 8000
However, it gives me the error java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3, even though IParser3.class is present in emr-core-1.0-SNAPSHOT.jar. I don't understand why it throws that error. I have tried several approaches but couldn't succeed. How can I resolve this?
I am able to run the same command in client mode and also as a standalone Spark application; I get this error only in YARN cluster mode.
Exception from container-launch.
Container id: container_e37_1526066605784_0014_02_000001
Exit code: 15
Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
g.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126)
at com.example.hdfs.spark.utils.SimpleClassLoader.<init>(SimpleClassLoader.java:38)
at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71)
at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:646)
18/05/14 14:00:13 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$resolveTry(Promise.scala:47)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:664)
Caused by: java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.findClass(SimpleClassLoader.java:152)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126)
at com.example.hdfs.spark.utils.SimpleClassLoader.<init>(SimpleClassLoader.java:38)
at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71)
at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$3.run(ApplicationMaster.scala:646)
Failing this attempt. Failing the application.
Quoting from the Spark documentation:
http://spark.apache.org/docs/latest/running-on-yarn.html
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
So in cluster mode the jar is executed on whatever node is available, so you can try these two ways:
1) Copy the dependency jar to each node.
2) Copy the jar to the distributed file system (HDFS) and then use it from there (see the sketch below).
For more details, have a look at:
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
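For option 2, a minimal sketch of uploading the dependency jar to HDFS with the Hadoop FileSystem API; the namenode address is taken from the question, while the /user/libs target directory is only an example:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadDependencyJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Resolve the HDFS file system explicitly from its URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://111.11.11.111:8020"), conf);
        // Copy the dependency jar from local disk into HDFS so every YARN node can fetch it.
        fs.copyFromLocalFile(
                new Path("/home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar"),
                new Path("/user/libs/emr-core-1.0-SNAPSHOT.jar"));
        fs.close();
    }
}
After the upload, passing --jars hdfs://111.11.11.111:8020/user/libs/emr-core-1.0-SNAPSHOT.jar to spark-submit lets every container fetch the jar from HDFS, which is what cluster mode needs.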

Running Pig NoClassDefFoundError

I'm trying to get Pig running on my machine, but whenever I try to start it I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:642)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
This happens whenever I run pig or when I try to execute scripts that should work.
I'm not completely certain what is going on, but it looks like I'm not including some of the Hadoop jars correctly. Has anyone seen a similar issue, or does anyone know how to include the needed jars?
For reference, I'm using Apache Pig 0.12.0-cdh5.4.9 and Hadoop 2.6.0-cdh5.4.9, and I have these environment variables set:
PIG_HOME=/Users/username/cdh5/pig-0.12.0-cdh5.4.9
PIG_CLASSPATH=/etc/hadoop/conf:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/*:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/lib/*
Do I need to find the Hadoop jars and add them to my path, or is there something else I should check?
This ended up being because I was setting CDH_MR2_HOME incorrectly, and therefore Pig could not find some of the jars it needed.

PredictionIO - Error when training k-means clustering

I followed the guidance below to train and deploy k-means clustering.
But I got an error with pio train:
[WARN] [Template$] template.json does not exist. Template metadata will not be available. (This is safe to ignore if you are not working on a template.)
[INFO] [Runner$] Submission command: /home/lavalamp/PredictionIO/vendors/spark-1.4.1/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --jars file:/home/lavalamp/PredictionIO/MyKmeans/target/scala-2.10/template-scala-parallel-vanilla_2.10-0.1-SNAPSHOT.jar,file:/home/lavalamp/PredictionIO/MyKmeans/target/scala-2.10/template-scala-parallel-vanilla-assembly-0.1-SNAPSHOT-deps.jar --files file:/home/lavalamp/PredictionIO/conf/log4j.properties --driver-class-path /home/lavalamp/PredictionIO/conf file:/home/lavalamp/PredictionIO/lib/pio-assembly-0.9.4.jar --engine-id gYCE4NX4ODPQkryp9Jq9by3OEXxa4fxQ --engine-version b972fa8f340c142fb6dffbebc6d276b3bb32eeda --engine-variant file:/home/lavalamp/PredictionIO/MyKmeans/engine.json --verbosity 0 --json-extractor Both
--env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=123456,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/lavalamp/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://192.168.1.73/pio,PIO_HOME=/home/lavalamp/PredictionIO,
PIO_FS_ENGINESDIR=/home/lavalamp/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=root,PIO_FS_TMPDIR=/home/lavalamp/.pio_store/tmp,
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,
PIO_CONF_DIR=/home/lavalamp/PredictionIO/conf
Exception in thread "main" java.lang.ClassCastException: com.biglabs.VanillaEngine$ cannot be cast to io.prediction.controller.EngineFactory
at io.prediction.workflow.WorkflowUtils$.getEngine(WorkflowUtils.scala:69)
at io.prediction.workflow.CreateWorkflow$.liftedTree1$1(CreateWorkflow.scala:193)
at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:192)
at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Can anyone help me with this issue?
Try this solution: https://github.com/singsanj/KMeans-parallel-template
Hope this solves your issues.
Just don't forget to update scripts/loadData.py with your newly created app access key, and engine.json with your appId.
If you still have issues, happy to help.

Using a WLST script, how to create a DbAdapter EIS Connection Factory (remotely)

I am trying to create a DbAdapter in the Admin Server using a WLST script.
def createDbAdapter():
    connect('weblogic', 'welcome1', 't3://asdf-pdm:7001')
    edit()
    startEdit()
    planPath = get('/AppDeployments/DbAdapter/PlanPath')
    # D:\Oracle\Middleware\home_ps2\Oracle_SOA1\soa\connectors\Plan.xml
    appPath = get('/AppDeployments/DbAdapter/SourcePath')
    # D:/Oracle/Middleware/home_ps2/Oracle_SOA1/soa/connectors/DbAdapter.rar
    wpPlan = loadApplication(appPath, planPath)  # got exception here
    ...
    ...
    ...
When I try to load DbAdapter.rar with its Plan.xml into memory, it throws the following error:
wls:/base_domain/serverConfig> loadApplication(appPath, planPath)
Loading application from D:/Oracle/Middleware/home_ps2/Oracle_SOA1/soa/connectors/DbAdapter.rar ...
Plan for your application will be written to D:\Oracle\Middleware\home_ps2\Oracle_SOA1\soa\connectors\Plan.xml
Traceback (innermost last):
File "<console>", line 1, in ?
File "<iostream>", line 290, in loadApplication
Use dumpStack() to view the full stacktrace
at weblogic.management.scripting.ExceptionHandler.handleException(ExceptionHandler.java:59)
at weblogic.management.scripting.WLSTUtils.throwWLSTException(WLSTUtils.java:181)
at weblogic.management.scripting.JSR88DeployHandler.loadApplication(JSR88DeployHandler.java:196)
at weblogic.management.scripting.WLScriptContext.loadApplication(WLScriptContext.java:787)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
weblogic.management.scripting.ScriptException: weblogic.management.scripting.ScriptException: Error occured while performing loadApplication : Could not read configuration. : Exception in AppMerge flows' progression
Basically, it tries to load those files from the file system where WLST is running, rather than from the server it is connected to.
When I do the same from the server console (locally), everything works fine.
So, at this point I want to know: is it possible to update the DbAdapter remotely?
Suvankar,
There could be a path problem with the Plan.xml path using '\' instead of '/'. Please check and revert.
I have faced the same issue too. It looks like loadApplication() doesn't support remote loading of the application into memory. Strangely, it's not mentioned anywhere in the docs.

Why does the jmap command fail with java.lang.reflect.InvocationTargetException? [duplicate]

I get the following exception when I take a heap dump using:
jmap -F -dump:format=b,file=/tmp/heapdump/before.hprof 10737
Attaching to process ID 10737, please wait...
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at sun.tools.jmap.JMap.runTool(JMap.java:179)
at sun.tools.jmap.JMap.main(JMap.java:110)
Caused by: java.lang.RuntimeException: Type "nmethodBucket*", referenced in VMStructs::localHotSpotVMStructs in the remote VM, was not present in the remote VMStructs::localHotSpotVMTypes table (should have been caught in the debug build of that VM). Can not continue.
at sun.jvm.hotspot.HotSpotTypeDataBase.lookupOrFail(HotSpotTypeDataBase.java:361)
at sun.jvm.hotspot.HotSpotTypeDataBase.readVMStructs(HotSpotTypeDataBase.java:252)
at sun.jvm.hotspot.HotSpotTypeDataBase.<init>(HotSpotTypeDataBase.java:87)
at sun.jvm.hotspot.bugspot.BugSpotAgent.setupVM(BugSpotAgent.java:568)
at sun.jvm.hotspot.bugspot.BugSpotAgent.go(BugSpotAgent.java:494)
at sun.jvm.hotspot.bugspot.BugSpotAgent.attach(BugSpotAgent.java:332)
at sun.jvm.hotspot.tools.Tool.start(Tool.java:163)
at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:77)
Does anyone know how to resolve this?
I was seeing the same error because my path to jmap wasn't the same as the path to the java process (i.e. targeting two different versions).
Running jmap with the full path to my JDK resolved it.
If OpenJDK is used, the debuginfo packages must be installed.
In CentOS this works with:
- sudo debuginfo-install java-1.8.0-openjdk
- or sudo yum install java-1.8.0-openjdk-debuginfo.x86_64
See
- https://bugzilla.redhat.com/show_bug.cgi?id=1010786#c15
- amazon linux - install openjdk-debuginfo?