PredictionIO - Error when training k-means clustering - k-means

I followed the guidance below to train and deploy k-means clustering, but I got an error with pio train:
[WARN] [Template$] template.json does not exist. Template metadata will not be available. (This is safe to ignore if you are not working on a template.)
[INFO] [Runner$] Submission command: /home/lavalamp/PredictionIO/vendors/spark-1.4.1/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --jars file:/home/lavalamp/PredictionIO/MyKmeans/target/scala-2.10/template-scala-parallel-vanilla_2.10-0.1-SNAPSHOT.jar,file:/home/lavalamp/PredictionIO/MyKmeans/target/scala-2.10/template-scala-parallel-vanilla-assembly-0.1-SNAPSHOT-deps.jar --files file:/home/lavalamp/PredictionIO/conf/log4j.properties --driver-class-path /home/lavalamp/PredictionIO/conf file:/home/lavalamp/PredictionIO/lib/pio-assembly-0.9.4.jar --engine-id gYCE4NX4ODPQkryp9Jq9by3OEXxa4fxQ --engine-version b972fa8f340c142fb6dffbebc6d276b3bb32eeda --engine-variant file:/home/lavalamp/PredictionIO/MyKmeans/engine.json --verbosity 0 --json-extractor Both
--env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=123456,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/lavalamp/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://192.168.1.73/pio,PIO_HOME=/home/lavalamp/PredictionIO,
PIO_FS_ENGINESDIR=/home/lavalamp/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=root,PIO_FS_TMPDIR=/home/lavalamp/.pio_store/tmp,
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,
PIO_CONF_DIR=/home/lavalamp/PredictionIO/conf
Exception in thread "main" java.lang.ClassCastException: com.biglabs.VanillaEngine$ cannot be cast to io.prediction.controller.EngineFactory
at io.prediction.workflow.WorkflowUtils$.getEngine(WorkflowUtils.scala:69)
at io.prediction.workflow.CreateWorkflow$.liftedTree1$1(CreateWorkflow.scala:193)
at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:192)
at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Can anyone help me with this issue?

Try this solution: https://github.com/singsanj/KMeans-parallel-template
Hope this solves your issue.
Just don't forget to update scripts/loadData.py with your newly created app access key, and engine.json with your appId.
If you still have issues, I'm happy to help.
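As for the ClassCastException itself: pio train loads the object named by engineFactory in engine.json and casts it to io.prediction.controller.EngineFactory (that is the cast failing in the first stack frame). A minimal sketch of what com.biglabs.VanillaEngine needs to look like; the DataSource/Preparator/Algorithm/Serving names follow the vanilla template and are assumptions about your project:

package com.biglabs

import io.prediction.controller.{Engine, EngineFactory}

// pio train casts this object to EngineFactory, so it must extend it.
object VanillaEngine extends EngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],               // reads training events
      classOf[Preparator],               // turns events into algorithm input
      Map("algo" -> classOf[Algorithm]), // e.g. your k-means algorithm
      classOf[Serving])                  // post-processes predictions
  }
}

If VanillaEngine extends nothing, or was built against a different PredictionIO version than the pio-assembly jar on the classpath (0.9.4 here), you get exactly this cast failure.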

Related

java.lang.NoClassDefFoundError with spark-submit in yarn cluster mode, cluster set up using Ambari

I'm using the spark-submit command as below:
spark-submit --class com.example.hdfs.spark.RawDataAdapter --master yarn --deploy-mode cluster --jars /home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar home/hadoop/emr-spark-1.0-SNAPSHOT.jar hdfs://111.11.11.111:8020/user/hdfsinputfile.zip 8000
However, it gives me the error java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3, even though IParser3.class is present in emr-core-1.0-SNAPSHOT.jar. I don't understand why it throws that error. I tried several ways but couldn't succeed. How can I resolve this?
I am able to run the same command in client mode and also as a standalone spark application. Getting this error only when in yarn cluster mode.
Exception from container-launch.
Container id: container_e37_1526066605784_0014_02_000001
Exit code: 15
Container exited with a non-zero exit code 15. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
g.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126)
at com.example.hdfs.spark.utils.SimpleClassLoader.<init>(SimpleClassLoader.java:38)
at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71)
at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
18/05/14 14:00:13 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:55)
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:47)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244)
at scala.concurrent.Promise$class.tryFailure(Promise.scala:112)
at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:664)
Caused by: java.lang.NoClassDefFoundError: com/example/emr/parser/IParser3
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.findClass(SimpleClassLoader.java:152)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.lang.ClassLoader.defineClass(ClassLoader.java:642)
at com.example.hdfs.spark.utils.SimpleClassLoader.loadJarFile(SimpleClassLoader.java:126)
at com.example.hdfs.spark.utils.SimpleClassLoader.<init>(SimpleClassLoader.java:38)
at com.example.hdfs.spark.input.RawInputFormat.loadPlugins(RawInputFormat.java:71)
at com.example.hdfs.spark.RawDataAdapter.run(RawDataAdapter.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at com.example.hdfs.spark.RawDataAdapter.main(RawDataAdapter.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:646)
Failing this attempt. Failing the application.
Quoting from the Spark documentation:
http://spark.apache.org/docs/latest/running-on-yarn.html
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application.
So in cluster mode the jar is executed on whichever node YARN chooses, and a jar that exists only on the client machine will not be found. You can try these two approaches (an example command follows the link below):
1) Copy the dependency jar to each node.
2) Copy the jar to HDFS and reference it from there.
For more details, have a look at:
https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management
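For example (a sketch of option 2; the HDFS target path is made up), upload the dependency jar to HDFS and point --jars at it so every node can fetch it:

hdfs dfs -mkdir -p /user/hadoop/libs
hdfs dfs -put /home/hadoop/emr/deployment/server/emr-core-1.0-SNAPSHOT.jar /user/hadoop/libs/
spark-submit --class com.example.hdfs.spark.RawDataAdapter --master yarn --deploy-mode cluster --jars hdfs:///user/hadoop/libs/emr-core-1.0-SNAPSHOT.jar home/hadoop/emr-spark-1.0-SNAPSHOT.jar hdfs://111.11.11.111:8020/user/hdfsinputfile.zip 8000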

Running Pig NoClassDefFoundError

I'm trying to get pig running on my machine but whenever I try to start pig I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:642)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
This happens whenever I run pig or try to execute scripts that should work.
I'm not completely certain what is going on, but it looks like I'm not including some of the Hadoop jars correctly. Has anyone seen a similar issue, or does anyone know how to include the needed jars?
For reference, I'm using Apache Pig version 0.12.0-cdh5.4.9 and Hadoop 2.6.0-cdh5.4.9, and I have these environment variables set:
PIG_HOME=/Users/username/cdh5/pig-0.12.0-cdh5.4.9
PIG_CLASSPATH=/etc/hadoop/conf:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/*:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/lib/*
Do I need to find the Hadoop jars and add those to my path, or is there something else I should check?
This ended up being because I was setting CDH_MR2_HOME incorrectly, so Pig could not find some of the jars it needed.
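For example (the value is an assumption; point it at the Hadoop/MR2 directory of your CDH install, here reusing the path from PIG_CLASSPATH above):

export CDH_MR2_HOME=/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9

With that set correctly, the pig launcher script can locate the MapReduce jars it needs, and the JobConf class resolves.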

"mfp push" throws NullPointerException while deploying adapter (MobileFirst Platform 7.1)

Using MobileFirst Platform CLI version 7.1.0.00.20151227-1730, I suddenly get the following error when trying to push an update I made to an adapter:
Preparing for push...
Verifying Server Configuration...
Runtime 'localMFP' will be used to push the project into.
[Error:
BUILD FAILED
/Applications/IBM/MobileFirst-CLI/mobilefirst-cli/node_modules/generator-worklight-server/lib/build.xml:497: com.worklight.upgrader.UpgradeEngineException: java.lang.NullPointerException
at com.worklight.upgrader.WLUpgradeEngine.<init>(WLUpgradeEngine.java:142)
at com.worklight.upgrader.WLUpgradeEngine.<init>(WLUpgradeEngine.java:147)
at com.worklight.upgrader.ant.UpgraderTask.execute(UpgraderTask.java:100)
at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:292)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
at org.apache.tools.ant.Task.perform(Task.java:348)
at org.apache.tools.ant.Target.execute(Target.java:435)
at org.apache.tools.ant.Target.performTasks(Target.java:456)
at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1393)
at org.apache.tools.ant.Project.executeTarget(Project.java:1364)
at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
at org.apache.tools.ant.Project.executeTargets(Project.java:1248)
at org.apache.tools.ant.Main.runBuild(Main.java:851)
at org.apache.tools.ant.Main.startAnt(Main.java:235)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
Caused by: java.lang.NullPointerException
at java.text.MessageFormat.applyPattern(MessageFormat.java:436)
at java.text.MessageFormat.<init>(MessageFormat.java:362)
at java.text.MessageFormat.format(MessageFormat.java:840)
at com.worklight.upgrader.WLUpgradeEngine.findProjectVersion(WLUpgradeEngine.java:602)
at com.worklight.upgrader.WLUpgradeEngine.<init>(WLUpgradeEngine.java:133)
... 18 more
Total time: 3 seconds
]
Error: Sorry an error has occurred. Please check the stack above for details.
I have tried to clean up the project, remove what was already deployed, revert my changes to the state that last deployed successfully, and re-install the mfp CLI, but I still have the issue.
Any hint on what I could do to get rid of the exception?
Thanks!
The failure is coming from the upgrader code path, as if something is missing in the adapter files.
My suggestion is to create a new adapter and verify that it deploys, then start adding your code back until you find the failing part.
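For example (the adapter name is made up, and the exact 7.1 CLI syntax is an assumption; run it inside the project directory):

mfp add adapter TestAdapter
mfp push

If the fresh adapter pushes cleanly, diff it against the failing one (especially the adapter XML descriptor) to narrow down what the upgrader is choking on.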

Cannot install RHQ

I am trying to install RHQ version 4.12 according to the book.
It seems simple enough, but I cannot get it to work. The underlying JBoss AS does not seem to boot up, and the installation procedure stops with:
15:39:17,223 ERROR [org.rhq.enterprise.server.installer.Installer] The installer will now exit due to previous errors: java.lang.Exception: Cannot obtain client connection to the RHQ app server!!
at org.rhq.enterprise.server.installer.InstallerServiceImpl.testModelControllerClient(InstallerServiceImpl.java:1121) [rhq-installer-util-4.12.0.jar:4.12.0]
at org.rhq.enterprise.server.installer.InstallerServiceImpl.preInstall(InstallerServiceImpl.java:221) [rhq-installer-util-4.12.0.jar:4.12.0]
at org.rhq.enterprise.server.installer.InstallerServiceImpl.test(InstallerServiceImpl.java:146) [rhq-installer-util-4.12.0.jar:4.12.0]
at org.rhq.enterprise.server.installer.Installer.doInstall(Installer.java:90) [rhq-installer-util-4.12.0.jar:4.12.0]
at org.rhq.enterprise.server.installer.Installer.main(Installer.java:57) [rhq-installer-util-4.12.0.jar:4.12.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_65]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_65]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_65]
at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_65]
at org.jboss.modules.Module.run(Module.java:292) [jboss-modules.jar:1.3.0.Final-redhat-2]
at org.jboss.modules.Main.main(Main.java:455) [jboss-modules.jar:1.3.0.Final-redhat-2]
Caused by: java.io.IOException: java.net.ConnectException: JBAS012174: Could not connect to remote://127.0.0.1:9999. The connection failed
I found some threads over at the JBoss forum, but they did not offer any help; an example is https://developer.jboss.org/thread/230622
I am running this on a Vagrant machine with Ubuntu 12.04 LTS and Oracle Java 1.7.0_65-b17, and I attempt the installation by running bin/rhqctl install
Any idea what is going wrong?
Did you try to check the server.log file? It would be interesting to know why the server did not start (if it didn't).
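For example, from the RHQ install directory (the exact log location varies by layout, so this just searches for it):

find . -name 'server.log' | xargs tail -n 200

The lines just before the installer error usually show why the embedded JBoss AS never opened remote://127.0.0.1:9999.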

Pig and Jython - Can't Register UDF

I am trying to write a Python UDF; I am using the DataStax package for that. When I try to write a simple UDF such as:
@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'
And then register it in the grunt shell:
REGISTER 'pig.py' USING org.apache.pig.scripting.jython.JythonScriptEngine as myfuncs;
I get the following error:
ERROR 2998: Unhandled internal error. org/python/core/PyObject
java.lang.NoClassDefFoundError: org/python/core/PyObject
at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:304)
at org.apache.pig.PigServer.registerCode(PigServer.java:534)
at org.apache.pig.tools.grunt.GruntParser.processRegister(GruntParser.java:423)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:419)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: Class org.python.core.PyObject not found in modules [ModuleClassLoader:Ana$
at com.datastax.bdp.loader.SystemClassLoader.loadClass(SystemClassLoader.java:120)
at com.datastax.bdp.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:38)
at com.datastax.bdp.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:32)
... 14 more
Does anyone know what could be causing this error?
Add $PIG_HOME/lib/jython.jar to your PIG_CLASSPATH environment variable.
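For example (assuming the standard Pig layout, where jython.jar ships under $PIG_HOME/lib):

export PIG_CLASSPATH=$PIG_HOME/lib/jython.jar:$PIG_CLASSPATH

Then restart the grunt shell and run the REGISTER statement again; the JythonScriptEngine should now be able to load org.python.core.PyObject.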