Running Pig NoClassDefFoundError - apache-pig

I'm trying to get pig running on my machine but whenever I try to start pig I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:642)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
This happens whenever I run pig or when I try to execute scripts that should work.
I'm not completely certain what is going on, but it looks like I'm probably not including some of the Hadoop jars correctly. Has anyone seen a similar issue, or does anyone know how to include the needed jars?
For reference, I'm using Apache Pig 0.12.0-cdh5.4.9 and Hadoop 2.6.0-cdh5.4.9, and I have these environment variables set:
PIG_HOME=/Users/username/cdh5/pig-0.12.0-cdh5.4.9
PIG_CLASSPATH=/etc/hadoop/conf:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/*:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/lib/*
Do I need to find the Hadoop jars and add those to my path, or is there something else I should check?

This ended up being because I was setting CDH_MR2_HOME incorrectly, so Pig could not find some of the jars it needed.
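For anyone who hits the same thing: the fix amounts to pointing the CDH-related variables at the actual Hadoop install so Pig can pick up the MapReduce jars. A minimal sketch, assuming a tarball layout like the paths above (the exact directories, and pointing CDH_MR2_HOME at the Hadoop home, are assumptions to adapt to your own install):
export HADOOP_HOME=/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9
# CDH's pig wrapper uses CDH_MR2_HOME to locate the MapReduce/YARN jars (assumed to be the Hadoop home here).
export CDH_MR2_HOME=$HADOOP_HOME
export PIG_HOME=/Users/username/cdh5/pig-0.12.0-cdh5.4.9
export PIG_CLASSPATH=/etc/hadoop/conf
export PATH=$PIG_HOME/bin:$HADOOP_HOME/bin:$PATH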

Related

Pentaho commandline nullpointer exception

Pentaho PDI version 8.3.0 CE, if it matters.
When I try to run a job or transformation from the command line using Kitchen or Pan, respectively, I get a NullPointerException. This happens only when trying to run something from a repository.
When I try to run the same transformation or job from Spoon, all is fine and the job runs great.
I use the following commands, which both produce the same error:
./pan.sh -trans=get_clusters -rep=myrepo -user=admin -pass=mypass -dir=/Transformations
and
./kitchen.sh -job=scheduled_update_job -rep=myrepo -user=admin -pass=mypass -dir=/Jobs
NOTE: This error also happens when I try to run the job or transformation from a Docker container.
The error I receive is as follows and is identical for Pan and Kitchen:
2020/02/05 09:07:56 - Pan - Start of run.
Processing has stopped because of an error: null
java.lang.NullPointerException
at org.pentaho.di.core.plugins.PluginRegistry.getPluginId(PluginRegistry.java:689)
at org.pentaho.di.core.plugins.PluginRegistry.getPlugin(PluginRegistry.java:715)
at org.pentaho.di.core.plugins.PluginRegistry.loadClass(PluginRegistry.java:370)
at org.pentaho.di.base.AbstractBaseCommandExecutor.establishRepositoryConnection(AbstractBaseCommandExecutor.java:195)
at org.pentaho.di.pan.PanCommandExecutor.execute(PanCommandExecutor.java:119)
at org.pentaho.di.pan.Pan.main(Pan.java:270)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
Any help would be appreciated.
Run the job from your home directory (as working directory) using the full path of pan.sh or kitchen.sh.
I'm not sure what exactly causes the trouble. Likely causes:
Your KETTLE_HOME is not valid, causing Pentaho to look for .kettle in the working directory. (KETTLE_HOME should point to the directory that contains .kettle, not to .kettle itself.)
A variant of this is that you don't have permissions on the files because you copied or moved them as root.
Your user does not have write access to the data-integration directory, causing a failure when writing a configuration file that would normally go into the working directory. It is normal to run Pentaho with an account that has no write access there; that is not the problem in itself, it just doesn't like a non-writable working directory.
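Concretely, that means something along these lines (a sketch; /opt/data-integration and the KETTLE_HOME path are assumed locations, so substitute your own):
# KETTLE_HOME must point at the directory that contains .kettle, not at .kettle itself.
export KETTLE_HOME=/home/pentaho
# Run from a writable working directory (e.g. your home) and call the script by its full path.
cd ~
/opt/data-integration/pan.sh -trans=get_clusters -rep=myrepo -user=admin -pass=mypass -dir=/Transformations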

java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()

I'm using Hadoop 3.2.1 and Hive 2.3.6.
When I run show databases, I get the following error:
hive> show databases;
Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
What does it mean, and why do I get this error? Please clarify.
Thanks in advance.
According to the release page [1], Hive 2.3.x works with Hadoop 2.x.y (not 3.x.y), so if you want to run with Hadoop 3.2.1, try a newer Hive version.
Other than that, the error looks like a classpath problem related to Guava. I guess you have one Guava version coming from Hive and another coming from Hadoop. Try removing one of them. For instance:
cd apache-hive-2.3.3-bin/lib
rm guava*
Even if you solve the problem above, you will most likely bump into another, so it is better to choose versions that are compatible.
[1] https://hive.apache.org/downloads.html
Please upgrade to apache-hive-3.1.2 if that is an option for you. I had the exact same issue, and it was resolved by the upgrade. Another option would be to compare the lib folder of Hive 2.3.6 with that of Hive 3.1.2; this issue is primarily due to incompatible jars.
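If you go the comparison route, a quick sketch of doing that on Linux (directory names are illustrative and assume both distributions are unpacked side by side):
# Show which bundled jars differ between the two Hive versions; Guava is the usual suspect.
diff <(ls apache-hive-2.3.6-bin/lib) <(ls apache-hive-3.1.2-bin/lib) | grep -i guava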

Create DataFrame issue in PySpark on Windows 10

I am unable to execute the below command from PySpark on Windows:
schemaPeople = spark.createDataFrame(people)
I have set HADOOP_HOME to the winutils directory.
I have given 777 permissions to C:/tmp/hive.
Still I am getting the below error:
Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
I have gone through a lot of similar questions before posting this; I'd appreciate any help here.
I got this error a lot when trying to set up Spark on Windows using the winutils file. I had to set up Spark differently to get around it.
I ended up downloading the Hadoop binary for my version of Spark and going from there. I documented the whole thing with a walkthrough if you are interested: Spark on Windows.
The gist is that the official Hadoop release from Apache does not include a Windows binary, and compiling from source can be tedious, so some really helpful people have made compiled distributions available. If you want to use Spark 2.0.2, download the binaries from Steve Loughran's GitHub; for 2.1.0 you can download from here. From there you should be able to set it up as expected.
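Either way, once you have a Hadoop binary distribution (or at least bin\winutils.exe) unpacked, the environment setup in a Windows command prompt is roughly as follows (a sketch; C:\hadoop is an assumed install location, and the scratch directory comes from the question above):
:: Assumed location of the unpacked Hadoop binaries containing bin\winutils.exe.
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
:: Spark's Hive support needs a writable scratch directory.
winutils.exe chmod 777 C:\tmp\hive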

PredictionIO - Error when training KMeans clustering

I followed the guidance below to train and deploy KMeans clustering.
But I got an error with pio train:
[WARN] [Template$] template.json does not exist. Template metadata will not be available. (This is safe to ignore if you are not working on a template.)
[INFO] [Runner$] Submission command: /home/lavalamp/PredictionIO/vendors/spark-1.4.1/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --jars file:/home/lavalamp/PredictionIO/MyKmeans/target/scala-2.10/template-scala-parallel-vanilla_2.10-0.1-SNAPSHOT.jar,file:/home/lavalamp/PredictionIO/MyKmeans/target/scala-2.10/template-scala-parallel-vanilla-assembly-0.1-SNAPSHOT-deps.jar --files file:/home/lavalamp/PredictionIO/conf/log4j.properties --driver-class-path /home/lavalamp/PredictionIO/conf file:/home/lavalamp/PredictionIO/lib/pio-assembly-0.9.4.jar --engine-id gYCE4NX4ODPQkryp9Jq9by3OEXxa4fxQ --engine-version b972fa8f340c142fb6dffbebc6d276b3bb32eeda --engine-variant file:/home/lavalamp/PredictionIO/MyKmeans/engine.json --verbosity 0 --json-extractor Both
--env PIO_ENV_LOADED=1,PIO_STORAGE_SOURCES_MYSQL_PASSWORD=123456,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/lavalamp/.pio_store,PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://192.168.1.73/pio,PIO_HOME=/home/lavalamp/PredictionIO,
PIO_FS_ENGINESDIR=/home/lavalamp/.pio_store/engines,PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL,
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_MYSQL_USERNAME=root,PIO_FS_TMPDIR=/home/lavalamp/.pio_store/tmp,
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL,
PIO_CONF_DIR=/home/lavalamp/PredictionIO/conf
Exception in thread "main" java.lang.ClassCastException: com.biglabs.VanillaEngine$ cannot be cast to io.prediction.controller.EngineFactory
at io.prediction.workflow.WorkflowUtils$.getEngine(WorkflowUtils.scala:69)
at io.prediction.workflow.CreateWorkflow$.liftedTree1$1(CreateWorkflow.scala:193)
at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:192)
at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Can anyone help me with this issue?
Try this solution: https://github.com/singsanj/KMeans-parallel-template
Hope this solves your issue.
Just don't forget to update scripts/loadData.py with your newly created app access key, and engine.json with your appId.
If you still have issues, I'm happy to help.
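For context, the ClassCastException above means that com.biglabs.VanillaEngine does not extend io.prediction.controller.EngineFactory, so also make sure the engineFactory entry in engine.json points at an object that does. A sketch of the relevant engine.json fields in a typical template layout (the appId value is a placeholder):
{
  "id": "default",
  "engineFactory": "com.biglabs.VanillaEngine",
  "datasource": {
    "params": {
      "appId": 1
    }
  }
}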

hive 0.10.0 Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B

Could you help me? I am using Hive 0.10.0.
hive> show tables;
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
at org.apache.hadoop.hive.ql.plan.api.Query.setStartedIsSet(Query.java:487)
at org.apache.hadoop.hive.ql.plan.api.Query.setStarted(Query.java:474)
at org.apache.hadoop.hive.ql.QueryPlan.updateCountersInQueryPlan(QueryPlan.java:309)
at org.apache.hadoop.hive.ql.QueryPlan.getQueryPlan(QueryPlan.java:450)
at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:622)
at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1097)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
This issue occurs because of an incompatible "libthrift" jar version. I downloaded the latest libthrift-0.9.3.jar, and it worked for me.
I faced a similar issue. The version of Hive used is not compatible with Hadoop: the Thrift version used by Hadoop is different from the one used by Hive. It is best to use a compatible version of Hive, or to replace the Thrift jar used by Hadoop with the one used by Hive.
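A quick way to confirm that Hive and Hadoop are pulling in different Thrift versions is to list the libthrift jars on both sides, for example (HIVE_HOME and HADOOP_HOME are assumed to point at your installs):
# List every libthrift jar that Hive and Hadoop put on the classpath.
find $HIVE_HOME/lib $HADOOP_HOME/lib -name 'libthrift*.jar'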
When I faced this problem, this was my situation:
I had placed mahout-examples-0.7-job.jar in HADOOP_HOME/lib for some other exercises, which is not where it is supposed to be.
When I ran Hive, it threw the same error as in your question.
I removed that jar from lib, then started the Hive CLI, and it worked like a charm.