Create DataFrame issue in PySpark on Windows 10

I am unable to execute the command below from PySpark on Windows:
schemaPeople = spark.createDataFrame(people)
I have set HADOOP_HOME to the winutils location
I have given 777 permissions to C:/tmp/hive
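For reference, that setup amounted to commands along these lines (the install path here is illustrative):
REM HADOOP_HOME points at the folder whose bin\ contains winutils.exe
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
REM grant full permissions on the Hive scratch directory
winutils.exe chmod 777 C:\tmp\hive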
Still I am getting the error below:
Py4JJavaError: An error occurred while calling o23.applySchemaToPythonRDD.
: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
I have gone through a lot of similar questions before posting this; I'd appreciate any help here.

I got this error a lot when trying to set up Spark on Windows using the winutils file. I had to set up Spark differently to get around it.
I ended up downloading the Hadoop binaries for my version of Spark and going from there. I documented the whole thing in a walkthrough if you are interested: Spark on Windows.
The gist is that the official Hadoop release from Apache does not include Windows binaries, and compiling from source can be tedious, so some really helpful people have made compiled distributions available. If you want to use Spark 2.0.2, download the binaries from Steve Loughran's GitHub; for 2.1.0 you can download from here. From there you should be able to set it up as expected.
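Once the prebuilt binaries are unpacked, the environment setup is roughly this (a sketch; both install paths are assumptions):
REM point HADOOP_HOME at the unpacked Windows Hadoop binaries
set HADOOP_HOME=C:\hadoop
REM point SPARK_HOME at the unpacked Spark distribution
set SPARK_HOME=C:\spark-2.1.0-bin-hadoop2.7
set PATH=%PATH%;%HADOOP_HOME%\bin;%SPARK_HOME%\bin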

java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()

I'm using Hadoop 3.2.1 and Hive 2.3.6.
When I run show databases, I get the following error:
hive> show databases;
Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator; from class org.apache.hadoop.hive.ql.exec.FetchOperator
at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:541)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
What does it mean, and why do I get this error? Please clarify.
Thanks in advance.
According to the release page [1], Hive 2.3.3 works with Hadoop 2.x.y (not 3.x.y), so if you want to run with Hadoop 3.2.1, try a newer version.
Other than that, the error looks like a classpath problem related to Guava: I guess you have one Guava version coming from Hive and another coming from Hadoop. Try removing one of them. For instance:
cd apache-hive-2.3.3-bin/lib
rm guava*
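In practice, removing Hive's Guava usually needs to be paired with copying Hadoop's newer Guava into Hive's lib directory; a sketch, where the exact jar name depends on your Hadoop version:
cd apache-hive-2.3.6-bin/lib
rm guava-*.jar
cp $HADOOP_HOME/share/hadoop/common/lib/guava-*.jar .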
Even if you solve the problem above, you will most likely bump into another, so it is better to choose versions that are compatible.
[1] https://hive.apache.org/downloads.html
Please upgrade to apache-hive-3.1.2 if that is an option for you. I had the exact same issue, and it was resolved by the upgrade. Another option would be to compare the lib folder of Hive 2.3.6 with that of Hive 3.1.2; this issue is primarily due to incompatible jars.

Running Pig NoClassDefFoundError

I'm trying to get Pig running on my machine, but whenever I try to start Pig I get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
at org.apache.pig.Main.run(Main.java:642)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
This happens whenever I run pig or try to execute scripts that should work.
I'm not completely certain what is going on, but it looks like I'm not including some of the Hadoop jars correctly. Has anyone seen a similar issue, or does anyone know how to include the needed jars?
For reference, I'm using Apache Pig version 0.12.0-cdh5.4.9 and Hadoop 2.6.0-cdh5.4.9, and I have these environment variables set:
PIG_HOME=/Users/username/cdh5/pig-0.12.0-cdh5.4.9
PIG_CLASSPATH=/etc/hadoop/conf:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/*:/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9/lib/*
Do I need to find the Hadoop jars and add them to my path, or is there something else I should check?
This ended up being because I was setting CDH_MR2_HOME incorrectly, so Pig could not find some of the jars it needed.
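For anyone hitting the same symptom, the fix amounted to pointing that variable at the right directory; a sketch, with a purely illustrative path:
# hypothetical location; use wherever your CDH MR2 install actually lives
export CDH_MR2_HOME=/Users/username/cdh5/hadoop-2.6.0-cdh5.4.9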

Can't start IntelliJ IDEA 14.1.1

I have installed IntelliJ IDEA 14.1.1.
After finishing the setup wizard on first run, I got this error:
Internal error. Please report to https://youtrack.jetbrains.com
java.lang.RuntimeException: com.intellij.ide.plugins.PluginManager$StartupAbortedException: Fatal error initializing 'org.intellij.images.fileTypes.impl.ImageFileTypeManagerImpl'
at com.intellij.idea.IdeaApplication.run(IdeaApplication.java:178)
at com.intellij.idea.MainImpl$1$1$1.run(MainImpl.java:52)
at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:312)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:738)
at java.awt.EventQueue.access$300(EventQueue.java:103)
at java.awt.EventQueue$3.run(EventQueue.java:699)
at java.awt.EventQueue$3.run(EventQueue.java:697)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:708)
at com.intellij.ide.IdeEventQueue.dispatchEvent(IdeEventQueue.java:362)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:161)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:150)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:146)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:138)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:91)
Caused by: com.intellij.ide.plugins.PluginManager$StartupAbortedException: Fatal error initializing 'org.intellij.images.fileTypes.impl.ImageFileTypeManagerImpl'
at com.intellij.ide.plugins.PluginManager.handleComponentError(PluginManager.java:248)
at com.intellij.openapi.fileTypes.impl.FileTypeManagerImpl.initStandardFileTypes(FileTypeManagerImpl.java:273)
at com.intellij.openapi.fileTypes.impl.FileTypeManagerImpl.<init>(FileTypeManagerImpl.java:230)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.picocontainer.defaults.InstantiatingComponentAdapter.newInstance(InstantiatingComponentAdapter.java:193)
at org.picocontainer.defaults.ConstructorInjectionComponentAdapter$1.run(ConstructorInjectionComponentAdapter.java:220)
at org.picocontainer.defaults.ThreadLocalCyclicDependencyGuard.observe(ThreadLocalCyclicDependencyGuard.java:53)
at org.picocontainer.defaults.ConstructorInjectionComponentAdapter.getComponentInstance(ConstructorInjectionComponentAdapter.java:248)
at com.intellij.util.pico.ConstructorInjectionComponentAdapter.getComponentInstance(ConstructorInjectionComponentAdapter.java:58)
at com.intellij.openapi.components.impl.ComponentManagerImpl$ComponentConfigComponentAdapter$1.getComponentInstance(ComponentManagerImpl.java:550)
at com.intellij.openapi.components.impl.ComponentManagerImpl$ComponentConfigComponentAdapter.getComponentInstance(ComponentManagerImpl.java:610)
at com.intellij.util.pico.DefaultPicoContainer.getLocalInstance(DefaultPicoContainer.java:245)
at com.intellij.util.pico.DefaultPicoContainer.getComponentInstance(DefaultPicoContainer.java:211)
at com.intellij.openapi.components.impl.ComponentManagerImpl.createComponent(ComponentManagerImpl.java:125)
at com.intellij.openapi.application.impl.ApplicationImpl.createComponent(ApplicationImpl.java:359)
at com.intellij.openapi.components.impl.ComponentManagerImpl.createComponents(ComponentManagerImpl.java:116)
at com.intellij.openapi.components.impl.ComponentManagerImpl.init(ComponentManagerImpl.java:87)
at com.intellij.openapi.components.impl.stores.ApplicationStoreImpl.load(ApplicationStoreImpl.java:101)
at com.intellij.openapi.application.impl.ApplicationImpl.load(ApplicationImpl.java:504)
at com.intellij.openapi.application.impl.ApplicationImpl.load(ApplicationImpl.java:486)
at com.intellij.idea.IdeaApplication.run(IdeaApplication.java:170)
... 16 more
Caused by: java.util.ServiceConfigurationError: javax.imageio.spi.ImageWriterSpi: Provider com.sun.media.imageioimpl.plugins.jpeg.CLibJPEGImageWriterSpi could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at javax.imageio.spi.IIORegistry.registerApplicationClasspathSpis(IIORegistry.java:210)
at javax.imageio.spi.IIORegistry.<init>(IIORegistry.java:138)
at javax.imageio.spi.IIORegistry.getDefaultInstance(IIORegistry.java:159)
at javax.imageio.ImageIO.<clinit>(ImageIO.java:65)
at org.intellij.images.fileTypes.impl.ImageFileTypeManagerImpl.createFileTypes(ImageFileTypeManagerImpl.java:80)
at com.intellij.openapi.fileTypes.impl.FileTypeManagerImpl.initStandardFileTypes(FileTypeManagerImpl.java:270)
... 38 more
Caused by: java.lang.IllegalArgumentException: vendorName == null!
at javax.imageio.spi.IIOServiceProvider.<init>(IIOServiceProvider.java:76)
at javax.imageio.spi.ImageReaderWriterSpi.<init>(ImageReaderWriterSpi.java:231)
at javax.imageio.spi.ImageWriterSpi.<init>(ImageWriterSpi.java:213)
at com.sun.media.imageioimpl.plugins.jpeg.CLibJPEGImageWriterSpi.<init>(CLibJPEGImageWriterSpi.java:84)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 45 more
Note:
Source: https://www.jetbrains.com/idea/download/
There are similar problems to be found on Google, but mine is different from them.
I'm using JDK 7 on Ubuntu 14.04 LTS.
Simply deleting:
~/Library/Java/Extensions
(which contains the offending JAI jar) does the job on a Mac.
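If you would rather keep the jars around, moving the directory aside instead of deleting it also works (a sketch):
mv ~/Library/Java/Extensions ~/Library/Java/Extensions.bak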
Relevant issues:
https://youtrack.jetbrains.com/issue/IDEA-137147
http://youtrack.jetbrains.com/issue/IDEA-139178
A similar problem on OS X seems to have as its root cause the commonly used JAI extension jars being misconfigured with a null vendorName. If you need the JAI extensions for some reason (because, for example, your local GeoServer upgrade re-installs them in ~/Library/Java/Extensions), then a workaround is to create a ~/Library/Preferences/IdeaIC14/idea.vmoptions file that contains ONLY:
-Djava.ext.dirs=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/ext
That seems to override OS X Java 6's default behavior of checking the user's ~/Library/Java/Extensions directory in addition to the system's extensions on application startup.
If you are using some other JRE version or operating system, then adjusting those paths, or making sure that the Java extension directory used by the application that needs JAI is different from the one IntelliJ uses, may help. At worst you might install two JDKs/JREs and give each application a different JAVA_HOME value.
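Creating that file from a shell would look something like this (a sketch; the JDK path is the Java 6 one from above and may differ on your machine):
mkdir -p ~/Library/Preferences/IdeaIC14
echo '-Djava.ext.dirs=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/ext' > ~/Library/Preferences/IdeaIC14/idea.vmoptions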

Cannot use jpenable

I am stuck at square one trying to profile my app using JProfiler. I am trying to use "jpenable" on RHEL6, but when I select my VM, it simply crashes, as follows:
Please select the profiling mode:
GUI mode (attach with JProfiler GUI) [1, Enter]
Offline mode (use config file to set profiling settings) [2]
1
Please enter a profiling port
[31757]
java.lang.NoSuchMethodError: javax.xml.parsers.SAXParserFactory.setSchema(Ljavax/xml/validation/Schema;)V
at org.jdom2.input.sax.XMLReaders.<init>(XMLReaders.java:124)
at org.jdom2.input.sax.XMLReaders.<clinit>(XMLReaders.java:95)
at org.jdom2.input.SAXBuilder.<init>(SAXBuilder.java:338)
at org.jdom2.input.SAXBuilder.<init>(SAXBuilder.java:221)
at com.jprofiler.a.h.a(ejt:500)
<snip so StackOverflow allows the post>
at com.jprofiler.a.i.g.a(ejt:38)
at com.jprofiler.frontend.attach.c.a(ejt:243)
at com.jprofiler.frontend.EnableApplication.a(ejt:118)
at com.jprofiler.frontend.EnableApplication.g(ejt:81)
at com.jprofiler.frontend.EnableApplication.main(ejt:238)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.exe4j.runtime.LauncherEngine.launch(Unknown Source)
at com.install4j.runtime.launcher.Launcher.main(Unknown Source)
Any suggestions on what to do? I installed JProfiler 8.0.5 (the current version) from the RPM and simply entered the jpenable command. Everything else is shown above. This is a rather old RHEL6 image, but I cannot upgrade to a more recent one.
I am at a loss for what to try next.
Any help would be greatly appreciated,
Mike
This is caused by a conflicting version of one of the following files:
xml-apis.jar
xmlParserAPIs.jar
xercesImpl-x.x.x.jar
jaxp-api.jar
in the extension directory ($JAVA_HOME/lib/ext) of the JRE. You would have to remove that file.
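A quick way to spot the offender (a sketch; jar file names vary by version):
ls "$JAVA_HOME"/lib/ext | grep -Ei 'xml-apis|xmlParserAPIs|xercesImpl|jaxp'
# move any match out of the extension directory, e.g.:
mv "$JAVA_HOME"/lib/ext/xml-apis.jar /tmp/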

Hive 0.10.0: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B

Could you help me? I am using Hive 0.10.0:
hive> show tables;
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
at org.apache.hadoop.hive.ql.plan.api.Query.setStartedIsSet(Query.java:487)
at org.apache.hadoop.hive.ql.plan.api.Query.setStarted(Query.java:474)
at org.apache.hadoop.hive.ql.QueryPlan.updateCountersInQueryPlan(QueryPlan.java:309)
at org.apache.hadoop.hive.ql.QueryPlan.getQueryPlan(QueryPlan.java:450)
at org.apache.hadoop.hive.ql.QueryPlan.toString(QueryPlan.java:622)
at org.apache.hadoop.hive.ql.history.HiveHistory.logPlanProgress(HiveHistory.java:503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1097)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:973)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:893)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
This issue arises because of an incompatible "libthrift" jar version. I downloaded the latest libthrift-0.9.3.jar, and it worked for me.
I faced a similar issue. The version of Hive used is not compatible with Hadoop: the Thrift version used by Hadoop differs from the one used by Hive. It is best to use a compatible version of Hive, or to replace the Thrift jar used by Hadoop with the one used by Hive.
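To check whether the two sides actually disagree, comparing the bundled jars is a reasonable first step (a sketch, assuming HADOOP_HOME and HIVE_HOME are set):
find "$HADOOP_HOME" -name 'libthrift*.jar'
find "$HIVE_HOME" -name 'libthrift*.jar'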
When I faced this problem, this was my situation:
For some other exercises I had placed mahout-examples-0.7-job.jar in HADOOP_HOME/lib, where it is not supposed to be.
When I ran Hive, it threw the same error as in your question.
I moved the Mahout jar out of lib, then started the Hive CLI, and it worked like a charm.
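The fix, as a one-liner using the jar named above (the destination is arbitrary):
mv $HADOOP_HOME/lib/mahout-examples-0.7-job.jar ~/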