Spark SQL - Adding partition in hive exception - apache-spark-sql

Getting the exception below while trying to run a Spark SQL statement that adds a partition to a Hive table:
diagnostics: User class threw exception: java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/hadoop/hadoop/yarn/local/filecache/11415/spark-hdp-assembly.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/hadoop/hadoop/yarn/local/filecache/11415/spark-hdp-assembly.jar!/javax/ws/rs/ext/RuntimeDelegate.class
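For context (not stated in the original post): this LinkageError means javax.ws.rs.ext.RuntimeDelegate is being loaded by two different classloaders, both from the same Spark assembly jar. A commonly reported workaround on HDP, offered here as an assumption rather than a confirmed fix for this post, is to stop Spark's YARN client from initializing the timeline-service (ATS) code that triggers the jersey class clash:

# hedged workaround sketch (the application jar is a placeholder)
spark-submit \
  --conf spark.hadoop.yarn.timeline-service.enabled=false \
  your-app.jar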

Related

Unable to select count of rows of an ORC table through Hive Beeline command

I am using the following components: Hadoop 3.1.4, Hive 3.1.3 and Tez 0.9.2.
There is an ORC table from which I am trying to get the count of rows: select count(*) from ORC_TABLE. This throws the set of exceptions below:
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1670915386694_0182_1_00, diagnostics=[Vertex vertex_1670915386694_0182_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: jio_ar_consumer_events initializer failed, vertex=vertex_1670915386694_0182_1_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1851)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:519)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:765)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
Caused by: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1790)
... 17 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
at org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:1117)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
at java.util.TimSort.sort(TimSort.java:220)
at java.util.Arrays.sort(Arrays.java:1512)
at java.util.ArrayList.sort(ArrayList.java:1464)
at java.util.Collections.sort(Collections.java:177)
at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1115)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:1207)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$1500(OrcInputFormat.java:1142)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:1179)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:1176)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:1176)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:1142)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
There is another article where the same problem has been described (ORC Split Generation issue with Hive Table), but there isn't any solution there as such yet.
I also tried running CONCATENATE (ALTER TABLE ... CONCATENATE) on the ORC table, but that didn't help either.
What does work is select * from ORC_TABLE, with or without LIMIT; that fetches the records fine. I reckon the issue must be limited to aggregate functions, or maybe I don't fully understand it yet.
I am also using Spark 3.3.1, and through the Spark SQL API I can extract the same count and fetch the rows as well; no issues on the Spark front.
Adding to that, when I change the execution engine to MR, the query works. It fails only when I run it on the Tez engine.
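For reference, pinning the session to the MR engine looks like this (the JDBC URL is a placeholder for your HiveServer2 endpoint):

# run the same query on the MR engine as a stopgap
beeline -u jdbc:hive2://localhost:10000 \
  -e "SET hive.execution.engine=mr; SELECT COUNT(*) FROM ORC_TABLE;"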
Any leads to resolve this issue are much appreciated.
The issue was resolved by the steps below, based on my earlier analysis:
The class org.apache.hadoop.fs.FileStatus ships as part of the hadoop-common jar.
We were using Hadoop 3.1.4 and Tez 0.9.2.
Tez 0.9.2 ships a tez.tar.gz that needs to be placed in an HDFS location. That archive bundles hadoop-common-2.7.2.jar, which does not have the compareTo(FileStatus) method named in the exception.
Solution 1:
We extracted tez.tar.gz and replaced all Hadoop 2.7.2 jars with their Hadoop 3.1.4 counterparts. Do this if you don't want to reconfigure with a new Tez version; otherwise follow Solution 2 below.
We then recreated the tar and placed it in all dependent locations, including HDFS. For us that was /user/tez/share/tez.tar.gz; the path varies by setup. A sketch of the repack follows.
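A minimal sketch of the repack; every path below is a placeholder for your own layout:

# unpack the Tez archive shipped with 0.9.2
mkdir tez-repack && cd tez-repack
tar -xzf /path/to/tez-0.9.2/share/tez.tar.gz
# swap the bundled Hadoop 2.7.2 jars for the cluster's 3.1.4 jars
rm lib/hadoop-*2.7.2*.jar
cp /path/to/hadoop-3.1.4/share/hadoop/common/hadoop-common-3.1.4.jar lib/   # repeat for the other hadoop-* jars
# rebuild the archive and push it back to the HDFS location Tez reads from
tar -czf ../tez.tar.gz .
hdfs dfs -put -f ../tez.tar.gz /user/tez/share/tez.tar.gz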
After I followed these steps the error disappeared, and I am now able to count records on any table.
Solution 2:
The easier alternative is to use a Tez 0.10.x version, which bundles libraries for Hadoop 3.x, rather than Tez 0.9.2, which is compatible with Hadoop 2.7.x.
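A sketch of that route (the version and paths are placeholders; tez.lib.uris is the standard Tez property that points at the archive):

# stage the Hadoop-3-compatible Tez archive on HDFS
hdfs dfs -put -f apache-tez-0.10.2-bin/share/tez.tar.gz /user/tez/share/tez.tar.gz
# tez-site.xml must then point at it, e.g.:
#   tez.lib.uris=${fs.defaultFS}/user/tez/share/tez.tar.gz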

How to fix 'get table schema from server' error in Hive

I'm using the Microsoft Hive ODBC driver to connect to a Hive server. An error occurs when I execute select * from tb limit 100 against a table tb stored as CSV with a partition key. Tables without a partition key execute successfully.
ERROR [HY000] [Microsoft][Hardy] (97) Error occurred while trying to
get table schema from server. Error: [Microsoft][Hardy] (35) Error
from server: error code: '0' error message:
'MetaException(message:java.lang.UnsupportedOperationException:
Storage schema reading not supported)'.
Add the configuration below under "Custom hive-site":
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
It worked for me.
Note: Restart affected services after saving the configuration.
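If hive-site.xml is managed by hand rather than through an Ambari-style "Custom hive-site" panel (an assumption about your setup), the equivalent property element on the metastore host would be:

<!-- inside the <configuration> element of hive-site.xml -->
<property>
  <name>metastore.storage.schema.reader.impl</name>
  <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>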

Error while reviewing file after inserting data in redshift table

I have a table in Redshift in which I am inserting data from S3.
I viewed the table before inserting the data and it returned a blank table.
However, after inserting data into the Redshift table, I am getting the error below while doing select * from table.
The command that copies data into the table from S3 runs successfully without any error.
java.lang.NoClassDefFoundError: com/amazon/jdbc/utils/DataTypeUtilities$NumericRepresentation error in redshift
What could be the possible cause and solution for this?
I have faced this java.lang.NoClassDefFoundError when the JDBC connection properties were set incorrectly.
If you are using the PostgreSQL driver, make sure the URL uses the jdbc:postgresql:// prefix,
e.g. jdbc:postgresql://HostName:5439/
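For comparison, the two URL styles (hostname, port and database below are placeholders):

# Redshift's own JDBC driver (class com.amazon.redshift.jdbc42.Driver):
jdbc:redshift://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev
# PostgreSQL driver (class org.postgresql.Driver):
jdbc:postgresql://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev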
Let me know if this works.

not able to select the data from a table in hive

When I create a table in Hive I get OK, and I also get OK while inserting values through a text file. But while selecting the values from that table I get the error message below:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.unset(Ljava/lang/String;)
Due to version inconsistencies, you need to switch to a newer Hadoop 2.x.x release.
https://stackoverflow.com/questions/27842004/issue-in-apache-hive-0-14-running-dml-queries
You can check which version of Hadoop works with which version of Hive on the Hive download page.
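To confirm what is actually on the classpath, both CLIs report their versions:

# print the Hadoop and Hive versions in use
hadoop version
hive --version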

accessing Views created in Hive using HcatLoader in Pig

I was trying something with Hive and HCatLoader in Pig. I created a view in Hive and then tried to load it into Pig using HCatLoader, but it does not seem to work. I just wanted to confirm: is there any way to do this? I get the following error when I try to load the view in Pig using HCatLoader:
events = LOAD 'ViewName' USING org.apache.hcatalog.pig.HCatLoader();
DUMP events;
When I use a table name instead of a view, it seems to work. Further, there is no metastore error: the LOAD statement reports connecting to the metastore successfully, and only at DUMP does it crash with the following error.
Any pointers will be helpful.
Thanks,
Atul
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias events
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias events
at org.apache.pig.PigServer.storeEx(PigServer.java:956)
at org.apache.pig.PigServer.store(PigServer.java:919)
at org.apache.pig.PigServer.openIterator(PigServer.java:832)
... 12 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:731)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:259)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
at org.apache.pig.PigServer.storeEx(PigServer.java:952)
Here is the response I received when I posted this on another forum:
"HCatLoader does not support reading views in Hive. The issue is that a view is defined as a query on a table (create view V as select x, y from t).
Pig doesn't speak SQL,
and
HCat doesn't contain Hive's execution engine
so it cannot execute the query either. Reading Hive views from Pig and MR will require much tighter integration of the products than we currently have."
I found the same issue the hard way today. HCatLoader cannot read Hive views (and it lacks good exception-handling code on this point).
For the record (for anybody else falling into this problem), this is how the current version behaves: on Hortonworks 2.3 with Pig 0.15 I only got the following error in the log:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error
creating job configuration.
Pig fails this way because there is no file to load (we attempted to load from a view).
Since Pig loads data from files in Hadoop, reading from a view, which has no physical file, cannot work.
Maybe if we could manage to create a file for the view in Hadoop, Pig would be able to load it; at least a virtual pointer file to the actual data. Not sure if this is possible or has been thought through, but a workaround sketch along those lines is below.
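A hedged workaround sketch, not from the original thread; the table and alias names are illustrative:

# 1) materialize the view into a real table so it has physical files
hive -e "CREATE TABLE viewname_snapshot AS SELECT * FROM ViewName;"
# 2) load the materialized table from Pig instead of the view
pig -useHCatalog -e "events = LOAD 'viewname_snapshot' USING org.apache.hcatalog.pig.HCatLoader(); DUMP events;"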