Java security exception: checksum failed - authentication

I am using http://webmoli.com/2009/08/29/single-sign-on-in-java-platform/
for SSO in Java.
My KDC is a Windows Server 2008 machine, on which I created the SPN with the setspn command for the user testsso, and I am using testsso@MYDOMAIN.COM as the principal in jaas.conf.
Tomcat runs on a Windows 7 machine (inside the AD domain). There I created a servlet version of the JSP from the webmoli article.
I send the browser request for that servlet from a third machine, Windows XP (also inside the AD domain).
But I get a "checksum failed" error. The stack trace is as follows:
Auth is :: Negotiate Token is YIIE9wYGKwYBBQUCoIIE6zCCBOegJDAiBgkqhkiC9xIBAgIGCSqGSIb3EgECAgYKKwYBBAGCNwICCqKCBL0EggS5YIIEtQYJKoZIhvcSAQICAQBuggSkMIIEoKADAgEFoQMCAQ6iBwMFACAAAACjggPQYYIDzDCCA8igAwIBBaEKGwhDT1JFLkNPTaIkMCKgAwIBAqEbMBkbBEhUVFAbEXBjdml0YTI1LmNvcmUuY29to4IDjTCCA4mgAwIBF6EDAgEBooIDewSCA3coGnD2eNCklPBiuNkZE1ssoVPb1coS8Y1or/y8t3HEk9ME623bckx2+d2IG2tCeDylhxFz6Dy4edE9UP952TZj97YotDeME3ILm45vsN4lOzpXVOfPhVlN+KXXtdXMmKo5kLvuJNxIcT8ZbGFwo9CWcB5eYz9qTYFcE7ACogwPiydd4ibtEY4T9Zg98Pd9JGtBeG5Bxi7fYdKXKPt8X9Nb9/UeSkSDV/hDsQh/cSSPgrg9f/qiy35MvMwk930TJHxaauw7cdmBwQo9Ru66ooGxsDgyIHQXQB6GEXAh7lAvKirSRzw43Wg5Gm3S9EZA3SFd38ICf0bOT62imToOHEmO4cBtFTAbYc4v2AoH0H9ZHrtpv1ItK5EUrMduuHvvsXzuYKfgrf45Wrj7Mc6fIXij5eCmsgPxKaBYlbpMDClwdeSh0zCBkXBmbtrS5fPQwQyBjzTFMimOVQ7yLRQtoPOfGsOn0o+S1qc3SLd5q9Z7IFEafsaYwFES2b6iIYqjY6q8VGgp2mzqZAmdKk4ILeZwfHKHs59KhPaG7FC/YAMtG3TB6VH2uLmUx9dzLb+tCsx5qEETt0hKnO9oTfhgHTUb0O/QMp0eBp3lcQWKGFNvIjXZL25ugJKQW+I0kigXhuDq5gN8coH7ij51FAEfZ35A2YbddEGYzGLzQZFXn951Lkw3HlyQxMo1Qka1NDaMkTkTtECHIhW3qArucCNkXCg8N59OXduXI8z/wM0YYcBubaEMuyCxf5GmmhlxjhQ1bOvuu0CGyJk5tVqf8SOEDarN+kzle9jGNZFJetEgxKXBHbsClu+ektux5lWAwEIXboLhTLQv+h46HRru8xkgtxucCmOZYt0LCR30aD1s//U0w34T40n3fMlIEkrheJQ2bs/CigFQeUZyKMJ/oPPGwO1oleQ0q4+d8SjJL9kQzueHCOGz+/0m3LcsoInUXNaCYoF5lHaMV+qVZRJvdfVhCeHKL9uXLRKsjPHMJdOQTqNYoyAxOJ/N16UnJOsIu1feRK0ipzO9VNPuJI5ulcZqzOS+44i9BhYV6QY1oXZ3xVYLIOsQoZIVYOudfn68A+yO+LNsNsRRuXV+ZNWKW6C+SvRuIhqZcBBAS8AfVmzbxxqwpsgv4Z8un+W7t8tus+rqoy8RqjmeQ+8P3APGG20b1gah4BipAvZKhaSBtjCBs6ADAgEXooGrBIGoQ15WbDD8EQZKsUVM1rJuR5lu3EMiS5itx9pDddgBALaSrf+75w9/XfgyuXnzfXdmZgvZ9nE+AINPfmyY14yp2+tys5LsgOOncfvCaXEjR4Xr07E4JQWhTOT8ecsCLNObggVpxNGCfwxN7YlvHwBVaxPAbrOTrWAbIdfMe042ThBWWRLew048z5Il1iLxemJ10IyLsn3vakdnnz57uzM4PvaOLu57Bfic
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23 1 3.
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=192.168.10.84 UDP:88, timeout=30000, number of retries =3, #bytes=151
>>> KDCCommunication: kdc=192.168.10.84 UDP:88, timeout=30000,Attempt =1, #bytes=151
>>> KrbKdcReq send: #bytes read=245
>>>Pre-Authentication Data:
PA-DATA type = 19
PA-ETYPE-INFO2 etype = 17, salt = MYDOMAIN.COMHTTPMYDOMAIN.com, s2kparams = null
PA-ETYPE-INFO2 etype = 23, salt = null, s2kparams = null
PA-ETYPE-INFO2 etype = 3, salt = MYDOMAIN.COMHTTPMYDOMAIN.com, s2kparams = null
PA-ETYPE-INFO2 etype = 1, salt = MYDOMAIN.COMHTTPMYDOMAIN.com, s2kparams = null
>>>Pre-Authentication Data:
PA-DATA type = 2
PA-ENC-TIMESTAMP
>>>Pre-Authentication Data:
PA-DATA type = 16
>>>Pre-Authentication Data:
PA-DATA type = 15
>>> KdcAccessibility: remove 192.168.10.84
>>> KDCRep: init() encoding tag is 126 req type is 11
>>>KRBError:
sTime is Wed May 28 17:39:33 IST 2014 1401278973000
suSec is 896308
error code is 25
error Message is Additional pre-authentication required
realm is MYDOMAIN.COM
sname is krbtgt/MYDOMAIN.COM
eData provided.
msgType is 30
>>>Pre-Authentication Data:
PA-DATA type = 19
PA-ETYPE-INFO2 etype = 17, salt = MYDOMAIN.COMHTTPMYDOMAIN.com, s2kparams = null
PA-ETYPE-INFO2 etype = 23, salt = null, s2kparams = null
PA-ETYPE-INFO2 etype = 3, salt = MYDOMAIN.COMHTTPMYDOMAIN.com, s2kparams = null
PA-ETYPE-INFO2 etype = 1, salt = MYDOMAIN.COMHTTPMYDOMAIN.com, s2kparams = null
>>>Pre-Authentication Data:
PA-DATA type = 2
PA-ENC-TIMESTAMP
>>>Pre-Authentication Data:
PA-DATA type = 16
>>>Pre-Authentication Data:
PA-DATA type = 15
KrbAsReqBuilder: PREAUTH FAILED/REQ, re-send AS-REQ
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23 1 3.
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23 1 3.
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=192.168.10.84 UDP:88, timeout=30000, number of retries =3, #bytes=233
>>> KDCCommunication: kdc=192.168.10.84 UDP:88, timeout=30000,Attempt =1, #bytes=233
>>> KrbKdcReq send: #bytes read=1404
>>> KdcAccessibility: remove 192.168.10.84
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply testsso
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 17 16 23 1 3.
Found KerberosKey for testsso#MYDOMAIN.COM
Found KerberosKey for testsso#MYDOMAIN.COM
Found KerberosKey for testsso#MYDOMAIN.COM
Found KerberosKey for testsso#MYDOMAIN.COM
Found KerberosKey for testsso#MYDOMAIN.COM
Entered Krb5Context.acceptSecContext with state=STATE_NEW
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Unknown Source)
at sun.security.jgss.GSSContextImpl.acceptSecContext(Unknown Source)
at sun.security.jgss.GSSContextImpl.acceptSecContext(Unknown Source)
at sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(Unknown Source)
at sun.security.jgss.spnego.SpNegoContext.acceptSecContext(Unknown Source)
at sun.security.jgss.GSSContextImpl.acceptSecContext(Unknown Source)
at sun.security.jgss.GSSContextImpl.acceptSecContext(Unknown Source)
at one.TEST$2.run(TEST.java:357)
at one.TEST$2.run(TEST.java:1)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at one.TEST.acceptSecurityContext(TEST.java:279)
at one.TEST.authenticate(TEST.java:146)
at one.TEST.doGet(TEST.java:103)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:462)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:395)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:250)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: KrbException: Checksum failed
at sun.security.krb5.internal.crypto.ArcFourHmacEType.decrypt(Unknown Source)
at sun.security.krb5.internal.crypto.ArcFourHmacEType.decrypt(Unknown Source)
at sun.security.krb5.EncryptedData.decrypt(Unknown Source)
at sun.security.krb5.KrbApReq.authenticate(Unknown Source)
at sun.security.krb5.KrbApReq.<init>(Unknown Source)
at sun.security.jgss.krb5.InitSecContextToken.<init>(Unknown Source)
... 32 more
Caused by: java.security.GeneralSecurityException: Checksum failed
at sun.security.krb5.internal.crypto.dk.ArcFourCrypto.decrypt(Unknown Source)
at sun.security.krb5.internal.crypto.ArcFourHmac.decrypt(Unknown Source)
... 38 more
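For reference, the accept side of my servlet follows the usual JAAS + GSS-API pattern from the webmoli article. This is only a simplified sketch, not the exact TEST.java, and the login entry name is illustrative:

import java.security.PrivilegedExceptionAction;
import java.util.Base64;
import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;
import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSManager;

public class SpnegoAcceptSketch {
    // negotiateHeaderValue is the base64 blob after "Negotiate " in the
    // Authorization header (the long YII... token logged above).
    static byte[] accept(final String negotiateHeaderValue) throws Exception {
        final byte[] inToken = Base64.getDecoder().decode(negotiateHeaderValue);

        // "ServicePrincipal" is an illustrative jaas.conf entry name for the
        // Krb5LoginModule acceptor configuration (principal testsso@MYDOMAIN.COM).
        LoginContext lc = new LoginContext("ServicePrincipal");
        lc.login();
        Subject service = lc.getSubject();

        return Subject.doAs(service, (PrivilegedExceptionAction<byte[]>) () -> {
            GSSManager manager = GSSManager.getInstance();
            // null acceptor credential: use the Kerberos keys of the logged-in Subject
            GSSContext context = manager.createContext((GSSCredential) null);
            // The "Checksum failed" GSSException is thrown here, while the
            // service ticket inside the client's AP-REQ is being decrypted.
            return context.acceptSecContext(inToken, 0, inToken.length);
        });
    }
}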
Please help me...

If I remember correctly, this error is thrown when the service ticket is decrypted with a different key than the one it was encrypted with.
ktpass /out c:\tomcat.keytab /mapuser tc01@DEV.LOCAL
/princ HTTP/win-tc01.dev.local@DEV.LOCAL
/pass tc01pass /kvno 0
as described at http://tomcat.apache.org/tomcat-7.0-doc/windows-auth-howto.html is only correct if 'tc01' is a virgin account: AD automatically increments the key version number stored in AD whenever ktpass is used consecutively, so the kvno in a keytab written with /kvno 0 can fall out of sync with the key version AD actually expects.
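To check what actually ended up in the keytab, dump the key version numbers (kvno) and encryption types it contains and compare them with the account's current key version in AD. A minimal sketch using the standard javax.security.auth.kerberos API; the path and principal are the example values from the ktpass command above:

import java.io.File;
import javax.security.auth.kerberos.KerberosKey;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KeyTab;

public class KeytabInspect {
    public static void main(String[] args) {
        // Example values: replace with your own keytab path and SPN.
        File keytabFile = new File("c:/tomcat.keytab");
        KerberosPrincipal principal =
                new KerberosPrincipal("HTTP/win-tc01.dev.local@DEV.LOCAL");

        KeyTab keytab = KeyTab.getInstance(principal, keytabFile);
        for (KerberosKey key : keytab.getKeys(principal)) {
            // If this kvno no longer matches the key version AD stores for the
            // account, service-ticket decryption fails with "Checksum failed".
            System.out.println("etype=" + key.getKeyType()
                    + " kvno=" + key.getVersionNumber());
        }
    }
}

If the kvno in the keytab has drifted from what AD stores for the account, regenerate the keytab so the two agree.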

Related

Spark 3 is failing when I try to execute a simple query

I have this table on Hive:
CREATE TABLE `mydb`.`raw_sales` (
`combustivel` STRING,
`regiao` STRING,
`estado` STRING,
`jan` STRING,
`fev` STRING,
`mar` STRING,
`abr` STRING,
`mai` STRING,
`jun` STRING,
`jul` STRING,
`ago` STRING,
`set` STRING,
`out` STRING,
`nov` STRING,
`dez` STRING,
`total` STRING,
`created_at` TIMESTAMP,
`ano` STRING)
USING orc
LOCATION 'hdfs://localhost:9000/jobs/etl/tables/raw_sales.orc'
TBLPROPERTIES (
'transient_lastDdlTime' = '1601322056',
'ORC.COMPRESS' = 'SNAPPY')
There is data in the table, but when I try the query below:
spark.sql("SELECT * FROM mydb.raw_sales WHERE ano = '2000' AND combustivel like '%GASOLINA%'").show()
It's crashing!
>>> spark.sql("SELECT * FROM mydb.raw_sales WHERE ano = 2000 AND combustivel like '%GASOLINA%'").show() [164/3679]
20/09/28 19:25:30 ERROR executor.Executor: Exception in task 0.0 in stage 61.0 (TID 133)
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.castLiteralValue(OrcFilters.scala:163)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:235)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:134)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:137)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:136)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:75)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:491)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/09/28 19:25:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 61.0 (TID 133, 639773a482b8, executor driver): java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.castLiteralValue(OrcFilters.scala:163)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:235)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:134)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:137)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:136)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:75)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:491)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/09/28 19:25:30 ERROR scheduler.TaskSetManager: Task 0 in stage 61.0 failed 1 times; aborting job
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/current/python/pyspark/sql/dataframe.py", line 440, in show
print(self._jdf.showString(n, 20, vertical))
File "/opt/spark/current/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
File "/opt/spark/current/python/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/opt/spark/current/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o122.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 61.0 failed 1 times, most recent failure: Lost task 0.0 in stage 61.0 (TID 133, 639773a482b8, executor driver): java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.castLiteralValue(OrcFilters.scala:163)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:235)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:134)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:137)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:136)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:75)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:491)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:467)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:420)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:47)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3627)
at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2697)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2697)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2904)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:300)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:337)
at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.castLiteralValue(OrcFilters.scala:163)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:235)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFiltersHelper$1(OrcFilters.scala:134)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.$anonfun$convertibleFilters$4(OrcFilters.scala:137)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.convertibleFilters(OrcFilters.scala:136)
at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.createFilter(OrcFilters.scala:75)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4(OrcFileFormat.scala:189)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$4$adapted(OrcFileFormat.scala:188)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$1(OrcFileFormat.scala:188)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:116)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:169)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:93)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:491)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:340)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
However, the query below works!
spark.sql("SELECT * FROM mydb.raw_sales WHERE ano = 2000").show()
And this works too!
spark.sql("SELECT combustivel FROM mydb.raw_sales WHERE combustivel like '%GASOLINA C%' ").show()
My environment
Hadoop 2.10.0
Hive 2.3.7
Tez 0.9.2
Spark 3.0.1
The result is the same for PySpark and Scala.
I'm completely lost here! Maybe you can help me!
Thanks!
I've changed the 'ano' field type to 'int'. It is not an ideal solution, since even using 'cast' was not enough to resolve the issue. A good friend of mine who is a monster on Spark says this is possibly a bug in Spark. I'm not sure about that, but I would like someone else to corroborate this issue if possible!
Anyway, changing the data type to 'int' works!
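A possible workaround sketch (not from the original post): the ClassCastException is thrown while the numeric literal is pushed into the ORC reader's search argument (OrcFilters.castLiteralValue), so disabling ORC filter pushdown keeps the predicate in Spark and avoids that code path. Shown here with the Spark Java API, but spark.sql.orc.filterPushdown is the same configuration key from PySpark or Scala:

import org.apache.spark.sql.SparkSession;

public class OrcPushdownWorkaround {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orc-pushdown-workaround")
                .enableHiveSupport()
                .getOrCreate();

        // Evaluate the filter in Spark instead of pushing it into the ORC
        // reader, so OrcFilters is never asked to cast the literal against
        // the STRING column.
        spark.conf().set("spark.sql.orc.filterPushdown", "false");

        spark.sql("SELECT * FROM mydb.raw_sales "
                + "WHERE ano = '2000' AND combustivel LIKE '%GASOLINA%'")
             .show();
    }
}

This trades some scan efficiency for correctness, since the predicate is then applied after the ORC data is read.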

Convert spark.read.parquet into Pandas DataFrame

I am working on converting snappy.parquet files into a Pandas DataFrame. I was able to load all of my parquet files, but once I tried to convert them to Pandas, it failed. Please see the code below.
from pathlib import Path
import os
import pandas as pd
# Initiate findspark instance to run pyspark
import findspark
findspark.init()
# Importing PySpark
import pyspark
# Create a spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
parquetFile_all = spark.read.parquet("Resources")
display(parquetFile_all)
parquetFile_all.count()
The output of the last line of code is: 12216053, which Pandas should be able to handle.
Once I ran the following code:
file_output_all = spark.sql("SELECT * FROM parquetFile_all")
all_files_df = file_output_all.select("*").toPandas()
That's where it breaks with the following error:
Py4JJavaError Traceback (most recent call last)
<ipython-input-11-83baa41f2e6a> in <module>
1 # Convert to Pandas df
----> 2 all_files_df = file_output_all.select("*").toPandas()
/usr/local/Cellar/apache-spark/2.4.4/libexec/python/pyspark/sql/dataframe.py in toPandas(self)
2141
2142 # Below is toPandas without Arrow optimization.
-> 2143 pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
2144
2145 dtype = {}
/usr/local/Cellar/apache-spark/2.4.4/libexec/python/pyspark/sql/dataframe.py in collect(self)
532 """
533 with SCCallSiteSync(self._sc) as css:
--> 534 sock_info = self._jdf.collectToPython()
535 return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
536
/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
1255 answer = self.gateway_client.send_command(command)
1256 return_value = get_return_value(
-> 1257 answer, self.gateway_client, self.target_id, self.name)
1258
1259 for temp_arg in temp_args:
/usr/local/Cellar/apache-spark/2.4.4/libexec/python/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/usr/local/Cellar/apache-spark/2.4.4/libexec/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o45.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 9.0 failed 1 times, most recent failure: Lost task 2.0 in stage 9.0 (TID 335, localhost, executor driver): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:220)
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:173)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeToStream(UnsafeRow.java:554)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:258)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:299)
at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3263)
at org.apache.spark.sql.Dataset$$anonfun$collectToPython$1.apply(Dataset.scala:3260)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3260)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:220)
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:173)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeToStream(UnsafeRow.java:554)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:258)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
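One way to avoid the driver heap blow-up, sketched here with the Spark Java API (the post uses PySpark, but the idea carries over): toPandas()/collectToPython() funnels all 12,216,053 rows through the driver JVM, so instead let Spark write one consolidated Parquet output and load that into pandas directly. The output path is an example:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ConsolidateParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("consolidate-parquet")
                .getOrCreate();

        // Read every parquet file under Resources, as in the post.
        Dataset<Row> all = spark.read().parquet("Resources");

        // Write a single consolidated output instead of collecting rows to
        // the driver; coalesce(1) routes the data through one writer task,
        // not through the driver heap.
        all.coalesce(1)
           .write()
           .mode("overwrite")
           .parquet("Resources_combined");   // example output directory

        spark.stop();
    }
}

pandas can then read the consolidated output with pandas.read_parquet("Resources_combined") (requires pyarrow or fastparquet), without the collect step that ran out of heap.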

Insertion into hive ORC format table from other hive formats cannot be selected from

I created a new Hive external table in ORC format, bucketed, and inserted into it from another table with the exact same schema but stored in Avro format (and non-bucketed). When selecting from the new table there are many errors. I put the stack of the errors here (some of them are repeated, and I had to delete some from the end due to lack of space):
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1572541024266_0020_1_00, diagnostics=[Task failed, taskId=task_1572541024266_0020_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1572541024266_0020_1_00_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.io.IOException: java.io.IOException: Error reading file: path/000003_0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
... 16 more
Caused by: java.io.IOException: Error reading file: /path/000003_0
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 22 more
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 44 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:119)
at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:102)
at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:98)
at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1833)
at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2001)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
... 27 more
], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1572541024266_0020_1_00_000000_1:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
... 16 more
Caused by: java.io.IOException: Error reading file: /path/000003_0
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 22 more
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 44 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:119)
at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:102)
at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:98)
at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1833)
at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2001)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
... 27 more
], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : attempt_1572541024266_0020_1_00_000000_2:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
... 16 more
Caused by: java.io.IOException: Error reading file: /path/000003_0
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 22 more
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 44 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:119)
at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:102)
at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:98)
at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1833)
at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2001)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
... 27 more
], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1572541024266_0020_1_00_000000_3:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
... 16 more
Caused by: java.io.IOException: Error reading file: /path/000003_0
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 22 more
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 44 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:119)
at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:102)
at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:98)
at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1833)
at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2001)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
... 27 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1572541024266_0020_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1572541024266_0020_1_00, diagnostics=[Task failed, taskId=task_1572541024266_0020_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1572541024266_0020_1_00_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
... 16 more
Caused by: java.io.IOException: Error reading file: /path/000003_0
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 22 more
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 44 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:119)
at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:102)
at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:98)
at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:762)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:1833)
at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:2001)
at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
... 27 more
], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1572541024266_0020_1_00_000000_1:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: java.io.IOException: java.io.IOException: Error reading file: /path/000003_0
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
... 16 more
Caused by: java.io.IOException: Error reading file:/path/000003_0
at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:93)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:238)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:213)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
... 22 more
Any suggestions on how to resolve this issue?
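The innermost EOFException ("Read past EOF ... column 44 kind DATA ... length: 0") typically points at one corrupt or truncated ORC file in the partition rather than at the query itself. Below is a minimal diagnostic sketch, not part of the original post: it assumes the Apache ORC core and Hadoop client jars are on the classpath, and the class name and the directory argument are placeholders. It simply forces every row batch of every file in the directory to be decoded, so the damaged file surfaces locally instead of inside a Tez task.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.RecordReader;

public class OrcCorruptionScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // args[0] = the directory holding the suspect ORC files (the /path from the trace)
        for (FileStatus status : fs.listStatus(new Path(args[0]))) {
            Path file = status.getPath();
            try {
                Reader reader = OrcFile.createReader(file, OrcFile.readerOptions(conf));
                RecordReader rows = reader.rows();
                VectorizedRowBatch batch = reader.getSchema().createRowBatch();
                long total = 0;
                while (rows.nextBatch(batch)) {
                    total += batch.size; // forces every stripe and column stream to be decoded
                }
                rows.close();
                System.out.println(file + " OK, rows=" + total);
            } catch (Exception e) {
                // A file that fails here (e.g. "Read past EOF" on a zero-length stream)
                // is the damaged one; re-writing that partition is the usual fix.
                System.out.println(file + " FAILED: " + e);
            }
        }
    }
}

Where available, the hive --orcfiledump utility can give similar per-file information, but the sketch above avoids depending on a particular CLI version.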

Kerberos SSO with weblogic

I have managed to configure WebLogic for SSO with Windows AD; however, there are several questions on which I need clarity:
1) When I access the application from my browser, with the Apache web server sitting in between, why does WebLogic request a TGT for the SPN every time (I can see that in the WebLogic console)? Even if it needs to authenticate with the KDC, this should happen only once during startup, not with every request from the same browser.
Theoretically, WebLogic should never need to contact the KDC to validate the existing user's TGT.
2) If the same session key provided by the KDC is used between the client and the WebLogic server for secure communication, they should never need to hit the KDC in between unless the session key expires, in which case they also have the option to renew it, so a TGT should never need to be created for each request from the browser to WebLogic. Is that correct? (See the sketch just after this list.)
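For reference, the reasoning in 1) and 2) matches how a plain JGSS acceptor behaves: once the service has logged in from its keytab, validating each browser's SPNEGO token only requires the service's own long-term key, not a KDC round trip. The sketch below is not WebLogic's implementation, just a minimal JGSS acceptor; the class and method names are placeholders, and it assumes it runs inside Subject.doAs(...) with the subject produced by the keytab login (or with javax.security.auth.useSubjectCredsOnly=false).

import org.ietf.jgss.GSSContext;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.Oid;

public class SpnegoAcceptorSketch {
    // Validates one "Negotiate" token from the browser and returns the response token (if any).
    public static byte[] accept(byte[] spnegoToken) throws GSSException {
        GSSManager manager = GSSManager.getInstance();
        Oid spnegoMech = new Oid("1.3.6.1.5.5.2"); // SPNEGO mechanism OID
        GSSCredential serviceCred = manager.createCredential(
                null, GSSCredential.INDEFINITE_LIFETIME, spnegoMech, GSSCredential.ACCEPT_ONLY);
        GSSContext context = manager.createContext(serviceCred);
        // The AP-REQ inside the token is decrypted with the service's keytab key;
        // no KDC traffic is needed here to authenticate the client.
        byte[] responseToken = context.acceptSecContext(spnegoToken, 0, spnegoToken.length);
        System.out.println("Authenticated client: " + context.getSrcName());
        context.dispose();
        return responseToken;
    }
}

One thing worth noting: with isInitiator true (as in the log below), the Krb5LoginModule itself performs an AS exchange to obtain the service's own TGT when it logs in, so if that login is re-run per request it would produce per-request TGT traffic of the kind visible in the console.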
Weblogic console logs->
Found ticket for HTTP/APPDEV2011.domain.com#DOMAIN.COM to go to krbtgt/DOMAIN.COM#DOMAIN.COM expiring on Fri May 11 21:06:46 CDT 2018
Debug is true storeKey true useTicketCache true useKeyTab true doNotPrompt true ticketCache is null isInitiator true KeyTab is http_weblogic_test.keytab refreshKrb5Config is false principal is HTTP/APPDEV2011.domain.com#DOMAIN.COM tryFirstPass is false useFirstPass is false storePass is false clearPass is false
Acquire TGT from Cache
KinitOptions cache name is D:\Users\ayadav.DOMAIN.000\krb5cc_ayadav
Acquire default native Credentials
default etypes for default_tkt_enctypes: 17 23.
LSA contains TGT for ayadav#DOMAIN.COM not HTTP/APPDEV2011.domain.com#DOMAIN.COM
Principal is HTTP/APPDEV2011.domain.com#DOMAIN.COM
null credentials from Ticket Cache
Looking for keys for: HTTP/APPDEV2011.domain.com#DOMAIN.COM
Added key: 17version: 14
Added key: 18version: 14
Added key: 23version: 14
Found unsupported keytype (3) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Found unsupported keytype (1) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Looking for keys for: HTTP/APPDEV2011.domain.com#DOMAIN.COM
Added key: 17version: 14
Added key: 18version: 14
Added key: 23version: 14
Found unsupported keytype (3) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Found unsupported keytype (1) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
default etypes for default_tkt_enctypes: 17 23.
KrbAsReq creating message
KrbKdcReq send: kdc=wcosp-dc01.domain.com UDP:88, timeout=30000, number of retries =3, #bytes=163
KDCCommunication: kdc=wcosp-dc01.domain.com UDP:88, timeout=30000,Attempt =1, #bytes=163
KrbKdcReq send: #bytes read=207
Pre-Authentication Data:
PA-DATA type = 19
PA-ETYPE-INFO2 etype = 17, salt = DOMAIN.COMHTTPAPPDEV2011.domain.com, s2kparams = null
PA-ETYPE-INFO2 etype = 23, salt = null, s2kparams = null
Pre-Authentication Data:
PA-DATA type = 2
PA-ENC-TIMESTAMP
Pre-Authentication Data:
PA-DATA type = 16
Pre-Authentication Data:
PA-DATA type = 15
KdcAccessibility: remove wcosp-dc01.domain.com
KDCRep: init() encoding tag is 126 req type is 11
KRBError:
sTime is Fri May 11 11:06:46 CDT 2018 1526054806000
suSec is 633784
error code is 25
error Message is Additional pre-authentication required
sname is krbtgt/DOMAIN.COM#DOMAIN.COM
eData provided.
msgType is 30
Pre-Authentication Data:
PA-DATA type = 19
PA-ETYPE-INFO2 etype = 17, salt = DOMAIN.COMHTTPAPPDEV2011.domain.com, s2kparams = null
PA-ETYPE-INFO2 etype = 23, salt = null, s2kparams = null
Pre-Authentication Data:
PA-DATA type = 2
PA-ENC-TIMESTAMP
Pre-Authentication Data:
PA-DATA type = 16
Pre-Authentication Data:
PA-DATA type = 15
KrbAsReqBuilder: PREAUTH FAILED/REQ, re-send AS-REQ
default etypes for default_tkt_enctypes: 17 23.
Looking for keys for: HTTP/APPDEV2011.domain.com#DOMAIN.COM
Added key: 17version: 14
Added key: 18version: 14
Added key: 23version: 14
Found unsupported keytype (3) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Found unsupported keytype (1) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Looking for keys for: HTTP/APPDEV2011.domain.com#DOMAIN.COM
Added key: 17version: 14
Added key: 18version: 14
Added key: 23version: 14
Found unsupported keytype (3) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Found unsupported keytype (1) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
default etypes for default_tkt_enctypes: 17 23.
EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
KrbAsReq creating message
KrbKdcReq send: kdc=wcosp-dc01.domain.com UDP:88, timeout=30000, number of retries =3, #bytes=250
KDCCommunication: kdc=wcosp-dc01.domain.com UDP:88, timeout=30000,Attempt =1, #bytes=250
KrbKdcReq send: #bytes read=96
KrbKdcReq send: kdc=wcosp-dc01.domain.com TCP:88, timeout=30000, number of retries =3, #bytes=250
KDCCommunication: kdc=wcosp-dc01.domain.com TCP:88, timeout=30000,Attempt =1, #bytes=250
DEBUG: TCPClient reading 1602 bytes
KrbKdcReq send: #bytes read=1602
KdcAccessibility: remove wcosp-dc01.domain.com
Looking for keys for: HTTP/APPDEV2011.domain.com#DOMAIN.COM
Added key: 17version: 14
Added key: 18version: 14
Added key: 23version: 14
Found unsupported keytype (3) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
Found unsupported keytype (1) for HTTP/APPDEV2011.domain.com#DOMAIN.COM
EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
KrbAsRep cons in KrbAsReq.getReply HTTP/APPDEV2011.domain.com
principal is HTTP/APPDEV2011.domain.com#DOMAIN.COM
Will use keytab
Commit Succeeded
>
Thanks

Weblogic + Kerberos + SSO

I’m trying to configure Single Sign-On with WebLogic and Kerberos.
But I still get the login page; maybe you can tell me what is wrong from this log:
Debug is true storeKey true useTicketCache false useKeyTab true doNotPrompt false ticketCache is null isInitiator true KeyTab is /oracle/product12/user_projects/domains/test/krb/test.keytab refreshKrb5Config is false principal is kinp#TEST.ORG tryFirstPass is false useFirstPass is false storePass is false clearPass is false
KeyTab instance already exists
Added key: 23version: 19
Ordering keys wrt default_tkt_enctypes list
default etypes for default_tkt_enctypes: 23 3.
0: EncryptionKey: keyType=23 kvno=19 keyValue (hex dump)=
0000: C3 CB 19 1C 64 6E F9 7F 6A C9 31 FB EE 69 E7 35 ....dn..j.1..i.5
principal's key obtained from the keytab
Acquire TGT using AS Exchange
default etypes for default_tkt_enctypes: 23 3.
>>> KrbAsReq calling createMessage
>>> KrbAsReq in createMessage
>>> KrbKdcReq send: kdc=192.168.0.100 UDP:88, timeout=30000, number of retries =3, #bytes=137
>>> KDCCommunication: kdc=192.168.0.100 UDP:88, timeout=30000,Attempt =1, #bytes=137
>>> KrbKdcReq send: #bytes read=181
>>> KrbKdcReq send: #bytes read=181
>>> KdcAccessibility: remove 192.168.0.100
>>> KDCRep: init() encoding tag is 126 req type is 11
>>>KRBError:
sTime is Tue Jan 20 10:46:05 EET 2015 1421743565000
suSec is 576578
error code is 25
error Message is Additional pre-authentication required
realm is TEST.ORG
sname is krbtgt/TEST.ORG
eData provided.
msgType is 30
>>>Pre-Authentication Data:
PA-DATA type = 11
PA-ETYPE-INFO etype = 23
PA-ETYPE-INFO salt =
>>>Pre-Authentication Data:
PA-DATA type = 19
PA-ETYPE-INFO2 etype = 23
PA-ETYPE-INFO2 salt = null
>>>Pre-Authentication Data:
PA-DATA type = 2
PA-ENC-TIMESTAMP
>>>Pre-Authentication Data:
PA-DATA type = 16
>>>Pre-Authentication Data:
PA-DATA type = 15
AcquireTGT: PREAUTH FAILED/REQUIRED, re-send AS-REQ
>>>KrbAsReq salt is TEST.ORGdev
default etypes for default_tkt_enctypes: 23 3.
Pre-Authenticaton: find key for etype = 23
AS-REQ: Add PA_ENC_TIMESTAMP now
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
>>> KrbAsReq calling createMessage
>>> KrbAsReq in createMessage
>>> KrbKdcReq send: kdc=192.168.0.100 UDP:88, timeout=30000, number of retries =3, #bytes=220
>>> KDCCommunication: kdc=192.168.0.100 UDP:88, timeout=30000,Attempt =1, #bytes=220
>>> KrbKdcReq send: #bytes read=1408
>>> KrbKdcReq send: #bytes read=1408
>>> KdcAccessibility: remove 192.168.0.100
>>> EType: sun.security.krb5.internal.crypto.ArcFourHmacEType
>>> KrbAsRep cons in KrbAsReq.getReply dev
principal is dev#TEST.ORG
EncryptionKey: keyType=23 keyBytes (hex dump)=0000: C3 CB 19 1C 64 6E F9 7F 6A C9 31 FB EE 69 E7 35 ....dn..j.1..i.5
Added server's keyKerberos Principal dev#TEST.ORGKey Version 19key EncryptionKey: keyType=23 keyBytes (hex dump)=
0000: C3 CB 19 1C 64 6E F9 7F 6A C9 31 FB EE 69 E7 35 ....dn..j.1..i.5
[Krb5LoginModule] added Krb5Principal dev#TEST.ORG to Subject
Commit Succeeded
Found key for dev#TEST.ORG(23)
Entered Krb5Context.acceptSecContext with state=STATE_NEW
I get this log when I’m trying to access the login page.
The error exception is:
com.bea.security.utils.kerberos.KerberosException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))
at com.bea.security.utils.kerberos.KerberosTokenHandler.acceptGssInitContextTokenInDoAs(KerberosTokenHandler.java:334)
at com.bea.security.utils.kerberos.KerberosTokenHandler.access$000(KerberosTokenHandler.java:41)
at com.bea.security.utils.kerberos.KerberosTokenHandler$1.run(KerberosTokenHandler.java:226)
...
Caused By: GSSException: Failure unspecified at GSS-API level (Mechanism level: Specified version of key is not available (44))
at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:741)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
...
Caused By: KrbException: Specified version of key is not available (44)
at sun.security.krb5.EncryptionKey.findKey(EncryptionKey.java:516)
at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:260)
at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:134)
...
Thanks!
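The failure above, "Specified version of key is not available (44)", usually indicates a key version number (kvno) mismatch: the incoming service ticket was encrypted with a newer key than the one stored in the keytab (the log shows the keytab holding kvno 19). A minimal sketch for checking what the keytab actually contains is below; it assumes Java 7+ and uses command-line placeholders for the keytab path and principal rather than the exact values from the post.

import java.io.File;
import javax.security.auth.kerberos.KerberosKey;
import javax.security.auth.kerberos.KerberosPrincipal;
import javax.security.auth.kerberos.KeyTab;

public class KeytabKvnoCheck {
    public static void main(String[] args) {
        // args[0] = path to the keytab file, args[1] = principal name it was generated for
        KeyTab keytab = KeyTab.getInstance(new File(args[0]));
        KerberosPrincipal principal = new KerberosPrincipal(args[1]);
        for (KerberosKey key : keytab.getKeys(principal)) {
            // One line per key: encryption type and key version number (kvno)
            System.out.println("etype=" + key.getKeyType() + " kvno=" + key.getVersionNumber());
        }
    }
}

If the kvno printed here is lower than the kvno in the incoming ticket, regenerating the keytab for the SPN and restarting the server normally clears this error.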
I can't post a comment, so I'm posting this as an answer. You need to enable WebLogic's authentication logging:
In Weblogic console click the “Lock & Edit” button in the top left corner.
Select Environment – Servers in the Domain Structure portlet on the left.
Select your server on the Summary of Servers page.
Select the “Debug” tab.
Drill down to weblogic – security – atn.
Select the checkbox to the left of the word DebugSecurityAtn.
Click the “Enable” button at the top or bottom of the page.
Go to your server again and click on the Logging tab.
Scroll down and click on Advanced.
In "Message destination(s) - Log file", change the severity level to Debug.
Click the “Save” button at the top or bottom of the page.
Click “Activate changes” in the top-left corner.
After that, try logging in again; you will have much more info in your log.