Sqoop: Avro with Gzip Codec failing - gzip

When trying to import tables to HDFS using Sqoop with --as-avrodatafile and GzipCodec, the job fails with the exception below. I'm running the CDH7 Cloudera QuickStart Docker image.
Is there a reason we cannot use Gzip with Avro, or is some missing configuration causing this?
Note: Gzip works when writing without the --as-avrodatafile switch.
Error: org.apache.avro.AvroRuntimeException: Unrecognized codec: gzip
at org.apache.avro.file.CodecFactory.fromString(CodecFactory.java:102)
at org.apache.sqoop.mapreduce.AvroOutputFormat.configureDataFileWriter(AvroOutputFormat.java:63)
at org.apache.sqoop.mapreduce.AvroOutputFormat.getRecordWriter(AvroOutputFormat.java:102)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

From the Avro CodecFactory
/** Maps a codec name into a CodecFactory.
*
* Currently there are five codecs registered by default:
* <ul>
* <li>{@code null}</li>
* <li>{@code deflate}</li>
* <li>{@code snappy}</li>
* <li>{@code bzip2}</li>
* <li>{@code xz}</li>
* </ul>
*/
So gzip is supported for other Sqoop output formats, but not for Avro data files.
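If the goal is simply compressed Avro output, note that Avro's deflate codec uses the same zlib compression that gzip wraps, so the usual workaround is to pick a codec Avro actually registers (deflate or snappy). Below is a minimal sketch of the codec difference in Python, assuming the fastavro package and a toy schema (neither is part of the Sqoop job itself):
import fastavro

# 'deflate' is a registered Avro codec; 'gzip' is not a valid Avro codec name.
schema = fastavro.parse_schema({
    "type": "record", "name": "Row",
    "fields": [{"name": "id", "type": "int"}],
})
with open("rows.avro", "wb") as out:
    fastavro.writer(out, schema, [{"id": 1}], codec="deflate")   # works
    # fastavro.writer(out, schema, [{"id": 1}], codec="gzip")    # rejected: unknown codec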

Related

Databricks <-> Kafka - SSL handshake failed

Below is the error we have received when trying to read the stream
Caused by: kafkashaded.org.apache.kafka.common.KafkaException: Failed to load SSL keystore /dbfs/FileStore/Certs/client.keystore.jks
Caused by: java.nio.file.NoSuchFileException: /dbfs/FileStore/Certs/client.keyst
When trying to read a stream from Kafka, Databricks is unable to find keystore files.
df = spark.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers","kafka server with port") \
.option("kafka.security.protocol", "SSL") \
.option("kafka.ssl.truststore.location",'/dbfs/FileStore/Certs/client.truststore.jks' ) \
.option("kafka.ssl.keystore.location", '/dbfs/FileStore/Certs/client.keystore.jks') \
.option("kafka.ssl.keystore.password", keystore_pass) \
.option("kafka.ssl.truststore.password", truststore_pass) \
.option("kafka.ssl.keystore.type", "JKS") \
.option("kafka.ssl.truststore.type", "JKS") \
.option("subscribe","sports") \
.option("startingOffsets", "earliest") \
.load()
The file exists in DBFS and we are also able to read it. We have also mounted the blob storage in Databricks and tried to read the files from ADLS Gen2.
The driver logs also show an additional error: 22/11/04 12:18:07 ERROR DefaultSslEngineFactory: Modification time of key store could not be obtained.
We are trying to read a Kafka stream by authenticating with SSL keystores, but the connection doesn't seem to work because Databricks is unable to see the keystores.
I was able to access the keystore files by adding a dbfs prefix to the original path.
So, instead of using the path /dbfs/FileStore/Certs/client.truststore.jks, I used /dbfs/dbfs/FileStore/Certs/client.truststore.jks.
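A quick way to confirm which of the two paths the driver can actually resolve (a hedged sketch for a Databricks notebook, where dbutils and display are available):
import os

# DBFS view of the directory the certs were uploaded to
display(dbutils.fs.ls("dbfs:/FileStore/Certs/"))

# Local FUSE-mount view that the Kafka client uses on the driver
for p in ["/dbfs/FileStore/Certs/client.keystore.jks",
          "/dbfs/dbfs/FileStore/Certs/client.keystore.jks"]:
    print(p, os.path.exists(p))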
But I am now receiving an SSL handshake error, even though the truststore I created is based on the server certificate and the fingerprint in the certificate matches the truststore fingerprint.
kafkashaded.org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: signature check failed
Caused by: java.security.cert.CertPathValidatorException: signature check failed
Caused by: java.security.SignatureException: Signature does not match.
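One way to narrow down the handshake failure is to compare the certificate the broker actually presents with the one imported into the truststore. A hedged sketch using only the Python standard library (the broker host and port are placeholders):
import hashlib
import ssl

# Fetch the PEM certificate the broker presents (no validation is performed here)
pem = ssl.get_server_certificate(("kafka-broker.example.com", 9093))
der = ssl.PEM_cert_to_DER_cert(pem)

# SHA-256 fingerprint, to compare against the certificate imported into the truststore
print(hashlib.sha256(der).hexdigest())
A PKIX "signature check failed" often points at a missing intermediate certificate, so it may also be worth importing the full certificate chain rather than only the leaf certificate.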

How to resolve "Error while building Passthrough stream" (java.util.zip.ZipException) in WSO2 Gateway?

I am getting the following error on WSO2 Gateway 2.1.0 when trying to pass a response back to the WSO2 EI that originated an API call on APIM, and I am not sure what the issue is:
[2021-05-06 15:19:26,401] ERROR org.apache.synapse.transport.passthru.util.RelayUtils:344 - Error while building Passthrough stream
java.util.zip.ZipException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:165)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91)
at org.apache.axis2.transport.http.HTTPTransportUtils.handleGZip(HTTPTransportUtils.java:262)
at org.apache.synapse.transport.passthru.util.DeferredMessageBuilder.getDocument(DeferredMessageBuilder.java:95)
at org.apache.synapse.transport.passthru.util.RelayUtils.builldMessage(RelayUtils.java:151)
at org.apache.synapse.transport.passthru.util.RelayUtils.buildMessage(RelayUtils.java:114)
at org.apache.synapse.transport.passthru.util.RelayUtils.buildMessage(RelayUtils.java:78)
at org.wso2.carbon.apimgt.gateway.handlers.LogsHandler.buildResponseMessage_aroundBody16(LogsHandler.java:264)
at org.wso2.carbon.apimgt.gateway.handlers.LogsHandler.buildResponseMessage(LogsHandler.java:254)
at org.wso2.carbon.apimgt.gateway.handlers.LogsHandler.handleResponseInFlow_aroundBody6(LogsHandler.java:141)
at org.wso2.carbon.apimgt.gateway.handlers.LogsHandler.handleResponseInFlow(LogsHandler.java:131)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.invokeHandlers(Axis2SynapseEnvironment.java:1077)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:242)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:556)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:186)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.synapse.transport.passthru.ClientWorker.run(ClientWorker.java:265)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at datadog.trace.bootstrap.instrumentation.java.concurrent.Wrapper.run(Wrapper.java:25)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
There are a couple of possible scenarios for the above issue.
The Content-Encoding: gzip header is present in the response, but the response body is not actually in GZIP format. You can verify this by enabling the wire logs.
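If the wire logs are hard to capture, a quick out-of-band check is to fetch the raw backend response and see whether the body actually starts with the gzip magic bytes (a hedged sketch; the backend URL is a placeholder):
import urllib.request

# urllib does not auto-decompress, so this shows the raw body bytes as sent by the backend
req = urllib.request.Request("https://backend.example.com/api/resource",
                             headers={"Accept-Encoding": "gzip"})
with urllib.request.urlopen(req) as resp:
    first_two = resp.read(2)
    # A genuine gzip body starts with 0x1f 0x8b
    print(resp.headers.get("Content-Encoding"), first_two == b"\x1f\x8b")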
You can get rid of this behavior by removing the Content-Encoding header using the Header mediator, as shown below, before calling the EI endpoint.
<header action="remove" name="Content-Encoding" scope="transport"/>
The client (in your case, the EI) is expecting the content to be gzip-encoded, while APIM is serving it deflated or encoded some other way.
In such a case, you can use the property below to send the response in gzip format:
<property name="Content-Encoding" value="gzip" scope="transport"/>

Can't connect to S3 in PrestoDB: Unable to load credentials from service endpoint

I am connecting S3 buckets to Apache Hive so that I can query the Parquet files in S3 directly through PrestoDB. I am using the HDP VM with PrestoDB by Teradata.
For this, I configured the hive-site.xml file and added my AWS Access Key and Secret Key in the /etc/hive/conf/hive-site.xml file like:
<property>
  <name>hive.s3.aws-access-key</name>
  <value>something</value>
</property>
<property>
  <name>hive.s3.aws-secret-key</name>
  <value>some-other-thing</value>
</property>
Now, my S3 Bucket URL path where Parquet files reside looks like:
https://s3.console.aws.amazon.com/s3/buckets/sb.mycompany.com/someFolder/anotherFolder/?region=us-east-2&tab=overview
While creating an external table, I gave the S3 location in the query as:
CREATE TABLE hive.project.data (... schema ...)
WITH ( format = 'PARQUET',
external_location = 's3://sb.mycompany.com/someFolder/anotherFolder/?region=us-east-2&tab=overview')
Apache Hive isn't able to connect to the S3 bucket, and the query fails with this error (run with the --debug flag):
Query 20180316_112407_00005_aj9x6 failed: Unable to load credentials from service endpoint
========= TECHNICAL DETAILS =========
[ Error message ]
Unable to load credentials from service endpoint
[ Session information ]
ClientSession{server=http://localhost:8080, user=presto, clientInfo=null, catalog=null, schema=null, timeZone=Zulu, locale=en_US, properties={}, transactionId=null, debug=true, quiet=false}
[ Stack trace ]
com.amazonaws.AmazonClientException: Unable to load credentials from service endpoint
at com.amazonaws.auth.EC2CredentialsFetcher.handleError(EC2CredentialsFetcher.java:180)
at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:159)
at com.amazonaws.auth.EC2CredentialsFetcher.getCredentials(EC2CredentialsFetcher.java:82)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:104)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4016)
at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:4478)
at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:4452)
at com.amazonaws.services.s3.AmazonS3Client.resolveServiceEndpoint(AmazonS3Client.java:4426)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1167)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1152)
at com.facebook.presto.hive.PrestoS3FileSystem.lambda$getS3ObjectMetadata$2(PrestoS3FileSystem.java:552)
at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:138)
at com.facebook.presto.hive.PrestoS3FileSystem.getS3ObjectMetadata(PrestoS3FileSystem.java:549)
at com.facebook.presto.hive.PrestoS3FileSystem.getFileStatus(PrestoS3FileSystem.java:305)
at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1439)
at com.facebook.presto.hive.HiveMetadata.getExternalPath(HiveMetadata.java:719)
at com.facebook.presto.hive.HiveMetadata.createTable(HiveMetadata.java:690)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.createTable(ClassLoaderSafeConnectorMetadata.java:218)
at com.facebook.presto.metadata.MetadataManager.createTable(MetadataManager.java:505)
at com.facebook.presto.execution.CreateTableTask.execute(CreateTableTask.java:148)
at com.facebook.presto.execution.CreateTableTask.execute(CreateTableTask.java:57)
at com.facebook.presto.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:111)
at com.facebook.presto.execution.QueuedExecution.lambda$start$1(QueuedExecution.java:63)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Network is unreachable
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at com.amazonaws.internal.ConnectionUtils.connectToEndpoint(ConnectionUtils.java:47)
at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:106)
at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:77)
at com.amazonaws.auth.InstanceProfileCredentialsProvider$InstanceMetadataCredentialsEndpointProvider.getCredentialsEndpoint(InstanceProfileCredentialsProvider.java:117)
at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:121)
... 24 more
========= TECHNICAL DETAILS END =========
I even restarted my PrestoDB server after adding the keys. Next, I tried adding the properties to /home/presto/.prestoadmin/catalog/hive.properties:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.allow-drop-table=true
hive.allow-rename-table=true
hive.time-zone=UTC
hive.metastore-cache-ttl=0s
hive.s3.use-instance-credentials=false
hive.s3.aws-access-key=something
hive.s3.aws-secret-key=some-other-thing
I restarted the PrestoDB server again, but still the same issue.
I then modified the S3 bucket location in the query to the bucket name only:
external_location = 's3://sb.mycompany.com'
And with s3a scheme as well:
external_location = 's3a://sb.mycompany.com'
But the same issue is still present. What am I doing wrong?
This is embarrassing. On the VM I was using, there were issues with the network adapter, so the VM wasn't able to connect to the Internet. I corrected the adapter configuration and it is working now.
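For anyone who hits the same trace: "Unable to load credentials from service endpoint" means the S3 client fell back to InstanceProfileCredentialsProvider and tried the EC2 instance-metadata endpoint, and the nested "Network is unreachable" shows the VM had no working route at all. A hedged connectivity probe you can run from the VM:
import urllib.request

# The AWS SDK fetches instance-profile credentials from this link-local endpoint;
# with a broken network adapter this probe fails immediately, mirroring the trace above.
try:
    with urllib.request.urlopen("http://169.254.169.254/latest/meta-data/", timeout=2) as r:
        print("metadata endpoint reachable, HTTP", r.status)
except OSError as exc:
    print("cannot reach metadata endpoint:", exc)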

How to configure Apache NiFi for a Kerberized Hadoop Cluster

I have Apache NiFi running standalone and it's working fine. But when I try to set up Apache NiFi to access Hive or HDFS on a Kerberized Cloudera Hadoop cluster, I run into issues.
Can someone point me to documentation for setting up HDFS/Hive/HBase access (with Kerberos)?
Here is the configuration I gave in nifi.properties
# kerberos #
nifi.kerberos.krb5.file=/etc/krb5.conf
nifi.kerberos.service.principal=pseeram@JUNIPER.COM
nifi.kerberos.keytab.location=/uhome/pseeram/learning/pseeram.keytab
nifi.kerberos.authentication.expiration=10 hours
I referenced various links like the ones below, but none of them were helpful.
(Since the first link below said it had issues with NiFi 0.7.1, I tried NiFi 1.1.0 and had the same bitter experience.)
https://community.hortonworks.com/questions/62014/nifi-hive-connection-pool-error.html
https://community.hortonworks.com/articles/4103/hiveserver2-jdbc-connection-url-examples.html
Here are the errors I am getting in the logs:
ERROR [Timer-Driven Process Thread-7] o.a.nifi.processors.hive.SelectHiveQL
org.apache.nifi.processor.exception.ProcessException: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Could not open client transport with JDBC Uri: jdbc:hive2://ddas1106a:10000/innovate: Peer indicated failure: Unsupported mechanism type PLAIN)
at org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:292) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
at sun.reflect.GeneratedMethodAccessor191.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_51]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_51]
at org.apache.nifi.controller.service.StandardControllerServiceProvider$1.invoke(StandardControllerServiceProvider.java:177) ~[na:na]
at com.sun.proxy.$Proxy83.getConnection(Unknown Source) ~[na:na]
at org.apache.nifi.processors.hive.SelectHiveQL.onTrigger(SelectHiveQL.java:158) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0.jar:1.1.0]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.jar:1.1.0]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.jar:1.1.0]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.jar:1.1.0]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.jar:1.1.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_51]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_51]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_51]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Could not open client transport with JDBC Uri: jdbc:hive2://ddas1106a:10000/innovate: Peer indicated failure: Unsupported mechanism type PLAIN)
at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1549) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1388) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:288) ~[nifi-hive-processors-1.1.0.jar:1.1.0]
... 18 common frames omitted
Caused by: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://ddas1106a:10000/innovate: Peer indicated failure: Unsupported mechanism type PLAIN
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:231) ~[hive-jdbc-1.2.1.jar:1.2.1]
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:176) ~[hive-jdbc-1.2.1.jar:1.2.1]
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) ~[hive-jdbc-1.2.1.jar:1.2.1]
at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1556) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1545) ~[commons-dbcp-1.4.jar:1.4]
... 21 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: Unsupported mechanism type PLAIN
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:199) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:307) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204) ~[hive-jdbc-1.2.1.jar:1.2.1]
... 27 common frames omitted
WARN [NiFi Web Server-29] o.a.nifi.dbcp.hive.HiveConnectionPool HiveConnectionPool[id=278beb67-0159-1000-cffa-8c8534c285c8] Configuration does not have security enabled, Keytab and Principal will be ignored
What you've added in the nifi.properties file is used for Kerberizing the NiFi cluster itself. In order to access a Kerberized Hadoop cluster, you need to provide the appropriate config files and keytabs in NiFi's HDFS processor.
For example, if you are using PutHDFS to write to a Hadoop cluster:
Hadoop Configuration Resources: paths to core-site.xml and hdfs-site.xml
Kerberos Principal: your principal for accessing the Hadoop cluster
Kerberos Keytab: path to the keytab generated using the krb5.conf of the Hadoop cluster. nifi.kerberos.krb5.file in nifi.properties must point to the appropriate krb5.conf file.
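For the SelectHiveQL error shown above, the same idea applies to the HiveConnectionPool controller service: set the Hadoop configuration resources, principal, and keytab there as well, and append the HiveServer2 service principal to the Database Connection URL so the driver negotiates Kerberos instead of PLAIN. A hedged example (hive/_HOST@JUNIPER.COM is an assumption about your cluster's service principal):
jdbc:hive2://ddas1106a:10000/innovate;principal=hive/_HOST@JUNIPER.COM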
Regardless of whether NiFi is inside the Kerberized Hadoop cluster or not, this post might be useful:
https://community.hortonworks.com/questions/84659/how-to-use-apache-nifi-on-kerberized-hdp-cluster-n.html

Hive - Out of Memory Exception

I am running a query from Beeswax (in Hue) which results in an MR job. The MR job runs successfully, but when Beeswax tries to render the result I get an OOM exception.
I was wondering if there is a configuration setting to help me get past this issue.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:561)
at java.nio.CharBuffer.toString(CharBuffer.java:1201)
at org.apache.hadoop.io.Text.decode(Text.java:394)
at org.apache.hadoop.io.Text.decode(Text.java:371)
at org.apache.hadoop.io.Text.toString(Text.java:273)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:280)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:220)
at org.apache.hadoop.hive.serde2.DelimitedJSONSerDe.serializeField(DelimitedJSONSerDe.java:59)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:427)
at org.apache.hadoop.hive.ql.exec.ListSinkOperator.processOp(ListSinkOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:498)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1474)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.materializeResults(BeeswaxServiceImpl.java:434)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.fetch(BeeswaxServiceImpl.java:543)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:986)
at com.cloudera.beeswax.BeeswaxServiceImpl$5.run(BeeswaxServiceImpl.java:981)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at com.cloudera.beeswax.BeeswaxServiceImpl.doWithState(BeeswaxServiceImpl.java:772)
at com.cloudera.beeswax.BeeswaxServiceImpl.fetch(BeeswaxServiceImpl.java:980)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:987)
at com.cloudera.beeswax.api.BeeswaxService$Processor$fetch.getResult(BeeswaxService.java:971)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Update
I increased the memory settings in Cloudera Manager, but no cigar. After restarting the service, the query works the first time I run it; the second time, it fails. The settings I increased are:
Hue - Beeswax Server (Default) / Resource Management - Java Heap Size of Beeswax Server in Bytes [1 GiB]
Hive - Gateway (Default) / Resource Management - Client Java Heap Size in Bytes [1 GiB]
Hive - HiveServer2 (Default) / Resource Management - Java Heap Size of HiveServer2 in Bytes [1 GiB]
There are three -Xmx settings that you can play with (increase): client Java, HiveServer2, and Hive Metastore Server. I guess you're hitting one of these limits.
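If, as in the update above, the new heap sizes don't seem to take effect, one way to double-check which -Xmx the relevant JVMs actually started with is to inspect their command lines (a hedged sketch; the process-name filters are assumptions):
import subprocess

# List full command lines and pull out any -Xmx flags for the Hive/Beeswax JVMs
out = subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "HiveServer2" in line or "beeswax" in line.lower():
        print([tok for tok in line.split() if tok.startswith("-Xmx")])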