Hive 3.1 metastore process generates a large number of S3 accesses, resulting in additional costs - amazon-s3

We store our HDFS data in S3 in the production environment.
After upgrading Hive from 1.2 to 3.1, we found that the number of S3 requests increased sharply, which results in additional costs.
The figure below shows the access statistics for the S3 bucket. While Hive is running there is still a large volume of S3 requests even when no external application is in use; as soon as Hive is shut down, the request volume immediately returns to normal.
Checking the Hive metastore log while all of Hive's upper-level applications are closed, I found that the log grows rapidly and contains a large number of accesses to Hive tables (the S3 bucket).
The following log excerpt shows part of the get_table traffic. I analyzed the full logs and found that it is a traversal scan of all tables in Hive (thousands of tables), which is what generates the large number of S3 requests.
2023-02-06T16:46:42,899 INFO [pool-6-thread-3]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(952)) - 3: source:**.***.*.*** get_table : tbl=hive.ods.ods_wkfl_act_ru_event_subscr
2023-02-06T16:46:42,899 INFO [pool-6-thread-3]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(360)) - ugi=hive ip=**.***.*.*** cmd=source:**.***.*.*** get_table :tbl=hive.ods.ods_wkfl_act_ru_event_subscr
2023-02-06T16:46:42,907 INFO [pool-6-thread-3]: metastore.MetastoreDefaultTransformer (MetastoreDefaultTransformer.java:transform(96)) - Starting translation for processor HMSClient-#master1 on list 1
2023-02-06T16:46:42,907 INFO [pool-6-thread-3]: metastore.MetastoreDefaultTransformer (MetastoreDefaultTransformer.java:transform(115)) - Table ods_wkfl_act_ru_event_subscr,#bucket=-1,isBucketed:false,tableType=EXTERNAL_TABLE,tableCapabilities=null
2023-02-06T16:46:42,908 INFO [pool-6-thread-3]: metastore.MetastoreDefaultTransformer (MetastoreDefaultTransformer.java:transform(438)) - Transformer return list of 1
2023-02-06T16:46:42,908 INFO [pool-6-thread-3]: authorization.StorageBasedAuthorizationProvider (StorageBasedAuthorizationProvider.java:userHasProxyPrivilege(172)) - userhive has host proxy privilege.
2023-02-06T16:46:42,908 INFO [pool-6-thread-3]: authorization.StorageBasedAuthorizationProvider (StorageBasedAuthorizationProvider.java:checkPermissions(395)) - Path authorization is skipped for path s3a://**/apps/hive/warehouse/ods_qa/data/wkfl_act_ru_event_subscr20230202.
2023-02-06T16:46:42,940 INFO [pool-6-thread-3]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(952)) - 3: source:**.***.*.*** get_table : tbl=hive.ods.ods_wkfl_act_ru_event_subscr
2023-02-06T16:46:42,940 INFO [pool-6-thread-3]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(360)) - ugi=hive ip=**.***.*.*** cmd=source:**.***.*.*** get_table :tbl=hive.ods.ods_wkfl_act_ru_event_subscr
2023-02-06T16:46:42,948 INFO [pool-6-thread-3]: metastore.MetastoreDefaultTransformer (MetastoreDefaultTransformer.java:transform(96)) - Starting translation for processor HMSClient-#master1 on list 1
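To quantify the scan, a rough count of the get_table audit events in the metastore log is enough (a minimal sketch; the log path is illustrative and depends on your install):
# count get_table calls recorded by the HMS audit logger
grep -c 'cmd=source:.*get_table' /var/log/hive/hivemetastore.log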
I compared the configurations of Hive 1.2.1 and Hive 3.1, looked at their differences, and tried setting hive.compactor.initiator.on to false, but this did not solve the problem.
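Beyond the compactor, the next candidates I plan to test are the metastore's background housekeeping tasks (an assumption on my part: Hive 3 added HMS housekeeping threads, including a partition management task that periodically calls get_table on every table; these property names exist in recent metastore builds, but I have not confirmed they are the cause here). Shown as key=value; they go into hive-site.xml on the metastore side as <property> entries:
# test values, not a confirmed fix
# disable the HMS background task threads entirely:
metastore.housekeeping.threads.on=false
# or only reduce the partition management scan frequency (default 300s):
metastore.partition.management.task.frequency=86400s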
I also compared the metastore logs of Hive 1.2 and Hive 3.1 and found that Hive 1.2 uses get_all_databases, which means only one S3 access is generated per scan cycle. See the following log excerpt:
2023-02-06 16:49:57,578 INFO [pool-5-thread-200]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(317)) - ugi=ambari-qa ip=**.**.**.** cmd=source:**.**.**.** get_all_databases
2023-02-06 16:49:57,661 WARN [pool-5-thread-200]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.dir does not exist
2023-02-06 16:49:57,661 WARN [pool-5-thread-200]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.driver.parallel.compilation does not exist
2023-02-06 16:49:57,661 WARN [pool-5-thread-200]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.file does not exist
2023-02-06 16:49:57,661 INFO [pool-5-thread-200]: metastore.HiveMetaStore (HiveMetaStore.java:newRawStoreForConf(619)) - 200: Opening raw store with implementation class:org.apache.hadoop.hive.metastore.ObjectStore
2023-02-06 16:49:57,787 INFO [pool-5-thread-200]: metastore.ObjectStore (ObjectStore.java:initializeHelper(383)) - ObjectStore, initialize called
2023-02-06 16:49:57,871 WARN [pool-5-thread-200]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.dir does not exist
2023-02-06 16:49:57,871 WARN [pool-5-thread-200]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.driver.parallel.compilation does not exist
2023-02-06 16:49:57,872 WARN [pool-5-thread-200]: conf.HiveConf (HiveConf.java:initialize(3093)) - HiveConf of name hive.log.file does not exist
2023-02-06 16:49:57,893 INFO [pool-5-thread-200]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(163)) - Using direct SQL, underlying DB is MYSQL
2023-02-06 16:49:57,893 INFO [pool-5-thread-200]: metastore.ObjectStore (ObjectStore.java:setConf(297)) - Initialized ObjectStore
2023-02-06 16:52:54,140 INFO [pool-5-thread-200]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(773)) - 200: source:**.**.**.** get_all_functions
2023-02-06 16:52:54,140 INFO [pool-5-thread-200]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(317)) - ugi=ambari-qa ip=**.**.**.** cmd=source:**.**.**.** get_all_functions
2023-02-06 16:52:56,155 INFO [pool-5-thread-200]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(773)) - 200: Shutting down the object store...
2023-02-06 16:52:56,155 INFO [pool-5-thread-200]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(317)) - ugi=ambari-qa ip=**.**.**.** cmd=Shutting down the object store...
Based on the above analysis, I believe this full-table traversal is what causes the surge in S3 accesses.
What I want to know now is which Hive parameters I can modify, or what other approach I can take, so that the metastore scan no longer traverses every table but uses the get_all_databases approach instead. I have found few similar answers online, and I am not sure whether my reasoning is correct. I really need your help, thank you very much!

Related

Filebeat does not complete on close_eof + --once

Using filebeat 7.5.2:
I'm using a filebeat configuration with close_eof enabled, and I run filebeat with the --once flag. I can see the harvester reaching EOF, but filebeat keeps running.
Filebeat conf:
filebeat.inputs:
- type: log
  close_eof: true
  enabled: true
  paths:
    - "${LOGS_PATH}"
  scan_frequency: 1s
  fields: {
    machine: "${HOST}"
  }

output.logstash:
  hosts: ["192.168.41.6:5044"]
  bulk_max_size: 1024
  timeout: 30s
  pipelining: 1
  workers: 1
And I run it using:
filebeat run --once -v -c "PATH TO CONF..."
And some logs from the filebeat instance:
...
2020-02-04T18:30:16.950Z INFO instance/beat.go:297 Setup Beat: filebeat; Version: 7.5.2
2020-02-04T18:30:17.059Z INFO [publisher] pipeline/module.go:97 Beat name: logstash
2020-02-04T18:30:17.167Z WARN beater/filebeat.go:152 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.168Z INFO instance/beat.go:429 filebeat start running.
2020-02-04T18:30:17.168Z INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2020-02-04T18:30:17.168Z INFO registrar/migrate.go:104 No registry home found. Create: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat
2020-02-04T18:30:17.179Z INFO registrar/migrate.go:112 Initialize registry meta file
2020-02-04T18:30:17.192Z INFO registrar/registrar.go:108 No registry file found under: /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json. Creating a new registry file.
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:145 Loading registrar data from /tmp/tmp.BXJtfiaEzb/data/registry/filebeat/data.json
2020-02-04T18:30:17.193Z INFO registrar/registrar.go:152 States Loaded from registrar: 0
2020-02-04T18:30:17.193Z WARN beater/filebeat.go:368 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-02-04T18:30:17.193Z INFO crawler/crawler.go:72 Loading Inputs: 1
2020-02-04T18:30:17.194Z INFO log/input.go:152 Configured paths: [/tmp/tmp.BXJtfiaEzb/*.log]
2020-02-04T18:30:17.206Z INFO input/input.go:114 Starting input of type: log; ID: 13918413832820009056
2020-02-04T18:30:17.225Z INFO input/input.go:167 Stopping Input: 13918413832820009056
2020-02-04T18:30:17.225Z INFO crawler/crawler.go:106 Loading and starting Inputs completed. Enabled inputs: 1
2020-02-04T18:30:17.225Z INFO log/harvester.go:251 Harvester started for file: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:384 Running filebeat once. Waiting for completion ...
2020-02-04T18:30:17.231Z INFO beater/filebeat.go:386 All data collection completed. Shutting down.
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:139 Stopping Crawler
2020-02-04T18:30:17.231Z INFO crawler/crawler.go:149 Stopping 1 inputs
2020-02-04T18:30:17.258Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:30:17.296Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... Only metrics here ...
2020-02-04T18:35:55.686Z INFO log/harvester.go:274 End of file reached: /tmp/tmp.BXJtfiaEzb/dcbgw-20200124080032_darkblue.log. Closing because close_eof is enabled.
2020-02-04T18:35:55.686Z INFO crawler/crawler.go:165 Crawler stopped
... MORE METRICS ...
2020-02-04T18:36:26.609Z ERROR logstash/async.go:256 Failed to publish events caused by: read tcp 192.168.41.6:49662->192.168.41.6:5044: i/o timeout
2020-02-04T18:36:26.621Z ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
2020-02-04T18:36:28.520Z ERROR pipeline/output.go:121 Failed to publish events: client is not connected
2020-02-04T18:36:28.520Z INFO pipeline/output.go:95 Connecting to backoff(async(tcp://192.168.41.6:5044))
2020-02-04T18:36:28.521Z INFO pipeline/output.go:105 Connection to backoff(async(tcp://192.168.41.6:5044)) established
... MORE METRICS ...
I'm sending this output to Logstash 7.5.2 running in the same Ubuntu 18 VM. Running Logstash at log level trace does not show any error.
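One knob that might be worth testing (an assumption on my side, not a confirmed fix): filebeat.shutdown_timeout, which bounds how long filebeat waits for the publisher to flush events on shutdown.
# hypothetical addition to the config above
filebeat.shutdown_timeout: 30s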

Apache flink - Timeout after submitting job on hadoop / yarn cluster

I am trying to upgrade our job from flink 1.4.2 to 1.7.1 but I keep running into timeouts after submitting the job. The flink job runs on our hadoop cluster (version 2.7) with Yarn.
I've seen the following behavior:
Using the same flink-conf.yaml as we used in 1.4.2: versions 1.5.6 / 1.6.3 / 1.7.1 all time out, while 1.4.2 works.
Using 1.5.6 with "mode: legacy" (to switch off FLIP-6) works.
Using 1.7.1 with "mode: legacy" still gives a timeout (I assume this option was removed, but the documentation is outdated? https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#legacy)
When the timeout happens I get the following stacktrace:
INFO class java.time.Instant does not contain a getter for field seconds
INFO class com.bol.fin_hdp.cm1.domain.Cm1Transportable does not contain a getter for field globalId
INFO Submitting job 5af931bcef395a78b5af2b97e92dcffe (detached: false).
INFO ------------------------------------------------------------
INFO The program finished with the following exception:
INFO org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
INFO at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
INFO at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
INFO at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
INFO at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:798)
INFO at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:289)
INFO at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:215)
INFO at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1035)
INFO at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1111)
INFO at java.security.AccessController.doPrivileged(Native Method)
INFO at javax.security.auth.Subject.doAs(Subject.java:422)
INFO at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
INFO at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
INFO at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1111)
INFO Caused by: java.lang.RuntimeException: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
INFO at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:43)
INFO at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJobWithConfig(IntervalJobStarter.java:32)
INFO at com.bol.fin_hdp.Main.main(Main.java:8)
INFO at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
INFO at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
INFO at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
INFO at java.lang.reflect.Method.invoke(Method.java:498)
INFO at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
INFO ... 12 more
INFO Caused by: org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.
INFO at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:258)
INFO at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
INFO at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
INFO at com.bol.fin_hdp.cm1.job.Job.execute(Job.java:54)
INFO at com.bol.fin_hdp.job.starter.IntervalJobStarter.startJob(IntervalJobStarter.java:41)
INFO ... 19 more
INFO Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
INFO at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:371)
INFO at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
INFO at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
INFO at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
INFO at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
INFO at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:216)
INFO at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
INFO at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
INFO at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
INFO at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
INFO at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:301)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
INFO at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:214)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
INFO at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
INFO at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
INFO at java.lang.Thread.run(Thread.java:748)
INFO Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Number of retries has been exhausted.
INFO at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
INFO ... 17 more
INFO Caused by: java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-01...
INFO at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
INFO at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
INFO at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
INFO at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
INFO ... 15 more
INFO Caused by: org.apache.flink.shaded.netty4.io.netty.channel.ConnectTimeoutException: connection timed out: shd-hdp-b-slave-017.example.com/some.ip.address:46500
INFO at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:212)
INFO ... 7 more
What changed in FLIP-6 that might cause this behavior, and how can I fix it?
For our jobs on YARN w/Flink 1.6, we had to bump up the web.timeout setting via -yD web.timeout=100000.
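For reference, a sketch of the submit command with that flag (the job jar name is illustrative):
flink run -m yarn-cluster -yD web.timeout=100000 ./my-job.jar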
In our case, there was a firewall between the machine submitting the job and our Hadoop cluster.
In newer Flink versions (1.7 and up) Flink uses REST to submit jobs. On YARN setups the port number for this REST service is random and could not be configured, so the firewall blocked the submission.
Flink 1.8.0 introduced a config option to pin it to a port or port range:
rest.bind-port: 55520-55530
With a fixed range, those ports can then be opened in the firewall.

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS

Hiveserver2 does not start after installing HDP 2.6.4.0-91 using cloudbreak on AWS.
I start HiveServer2 from the Ambari UI and check the contents of /var/log/hive/hiveserver2.log.
Below is the error log.
Any help would be appreciated.
Contents of hiveserver2.log
2018-03-08 04:41:53,345 WARN [main-EventThread]: server.HiveServer2 (HiveServer2.java:process(343)) - This instance of HiveServer2 has been removed from the list of server instances available for dynamic service discovery. The last client session has ended - will shutdown now.
2018-03-08 04:41:53,347 INFO [main]: zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x16203aad5af0040 closed
2018-03-08 04:41:53,347 INFO [main]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:stop(405)) - Shutting down HiveServer2
2018-03-08 04:41:53,348 INFO [main-EventThread]: server.HiveServer2 (HiveServer2.java:removeServerInstanceFromZooKeeper(361)) - Server instance removed from ZooKeeper.
2018-03-08 04:41:53,348 INFO [main-EventThread]: zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
2018-03-08 04:41:53,348 WARN [main]: server.HiveServer2 (HiveServer2.java:startHiveServer2(508)) - Error starting HiveServer2 on attempt 1, will retry in 60 seconds
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1520480101488_0046 failed 2 times due to AM Container for appattempt_1520480101488_0046_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://ip-10-0-91-7.ap-northeast-2.compute.internal:8088/cluster/app/application_1520480101488_0046 Then click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2: tar: Removing leading `/' from member names
tar: Skipping to next header
gzip: /hadoopfs/fs1/yarn/nodemanager/filecache/60_tmp/tmp_tez.tar.gz: invalid compressed data--format violated
tar: Exiting with failure status due to previous errors
Failing this attempt. Failing the application.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:699)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:116)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.startPool(TezSessionPoolManager.java:76)
at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:488)
at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:87)
at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:720)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I had exactly the same issue with HDP on AWS. FYI, in my case the issue was with HDP version 2.6.4.5-2. I'm going to show how I fixed it using this version because it is the latest at this time.
As the error log shows, the problem is that tez.tar.gz is corrupted, so YARN is unable to decompress it in the YARN container.
This tez.tar.gz file is copied from hdfs:///hdp/apps/<hdp_version>/tez/tez.tar.gz.
To reproduce the error and confirm that this file is corrupted, you can run the following commands:
sudo su
su hdfs
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
tar -xvzf tez.tar.gz
You will get the following error:
gzip: stdin: invalid compressed data--format violated
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
The fix is pretty simple: just replace the HDFS file with the one on your local file system by running the following commands:
hdfs dfs -rm /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
hdfs dfs -put /usr/hdp/current/tez-client/lib/tez.tar.gz /hdp/apps/2.6.4.5-2/tez/tez.tar.gz
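Before restarting, a quick sanity check on the replaced archive doesn't hurt (a hedged extra step, run as the hdfs user as above):
hdfs dfs -get /hdp/apps/2.6.4.5-2/tez/tez.tar.gz /tmp/tez.tar.gz
tar -tzf /tmp/tez.tar.gz > /dev/null && echo "archive OK"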
Now restart the Hive Server 2 service and you're done!
NOTE: If something similar happens with other services, you can do the same thing. Please check the following link, which has more details: https://community.hortonworks.com/articles/30096/foxing-broken-targz-and-jar-files-in-hdp-24.html
Hope this helps!

Pig Hcatalog failed to read data from hive table

grunt> table_load = load 'test_table_one' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> dump table_load;
2016-10-05 17:25:43,798 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-10-05 17:25:43,930 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://localhost:9084
2016-10-05 17:25:43,931 [main] INFO hive.metastore - Opened a connection to metastore, current connections: 1
2016-10-05 17:25:43,934 [main] INFO hive.metastore - Connected to metastore.
...
2016-10-05 17:25:58,707 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1475669003352_0017
2016-10-05 17:25:58,707 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases table_load
2016-10-05 17:25:58,707 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: table_load[7,13] C: R:
2016-10-05 17:25:58,716 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-10-05 17:25:58,716 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1475669003352_0017]
2016-10-05 17:26:13,753 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2016-10-05 17:26:13,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1475669003352_0017 has failed! Stop running all dependent jobs
2016-10-05 17:26:13,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-10-05 17:26:13,882 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2016-10-05 17:26:13,883 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
2.6.0          0.15.0      hadoop  2016-10-05 17:25:57  2016-10-05 17:26:13  UNKNOWN

Failed!

Failed Jobs:
JobId                   Alias       Feature   Message              Outputs
job_1475669003352_0017  table_load  MAP_ONLY  Message: Job failed! hdfs://mycluster/tmp/temp81690062/tmp2002161033,

Input(s): Failed to read data from "test_table_one"
Output(s): Failed to produce result in "hdfs://mycluster/tmp/temp81690062/tmp2002161033"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1475669003352_0017

2016-10-05 17:26:13,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-10-05 17:26:13,889 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias table_load
Details at logfile: /home/hadoop/pig_1475674706670.log
Can you help me find out why this is happening?
Either use pig -useHCatalog, or use plain pig and REGISTER the supporting JARs needed for HCatalog to work in grunt.
You can find the required jars, which are shared into HDFS, when you use pig -useHCatalog.
grunt> table_load = load 'test_table_one' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> dump table_load;
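A minimal sketch of the two options (the REGISTER paths are illustrative and depend on your distribution):
pig -useHCatalog
# or, with plain pig, register the HCatalog jars first:
pig
grunt> REGISTER /usr/hdp/current/hive-client/lib/hive-metastore.jar;
grunt> REGISTER /usr/hdp/current/hive-client/lib/hive-exec.jar;
grunt> REGISTER /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;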
Another possible reason is that you haven't created the Hive table with that exact name. Check the Hive table and its schema.
Before using HCatalog, we have to create the table schema in Hive on top of the location from which we are loading the data. Use a queue name if required, and check that the table exists in Hive before executing, for example as in the sketch below.
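A minimal sketch of creating such a schema first (table columns, delimiter, and location are illustrative assumptions; adjust them to your data):
hive -e "CREATE EXTERNAL TABLE test_table_one (id INT, name STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/user/hadoop/test_table_one';"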
Hope it will help.

ERROR 1066: Unable to open iterator for alias

I am getting the error "Unable to open iterator for alias" for a simple LOAD and DUMP operation with Pig. I already took a look at the answers at:
ERROR 1066: Unable to open iterator for alias - Pig
But they don't help me.
My environment:
OS: Windows 7
Pig version: 0.13.0
Mode: Local
The error shows that the exception is caused by a failure to change the permission of a file in the TMP directory. But when I checked the TMP directory, there was no such file (probably it got deleted after the command finished?).
Logs below (with the -v and -w options):
'D:\H\HADOOP-2.6.0\bin\hadoop-config.cmd' is not recognized as an internal or external command,
operable program or batch file.
15/01/24 09:20:22 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/01/24 09:20:22 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2015-01-24 09:20:22,909 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:29:34
2015-01-24 09:20:22,909 [main] INFO org.apache.pig.Main - Logging error messages to: d:\Pig\pig-0.13.0\bin\
2015-01-24 09:20:24,267 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file C:\Users\Venkat/.pigbootup not found
2015-01-24 09:20:24,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2015-01-24 09:20:26,205 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-01-24 09:20:26,236 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias a
2015-01-24 09:20:26,236 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a
at org.apache.pig.PigServer.openIterator(PigServer.java:912)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:479)
at org.apache.pig.Main.main(Main.java:156)
Caused by: org.apache.pig.backend.datastorage.DataStorageException: ERROR 0: java.io.IOException: Failed to set permissions of path: \tmp\temp946561981 to 0700
at org.apache.pig.impl.io.FileLocalizer.relativeRoot(FileLocalizer.java:484)
at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:515)
at org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:511)
at org.apache.pig.PigServer.openIterator(PigServer.java:887)
... 7 more
Caused by: java.io.IOException: Failed to set permissions of path: \tmp\temp946561981 to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:286)
at org.apache.pig.backend.hadoop.datastorage.HPath.setPermission(HPath.java:122)
at org.apache.pig.impl.io.FileLocalizer.createRelativeRoot(FileLocalizer.java:495)
at org.apache.pig.impl.io.FileLocalizer.relativeRoot(FileLocalizer.java:481)
... 10 more
Details also at logfile: D:\Pig\pig-0.13.0\bin\data-1.txt1422071424329.log
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias a
(The logfile repeats the same FrontendException and "Failed to set permissions of path: \tmp\temp946561981 to 0700" stack trace shown above.)
Pig file contents:
a = LOAD 's.csv' AS (NAME:chararray,COUNTRY:chararray,YEAR:int,SPORT:chararray,GOLD:int,SILVER:int,BRONZE:int,TOTAL:int);
DUMP a;
Contents of s.csv:
Yang Yilin China 2008 Gymnastics 1 0 2 3
Leisel Jones Australia 2000 Swimming 0 2 0 2
Is there anything wrong with the syntax of the LOAD statement?
Are there any environment variables that need to be set, other than JAVA and JAVA_HOME?
First check the ownership of your Hadoop directory: if it is owned by root, change it to your user.
$ sudo chown -R testuser:testuser /(path of hadoop folder)
That should solve the permission problem.
I hope this will work.
Generally we get this error ("ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias") due to an incorrect path or not being able to access the file path.
Check that the file location is correct.
Check that you are able to access the file.
I got this issue when the Hadoop NameNode was in safe mode.
So I ran the command below to turn safe mode OFF:
sudo -u hdfs hadoop dfsadmin -safemode leave
Then it worked for me.
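For reference, you can first confirm whether the NameNode is actually in safe mode:
sudo -u hdfs hadoop dfsadmin -safemode get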