Presto-Glue-EMR integration: presto-cli giving NullPointerException - amazon-emr

I am trying to connect my Glue catalog to Presto and Hive on EMR. When running queries in presto-cli I get a NullPointerException, whereas the same query succeeds in hive-cli.
I started the CLI like this:
presto-cli --catalog hive
Exception on executing a query:
Query 20180814_174636_00003_iika5 failed: java.lang.NullPointerException: parameters is null
The EMR configuration looks like this:
[
  {
    "classification": "presto-connector-hive",
    "properties": {
      "hive.metastore": "glue"
    },
    "configurations": []
  },
  {
    "classification": "hive-site",
    "properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    },
    "configurations": []
  }
]
EMR version: 5.16.0
Presto version: 0.203
Reference Doc: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html
Debug logs:
Query 20180816_060942_00001_m9i52 failed: java.lang.NullPointerException: parameters is null
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException: parameters is null
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2052)
at com.google.common.cache.LocalCache.get(LocalCache.java:3943)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3967)
at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4952)
at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4958)
at com.facebook.presto.hive.metastore.CachingHiveMetastore.get(CachingHiveMetastore.java:207)
at com.facebook.presto.hive.metastore.CachingHiveMetastore.getPartitionNamesByParts(CachingHiveMetastore.java:499)
at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.doGetPartitionNames(SemiTransactionalHiveMetastore.java:467)
at com.facebook.presto.hive.metastore.SemiTransactionalHiveMetastore.getPartitionNamesByParts(SemiTransactionalHiveMetastore.java:445)
at com.facebook.presto.hive.HivePartitionManager.getFilteredPartitionNames(HivePartitionManager.java:284)
at com.facebook.presto.hive.HivePartitionManager.getPartitions(HivePartitionManager.java:146)
at com.facebook.presto.hive.HiveMetadata.getTableLayouts(HiveMetadata.java:1305)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableLayouts(ClassLoaderSafeConnectorMetadata.java:73)
at com.facebook.presto.metadata.MetadataManager.getLayouts(MetadataManager.java:346)
at com.facebook.presto.sql.planner.iterative.rule.PickTableLayout.planTableScan(PickTableLayout.java:203)
at com.facebook.presto.sql.planner.iterative.rule.PickTableLayout.access$200(PickTableLayout.java:61)
at com.facebook.presto.sql.planner.iterative.rule.PickTableLayout$PickTableLayoutWithoutPredicate.apply(PickTableLayout.java:186)
at com.facebook.presto.sql.planner.iterative.rule.PickTableLayout$PickTableLayoutWithoutPredicate.apply(PickTableLayout.java:153)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.transform(IterativeOptimizer.java:168)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreNode(IterativeOptimizer.java:141)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:104)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:193)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:106)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:193)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:106)
at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.optimize(IterativeOptimizer.java:95)
at com.facebook.presto.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:140)
at com.facebook.presto.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:129)
at com.facebook.presto.execution.SqlQueryExecution.doAnalyzeQuery(SqlQueryExecution.java:327)
at com.facebook.presto.execution.SqlQueryExecution.analyzeQuery(SqlQueryExecution.java:312)
at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:268)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: parameters is null
at java.util.Objects.requireNonNull(Objects.java:228)
at com.facebook.presto.hive.metastore.Partition.<init>(Partition.java:54)
at com.facebook.presto.hive.metastore.Partition$Builder.build(Partition.java:180)
at com.facebook.presto.hive.metastore.glue.converter.GlueToPrestoConverter.convertPartition(GlueToPrestoConverter.java:141)
at com.facebook.presto.hive.metastore.glue.GlueHiveMetastore.lambda$getPartitions$8(GlueHiveMetastore.java:558)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at com.facebook.presto.hive.metastore.glue.GlueHiveMetastore.getPartitions(GlueHiveMetastore.java:558)
at com.facebook.presto.hive.metastore.glue.GlueHiveMetastore.getPartitionNamesByParts(GlueHiveMetastore.java:541)
at com.facebook.presto.hive.metastore.CachingHiveMetastore.loadPartitionNamesByParts(CachingHiveMetastore.java:504)
at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
at com.google.common.cache.CacheLoader$1.load(CacheLoader.java:188)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2273)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2156)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2046)
... 33 more
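The trace shows the failure happens in GlueToPrestoConverter while building a Partition whose Parameters map is null. As a purely diagnostic sketch (not part of the original post; the region, database and table names are placeholders), the Glue catalog can be scanned with boto3 to see whether any partitions of the queried table are missing their Parameters map:

import boto3

glue = boto3.client("glue", region_name="us-east-1")  # placeholder region

# Placeholder database/table; use the table that fails in presto-cli.
paginator = glue.get_paginator("get_partitions")
missing = []
for page in paginator.paginate(DatabaseName="my_db", TableName="my_table"):
    for part in page["Partitions"]:
        # Partitions written by some tools carry no Parameters map at all,
        # which is exactly the value the Presto converter rejects.
        if not part.get("Parameters"):
            missing.append(part["Values"])

print(len(missing), "partition(s) without Parameters:", missing[:10])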

It seems Presto 0.203 has this bug; I faced it too, switched to a newer version, and it worked.
At the time of writing this answer, EMR 5.17 has been released and ships Presto 0.206, which has this problem resolved.
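If it helps, launching a cluster on the newer release with the same classifications might look roughly like the boto3 sketch below; the cluster name, region, instance types and IAM roles are placeholders, not values from the original post:

import boto3

emr = boto3.client("emr", region_name="us-east-1")  # placeholder region

# Same classifications as in the question, targeting the newer release label.
configurations = [
    {
        "Classification": "presto-connector-hive",
        "Properties": {"hive.metastore": "glue"},
    },
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.metastore.client.factory.class":
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        },
    },
]

emr.run_job_flow(
    Name="presto-glue-emr-517",              # placeholder cluster name
    ReleaseLabel="emr-5.17.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Presto"}],
    Configurations=configurations,
    Instances={
        "MasterInstanceType": "m4.xlarge",   # placeholder instance types
        "SlaveInstanceType": "m4.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",       # assumes the default EMR roles exist
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)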

Related

Pyodbc to Hive on Dataproc intermittent error: (79) Failed to reconnect to server. (79) (SQLExecDirectW)

I launch a Dataproc cluster with Hive using the Google APIs in Python and connect to Hive with pyodbc. Hive queries succeed and fail seemingly at random.
Cloudera Hive ODBC driver 2.6.9
pyodbc 4.0.30
Error: "pyodbc.OperationalError: ('08S01', '[08S01] [Cloudera][Hardy] (79) Failed to reconnect to server. (79) (SQLExecDirectW)')"
Some server logs:
{
  "insertId": "xxx",
  "jsonPayload": {
    "filename": "yarn-yarn-timelineserver-xxx-m.log",
    "class": "org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager",
    "message": "ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted"
  },
{
  "insertId": "xxx",
  "jsonPayload": {
    "container": "container_xxx_0002_01_000001",
    "thread": "ORC_GET_SPLITS #4",
    "application": "application_xxx_0002",
    "message": "Failed to get files with ID; using regular API: Only supported for DFS; got class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "class": "|io.AcidUtils|",
    "filename": "application_xxx_0002.container_xxx_0002_01_000001.syslog_dag_xxx_0002_4",
    "container_logname": "syslog_dag_xxx_0002_4"
  },
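For reference, the connection is made roughly like the sketch below; the driver name, host and authentication settings are assumptions based on the Cloudera ODBC documentation, not values taken from the question. A simple retry loop is one way to tolerate the intermittent "(79) Failed to reconnect to server" errors while the root cause is investigated:

import time
import pyodbc

# Placeholder connection details; AuthMech=3 means username/password.
CONN_STR = (
    "Driver=Cloudera ODBC Driver for Apache Hive;"
    "Host=dataproc-master-host;Port=10000;"
    "AuthMech=3;UID=hive;PWD=secret"
)

def run_query(sql, attempts=3, delay=5):
    # Reconnect and retry on transient ODBC failures.
    for attempt in range(1, attempts + 1):
        conn = None
        try:
            conn = pyodbc.connect(CONN_STR, autocommit=True)
            return conn.cursor().execute(sql).fetchall()
        except pyodbc.Error as exc:
            if attempt == attempts:
                raise
            print("attempt %d failed: %s; retrying in %ds" % (attempt, exc, delay))
            time.sleep(delay)
        finally:
            if conn is not None:
                conn.close()

print(run_query("SELECT 1"))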

'serverless invoke -f hello' gives KeyError

I am following a tutorial in order to learn how to work with the Serverless Framework. The goal is to deploy a Django application. The tutorial suggests putting the necessary environment variables in a separate yml file. Unfortunately, following the tutorial gets me a KeyError.
I have a serverless.yml, a variables.yml and a handler.py. I will insert all the code below, together with the resulting error.
serverless.yml:
service: serverless-django
custom: ${file(./variables.yml)}
provider:
  name: aws
  runtime: python3.8
functions:
  hello:
    environment:
      - THE_ANSWER: ${self:custom.THE_ANSWER}
    handler: handler.hello
variables.yml:
THE_ANSWER: 42
handler.py:
import os
def hello(event, context):
    return {
        "statusCode": 200,
        "body": "The answer is: " + os.environ["THE_ANSWER"]
    }
The error in my terminal:
{
  "errorMessage": "'THE_ANSWER'",
  "errorType": "KeyError",
  "stackTrace": [
    " File \"/var/task/handler.py\", line 7, in hello\n \"body\": \"The answer is: \" + os.environ[\"THE_ANSWER\"]\n",
    " File \"/var/lang/lib/python3.8/os.py\", line 675, in __getitem__\n raise KeyError(key) from None\n"
  ]
}
Error --------------------------------------------------
Error: Invoked function failed
at AwsInvoke.log (/snapshot/serverless/lib/plugins/aws/invoke/index.js:105:31)
at AwsInvoke.tryCatcher (/snapshot/serverless/node_modules/bluebird/js/release/util.js:16:23)
at Promise._settlePromiseFromHandler (/snapshot/serverless/node_modules/bluebird/js/release/promise.js:547:31)
at Promise._settlePromise (/snapshot/serverless/node_modules/bluebird/js/release/promise.js:604:18)
at Promise._settlePromise0 (/snapshot/serverless/node_modules/bluebird/js/release/promise.js:649:10)
at Promise._settlePromises (/snapshot/serverless/node_modules/bluebird/js/release/promise.js:729:18)
at _drainQueueStep (/snapshot/serverless/node_modules/bluebird/js/release/async.js:93:12)
at _drainQueue (/snapshot/serverless/node_modules/bluebird/js/release/async.js:86:9)
at Async._drainQueues (/snapshot/serverless/node_modules/bluebird/js/release/async.js:102:5)
at Immediate._onImmediate (/snapshot/serverless/node_modules/bluebird/js/release/async.js:15:14)
at processImmediate (internal/timers.js:456:21)
at process.topLevelDomainCallback (domain.js:137:15)
For debugging logs, run again after setting the "SLS_DEBUG=*" environment variable.
Get Support --------------------------------------------
Docs: docs.serverless.com
Bugs: github.com/serverless/serverless/issues
Issues: forum.serverless.com
Your Environment Information ---------------------------
Operating System: linux
Node Version: 12.18.1
Framework Version: 2.0.0 (standalone)
Plugin Version: 4.0.2
SDK Version: 2.3.1
Components Version: 3.1.2
The command I'm trying is 'sls invoke -f hello'. The command 'sls deploy' has already been executed successfully.
I am new to serverless, so please let me know how to fix this, or if any more information is needed.
First of all, there is an error in the yml script:
Serverless Error ---------------------------------------
Invalid characters in environment variable 0
The error is caused by defining the environment variables as a list (note the leading dash) instead of as key-value pairs.
After fixing that and redeploying (sls deploy -v), everything works smoothly:
sls invoke -f hello
{
"statusCode": 200,
"body": "The answer is: 42"
}
serverless.yml
service: sls-example
custom: ${file(./variables.yml)}
provider:
  name: aws
  runtime: python3.8
functions:
  hello:
    environment:
      THE_ANSWER: ${self:custom.THE_ANSWER}
    handler: handler.hello
variables.yml
THE_ANSWER: 42
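Independently of the YAML fix, the handler can be written defensively so that a missing variable produces a readable message rather than a bare KeyError; this is just an optional sketch, not part of the original answer:

import os

def hello(event, context):
    # os.environ.get returns None instead of raising KeyError when the
    # variable is not set, so a misconfiguration is easy to spot.
    answer = os.environ.get("THE_ANSWER")
    if answer is None:
        return {"statusCode": 500, "body": "THE_ANSWER is not configured"}
    return {"statusCode": 200, "body": "The answer is: " + answer}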

Databricks spark_jar_task failed when submitted via API

I am using the Databricks Jobs API to submit a sample spark_jar_task.
My sample spark_jar_task request to calculate Pi:
"libraries": [
{
"jar": "dbfs:/mnt/test-prd-foundational-projects1/spark-examples_2.11-2.4.5.jar"
}
],
"spark_jar_task": {
"main_class_name": "org.apache.spark.examples.SparkPi"
}
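(For context, a request like this can be sent with Python roughly as sketched below; the host, token and cluster node type are placeholders, and the endpoint is assumed to be the Runs Submit API rather than whatever was actually used.)

import requests

HOST = "https://<databricks-instance>"   # placeholder workspace URL
TOKEN = "<personal-access-token>"        # placeholder token

payload = {
    "run_name": "sparkpi-sample",
    "new_cluster": {
        "spark_version": "6.4.x-scala2.11",
        "node_type_id": "Standard_DS3_v2",   # placeholder node type
        "num_workers": 1,
    },
    "libraries": [
        {"jar": "dbfs:/mnt/test-prd-foundational-projects1/spark-examples_2.11-2.4.5.jar"}
    ],
    "spark_jar_task": {"main_class_name": "org.apache.spark.examples.SparkPi"},
}

resp = requests.post(
    HOST + "/api/2.0/jobs/runs/submit",
    headers={"Authorization": "Bearer " + TOKEN},
    json=payload,
)
resp.raise_for_status()
print(resp.json())   # e.g. {"run_id": ...}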
The Databricks sysout logs print the Pi value as expected:
....
(This session will block until Rserve is shut down) Spark package found in SPARK_HOME: /databricks/spark DATABRICKS_STDOUT_END-19fc0fbc-b643-4801-b87c-9d22b9e01cd2-1589148096455
Executing command, time = 1589148103046.
Executing command, time = 1589148115170.
Pi is roughly 3.1370956854784273
Heap
.....
Although the spark_jar_task prints the Pi value in the log, the job terminated with a FAILED status without stating the error. Below is the response of the API /api/2.0/jobs/runs/list/?job_id=23.
{
  "runs": [
    {
      "job_id": 23,
      "run_id": 23,
      "number_in_job": 1,
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "FAILED",
        "state_message": ""
      },
      "task": {
        "spark_jar_task": {
          "jar_uri": "",
          "main_class_name": "org.apache.spark.examples.SparkPi",
          "run_as_repl": true
        }
      },
      "cluster_spec": {
        "new_cluster": {
          "spark_version": "6.4.x-scala2.11",
          ......
          .......
Why did the job fail here? Any suggestions will be appreciated!
EDIT:
The error log says:
20/05/11 18:24:15 INFO ProgressReporter$: Removed result fetcher for 740457789401555410_9000204515761834296_job-34-run-1-action-34
20/05/11 18:24:15 WARN ScalaDriverWrapper: Spark is detected to be down after running a command
20/05/11 18:24:15 WARN ScalaDriverWrapper: Fatal exception (spark down) in ReplId-a46a2-6fb47-361d2
com.databricks.backend.common.rpc.SparkStoppedException: Spark down:
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:493)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
20/05/11 18:24:17 INFO ShutdownHookManager: Shutdown hook called
I found the answer in this post: https://github.com/dotnet/spark/issues/126
It looks like we shouldn't deliberately call spark.stop() when running as a jar in Databricks.

Apache Beam on Cloud Dataflow - Failed to query Cadvisor

I have a Cloud Dataflow job that reads from Pub/Sub and pushes data out to BigQuery (a rough sketch of the pipeline shape is included after the log below). Recently the job has been reporting the error below and not writing any data to BigQuery.
{
  insertId: "3878608796276796502:822931:0:1075"
  jsonPayload: {
    line: "work_service_client.cc:490"
    message: "gcpnoelevationcall-01211413-b90e-harness-n1wd Failed to query CAdvisor at URL=<IPAddress>:<PORT>/api/v2.0/stats?count=1, error: INTERNAL: Couldn't connect to server"
    thread: "231"
  }
  labels: {
    compute.googleapis.com/resource_id: "3878608796276796502"
    compute.googleapis.com/resource_name: "gcpnoelevationcall-01211413-b90e-harness-n1wd"
    compute.googleapis.com/resource_type: "instance"
    dataflow.googleapis.com/job_id: "2018-01-21_14_13_45"
    dataflow.googleapis.com/job_name: "gcpnoelevationcall"
    dataflow.googleapis.com/region: "global"
  }
  logName: "projects/poc/logs/dataflow.googleapis.com%2Fshuffler"
  receiveTimestamp: "2018-01-21T22:41:40.053806623Z"
  resource: {
    labels: {
      job_id: "2018-01-21_14_13_45"
      job_name: "gcpnoelevationcall"
      project_id: "poc"
      region: "global"
    }
    type: "dataflow_step"
  }
  severity: "ERROR"
  timestamp: "2018-01-21T22:41:39.524005Z"
}
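The pipeline shape is roughly the following; this is only a minimal sketch, and the topic, table and schema names are placeholders rather than values from the job above:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming pipeline: Pub/Sub -> (parse) -> BigQuery.
options = PipelineOptions(streaming=True)  # Dataflow runner flags omitted

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/poc/topics/events")
        | "Parse" >> beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            "poc:dataset.table",
            schema="payload:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )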
Any ideas on how I could fix this? Has anyone faced a similar issue before?
If this only happened once, it can be attributed to a transient issue: the process running on the worker node can't reach cAdvisor. Either the cAdvisor container is not running, or there is a temporary problem on the worker that prevents it from contacting cAdvisor, and the job gets stuck.

Elasticsearch snapshot restore throwing "repository missing" exception

"error": "RemoteTransportException[[Francis Underwood][inet[/xx.xx.xx.xx:9300]][cluster/snapshot/get]]; nested: RepositoryMissingException[[xxxxxxxxx] missing]; ",
"status": 404
I am also unable to create a new snapshot repository for snapshots on S3:
PUT _snapshot/bkp_xxxxx_master
{
  "type": "s3",
  "settings": {
    "region": "us-xxxx-x",
    "bucket": "elasticsearch-backups",
    "access_key": "xxxxxxxxxxxx",
    "secret_key": "xxxxxxxxxxxxxxxxxxx"
  }
}
The response I receive for this PUT is below:
{
  "error": "RemoteTransportException[[Francis Underwood][inet[/xx.xx.xx.xx:9300]][cluster/repository/put]]; nested: RepositoryException[[bkp_xxxxxxx_master] failed to create repository]; nested: AbstractMethodError[org.elasticsearch.cloud.aws.blobstore.S3BlobStore.immutableBlobContainer(Lorg/elasticsearch/common/blobstore/BlobPath;)Lorg/elasticsearch/common/blobstore/ImmutableBlobContainer;]; ",
  "status": 500
}
Thanks in advance!
I know this is an old issue, but I was able to replicate this across multiple Elasticsearch versions, and it turned out that the cause was a conflict between the JVM version and the elasticsearch-aws-cloud plugin version.
As long as you have consistent versions across the cluster it works; in my case the Joda version bundled with elasticsearch-aws-cloud was not compatible with the newer JVM version I had installed on the newer nodes.
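To check that quickly, the nodes-info API can report the JVM and plugin versions per node; a small sketch (the host is a placeholder, and the exact plugin name depends on what is installed):

import requests

ES = "http://localhost:9200"  # placeholder; point at any node in the cluster

# Ask the nodes-info API for just the JVM and plugin sections.
nodes = requests.get(ES + "/_nodes/jvm,plugins").json()["nodes"]

for node_id, info in nodes.items():
    jvm_version = info.get("jvm", {}).get("version")
    plugins = {p["name"]: p.get("version") for p in info.get("plugins", [])}
    print(info.get("name"), "JVM:", jvm_version, "plugins:", plugins)

# Every node should report the same JVM version and the same version of the
# AWS cloud plugin; a mismatch here is the inconsistency described above.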