Hive persmission denied error - permissions

I have two ids us_id1 and us_id2, both ids are part of group alldev.
With us_id1 I can go and view all directories in HDFS, and I can execute all Hive commands
With us_id2 I can view all directories and log into hive.
But when I run a simple command like show databases; I get the following error.
on database directory /datalake/sample_dev/metastore_db in READ ONLY mode
Database Class Loader started - derby.database.classpath=''
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
What is missing for us_id2?

Related

Presto - Unable to enable performance tuning

I'm getting below errors after enabling the performance tuning parameters,
Performance tuning parameters used,
optimizer.join-reordering-strategy=AUTOMATIC optimizer.join_distribution_type=AUTOMATIC experimental.enable-dynamic-filtering=TRUE
I'm using amazon emr,
presto version: Presto CLI 0.267-amzn-1
I'm adding these parameters,
/etc/presto/conf/config.properties
`2022-07-11T11:02:36.728Z ERROR main com.facebook.presto.server.PrestoServer Unable to create injector, see the following errors:
Configuration property 'optimizer.join_distribution_type' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:244)
1 error
com.google.inject.CreationException: Unable to create injector, see the following errors:
Configuration property 'optimizer.join_distribution_type' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:244)
1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:543)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:159)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
at com.google.inject.Guice.createInjector(Guice.java:87)
at com.facebook.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:251)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:143)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:85)
2022-07-11T11:02:42.674Z INFO main com.facebook.airlift.log.Logging Disabling stderr output`
Any idea how to fix this issue?

Create table name using username in Hive query running in Oozie workflow?

I've got a Hive SQL script/action as part of an Oozie workflow. I'm doing a CREATE TABLE AS SELECT to output the results. I want to name the table using the username plus an appended string (e.g. "User123456_output_table"), but can't seem to get the correct syntax.
set tablename=${hivevar:current_user()};
CREATE TABLE `${hiveconf:tablename}_output_table` AS SELECT ...
That doesn't work and gives:
Error while compiling statement: FAILED: IllegalArgumentException java.net.URISyntaxException: Relative path in absolute URI: ${hivevar:current_user()%7D_output_table
Or changing the first line to set tablename=${current_user()}; starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [${current_user()}_output_table]: is not a valid table name
Or changing the first line to set tablename=current_user(); starts running the SELECT query but eventually stops with:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.ql.metadata.HiveException: [current_user()_output_table]: is not a valid table name
Alternatively, is there a way to pass the username from the Oozie workflow via a parameter?
I'm using Hue to do all this rather than the command line.
Thanks
This is wrong: set tablename=${hivevar:current_user()}; - it will not be resolved and substituted as is.
Hive does not calculate variables before substitution, it substitutes them as is, all functions in variables are NOT calculated. variables are just text replacement.
This:
set tablename=current_user();
CREATE TABLE `${hiveconf:tablename}_output_table` ...
gets resolved as
CREATE TABLE `current_user()_output_table` ...
And functions are not supported in table names, it will not work this way.
The solution is to calculate functions outside the script and pass them as parameters.
See this blog: https://prodlife.wordpress.com/2013/12/06/parameterizing-hive-actions-in-oozie-workflows/

BigQuery loads manually but not through the Java SDK

I have a Dataflow pipeline, running locally. The objective is to read a JSON file using TEXTIO, make sessions and load it into BigQuery. Given the structure I have to create a temp directory in GCS and then load it into BigQuery using that. Previously I had a data schema error that prevented me to load the data, see here. That issue is resolved.
So now when I run the pipeline locally it ends with dumping a temporary JSON newline delimited file into GCS. The SDK then gives me the following:
Starting BigQuery load job beam_job_xxxx_00001-1: try 1/3
INFO [main] (BigQueryIO.java:2191) - BigQuery load job failed: beam_job_xxxx_00001-1
...
Exception in thread "main" com.google.cloud.dataflow.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to create the load job beam_job_xxxx_00001, reached max retries: 3
at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:187)
at pedesys.Dataflow.main(Dataflow.java:148)
Caused by: java.lang.RuntimeException: Failed to create the load job beam_job_xxxx_00001, reached max retries: 3
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$WriteTables.load(BigQueryIO.java:2198)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$WriteTables.processElement(BigQueryIO.java:2146)
The errors are not very descriptive and the data is still not loaded in BigQuery. What is puzzling is that if I go to the BigQuery UI and load the same temporary file from GCS that was dumped by the SDK's Dataflow pipeline manually, in the same table, it works beautifully.
The relevant code parts are as follows:
PipelineOptions options = PipelineOptionsFactory.create();
options.as(BigQueryOptions.class)
.setTempLocation("gs://test/temp");
Pipeline p = Pipeline.create(options)
...
...
session_windowed_items.apply(ParDo.of(new FormatAsTableRowFn()))
.apply(BigQueryIO.Write
.named("loadJob")
.to("myproject:db.table")
.withSchema(schema)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
);
The SDK is swallowing the error/exception and not reporting it to users. It's most likely a schema problem. To get the actual error that is happening you need to fetch the job details by either:
CLI - bq show -j job beam_job_<xxxx>_00001-1
Browser/Web: use "try it" at the bottom of the page here.
#jkff has raised an issue here to improve the error reporting.

Dbeaver Connecting to Hive - SQLException: Method not supported

I'm getting this error when trying to run a select after connecting to Hive.
Is this a bad jar file?
org.jkiss.dbeaver.model.impl.jdbc.JDBCException: SQL Error: Method not supported
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.prepareStatement(JDBCConnectionImpl.java:170)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.prepareStatement(JDBCConnectionImpl.java:1)
at org.jkiss.dbeaver.model.DBUtils.createStatement(DBUtils.java:985)
at org.jkiss.dbeaver.model.DBUtils.prepareStatement(DBUtils.java:963)
at org.jkiss.dbeaver.runtime.sql.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:313)
at org.jkiss.dbeaver.runtime.sql.SQLQueryJob.extractData(SQLQueryJob.java:633)
at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsProvider.readData(SQLEditor.java:1169)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetDataPumpJob.run(ResultSetDataPumpJob.java:132)
at org.jkiss.dbeaver.runtime.AbstractJob.run(AbstractJob.java:91)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: java.sql.SQLException: Method not supported
at org.apache.hadoop.hive.jdbc.HiveConnection.createStatement(HiveConnection.java:229)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.createStatement(JDBCConnectionImpl.java:350)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCConnectionImpl.prepareStatement(JDBCConnectionImpl.java:138)
... 9 more
There is a calls in hive jdbc jar called org.apache.hive.jdbc.HiveResultSetMetaData . This class contains a method isWritable which is not supported by hive yet. This is the reason why you get the error "Method not supported".
Take the source code of this class and update the above method. Then generate the class and replaced it in the jar. This worked for me.

Lucene Search Error Stack

I am seeing the following error when trying to search using Lucene. (version 1.4.3). Any ideas as to why I could be seeing this and how to fix it?
Caused by: java.io.IOException: read past EOF
at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:195)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:55)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:109)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:89)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:109)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:106)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:43)
In this same environment I also see the following error:
Caused by: java.io.IOException: Lock obtain timed out:
Lock#/tmp/lucene-3ec31395c8e06a56e2939f1fdda16c67-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:58)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:223)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:213)
The same code works in a test environment, however not in production. Cannot identify any obvious differences between the two environments.
File permissions are wrong (it needs write permission) or your are not able to access a locked file that the current process needs.