Can't create ORC external tables on HAWQ PXF - Hive

I'm using Pivotal HAWQ with Ambari, and I'm trying to run some queries over ORC Hive tables with HAWQ.
Previously I was able to run external queries from psql using SELECT * FROM hcatalog.hive-db-name.hive-table-name distributed randomly;
But now I get this error every time:
Exception report message java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.
Can you provide some help on how to get past this?

I believe you missed the step of updating your pxf-profiles.xml file, which is required after upgrading to HDB 2.2. Please see the instructions here:
http://hdb.docs.pivotal.io/220/hdb/install/install-ambari.html#post-install-212-req

Related

Committing hudi files manually

I am using Spark 3.x with Apache Hudi 0.8.0.
While trying to create a Presto table with the hudi-hive-sync tool, I get the error below.
Got runtime exception when hive syncing
java.lang.IllegalArgumentException: Could not find any data file written for commit [20220116033425__commit__COMPLETED], could not get schema for table
But I checked the data for all partition keys in a Zeppelin notebook, and all of it is present.
I understand that I need to commit the file manually. How do I do that?

Where does hive metastore store locks info?

I am trying to create indexes on a Hive table and am getting this error:
FAILED: Error in acquiring locks: Lock acquisition for
LockRequest(component:[LockComponent(type:EXCLUSIVE, level:PARTITION,
dbname:,
tablename:jobs_indx_jobs_title,
partitionname:year=2016/month=1/sourcecd=BYD),
LockComponent(type:SHARED_READ, level:TABLE, dbname:,
tablename:jobs), LockComponent(type:SHARED_READ, level:PARTITION,
dbname:, tablename:jobs,
partitionname:year=2016/month=1/sourcecd=BD)], txnid:0, user:hadoop,
hostname:Hortorn-NN-2.b2vheq12ivkfdsjdskdf3nba.dx.internal.cloudapp.net)
timed out after 5504043ms. LockResponse(lockid:58318, state:WAITING)
I want to know in which table the Hive metastore stores the lock info that the "show locks" command displays.
It's not in the Metastore, it's in ZooKeeper znodes...
Just read the documentation and the design decisions back in 2010 for HIVE-1293
If the table is non-transactional, try setting hive.support.concurrency=false.

Redshift drop/create/select query failing in Data Pipeline

I'm trying to run a daily migration script in Redshift using Data Pipeline.
The script works as expected when I run it directly using SQL Workbench/J, but fails when triggered through Data Pipeline.
I have reproduced the problem with this simple code:
drop table if exists image_stg;
create table image_stg (like image_full);
select * from image_stg;
When I run it in Data Pipeline, I get this error:
[Amazon](500310) Invalid operation: relation "image_stg" does not exist;
I also got this error once, for the exact same code, without changing anything:
[Amazon](500310) Invalid operation: Relation with OID 108425 does not exist.;
Here's a screenshot of the two error messages:
I've found this thread on the AWS forums, but it didn't help: Pipeline started failing on simple Redshift SqlActivity and temp table
What is causing this error? Is there a workaround?
I've contacted Amazon, and it looks like a problem in Data Pipeline.
They did suggest a workaround that seems to work in my case: Change the JDBC connection string from jdbc:redshift://… to jdbc:postgresql://… .
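The connection-string change suggested above amounts to swapping the URL scheme prefix. Here is a minimal sketch in Python, assuming the rest of the URL (host, port, database) stays unchanged. The function name and the example endpoint are hypothetical and are not part of Data Pipeline or of any Redshift driver:

```python
def to_postgresql_jdbc_url(url: str) -> str:
    """Swap a jdbc:redshift:// prefix for jdbc:postgresql://, leaving the rest untouched."""
    redshift_prefix = "jdbc:redshift://"
    postgres_prefix = "jdbc:postgresql://"
    if url.startswith(redshift_prefix):
        return postgres_prefix + url[len(redshift_prefix):]
    # Any other URL is returned unchanged.
    return url


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(to_postgresql_jdbc_url("jdbc:redshift://example-host:5439/dev"))
```

Whether the PostgreSQL driver is available to the pipeline, and whether it avoids the error in a given setup, would still need to be checked case by case.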
I had the same problem when creating a temporary table in Redshift via Data Pipeline, but changing the connection string from jdbc:redshift://… to jdbc:postgresql://… didn't work for me. My last resort is to create the table as a physical table and drop it after use, all through Data Pipeline.

FAILED: Hive Internal Error: java.util.NoSuchElementException(null) while running a CREATE TABLE query from shark command line

I am trying to create a table in the Hive metastore using Shark by executing the following command:
CREATE TABLE src(key int, value string);
but I always get:
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
I read about the same issue in the shark-users Google group, but found no solution.
My spark version is 0.8.1
My shark version is 0.8.1
Hive binary version is 0.9.0
I have hive-0.10.0 pre-installed from cdh4.5.0, but I can't use it since Shark 0.8.1 is not yet compatible with hive-0.10.0.
I can run various queries like select * from table_name; but not a CREATE TABLE query.
Even trying to create a cached table fails.
If I try an sbt build using my HADOOP_VERSION=2.0.0cdh4.5.0, I get a DistributedFileSystem error and am not able to run any query.
I am in dire need of a solution and would be glad if somebody could point me in the right direction. I am using a MySQL database, not Derby.
I encountered a similar problem, and it seems that this occurs only in 0.8.1 of Shark. I solved it by reverting to Spark and Shark 0.8.0, and it works fine.
0.8.0 and 0.8.1 are very similar in functionality and unless you are using Spark for the added functionality between the two releases, you would be better off staying with 0.8.0.
By the way, it's SPARK_HADOOP_VERSION and SHARK_HADOOP_VERSION if you intend to build those two from the source code. It's not just HADOOP_VERSION.

Accessing views created in Hive using HCatLoader in Pig

I was experimenting with Hive and HCatLoader in Pig. I created a view in Hive and then tried to load data from that view into Pig using HCatLoader, but it does not seem to work. Is there any way to do this? I get an error when I try to load the view in Pig with the following:
events=Load 'ViewName' using org.apache.hcatalog.pig.HCatLoader();
dump events;
When I use a table name from Hive instead of the view, it works. There is also no metastore error: the load statement reports a successful connection to the metastore, but the dump crashes with the error below.
Any pointers will be helpful.
Thanks,
Atul
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias events
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias events
at org.apache.pig.PigServer.storeEx(PigServer.java:956)
at org.apache.pig.PigServer.store(PigServer.java:919)
at org.apache.pig.PigServer.openIterator(PigServer.java:832)
... 12 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:731)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:259)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
at org.apache.pig.PigServer.storeEx(PigServer.java:952)
Here is the response I received after posting this on another forum:
"HCatLoader does not support reading views in Hive. The issue is that a view is defined as a query on a table (create view V as select x, y from t). Pig doesn't speak SQL, and HCat doesn't contain Hive's execution engine, so it cannot execute the query either. Reading Hive views from Pig and MR will require much tighter integration of the products than we currently have."
I found the same issue the hard way today. Pig cannot read Hive views (and lacks good exception handling for this case).
For the record (for anybody else running into this problem), this is how the current version behaves: on Hortonworks 2.3 with Pig 0.15 I only got the following error in the log:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Pig fails this way because there is no file to load (we attempted to load from a view).
Since Pig loads data from files in Hadoop, reading from a view (which has no physical file) may not work.
Maybe if we could create a file for the view in Hadoop, or at least a virtual pointer file to the actual data file, Pig would be able to load it.
Not sure if this is possible or has been thought through.