Committing Hudi files manually - Hive

I am using Spark 3.x with Apache Hudi 0.8.0.
While trying to create a Presto table using the hudi-hive-sync tool, I get the error below:
Got runtime exception when hive syncing
java.lang.IllegalArgumentException: Could not find any data file written for commit [20220116033425__commit__COMPLETED], could not get schema for table
But I checked all the data for the partition keys using a Zeppelin notebook, and all of it is present.
My understanding is that I need to commit the file manually. How do I do that?
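For reference, a minimal sketch of a Hudi 0.8 write with Hive sync enabled through the Spark datasource rather than the standalone sync tool; the table name, key fields, JDBC URL, and paths below are placeholders, not values from the original job:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-hive-sync-sketch").getOrCreate()

// Placeholder input; in practice this is the DataFrame being upserted.
val df = spark.read.parquet("s3://bucket/input/")

df.write
  .format("hudi")
  // Core Hudi write options (table name and field names are placeholders).
  .option("hoodie.table.name", "my_hudi_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.partitionpath.field", "partition_key")
  .option("hoodie.datasource.write.precombine.field", "ts")
  // Hive sync options, so the sync runs as part of the write
  // instead of via the standalone hudi-hive-sync tool.
  .option("hoodie.datasource.hive_sync.enable", "true")
  .option("hoodie.datasource.hive_sync.database", "default")
  .option("hoodie.datasource.hive_sync.table", "my_hudi_table")
  .option("hoodie.datasource.hive_sync.partition_fields", "partition_key")
  .option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://hive-server:10000")
  .mode(SaveMode.Append)
  .save("s3://bucket/hudi/my_hudi_table")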

Related

Error while trying to insert data into Hive using NiFi

Hi, I have transformed a file into CSV using NiFi. Loading it into Hive manually works fine, but when I try to insert the same CSV file into Hive through NiFi, it is unable to load the data and gives an error from PutHiveStreaming. Here is my flow.
Sample CSV file:
2019-12-13,9594838484,mmssr rwfhjrbf hrfbhrbfhbf jrf
2018-3-12,9534338484,mms4er fhjrbf hrfbhrbfhbf jrf
2019-5-15,9534338484,mr5ms4er fsfhjrbf hssrfbhrbfhbf jrf
I have only 3 columns.
I have added the schema in the ConvertCSVToAvro processor, but I am still getting an error.
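For reference, a minimal sketch of building a three-column Avro schema with Avro's SchemaBuilder; the record and field names are assumptions, and the printed JSON is the kind of schema text that goes into the schema property of ConvertCSVToAvro:

import org.apache.avro.SchemaBuilder

// Three-column record matching the sample CSV: date, phone number, free text.
// Record and field names are assumptions; adjust them to the Hive table.
val schema = SchemaBuilder.record("csv_record")
  .fields()
  .requiredString("event_date")
  .requiredString("phone_number")
  .requiredString("message")
  .endRecord()

// toString(true) prints the JSON form of the schema, which can be
// pasted into the schema property of ConvertCSVToAvro.
println(schema.toString(true))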

Incompatible Parquet schema due to different Hive version

We have an EMR cluster running Hive 0.13.1 (I know it is very archaic, but this cluster has a lot of dependencies, so we are not able to do away with it).
Anyway, cutting to the chase: we processed something like 10 TB of TSV data into Parquet using a different EMR cluster that runs a recent version of Hive.
This was a temporary setup to facilitate the huge data processing job.
Now we are back on the old EMR cluster to do incremental processing of TSV to Parquet.
We are using AWS Redshift Spectrum coupled with Glue to query this data. Glue crawls the S3 path where the data resides, thereby giving us a schema to work with.
Now the data processed by the old EMR cluster gives us "incompatible Parquet schema" issues.
The error we get when we try to read Parquet data made up of files processed by both the newer and the older Hive is:
[2018-08-13 09:40:36] error: S3 Query Exception (Fetch)
[2018-08-13 09:40:36] code: 15001
[2018-08-13 09:40:36] context: Task failed due to an internal error. File '<Some s3 path >/parquet/<Some table name>/caldate=2018080900/8e71ebbe-b398-483c-bda0-81db6f848d42-000000 has an incompatible Parquet schema for column
[2018-08-13 09:40:36] query: 11500732
[2018-08-13 09:40:36] location: dory_util.cpp:724
[2018-08-13 09:40:36] process: query1_703_11500732 [pid=5384]
My hunch is that it is because of the different Hive versions, or it could be a Redshift Spectrum bug.
Has anyone faced the same issue?
I think this particular post will help you solve the issue. It talks about the problems that arise when a schema is written by one version and read by another:
https://slack.engineering/data-wrangling-at-slack-f2e0ff633b69
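One way to confirm a version mismatch is to compare the Parquet footer schema of a file written by each Hive version; a minimal sketch with Spark, using placeholder S3 paths:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-schema-diff").getOrCreate()

// Placeholder paths: one partition written by the newer Hive, one by the old one.
val newHivePath = "s3://bucket/parquet/table/caldate=2018080900/"
val oldHivePath = "s3://bucket/parquet/table/caldate=2018081000/"

// Print the schema read from each file's Parquet footer; a column whose
// type differs between the two is the likely source of the
// "incompatible Parquet schema" error in Spectrum.
spark.read.parquet(newHivePath).printSchema()
spark.read.parquet(oldHivePath).printSchema()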

Cannot load JDBC driver class org.apache.hive.jdbc.hivedriver in Kylo

I am trying to create a Data Ingest Feed, but all the jobs are failing. I checked NiFi and there are error marks saying that "org.apache.hive.jdbc.hivedriver" was not found. I checked the NiFi logs and found the following error:
So where exactly do I need to put the Hive driver JAR?
Based on the comments, this seems to be the solution as mentioned by #Greg Hart:
Have you tried using a Data Transformation feed? The Data Ingest template is for loading data into Hive, but it looks like you're using it to move data from one Hive table into another.
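As a quick classpath check outside of Kylo, a minimal sketch that loads the Hive JDBC driver class and opens a connection; the JDBC URL and credentials are placeholders:

import java.sql.DriverManager

// The Hive JDBC driver class shipped in the hive-jdbc JAR.
// A ClassNotFoundException here means the JAR is missing from the classpath.
Class.forName("org.apache.hive.jdbc.HiveDriver")

// Placeholder connection details.
val conn = DriverManager.getConnection(
  "jdbc:hive2://hive-server:10000/default", "user", "password")
println(conn.getMetaData.getDatabaseProductName)
conn.close()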

FAILED: Hive Internal Error: java.util.NoSuchElementException(null) while running a CREATE TABLE query from shark command line

I am trying to create a table in the Hive metastore using Shark by executing the following command:
CREATE TABLE src(key int, value string);
but I always get:
FAILED: Hive Internal Error: java.util.NoSuchElementException(null)
I read about the same thing in the shark-users Google group, but to no avail.
My spark version is 0.8.1
My shark version is 0.8.1
Hive binary version is 0.9.0
I have Hive 0.10.0 pre-installed from CDH 4.5.0, but I can't use it since Shark 0.8.1 is not compatible with Hive 0.10.0 yet.
I can run various queries like select * from table_name; but not the CREATE TABLE query.
Even trying to create a cached table fails.
If I try an sbt build using HADOOP_VERSION=2.0.0cdh4.5.0, I get a DistributedFileSystem error and am not able to run any query.
I am in dire need of a solution. I'd be glad if somebody could point me in the right direction. I have a MySQL database, not Derby.
I encountered a similar problem, and it seems that it occurs only in Shark 0.8.1. I solved it by reverting to Spark and Shark 0.8.0, and it works fine.
0.8.0 and 0.8.1 are very similar in functionality, and unless you need the added functionality between the two releases, you would be better off staying with 0.8.0.
By the way, it's SPARK_HADOOP_VERSION and SHARK_HADOOP_VERSION if you intend to build those two from the source code. It's not just HADOOP_VERSION.

Pentaho Kettle is not working for Vertica DB

I need to parse a CSV file and write the data to a Vertica database. The issue is that I get an error when I create a Vertica database connection in Spoon. The error is included at the end of this post.
I tried copying the following two JAR files and adding them to libext/jdbc:
vertica-jdbc-4.1.14.jar and vertica-jdk5-6.1.2-0.jar
But the above didn't help. I am looking for pointers!
Error:
Error connecting to database [Vertica Dev] : org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database
Exception while loading class
com.vertica.jdbc.Driver
org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database
Exception while loading class
com.vertica.jdbc.Driver
at org.pentaho.di.core.database.Database.normalConnect(Database.java:366)
The two JAR files you copied are from two different versions of Vertica and do not expose the same driver class.
vertica-jdk5-6.1.2-0.jar will expose com.vertica.jdbc.Driver whereas version 4 will expose com.vertica.Driver.
The error message thus makes it obvious that Pentaho is looking for com.vertica.jdbc.Driver (hence version 5). If it fails, it is probably because the version 4 JAR is loaded first.
Try deleting only the version 4 JAR from libext/jdbc, keep version 5, and restart Pentaho.
On a side note, this class is hardcoded in Pentaho, so if you do need to use the version 4 JAR and feel adventurous, you just need to get the Pentaho source, update VerticaDatabaseMeta.java, and recompile.
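If it is unclear which of the two JARs is actually being picked up, a minimal diagnostic sketch that probes both driver class names on the current classpath:

// Probe both Vertica driver class names to see which JAR is on the classpath.
val driverClasses = Seq(
  "com.vertica.jdbc.Driver", // exposed by vertica-jdk5-6.1.2-0.jar
  "com.vertica.Driver"       // exposed by the older 4.x driver
)
for (name <- driverClasses) {
  val found =
    try { Class.forName(name); true }
    catch { case _: ClassNotFoundException => false }
  println(s"$name -> ${if (found) "found" else "not found"}")
}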