Pentaho Kettle not working with Vertica DB

I need to parse a CSV file and write the data to a Vertica database. The issue is that I get an error when I create a Vertica database connection in Spoon; the full error is at the end of the post.
I tried copying the following two JAR files and adding them to libext/jdbc:
vertica-jdbc-4.1.14.jar and vertica-jdk5-6.1.2-0.jar
But the above didn't help. I am looking for pointers!
Error:
Error connecting to database [Vertica Dev] : org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database
Exception while loading class
com.vertica.jdbc.Driver
org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database
Exception while loading class
com.vertica.jdbc.Driver
at org.pentaho.di.core.database.Database.normalConnect(Database.java:366)

The two JAR files you copied belong to two different Vertica driver versions and do not expose the same driver class.
vertica-jdk5-6.1.2-0.jar exposes com.vertica.jdbc.Driver, whereas the version 4 JAR exposes com.vertica.Driver.
The error message therefore shows that Pentaho is looking for com.vertica.jdbc.Driver (the class from the jdk5 JAR). If it fails, it is probably because the version 4 JAR is loaded first.
Delete only the version 4 JAR from libext/jdbc, keep the jdk5 JAR, and restart Pentaho.
On a side note, this class name is hardcoded in Pentaho, so if you really do need to use the version 4 JAR and feel adventurous, you just need to get the Pentaho source, update VerticaDatabaseMeta.java, and recompile.
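If you want to double-check which driver class each JAR actually ships before restarting Spoon, a minimal standalone check like the one below can help. Run it with a single Vertica JAR on the classpath (for example, java -cp vertica-jdk5-6.1.2-0.jar:. DriverCheck); the two class names are the ones discussed above.

public class DriverCheck {
    public static void main(String[] args) {
        // The jdk5 JAR should report the first class, the version 4 JAR the second.
        String[] candidates = {"com.vertica.jdbc.Driver", "com.vertica.Driver"};
        for (String name : candidates) {
            try {
                Class.forName(name);
                System.out.println(name + " -> found on classpath");
            } catch (ClassNotFoundException e) {
                System.out.println(name + " -> not found");
            }
        }
    }
}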

Related

Committing Hudi files manually

I am using Spark 3.x with Apache Hudi 0.8.0.
While trying to create a Presto table using the hudi-hive-sync tool, I get the error below.
Got runtime exception when hive syncing
java.lang.IllegalArgumentException: Could not find any data file written for commit [20220116033425__commit__COMPLETED], could not get schema for table
But I checked the data for all partition keys using a Zeppelin notebook, and I can see that all the data is present.
I understand that I need to commit the file manually. How do I do that?

Talend Open Studio: Load input files into database

I have an empty SQLite database. Next to that, I have 6 input files (delimited, Excel, JSON, XML).
Now, all I want to do is load the input files into the empty database.
I tried to connect one input file with the DB and just run it. That didn't work (the DB doesn't have anything in it, and I suspect that is a problem).
Then I tried to connect an input file to a tMap, define the table there, define the schema, and connect the tMap to the DB (tSQLiteOutput).
When I try to run it, I receive the following error:
Starting job ProductDemo_Load at 16:46 15/11/2015.
[statistics] connecting to socket on port 3843
[statistics] connected
Exception in component tSQLiteOutput_1
java.sql.SQLException: no such table:
at org.sqlite.DB.throwex(DB.java:288)
at org.sqlite.NativeDB.prepare(Native Method)
at org.sqlite.DB.prepare(DB.java:114)
at org.sqlite.PrepStmt.<init>(PrepStmt.java:37)
at org.sqlite.Conn.prepareStatement(Conn.java:231)
at org.sqlite.Conn.prepareStatement(Conn.java:224)
at org.sqlite.Conn.prepareStatement(Conn.java:213)
at workshop_test.productdemo_load_0_1.ProductDemo_Load.tFileInputExcel_1Process(ProductDemo_Load.java:751)
at workshop_test.productdemo_load_0_1.ProductDemo_Load.runJobInTOS(ProductDemo_Load.java:1672)
at workshop_test.productdemo_load_0_1.ProductDemo_Load.main(ProductDemo_Load.java:1529)
[statistics] disconnected
Job ProductDemo_Load ended at 16:46 15/11/2015. [exit code=1]
I see there's something wrong with the import, but what exactly?
What should I do in order to successfully load the data from the input files into the database?
I did the exact steps from this little tutorial:
Talend Job: load data into database.
Most Talend output components have a "Create table if not exists" option. Did you check this in your tSQLiteOutput? The error suggests that when Talend inserts data into the empty database it cannot find the table, because it does not exist yet. So you need to tell Talend to create the table first.
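For reference, the SQL-level equivalent of that option is to issue CREATE TABLE IF NOT EXISTS before the first insert. Below is a minimal standalone sketch using the SQLite JDBC driver; the database file name, table, and columns are made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class SqliteLoadSketch {
    public static void main(String[] args) throws Exception {
        // Open (or create) the SQLite database file; the path is illustrative.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:productdemo.db")) {
            try (Statement st = conn.createStatement()) {
                // This is what Talend's "Create table if not exists" action does for you.
                st.executeUpdate("CREATE TABLE IF NOT EXISTS product (id INTEGER, name TEXT)");
            }
            // Insert a row, as tSQLiteOutput would do for each incoming record.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO product (id, name) VALUES (?, ?)")) {
                ps.setInt(1, 1);
                ps.setString(2, "example");
                ps.executeUpdate();
            }
        }
    }
}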

SQL Power Architect to compare two data models

I need to compare the current data model with the old data model.
I am using SQL Power Architect to do the comparison. I am able to configure the connections for accessing the database, and the connection is successful.
(I am using an Amazon Redshift database as the source.)
When I expand the children, I get the list of table objects associated with the connection, but when I try the Compare Data Model option, I see the error below.
How can I resolve this?
Caused by: ca.sqlpower.sqlobject.SQLObjectException: relationship.populate
at ca.sqlpower.sqlobject.SQLRelationship.fetchExportedKeys(SQLRelationship.java:740)
at ca.sqlpower.sqlobject.SQLTable.populateRelationships(SQLTable.java:731)
at ca.sqlpower.sqlobject.SQLTable.populateImpl(SQLTable.java:1337)
at ca.sqlpower.sqlobject.SQLObject.populate(SQLObject.java:186)
... 4 more
Caused by: org.postgresql.util.PSQLException: Unable to determine a value for MaxIndexKeys due to missing system catalog data.
at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getMaxIndexKeys(AbstractJdbc2DatabaseMetaData.java:64)
at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getImportedExportedKeys(AbstractJdbc2DatabaseMetaData.java:3196)
at org.postgresql.jdbc2.AbstractJdbc2DatabaseMetaData.getExportedKeys(AbstractJdbc2DatabaseMetaData.java:3584)
at ca.sqlpower.sql.jdbcwrapper.DatabaseMetaDataDecorator.getExportedKeys(DatabaseMetaDataDecorator.java:388)
at ca.sqlpower.sqlobject.SQLRelationship.fetchExportedKeys(SQLRelationship.java:735)
... 7 more
You are using the wrong JDBC driver version.
Please follow the steps below (a quick way to verify the downloaded driver outside SQL Power Architect is sketched after the list):
- Download the Redshift JDBC driver from the AWS website.
- Configure the JDBC driver in SQL Power Architect via the Connection Manager.
- Go to JDBC Driver -> select PostgreSQL -> under Add JARs, add the downloaded JAR -> configure the driver class name -> click OK.
- Go back to the Connection Manager.
- Select the appropriate connection, choose Edit, and test the connection.
- You should see the downloaded JAR configured.
- Now you can add the data objects to SQL Power Architect and run the comparison.
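If you want to confirm outside of SQL Power Architect that the downloaded Redshift driver works, a minimal JDBC check like the one below can help. The driver class name and URL format follow AWS's documentation for the Redshift JDBC 4.2 driver; the cluster endpoint, database, and credentials are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;

public class RedshiftDriverCheck {
    public static void main(String[] args) throws Exception {
        // Driver class shipped in the Redshift JDBC 4.2 JAR (check the AWS docs
        // for the exact class name matching the JAR you downloaded).
        Class.forName("com.amazon.redshift.jdbc42.Driver");

        // Placeholder cluster endpoint, database name, and credentials.
        String url = "jdbc:redshift://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "myuser", "mypassword")) {
            System.out.println("Connected to: " + conn.getMetaData().getDatabaseProductName());
        }
    }
}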

Pentaho not retaining the log and temp files

I am running a Pentaho Kettle ETL transformation (.ktr) to load data from a source DB2 database into a destination Netezza database.
When I run the transformation, I specify the directory in which to store the log files and temporary .txt files. But after the transformation finishes, these files are no longer there, so I guess Pentaho is cleaning them up. Is there a way to retain these files?
The other problem is that I am getting a SQL exception while the transformation step is inserting into Netezza:
Error:
2013/10/30 14:13:17 - Load XXX_TABLE_NAME - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.netezza.internal.QueryExecutor.getNextResult(QueryExecutor.java:279)
There are no further details. How can I troubleshoot this?
That seems like an issue with Pentaho. Is there no way to generate a trace of what it is doing in the transformation? Are you sure it is reading data? What happens if the target is not Netezza?
If you have access to the Netezza appliance, there are a few options, all in the documentation. Off the top of my head:
- look in the current queries view while it's running (a JDBC sketch for this follows the list)
- enable query history logging (requires admin access plus restarting the instance)
- check the pg.log file in /nz/kit/log/postgres/ (logs all queries by default)
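For the first option, if you prefer to check from outside the nzsql console, a JDBC query against the current-queries system view also works. This is only a sketch: it assumes the Netezza JDBC driver (nzjdbc.jar) is on the classpath and that the _v_qrystat system view is available on your appliance version; the host, database, and credentials are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class NetezzaCurrentQueries {
    public static void main(String[] args) throws Exception {
        // Netezza JDBC driver from nzjdbc.jar (the same org.netezza driver that
        // appears in the stack trace above).
        Class.forName("org.netezza.Driver");
        String url = "jdbc:netezza://nz-host:5439/MYDB"; // placeholder host and database
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement st = conn.createStatement();
             // _v_qrystat lists currently running queries (column names can vary
             // by appliance version, so print everything generically).
             ResultSet rs = st.executeQuery("SELECT * FROM _v_qrystat")) {
            ResultSetMetaData md = rs.getMetaData();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.append(md.getColumnName(i)).append('=').append(rs.getString(i)).append("  ");
                }
                System.out.println(row);
            }
        }
    }
}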

Accessing views created in Hive using HCatLoader in Pig

I was trying something with Hive and HCatLoader in Pig. I created a view in Hive and then tried to load data from that view into Pig using HCatLoader, but it does not seem to work. I just wanted to confirm whether there is any way to do this. I get the following error when I try to load the view in Pig using HCatLoader:
events = LOAD 'ViewName' USING org.apache.hcatalog.pig.HCatLoader();
dump events;
When I use a table name instead of the Hive view, it works. It does not give a metastore error either: the load statement reports that it successfully connected to the metastore, but when it comes to the dump it crashes with the following error.
Any pointers will be helpful.
Thanks,
Atul
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias events
at org.apache.pig.PigServer.openIterator(PigServer.java:857)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias events
at org.apache.pig.PigServer.storeEx(PigServer.java:956)
at org.apache.pig.PigServer.store(PigServer.java:919)
at org.apache.pig.PigServer.openIterator(PigServer.java:832)
... 12 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException: ERROR 2017: Internal error creating job configuration.
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:731)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:259)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
at org.apache.pig.PigServer.storeEx(PigServer.java:952)
This is the response I received after posting it on another forum:
"HCatLoader does not support reading views in Hive. The issue is that a view is defined as a query on a table (create view V as select x, y from t).
Pig doesn't speak SQL,
and
HCat doesn't contain Hive's execution engine
so it cannot execute the query either. Reading Hive views from Pig and MR will require much tighter integration of the products than we currently have."
I found the same issue the hard way today. HCatLoader cannot read Hive views (and lacks good exception handling on this topic).
For the record (for anybody else running into this problem), this is how the current version behaves: on Hortonworks HDP 2.3 with Pig 0.15 I only got the following error in the log:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2017: Internal error creating job configuration.
Pig fails this way because there is no file to load (since we attempted to load from a view).
Since Pig loads data from a file in Hadoop, reading data from a view (which does not have a physical file) may not work.
Maybe if we could manage to create a file for the view in Hadoop, Pig would be able to load it; at least a virtual pointer file to the actual data file.
Not sure whether this is possible or has been thought through.
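One practical way to give the view a physical file is to materialize it into a plain table before the Pig job runs, and point HCatLoader at that table instead. This is only a workaround sketch, not part of the original answers; it assumes a HiveServer2 endpoint is reachable over JDBC, and the host, database, and table/view names are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MaterializeHiveView {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; endpoint, database, and credentials are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "user", "");
             Statement st = conn.createStatement()) {
            // Rebuild a plain table from the view so HCatLoader has real files to read.
            st.execute("DROP TABLE IF EXISTS ViewName_materialized");
            st.execute("CREATE TABLE ViewName_materialized AS SELECT * FROM ViewName");
        }
        // The Pig script can then load the materialized table instead of the view:
        // events = LOAD 'ViewName_materialized' USING org.apache.hcatalog.pig.HCatLoader();
    }
}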