I currently have a pretty big job up and running in Pentaho Spoon 5.4.0.1-130, but unfortunately I keep getting an error when I try to execute the same job with Pentaho Kitchen:
2016/09/08 03:36:05 - Staging Titular.0 - ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : Unexpected error rolling back the database connection.
2016/09/08 03:36:05 - Staging Titular.0 - ERROR (version 5.4.0.1-130, build 1 from 2015-06-14_12-34-55 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
2016/09/08 03:36:05 - Staging Titular.0 - Unable to get database metadata from this database connection
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.run (Job.java:424)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute (Job.java:532)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute (Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute (Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute (Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute (Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute (Job.java:716)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.entries.trans.JobEntryTrans.execute (JobEntryTrans.java:1065)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.trans.Trans.execute (Trans.java:607)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.trans.Trans.prepareExecution (Trans.java:1120)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.dispose (TableOutput.java:610)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.rollback (Database.java:845)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.rollback (Database.java:853)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.getDatabaseMetaData (Database.java:2758)
2016/09/08 03:36:05 - Staging Titular.0 -
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.getDatabaseMetaData(Database.java:2760)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.rollback(Database.java:853)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.rollback(Database.java:845)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.trans.steps.tableoutput.TableOutput.dispose(TableOutput.java:610)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:1120)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.trans.Trans.execute(Trans.java:607)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:1065)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute(Job.java:716)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute(Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute(Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute(Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute(Job.java:859)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.execute(Job.java:532)
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.job.Job.run(Job.java:424)
2016/09/08 03:36:05 - Staging Titular.0 - Caused by: java.lang.NullPointerException
2016/09/08 03:36:05 - Staging Titular.0 - at org.pentaho.di.core.database.Database.getDatabaseMetaData(Database.java:2758)
2016/09/08 03:36:05 - Staging Titular.0 - ... 13 more
I have no idea what might be happening, so any help would be appreciated.
Thanks for your time!
Thankfully, Dirk Trilsbeek was right! (see the first comment on the question)
The problem was with a shared database connection. Once I configured schtasks (Windows Task Scheduler) to use the correct user, I had no problem using Kitchen to execute my job.
Below is the configuration I'm currently using to run the task:
REM call-pentaho-job.bat
REM Runs the Pentaho job with Kitchen and writes a detailed log file.
c:
cd /d "C:\pentaho\data-integration"
call Kitchen.bat /file:"C:\app\my-job.kjb" /level:Detailed /logfile:"C:\app\logs\my-job.txt"
exit
REM
REM Scheduled task registration (run once; /ru sets the user the task runs as):
REM schtasks /create /tn "MY-PENTAHO-JOB" /tr "\"C:\app\call-pentaho-job.bat\"" /ru MYDOMAIN\myuser /sc daily /st 03:00
REM
Thank you, Dirk!
Related
I have this shell script (.sh file) which I run on my Linux server with Pentaho. On my local machine it works fine,
but if I run this command on my server
./incremental_job.sh
it returns this in the log output:
2022/08/19 14:17:32 - job_etl_incremental - Starting entry [Job failed alert]
2022/08/19 14:17:37 - job_etl_incremental - Finished job entry [Job failed alert] (result=[true])
2022/08/19 14:17:37 - job_etl_incremental - Finished job entry [Job ERP Stock] (result=[true])
2022/08/19 14:17:37 - job_etl_incremental - Job execution finished
2022/08/19 14:17:37 - Carte - Installing timer to purge stale objects after 1440 minutes.
2022/08/19 14:17:37 - Kitchen - Finished!
Any ideas how to fix this? I run other shell scripts without having this issue.
In Pentaho I get an error when I read a BigQuery table with a "Table Entry". I have these considerations:
This table was created from a Google Drive sheet with the service account
I can read this table with "Google Sheet Plugins"
Screenshot 1: https://i.stack.imgur.com/q9dl0.png
Screenshot 2: https://i.stack.imgur.com/gCxQK.png
2022/07/29 21:20:36 - Select.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : An error occurred, processing will be stopped:
2022/07/29 21:20:36 - Select.0 - An error occurred executing SQL:
2022/07/29 21:20:36 - Select.0 - select 1 from `ms-data-warehouse.ms_Dev_Staging.ET_ods_hour`
2022/07/29 21:20:36 - Select.0 - [Simba][BigQueryJDBCDriver](100032) Error executing query job. Message: BIGQUERY_API_ERR
2022/07/29 21:20:36 - Select.0 - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : Error initializing step [Select]
2022/07/29 21:20:36 - insert drive - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : Step [Select.0] failed to initialize!
2022/07/29 21:20:36 - Select.0 - Finished reading query, closing connection.
2022/07/29 21:20:36 - Spoon - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : insert drive: preparing transformation execution failed
2022/07/29 21:20:36 - Spoon - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : org.pentaho.di.core.exception.KettleException:
2022/07/29 21:20:36 - Spoon - We failed to initialize at least one step. Execution can not begin!
Your second screenshot says that it doesn't have Drive access.
BigQuery doesn't store the credential for accessing Google Drive; instead, BigQuery uses the "current user" credential when trying to access Google Drive.
Apparently the service account has Google Drive access (which is how it created that table), but either your account or the account used to set up the Simba BigQueryJDBCDriver doesn't have access to the Google Drive file.
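One way to confirm this is to run the same query outside PDI with credentials that explicitly request the Drive scope. Below is a minimal sketch using the Python BigQuery client (the key file path is a placeholder; the project and query come from your log):

from google.cloud import bigquery
from google.oauth2 import service_account

# Request the Drive scope explicitly; it is needed for tables backed by Google Drive sheets.
scopes = [
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json",  # placeholder path to the service account key
    scopes=scopes,
)
client = bigquery.Client(credentials=credentials, project="ms-data-warehouse")
rows = client.query("select 1 from `ms-data-warehouse.ms_Dev_Staging.ET_ods_hour`").result()
print(list(rows))

If this works with the service account key but the PDI connection still fails, then the account (or the scopes it requests) used by the Simba JDBC driver is the one missing Google Drive access.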
I'm running this job on Pentaho 8.2 and I get this error log.
2022/05/17 14:47:51 - job_etl_incremental - Finished job entry [Job User Sales Support] (result=[false])
2022/05/17 14:47:51 - job_etl_incremental - Finished job entry [Job User] (result=[false])
2022/05/17 14:47:51 - job_etl_incremental - Job execution finished
2022/05/17 14:47:51 - Kitchen - Finished!
2022/05/17 14:47:51 - Kitchen - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Finished with errors
2022/05/17 14:47:51 - Kitchen - Start=2022/05/17 14:46:04.982, Stop=2022/05/17 14:47:51.711
2022/05/17 14:47:51 - Kitchen - Processing ended after 1 minutes and 46 seconds (106 seconds total).
It does run some of the jobs, but not completely, and they finish with (result=[false]). Any ideas on how to fix this?
I do have pentaho-metadata-8.2.0.0-342.jar in my /home/pentaho/data-integration/lib directory.
I have installed a CDH 5.3 cluster on Ubuntu, following all the configurations recommended by Cloudera; it has Hadoop + HBase.
The problem arises when I try to load the data and dump it using Pig: the job stays stuck, and it always shows 0% complete.
OS: Ubuntu 14.04 (64-bit)
Parcel: CDH 5.3 (or 5.5.1)
Job: a = load '/user/nadir/data.txt'; dump a;
logs:
2016-02-12 04:06:33.869 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1455246282704_0001
2016-02-12 04:06:33.869 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases a
2016-02-12 04:06:33.869 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: a[1,4] C: R:
2016-02-12 04:06:34.121 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% Complete
I have the latest version of Kettle/PDI. Carte is running locally on Windows with the following configuration:
<slave_config>
<slaveserver>
<name>master1</name>
<hostname>localhost</hostname>
<port>8081</port>
<master>Y</master>
</slaveserver>
<repository>
<name>PDI Repo</name>
<username>username</username>
<password>password</password>
</repository>
</slave_config>
And in .kettle/repositories.xml:
<repositories>
<connection>
<name>PDI Repo</name>
<server>127.0.0.1</server>
<type>MYSQL</type>
<access>Native</access>
<database>pdi</database>
<port>3306</port>
<username>username</username>
<password>Encrypted password</password>
<servername/>
<data_tablespace/>
<index_tablespace/>
<attributes>
<attribute><code>EXTRA_OPTION_MYSQL.defaultFetchSize</code><attribute>500</attribute></attribute>
<attribute><code>EXTRA_OPTION_MYSQL.useCursorFetch</code><attribute>true</attribute></attribute>
<attribute><code>FORCE_IDENTIFIERS_TO_LOWERCASE</code><attribute>N</attribute></attribute>
<attribute><code>FORCE_IDENTIFIERS_TO_UPPERCASE</code><attribute>N</attribute></attribute>
<attribute><code>IS_CLUSTERED</code><attribute>N</attribute></attribute>
<attribute><code>PORT_NUMBER</code><attribute>3306</attribute></attribute>
<attribute><code>PRESERVE_RESERVED_WORD_CASE</code><attribute>N</attribute></attribute>
<attribute><code>QUOTE_ALL_FIELDS</code><attribute>N</attribute></attribute>
<attribute><code>STREAM_RESULTS</code><attribute>Y</attribute></attribute>
<attribute><code>SUPPORTS_BOOLEAN_DATA_TYPE</code><attribute>Y</attribute></attribute>
<attribute><code>SUPPORTS_TIMESTAMP_DATA_TYPE</code><attribute>Y</attribute></attribute>
<attribute><code>USE_POOLING</code><attribute>N</attribute></attribute>
</attributes>
</connection>
<repository>
<id>KettleDatabaseRepository</id>
<name>PDI Repo</name>
<description>PDI Repo</description>
<connection>PDI Repo</connection>
</repository>
</repositories>
You'll notice these are pretty much the defaults, with some specific configuration for the repository database. Through Spoon, I've created a simple transformation that runs a select on a database table, performs some simple calculations on the columns, and returns some JSON.
If I tell the transformation to run on master1, it works and spits out the JSON.
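(For reference, requests that hit Carte directly go through the executeTrans servlet, which is the entry point in the stack trace further down. A rough Python sketch of such a request, with the transformation path as a placeholder and Carte's default cluster/cluster HTTP credentials, looks like this.)

import requests

# Rough sketch: "PDI Repo", username and password come from the configs above;
# the transformation path is a placeholder.
response = requests.get(
    "http://localhost:8081/kettle/executeTrans/",
    params={
        "rep": "PDI Repo",
        "user": "username",
        "pass": "password",
        "trans": "/home/my_transformation",
        "level": "Basic",
    },
    auth=("cluster", "cluster"),  # Carte's default HTTP credentials
)
print(response.status_code)
print(response.text)  # the transformation's JSON output, or a <webresult> error like the one below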
If I run exactly the same command again, it errors:
2016/01/29 10:05:53 - PDI Repo - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : Error disconnecting from database :
2016/01/29 10:05:53 - PDI Repo - Unable to commit repository connection
2016/01/29 10:05:53 - PDI Repo -
2016/01/29 10:05:53 - PDI Repo - Error comitting connection
2016/01/29 10:05:53 - PDI Repo - at java.lang.Thread.run (Thread.java:745)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.step.RunThread.run (RunThread.java:121)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.step.BaseStep.markStop (BaseStep.java:2992)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.Trans$1.stepFinished (Trans.java:1233)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.Trans.fireTransFinishedListeners (Trans.java:1478)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.www.BaseJobServlet$3.transFinished (BaseJobServlet.java:170)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.repository.kdr.KettleDatabaseRepository.disconnect (KettleDatabaseRepository.java:1655)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.repository.kdr.delegates.KettleDatabaseRepositoryConnectionDelegate.disconnect(KettleDatabaseRepositoryConnectionDelegate.java:257)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.repository.kdr.delegates.KettleDatabaseRepositoryConnectionDelegate.commit(KettleDatabaseRepositoryConnectionDelegate.java:283)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.core.database.Database.commit (Database.java:738)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.core.database.Database.commit (Database.java:757)
I don't understand why the connection to the repository database fails after the first request. Carte continues to run, despite this error, but will throw errors like this when accessed via URL:
<webresult>
<result>ERROR</result>
<message>Unexpected error executing the transformation:
java.lang.NullPointerException
at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:128)
at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:106)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2716)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2684)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2661)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2641)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2606)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2569)
at org.pentaho.di.www.ExecuteTransServlet.loadTransformation(ExecuteTransServlet.java:316)
at org.pentaho.di.www.ExecuteTransServlet.doGet(ExecuteTransServlet.java:232)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:522)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)</message>
<id />
</webresult>
I dug through the code for that stack trace, and it means the Repository object is null. So, for some reason, Carte can connect to the PDI repository, but after it succeeds once, something errors, the connection is dropped, and it can no longer find the transformations.