Pentaho Kettle/PDI fails on second request

I have the latest version of Kettle/PDI. Carte is running locally on Windows with the following configuration:
<slave_config>
  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8081</port>
    <master>Y</master>
  </slaveserver>
  <repository>
    <name>PDI Repo</name>
    <username>username</username>
    <password>password</password>
  </repository>
</slave_config>
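For completeness, Carte is launched by pointing it at that file; the filename is just what I happened to save it as:
Carte.bat carte-config.xml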
And in .kettle/repositories.xml:
<repositories>
  <connection>
    <name>PDI Repo</name>
    <server>127.0.0.1</server>
    <type>MYSQL</type>
    <access>Native</access>
    <database>pdi</database>
    <port>3306</port>
    <username>username</username>
    <password>Encrypted password</password>
    <servername/>
    <data_tablespace/>
    <index_tablespace/>
    <attributes>
      <attribute><code>EXTRA_OPTION_MYSQL.defaultFetchSize</code><attribute>500</attribute></attribute>
      <attribute><code>EXTRA_OPTION_MYSQL.useCursorFetch</code><attribute>true</attribute></attribute>
      <attribute><code>FORCE_IDENTIFIERS_TO_LOWERCASE</code><attribute>N</attribute></attribute>
      <attribute><code>FORCE_IDENTIFIERS_TO_UPPERCASE</code><attribute>N</attribute></attribute>
      <attribute><code>IS_CLUSTERED</code><attribute>N</attribute></attribute>
      <attribute><code>PORT_NUMBER</code><attribute>3306</attribute></attribute>
      <attribute><code>PRESERVE_RESERVED_WORD_CASE</code><attribute>N</attribute></attribute>
      <attribute><code>QUOTE_ALL_FIELDS</code><attribute>N</attribute></attribute>
      <attribute><code>STREAM_RESULTS</code><attribute>Y</attribute></attribute>
      <attribute><code>SUPPORTS_BOOLEAN_DATA_TYPE</code><attribute>Y</attribute></attribute>
      <attribute><code>SUPPORTS_TIMESTAMP_DATA_TYPE</code><attribute>Y</attribute></attribute>
      <attribute><code>USE_POOLING</code><attribute>N</attribute></attribute>
    </attributes>
  </connection>
  <repository>
    <id>KettleDatabaseRepository</id>
    <name>PDI Repo</name>
    <description>PDI Repo</description>
    <connection>PDI Repo</connection>
  </repository>
</repositories>
You'll notice these are pretty much the defaults, with some specific configuration for the repository database. Through Spoon, I've created a simple transformation that runs a select on a database table, performs some simple calculations on the columns, and returns some JSON.
If I tell the transformation to run on master1, it works and spits out the JSON.
If I run exactly the same command again, it errors:
2016/01/29 10:05:53 - PDI Repo - ERROR (version 6.0.1.0-386, build 1 from 2015-12-03 11.37.25 by buildguy) : Error disconnecting from database :
2016/01/29 10:05:53 - PDI Repo - Unable to commit repository connection
2016/01/29 10:05:53 - PDI Repo -
2016/01/29 10:05:53 - PDI Repo - Error comitting connection
2016/01/29 10:05:53 - PDI Repo - at java.lang.Thread.run (Thread.java:745)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.step.RunThread.run (RunThread.java:121)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.step.BaseStep.markStop (BaseStep.java:2992)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.Trans$1.stepFinished (Trans.java:1233)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.trans.Trans.fireTransFinishedListeners (Trans.java:1478)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.www.BaseJobServlet$3.transFinished (BaseJobServlet.java:170)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.repository.kdr.KettleDatabaseRepository.disconnect (KettleDatabaseRepository.java:1655)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.repository.kdr.delegates.KettleDatabaseRepositoryConnectionDelegate.disconnect(KettleDatabaseRepositoryConnectionDelegate.java:257)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.repository.kdr.delegates.KettleDatabaseRepositoryConnectionDelegate.commit(KettleDatabaseRepositoryConnectionDelegate.java:283)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.core.database.Database.commit (Database.java:738)
2016/01/29 10:05:53 - PDI Repo - at org.pentaho.di.core.database.Database.commit (Database.java:757)
I don't understand why the connection to the repository database fails after the first request. Carte continues to run despite this error, but throws errors like the following when accessed via URL:
<webresult>
<result>ERROR</result>
<message>Unexpected error executing the transformation:
java.lang.NullPointerException
at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:128)
at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:106)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2716)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2684)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2661)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2641)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2606)
at org.pentaho.di.trans.TransMeta.<init>(TransMeta.java:2569)
at org.pentaho.di.www.ExecuteTransServlet.loadTransformation(ExecuteTransServlet.java:316)
at org.pentaho.di.www.ExecuteTransServlet.doGet(ExecuteTransServlet.java:232)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:522)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)</message>
<id />
</webresult>
I dug through the code behind that stack trace, and it means the Repository object is null. So, for some reason, Carte can connect to the PDI repository, but after it succeeds once, something errors, the connection is dropped, and Carte can no longer find the transformations.
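For reference, the URL access is Carte's executeTrans servlet; the request looks like this (the transformation path is illustrative, the other parameters match the configuration above):
http://localhost:8081/kettle/executeTrans/?rep=PDI%20Repo&user=username&pass=password&trans=/path/to/transformation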

Related

Unable to initialize hive with Derby from Brew install

It had been my understanding that Derby creates its file(s) in the current directory, but there are none there.
So I tried to do the Hive initialization using Derby, but it seems there is a Derby database already:
schematool --verbose -initSchema -dbType derby
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Connecting to jdbc:derby:;databaseName=metastore_db;create=true
Connected to: Apache Derby (version 10.10.2.0 - (1582446))
Driver: Apache Derby Embedded JDBC Driver (version 10.10.2.0 - (1582446))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:derby:> !autocommit on
Autocommit status: true
0: jdbc:derby:> CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii'
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
Closing: 0: jdbc:derby:;databaseName=metastore_db;create=true
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:291)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:264)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:505)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Schema script failed, errorcode 2
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:390)
at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:347)
at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:287)
So ... where is it?
Update: I have reinstalled Hive from scratch using
brew reinstall hive
and the same error occurs.
Another update: Given the new direction of this error, it is now answered within another question. An answer to a non-OS X, but otherwise similar, question was found that can serve here:
https://stackoverflow.com/a/40017753/1056563
I installed Hive with Homebrew (macOS) at /usr/local/Cellar/hive and after running schematool -dbType derby -initSchema I get the following error message:
Starting metastore schema initialization to 2.0.0
Initialization script hive-schema-2.0.0.derby.sql
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
However, I can't find either a metastore_db or a metastore_db.tmp folder under the install path, so I tried:
find /usr/ -name hive-schema-2.0.0.derby.sql
vi /usr/local/Cellar/hive/2.0.1/libexec/scripts/metastore/upgrade/derby/hive-schema-2.0.0.derby.sql
There, comment out the 'NUCLEUS_ASCII' and 'NUCLEUS_MATCHES' functions, then rerun schematool -dbType derby -initSchema, and everything goes well!
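To be explicit about what "comment out" means here: prefix each offending statement in hive-schema-2.0.0.derby.sql with Derby's -- comment marker. Sketched for NUCLEUS_ASCII, whose definition appears in the log above; the NUCLEUS_MATCHES statement gets the same treatment:
-- CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii'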
Homebrew installs Hive (version 2.3.1) unconfigured. The default settings use the in-process Derby database (Hive already includes the required lib).
The only thing you have to do (immediately after brew install hive) is to initialize the database:
schematool -initSchema -dbType derby
and then you can run hive, and it will work. However, if you try to run hive before initializing the database, Hive will actually semi-create an incomplete database and will fail to work:
show tables;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Since the database is semi-created, schematool will now fail as well:
Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
To fix that, you will have to delete the database:
rm -Rf metastore_db
and run the initialization command again.
Notice that I deleted metastore_db from the current directory? This is another problem: Hive is configured to create and use the Derby database in the current working directory, because it has the following default value for 'javax.jdo.option.ConnectionURL':
jdbc:derby:;databaseName=metastore_db;create=true
To fix that, create the file /usr/local/opt/hive/libexec/conf/hive-site.xml as:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:/usr/local/var/hive/metastore_db;create=true</value>
  </property>
</configuration>
and recreate the database as before. Now the database is in /usr/local/var/hive, so if you again accidentally run hive before initializing the DB, delete it with:
rm -Rf /usr/local/var/hive
You might have to look at the hive configuration file. That should tell you where it is being initialized.

Maven build issue due to Codehaus termination

As Codehaus services have been terminated, I am getting the build error below. How do I change the Codehaus repository to another repository, or is there another way to handle this issue?
*** CHECKSUM FAILED - Checksum failed on download: local = '7226cd6d2506216c083eb2a25d71db156ed7e3f3'; remote = - RETRYING
Downloading: repository.codehaus.org/org/apache/geronimo/genesis/config/config/1.1/config-1.1.pom
318b downloaded
[WARNING] *** CHECKSUM FAILED - Checksum failed on download: local = '7226cd6d2506216c083eb2a25d71db156ed7e3f3'; remote = - IGNORING
[ERROR] BUILD ERROR
I have the following dependency in my project:
<dependency>
  <groupId>org.codehaus.woodstox</groupId>
  <artifactId>wstx-asl</artifactId>
  <version>3.2.1</version>
</dependency>
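One possible direction, assuming the artifact is also published to Maven Central (wstx-asl 3.2.1 is): redirect the dead Codehaus repository with a mirror entry in settings.xml. The mirrorOf value below is a guess; it must match whatever id your POM or its parent declares for the Codehaus repository:
<mirrors>
  <mirror>
    <id>codehaus-via-central</id>
    <mirrorOf>codehaus</mirrorOf>
    <url>http://repo1.maven.org/maven2</url>
  </mirror>
</mirrors>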

Nexus Proxy Repo does not want to fetch

I am using Nexus 1.9.2. I set up a proxy repo to a remote location (which can be accessed via http://somelocation.com). I added this proxy repo to Nexus' Public Repositories group. My Maven settings.xml is set to use Nexus (in the <mirror /> section).
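The mirror section looks roughly like this (the id is arbitrary; the URL is the public group that also appears in the error below):
<mirrors>
  <mirror>
    <id>nexus</id>
    <mirrorOf>*</mirrorOf>
    <url>http://localrepo:8081/nexus/content/groups/public</url>
  </mirror>
</mirrors>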
When I log in to Nexus via a web browser, click on this newly added proxy repo, and then on the Browse Remote tab, I can see all the artifacts. However, when I click the Browse Storage or Browse Index tabs, I do not see any artifacts.
When I do mvn clean install I get a missing artifact; it simply does not want to fetch from the remote site.
I am getting the following:
1 required artifact is missing.
for artifact:
com.somelocation:someserverapp:jar:1.1.0-SNAPSHOT
from the specified remote repositories:
release-repo (http://localrepo:8081/nexus/content/groups/public),
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:711)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.artifact.resolver.MultipleArtifactsNotFoundException: Missing:
1) com.somelocation:somelocation-networking-packet:jar:1.0.0
Try downloading the file manually from the project website.
Then, install it using the command:
mvn install:install-file -DgroupId=com.somelocation -DartifactId=somelocation-networking-packet -Dversion=1.0.0 -Dpackaging=jar -Dfile=/path/to/file
Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=com.somelocation -DartifactId=somelocation-networking-packet -Dversion=1.0.0 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
Path to dependency:
1) com.somelocation:someserverapp:jar:1.1.0-SNAPSHOT
2) com.somelocation:somelocation-networking-packet:jar:1.0.0
Any ideas why?

maven and lift using scala 2.8 : lift-mapper missing?

Newbie question, since I'm not up to speed using Maven at all. I'm trying to use Scala + Lift with Scala 2.8; the environment is a Win7 box, if that matters.
I create a basic project using:
mvn archetype:generate -U -DarchetypeGroupId=net.liftweb -DarchetypeArtifactId=lift-archetype-basic -DarchetypeVersion=2.0-scala280-SNAPSHOT -DarchetypeRepository=http://scala-tools.org/repo-snapshots -DremoteRepositories=http://scala-tools.org/repo-snapshots -DgroupId=com.liftworkshop -DartifactId=todo -Dversion=1.0-SNAPSHOT
So far so good, but then I try to cd into my new project and do:
mvn jetty:run
After quite a few downloads, I end up with an error like the one below:
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] Failed to resolve artifact.
Missing:
----------
1) net.liftweb:lift-mapper:jar:2.0-scala280-SNAPSHOT
Try downloading the file manually from the project website.
Then, install it using the command:
mvn install:install-file -DgroupId=net.liftweb -DartifactId=lift-mapper -Dversion=2.0-scala280-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file
Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=net.liftweb -DartifactId=lift-mapper -Dversion=2.0-scala280-SNAPSHOT -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
Path to dependency:
1) com.liftworkshop:todo:war:1.0-SNAPSHOT
2) net.liftweb:lift-mapper:jar:2.0-scala280-SNAPSHOT
----------
1 required artifact is missing.
for artifact:
com.liftworkshop:todo:war:1.0-SNAPSHOT
from the specified remote repositories:
scala-tools.snapshots (http://scala-tools.org/repo-snapshots),
scala-tools.releases (http://scala-tools.org/repo-releases),
central (http://repo1.maven.org/maven2)
Any ideas?
I created the same project using the mvn archetype:generate command you provided, but I couldn't reproduce your problem. The lift-mapper-2.0-scala280-SNAPSHOT.jar artifact is definitely in the scala-tools snapshots repository, and Maven downloaded it:
...
1619K downloaded (lift-mapper-2.0-scala280-SNAPSHOT.jar)
[WARNING] *** CHECKSUM FAILED - Checksum failed on download: local = '0c857e2c5de9d5cabb7c972e519528606f19697b'; remote = 'a258cf7d7a49a8d7163d499da06a4d1e231a78e0' - RETRYING
Downloading: http://scala-tools.org/repo-snapshots/net/liftweb/lift-mapper/2.0-scala280-SNAPSHOT/lift-mapper-2.0-scala280-SNAPSHOT.jar
1619K downloaded (lift-mapper-2.0-scala280-SNAPSHOT.jar)
As you can see, Maven had to retry the download because of a failed checksum check, but it worked.
Just try again.
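If another attempt still fails on the checksum, it may also help to first delete the partially downloaded artifact from your local repository so Maven re-fetches it. Assuming the default local repository location (on a Win7 box):
rmdir /s /q %USERPROFILE%\.m2\repository\net\liftweb\lift-mapper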

How to Get Maven to Fail on Deploy when warned about "CHECKSUM FAILED"

On mvn deploy, Maven attempts to retrieve the previous metadata from the repository. If it is corrupt, Maven issues a warning, calls the build successful, but doesn't upload my artifact. This was caused by corruption in my repository, and I'd like to either avoid it in the future or make it more obvious with a build failure.
Can I alter my POM to change this warning into an error so I'll see it quickly?
[INFO] Retrieving previous metadata from daeng-snap
[WARNING] *** CHECKSUM FAILED - Checksum failed on download: local = 'ea12f35b3bc6d88f7336891562d91985b412bf1a'; remote = '51a6f4a52ad8f3926dbb28807317a90b9cd62ec1' - RETRYING
[WARNING] *** CHECKSUM FAILED - Checksum failed on download: local = 'ea12f35b3bc6d88f7336891562d91985b412bf1a'; remote = '51a6f4a52ad8f3926dbb28807317a90b9cd62ec1' - IGNORING
[INFO] Uploading repository metadata for: 'artifact com.myco.xyz'
[INFO] Uploading project information for xyz 5.0.2-20091224.163241-12
[INFO] Retrieving previous metadata from snaphots
[WARNING] *** CHECKSUM FAILED - Checksum failed on download: local = '00766e1a0130c3499442c06b52523960c5860f3c'; remote = 'c9bcfc92b3145688aa8ec77dcac244c70be4d0b4' - RETRYING
[WARNING] *** CHECKSUM FAILED - Checksum failed on download: local = '00766e1a0130c3499442c06b52523960c5860f3c'; remote = 'c9bcfc92b3145688aa8ec77dcac244c70be4d0b4' - IGNORING
[INFO] Uploading repository metadata for: 'snapshot com.myco.xyz:xyz:5.0.2-SNAPSHOT'
You can make your build fail on a bad checksum. Simply configure your repository element, preferably in your settings.xml or inside your repository manager, such as Nexus.
Example:
<repository>
  <id>central</id>
  <name>My Central Repository</name>
  <url>http://repo1.maven.org/maven2</url>
  <releases>
    <checksumPolicy>fail</checksumPolicy>
  </releases>
  <snapshots>
    <checksumPolicy>fail</checksumPolicy>
  </snapshots>
</repository>
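Note that in settings.xml a repository element has to live inside a profile that is then activated; a minimal sketch, with the profile id chosen arbitrarily:
<profiles>
  <profile>
    <id>fail-on-bad-checksum</id>
    <repositories>
      <!-- the <repository> element from above goes here -->
    </repositories>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>fail-on-bad-checksum</activeProfile>
</activeProfiles>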
More info here: http://www.sonatype.com/books/maven-book/reference/appendix-settings-sect-settings-repository.html