Use Java 8 features (newer Janino version) in Pentaho Data Integration

Pentaho Data Integration 8.0.x uses Janino 2.5.16, released in 2010, to compile the User Defined Java Class (UDJC) step. There is a Pentaho JIRA for updating to a newer Janino version, which would bring Java 8 features to Pentaho v8.2.0 GA, but there is no information on when it will be released.
Is there any other way to use a newer Janino version (janino-3.0.8.jar) with an existing Pentaho installation for UDJC? I copied the updated jar into the lib directory and also added commons-compiler-3.0.8.jar to satisfy its dependency. Now when I open Spoon, I get the following error:
Please advise on how this can be achieved. I understand that just replacing the jar may not be enough, but I want to know whether something else can be done.

This is not easy. You already got a ClassNotFound error, which shows that Janino's public API has changed: some classes were removed and others were modified. What is the actual need behind the update?
If you need really complicated business logic, create a custom plugin. Documentation and tutorials are available, and you can study the sources of the built-in plugins (available on GitHub).
What important features does the new Janino version have that the old one lacks, besides Java 8 support? If you really want it, check out the Kettle engine, look at the sources of the UserDefinedClass step, change the code to support the new Janino version, test it, make your own build of PDI Kettle, and try to send a pull request to the repository maintainers.
All of this is quite complicated. The step is built into the engine, so you have to make your own build, and your own build means you have to support it yourself. That is not trivial: the project is huge and keeps growing. I spent several days making my first custom build (version 4, which was built with Ivy) just to understand the internals better and debug complicated cases, and it was never used in production.
The maintainers must have a good reason to merge your changes upstream. The changes must be well tested, the procedure is long, and most probably it is not worth it. A lot has changed since 2010; I believe I have seen in release notes that newer versions of Java can already compile code at runtime.
My advice is to make your own plugin.
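On the remark that Java itself can compile code at runtime: a minimal sketch using the standard javax.tools API, assuming the process runs on a full JDK (on a plain JRE the compiler lookup returns null). This is not how the User Defined Java Class step works internally; the class name RuntimeCompileSketch and the compiled Hello class are hypothetical and only illustrate the idea.

```java
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;

public class RuntimeCompileSketch {

    /** Holds Java source code in memory so no .java file has to be written. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the source, loads the class, and calls a public static no-arg method on it. */
    public static Object compileAndInvoke(String className, String source, String methodName) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler: run on a JDK, not a plain JRE");
        }
        try {
            // Class files are written to a temporary directory and loaded from there.
            Path outDir = Files.createTempDirectory("runtime-compile");
            boolean ok = compiler.getTask(
                    null, null, null,
                    Arrays.asList("-d", outDir.toString()),
                    null,
                    Collections.singletonList(new StringSource(className, source))).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
            try (URLClassLoader loader = new URLClassLoader(new URL[] {outDir.toUri().toURL()})) {
                Class<?> cls = loader.loadClass(className);
                return cls.getMethod(methodName).invoke(null);
            }
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical source; the lambda is a Java 8 feature compiled at runtime.
        String source =
                "public class Hello {"
              + "  public static String greet() {"
              + "    java.util.function.Supplier<String> s = () -> \"hello from runtime\";"
              + "    return s.get();"
              + "  }"
              + "}";
        System.out.println(compileAndInvoke("Hello", source, "greet"));
    }
}
```

The compiler used here is whatever JDK the process runs on, so the language level follows the installed Java version rather than a bundled library.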

Related

Mule4 with Open JRE 11 - Adding additional file module-info.java?

I downloaded a fresh copy of MuleSoft Studio and changed the configuration to point to Open JRE 11, with the compiler level set to 11.
Studio version: 7.8
When I create a Mule project, it now also adds module-info.java alongside the mule.xml files.
I wonder why it creates module-info.java; I never saw it when I was working with version 1.8 or earlier.
Any idea? Thanks in advance.

Java 9 introduced a whole new level of encapsulation. Larger than packages, and more robust too. These are modules.
Chances are you should, in the long term, migrate your project to use modules (for additional security and for better code organization). However, chances are also high that you won't want to do it right now.
In that latter case, it would be reasonable to simply delete the module-info.java file. Provided you don't have any other module-info.java files in the system, and provided you run with everything on the classpath rather than module path (there's a good chance that's your default anyway) you should not have any problem.
Meanwhile, you have some homework to do, so you can decide if you will migrate to modules, and if so, how to do it.
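To see whether code really runs from the classpath rather than the module path, one option is to ask a class which module it belongs to. The sketch below assumes Java 9 or later; the class name ModuleCheck is hypothetical. Classes loaded from the classpath report the unnamed module, while JDK classes such as String belong to the named module java.base.

```java
public class ModuleCheck {

    /** Describes which module a class was loaded into (requires Java 9+). */
    public static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed()
                ? "named module " + module.getName()
                : "unnamed module (classpath)";
    }

    public static void main(String[] args) {
        // JDK classes always live in a named module.
        System.out.println("String -> " + describe(String.class));
        // The result for this class depends on whether it was started
        // from the classpath or from the module path.
        System.out.println("ModuleCheck -> " + describe(ModuleCheck.class));
    }
}
```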

Versioning APIs during internal development

In our team we have a number of APIs specified using the OpenAPI Specification (formerly Swagger). We use Maven and OpenAPI Generator to generate code, then build and publish the artifact to our local Nexus. We build our code on TeamCity. The artifact gets the version specified in Maven's pom.xml file.
During development we only use snapshot versions, that is, versions that can be overwritten and will be cleaned up. This is unlike release versions, which cannot be overwritten and need administrative privileges to clean up. The reason is that a developer usually changes a little bit at a time, which is much more convenient with snapshot versions. It also makes cleaning up outdated unreleased artifacts much easier.
Our problem is that from time to time a developer makes API changes but forgets to set a new version. This works fine locally, but when the code is built on TeamCity, the changed API overwrites the artifact of an older version. A developer not working on this branch will then get a compile error, because the code does not match the API artifact being used.
What do others do? Is there a best practice, preferably with standard tools? We have tried many things and nothing works well. At the same time, this issue is so basic that someone must have a good solution, or at least enough experience to point to the least bad one.

Mule - Updating third party library in runtime

I'm using Mule Server 3.8 EE which brings commons-lang 2.4 with it. A third-party library in my project needs commons-lang 2.6, because it uses a method that was introduced in this version.
So when I just start my application, I get a java.lang.NoSuchMethodError.
Is there a way to update the dependency in the runtime? What I tried so far:
including commons-lang 2.6 in my app -> no effect, the one from the runtime is picked up first
replacing the jar directly in the runtime -> errors in studio, that the 2.4 jar is missing

So maybe I am late, but this is your answer: add the newer versions of the libraries to the Build Path. On the Java Build Path screen you should see the libraries listed. I needed to use Apache http-client 4.5.6, which brings a lot of other dependencies with it, so your question was VERY relevant. The solution is to rely on Java conventions (and not Mule, or Anypoint, or whatever) and make sure the JVM loads my class files first. Then it won't load the old ones from Mule's jar. So I went to the Order and Export tab and moved Mule to the bottom. This simple, trivial change makes it work. I think if we worked with the command line and vim, we would all know this, but the IDE GUI and everything else make us forget the simplest things. Please use it in good health. :)
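To check which copy of a library actually wins after reordering, one option is to ask the JVM where a class was loaded from. The following is a minimal sketch using only standard JDK calls; the class name WhichJar is hypothetical, and the class to inspect is passed as an argument (for the commons-lang case above that would be something like org.apache.commons.lang.StringUtils).

```java
import java.security.CodeSource;

public class WhichJar {

    static final String NO_SOURCE = "(no code source: JDK or bootstrap class)";

    /** Returns the jar or directory a class was loaded from, if the JVM knows it. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return NO_SOURCE;
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; falls back to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locationOf(Class.forName(name)));
    }
}
```

If the printed location still points at the runtime's older jar, the classpath order has not taken effect for that class loader.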

How to install TinkerPop

I have just recently come across graph databases and TinkerPop.
I am somewhat confused about how and what to install to use TinkerPop 2.5.0/2.6.0. Does it have to be installed on each database separately (as you would a plugin), or can I set it up once and then use it to access different supported software?
My goal is to use it to try out two (possibly more) different databases (mainly Neo4j and OrientDB, or perhaps Titan) and be able to query them using Gremlin.

How you use TinkerPop is entirely dependent on what you intend to do with it. If you are just getting started, I suggest you simply download the Gremlin distribution, unpack it and start the console with bin/gremlin.sh. Working in the REPL will help you learn quickly, as the feedback time for trying things out is basically instantaneous. Even as your Gremlin code makes its way to production, you will find the Gremlin Console to be a good friend, as it provides a way to try out ideas before committing them to code. It also provides a mechanism for maintaining and administering your database with Gremlin.
If you intend to use TinkerPop in a JVM-based application, then you will want to use a dependency management tool like Maven and reference the appropriate TinkerPop dependencies you'd like to use. Alternatively, I suppose you could try to manage the dependencies manually by downloading them individually from Maven Central and adding them to your path (though I wouldn't recommend that, for obvious reasons). My point in mentioning that is just to make it clear that the TinkerPop library is a set of jars that can be included in your JVM development tools like any other.
How you work with a particular database depends on the one you choose, but again the process is little different from what I described above. Neo4j is packaged with the Gremlin Console, so you can work with it right away in there. For OrientDB, you will want to copy its dependencies into the Gremlin Console path (i.e. the /lib directory). If you are building an application, then Maven is again your friend: you simply reference the Neo4j or OrientDB Maven coordinates and all required dependencies will come with them.
Some implementations, like Titan, have separate prerequisites (e.g. installing Cassandra or HBase). In those cases, you will need to refer to their documentation for specifics on how to set them up.
All that said, if you are just getting started, I recommend that you look into TinkerPop3. It is the next major line of development for TinkerPop and quite different from its previous incarnations. It does not have all of the implementations in place yet, but database vendors are at work to bring them online. All that I wrote about TinkerPop 2.x "installation" above generally applies to TinkerPop3; however, the TinkerPop3 Gremlin Console has a plugin system that makes it a little easier to bring in external dependencies, so you don't have to deal with them manually.

Differences between CruiseControl (original) and CruiseControl.NET

Are there any differences between the original CruiseControl and the .NET port? I've compared the two, but can't find any big differences except the language each has been developed in. I want to use one of them for (automated) testing of web applications, using Selenium and Subversion, perhaps even Groovy, but don't know which to choose.
[edit]
After looking at CC and Hudson, I've chosen Hudson for its simplicity. It already has plugins to run Groovy scripts and Selenium as well.

Choose me, choose me! (I work on the original CruiseControl.)
I've never used CC.NET but from what I know I agree that they are pretty comparable. Probably the most important difference is cross-platform vs. Windows only.
Now I wonder how long until someone comes by and says they're both crap and you should try Hudson? ;)
(And of course there are lots of other choices...)

CruiseControl.NET (cc.net henceforth) has build queues (http://confluence.public.thoughtworks.org/display/CCNET/Project+Configuration+Block), which allow you to serialize builds that depend on a certain build order. I'm in the process of emulating this behavior in the Java version of CruiseControl, but the functionality doesn't map one to one. The reason I'm moving from the .NET to the Java version at all is that the .NET version core dumps under Mono (cc.net nightly build and Mono nightly build as of two months ago). The fault lies with Mono's thread handling, but it defeats any attempt to get cc.net up and running.
The documentation on this can be tricky to find if you don't notice which version numbers the configuration examples and documentation apply to. confluence.public.thoughtworks.org has the updated configuration documentation, whereas ccnet.sourceforge.net does not. I know that the latter is most likely a dead site, but if you're not carefully reading the datestamps on every page you visit, this may bite you.
Furthermore, the sourcecontrol blocks for CVS and SVN in cc.net are more granular and feature-rich than their counterparts in the Java version, but this has not been a problem in my work. The Java version is also easy to extend or modify with respect to plugin behavior, but you would really prefer to see this kind of work go upstream instead of into a fork.
I'm fairly impressed with both the Java version and the .NET fork (modulo Mono runtime behavior), but you really do not want to try any of the other forks of CruiseControl. I've had peripheral experience with Hudson, and the features were just not compelling enough to pull me away from CruiseControl. Hudson has a (somewhat coloured) comparison map of Hudson and CruiseControl (Java) at http://hudson.gotdns.com/wiki/display/HUDSON/Home
A viable alternative is Buildbot, implemented in Python (http://buildbot.net/trac). It does not have fancy GUI dashboards and the setup is somewhat more command-line-bound, but if you're doing distributed builds, it's very easy to set up and get running.

I think for you it will come down to the operating system: the original can run on *nix, and the .NET version runs on Windows.
There are other automated build utilities that can do this as well, such as TeamCity in the Windows space and cruisecontrol.rb in the Ruby world.
There is also a PowerShell-based build utility called pSake that can poll Subversion and perform tasks.
