Pig versions and UDF - apache-pig

I am using Pig version 0.12, but for creating UDFs I am using the jar file of the Pig 0.9 release.
I simply downloaded the jar file for Pig 0.9 and added it to my Eclipse classpath.
All the UDFs that I created against the Pig 0.9 API work fine.
But I would like to know the impact of doing this.
Is there any problem that I will face in the future?

The issue you will face is API inconsistencies as time goes by. Some of the core APIs are relatively stable; most of them, in fact. But the longer you build against an old Pig API, the higher the chance you'll hit an issue running in the cluster.
Something else to think about is whether you are overriding the Pig version in the cluster. For example, say you have an uber-jar with the Pig scripts in it. If that JAR contains Pig 0.9, you'll actually use that version rather than 0.12. By not migrating, you might be pulling in the wrong version of Pig.
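For reference, here is a minimal sketch of the kind of UDF being discussed (the class and package names are made up for illustration). The EvalFunc/Tuple interface it relies on is one of the stable core APIs mentioned above, which is why code compiled against the 0.9 jar still runs on 0.12; the risk sits in the less stable corners of the API, not in a simple eval function like this.

    package com.example.pig; // hypothetical package

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    // Minimal eval UDF: upper-cases its first argument.
    // EvalFunc<T> and exec(Tuple) are essentially unchanged between Pig 0.9 and 0.12.
    public class UpperCase extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;
            }
            return input.get(0).toString().toUpperCase();
        }
    }

Registering and calling it from a script looks the same on either version: REGISTER myudfs.jar; then B = FOREACH A GENERATE com.example.pig.UpperCase(name);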

Related

Testing in GitLab CI/CD with different dependency versions

I'm currently developing a (Laravel) package on GitLab, and I want to automate testing using its CI/CD pipeline.
The problem
I already know how to set up a pipeline in GitLab, but what I want to achieve is to automate testing against different versions of the same dependency, in order to keep checking compatibility with old versions and add checks for upcoming new ones.
The case
My Laravel package is not very complex right now and doesn't use any particularly specific Laravel features, so I would like to keep it compatible with as many Laravel versions as possible: I would trigger different testing stages in my pipeline to run my tests against Laravel 5.6, 5.7, 5.8, 6, 7, and 8.
The question
How do I trigger different testing stages using different laravel/framework versions?
When downloading dependencies, Composer will go for the latest version available if I define it with '^', so which files do I have to edit?
OK, I've analyzed the problem a bit more and made some considerations about it.
I'm writing this not to properly answer my question, since I hope someone will eventually come up with a better solution or idea, but just to share some thoughts with everyone facing the same problem.
First: since I'm developing a package for Laravel, I cannot declare Laravel as a dependency of it, neither for production nor development; it is the Laravel project that needs to declare my package as a dependency.
Second: to test my package's compatibility with Laravel I'm using orchestra/testbench as a dev dependency, and according to its documentation every release targets a single, precise Laravel version, so if I want to test my package against different Laravel versions I need to test it with different orchestra/testbench releases.
Third: the only dependency my package has is PHP 7.3, so I can easily test against this and subsequent versions using a GitLab pipeline, creating a job for each PHP version that uses a Docker image with the correct PHP version and the latest Composer.
Conclusion
It is neither trivial nor straightforward to test a Laravel package against different Laravel versions.
The only idea I came up with, but have not tried since I gave it up and just test PHP versions (for now), is to make a branch for each Laravel version I want to test and update that branch's composer.json dev dependency with the correct orchestra/testbench release.
Then I can run the PHP tests on my feature branch merge requests and, on success, merge the develop branch into each "laravel branch" and run the Laravel compatibility tests there.
At last, if every laravel branch passes its tests, or at least the ones I decide to keep development/support active for, I can merge the develop branch into master.
I'm not goig for it
I decided to avoid all of this since I'm not quite sure how to implement it all in the pipeline, and I strongly suspect it would just add maintenance burden to this project.
So I just keep the PHP jobs that check against different PHP versions; this way I only need to copy/paste a job definition in my .gitlab-ci.yml file and change the Docker image version according to the new PHP version to test against, as in the sketch below.
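A minimal sketch of that .gitlab-ci.yml layout, assuming the official php Docker images and a PHPUnit test suite (the image tags, Composer install step, and test command are placeholders to adapt):

    stages:
      - test

    test:php7.3:
      stage: test
      image: php:7.3-cli
      script:
        - curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
        - composer install --prefer-dist --no-progress
        - vendor/bin/phpunit

    test:php7.4:
      stage: test
      image: php:7.4-cli
      script:
        - curl -sS https://getcomposer.org/installer | php -- --install-dir=/usr/local/bin --filename=composer
        - composer install --prefer-dist --no-progress
        - vendor/bin/phpunit

Testing against several Laravel versions would mean pinning a different orchestra/testbench constraint per job (for example by running composer require --dev "orchestra/testbench:^4.0" in the job instead of a plain composer install), which is exactly the per-branch or per-job juggling described above.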

Use Java 8 features (newer Janino version) in Pentaho Data Integration

Pentaho Data Integration 8.0.x uses Janino 2.5.16, released in 2010, for compiling the User Defined Java Class step. There is a JIRA in Pentaho for updating this to a newer Janino version, which would bring the new Java 8 related features in Pentaho v8.2.0 GA, but there is no info on when that will be released.
Is there any other way I can use a newer Janino version (janino-3.0.8.jar) with the existing Pentaho for UDJC? I tried copying the updated jar into lib and also added commons-compiler-3.0.8.jar to fulfill the dependency. Now when I open Spoon, I get the following error:
Please advise on how this can be achieved. I understand that just replacing the jar may not be enough, but I want to know if something else can be done.
This is not easy. The ClassNotFound you got shows that the public API of Janino has changed: some classes were removed, some were changed. What do you actually need the update for?
If you need really complicated business logic, then create a custom plugin. Documentation and tutorials are available, and you can look into the sources of the current built-in plugins (the sources are available on GitHub).
What important features does the new version of Janino have that the old one doesn't (besides Java 8 support)? You could check out the Kettle engine, look into the sources of the User Defined Java Class step, change the code to support the new Janino version, test it, make your own build of PDI Kettle, and try to send a pull request to the maintainers of the repository.
All of this is quite complicated. The step is built into the engine, so you would have to make your own build, and your own build means you have to support it yourself. This is non-trivial: the project is huge, now even bigger, and it keeps evolving. I spent several days making my first custom build (of version 4, back when it used Ivy) just to know the project better and debug complicated cases, and it was never used in production.
The maintainers of the repository must have a good reason to include your changes upstream; the changes must be well tested, it is a long procedure, and it most probably isn't worth it. A lot has changed since 2010; as far as I have seen in release notes, newer versions of Java can already compile code at runtime.
My advice is to make your own plugin.
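To give a sense of what porting the step would involve, here is a hedged sketch (not PDI code) of runtime compilation with the Janino 3.x API. Note that the compiler abstractions and exception types now live partly in org.codehaus.commons.compiler, which is exactly the kind of package move that breaks code written against the 2.5.x API and produces ClassNotFound errors like the one above:

    import org.codehaus.commons.compiler.CompileException;
    import org.codehaus.janino.SimpleCompiler;

    public class Janino3Demo {
        public static void main(String[] args) throws Exception {
            SimpleCompiler compiler = new SimpleCompiler();
            try {
                // Compile a class from source held in a String (roughly what a
                // UDJC-style step needs to do with the user's code).
                compiler.cook(
                    "public class Hello {\n"
                  + "    public static String greet() { return \"hello from janino 3\"; }\n"
                  + "}\n");
            } catch (CompileException e) {
                throw new RuntimeException("Compilation failed", e);
            }
            Class<?> hello = compiler.getClassLoader().loadClass("Hello");
            System.out.println(hello.getMethod("greet").invoke(null));
        }
    }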

/home/hadoop/bin/hadoop missing in AMI 4.x

I am trying to migrate a legacy MapReduce pipeline that is using AMI 3.x to AMI 4.x. It currently has bash scripts as part of the bootstrapping, and one of them calls hadoop fs -get s3n://somefile ~/otherfile. This fails in my current migration attempt to AMI 4.x, and adding ls /home/hadoop/bin to the script shows that the directory /home/hadoop/bin does not exist, so of course the binary /home/hadoop/bin/hadoop does not exist either. Is there something I need to configure to ensure the hadoop binary exists? I can't seem to find anything obvious in the documentation.
The file system layout changed considerably between 3.x and 4.x. The differences between 3.x and 4.x and instructions for migrating can be found here: http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-4.1.0/emr-release-differences.html
The short answer for solving your issue, though, is that you should use "aws s3 cp" instead of "hadoop fs -get" in bootstrap actions, since Hadoop is not installed until after bootstrap actions run on 4.x+.
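A sketch of the change in the bootstrap script (the bucket/object and destination paths are the placeholders from the question):

    # AMI 3.x bootstrap action: relies on Hadoop already being on the box
    hadoop fs -get s3n://somefile ~/otherfile

    # emr-4.x+ bootstrap action: use the AWS CLI, which is available before
    # Hadoop is installed; note the s3:// scheme instead of s3n://
    aws s3 cp s3://somefile ~/otherfile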

Use Nutch with newest Elasticsearch

Based on this question I have managed to successfully integrate Nutch and Elasticsearch, albeit by downgrading my Elasticsearch version. How can I modify the Nutch source code to accommodate the latest version of Elasticsearch (0.90.2+)? I have tried modifying the Ivy dependency to this version of Elasticsearch and also modified Nutch's Elasticsearch compatibility code so it would build properly, but I end up with an error as Nutch times out waiting for a response from Elasticsearch; the two are unable to communicate.
I think I found the solution. You need to modify all references to the version number, both in ivy/ivy.xml and pom.xml (which is the file I forgot to change). Changing both 0.19.4s to 0.90.2s should do the trick. Also, you need to change item.failed() in src/java/org/apache/nutch/indexer/elastic/ElasticWriter.java to item.isFailed() to match the newer Elasticsearch refactoring.
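For context, a hedged sketch of the renamed call: in the 0.90.x Java client the per-item failure check on a bulk response is isFailed(), where the 0.19.x client used failed(). The helper below is hypothetical and only illustrates the pattern ElasticWriter's bulk-response handling needs after the rename:

    import org.elasticsearch.action.bulk.BulkItemResponse;
    import org.elasticsearch.action.bulk.BulkResponse;

    public final class BulkFailureCheck {
        private BulkFailureCheck() {}

        // Returns true if any item in the bulk response failed.
        public static boolean hasFailures(BulkResponse response) {
            for (BulkItemResponse item : response.getItems()) {
                if (item.isFailed()) { // was item.failed() in the 0.19.x client
                    System.err.println("Bulk item failed: " + item.getFailureMessage());
                    return true;
                }
            }
            return false;
        }
    }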

When do we get an "AssertionError: HDF dataset not available. Check your clearsilver installation"

I am trying to install a DbAuth plugin for Trac. I know that I should probably be chasing this on other Trac and trac-hacks related forums, but still I am wondering: why does one get this error? What exactly is happening?
In my case the DbAuth plugin is trying to read things like "trac_permissions" and "trac_users" from a SQLite or MySQL database. I have checked the databases and the values are in there, but neither of them works. Clearsilver is installed and running as well.
So what usually causes this error? Is it that the HDF parser is receiving the wrong info? Please do not take this as a Trac question; just explain to me why these types of errors occur.
Thanks.
A Google search should get you started. You should also consider an alternative, because DbAuth is deprecated.
What version of Trac are you running? Recent versions use Genshi instead of Clearsilver, which means that Clearsilver-based plugins likely won't work correctly (not without modifications, at least). According to the Trac wiki, Trac version 0.11 still had the infrastructure to support Clearsilver-based plugins, version 0.12 retained this support in an unsupported form (meaning use at your own risk, you're on your own if something doesn't work), and version 0.13 dropped support for Clearsilver-based plugins entirely. Unless you're still running an older Trac install that's version 0.10 or 0.11, I'm inclined to say that this problem is due to the phasing out of Clearsilver support.
According to this trac-hacks ticket, you may want to try re-compiling Clearsilver with the Python bindings (this would only be useful if you're running Trac 0.11 or older).